Hash Functions Explained: MD5, SHA-1, SHA-256 Comparison
A hash function takes input of any length and produces a fixed-size string of bytes, the digest, that acts as a fingerprint of the input. Change a single bit of the input, and the digest changes completely. Hash functions are foundational to nearly every modern computer system: Git stores objects by their SHA-1, package managers verify downloads by SHA-256, blockchains such as Bitcoin link blocks by SHA-256, and operating systems detect file corruption by checksums. Understanding which hash to use for which job, and which are no longer safe for which purpose, is a working necessity.
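The fixed output size and the "one bit changes everything" behavior are easy to observe with Python's standard `hashlib` module. The following is a minimal sketch; the helper name `differing_bits` and the sample messages are illustrative choices, not part of any library.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg1 = b"hello world"
msg2 = b"hello worle"  # 'd' (0x64) vs 'e' (0x65): the inputs differ in exactly one bit

# Digest length depends only on the algorithm, never on the input length.
for name in ("md5", "sha1", "sha256"):
    short = hashlib.new(name, b"a").hexdigest()
    long_ = hashlib.new(name, b"a" * 1_000_000).hexdigest()
    print(f"{name:7s} {len(short)} hex chars (short input), {len(long_)} hex chars (1 MB input)")

# A one-bit input change flips roughly half of the 256 output bits.
d1 = hashlib.sha256(msg1).digest()
d2 = hashlib.sha256(msg2).digest()
changed = differing_bits(d1, d2)
print(f"SHA-256 output bits changed: {changed} of 256")
```

On Python builds restricted to FIPS-approved algorithms, the `md5` line may raise an error; that is an environment restriction rather than a bug in the sketch.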
This article compares MD5, SHA-1, and SHA-256 on the dimensions that actually matter: output size, collision resistance, performance, and current standing for security versus integrity use cases.
The Three Properties of a Good Hash
A cryptographic hash function must satisfy three properties:
- Preimage resistance: given a digest, it must be infeasible to find any input that produces that digest.
- Second preimage resistance: given an input and its digest, it must be infeasible to find a different input with the same digest.
- Collision resistance: it must be infeasible to find any two inputs producing the same digest.
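Of the three, collision resistance is the easiest to break because of the birthday paradox: for a digest of N bits, finding a collision requires roughly 2^(N/2) operations, not 2^N. So a 128-bit hash provides at most 64 bits of collision resistance, which is not enough against modern attackers. Generic preimage and second-preimage attacks still cost about 2^N operations, so those two properties tend to hold up longer: MD5's collision resistance is broken, yet no practical preimage attack on it is known.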
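The birthday bound can be seen directly on a deliberately weakened hash. The sketch below truncates SHA-256 to 24 bits (a toy construction chosen only for illustration) and searches for a collision; by the 2^(N/2) estimate it should need on the order of 2^12, a few thousand, attempts rather than 2^24. The function names are hypothetical.

```python
import hashlib
from itertools import count

def truncated_digest(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256(data) as an integer: a toy N-bit hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int):
    """Hash successive counter strings until two share a truncated digest.

    Returns (first_message, second_message, number_of_messages_hashed).
    """
    seen = {}  # truncated digest -> message that produced it
    for i in count():
        msg = str(i).encode()
        h = truncated_digest(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

a, b, attempts = find_collision(24)
print(f"{a!r} and {b!r} collide on 24 bits after {attempts} hashes")
print(f"birthday estimate: about 2^12 = {2**12}; brute-force preimage: about 2^24 = {2**24}")
```

The same arithmetic scales up: each extra 2 bits of digest only doubles the collision search, which is why 128-bit digests are considered too short for collision resistance.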
MD5: Fast, Broken for Security, Still Useful
MD5 was designed by Ron Rivest in 1991 and published as RFC 1321 in 1992. It produces a 128-bit (32-character hexadecimal) digest. It was the workhorse of integrity checking for years and is still widely seen in download checksums, log fingerprints, and cache keys. But MD5's collision resistance was broken in 2004, and researchers can now produce two different files with identical MD5 hashes in seconds on a laptop. The Flame malware, discovered in 2012, used an MD5 collision to forge a Microsoft code-signing certificate.
This does not mean MD5 is useless. It means MD5 must never be used where collision resistance is a security property. Acceptable uses today:
- File integrity in a non-adversarial setting: detecting accidental corruption in storage or transmission. The chance of an accidental MD5 collision is still negligible.
- Hash table keys: distributing data across buckets where adversarial input is not a concern.
- Cache keys, content addressing in trusted contexts: CDN edge caches, deduplication of files you control.
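Unacceptable uses: digital signatures, certificate fingerprints, password hashing, or anything else where an attacker could craft the input. For signatures and fingerprints, use SHA-256 or a stronger hash; for passwords, use a dedicated password hashing function (see "Passwords Are a Special Case" below). Computing an MD5 digest is still fine when a legacy system requires one for compatibility.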
SHA-1: Phasing Out
SHA-1 produces a 160-bit (40-character hex) digest. Designed by the NSA and published by NIST in 1995, it dominated the 2000s. Theoretical attacks against SHA-1 appeared in 2005; the first real-world collision (Google's "SHAttered" attack producing two PDFs with the same SHA-1) was demonstrated in 2017 at a cost of around $110,000 in cloud compute. By 2020 the cost was a fraction of that.
All major browser vendors stopped trusting SHA-1 in TLS certificates by 2017. Git still uses SHA-1 for object hashing by default, with SHA-256 available as an opt-in object format. The SHAttered attack does not directly break Git: Git hashes a type-and-length header together with the content, so the published colliding PDFs do not collide as Git objects, and producing meaningful colliding commits is harder still. Nonetheless, treat SHA-1 the way you treat MD5: fine for non-adversarial fingerprinting, never for signatures or certificates.
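Git's object IDs are not the plain SHA-1 of a file's bytes: for a blob, Git hashes the ASCII header `blob <length>` followed by a NUL byte and then the content. The sketch below reproduces that for blobs only (trees, commits, and tags use the same scheme with a different type word); the function name is an illustrative choice.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob with this content.

    Git hashes b"blob <size>\\0" + content, so the ID differs from
    the plain SHA-1 of the content itself.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

empty_id = git_blob_sha1(b"")
print(empty_id)                      # Git's well-known ID for the empty blob
print(hashlib.sha1(b"").hexdigest()) # plain SHA-1 of empty input, for contrast
```

Running `git hash-object` on the same file should print the same ID as `git_blob_sha1` in a default (SHA-1) repository.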
SHA-256: The Modern Default
SHA-256 is part of the SHA-2 family (introduced 2001), and it produces a 256-bit (64-character hex) digest. No real-world attacks on its collision resistance exist after twenty-plus years of scrutiny — the best known attack remains a generic 2^128 birthday search, which is computationally infeasible.
SHA-256 is the recommended default for any new system needing a cryptographic hash. It is used by Bitcoin, by certificate authorities for TLS, by Linux package managers, and, as an opt-in object format, by newer versions of Git. SHA-512 (also SHA-2) and SHA-3 (a different design family from 2015) are both fine alternatives, but SHA-256 has the widest hardware acceleration and library support.
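Verifying a download against a published SHA-256 checksum takes only a few lines. This is a minimal sketch that reads the file in chunks so large files never need to fit in memory; the function names and the 1 MiB chunk size are assumptions made for illustration.

```python
import hashlib
import hmac

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the published one, ignoring case and whitespace."""
    return hmac.compare_digest(file_sha256(path), expected_hex.strip().lower())
```

A checksum only proves the file matches what the publisher listed; it protects against tampering only if the checksum itself was obtained over a trusted channel.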
| Hash | Output size | Year | Collision status | Use today |
|---|---|---|---|---|
| MD5 | 128 bits | 1992 | Broken (seconds) | Non-security integrity only |
| SHA-1 | 160 bits | 1995 | Broken (practical since 2017) | Legacy compatibility only |
| SHA-256 | 256 bits | 2001 | No practical attack | Default for new systems |
| SHA-512 | 512 bits | 2001 | No practical attack | Alternative to SHA-256; often faster on 64-bit CPUs without SHA extensions |
| SHA-3 | 224 to 512 bits | 2015 | No practical attack | Alternative design family |
Passwords Are a Special Case
Do not use SHA-256 directly for password hashing. General-purpose hashes are designed to be fast — the opposite of what you want for passwords. An attacker who steals a database of SHA-256 password hashes can compute billions of guesses per second on a GPU.
Password hashes need three properties general hashes lack: deliberate slowness (memory-hard or compute-hard), per-user salt, and tunable cost parameters. Use Argon2id (current best practice), bcrypt (still acceptable), or scrypt. These are sometimes called "key derivation functions" or "password hashing functions." See the password security guide for more.
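As a rough illustration of those three properties, here is a sketch using scrypt, which ships in Python's standard library as `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later). Argon2id needs a third-party package, so it is not shown. The cost values in `SCRYPT_PARAMS` and the function names are assumptions for illustration, not tuned recommendations.

```python
import hashlib
import hmac
import os

# Tunable cost parameters (illustrative values; benchmark for your own hardware).
# Memory use is roughly 128 * n * r bytes, about 16 MiB here.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage alongside the user record."""
    salt = os.urandom(16)  # fresh random salt per user
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, dklen=32, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, dklen=32, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected_key)

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))
print(verify_password("wrong guess", salt, key))
```

A real system would also store the cost parameters with each record so they can be raised later without invalidating existing hashes.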
Performance Notes
For pure software throughput, MD5 is fastest, with SHA-1 close behind; SHA-256 is slower because of its larger state and round count. On x86 CPUs with the SHA Extensions instruction set (Intel SHA-NI, supported since Goldmont and Zen), SHA-256 narrows the gap dramatically and can match or beat a software-only SHA-1. On 64-bit CPUs without those extensions, SHA-512 is often faster than SHA-256, somewhat paradoxically, because it processes data in 64-bit words.
For most applications the performance difference is irrelevant — hashing a few MB of data takes milliseconds regardless of the algorithm. The exceptions are streaming throughput-sensitive workloads (deduplication at PB scale, network packet hashing), where it pays to benchmark on your target hardware.
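A quick way to check relative throughput on a given machine is to time each algorithm over the same buffer. The sketch below is a rough measurement, not a rigorous benchmark; the 4 MiB payload, the repeat count, and the function name are arbitrary illustrative choices, and results depend heavily on the CPU and on how the local Python and OpenSSL were built.

```python
import hashlib
import time

def throughput_mb_s(algorithm: str, payload: bytes, repeats: int = 5) -> float:
    """Best-of-N throughput in MB/s for hashing `payload` with `algorithm`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, payload).digest()
        best = min(best, time.perf_counter() - start)
    return len(payload) / best / 1e6

payload = bytes(4 * 1024 * 1024)  # 4 MiB of zero bytes; content does not affect speed
results = {name: throughput_mb_s(name, payload)
           for name in ("md5", "sha1", "sha256", "sha512")}

for name, speed in results.items():
    print(f"{name:7s} {speed:8.0f} MB/s")
```

Taking the best of several runs reduces noise from other processes; for streaming workloads, timing repeated `update()` calls on realistic chunk sizes would be a closer model.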
Frequently Asked Questions
Can I "decrypt" an MD5 hash?
No. Hashing is one-way by design — there is no inverse function. What rainbow-table sites do is look up the hash in a precomputed dictionary of common inputs. If your input was a common password or short string, it may appear in such a table. For unique random inputs (UUIDs, random keys), no lookup will work.
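The lookup approach can be sketched in a few lines: hash a list of likely inputs ahead of time, then search the resulting table. The word list below is a tiny made-up sample; real tables hold billions of entries, and true rainbow tables use a more compact chain structure than this plain dictionary.

```python
import hashlib
import os

# A tiny hypothetical dictionary of common inputs.
COMMON_INPUTS = ["123456", "password", "qwerty", "letmein"]

# Precompute digest -> input once; each later lookup is a single dict access.
LOOKUP = {hashlib.md5(word.encode()).hexdigest(): word for word in COMMON_INPUTS}

def reverse_lookup(md5_hex: str):
    """Return the known input for this MD5 digest, or None if it is not in the table."""
    return LOOKUP.get(md5_hex.lower())

print(reverse_lookup(hashlib.md5(b"password").hexdigest()))      # found in the table
print(reverse_lookup(hashlib.md5(os.urandom(16)).hexdigest()))   # random input: not found
```

Nothing is inverted here; the table only recognizes digests of inputs someone already guessed.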
Is SHA-256 quantum-safe?
Largely, yes. Grover's algorithm gives a quantum speedup for hash inversion, effectively halving the security level: SHA-256 would offer 128-bit preimage security against a sufficiently large quantum computer instead of 256-bit. 128-bit security is still considered safe. For very long-term archival you might prefer SHA-512, but for the foreseeable future SHA-256 is comfortable.
Should I use HMAC?
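If you need a keyed hash (authentication, not just integrity), yes. Use HMAC-SHA-256 rather than concatenating a key and hashing: SHA-256(key || message) is open to length-extension attacks, in which an attacker who knows the digest can compute a valid digest for the message with extra data appended, without knowing the key. HMAC has provable security properties that ad-hoc constructions do not. APIs that need request signatures should use HMAC, not a bare hash of the secret plus payload.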
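Python's standard `hmac` module covers this directly. The following is a minimal sketch of signing and verifying a request payload; the function names, the sample key, and the payload are hypothetical, and a real API would also need to define exactly which bytes are signed and how keys are distributed.

```python
import hashlib
import hmac

def sign(key: bytes, payload: bytes) -> str:
    """Return the HMAC-SHA-256 tag for payload as a hex string."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(key: bytes, payload: bytes, signature_hex: str) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign(key, payload), signature_hex)

key = b"example-shared-secret"            # illustrative only; use a random key in practice
payload = b'{"amount": 100, "to": "alice"}'
tag = sign(key, payload)

print(verify(key, payload, tag))                                  # untampered payload
print(verify(key, b'{"amount": 900, "to": "alice"}', tag))        # modified payload
```

Using `hmac.compare_digest` instead of `==` matters on the verifying side, since an ordinary string comparison can reveal how many leading characters matched.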