MD5 vs SHA-256: Choosing the Right Hash Function for Your Use Case
When MD5 is still acceptable, when SHA-256 is the minimum bar, and why using the wrong hash function for the wrong job has made some of the largest data breaches in history far more damaging.
What a hash function does and why the algorithm matters
A cryptographic hash function takes an input of any size and produces a fixed-length output — called a digest, hash, or checksum — that has three essential properties. First, it is deterministic: the same input always produces the same output. Second, it is one-way: given the hash, you cannot feasibly reconstruct the original input. Third, it is collision-resistant: it should be computationally infeasible to find two different inputs that produce the same hash.
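The first property, along with the fixed output length, is easy to observe directly. The snippet below is a minimal sketch using Python's standard `hashlib` module; the helper name `sha256_hex` and the sample inputs are illustrative choices, not part of any particular system.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Deterministic: hashing the same bytes twice gives the same digest.
first = sha256_hex(b"hello world")
second = sha256_hex(b"hello world")

# Fixed length: a 5-byte input and a 1 MB input both give 64 hex characters.
short_digest = sha256_hex(b"hello")
long_digest = sha256_hex(b"x" * 1_000_000)

# Changing a single character produces a different digest.
changed = sha256_hex(b"hello worle")

print(first == second)                      # True
print(len(short_digest), len(long_digest))  # 64 64
print(first == changed)                     # False
```

One-wayness and collision resistance cannot be shown by running code like this. They are claims about how expensive an attack is, which is why the choice of algorithm matters.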
Hash functions are used everywhere in software: verifying file integrity, storing passwords, signing data, generating ETags for HTTP caching, deduplicating data in storage systems, and creating content-addressed identifiers. The algorithm you choose matters because these three properties hold to different degrees for different algorithms — and MD5 and SHA-1, which are still widely deployed, have broken collision resistance. That single fact changes their appropriate use cases entirely.
MD5: what it is and what broke
MD5 (Message Digest 5) was designed by Ron Rivest in 1991. It produces a 128-bit (16-byte) digest, typically displayed as a 32-character hexadecimal string. For a decade, it was the standard hash function for everything from password storage to digital signatures.
In 2004, researchers demonstrated practical MD5 collision attacks — meaning they could generate two different inputs that produce the same MD5 hash. By 2008, researchers used MD5 collision attacks to forge a rogue SSL certificate that browsers trusted, allowing a fake certification authority to issue certificates for any domain. This is not a theoretical weakness: it is a broken property that invalidates any use of MD5 that depends on collision resistance.
Speed is MD5's other notable property, and it is a liability for security uses. MD5 is extraordinarily fast, which is good for checksumming large files but catastrophic for password hashing. A modern GPU can compute billions of MD5 hashes per second, which makes brute-force and dictionary attacks against MD5-hashed passwords practical in hours. The same applies to any fast, unsalted hash: LinkedIn's 2012 breach exposed roughly 117 million passwords stored as unsalted SHA-1 hashes, and most of them were reportedly cracked within days.
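To make the speed problem concrete, here is a rough back-of-the-envelope calculation. The rate of 10 billion hashes per second is an assumed figure chosen for illustration, not a benchmark of any specific GPU, and the function name is hypothetical.

```python
def seconds_to_exhaust(alphabet_size: int, length: int, hashes_per_second: float) -> float:
    """Worst-case time to try every candidate of exactly `length` characters."""
    keyspace = alphabet_size ** length
    return keyspace / hashes_per_second

# Assumed rate for illustration only: 10 billion MD5 hashes per second.
ASSUMED_MD5_RATE = 10e9

# 8 lowercase letters: 26**8 = 208,827,064,576 candidates.
lowercase_8 = seconds_to_exhaust(26, 8, ASSUMED_MD5_RATE)

# 8 characters drawn from 62 letters and digits: 62**8 candidates.
alnum_8 = seconds_to_exhaust(62, 8, ASSUMED_MD5_RATE)

print(f"8 lowercase letters: {lowercase_8:.1f} seconds")   # 20.9 seconds
print(f"8 alphanumerics:     {alnum_8 / 3600:.1f} hours")  # 6.1 hours
```

Under that assumption, every 8-character alphanumeric password falls in an afternoon. Real attacks are usually faster still, because dictionaries and common patterns are tried first.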
SHA-256: the current baseline for security-sensitive uses
SHA-256 is part of the SHA-2 family, standardised by NIST in 2001. It produces a 256-bit (32-byte) digest, displayed as a 64-character hex string. No practical collision attack against SHA-256 has been demonstrated, and a brute-force preimage attack (recovering an unknown, high-entropy input from its digest) would take on the order of 2^256 attempts, far beyond current hardware capabilities.
SHA-256 is the standard algorithm for TLS certificate signatures (TLS 1.3), HMAC authentication in API security, code signing, blockchain proof-of-work, JWT signatures (HS256, RS256), and file integrity verification where security matters. If you are choosing a hash algorithm for a new system and do not have a specific reason to use something else, SHA-256 is the correct default.
Quick comparison
| Property | MD5 | SHA-256 |
|---|---|---|
| Output size | 128-bit (32 hex chars) | 256-bit (64 hex chars) |
| Collision attacks | Practical (broken) | None known |
| Speed | Very fast | Fast (slower than MD5) |
| Password storage | Never use | Not recommended (use bcrypt/Argon2) |
| File integrity (non-security) | Acceptable | Preferred |
| TLS / code signing | Deprecated | Standard |
| HMAC authentication | Avoid | Standard (HS256) |
When MD5 is still acceptable
MD5's collision weakness only matters when an attacker can choose or influence the inputs being hashed. In non-adversarial contexts (checksumming files to detect accidental corruption, deduplicating identical files in a personal backup system, generating cache keys where collision resistance is not a security property), MD5 remains fast and sufficient.
The classic acceptable use is checking download integrity in a trusted context where the MD5 is published on the same server as the file. If the server is compromised, the attacker can replace both the file and the MD5 — so the MD5 provides no security guarantee anyway. It is only useful for detecting accidental corruption in transit, which MD5 does fine.
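A corruption check of this kind might look like the sketch below. It reads the file in chunks so that large downloads do not need to fit in memory. The function names and the default chunk size are assumptions made for this example, and passing `"sha256"` as the algorithm gives the security-grade variant with no other changes.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read until an empty bytes object signals end of file.
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()

def matches_published(path: Path, published_hex: str, algorithm: str = "md5") -> bool:
    """Compare a file's digest with a published checksum, ignoring case and whitespace."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

A call such as `matches_published(Path("download.iso"), checksum_from_site)` (both names hypothetical) then tells you whether the bytes arrived intact. It says nothing about whether they came from a trustworthy source.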
Where MD5 is definitively not acceptable: password storage (ever), digital signatures, TLS certificates, any context where a malicious actor controls one of the inputs being hashed, or any context where the hash is used to make a security decision.
Why SHA-256 is not right for password storage either
SHA-256 is not broken, but it is still the wrong choice for password hashing. The reason is speed. SHA-256 is designed to be fast — a single SHA-256 computation takes microseconds. This is desirable for file checksumming, but it means an attacker with a GPU can still test billions of password candidates per second against a stolen SHA-256 hash database.
For password storage, use bcrypt, scrypt, or Argon2: algorithms specifically designed to be slow and, in the case of scrypt and Argon2, memory-intensive. Argon2 won the Password Hashing Competition in 2015 and is the current recommendation. These algorithms have a cost factor you can tune over time as hardware gets faster, ensuring that even as GPUs improve, the per-hash cost remains high enough to make brute-force attacks impractical.
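The idea of a tunable cost factor can be sketched with scrypt, which ships in Python's standard library as `hashlib.scrypt` (on builds linked against a recent OpenSSL). The parameter values below (`n=2**14`, `r=8`, `p=1`, a 16-byte salt) are illustrative assumptions rather than a vetted recommendation, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Tuple

def hash_password(password: str, *, n: int = 2**14, r: int = 8, p: int = 1) -> Tuple[bytes, bytes]:
    """Derive a key with scrypt and a fresh random salt; returns (salt, key)."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=n, r=r, p=p, dklen=32)
    return salt, key

def verify_password(password: str, salt: bytes, expected: bytes,
                    *, n: int = 2**14, r: int = 8, p: int = 1) -> bool:
    """Re-derive the key with the stored salt and parameters, then compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=n, r=r, p=p, dklen=32)
    return hmac.compare_digest(key, expected)

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("wrong guess", salt, key))                   # False
```

Doubling `n` roughly doubles both the time and the memory each guess costs. Storing `n`, `r`, and `p` next to the salt lets you raise the cost for new hashes later without invalidating old ones.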
Password hashing in practice
```javascript
// Node.js: bcrypt
import bcrypt from 'bcrypt';

// Hash (slow by design, ~100ms at cost factor 12)
const hash = await bcrypt.hash(plainPassword, 12);

// Verify
const valid = await bcrypt.compare(inputPassword, hash);
```

```python
# Python: argon2
from argon2 import PasswordHasher

ph = PasswordHasher()
hash = ph.hash(plain_password)   # includes salt automatically
ph.verify(hash, input_password)  # raises VerifyMismatchError if wrong
```
SHA-3 and beyond: what to know
SHA-3 (Keccak) was standardised by NIST in 2015 as a backup to SHA-2. It uses a completely different internal structure (a sponge construction rather than a Merkle–Damgård construction), which means that even if a theoretical weakness were found in SHA-2, SHA-3 would be unaffected. SHA-256 is not broken and SHA-3 is not a replacement — it is an alternative for high-security applications or environments that require algorithm diversity.
For practical purposes: SHA-256 is the correct choice for new systems unless you have a specific reason to choose otherwise. BLAKE2 and BLAKE3 are faster alternatives to SHA-256 that maintain strong security properties — useful for high-throughput checksumming or content-addressed storage systems where hashing performance is a bottleneck.
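In Python, switching between these algorithms is mostly a matter of calling a different constructor, as the short sketch below suggests. It covers only what the standard library provides: SHA-256, SHA3-256, and BLAKE2b truncated to 32 bytes. BLAKE3 is not in the standard library and is left out here. The sample payload is arbitrary.

```python
import hashlib

data = b"example payload"

digests = {
    "sha256": hashlib.sha256(data).hexdigest(),
    "sha3_256": hashlib.sha3_256(data).hexdigest(),
    # BLAKE2b defaults to a 64-byte digest; request 32 bytes to match the others.
    "blake2b": hashlib.blake2b(data, digest_size=32).hexdigest(),
}

for name, digest in digests.items():
    print(f"{name:9} {digest}")
```

All three produce 64 hex characters for the same input, but the values are unrelated. A stored digest is therefore only meaningful if the algorithm that produced it is recorded alongside it.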
Practical decision guide
Passwords: Argon2id, bcrypt (cost ≥ 12), or scrypt. Never SHA-256, never MD5.
File integrity (security context): SHA-256 minimum. SHA-512 for extra margin. Publish the hash out-of-band from the file download.
File integrity (non-security, e.g. dedup or cache keys): MD5 is acceptable. BLAKE3 is faster and stronger if you are checksumming at scale.
HMAC (API authentication, message signing): HMAC-SHA256. This is HS256 in JWT terms. Never HMAC-MD5 for new systems.
TLS and certificate signing: SHA-256. SHA-1 certificates are deprecated and rejected by modern browsers.
Content-addressed storage / Git-style identifiers: SHA-256 (Git is transitioning from SHA-1 to SHA-256 precisely because of collision attacks).
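As an illustration of the HMAC entry in the guide above, the following is a minimal sketch of signing and verifying a message with HMAC-SHA256 using Python's standard `hmac` module. The function names, the secret, and the message are placeholders for this example. A real system would load the key from a secret store and would typically sign a canonical form of the request.

```python
import hashlib
import hmac

def sign(secret: bytes, message: bytes) -> str:
    """Return the HMAC-SHA256 tag of message as a hex string."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(secret: bytes, message: bytes, signature_hex: str) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    expected = sign(secret, message)
    return hmac.compare_digest(expected, signature_hex)

secret = b"placeholder-shared-secret"
message = b'{"amount": 100, "currency": "EUR"}'

tag = sign(secret, message)
print(verify(secret, message, tag))                   # True
print(verify(secret, b'{"amount": 100000}', tag))     # False
print(verify(b"some-other-secret", message, tag))     # False
```

Using `hmac.compare_digest` rather than `==` is the important detail: an ordinary string comparison can return early on the first mismatching character, which leaks timing information to an attacker probing signatures.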