Base64 Encoding Explained: What Every Developer Should Know

Why Base64 exists, how the encoding works under the hood, when to use it, and the mistakes that trip up developers every week.

last updated · June 13, 2026 by @vultio

Why Base64 was invented

Base64 was born out of a problem that is easy to forget in the age of Unicode and binary-aware protocols: the early internet ran on systems that could only reliably transmit 7-bit ASCII characters. SMTP, the protocol that carries email, was designed in the early 1980s when most text was plain English. Characters above decimal 127 — and some control characters below it — were either stripped, altered, or caused transfer agents to fail outright. Binary data like images, audio files, and executables was simply not safe to send.

MIME (Multipurpose Internet Mail Extensions), first specified in RFC 1341 in 1992 and revised in RFC 2045 in 1996, made Base64 the canonical way to encode binary data so it survives text-only channels. The idea is simple: represent every possible byte value using only characters from a safe 64-character alphabet. Once data is Base64-encoded, any ASCII-safe transport can carry it without loss, and the receiving end decodes it back to the original bytes. The problem of binary-hostile channels did not go away with email: it shows up today in JSON payloads, HTTP headers, environment variables, YAML configuration files, and anywhere else text is the only medium available.

How Base64 encoding actually works

Every byte of input is 8 bits. Base64 regroups those bits into 6-bit chunks. Because 2⁶ = 64, each 6-bit group maps to exactly one character in a 64-character alphabet. That alphabet is:

A–Z  (values  0–25)
a–z  (values 26–51)
0–9  (values 52–61)
+    (value  62)
/    (value  63)

Because 3 input bytes provide exactly 24 bits, and 24 bits divide evenly into four 6-bit groups, the natural block size is 3 bytes in → 4 characters out. That ratio is where the ~33% size increase comes from: every 3 bytes become 4 characters.

When the input length is not a multiple of 3, the final group is filled with zero bits and padding characters (=) are appended to bring the output to a multiple of 4 characters. If the last block is one byte short (2 bytes of data), the output ends in one =; if it is two bytes short (1 byte of data), it ends in ==. Padding is a structural requirement of the standard Base64 format, though some variants (Base64url, for example) commonly omit it for compactness.

Step-by-step example: encoding "Man"

The word "Man" is a classic textbook example because it encodes to exactly 4 characters with no padding (3 bytes, no remainder). Here is every step:

Character:   M          a          n
ASCII dec:   77         97         110
Binary:      01001101   01100001   01101110

Concatenated 24 bits:
  010011010110000101101110

Split into four 6-bit groups:
  010011 = 19  →  T
  010110 = 22  →  W
  000101 =  5  →  F
  101110 = 46  →  u

Result: TWFu

No padding is needed because the input was exactly 3 bytes. If you encoded "Ma" (2 bytes), the output would be "TWE=" — one = to pad to 4 characters. Encoding "M" alone (1 byte) gives "TQ==" — two padding characters.
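The regrouping described above can be sketched in a few lines of Python. This is an illustrative implementation for learning purposes, not a replacement for the standard library's `base64` module; the names `ALPHABET` and `encode_base64` are made up for this example.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_base64(data: bytes) -> str:
    """Encode bytes by regrouping 8-bit bytes into 6-bit values."""
    out = []
    for i in range(0, len(data), 3):
        block = data[i:i + 3]
        missing = 3 - len(block)  # 0, 1, or 2 bytes short of a full block
        # Pack the block into a 24-bit integer, zero-filling any missing bytes.
        n = int.from_bytes(block + b"\x00" * missing, "big")
        # Extract four 6-bit groups, most significant first.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Groups that contain only filler bits are replaced by "=".
        if missing:
            chars[4 - missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)

print(encode_base64(b"Man"))  # TWFu
print(encode_base64(b"Ma"))   # TWE=
print(encode_base64(b"M"))    # TQ==
```

In real code, `base64.b64encode` from the standard library does the same job and is the safer choice.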

When to use Base64

Base64 is the right tool whenever you need to move binary data through a channel that only accepts printable text. The most common situations developers encounter:

Embedding images in HTML and CSS

Small icons and SVGs can be inlined as data URIs to eliminate HTTP requests:

<img src="data:image/png;base64,iVBORw0KGgo..." />
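As a rough sketch, a data URI can be assembled from raw bytes like this. The helper name `to_data_uri` is hypothetical, and the input here is only the 8-byte PNG file signature rather than a complete image; it is used because it shows why PNG data URIs start with the familiar `iVBORw0KGgo` prefix.

```python
import base64

def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a data URI from raw bytes and a MIME type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# The fixed 8-byte signature that begins every PNG file (not a full image).
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

uri = to_data_uri(png_signature, "image/png")
print(uri)  # data:image/png;base64,iVBORw0KGgo=
```

For a real icon you would read the file's bytes (for example with `pathlib.Path(...).read_bytes()`) and pass them to the same helper.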

JSON Web Tokens (JWTs)

A JWT is three Base64url-encoded segments joined by dots: header.payload.signature. The payload segment holds the claims (user ID, roles, expiry). Decoding the middle segment with any Base64 decoder instantly reveals what a token contains — which is exactly why JWTs are not suitable for storing sensitive data without additional encryption.
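To illustrate how little effort reading a JWT payload takes, here is a minimal sketch. The token is built inside the example from made-up claims and carries a placeholder instead of a real signature; the helper names are hypothetical. Note that this only reads the claims and does not verify anything.

```python
import base64
import json

def b64url_encode(data: bytes) -> str:
    """Base64url-encode and drop padding, as JWT segments do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then Base64url-decode."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)

def read_jwt_claims(token: str) -> dict:
    """Return the payload claims WITHOUT verifying the signature."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))

# Illustrative token with invented claims and a placeholder signature.
header = b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": "user-123", "role": "admin"}).encode())
token = f"{header}.{payload}.placeholder-signature"

claims = read_jwt_claims(token)
print(claims)  # {'sub': 'user-123', 'role': 'admin'}
```

Anyone holding the token can do the same, which is why claims should be treated as readable by every party the token passes through.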

API tokens and credentials in headers

HTTP Basic Authentication encodes username:password as Base64:

Authorization: Basic dXNlcjpwYXNzd29yZA==

The server Base64-decodes the value before verifying. This provides zero confidentiality on its own, so always pair it with HTTPS.
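A small sketch of both directions, assuming UTF-8 credentials. The function names are invented for this example, and the parser is deliberately minimal; it is not a full implementation of the Basic scheme.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

def parse_basic_auth(header_value: str) -> tuple[str, str]:
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Split on the first colon only: the password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password

header_value = basic_auth_header("user", "password")
print(header_value)                    # Basic dXNlcjpwYXNzd29yZA==
print(parse_basic_auth(header_value))  # ('user', 'password')
```

The round trip needs no key or secret, which is the point: the encoding is reversible by anyone who sees the header.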

Binary data in JSON payloads

JSON has no native binary type. When an API response must include a file, a cryptographic signature, or raw bytes, Base64 is the conventional way to represent that data inside a JSON string field. Several cloud services (AWS Lambda event bodies carrying binary content, Google Cloud Pub/Sub message data in its JSON API) use this pattern.
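A minimal round trip might look like the following. The field names `filename` and `data` are arbitrary choices for this sketch, not a convention of any particular API.

```python
import base64
import json

def pack(filename: str, raw: bytes) -> str:
    """Serialise raw bytes into a JSON document via a Base64 string field."""
    return json.dumps({
        "filename": filename,
        "data": base64.b64encode(raw).decode("ascii"),
    })

def unpack(message: str) -> bytes:
    """Recover the original bytes from the JSON document."""
    return base64.b64decode(json.loads(message)["data"])

raw = bytes(range(256))          # every possible byte value
message = pack("blob.bin", raw)
restored = unpack(message)
print(restored == raw)           # True
```

The `.decode("ascii")` step matters: `b64encode` returns bytes, and `json.dumps` only accepts strings.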

Secrets and certificates in environment variables

PEM certificates and RSA private keys contain newlines and special characters that break environment variable syntax in many shells and CI systems. Base64-encoding them produces a single-line string. Kubernetes Secrets use Base64 for a similar reason: every value under the data field of a Secret manifest is a Base64-encoded string.
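The following sketch shows the idea with a placeholder PEM-shaped string (not a real certificate) and a made-up variable name, `TLS_CERT_B64`.

```python
import base64
import os

# Placeholder text shaped like a PEM file; the body is not a real certificate.
pem_text = (
    "-----BEGIN CERTIFICATE-----\n"
    "placeholder-not-a-real-certificate\n"
    "-----END CERTIFICATE-----\n"
)

# b64encode produces one unbroken line (unlike base64.encodebytes,
# which inserts a newline every 76 characters).
single_line = base64.b64encode(pem_text.encode("ascii")).decode("ascii")
os.environ["TLS_CERT_B64"] = single_line  # hypothetical variable name

# The consumer reverses the step at startup.
restored = base64.b64decode(os.environ["TLS_CERT_B64"]).decode("ascii")
print("\n" in single_line)   # False
print(restored == pem_text)  # True
```

Keep in mind that the encoded value is still the secret in plain sight; it needs the same access controls as the original key.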

Email attachments (MIME)

SMTP is text-based. When you attach a PDF to an email, your client Base64-encodes the file and embeds it inside a MIME multipart message. The recipient's client decodes it before presenting the attachment. This is the original use case Base64 was designed to solve.
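Python's standard `email` package performs this encoding automatically when bytes are attached. The sketch below uses invented addresses and placeholder bytes instead of a real PDF, and it builds the message without sending it.

```python
from email.message import EmailMessage

def build_message_with_attachment(data: bytes, filename: str) -> EmailMessage:
    """Create a multipart message whose attachment is Base64-encoded."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"       # example addresses only
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Report attached"
    msg.set_content("The report is attached.")
    # For bytes content, the library applies base64 transfer encoding.
    msg.add_attachment(data, maintype="application", subtype="pdf",
                       filename=filename)
    return msg

fake_pdf = b"%PDF-1.4 placeholder bytes, not a real document"
message = build_message_with_attachment(fake_pdf, "report.pdf")

attachment = next(message.iter_attachments())
transfer_encoding = str(attachment["Content-Transfer-Encoding"])
recovered = attachment.get_content()   # decoded back to the original bytes

print(transfer_encoding)       # base64
print(recovered == fake_pdf)   # True
```

Printing `message.as_string()` shows the attachment body as wrapped Base64 text between MIME boundaries.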

When NOT to use Base64

Base64 solves one problem: representing binary data as text. It does not solve security, and it does not reduce size. Two mistakes appear constantly in production code:

Base64 is not encryption

Any developer — or attacker — can decode a Base64 string in under a second using a browser console, command line, or any of thousands of online tools. There is no key, no secret, no protection. Encoding a password or API key in Base64 and passing it around in a URL parameter or a log file provides no security benefit whatsoever. Use actual encryption (AES-GCM, ChaCha20) or hashing (bcrypt, Argon2) when protection is the goal.

Base64 is not compression

Encoding data as Base64 makes it larger, not smaller. Every 3 bytes of input become 4 bytes of output — a size increase of approximately 33%. For large files, images above a few kilobytes, or high-volume API traffic, this overhead is significant. Use gzip or Brotli if size is the concern; use a binary transport (multipart/form-data, gRPC, WebSocket binary frames) if both size and fidelity matter.
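A quick experiment makes the difference concrete. The payload here is deliberately repetitive sample data, so the compression result is far better than typical real-world content would give; treat the numbers as illustrative only.

```python
import base64
import gzip

payload = b"hello base64 " * 1000   # 13,000 bytes of repetitive sample data

encoded = base64.b64encode(payload)              # larger than the input
compressed = gzip.compress(payload)              # smaller than the input
compressed_then_encoded = base64.b64encode(compressed)

print("original:           ", len(payload))
print("base64 only:        ", len(encoded))
print("gzip only:          ", len(compressed))
print("gzip, then base64:  ", len(compressed_then_encoded))

# Reversing both steps recovers the original bytes.
round_trip = gzip.decompress(base64.b64decode(compressed_then_encoded))
```

If a text-only channel is unavoidable, compressing first and encoding second keeps the Base64 overhead applied to the smaller payload.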

URL-safe Base64 (Base64url)

Standard Base64 uses + and / as its 62nd and 63rd characters. Both of these have special meaning in URLs: + is a space in query strings, and / is a path separator. Putting standard Base64 output directly into a URL parameter corrupts the value when the browser or server interprets those characters.

RFC 4648 defines the URL-safe variant, commonly called Base64url, which makes two substitutions:

Standard Base64  →  Base64url
+                →  -   (hyphen)
/                →  _   (underscore)
= (padding)      →  omitted (in most implementations)

JWTs always use Base64url, and PKCE code challenges use it too. Many OAuth access tokens are also Base64url-encoded, although the format of an access token is up to the issuer. Any time you see a token that contains hyphens and underscores but otherwise looks like Base64, it is almost certainly Base64url. When decoding such a token with a standard Base64 library, remember to swap the characters back and restore any missing padding, or use a library that has a dedicated URL-safe mode. Otherwise decoding will fail or produce wrong results.
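The substitutions in the table above can be sketched as two small helpers. The function names are hypothetical; Python's own `base64.urlsafe_b64encode` and `base64.urlsafe_b64decode` handle the character swap for you, though they still expect padding on decode.

```python
import base64

def to_base64url(standard: str) -> str:
    """Swap the two URL-hostile characters and drop trailing padding."""
    return standard.replace("+", "-").replace("/", "_").rstrip("=")

def from_base64url(urlsafe: str) -> str:
    """Swap the characters back and restore padding for strict decoders."""
    standard = urlsafe.replace("-", "+").replace("_", "/")
    return standard + "=" * (-len(standard) % 4)

raw = bytes([0xFB, 0xFF])   # chosen so the output contains both + and /
standard = base64.b64encode(raw).decode("ascii")
urlsafe = to_base64url(standard)

print(standard)                  # +/8=
print(urlsafe)                   # -_8
print(from_base64url(urlsafe))   # +/8=
```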

Common mistakes that trip up developers

Double-encoding

This happens when a string that is already Base64-encoded gets encoded again — usually because one layer of a stack encodes before handing data to another layer that also encodes. The result looks like valid Base64 but decodes to another Base64 string rather than the original data. If you are decoding a token and get a result that still looks like Base64, run it through the decoder a second time. Then find where the double-encode is happening and fix the source.
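The symptom is easy to reproduce. The `looks_like_base64` helper below is a hypothetical heuristic for debugging only: short ordinary words such as "test" are also valid Base64, so it should not be used to decide automatically whether to decode again.

```python
import base64
import binascii
import re

BASE64_PATTERN = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")

def looks_like_base64(data: bytes) -> bool:
    """Heuristic check; plain words like b"test" will also pass."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return False
    if len(text) % 4 != 0 or not BASE64_PATTERN.match(text):
        return False
    try:
        base64.b64decode(text, validate=True)
    except binascii.Error:
        return False
    return True

original = b"hello world"
once = base64.b64encode(original)    # b"aGVsbG8gd29ybGQ="
twice = base64.b64encode(once)       # the accidental second layer

first_pass = base64.b64decode(twice)
print(looks_like_base64(first_pass))   # True: still Base64, so double-encoded
second_pass = base64.b64decode(first_pass)
print(second_pass == original)         # True
```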

Treating Base64 as a security measure

Described above, but worth repeating because it shows up in code reviews regularly. If a developer says "we Base64 the value before storing it," that is not a security control. It is a fire hazard — it creates a false sense of protection while adding zero actual resistance to inspection.

Forgetting or mishandling padding

Standard Base64 output length is always a multiple of 4, with = characters padding the end. Some libraries require correct padding to decode successfully; others are lenient and accept strings without it. Problems arise when:

You strip = characters when storing a token, then try to decode it with a strict library later.
You receive a Base64url string (no padding) and pass it to a standard decoder that expects padding.
You compare two Base64 strings that encode the same data but have different padding, expecting them to match.

The fix is to canonicalise before comparing and before decoding: pick one form for storage (for example, always strip padding), and re-add padding before decoding with a strict library. The number of = characters to add is (4 - length % 4) % 4.
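One possible way to write that canonicalisation, with hypothetical helper names. The length check relies on the fact that an unpadded Base64 string can never have a length of 1 modulo 4.

```python
import base64

def strip_padding(encoded: str) -> str:
    """Canonical storage form: no trailing '=' characters."""
    return encoded.rstrip("=")

def restore_padding(encoded: str) -> str:
    """Re-add the padding a strict decoder expects."""
    stripped = encoded.rstrip("=")
    if len(stripped) % 4 == 1:
        # A single leftover character cannot encode a whole byte.
        raise ValueError("invalid Base64 length")
    return stripped + "=" * (-len(stripped) % 4)

print(restore_padding("TQ"))     # TQ==
print(restore_padding("TWE"))    # TWE=
print(restore_padding("TWFu"))   # TWFu

# Compare in canonical form so padded and unpadded copies match.
print(strip_padding("TQ==") == strip_padding("TQ"))   # True
print(base64.b64decode(restore_padding("TQ")))        # b'M'
```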

Using standard Base64 in URLs without URL-encoding

If you must use standard (non-URL-safe) Base64 in a URL, percent-encode the + characters as %2B and / as %2F. Better practice is to switch to Base64url from the start and avoid the issue entirely.
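A short sketch of what goes wrong and how percent-encoding avoids it, using the standard `urllib.parse` module. The parameter name `token` and the sample bytes are arbitrary; the bytes were picked so that the encoded value contains +, / and =.

```python
import base64
from urllib.parse import parse_qs, quote

value = base64.b64encode(bytes([0xFB, 0xFF, 0xFE, 0xFB])).decode("ascii")
print(value)    # +//++w==

# safe="" is needed because quote() leaves "/" unescaped by default.
quoted = quote(value, safe="")
print(quoted)   # %2B%2F%2F%2B%2Bw%3D%3D

# Unescaped: each "+" is read back as a space, corrupting the value.
broken = parse_qs("token=" + value)["token"][0]
# Percent-encoded: the value survives the round trip.
intact = parse_qs("token=" + quoted)["token"][0]

print(broken == value)   # False
print(intact == value)   # True
```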

Quick reference: input size vs encoded size

The encoded output size is always ceil(n / 3) * 4 characters, where n is the number of input bytes. Padding brings every output to a multiple of 4.

Original bytes	Encoded characters	Padding `=` chars	Size increase
1	4	2	+300%
2	4	1	+100%
3	4	0	+33%
100	136	2	+36%
1,000	1,336	2	+33.6%
1 MB	~1.33 MB	varies	~+33%
10 MB	~13.3 MB	varies	~+33%

The overhead converges to 33.3% (one third) for large inputs. For 1–2 byte inputs the percentage looks extreme because the minimum output of 4 characters is being compared against a very small input.
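The formula above translates directly into code. The helper names are made up for this sketch, and the figures apply to unwrapped Base64; formats that insert line breaks (such as MIME bodies) add a little more.

```python
import base64

def encoded_length(n: int) -> int:
    """Characters of padded Base64 output for n input bytes: ceil(n / 3) * 4."""
    return (n + 2) // 3 * 4

def padding_count(n: int) -> int:
    """Number of '=' characters at the end of the output."""
    return -n % 3

def overhead_percent(n: int) -> float:
    """Size increase relative to the input, in percent (n must be > 0)."""
    return 100 * (encoded_length(n) - n) / n

for n in (1, 2, 3, 100, 1000):
    print(n, encoded_length(n), padding_count(n), f"+{overhead_percent(n):.1f}%")

# Cross-check against the standard library on a sample input.
sample = bytes(100)
print(len(base64.b64encode(sample)) == encoded_length(100))   # True
```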
