URL Encoding Explained: When to Encode and Why It Matters

Why URLs have reserved characters, how percent-encoding works, the difference between encodeURI and encodeURIComponent, and the bugs that happen when you get it wrong.

last updated · June 13, 2026 by @vultio

Why URLs have reserved characters

A URL is not just a string; it is a structured format defined by RFC 3986. That structure depends on certain characters having fixed, unambiguous meanings. The forward slash / separates path segments. The question mark ? separates the path from the query string. The ampersand & separates query parameters from one another. The equals sign = separates a parameter name from its value. The hash # begins a fragment identifier.

Because these characters carry structural meaning, they cannot appear freely in the data portions of a URL. If a query parameter value contains a literal &, the parser cannot tell whether that ampersand is part of the value or the start of a new parameter. The URL specification also restricts the overall character set to 7-bit ASCII. Characters outside that range — accented letters, emoji, CJK characters — have no representation in a raw URL and must be encoded before the URL is transmitted. The mechanism that handles both of these problems is percent-encoding, also called URL encoding.
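
To see those structural characters at work, the sketch below splits a URL with the standard URL class (available in browsers and Node.js). The address is made up for illustration.

```javascript
// Hypothetical URL used only for illustration.
const example = new URL('https://example.com/docs/intro?lang=en&page=2#install');

const parts = {
  pathSegments: example.pathname.split('/').filter(Boolean), // split on "/"
  query: example.search,                                     // begins at "?"
  params: [...example.searchParams.entries()],               // split on "&" and "="
  fragment: example.hash,                                    // begins at "#"
};

console.log(parts.pathSegments); // [ 'docs', 'intro' ]
console.log(parts.params);       // [ [ 'lang', 'en' ], [ 'page', '2' ] ]
console.log(parts.fragment);     // #install
```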

How percent-encoding works

Percent-encoding replaces any byte that cannot appear literally in a URL with a three-character sequence: a percent sign followed by two hexadecimal digits representing that byte's value (uppercase by convention, although decoders accept either case). The space character is byte 0x20, so it becomes %20. The ampersand is byte 0x26, so it becomes %26. The percent sign itself is byte 0x25, so to include a literal percent sign you write %25.

space  →  %20
&      →  %26
=      →  %3D
?      →  %3F
#      →  %23
%      →  %25
+      →  %2B
/      →  %2F
@      →  %40
:      →  %3A

For multi-byte characters (anything outside ASCII), the process has an extra step. The character is first encoded as UTF-8, producing one to four bytes, and then each of those bytes is percent-encoded individually. The euro sign € is U+20AC, which in UTF-8 is the three-byte sequence 0xE2 0x82 0xAC, so it becomes %E2%82%AC in a URL.
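
The two steps (UTF-8 first, then one %XX per byte) can be sketched by hand. This is a minimal illustration, not a replacement for the built-in functions; percentEncode is a hypothetical helper name, and it assumes a runtime with TextEncoder (modern browsers and Node.js).

```javascript
// Characters RFC 3986 calls "unreserved": left as-is.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const utf8 = new TextEncoder();

// percentEncode is a hypothetical helper, not a built-in.
function percentEncode(text) {
  let out = '';
  for (const char of text) {            // iterates by code point
    if (UNRESERVED.test(char)) {
      out += char;
      continue;
    }
    // Step 1: character -> UTF-8 bytes. Step 2: each byte -> %XX.
    for (const byte of utf8.encode(char)) {
      out += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(percentEncode('a b'));     // a%20b
console.log(percentEncode('100%'));    // 100%25
console.log(percentEncode('\u20AC'));  // %E2%82%AC (the euro sign)
```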

What gets encoded and what does not

RFC 3986 defines a set of unreserved characters that are safe to use literally anywhere in a URL without encoding. These are:

A–Z  a–z  0–9  -  _  .  ~

Everything else — reserved characters like / ? # & = : @ ! $ ' ( ) * + , ;, and all characters outside ASCII — must be percent-encoded when they appear in the data parts of a URL (as opposed to their structural positions). In practice, which characters you encode depends heavily on which part of the URL you are constructing.

A slash in a path segment means "move to the next segment," so it must be encoded as %2F when the slash is part of the data rather than a separator. A slash in a query parameter value should technically be encoded too, though many servers accept it unencoded because query string parsing is generally less strict about slashes.
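
As a quick check of the rules above, the following sketch confirms that the unreserved set passes through encodeURIComponent unchanged, while a slash that is data gets encoded. The tag name and routes are made up for illustration.

```javascript
// The unreserved characters survive encoding untouched.
const unreserved = 'AZaz09-_.~';
console.log(encodeURIComponent(unreserved) === unreserved); // true

// A hypothetical tag name that contains a slash as data, not as a separator.
const tag = 'ci/cd';
const asPath = '/tags/' + encodeURIComponent(tag);
const asQuery = '/tags?name=' + encodeURIComponent(tag);

console.log(asPath);  // /tags/ci%2Fcd
console.log(asQuery); // /tags?name=ci%2Fcd
```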

encodeURI vs encodeURIComponent

JavaScript provides two built-in functions for URL encoding, and choosing the wrong one is one of the most common URL-related bugs in frontend code.

encodeURI is designed for encoding a complete URL. It leaves all characters that have structural meaning in URLs — / ? # & = : @ ! $ ' ( ) * + , ; — untouched, because those characters are assumed to be doing their structural job. It only encodes characters that have no valid place in any URL at all.

encodeURIComponent is designed for encoding a single value that will be placed inside a URL: a query parameter value, a path segment, or a fragment. It encodes the structural characters / ? # & = + @ : ; , $ along with spaces and everything outside ASCII. The only characters it leaves alone are letters, digits, the unreserved marks - _ . ~ and the five characters ! ' ( ) *. This is almost always what you want when constructing a URL programmatically.

// You want to pass a URL as a query parameter value
const redirectUrl = 'https://example.com/path?foo=bar&baz=1';

// WRONG: encodeURI leaves &, ?, = untouched
const bad = '/login?redirect=' + encodeURI(redirectUrl);
// Result: /login?redirect=https://example.com/path?foo=bar&baz=1
// The parser sees two query parameters: redirect and baz — the URL is broken.

// CORRECT: encodeURIComponent encodes every special character
const good = '/login?redirect=' + encodeURIComponent(redirectUrl);
// Result: /login?redirect=https%3A%2F%2Fexample.com%2Fpath%3Ffoo%3Dbar%26baz%3D1
// The parser sees one query parameter: redirect — its value is the full URL.

As a rule of thumb: use encodeURIComponent on individual values before inserting them into a URL. Reserve encodeURI for the rare case where you have a fully-formed URL that might contain non-ASCII characters and you need to clean it up without disturbing its structure.
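
One caveat worth knowing: encodeURIComponent leaves ! ' ( ) * unencoded, even though RFC 3986 lists them as reserved. If a consumer insists on strict encoding, a small wrapper can cover the gap. The sketch below is one common way to do it; encodeRFC3986 is a hypothetical helper name, not a built-in.

```javascript
// encodeRFC3986 is a hypothetical helper, not a built-in.
function encodeRFC3986(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("(it's)*!")); // (it's)*!
console.log(encodeRFC3986("(it's)*!"));      // %28it%27s%29%2A%21
```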

The + vs %20 difference

There are two encoding conventions for spaces, and they are not interchangeable.

%20 is the correct percent-encoding of a space character per RFC 3986. It is valid everywhere in a URL — path segments, query strings, and fragment identifiers.

+ as a space is a convention from the older application/x-www-form-urlencoded format, which is what HTML forms use when they submit via GET or POST. In that encoding, spaces become + and literal + characters become %2B. This convention applies only to form-encoded query strings and request bodies; in a URL path segment, + is just a plus sign.

// In a URL path: %20 is correct, + is a literal plus sign
/search/hello%20world    ✓  (path segment: "hello world")
/search/hello+world      ✗  (path segment: "hello+world", not "hello world")

// In a query string (form-encoded): both forms work and decode to the same value
?q=hello%20world         ✓  (query value: "hello world")
?q=hello+world           ✓  (query value: "hello world", form-encoding convention)
?q=hello+world&sign=2%2B2  ✓  (query: q="hello world", sign="2+2")

The safest approach is to always use %20 for spaces and %2B for literal plus signs. Using encodeURIComponent in JavaScript does this correctly by default: it produces %20 for spaces and %2B for plus signs. Form serialisers and URLSearchParams produce + for spaces, while the deprecated escape() function leaves a literal + unencoded, which is where the confusion enters codebases.
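
The asymmetry shows up on the decoding side: decodeURIComponent knows nothing about the + convention. A minimal sketch of a form-aware decoder follows; decodeFormValue is a hypothetical helper name, and it is only appropriate for values taken from a form-encoded query string or body.

```javascript
// decodeFormValue is a hypothetical helper for form-encoded values only.
function decodeFormValue(raw) {
  // Turn + into %20 first; a literal plus arrives as %2B and is untouched here.
  return decodeURIComponent(raw.replace(/\+/g, '%20'));
}

console.log(decodeURIComponent('hello+world')); // hello+world (plus is kept)
console.log(decodeFormValue('hello+world'));    // hello world
console.log(decodeFormValue('2%2B2'));          // 2+2
console.log(encodeURIComponent('2+2 = 4'));     // 2%2B2%20%3D%204
```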

Common bugs caused by incorrect URL encoding

Double encoding (%2520 instead of %20)

Double encoding happens when a value that has already been percent-encoded is encoded a second time. The percent sign from the first encoding, which is itself byte 0x25, gets encoded to %25, so %20 becomes %2520. The server decodes once and receives %20 as a literal string rather than a space. This typically happens when one layer of a stack encodes before passing to another layer that also encodes, or when a developer encodes a value and then concatenates it into a URL that gets encoded again.

const value = 'hello world';
const encoded = encodeURIComponent(value);   // "hello%20world"
const doubleEncoded = encodeURIComponent(encoded);  // "hello%2520world" — bug!

// Fix: only encode the raw value, never re-encode already-encoded strings
const url = '/search?q=' + encodeURIComponent(value);  // correct
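
Seen from the receiving side, the symptom looks like this. The sketch assumes a server that decodes each value exactly once, which is what most frameworks do.

```javascript
const raw = 'hello world';
const once = encodeURIComponent(raw);   // hello%20world
const twice = encodeURIComponent(once); // hello%2520world

// What a decode-once receiver ends up with:
console.log(decodeURIComponent(once));  // hello world
console.log(decodeURIComponent(twice)); // hello%20world (a literal percent sequence)
```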

Not encoding query parameter values that contain &

If a parameter value contains a literal & and you do not encode it, the URL parser splits the value at that ampersand and creates a phantom parameter. The value AT&T in a query string must be encoded as AT%26T. Failing to do so turns ?company=AT&T&plan=basic into three parameters (company=AT, a stray T with an empty value, and plan=basic), so the company name arrives truncated.
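
The split is easy to reproduce with URLSearchParams, which parses a query string the way most form-aware servers do:

```javascript
const broken = new URLSearchParams('company=AT&T&plan=basic');
console.log([...broken.entries()]);
// [ [ 'company', 'AT' ], [ 'T', '' ], [ 'plan', 'basic' ] ]

const fixed = new URLSearchParams('company=AT%26T&plan=basic');
console.log(fixed.get('company')); // AT&T
console.log(fixed.get('plan'));    // basic
```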

Not decoding on the server

Most frameworks decode URL parameters automatically before surfacing them to application code, but raw HTTP handling or custom middleware occasionally skips this step. The symptom is that the application receives a literal string like hello%20world where it expected hello world. Always verify which layer is responsible for decoding and ensure it is not being skipped or called twice. In Node.js, decodeURIComponent is the correct decoding function; unescape() is deprecated and handles a different legacy encoding.
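
If you do end up decoding by hand, keep in mind that decodeURIComponent throws a URIError on malformed input such as a stray % sign. A minimal defensive wrapper might look like the sketch below; safeDecode is a hypothetical helper name, and returning the raw string on failure is an assumed policy (rejecting the request is an equally reasonable choice).

```javascript
// safeDecode is a hypothetical helper; the fallback policy is an assumption.
function safeDecode(raw) {
  try {
    return decodeURIComponent(raw);
  } catch (err) {
    if (err instanceof URIError) return raw; // malformed input: keep as received
    throw err;
  }
}

console.log(safeDecode('hello%20world')); // hello world
console.log(safeDecode('100%'));          // 100% (would otherwise throw)
```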

Encoding characters that must not be encoded

Encoding structural characters that should remain literal breaks the URL structure. Running an entire URL through encodeURIComponent will encode the :// in the scheme, the slashes in the path, and the ? and & in the query string, producing a string that is unusable as a URL. This is why you must encode individual values, not entire URLs.
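
A short illustration of the difference, using a made-up search URL:

```javascript
// Encoding the whole URL destroys its structure.
const full = 'https://example.com/search?q=tea&page=2';
console.log(encodeURIComponent(full));
// https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dtea%26page%3D2

// Encoding only the data keeps the structure intact.
const term = 'tea & cake';
const assembled = 'https://example.com/search?q=' + encodeURIComponent(term) + '&page=2';
console.log(assembled);
// https://example.com/search?q=tea%20%26%20cake&page=2
```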

URL encoding in different contexts

HTML attributes (href, src, action)

URLs inside HTML attributes must be both URL-encoded and HTML-entity-encoded. The ampersand in a query string, which in a URL is already a structural separator, must be written as &amp; in HTML because & starts an HTML entity. Writing href="/search?a=1&b=2" is technically invalid HTML; the correct form is href="/search?a=1&amp;b=2". Browsers are lenient and accept the bare ampersand, but validators flag it, and some XML-based HTML parsers will fail outright.
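
The two layers can be applied in order: URL-encode the values first, then entity-encode the finished URL for the attribute. The sketch below assumes you are building markup as a string; escapeAttr is a hypothetical minimal escaper, and a templating library would normally do this step for you.

```javascript
// escapeAttr is a hypothetical minimal escaper for double-quoted attribute values.
function escapeAttr(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Layer 1: percent-encode the value. Layer 2: entity-encode the whole URL.
const href = '/search?a=' + encodeURIComponent('1 & 2') + '&b=2';
const html = '<a href="' + escapeAttr(href) + '">Search</a>';

console.log(html);
// <a href="/search?a=1%20%26%202&amp;b=2">Search</a>
```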

HTTP headers

HTTP headers are also ASCII-only and have their own encoding rules separate from URL encoding. The Location header used in redirects must contain a valid URI, so non-ASCII values in redirect targets must be percent-encoded. The Content-Disposition header uses a different encoding scheme (RFC 5987) for non-ASCII filenames, using filename*=UTF-8''encoded-name syntax rather than plain percent-encoding.
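
As a rough sketch of both header cases: the helper below builds a filename* parameter by percent-encoding the UTF-8 bytes and additionally escaping ' ( ) *, which encodeURIComponent leaves alone but the extended-value syntax does not allow unescaped. contentDisposition is a hypothetical helper name, and the filename and route are made up; server frameworks often ship their own version.

```javascript
// contentDisposition is a hypothetical helper, not a standard API.
function contentDisposition(filename) {
  const encoded = encodeURIComponent(filename).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
  return "attachment; filename*=UTF-8''" + encoded;
}

// \u00E9 is "e acute", written as an escape to keep this source ASCII-only.
console.log(contentDisposition('r\u00E9sum\u00E9 (final).pdf'));
// attachment; filename*=UTF-8''r%C3%A9sum%C3%A9%20%28final%29.pdf

// A Location header value only needs ordinary percent-encoding.
const redirectTarget = '/users/' + encodeURIComponent('Jos\u00E9');
console.log(redirectTarget); // /users/Jos%C3%A9
```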

Form submissions

When an HTML form submits via GET, the browser serialises the form fields using application/x-www-form-urlencoded: spaces become +, and most other special characters are percent-encoded. The resulting string is appended to the action URL as the query string. POST forms with the default encoding type use the same format in the request body. Multipart forms (enctype="multipart/form-data") use a different format entirely: fields are not URL-encoded but instead separated by a boundary string in the body.
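
URLSearchParams uses the same serialisation, so it can stand in for a form when you want to see the output. The field names, values, and the /submit route below are made up for illustration.

```javascript
// Field names and values are hypothetical.
const fields = { name: 'Ada Lovelace', note: '1+1=2 & more' };
const body = new URLSearchParams(fields).toString();

console.log(body);
// name=Ada+Lovelace&note=1%2B1%3D2+%26+more

// Submitted via GET, the same string becomes the query string of the action URL.
const getUrl = '/submit?' + body;
console.log(getUrl);
```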

REST API path segments

When a resource identifier that may contain special characters is placed in a URL path segment, it must be percent-encoded. A file named report/2026 Q1.pdf placed in a REST path must become /files/report%2F2026%20Q1.pdf, with the slash encoded as %2F rather than left literal (which would create an extra path segment). Some web frameworks and proxies incorrectly decode %2F before routing, treating it as a path separator; this is a known security issue (path traversal via encoded slashes) and should be handled carefully.
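
One way to make this hard to get wrong is to build paths from a list of segments and encode each one separately. The sketch below does that; buildPath is a hypothetical helper name.

```javascript
// buildPath is a hypothetical helper: each argument is exactly one path segment.
function buildPath(...segments) {
  return '/' + segments.map((s) => encodeURIComponent(s)).join('/');
}

console.log(buildPath('files', 'report/2026 Q1.pdf'));
// /files/report%2F2026%20Q1.pdf
```

Because the separators are added by join('/') and never come from the data, a slash inside a segment can only ever appear as %2F.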

How to decode a URL you received

When a URL arrives encoded — from a server log, a webhook payload, a browser copy-paste, or an API response — you need to decode it to read the original values. The correct approach depends on whether you are looking at a full URL or just a parameter value.

// In JavaScript: decode a single value
decodeURIComponent('hello%20world')       // "hello world"
decodeURIComponent('AT%26T')              // "AT&T"
decodeURIComponent('%E2%82%AC')           // "€"

// Parse a full query string safely
const params = new URLSearchParams('?q=hello%20world&city=S%C3%A3o+Paulo');
params.get('q')      // "hello world"
params.get('city')   // "São Paulo"  (handles both %20 and + for spaces)

// In Python
from urllib.parse import unquote, parse_qs
unquote('hello%20world')     # "hello world"
parse_qs('q=hello+world')    # {'q': ['hello world']}

The built-in URL class in JavaScript is the most reliable way to dissect a full URL. Constructing a new URL(href) and then reading .searchParams gives you already-decoded query values without any manual parsing. The .pathname and .hash properties are returned still percent-encoded, so pass individual segments through decodeURIComponent when you need the original text. The URLSearchParams class handles both + and %20 for spaces, which raw decodeURIComponent does not.
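
A short sketch of dissecting a full URL this way, using a made-up address. Note that in current browsers and Node.js only searchParams hands back decoded values; pathname and hash keep their percent-encoding.

```javascript
// Hypothetical URL used only for illustration.
const url = new URL(
  'https://example.com/files/my%20report.pdf?q=hello+world&city=S%C3%A3o%20Paulo#top%20section'
);

console.log(url.searchParams.get('q'));    // hello world
console.log(url.searchParams.get('city')); // São Paulo
console.log(url.pathname);                 // /files/my%20report.pdf (still encoded)
console.log(url.hash);                     // #top%20section (still encoded)

// Decode a single path segment explicitly when you need the original text.
const fileName = decodeURIComponent(url.pathname.split('/').pop());
console.log(fileName);                     // my report.pdf
```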

Real-world example: building a search URL

Suppose you are building a search feature that lets users search for text containing special characters — for example, the query C++ & Java (beginner's guide). Here is how to construct the URL correctly in JavaScript:

const query = "C++ & Java (beginner's guide)";
const page = 1;
const sort = 'relevance';

// BAD: string concatenation without encoding
const badUrl = '/search?q=' + query + '&page=' + page + '&sort=' + sort;
// /search?q=C++ & Java (beginner's guide)&page=1&sort=relevance
// A form-decoding parser sees: q="C   " (each + read as a space), a stray key " Java (beginner's guide)" with no value, page=1, sort=relevance

// GOOD: use URLSearchParams
const params = new URLSearchParams({ q: query, page: String(page), sort });
const goodUrl = '/search?' + params.toString();
// /search?q=C%2B%2B+%26+Java+%28beginner%27s+guide%29&page=1&sort=relevance
// Parser sees: q="C++ & Java (beginner's guide)", page="1", sort="relevance"

// ALSO GOOD: manual encodeURIComponent
const manualUrl = '/search?q=' + encodeURIComponent(query)
  + '&page=' + encodeURIComponent(page)
  + '&sort=' + encodeURIComponent(sort);
// /search?q=C%2B%2B%20%26%20Java%20(beginner's%20guide)&page=1&sort=relevance

URLSearchParams is the preferred approach in modern JavaScript because it handles encoding for every value automatically, supports repeated keys through append() and getAll(), and produces consistently valid query strings. Note that it uses + for spaces (form-encoding convention), while encodeURIComponent produces %20. Both are decoded correctly by any server-side parser that treats the query string as form-encoded, which is the default in most web frameworks.
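
Repeated keys are worth a quick sketch, because the constructor alone does not produce them. The tag values below are made up for illustration.

```javascript
const search = new URLSearchParams({ q: 'c++ tips' });

// append() adds another entry under the same key instead of overwriting it.
for (const tag of ['beginner', 'web dev']) {
  search.append('tag', tag);
}

console.log(search.toString());    // q=c%2B%2B+tips&tag=beginner&tag=web+dev
console.log(search.getAll('tag')); // [ 'beginner', 'web dev' ]

// An array passed as a value is stringified first, so the key is not repeated.
console.log(new URLSearchParams({ tag: ['a', 'b'] }).toString()); // tag=a%2Cb
```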

Quick reference: common characters and their encoded forms

Character	Name	Encoded form	Notes
(space)	Space	%20 or +	+ only valid in query strings (form-encoded)
!	Exclamation mark	%21	Safe to leave unencoded in some contexts
"	Double quote	%22	Always encode in URLs
#	Hash	%23	Structural: starts fragment; encode in values
$	Dollar sign	%24	Reserved; encode in query values
%	Percent sign	%25	Must encode to include a literal percent
&	Ampersand	%26	Structural: separates query params; always encode in values
'	Single quote	%27	Encode to avoid HTML parsing issues
(	Open parenthesis	%28	Encode in path segments and query values
)	Close parenthesis	%29	Encode in path segments and query values
+	Plus sign	%2B	Encode to avoid confusion with space in query strings
,	Comma	%2C	Reserved; encode in query values
/	Forward slash	%2F	Structural: path separator; encode in data segments
:	Colon	%3A	Structural in scheme; encode in query values
;	Semicolon	%3B	Reserved; encode in query values
=	Equals sign	%3D	Structural: separates key/value; always encode in values
?	Question mark	%3F	Structural: starts query string; encode in values
@	At sign	%40	Structural in authority; encode in query values
[	Open bracket	%5B	Reserved for IPv6 addresses
]	Close bracket	%5D	Reserved for IPv6 addresses
~	Tilde	~	Unreserved; never needs encoding (RFC 3986)
€	Euro sign	%E2%82%AC	UTF-8 encoded then percent-encoded (3 bytes)
é	e acute	%C3%A9	UTF-8 encoded then percent-encoded (2 bytes)

Try it now

The fastest way to understand URL encoding in practice is to run a few values through an encoder and observe the output. Paste a query string containing ampersands and spaces, or try a non-ASCII character like an accented letter or emoji, and see exactly which bytes are encoded:

Open URL Encoder / Decoder →

The tool handles both encoding and decoding, so you can paste a percent-encoded URL from a server log and immediately read the original values, or encode a raw string to use safely in a URL you are building.