URL Encoding: When, Why, and How (with Examples)
Every web developer eventually meets URL encoding through a bug. A query parameter with a plus sign turns into a space. A search term with an ampersand splits into two parameters. A redirect URL gets double-encoded and breaks. The root cause is always the same: a character had a special meaning in URL syntax, and something either escaped it when it shouldn't have, or failed to escape it when it should have. This guide explains the rules in enough depth to fix and prevent those bugs.
URL encoding, formally called percent-encoding and defined in RFC 3986, is the mechanism by which characters that would otherwise have syntactic meaning in a URL, or that are not allowed in one at all, are represented as % followed by the two-digit hexadecimal value of each byte. A space becomes %20; an at-sign becomes %40; a Chinese character becomes a three- or four-byte UTF-8 sequence, each byte percent-encoded.
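To make the byte-level rule concrete, here is a minimal sketch that percent-encodes every UTF-8 byte of a string. The helper name percentEncodeAllBytes is hypothetical and exists only for illustration; real code would normally call the built-in encodeURIComponent, which skips characters that are safe to leave alone.

```javascript
// Hypothetical helper (not a built-in): percent-encode EVERY byte of the
// string's UTF-8 representation, including bytes that would be safe as-is.
function percentEncodeAllBytes(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

const space = percentEncodeAllBytes(" ");  // "%20"
const atSign = percentEncodeAllBytes("@"); // "%40"
const han = percentEncodeAllBytes("中");    // "%E4%B8%AD" (three UTF-8 bytes)

// The built-in produces the same bytes for the non-ASCII character.
const builtin = encodeURIComponent("中");   // "%E4%B8%AD"
```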
Why URLs Need Encoding at All
URLs are not opaque blobs — they are structured strings with grammar. The structure uses certain characters as delimiters: / separates path segments, ? begins the query string, & separates parameters, # begins the fragment, = separates key from value. If the data you want to put inside a URL contains any of these delimiters, the parser will misinterpret your data as structure.
The problem, in one example:
Search query: cats & dogs
Naive URL: example.com/search?q=cats & dogs
Parser sees: query q=cats , then a second parameter dogs.
Correct URL: example.com/search?q=cats%20%26%20dogs
Parser sees: one parameter, value cats & dogs. ✓
Encoding draws a clean line between syntactic delimiters (kept as-is) and data (escaped where ambiguous). Without it, you cannot reliably round-trip arbitrary strings through a URL.
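The same misparse can be reproduced with the standard URLSearchParams parser. This is only a sketch of how one parser behaves; a given server framework may differ in details such as whitespace handling.

```javascript
// Unencoded: the & is read as a parameter separator.
const naive = new URLSearchParams("q=cats & dogs");
const naivePairs = [...naive.entries()];
// naivePairs is [["q", "cats "], [" dogs", ""]]: two parameters, not one.

// Encoded: the & travels as data (%26) and the value survives intact.
const correct = new URLSearchParams("q=cats%20%26%20dogs");
const correctValue = correct.get("q"); // "cats & dogs"
```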
The Character Classes: Reserved vs Unreserved
RFC 3986 defines two character classes, unreserved and reserved; everything else falls into a third bucket that must always be encoded:
| Class | Characters | Behavior |
|---|---|---|
| Unreserved | A-Z a-z 0-9 - _ . ~ | Never need encoding |
| Reserved | : / ? # [ ] @ ! $ & ' ( ) * + , ; = | Encode when used as data, leave as-is when used as delimiter |
| Other | Everything else (spaces, accented letters, <, >, quotes...) | Always encode |
The tricky class is "reserved." Whether a reserved character needs encoding depends on where it appears. A forward slash in the path is a segment separator and stays unencoded. A forward slash inside a query parameter value should be encoded (%2F) because some servers treat slashes specially even in query strings.
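The three classes in the table can be sketched as a small classifier. The names UNRESERVED, RESERVED, and classify are hypothetical, and the function assumes it is given a single character.

```javascript
// Character classes as listed in the table above (names are illustrative).
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const RESERVED = ":/?#[]@!$&'()*+,;=";

// Assumes `ch` is exactly one character.
function classify(ch) {
  if (UNRESERVED.test(ch)) return "unreserved";
  if (RESERVED.includes(ch)) return "reserved";
  return "other";
}

const tilde = classify("~");  // "unreserved": never needs encoding
const slash = classify("/");  // "reserved": depends on where it appears
const blank = classify(" ");  // "other": always encode
const accent = classify("é"); // "other": always encode

// A slash used as data inside a value gets escaped:
const slashAsData = encodeURIComponent("a/b"); // "a%2Fb"
```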
encodeURI vs encodeURIComponent: The Two JavaScript Functions
JavaScript exposes two built-ins that look interchangeable but encode different sets of characters. Choosing the wrong one is the most common URL-encoding bug in front-end code.
| Function | Does NOT encode | Use for |
|---|---|---|
| encodeURI(str) | Unreserved chars, plus : / ? # & = + , ; @ $ ! * ' ( ) | An entire URL where structure must be preserved |
| encodeURIComponent(str) | Unreserved chars, plus ! * ' ( ) | A single value (query parameter, path segment, form field) |
The rule of thumb: if you are inserting user-provided text into one slot of a URL, always use encodeURIComponent. If you somehow have a full URL string with structure and only need to escape spaces and non-ASCII characters, use encodeURI. In practice, the large majority of real use cases want encodeURIComponent.
Building a search URL correctly:
const query = "cats & dogs";
const url = `https://example.com/search?q=${encodeURIComponent(query)}`;
// → https://example.com/search?q=cats%20%26%20dogs
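For contrast, this sketch runs the same query through encodeURI and through the standard URL API. The example.com endpoint is the same illustrative one used above. Note that URLSearchParams serializes in form style, so spaces come out as + rather than %20.

```javascript
const raw = "cats & dogs";

// Wrong tool: encodeURI keeps & because it assumes & is URL structure.
const viaEncodeURI = encodeURI(`https://example.com/search?q=${raw}`);
// "https://example.com/search?q=cats%20&%20dogs" (the value is split at &)

// Right tool for a single value.
const viaComponent = `https://example.com/search?q=${encodeURIComponent(raw)}`;
// "https://example.com/search?q=cats%20%26%20dogs"

// Alternative: let the URL API do the escaping for you.
const u = new URL("https://example.com/search");
u.searchParams.set("q", raw);
const viaUrlApi = u.toString();
// "https://example.com/search?q=cats+%26+dogs"
```

All three strings are syntactically valid URLs, but only the last two carry the full value in q.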
The Plus Sign: Form Encoding vs URL Encoding
There is one historical wrinkle that catches everyone. In the application/x-www-form-urlencoded media type — the format browsers use for HTML form submissions — spaces are encoded as +, not %20. This rule predates RFC 3986 and remains in force for compatibility.
The result: when a server reads a query string, it should decode + as space. But + outside a query string is a literal plus sign. %2B is the unambiguous escape for a literal plus everywhere.
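This is how a query parameter containing a plus sign turns into a space: the literal + was sent unescaped, and the server decoded it as a space. Encode it as %2B on the way out, and the value survives the round trip.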
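A short sketch of the round trip, using the made-up value 1+1. URLSearchParams follows the form-encoding rules, while decodeURIComponent does not treat + specially.

```javascript
// Form-style parsing turns an unescaped + into a space.
const lostPlus = new URLSearchParams("q=1+1").get("q");   // "1 1"

// Escaped as %2B, the plus sign survives.
const keptPlus = new URLSearchParams("q=1%2B1").get("q"); // "1+1"

// encodeURIComponent produces the safe form on the way out.
const encoded = encodeURIComponent("1+1");                // "1%2B1"

// decodeURIComponent is not form-aware: it leaves + untouched.
const notFormAware = decodeURIComponent("1+1");           // "1+1"
```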
Common Bugs and Their Fixes
Three failure modes account for the vast majority of URL-encoding bugs in production code.
- Forgetting to encode user input. Any string that goes into a URL from a user, a database, or another system must pass through an encoder. Hard-coded examples that work during development can break against real data.
- Double encoding. If a value arrives at your code already encoded and you encode it again, %20 becomes %2520. Common in redirect chains where each system "helpfully" encodes the next URL. Decode first if you suspect the input is encoded, then re-encode if needed.
- Wrong charset. Modern URLs encode non-ASCII as UTF-8 percent-bytes. Legacy systems sometimes used the page's native charset (Latin-1, Shift-JIS, GB2312). If you receive garbled characters, suspect a charset mismatch. UTF-8 is the safe modern default.
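The double-encoding and charset failures can both be reproduced in a few lines. The tryDecode helper is hypothetical; it only wraps decodeURIComponent so that a charset mismatch shows up as null instead of an exception.

```javascript
// Double encoding: the % of %20 is itself escaped as %25.
const once = encodeURIComponent("a b"); // "a%20b"
const twice = encodeURIComponent(once); // "a%2520b"
const undoOne = decodeURIComponent(twice); // back to "a%20b"

// Hypothetical helper: decodeURIComponent throws URIError when the
// percent-bytes are not valid UTF-8, so report that case as null.
function tryDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err;
  }
}

const utf8 = tryDecode("caf%C3%A9"); // "café" (é as two UTF-8 bytes)
const latin1 = tryDecode("caf%E9");  // null (é as one Latin-1 byte)
```

A null here is a hint to check which charset the sending system used before attempting any other fix.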
The diagnostic loop for any URL-encoding bug is: print the value at every stage, identify where the round trip changes shape, and confirm the encoder/decoder pair is consistent. Tools like a URL encoder or an HTML encoder let you reproduce the encoding manually and compare. For binary payloads inside URLs, encode to Base64URL first, then percent-encode if needed.
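For the binary case, one possible sketch of Base64URL encoding follows. The helper name toBase64Url is hypothetical, and the code assumes a runtime with a global btoa (browsers and recent Node.js versions) and a payload small enough to spread into String.fromCharCode.

```javascript
// Hypothetical helper: standard Base64, then swap the two URL-unsafe
// characters and drop the = padding.
function toBase64Url(bytes) {
  const binary = String.fromCharCode(...bytes);
  return btoa(binary)
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

const payload = new Uint8Array([0xfb, 0xff]);
const standard = btoa(String.fromCharCode(...payload)); // "+/8="
const token = toBase64Url(payload);                     // "-_8"

// The Base64URL alphabet is entirely unreserved, so percent-encoding
// leaves the token unchanged.
const unchanged = encodeURIComponent(token) === token;  // true
```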
Frequently Asked Questions
Do I need to encode characters inside a URL path?
Encode anything that isn't unreserved or a permitted path character. Spaces and non-ASCII characters always need encoding, as do ?, #, and any / that is data rather than a segment separator. Parentheses are technically permitted in a path, though some tools escape them anyway. Browsers will sometimes display unencoded versions of certain characters (like accented letters) in the address bar for readability, but the underlying request uses the encoded form.
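One common way to build a path safely is to encode each segment on its own and then join with literal slashes. The buildPath helper and the sample segments are hypothetical.

```javascript
// Hypothetical helper: encode each segment separately so that a slash
// inside a segment is treated as data, not as a separator.
function buildPath(segments) {
  return "/" + segments.map(encodeURIComponent).join("/");
}

const path = buildPath(["docs", "my file.txt", "a/b", "r\u00e9sum\u00e9"]);
// "/docs/my%20file.txt/a%2Fb/r%C3%A9sum%C3%A9"
```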
Are URL encoding and HTML encoding the same thing?
No. URL encoding uses percent + hex bytes (& → %26). HTML encoding uses named or numeric entities (& → &amp;). They solve different problems, URL syntax vs HTML syntax, and you sometimes need both, applied in the correct order, for a value rendered into an HTML attribute that points to a URL.
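The order matters: build the URL first (percent-encoding each value), then HTML-escape the finished URL when placing it in an attribute. The sketch below assumes a hand-rolled escapeHtmlAttr helper and a made-up lang parameter; in a real project a templating library would usually handle the HTML step.

```javascript
// Hypothetical helper: escape the characters that are special in HTML.
function escapeHtmlAttr(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Step 1: URL encoding. The & inside the value becomes %26; the & that
// separates parameters stays a literal &.
const href =
  `https://example.com/search?q=${encodeURIComponent("cats & dogs")}&lang=en`;

// Step 2: HTML encoding. The structural & becomes &amp; in the markup.
const html = `<a href="${escapeHtmlAttr(href)}">Search</a>`;
// <a href="https://example.com/search?q=cats%20%26%20dogs&amp;lang=en">Search</a>
```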
What about international domain names?
The hostname portion of a URL has its own encoding called Punycode, not percent-encoding. café.com is represented in DNS as xn--caf-dma.com. Modern browsers handle this transparently in the address bar, but you may see Punycode in logs, certificates, and legacy systems.
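The standard URL parser shows both encodings side by side. This sketch assumes a runtime whose URL implementation includes internationalized-domain support, as browsers and standard Node.js builds do; the /menü path is a made-up example.

```javascript
// \u00e9 is é and \u00fc is ü, written as escapes to avoid any ambiguity
// about how the source file itself is encoded.
const parsed = new URL("https://caf\u00e9.com/men\u00fc");

const host = parsed.hostname; // "xn--caf-dma.com" (Punycode)
const path = parsed.pathname; // "/men%C3%BC"      (percent-encoded UTF-8)
```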