URL Encoding Background
URL Encoding is the process of converting a string into a valid
URL format. A valid URL format means that the URL contains only
what are termed "alpha | digit | safe | extra | escape" characters. You can read more about the
whats and whys of these terms on the World Wide Web Consortium
site: http://www.w3.org/Addressing/URL/url-spec.html and http://www.w3.org/International/francois.yergeau.html.
URL encoding is normally performed to convert data
passed via HTML forms, because such data may contain special
characters, such as "/", ".", "#", and so on, which
could: a) have special
meanings; b) not be valid characters in a URL; or c) be
altered during transfer. For instance, the "#" character
needs to be encoded because it has a special meaning: it marks an HTML anchor. The
<space> character also needs to be encoded because it is not allowed in a valid URL.
Also, some characters, such as "~", might
not transport properly across the internet.
One of the most common encounters with URL Encoding is when
dealing with <form>s. Form methods (GET and POST)
perform URL Encoding implicitly. Websites use GET and POST methods to pass parameters between HTML pages.
As an example, click the button on the form below
to see the string being URL encoded.
<form method="GET" action="example.html">
<input type="text" name="var" size="50" value="This is a simple & short test.">
This sample <form> sends the data in the text field using the GET
method, which means that the data will be appended to the URL as a query string.
If you click the button and look at the resulting URL in the browser address bar, you
should see something like this (the query string portion, after the "?", is
automatically URL encoded by the browser):
example.html?var=This+is+a+simple+%26+short+test.
Here, you can see that:
- The <space> character has been URL encoded as "+".
- The & character has been URL encoded as "%26".
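The same encoding can be reproduced outside a form. The following is a minimal sketch, assuming a JavaScript runtime that provides the standard URLSearchParams class (modern browsers and Node.js do); the field name and value are taken from the sample form above.

```javascript
// Sketch: reproduce the encoding a browser applies to a GET form field.
// Assumes the standard URLSearchParams class is available globally.
const params = new URLSearchParams();
params.append("var", "This is a simple & short test.");

// toString() serializes the pairs the way a form submission would:
// spaces become "+", and "&" becomes "%26".
const queryString = params.toString();
console.log(queryString); // var=This+is+a+simple+%26+short+test.

// The query string is appended to the form's action after a "?".
const resultingUrl = "example.html?" + queryString;
console.log(resultingUrl);
```

Note that the "." at the end of the value is left as is, because it is one of the characters this encoding treats as safe.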
The <space> and & characters are just some of the special characters that need
to be encoded. Below are some others (click the button to see
the result of the encoding).
Here's the query string portion, which (as before) has been
encoded by the browser automatically:
As you can see, when a character is
URL-encoded, it's converted to %XY, where X and Y are
hexadecimal digits. You will see later where these numbers come from.
What Should be URL Encoded?
As a rule of thumb, any non-alphanumeric character should
be URL encoded. This of course applies to characters that are
meant to be interpreted as is (i.e., not intended to have special meanings).
In such cases, there's no harm in URL-Encoding the
character, even if the character actually does not need
to be URL-Encoded.
Some Common Special Characters
Here's a table of some often-used characters and their
URL encodings.
<space> | %20 or +
Note that because the <space> character is very
commonly used, a special code (the "+" sign) has been reserved as its URL
encoding in form data and query strings. Thus the string "A B" can be URL encoded as
either "A%20B" or "A+B".
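Because of this dual encoding, a decoder for form data has to handle both forms. Here is a small sketch, where decodeFormValue is a hypothetical helper name (not from the original page); it relies only on the built-in decodeURIComponent, which leaves "+" untouched on its own.

```javascript
// Sketch: decode both encodings of a space in form data.
// decodeFormValue is a hypothetical helper used for illustration.
function decodeFormValue(encoded) {
  // decodeURIComponent only understands %XY escapes, so turn the
  // form-style "+" into a space before decoding the rest.
  return decodeURIComponent(encoded.replace(/\+/g, " "));
}

console.log(decodeFormValue("A%20B"));    // A B
console.log(decodeFormValue("A+B"));      // A B
console.log(decodeURIComponent("A+B"));   // A+B (the "+" is not converted)
```

A literal plus sign in the data is not lost by this step, because an encoder would have sent it as "%2B".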
Where Do the Numbers Come From?
The number following the % sign is the hexadecimal ASCII code of
the character being encoded. You can find an ASCII table here.
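To make the relationship concrete, here is a minimal sketch that builds the %XY form from a character's ASCII code. The helper name percentEncodeAscii is hypothetical, and the sketch deliberately handles only single ASCII characters.

```javascript
// Sketch: derive the %XY form of one ASCII character from its code.
// percentEncodeAscii is a hypothetical helper used for illustration.
function percentEncodeAscii(ch) {
  const code = ch.charCodeAt(0); // numeric character code, e.g. 32 for a space
  if (code > 0x7f) {
    throw new RangeError("this sketch only handles ASCII characters");
  }
  // Write the code as two uppercase hexadecimal digits after "%".
  return "%" + code.toString(16).toUpperCase().padStart(2, "0");
}

console.log(percentEncodeAscii(" ")); // %20 (ASCII 32 = hex 20)
console.log(percentEncodeAscii("&")); // %26 (ASCII 38 = hex 26)
console.log(percentEncodeAscii("#")); // %23 (ASCII 35 = hex 23)
```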
Most web programming languages already provide built-in methods to perform URL
Encoding and URL Decoding. Here are the common ones; click the method name
to find more info.
The example below uses the escape and unescape functions. Because the escape function does not
encode the '+' and '/' characters (I've no
idea why it's programmed that way), these characters need to be converted
manually. This is done using the String.replace function.
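The page's original script is not reproduced here, so the following is only a sketch of the approach just described, assuming ASCII input. The names urlEncode and urlDecode are hypothetical. Keep in mind that escape and unescape are legacy functions; the built-in encodeURIComponent encodes '+' and '/' on its own and is shown for comparison.

```javascript
// Sketch of the escape-plus-replace approach, assuming ASCII input.
// urlEncode and urlDecode are hypothetical names for illustration.
function urlEncode(text) {
  // escape() leaves "+" and "/" unencoded, so convert them manually.
  return escape(text).replace(/\+/g, "%2B").replace(/\//g, "%2F");
}

function urlDecode(encoded) {
  // unescape() understands %2B and %2F, so no manual step is needed here.
  return unescape(encoded);
}

const sample = "a+b/c d";
const encodedSample = urlEncode(sample);
console.log(escape(sample));             // a+b/c%20d
console.log(encodedSample);              // a%2Bb%2Fc%20d
console.log(urlDecode(encodedSample));   // a+b/c d
console.log(encodeURIComponent(sample)); // a%2Bb%2Fc%20d
```

For this sample the manual replacement and encodeURIComponent give the same result; they can differ for other characters, so the two are not drop-in equivalents.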
Type anything in the Not encoded field. When you press the
URL Encode button, the encoded string will be displayed in the Encoded field.
Alternatively, type anything in the Encoded field. When you press the URL Decode button, the decoded string will
be displayed in the Not encoded field.