Thursday 13 August 2009

Url encoding

This morning I was trying to work out how to get my tree menu to work properly in IE6. My menu looked like this:
<ul>
<li>$ &amp; + , / : ; = ? @ ' &quot; &lt; &gt; # % { } | \ ^ ~ [ ] `<ul>

<li><a href="/$-&amp;-+-,-/-:-;-=-%3F-@-'-&quot;-&lt;-&gt;-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2" title="Hawaiian Milkcaps">Hawaiian Milkcaps</a></li>
</ul></li>
<li><a href="/American-Games-Caps/4" title="American Games Caps">American Games Caps</a></li>
<li><a href="/Junior-Caps/7" title="Junior Caps">Junior Caps</a></li>
<li>Pogs<ul>
<li><a href="/Pogs/Series-1/1" title="Series 1">Series 1</a></li>
<li><a href="/Pogs/Series-2/3" title="Series 2">Series 2</a></li>
<li><a href="/Pogs/The-World-Tour/15" title="The World Tour">The World Tour</a></li>
<li>Series 1<ul>

<li><a href="/Pogs/Series-1/Chex/5" title="Chex">Chex</a></li>
</ul></li>
</ul></li>
<li><a href="/UK-Games-Caps/16" title="UK Games Caps">UK Games Caps</a></li>
<li><a href="/Zigs/6" title="Zigs">Zigs</a></li>
</ul>
I then used JavaScript to hide each subcategory (ul), so that a subcategory would only expand when you clicked its category title. (The category titles are the bits of text that aren't inside an anchor element; each one is followed by a ul containing its subcategories.)

As part of this, I wrapped each category title in an anchor element with the class name 'title', so that rolling over a category title would underline it to show it could be clicked on. The 'title' class had the following CSS:

#nav .title {
    color: #FFF;
    text-decoration: none;
}

#nav .title:hover {
    text-decoration: underline;
    cursor: pointer;
}

The only problem was that the hover effect didn't work in IE6. IE6 only applies :hover to anchor elements that have an href, so I needed to set the anchor's href property for the hover effect to work. I set it to an empty string, which gave me the rollover effect in IE6, but caused a new problem.

After converting the menu into a tree menu, my JavaScript looped through the links in the menu until it found the one matching the current page URL. It then 'clicked' on each of that link's parent category titles, so that they would expand and the menu would show the page you were currently on:

// Highlight the current page's link and expand its parent categories
var links = document.getElementById('nav').getElementsByTagName('A');
for (var i = 0, link; link = links[i]; i++)
{
    if (link.href == window.location.href)
    {
        // Highlight the link by putting a marker in front of it
        var span = document.createElement('span');
        span.className = 'LucidaSans';
        span.style.color = '#C9C9F3';
        span.appendChild(document.createTextNode('\u25C9 '));
        span.style.marginLeft = '-1em';
        link.parentNode.insertBefore(span, link);

        // Use a separate variable for the parent ul rather than changing
        // links[i], which the for loop relies on
        var parent = link.parentNode.parentNode;
        while (parent.tagName == 'UL' && parent.parentNode.id != 'nav')
        {
            // The category title anchor sits just before its ul
            var title = parent.previousSibling;
            if (title)
            {
                // 'Click' the title to expand this category
                if (document.createEvent) // DOM2 compatible browsers
                {
                    var clicky = document.createEvent('MouseEvents');
                    clicky.initEvent('click', true, true);
                    title.dispatchEvent(clicky);
                }
                else if (title.fireEvent) // IE
                {
                    title.fireEvent('onclick');
                }
            }
            // Move up to the next ul in the tree
            parent = parent.parentNode.parentNode;
        }
        break;
    }
}

But once I had added an empty href attribute to the category titles, the parents of the current page's link weren't being expanded. After some testing I realised what the problem was: an empty href links to the page you are currently on, so the first empty href (on the first category title) matched the current page URL, and the loop stopped there.
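To see why the empty href matched, it helps to resolve the hrefs against the page URL the way a browser does when you read link.href. This sketch uses the WHATWG URL class, which exists in modern browsers and Node.js but not in the 2009-era browsers discussed here; the page URL is just an example taken from the menu, and resolveHref is a hypothetical helper name.

```javascript
// Resolve an anchor's href against the current page URL, roughly what a
// browser does when link.href is read. Uses the WHATWG URL class.
const pageUrl = 'http://www.milkcapmania.com/Pogs/Series-1/1';

function resolveHref(href, base) {
  return new URL(href, base).href;
}

// An empty href resolves to the current page itself...
const emptyHref = resolveHref('', pageUrl);
// ...while '#' resolves to the current page plus an empty fragment.
const hashHref = resolveHref('#', pageUrl);

console.log(emptyHref === pageUrl); // true: the title link would match the current page
console.log(hashHref === pageUrl);  // false: '#' no longer matches
console.log(hashHref);              // http://www.milkcapmania.com/Pogs/Series-1/1#
```

This is why switching the titles' href to '#' stops them from matching window.location.href.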

Setting the href of the category titles to '#' instead of an empty string fixed this problem. But then I found another one: in IE6, the parent category titles of the current page still weren't being 'clicked on'. Comparing the link's href with window.location.href in IE6 and Firefox, I found that IE6 decodes the href, so while the actual link href had the value of
/$-&amp;-+-,-/-:-;-=-%3F-@-'-&quot;-&lt;-&gt;-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2
which matched the window.location.href value, IE6 reckoned that the value of the link's href was
/$-&amp;-+-,-/-:-;-=-?-@-'-&quot;-&lt;-&gt;-#-%-{-}-|-\-^-~-[-]-`/Hawaiian-Milkcaps/2
which doesn't match the window.location.href value.

The solution was to re-encode those specific characters in IE6, so the urls would match correctly.
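A minimal sketch of that re-encoding, assuming the hrefs are plain paths with no real query string or fragment (a genuine ? or # would be encoded too). reencodeIe6Href is a hypothetical name, not the original site code:

```javascript
// Hypothetical helper: re-encode the four characters IE6 was seen to decode
// in link.href (% ? # \) so the href matches window.location.href again.
function reencodeIe6Href(href) {
  return href
    .replace(/%/g, '%25') // first, so the escapes added below are left alone
    .replace(/\?/g, '%3F')
    .replace(/#/g, '%23')
    .replace(/\\/g, '%5C');
}

// The href as IE6 reported it (HTML entities already resolved).
const ie6Href = "/$-&-+-,-/-:-;-=-?-@-'-\"-<->-#-%-{-}-|-\\-^-~-[-]-`/Hawaiian-Milkcaps/2";

console.log(reencodeIe6Href(ie6Href));
// /$-&-+-,-/-:-;-=-%3F-@-'-"-<->-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2
```

The order matters: encoding % after the others would turn %3F into %253F.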

However, after doing that I found that the page URL and link URL didn't match in WebKit browsers (Chrome 2 and Safari 4) either. The link href was being evaluated as:
http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-[-]-%60/Hawaiian-Milkcaps/2
While the window.location.href was being evaluated as:
http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-%27-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-%5B-%5D-%60/Hawaiian-Milkcaps/2
So some characters (' [ and ]) were URL encoded in window.location.href, while the same characters in the link href weren't encoded.

Eventually I settled on URL decoding (with unescape) both window.location.href and the link href before comparing them, and this seemed to work in all browsers.
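The comparison can be sketched like this, using the two WebKit values quoted above. sameUrl is a hypothetical name; unescape() is a real but deprecated global (still present in browsers and Node.js) that turns each %XX into a single character, which is fine for these ASCII characters:

```javascript
// Decode both URLs with unescape() before comparing them.
function sameUrl(linkHref, locationHref) {
  return unescape(linkHref) === unescape(locationHref);
}

// The WebKit link href and window.location.href from above.
const webkitLinkHref =
  "http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-[-]-%60/Hawaiian-Milkcaps/2";
const webkitLocationHref =
  'http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-%27-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-%5B-%5D-%60/Hawaiian-Milkcaps/2';

console.log(webkitLinkHref === webkitLocationHref);       // false
console.log(sameUrl(webkitLinkHref, webkitLocationHref)); // true
```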

After that I wanted to find out whether I should actually URL encode all characters. I found the Chinese Wikipedia article on Nanjing, whose URL uses Chinese characters. A link to the page using either the Chinese characters (http://zh.wikipedia.org/wiki/南京) or the URL encoded Chinese characters (http://zh.wikipedia.org/wiki/%E5%8D%97%E4%BA%AC) worked. Judging by Fiddler2, Firefox was URL encoding the Chinese characters before sending the GET request, so both links sent the same page request.
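The encoded form is just the UTF-8 bytes of each character written as %XX escapes, which is also what JavaScript's built-in encodeURI() produces. This only shows that the results agree, not how Firefox does it internally:

```javascript
// encodeURI() percent-encodes non-ASCII characters as their UTF-8 bytes:
// 南 is E5 8D 97 and 京 is E4 BA AC.
const nanjingUrl = 'http://zh.wikipedia.org/wiki/南京';
const nanjingEncoded = encodeURI(nanjingUrl);

console.log(nanjingEncoded);                          // http://zh.wikipedia.org/wiki/%E5%8D%97%E4%BA%AC
console.log(decodeURI(nanjingEncoded) === nanjingUrl); // true
```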

Viewing the page source in Firefox, I could see that all the links in the page used URL encoded strings rather than Chinese characters. However, when testing with my own page, I found that for certain characters, if they are URL encoded, Firefox displays the hex code in the address bar rather than the character it represents.

For example, the url
$-&-+-,-/-:-;-=-?-@-'-"-<->-#-%-{-}-|-\-^-~-[-]-`/Hawaiian-Milkcaps/2

Converting & < > " to HTML entities and URL encoding the % ? # \ characters gives:
$-&amp;-+-,-/-:-;-=-%3F-@-'-&quot;-&lt;-&gt;-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2
And displays in the address bar as
$-&-+-,-/-:-;-=-%3F-@-'-"-<->-%23-%25-{-}-|-\-^-~-[-]-`/Hawaiian-Milkcaps/2 in Firefox,
$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2 in Opera,
$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-[-]-%60/Hawaiian-Milkcaps/2 in Internet Explorer 7,
$-&-+-,-/-:-;-=-%3F-@-'-"-<->-%23-%25-{-}-|-\-^-~-[-]-`/Hawaiian-Milkcaps/2 in Google Chrome,
$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2 in Safari 4, and
$-&-+-,-/-:-;-=-%3F-@-'-"-<->-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2 in IE6
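Because the href sits inside an HTML attribute, there are two layers of escaping: URL encoding for % ? # \, then HTML entities for & < > ". A sketch of the HTML layer, assuming the attribute value is wrapped in double quotes (escapeHtmlAttribute is a hypothetical name):

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
// & goes first so the entities added afterwards aren't escaped again.
function escapeHtmlAttribute(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// The URL with % ? # \ already URL encoded.
const urlPath = "$-&-+-,-/-:-;-=-%3F-@-'-\"-<->-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2";

console.log(escapeHtmlAttribute(urlPath));
// $-&amp;-+-,-/-:-;-=-%3F-@-'-&quot;-&lt;-&gt;-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2
```

The browser undoes the entity layer when it parses the page, so link.href only ever sees the URL encoded form.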

The same URL fully URL encoded (with the encoded forward slashes converted back to forward slashes) gives:
/%24-%26-%2B-%2C-/-%3A-%3B-%3D-%3F-%40-%27-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-%7E-%5B-%5D-%60/Hawaiian-Milkcaps/2
And is displayed in the address bar as:
%24-%26-%2B-%2C-/-%3A-%3B-%3D-%3F-%40-'-"-<->-%23-%25-{-}-|-\-^-~-[-]-`/Hawaiian-Milkcaps/2 in Firefox,
$-%26-%2B-,-/-:-;-%3D-%3F-@-'-"-<->-%23-%25-{-}-|-\-^-~-[-]-`/Hawaiian-Milkcaps/2 in Google Chrome,
%24-%26-%2B-%2C-/-%3A-%3B-%3D-%3F-%40-%27-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-%7E-%5B-%5D-%60/Hawaiian-Milkcaps/2 in Safari 4,
%24-%26-%2B-%2C-/-%3A-%3B-%3D-%3F-%40-%27-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-%5B-%5D-%60/Hawaiian-Milkcaps/2 in Internet Explorer 7,
%24-%26-%2B-%2C-/-%3A-%3B-%3D-%3F-%40-'-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-%5B-%5D-`/Hawaiian-Milkcaps/2 in Opera 9.64, and
%24-%26-%2B-%2C-/-%3A-%3B-%3D-%3F-%40-%27-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-%7E-%5B-%5D-%60/Hawaiian-Milkcaps/2 in IE6.
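A sketch of how that fully encoded form could be produced in JavaScript, following the description above: encode everything, then turn %2F back into /. encodePathFully is a hypothetical helper, not necessarily how the site generated its URLs. encodeURIComponent() leaves ! ' ( ) * ~ unencoded, so those are encoded by hand to reproduce the %27 and %7E in the encoded URL:

```javascript
function encodePathFully(path) {
  return encodeURIComponent(path)
    // encodeURIComponent() skips ! ' ( ) * ~, so encode those as well.
    .replace(/[!'()*~]/g, c => '%' + c.charCodeAt(0).toString(16).toUpperCase())
    // Turn the encoded forward slashes back into path separators.
    .replace(/%2F/g, '/');
}

const rawPath = "/$-&-+-,-/-:-;-=-?-@-'-\"-<->-#-%-{-}-|-\\-^-~-[-]-`/Hawaiian-Milkcaps/2";
console.log(encodePathFully(rawPath));
// /%24-%26-%2B-%2C-/-%3A-%3B-%3D-%3F-%40-%27-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-%7E-%5B-%5D-%60/Hawaiian-Milkcaps/2
```

Restoring %2F is safe here because a literal % in the input is encoded as %25 first, so a %2F can only come from a slash.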

After that I tried a URL with Chinese characters in it:
Pogs-金錢符號/Series-1/1
When it was URL encoded, the Chinese characters were displayed in the address bar in Firefox, Chrome, Safari and Opera, but IE6 and IE7 displayed the hex codes instead.

When it wasn't URL encoded, the Chinese characters were displayed in the address bar in all browsers except IE6. I run IE6 in a virtual machine that doesn't have the Asian language packs installed, so it just displayed blocks instead.

I checked the Apache access log to see what was actually being requested, and it seems that all the browsers URL encoded the requests anyway:

192.168.0.4 - - [13/Aug/2009:20:43:52 +0100] "GET /Pogs-%E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F/Series-1/1 HTTP/1.1" 200 2081 "http://www.milkcapmania.com/Pogs-%E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F/Series-1/1" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13 (.NET CLR 3.5.30729)"
192.168.0.4 - - [13/Aug/2009:20:44:39 +0100] "GET /Pogs-%E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F/Series-1/1 HTTP/1.1" 200 2081 "http://www.milkcapmania.com/Pogs-%E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F/Series-1/1" "Opera/9.64 (Windows NT 6.0; U; en) Presto/2.1.1"
192.168.0.4 - - [13/Aug/2009:20:44:51 +0100] "GET /Pogs-%E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F/Series-1/1 HTTP/1.1" 200 2081 "http://www.milkcapmania.com/Pogs-\xe9\x87\x91\xe9\x8c\xa2\xe7\xac\xa6\xe8\x99\x9f/Series-1/1" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; WOW64; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30618)"
192.168.0.4 - - [13/Aug/2009:20:44:57 +0100] "GET /Pogs-%E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F/Series-1/1 HTTP/1.1" 200 2081 "http://www.milkcapmania.com/Pogs-%E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F/Series-1/1" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/530.5 (KHTML, like Gecko) Chrome/2.0.172.39 Safari/530.5"
192.168.0.4 - - [13/Aug/2009:20:45:13 +0100] "GET /Pogs-%E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F/Series-1/1 HTTP/1.1" 200 2081 "http://www.milkcapmania.com/Pogs-%E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F/Series-1/1" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/530.19.2 (KHTML, like Gecko) Version/4.0.2 Safari/530.19.1"
192.168.0.4 - - [13/Aug/2009:20:45:51 +0100] "GET /Pogs-%E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F/Series-1/1 HTTP/1.1" 200 2081 "http://www.milkcapmania.com/Pogs-????/Series-1/1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"

Interestingly, IE6 (on the virtual machine without the Asian language packs) still sends the GET request correctly URL encoded, but sends the referrer header with question marks in place of the Chinese characters. IE7 appears to send the referrer as raw UTF-8 bytes, which Apache writes to the log as \x hex escapes.
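The %XX escapes in the request lines and the \x escapes in IE7's referrer appear to be the same UTF-8 bytes written two ways. A small sketch that prints both notations (the helper names are hypothetical; TextEncoder is a standard API in browsers and Node.js):

```javascript
// Get the UTF-8 bytes of a string as an array of numbers.
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

// %XX notation, as used in the request line.
function percentForm(text) {
  return utf8Bytes(text)
    .map(b => '%' + b.toString(16).toUpperCase().padStart(2, '0'))
    .join('');
}

// \xhh notation, as Apache writes non-ASCII header bytes to its log.
function apacheLogForm(text) {
  return utf8Bytes(text)
    .map(b => '\\x' + b.toString(16).padStart(2, '0'))
    .join('');
}

console.log(percentForm('金錢符號'));   // %E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F
console.log(apacheLogForm('金錢符號')); // \xe9\x87\x91\xe9\x8c\xa2\xe7\xac\xa6\xe8\x99\x9f
```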

When doing these tests with the Chinese characters, I noticed that the tree menu wasn't expanding correctly in Safari, though it seemed to work in all the other browsers. After some testing, I found that the link href containing Chinese characters was being unescaped into garbage: unescape turns each %XX into a separate character, so each byte of the UTF-8 encoding came out as its own Latin-1 character (for example, %E9 became é) instead of 金錢符號.

In Safari, the unescaped window.location.href contained the actual Chinese characters, but in all the other browsers the link href and window.location.href were both unescaped into the same garbled string, so they still matched.

I found a JavaScript function that properly encodes and decodes UTF-8 URLs, and tried using it instead of unescape. However, in Safari 4 this meant the window.location.href (which Safari had already decoded) was decoded a second time, so it still didn't match the link's href.

In Safari:
http://www.milkcapmania.com/Pogs-金錢符號/Series-1/1 != http://www.milkcapmania.com/Pogs-ᢦﯓeries-1/1
== true

In Firefox:
http://www.milkcapmania.com/Pogs-金錢符號/Series-1/1 != http://www.milkcapmania.com/Pogs-金錢符號/Series-1/1
== false

Interestingly, the same thing is logged in Firefox whether or not the original URL was encoded. So it seems that all browsers apart from Safari URL encode any unencoded characters in window.location.href, whilst Safari always decodes any URL encoded characters in window.location.href. All browsers, including Safari, seem to URL encode any unencoded characters in the hrefs of anchors.
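One possible way round the Safari double-decoding problem, sketched with the built-in decodeURI() rather than the function mentioned above, and not tested in the 2009 browsers: decodeURI() reads %XX sequences as UTF-8 (unlike unescape) and leaves a string with no % escapes unchanged, so decoding an already-decoded window.location.href is harmless. decodeUrlSafely and urlsMatch are hypothetical names:

```javascript
// unescape() maps each %XX to one character, so one Chinese character
// becomes three Latin-1 characters; decodeURI() reads the bytes as UTF-8.
console.log(unescape('%E9%87%91') === '\u00e9\u0087\u0091'); // true
console.log(decodeURI('%E9%87%91') === '金');               // true

// Decode once, falling back to the raw value if decodeURI() throws a
// URIError (for example on a bare '%' left by a browser's own decoding).
// Note: decodeURI() keeps escapes such as %3F and %23 as they are.
function decodeUrlSafely(url) {
  try {
    return decodeURI(url);
  } catch (e) {
    return url;
  }
}

function urlsMatch(a, b) {
  return decodeUrlSafely(a) === decodeUrlSafely(b);
}

// Safari's already-decoded location vs. the encoded link href.
const safariLocationHref = 'http://www.milkcapmania.com/Pogs-金錢符號/Series-1/1';
const encodedLinkHref =
  'http://www.milkcapmania.com/Pogs-%E9%87%91%E9%8C%A2%E7%AC%A6%E8%99%9F/Series-1/1';

console.log(urlsMatch(safariLocationHref, encodedLinkHref)); // true
```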

The weather today was a mixture of cloud and sun. I think there was probably a good sunset.

Food
Breakfast: Blood Orange marmalade toast sandwich; cup o' tea.
Lunch: 2x Cheese on toasts; cherry plum tomatoes; iceberg lettuce; clementine; Fairtrade chocolate wafer bar; cup o' tea.
Dinner: Shepherd's pie; grated cheese; tomato ketchup; carrots; peas. Pudding was a Caramel White Chocolate biscuit thing slice. Coffee.
