Friday 14 August 2009

More URL encoding

So today I was still looking at URL encoding. This time I decided to see what the browser was sending to the server when a link (mostly) isn't URL encoded. I used the following link href again:
/$-&-+-,-/-:-;-=-%3F-@-'-"-<->-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2

Safari
Address bar: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2
Request: GET /$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-%5B-%5D-%60/Hawaiian-Milkcaps/2 HTTP/1.1
Referer: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2

Chrome
Address bar: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-"-<->-%23-%25-{-}-|-\-^-~-[-]-`/Hawaiian-Milkcaps/2
Request: GET /$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-[-]-%60/Hawaiian-Milkcaps/2 HTTP/1.1
Referer: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-[-]-%60/Hawaiian-Milkcaps/2

Internet Explorer 7 (IE7)
Address bar: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-[-]-%60/Hawaiian-Milkcaps/2
Request: GET /$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-[-]-%60/Hawaiian-Milkcaps/2 HTTP/1.1
Referer: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-%7B-%7D-%7C-%5C-%5E-~-[-]-%60/Hawaiian-Milkcaps/2

Opera 9.64
Address bar: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2
Request: GET /$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-{-}-|-%5C-^-~-[-]-%60/Hawaiian-Milkcaps/2 HTTP/1.1
Referer: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-%22-%3C-%3E-%23-%25-{-}-|-%5C-^-~-[-]-%60/Hawaiian-Milkcaps/2

Firefox
Address bar: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-"-<->-%23-%25-{-}-|-\-^-~-[-]-`/Hawaiian-Milkcaps/2
*The address bar value above is probably different from the one I posted for the same URL in Firefox yesterday. If you copy the address bar value from Firefox, some characters are URL encoded when you paste it, even though they don't appear URL encoded in the address bar. The value above is what appears in the address bar (what it looks like to the user).
Request: GET /$-&-+-,-/-:-;-=-%3F-@-%27-%22-%3C-%3E-%23-%25-%7B-%7D-|-%5C-%5E-~-%5B-%5D-%60/Hawaiian-Milkcaps/2 HTTP/1.1
Referer: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-%27-%22-%3C-%3E-%23-%25-%7B-%7D-|-%5C-%5E-~-%5B-%5D-%60/Hawaiian-Milkcaps/2

Internet Explorer 6 (IE6)
Address bar: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-'-"-<->-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2
Request: GET /$-&-+-,-/-:-;-=-%3F-@-'-"-<->-%23-%25-{-}-|-%5C-^-~-[-]-`/Hawaiian-Milkcaps/2 HTTP/1.1
Referer: http://www.milkcapmania.com/$-&-+-,-/-:-;-=-%3F-@-%27-%22-%3C-%3E-%23-%25-%7B-%7D-|-%5C-%5E-~-%5B-%5D-%60/Hawaiian-Milkcaps/2

Next I checked whether I could compare, on the server, the URL sent by the browser with what the URL should be. I did this test with links that were NOT URL encoded, apart from the %, #, ? and \ characters, which were URL encoded, and with all whitespace converted into hyphens (-).
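As a rough sketch of the link format described above (only %, #, ? and \ encoded, whitespace turned into hyphens), something like the following could build a path segment. The function name encodeSegment() is made up for illustration and isn't the code the site actually uses; it also assumes each whitespace character becomes its own hyphen.

```php
<?php
// Hypothetical helper, not the site's real code.
function encodeSegment(string $text): string
{
    // Assumption: every whitespace character becomes one hyphen.
    $text = preg_replace('/\s/', '-', $text);

    // strtr() with an array makes a single pass over the string, so the
    // percent signs it inserts are not encoded a second time.
    return strtr($text, [
        '%'  => '%25',
        '#'  => '%23',
        '?'  => '%3F',
        '\\' => '%5C',
    ]);
}

echo encodeSegment('$ & + , / : ; = ? @ \' " < > # % { } | \\ ^ ~ [ ] `'), "\n";
// $-&-+-,-/-:-;-=-%3F-@-'-"-<->-%23-%25-{-}-|-%5C-^-~-[-]-`
```

That output is the same as the first segment of the test link used for the browser checks above.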

So I wrote some PHP to construct what the URL of the page should be, and then compared this to the urldecoded $_SERVER['REQUEST_URI']. However, any plus signs (+) in the URL were being decoded to spaces, because urldecode() treats a plus sign as an encoded space. If the whole URL was URL encoded, then this wouldn't happen. To fix it without having URL encoded links, all I had to do was replace any plus signs in $_SERVER['REQUEST_URI'] with %2B before URL decoding it:
$currentURL = urldecode(str_replace('+', '%2B', $_SERVER["REQUEST_URI"]));
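Here's a small sketch showing the difference the plus sign replacement makes. The wrapper function decodeRequestUri() and the sample request value are made up for illustration; on a real page the value would come from $_SERVER['REQUEST_URI']. PHP's rawurldecode() also leaves plus signs alone, so it should give the same result as the replace-then-decode approach.

```php
<?php
// Hypothetical wrapper around the one-liner from the post.
function decodeRequestUri(string $requestUri): string
{
    // Turn literal plus signs into %2B first, so urldecode() gives them
    // back as "+" instead of converting them to spaces.
    return urldecode(str_replace('+', '%2B', $requestUri));
}

// Sample value standing in for $_SERVER['REQUEST_URI'] (made up for illustration).
$requestUri = '/$-&-+-%3F-%7B-%7D/Hawaiian-Milkcaps/2';

echo urldecode($requestUri), "\n";        // /$-&- -?-{-}/Hawaiian-Milkcaps/2
echo decodeRequestUri($requestUri), "\n"; // /$-&-+-?-{-}/Hawaiian-Milkcaps/2
echo rawurldecode($requestUri), "\n";     // /$-&-+-?-{-}/Hawaiian-Milkcaps/2
```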

I tested this method with the same URL/link I was using for the tests above, and also with one containing Chinese characters. Viewing the pages in all browsers (Chrome 2, FF3, IE6, IE7, Opera 9.6, Safari 4), I could see that the requested URL and what the URL should be matched okay.

One thing I noticed in my testing was that Google Chrome displays a URL encoded backslash %5C in the address bar as a plain unencoded backslash \. If you copy the url from the address bar in Chrome, and then paste it into any browser except Firefox (so IE6, IE7, Safari 4, Opera 9.6, and even Chrome itself), the backslash will be converted into a forward slash / (and sent to the server as a forward slash).

My rewrite rule for the page in question only allows 5 forward slashes (/). So if a URL already has 5 forward slashes and also contains a backslash (\), and that backslash is converted into a forward slash, there will be too many forward slashes for the rewrite rule to match, and the user will get a 404.
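To illustrate the problem, here's a minimal sketch that simulates the backslash being turned into a forward slash and then counts the slashes. The function names and example paths are hypothetical, and it assumes the rule accepts up to 5 forward slashes; the real check is done by the rewrite rule, not by PHP.

```php
<?php
// Assumption: the rewrite rule matches paths with at most 5 forward slashes.
const MAX_SLASHES = 5;

// Hypothetical helper: count the slashes after a browser has converted
// every backslash in the path into a forward slash.
function slashCountAfterPaste(string $path): int
{
    $converted = str_replace('\\', '/', $path);
    return substr_count($converted, '/');
}

// Hypothetical helper: would the converted path still be within the limit?
function wouldStillMatch(string $path): bool
{
    return slashCountAfterPaste($path) <= MAX_SLASHES;
}

var_dump(wouldStillMatch('/a/b/c/d/e'));     // bool(true): 5 slashes, no backslash
var_dump(wouldStillMatch('/a/b\\c/d/e/f'));  // bool(false): 5 slashes + 1 converted backslash
```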

In the afternoon and evening today I watched some WC Fields shorts and Goldfinger. I also went on Animal Crossing and Wii Sports Resort. I went on MC Neat's website, and it had some tracks from his 'new' album 'Neat Situations', due out early Spring 2007 (no, not a typo). Apparently he has 'a totally new image and new sound to blow you away', though personally I didn't think much of the 'new' tracks.

His website also has a gallery where you can 'Check out Neat in his various natural poses'. It only has two photos, the first looks like his head photoshopped (though it's probably real) inside a car window, and in the second photo he's sitting on a sofa holding his head like he's got a headache.

The weather today was pretty windy and quite sunny.

Food
Breakfast: Blood Orange marmalade toast sandwich; cup o' tea.
Lunch: Ham with mustard, sliced cherry plum tomatoes (3 fruits in one), and iceberg lettuce sandwich; Clementine; Fairtrade Wafer biscuit; cup o' tea.
Dinner: Beef burger in a bun with mature cheddar cheese, tomato ketchup and iceberg lettuce; Fried vegetables with Mexican flavouring stuff that didn't actually have much flavour. Pudding was a Cadbury's Magnum style ice cream (delee). Piece of Sainsbury's Turkish delight chocolate; Piece of Sainsbury's Caramel chocolate; coffee.
