Wednesday 12 August 2009

Trying to get an .htaccess rewrite rule to work

Today I was trying to figure out how to get reasonably nice-looking (and SEO-friendly) URLs. Yesterday I found that URLs with %3F (URL-encoded question mark) in them weren't working, and neither were URLs with %2F (URL-encoded forward slash).

I had posted a question about this on the WebFaction forums yesterday, and when I checked today, Japh had replied to say that the problem was that Apache's mod_rewrite URL-decodes the URL before checking it against the rules. So I guess Apache would treat the part of the URL after the %3F as a query string, rather than as part of the actual path.
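To make that explanation concrete, here is a rough Python illustration of what happens if the path is decoded before the rules see it. It only imitates the behaviour described above and is not how mod_rewrite is actually implemented; the path /page/what%3F/123 and the function name are hypothetical.

```python
from typing import Tuple
from urllib.parse import unquote


def split_after_decoding(raw_path: str) -> Tuple[str, str]:
    """Decode a raw request path, then split it at the first '?'.

    Imitates the behaviour described above; not mod_rewrite's real code.
    """
    decoded = unquote(raw_path)  # %3F -> '?', %2F -> '/'
    path, _, query = decoded.partition("?")
    return path, query


# Hypothetical URL: the page title contains a question mark.
path, query = split_after_decoding("/page/what%3F/123")
print(path)   # /page/what
print(query)  # /123
```

Once the %3F has been decoded, everything after it no longer looks like part of the path, so a rule expecting /page/<title>/<number> has nothing to match.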

The solution he suggested was to double-encode the URL, so %3F would become %253F. I wanted to keep my URLs as simple as possible, so I did some googling to see if there was any way round this. A URL containing %3F actually works okay with the rewrite rules on my local system, just not on the web server, so I did some testing using %2F (URL-encoded forward slash) in the URL instead, since that doesn't work on either the web server or my local system.
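Double encoding works because the '%' of the first escape is itself encoded as %25, so a single decoding pass only strips one layer. A quick Python sketch using the standard urllib.parse functions (the title "what?" is just an example):

```python
from urllib.parse import quote, unquote

title = "what?"

once = quote(title, safe="")   # 'what%3F'
twice = quote(once, safe="")   # 'what%253F': the '%' itself becomes %25

# One decoding pass (like the one described above) removes only one layer,
# so the rewrite rules would still see '%3F' rather than a bare '?'.
after_one_decode = unquote(twice)
print(once, twice, after_one_decode)
```

The same applies to a slash: "a/b" encodes to a%2Fb, and encoding that again gives a%252Fb.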

I did a lot of testing, but just couldn't get any rewrite rules to work when %2F was in the URL. The last post in the thread 'mod rewrite fails with %2f character' explains it:
The naked "%2f" is allowed in a query string. but not in a URL. In order to be valid, it would have to be encoded as %252f, which I think you will find to work as you expect.

Because the URL is itself invalid, the server is rejecting it before any Apache modules are invoked. On my server, not even the custom 404 error page is applied -- The server simply rejects the request out-of-hand.


Before lunch, Moccle, L and I started watching 'The Lost World' (1925). After lunch I went on Animal, then we finished watching The Lost World.

After that, I carried on looking at which characters are acceptable in URLs. I found a useful guide, URL Encoding, and went through the characters listed there to see which were okay and which weren't. Obviously, you could just URL-encode all non-alphanumeric characters, but this leads to messy-looking URLs, and I'm not sure how SEO-friendly URL-encoded characters are compared with plain characters.

Anyway, going through those characters, I found that only %, ? and # need URL encoding (in my case, anyway, since I discard all of the URL except the end part, which contains the number of the database record my page needs to look up). I also found that IE6 converts a backslash into a forward slash, so I decided to URL-encode that as well.
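A minimal sketch of that "encode only what's needed" approach in Python, assuming, as above, that only the trailing record number is ever read back. The function name encode_slug is hypothetical, and this is not the code the site actually uses.

```python
# Only the characters found troublesome above: %, ?, # and (for IE6) backslash.
# str.translate replaces each character in a single pass, so the '%' in a
# freshly produced escape such as '%3F' is never encoded a second time.
_MINIMAL_ESCAPES = str.maketrans({
    "%": "%25",
    "?": "%3F",
    "#": "%23",
    "\\": "%5C",
})


def encode_slug(text: str) -> str:
    """Percent-encode only %, ?, # and backslash; leave other characters readable."""
    return text.translate(_MINIMAL_ESCAPES)


print(encode_slug("100%-sure?#1\\2"))  # 100%25-sure%3F%231%5C2
```

Doing the replacements in one pass avoids the classic ordering bug where '%' is encoded after '?' and turns %3F into %253F.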

In the evening I did some more work on rewriting URLs with %3F in them, and eventually found that by putting the rule's pattern in a rewrite condition and checking it against REQUEST_URI, I could match it fine even though %3F was being decoded to a question mark. Here's a link to the thread on the WebFaction forums: URL re-writing not working when %3F or %2F in url
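As a Python analogy (not the actual Apache configuration, which isn't shown here), the trick amounts to testing the pattern against the whole decoded URI, where a decoded '?' is just another character, instead of against only the part before it. The URL layout, regex and function name below are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Hypothetical URL layout: /<anything>/<record number>
RECORD_PATTERN = re.compile(r"^/.*/(\d+)$")


def record_id(raw_request_uri: str) -> Optional[int]:
    """Return the trailing record number, or None if there isn't one.

    The pattern is tested against the whole decoded URI, so a '?' produced
    by decoding %3F does not cut the match short.
    """
    match = RECORD_PATTERN.match(unquote(raw_request_uri))
    return int(match.group(1)) if match else None


print(record_id("/page/what%3F/123"))        # 123
# Matching only the part before the '?' would find no record number:
print(RECORD_PATTERN.match("/page/what"))    # None
```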

Also in the evening, I watched Bulletproof (the Gary Busey film) with Moccle. Typical Gary Busey/'80s action flick skillness. It also had the ultimate baddies: A-rab Russian Mexican South American Communists. Moccle also did a good job of spotting Danny Trejo in the opening scenes of the film.

It also had a dude in a car near the end who was fat like Hurley from Lost, and sounded totally like him as well. He had a beard though, and was about the same age as Hurley is now, and wasn't played by the same actor.

Food
Breakfast: Blood Orange marmalade toast sandwich; cup o' tea.
Lunch: Ham with mustard and iceberg lettuce sandwich; satsuma; piece of Chocolate cereal cake stuff; cup o' tea.
Dinner: Big German sausage in a sub with iceberg lettuce and tomato ketchup; vegetable fake cup a soup. Pudding was an Aero Mousse and also a caramel and white chocolate biscuit slice thing. Coffee.
Supper: Hob-nob; Shortbread finger.
