Wednesday 23 November 2011

How to update multiple rows in mysql and rant against rfc3986

Today I was doing some work on my photo website and needed to update every row in a table, with each row being set to a different value. So I looked for ways to do this, and it seems there are a few.
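The usual candidates in MySQL seem to be a single UPDATE with a CASE expression over the key, INSERT ... ON DUPLICATE KEY UPDATE, and updating through a join against a temporary table. As a rough sketch of the CASE approach (the photos table and caption column are invented for illustration, and real code should use parameterised queries rather than string interpolation):

    # Hypothetical data: photo id -> new caption (real table/column names differ).
    new_values = {3: 'Sunset over the bay', 7: 'Old harbour wall', 12: 'Market square'}

    cases = ' '.join(f"WHEN {photo_id} THEN '{caption}'"
                     for photo_id, caption in new_values.items())
    ids = ', '.join(str(photo_id) for photo_id in new_values)

    # One statement updates every listed row to its own value.
    sql = f"UPDATE photos SET caption = CASE id {cases} END WHERE id IN ({ids});"
    print(sql)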

If I get a minute (unlikely) I will test the different methods to see which is most efficient.

I was working on creating a feed for my site, and while reading this tutorial on Atom feeds I liked the source of the comment they use to get around browsers not applying XSL stylesheets:

I employ an excerpt in Esperanto translation from the first chapter of a Polish novel about ancient Egypt.
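As I understand it, the point of the long comment is padding: browsers that sniff for feeds reportedly only look at the first few hundred bytes of the document, so if the <feed> element is pushed past that point the built-in feed view doesn't kick in and the xml-stylesheet instruction gets applied instead. A rough sketch of what the top of such a feed could look like (the stylesheet name, filename and padding text here are placeholders, not what I actually use):

    # Write the top of an Atom feed with an XSL stylesheet and a long padding
    # comment before <feed>, so feed-sniffing browsers don't hijack the page.
    # Anything long enough works as padding; the tutorial uses an Esperanto excerpt.
    padding = 'Lorem ipsum dolor sit amet. ' * 25

    feed_top = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<?xml-stylesheet type="text/xsl" href="feed.xsl"?>\n'
        f'<!-- {padding.strip()} -->\n'
        '<feed xmlns="http://www.w3.org/2005/Atom">\n'
    )

    with open('feed-head.xml', 'w', encoding='utf-8') as f:
        f.write(feed_top)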

As part of my work on creating a feed for the site, I have had to change the url structure to be encoded as per rfc3986. I don't agree with the restrictive ASCII-only nature of rfc3986; in my opinion we should have moved beyond that by now, and Chinese characters etc. should be valid in URLs. To me rfc3986 comes across as saying: so long as we don't have to encode characters in English it's okay, and the rest of the world will just have to put up with encoding their characters because we don't care about them. That's probably not actually the case, but it's the way it seems to me.

As well as this, some web browsers display unencoded urls correctly in the address bar, but display encoded urls in their encoded form, e.g.:

  • If I link to a page as http://zh-yue.wikipedia.org/wiki/中文 it will display as http://zh-yue.wikipedia.org/wiki/中文 in the address bar.

  • However, if I link to the url encoded as per rfc3986 http://zh-yue.wikipedia.org/wiki/%E4%B8%AD%E6%96%87, it will display in the address bar as http://zh-yue.wikipedia.org/wiki/%E4%B8%AD%E6%96%87

Now I know which looks much nicer to me as a url, especially if my native language were Chinese. It should be noted that for the unencoded url, the browser will encode the url itself when it makes the page request, but will display the unencoded url in the address bar. So by using unencoded urls you do not avoid rfc3986; rather, the responsibility for encoding requests is shifted onto the browser's shoulders.
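To be concrete about what the browser ends up requesting, the encoding is just percent-encoding of the path's UTF-8 bytes; a quick sketch with Python's urllib (the set of safe characters here is just enough for this example):

    from urllib.parse import quote, unquote

    # What a browser effectively sends when given the unencoded link: characters
    # outside rfc3986's unreserved set get percent-encoded as their UTF-8 bytes.
    unencoded = 'http://zh-yue.wikipedia.org/wiki/中文'
    encoded = quote(unencoded, safe=':/')   # leave the scheme and slashes alone
    print(encoded)            # http://zh-yue.wikipedia.org/wiki/%E4%B8%AD%E6%96%87
    print(unquote(encoded))   # back to the readable form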

I believe Google Chrome is much better in this respect than IE (what a surprise!) and displays the unencoded url in the address bar even if the page was loaded from an encoded link. Unfortunately IE is the most popular browser, so how things appear in it matters.

There is also the issue of size - an encoded url takes up more space than an unencoded one. Not a big issue, but another reason against encoding urls. (For the above example the encoded url is 51 bytes while the unencoded url is 39 bytes when both are saved as UTF-8, and that's with only two of the 35 characters being encoded.)
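Those numbers are easy to verify (the 6 UTF-8 bytes of the two Chinese characters become 18 bytes once percent-encoded):

    unencoded = 'http://zh-yue.wikipedia.org/wiki/中文'
    encoded = 'http://zh-yue.wikipedia.org/wiki/%E4%B8%AD%E6%96%87'

    print(len(unencoded.encode('utf-8')))   # 39 (33 ASCII bytes + two 3-byte characters)
    print(len(encoded.encode('utf-8')))     # 51 (33 ASCII bytes + 18 bytes of escapes)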

Anyway, despite my disagreement with rfc3986, I still need to implement it to make the feed valid. Plus, by not using it I have probably made some pages on my site undiscoverable by Google, or inaccessible to some strange web browsers / bots.

So while I was looking at converting my urls to be compliant with rfc3986, I wondered about how I was structuring my URLs with regard to pagination and SEO. I found quite a bit of info on this subject: http://www.google.com/support/forum/p/Webmasters/thread?tid=344378292ff91e8d&hl=en&start=40#fid_344378292ff91e8d0004af9f4a5efbe7.

I have still got quite a bit of reading to do, but what I gather so far is:

  • Use rel="next" and rel="prev" links in the <head> of paginated pages
  • Use query string parameters for the page (and other parameters) rather than including them as part of the URL path (which I am doing at the moment - oops):

    @katty22: One thing I’d like to mention is that for interchangeable/filterable options, it’s more search engine friendly to keep the options as parameters, not as subdirectories. For example, we’d prefer this URL:
    http://www.example.com/item?story=abc&price=up&page=1
    rather than:
    http://www.example.com/story=abc/price=up/page=1
  • Contrary to what some articles suggest, if you have four pages that are the same except for a url parameter, and each 'page' has a single link to it, this will not be any worse SEO-wise than having one page with four links to it. Google will automatically identify the pages as being the same and group them into a cluster:

    When Google detects duplicate content, such as variations caused by URL parameters, we group the duplicate URLs into one cluster and select what we think is the "best" URL to represent the cluster in search results. We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL. Consolidating properties from duplicates into one representative URL often provides users with more accurate search results.
    source: http://www.google.com/support/webmasters/bin/answer.py?answer=1235687
  • I was interested in this suggestion on pagination. The article suggests that you shouldn't have more than 100 links on a page, and that comment suggests only linking to certain pages using logarithmic pagination.

    I haven't read any guidance from Google on this yet. At the moment I have all pages linked and then use js to reduce this for the user, e.g. for page 45 js would reduce the links to look like
    1, 2, [...] 43, 44, 45, 46, 47, [...], 120, 121
    And the user can expand back to showing all the linked pages by clicking on the [...] (there's a rough sketch of this reduction, together with the query-string urls and rel prev/next links, after this list).

    So this is something I need to look at more closely.
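For what it's worth, here is a rough sketch of the kind of reduction the js does, combined with page numbers as query-string parameters and the rel="prev" / rel="next" head links mentioned above (the /photos path and parameter names are made up for illustration; this is not the actual code from the site):

    from urllib.parse import urlencode

    def page_url(page, params=None):
        """Build a page url with the page number as a query parameter."""
        query = dict(params or {})
        query['page'] = page
        return '/photos?' + urlencode(query)   # e.g. /photos?story=abc&page=45

    def visible_pages(current, total, window=2):
        """Pages to link: first two, last two, and a window around the current page."""
        keep = set(range(1, 3)) | set(range(total - 1, total + 1))
        keep |= set(range(current - window, current + window + 1))
        return sorted(p for p in keep if 1 <= p <= total)

    def head_links(current, total):
        """rel=prev / rel=next link tags for the <head> of the current page."""
        links = []
        if current > 1:
            links.append(f'<link rel="prev" href="{page_url(current - 1)}">')
        if current < total:
            links.append(f'<link rel="next" href="{page_url(current + 1)}">')
        return links

    print(visible_pages(45, 121))   # [1, 2, 43, 44, 45, 46, 47, 120, 121]
    print(head_links(45, 121))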

I hope to do some more research and reading on this aspect of SEO tomorrow.
