Thursday 22 December 2011

Websiting

This morning I thought about how my WordPress blog feed sends a 304 Not Modified response if there haven't been any new posts since the feed was last requested. For my photo website feed I hadn't implemented this, but it seemed like a good idea. Otherwise a feed reader has to download the whole feed each time it wants to check for new items.

When I looked for tutorials about creating a feed using PHP, though, none of them seemed to mention this. I did find a tutorial on sending Last-Modified headers here: Using HTTP IF MODIFIED SINCE with PHP.

I read the W3C info on the Last-Modified header, which states that HTTP/1.1 servers SHOULD send Last-Modified whenever feasible. I haven't been doing this on any of my dynamic pages, but I can't see how I'd be able to implement it, other than on pages where the images are sorted by date.

If I have 20 pages sorted by rating, and a new image is added today that gets inserted into page 10, pages 1-9 would still be the same, while pages 10-20 would all have changed today. The only solution I can think of would be to create a new database table with a record for every single page that could exist. Then whenever a new image is added, calculate which pages would be changed and update the pages table with the new last modified date.
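
As a rough sketch of the "calculate which pages would be changed" step, assuming a fixed number of images per page and that the new image's position in the rating order is already known. The function names and the figure of 10 images per page are made up for illustration, and intdiv() needs PHP 7 or later:

```php
<?php
// Hypothetical helper: the page a given 1-based position in the sorted
// list falls on, for a fixed number of images per page.
function page_for_position(int $position, int $perPage): int
{
    return intdiv($position - 1, $perPage) + 1;
}

// Hypothetical helper: every page whose contents change when a new image
// is inserted at $insertPosition. The page it lands on changes, and every
// later page changes too because each later image shifts down by one.
function changed_pages(int $insertPosition, int $perPage, int $totalAfterInsert): array
{
    $first = page_for_position($insertPosition, $perPage);
    $last  = page_for_position($totalAfterInsert, $perPage);
    return range($first, $last);
}

// 200 images after the insert, 10 per page, new image lands at position 95:
// pages 10 to 20 would need their last modified date updating.
print_r(changed_pages(95, 10, 200));
```

The calculation itself is cheap; it is keeping a last modified date for every page of every sort order that adds up.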

The work required to write this logic, the size of the new db table, and the extra processing work required for the server mean that implementing a Last-Modified date for those pages isn't worth it for me. It is okay for the feed though, since there the images are sorted in date order, so you can just get the date of the most recently added image.
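
For the feed, the check could look something like the sketch below. It is only an outline under some assumptions: the date of the newest image is already available as a Unix timestamp, the function names are my own, and it needs PHP 7.1 or later for the nullable type and the ?? operator.

```php
<?php
// Hypothetical helper: true if the client's cached copy is still current.
// $ifModifiedSince is the raw If-Modified-Since header value (or null),
// $lastModified is the Unix timestamp of the most recently added image.
function is_not_modified(?string $ifModifiedSince, int $lastModified): bool
{
    if ($ifModifiedSince === null || $ifModifiedSince === '') {
        return false;
    }
    $since = strtotime($ifModifiedSince);
    return $since !== false && $since >= $lastModified;
}

// Hypothetical helper: send Last-Modified, and a 304 when nothing is new.
// Returns false when the caller should skip outputting the feed body.
function send_feed_headers(int $lastModified): bool
{
    $header = $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null;
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
    if (is_not_modified($header, $lastModified)) {
        http_response_code(304);
        return false;
    }
    return true;
}
```

The feed script would call send_feed_headers() with the newest image's date before producing any output, and exit straight away if it returns false.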

I did what I was hoping would be my final run with the IIS Site Analyser, but still found more things that needed correcting. I found that as well as the RSS feed, WordPress also had an Atom feed. I knew that you could have Atom and RSS feeds in WordPress, but I didn't think that both of them were linked to from the blog. I thought wrong, obviously.

So I did the same for the Atom feed as I did the other day for the RSS feed. But then I found some more problems. The Atom feed had an id specified like so:

<id><?php bloginfo('atom_url'); ?></id>

This seems to be quite incorrect, since it means that all feeds for the blog would have the same id. In a similar way, the alternate link (link to the HTML page that the feed is for) was the same for all feeds:

<link rel="self" type="application/atom+xml" href="<?php self_link(); ?>" />

To fix the id issue, you can just use self_link(), which gives the URL of the current feed. To fix the alternate link, I took WordPress's self_link() function and modified it slightly to remove '/feed' or '/feed/atom'. This gives the URL of the page the feed is for. I put this function in my theme's functions.php file:

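/**
 * Return the link for the page the currently displayed feed belongs to, in a XSS safe way.
 *
 * Based on WordPress's self_link(), but strips '/feed' or '/feed/atom' from the
 * request URI to generate a correct link for the rss link or atom alternate link element.
 */
function get_self_alt_link() {
 // Take the host name from the blog's configured home URL
 $host = @parse_url(home_url());
 $host = $host['host'];
 return esc_url(
  'http'
  // $_SERVER keys are case sensitive; the scheme flag is 'HTTPS'
  . ( (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] == 'on') ? 's' : '' ) . '://'
  . $host
  // Drop the feed part of the path, keeping whatever comes between 'blog/' and '/feed' plus any query string
  . preg_replace('/blog(\/)?(.*)?\/feed(\/atom)?(\?.*)?/', 'blog/$2$4', $_SERVER['REQUEST_URI'])
  );
}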
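
To see what the regular expression does outside of WordPress, here is a small sketch that applies the same pattern and replacement to some made-up request URIs. The wrapper function name and the sample paths are just for illustration:

```php
<?php
// Same pattern and replacement as in get_self_alt_link(), wrapped in a
// hypothetical helper so it can be run without WordPress.
function strip_feed_suffix(string $requestUri): string
{
    return preg_replace('/blog(\/)?(.*)?\/feed(\/atom)?(\?.*)?/', 'blog/$2$4', $requestUri);
}

// Made-up example URIs: the main feed, the main Atom feed, a category Atom feed.
$samples = ['/blog/feed', '/blog/feed/atom', '/blog/category/photos/feed/atom/'];

foreach ($samples as $uri) {
    echo $uri, ' => ', strip_feed_suffix($uri), "\n";
}
```

The first two come out as /blog/ and the third as /blog/category/photos/, which is the page each feed is for.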

The alternate link issue is relevant for RSS feeds as well as Atom feeds; I just missed it before when I was modifying the RSS feed template.

I spent most of the rest of the day trying to fix some issues with my Google Maps page. I was sure I had it working okay before!
