Wednesday 18 February 2009

Creating a sitemap

I did some more reading about sitemaps this morning. I started off reading Sitemap Tutorial, which uses the Sitemap Protocol 0.9 introduced by sitemaps.org. This is a standard that is validated and used by Google and most other search engines.


So then I visited sitemaps.org and read the info there, which was very helpful.

I checked the w3c example of styling XML with a CSS stylesheet in IE, and it worked fine. However, you can't use CSS to turn the URLs into real links or to include the rest of the HTML page, so I started looking at XSL.

W3Schools has a good tutorial on using XSL to style XML documents, so I followed that, but found that the for-each section wasn't working for me. So I copied the W3Schools example XML and XSLT, and then gradually changed them until it didn't work anymore. I found that when I added the xmlns namespace attribute to the urlset node, it stopped the for-each section working (well, it probably still works, it just doesn't find any items to loop through).
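
The same namespace behaviour can be reproduced outside XSLT. Here's a minimal sketch using DOMXPath in PHP, assuming a made-up two-entry sitemap for example.com (not my real site); the sm prefix is an arbitrary name chosen for illustration:

```php
<?php
// Hypothetical two-entry sitemap; example.com stands in for the real site.
$sitemap = <<<'XML'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <lastmod>2009-02-18</lastmod>
  </url>
  <url>
    <loc>http://example.com/about</loc>
    <lastmod>2009-02-17</lastmod>
  </url>
</urlset>
XML;

$doc = new DOMDocument();
$doc->loadXML($sitemap);
$xpath = new DOMXPath($doc);

// Unprefixed names only match elements in no namespace, so this finds nothing.
$plain = $xpath->query('/urlset/url');

// Bind a prefix to the sitemap namespace and use it in the path.
$xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$prefixed = $xpath->query('/sm:urlset/sm:url');

echo 'Without prefix: ' . $plain->length . "\n";
echo 'With prefix: ' . $prefixed->length . "\n";
```

The unprefixed path matches no nodes, while the prefixed one matches both url elements, which is consistent with the for-each loop silently having nothing to loop over.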

So after that I tried using DOMDocument in PHP, but I couldn't work out how to access the nodes.

Next I tried SimpleXML, which was nice and, well... simple to use.

Me and Rad had lunch by ourselves, then about 1.30pm Clare, Uncle Gez and Ben came back from Asda, and a bit later Rad came back from the train station with Shaz. I played on Animal Crossing for a bit after lunch, then after everyone had come back I had some cakes since I didn't have any when I ate my lunch.

Later in the afternoon I gave DOMDocument another try and did manage to get the gist of how to access nodes this time.

Here's an example of displaying my sitemap using SimpleXML in PHP:
$xml = simplexml_load_file('sitemap.xml');
echo '<ul>';
foreach ($xml->url as $url)
{
    // Drop the first 24 characters (the scheme and domain) to leave a relative URL
    $loc = substr((string) $url->loc, 24);
    echo '<li><a href="'.$loc.'" class="label">'.$loc.'</a> Last Modified '.date_format(date_create((string) $url->lastmod), 'jS F Y').'</li>';
}
echo '</ul>';


and here's the same using DOMDocument in PHP:
$xml = new DOMDocument();
$xml->load('sitemap.xml');
// Every <url> element in the document
$urls = $xml->getElementsByTagName('url');
echo '<ul>';
foreach ($urls as $url)
{
    // Look up <loc> by tag name, then drop the first 24 characters (the scheme and domain)
    $loc = substr($url->getElementsByTagName('loc')->item(0)->nodeValue, 24);
    echo '<li><a href="'.$loc.'" class="label">'.$loc.'</a> Last Modified '.date_format(date_create($url->getElementsByTagName('lastmod')->item(0)->nodeValue), 'jS F Y').'</li>';
}
echo '</ul>';


I'm substringing the location to remove the http://mysite.com from the url.
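
A fixed offset like 24 only works while the domain stays the same length. As an alternative sketch (not what my code does), PHP's parse_url() can pull out just the path; sitePath() is a hypothetical helper name and example.com is a placeholder domain:

```php
<?php
// Return the part of a URL after the scheme and host, e.g. "/about".
// sitePath() is a hypothetical helper name used only for this sketch.
function sitePath(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    // parse_url() gives null when there is no path and false on a malformed URL.
    return is_string($path) && $path !== '' ? $path : '/';
}

echo sitePath('http://example.com/blog/sitemaps'), "\n";
echo sitePath('http://example.com'), "\n";
```

Note that this keeps the leading slash, so the links become root-relative.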

As you can see, SimpleXML is much easier to write and read, and uses less code. Using DOMDocument you have the same problem you get when traversing the DOM in JavaScript: white space is a text node, so to get the <loc> value you can't just do $loc = substr($url->childNodes->item(0)->nodeValue, 24);. You have to do a getElementsByTagName, or otherwise check whether item(0) is a text node and, if it is, get item(1) instead.
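
To illustrate the white space problem, here is a small sketch that loads a single made-up <url> entry from a string and walks its children, skipping anything that isn't an element. The inline XML and example.com URL are assumptions for the demo:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML("<url>\n  <loc>http://example.com/about</loc>\n</url>");
$url = $doc->documentElement;

// item(0) is the whitespace text node that sits before <loc>.
$first = $url->childNodes->item(0);
echo $first->nodeType === XML_TEXT_NODE ? "text node\n" : "element\n";

// Walk the children and stop at the first <loc> element node.
$loc = null;
foreach ($url->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'loc') {
        $loc = $child->nodeValue;
        break;
    }
}
echo $loc, "\n";
```

Another option that may be worth trying is setting $doc->preserveWhiteSpace = false before loading, which asks the parser to drop whitespace-only text nodes.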

After doing that I googled for 'sitemap XSL' to see if there were any examples of how to style a sitemap XML file using XSLT. I found this helpful thread where someone was having exactly the same problem as me: google sitemap XML/XSL problem. So I made the changes suggested there (adding the sitemaps namespace to the XSLT file), and it worked!

Then I read the W3Schools XSL Tutorial to see if it had a substr() or similar function, and I couldn't find one there. (XPath 1.0, which XSLT 1.0 uses for its expressions, does have substring() and substring-after(), so the domain could be trimmed in the stylesheet itself.) You could also use JavaScript or server-side scripting to do this, but that seems kind of pointless. I think what I will do is have a PHP sitemap file for people to view (that way I can include page titles rather than just URLs) and an XML sitemap for the search engines. The two files will be separate (so the PHP file won't get its data from the XML file), but both files will be created by PHP when I approve changes to the database, so they will both contain the same up-to-date info.

Here's what my XSLT file looked like. Obviously I would actually need to parse the file using PHP and add includes for the page template files to get it styled in the same way as the rest of my site:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:url="http://www.sitemaps.org/schemas/sitemap/0.9" exclude-result-prefixes="url">

<xsl:template match="/">
<html>
<head></head>
<body>
<ul>
<xsl:for-each select="url:urlset/url:url">
<li>
<a class="label"><xsl:attribute name="href"><xsl:value-of select="url:loc" /></xsl:attribute>
<xsl:value-of select="url:loc"/>
</a>
<xsl:value-of select="url:lastmod"/>
</li>
</xsl:for-each>
</ul>
</body>
</html>
</xsl:template>

</xsl:stylesheet>
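
For the "parse the file using PHP" part, one possible approach is PHP's XSLTProcessor class. This is only a sketch: it assumes the xsl extension is enabled, renderSitemap() is a hypothetical helper name, and the inline XML and cut-down stylesheet (which outputs just the list, so it could sit inside a page template) are made up for the demo:

```php
<?php
// Transform sitemap XML into HTML using an XSLT stylesheet.
// Needs PHP's xsl extension; renderSitemap() is a hypothetical helper name.
function renderSitemap(string $xml, string $xsl): string
{
    $xmlDoc = new DOMDocument();
    $xmlDoc->loadXML($xml);

    $xslDoc = new DOMDocument();
    $xslDoc->loadXML($xsl);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($xslDoc);

    return (string) $proc->transformToXml($xmlDoc);
}

// Made-up one-entry sitemap for the demo.
$xml = <<<'XML'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/about</loc><lastmod>2009-02-18</lastmod></url>
</urlset>
XML;

// Cut-down stylesheet that outputs only the <ul>, not a whole HTML page.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:url="http://www.sitemaps.org/schemas/sitemap/0.9"
    exclude-result-prefixes="url">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <ul>
      <xsl:for-each select="url:urlset/url:url">
        <li><a class="label" href="{url:loc}"><xsl:value-of select="url:loc"/></a></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
XSL;

echo renderSitemap($xml, $xsl);
```

The page template includes could then be echoed before and after the transformed list.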


For the rest of the afternoon and evening I did more work on my website, just writing a script to create a PHP/html and xml sitemap.
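
A rough sketch of what such a script could look like, building both outputs from the same page records. The $pages array, the function names, and the example.com URLs are all made up for illustration; in practice the records would come from the database:

```php
<?php
// Hypothetical page records; in practice these would come from the database.
$pages = [
    ['loc' => 'http://example.com/', 'title' => 'Home', 'lastmod' => '2009-02-18'],
    ['loc' => 'http://example.com/about', 'title' => 'About & contact', 'lastmod' => '2009-02-17'],
];

// Build the XML sitemap for search engines.
function buildXmlSitemap(array $pages): string
{
    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $urlset = $doc->appendChild($doc->createElementNS($ns, 'urlset'));
    foreach ($pages as $page) {
        $url = $urlset->appendChild($doc->createElementNS($ns, 'url'));
        // createTextNode() escapes special characters such as & when saved.
        $loc = $url->appendChild($doc->createElementNS($ns, 'loc'));
        $loc->appendChild($doc->createTextNode($page['loc']));
        $lastmod = $url->appendChild($doc->createElementNS($ns, 'lastmod'));
        $lastmod->appendChild($doc->createTextNode($page['lastmod']));
    }
    return (string) $doc->saveXML();
}

// Build the HTML list for human visitors, using page titles as link text.
function buildHtmlSitemap(array $pages): string
{
    $html = "<ul>\n";
    foreach ($pages as $page) {
        $html .= '<li><a href="' . htmlspecialchars($page['loc']) . '" class="label">'
            . htmlspecialchars($page['title']) . '</a> Last Modified '
            . date_format(date_create($page['lastmod']), 'jS F Y') . "</li>\n";
    }
    return $html . "</ul>\n";
}

echo buildXmlSitemap($pages);
echo buildHtmlSitemap($pages);
```

To write the results out instead of echoing them, each string could be passed to file_put_contents() with whatever filenames the site uses.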

In the evening I also watched the latest episodes of The Office (US) and Flight Of The Conchords. I didn't think either of them was as good as usual, and gave them both a 5/10 on IMDB. I also played on Animal Crossing for 15 minutes or so while I was waiting for Mac to get The Office ready to watch.

The weather today was overcast all day. It drizzled lightly most of the day I think and was also quite foggy all day.

Food
Breakfast: Pink grapefruit marmalade toast sandwich; cup o' tea.
Lunch: Medium cheddar cheese with crunchy salad sandwich; 2x clementines; Chorley cake; Asda triple chocolate Rocky style biscuit; cup o' tea.
Dinner: 2 slices pepperoni pizza; chips. Pudding was a creamy yoghurt. Oreo; coffee; Oreo; coffee. (That's not a mistake).
