Wednesday 9 October 2013

Getting SkipFiles working in awstats

This morning I was testing out different SkipFiles syntax for awstats to try to make it exclude certain files from its reports. I tried a wide range of different things, but I couldn't find anything that worked properly. The problem was that awstats refused to skip any URL that had a query string.

Doing some searching, I found this thread: Virtual hosts and SkipFiles do not seem to work. According to the posters there, something is broken when using perl 5.12 and up (perl 5.14.2 is installed on my machine).

I had a look to see if I could easily install an older perl version on my system. I didn't do an exhaustive search, but it seems that on Ubuntu you can only easily install the latest version from the repository. I'd probably have to build from source to get an older version, which is too much hassle for me at the moment.

I checked my web host, and they use perl 5.8.8, so in theory the regex syntax for skipfiles should work on there. After a bit of thinking I realised I could use a fake log to test whether the skipfiles directive was working on the web server.

With a log like so:

127.0.0.1 - - [08/Oct/2013:08:11:46 +0000] "GET /wp-cron.php?beans HTTP/1.0" 200 20 "-" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0"
127.0.0.1 - - [08/Oct/2013:08:11:47 +0000] "GET /wp-cron.php HTTP/1.0" 200 20 "-" "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0"

And the following SkipFiles in awstats:

SkipFiles="/xmlrpc.php REGEX[^\/wp-admin\/] REGEX[^\/wp-cron\.php] REGEX[^\/wp-login\.php]"

Updating awstats gave the following:

Parsed lines in file: 2
 Found 2 dropped records,
 Found 0 corrupted records,
 Found 0 old records,
 Found 0 new qualified records.

So it looks like my issues should be solved, so long as my webhost doesn't upgrade its perl version.
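As a rough illustration of what the SkipFiles line above is meant to do, here is a minimal Python sketch that applies the same three patterns to the two test URLs. It is not how awstats itself does the matching (awstats is written in perl). Two things here are assumptions: that the pattern is tested against the request path including any query string, and that the plain /xmlrpc.php entry is an exact match. The /index.php URL is a made-up control case.

```python
import re

# Patterns copied from the REGEX[...] entries in the SkipFiles line.
SKIP_PATTERNS = [r"^\/wp-admin\/", r"^\/wp-cron\.php", r"^\/wp-login\.php"]

# Plain (non-REGEX) entry. Treating it as an exact match is an assumption.
SKIP_EXACT = ["/xmlrpc.php"]


def is_skipped(url):
    """Return True if the URL would be excluded under the assumptions above."""
    if url in SKIP_EXACT:
        return True
    # The patterns are anchored with ^ only, so anything after the
    # matched prefix (such as a query string) does not prevent a match.
    return any(re.search(pattern, url) for pattern in SKIP_PATTERNS)


if __name__ == "__main__":
    # The first two URLs come from the fake log; /index.php is hypothetical.
    for url in ["/wp-cron.php?beans", "/wp-cron.php", "/index.php"]:
        print(url, is_skipped(url))
```

Because the patterns have no trailing $ anchor, a query string should not stop them matching, which agrees with both fake log lines being dropped on the perl 5.8.8 server.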

After doing a bit more searching, I found that the problem is not with perl itself. Rather, the way awstats was written was not compatible with perl 5.12+ (see AWStats 7.0 *BROKEN* with perl 5.14). I downloaded the latest version of awstats to my PC, and with it SkipFiles worked correctly when running an update.

I'm not sure what was changed in awstats, and whether the new version is compatible with perl < 5.12. So I'll probably leave the current version on the web server. I'll update it if the server's perl version gets updated and the skipped files start being recorded on the stats.

Going through my stats I saw quite a few 301 redirects for one of my sites. When I looked into it, these requests were missing the trailing slash from a directory. I checked the actual website, and the trailing slash was not missing from the links to this directory. Furthermore, the query string parameters in these requests were in a different order from the one used in the links on the website.

Possibly bot behaviour, but I don't get why they'd strip the trailing slash and re-order the query string. Very strange!
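To pick out such requests from a log, one option is to compare each requested URL with the linked URL while ignoring the trailing slash and the parameter order. The following is only a sketch using Python's standard urllib.parse. The function name and the example URLs are made up for illustration and are not taken from my site.

```python
from urllib.parse import urlsplit, parse_qsl


def same_target(requested, linked):
    """True if two URLs differ at most by a trailing slash on the path
    and by the order of their query string parameters."""
    req, ref = urlsplit(requested), urlsplit(linked)
    # Compare paths with any trailing slash removed.
    same_path = req.path.rstrip("/") == ref.path.rstrip("/")
    # Compare query parameters as sorted (name, value) pairs.
    req_params = sorted(parse_qsl(req.query, keep_blank_values=True))
    ref_params = sorted(parse_qsl(ref.query, keep_blank_values=True))
    return same_path and req_params == ref_params


if __name__ == "__main__":
    # Hypothetical URLs: the first is what a bot might request,
    # the second is what the site actually links to.
    print(same_target("/gallery?b=2&a=1", "/gallery/?a=1&b=2"))
```

A request that passes this check but is not identical to the linked URL is one where something has rewritten the link before requesting it.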
