Tuesday 12 March 2013

Stat checking

This morning I was just finishing off the article I started writing yesterday.

In the afternoon I was doing website stats checking. I noticed a 404 error on my photo website that seemed to be genuine (i.e. a request for a file that should exist but didn't, rather than a bot looking for vulnerabilities by requesting specific files).

When I looked into it, this file didn't exist, but should have done. How it got deleted, I don't know. It was an image file, and a larger version and thumbnail both existed, just the small one was missing. So, after creating and uploading the missing file, I then spent quite a bit of time checking to see if there were any other missing files. There were quite a few discrepancies.

I have the images arranged in 3 folders - medium, small, and thumbs. Each folder had a different number of files (they should all be the same). The folders on my local copy of the site had different numbers of files compared to the server (should be the same). The folders also had different numbers of files compared to the number of records in the database (should be the same).
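A comparison like this can be sketched in Python. This is only an illustration, not what I actually ran: it assumes each image uses the same file name (minus extension) in all three folders, and that the database records can be pulled out as a plain set of those names. The function names are made up for the example.

```python
from pathlib import Path


def stems(folder):
    """Return the set of file names (without extension) directly inside folder."""
    return {p.stem for p in Path(folder).iterdir() if p.is_file()}


def compare_to_reference(reference, folders):
    """Compare each folder's names against a reference set of names.

    reference: set of image names that should exist (assumed to come
               from the database records).
    folders:   dict mapping a label such as "small" to a set of names.
    Returns, per label, the names that are missing and the names that are extra.
    """
    return {
        label: {
            "missing": sorted(reference - names),
            "extra": sorted(names - reference),
        }
        for label, names in folders.items()
    }
```

Running `compare_to_reference` once with the local folders and once with a listing of the server folders would cover the local versus server check as well.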

So checking for all the differences and getting everything synced up took quite a while. There weren't any more missing files though, just extra files.

Carrying on with checking my stats, there were some unusual URLs that awstats had listed as 301s. awstats doesn't list the URL that the user was redirected to, so I wanted to check the server logs to see what it was. After downloading the latest server logs, I got a low disk space message.

So then I spent quite a bit of the afternoon and evening working on a shell script to gzip my server logs. I didn't want to zip the latest ones, so I wrote my script to zip all logs except the 10 most recent ones. After zipping, my logs folder went down from about 4GB to about 1GB. (It would likely have gone down to about 500MB if I'd zipped all the logs.)
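The same idea can be sketched in Python rather than shell. This is not the script I wrote, just a minimal version of the approach. It assumes the logs sit directly in one folder and that "most recent" can be judged by modification time; the function name and the `keep` parameter are made up for the example.

```python
import gzip
import shutil
from pathlib import Path


def gzip_old_logs(log_dir, keep=10):
    """Gzip every uncompressed file in log_dir except the `keep` newest ones.

    "Newest" is judged by modification time (an assumption).
    Returns the list of .gz paths that were created.
    """
    logs = [p for p in Path(log_dir).iterdir()
            if p.is_file() and p.suffix != ".gz"]
    logs.sort(key=lambda p: p.stat().st_mtime, reverse=True)  # newest first

    created = []
    for log in logs[keep:]:
        target = log.with_name(log.name + ".gz")
        with open(log, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        log.unlink()  # delete the original only after the compressed copy is written
        created.append(target)
    return created
```

Files that already end in .gz are skipped, so running it again leaves the earlier archives alone.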

Going back to the strange 301s, I found that actually the request was different to what awstats had listed. awstats listed a request like /somedirjpg as a 301. The actual request was /somedir, which then 301'd to /somedir/ (which was a 403). Why awstats was appending jpg, png, txt etc. to these requests, I don't know.
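Pulling the real requests out of the raw logs can be sketched like this in Python. It assumes the logs are in the common/combined access log format, where the quoted request line is followed by the status code; I haven't tested it against every log layout, and the function name is made up for the example.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it,
# e.g. "GET /somedir HTTP/1.1" 301
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def paths_with_status(lines, status=301):
    """Count the requested paths whose response had the given status code."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match and int(match.group("status")) == status:
            counts[match.group("path")] += 1
    return counts
```

Calling it with `status=301` on an open log file gives the paths as they were actually requested, and calling it again with `status=403` shows whether the trailing-slash versions turn up there.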

I finished checking the stats of my sites on WebFaction, so probably more stat checking with Google, Bing, and HostGator tomorrow.
