Tuesday 10 November 2009

MySQLing

This morning I was doing some more work on my photo website, specifically the blog part. I downloaded some plugins for Wordpress that were recommended in an ebook about blogging that I downloaded recently. I didn't download and install all the plugins mentioned in the book, just the ones I thought would be useful:
However, when I installed Sociable and tweetmeme, I found that Sociable already has a twitter link on it, allowing you to tweet the post (which is also the purpose of tweetmeme).

The only problem with the Sociable twitter link is that it posts the link to the page rather than a tinyURL version of the URL.

One of the things I wanted to do when I got time was to create my own URL shortening service, as I'm not a fan of relying on potentially unreliable external services. So this prompted me to get on with creating my own URL shortening service before finishing my photo website, since I'll need the service for the Sociable Twitter links on the blog.

What I wanted to do with my URL shortening service was to base the short URL upon the auto incremented value of the id column of the MySQL table I'd be storing the details in.

I was planning to do this using the MySQL CHAR() function, but when I tried it and had a look at the ASCII character table, I found that the characters I wanted to use (upper and lower case alpha numeric) weren't contiguous.

Also, I wouldn't be able to get the id of the row being inserted until after the row had been inserted. I did try using a trigger AFTER INSERT on the table in question, so as to update the row it had just inserted with the short URL. But I got a message that a trigger can't update the same table it is triggered on.

A way to get round this would be to have one table that is contains your auto increment column and long URL, then store your short URL in a separate table. You could then get your trigger to act after an insert to the first table, and to save the short URL in the 2nd table.

Instead I opted to just try generating a random short URL. The problem with this method is that you the short URL may already exist. Since you make the short URL column UNIQUE, it means you must keep generating new random short URLs and trying to insert them until you generate one that is unique and the insert succeeds.

Of course, this is not a problem until your table contains lots of records. I was basing my URL shortening service on a Sitepoint tutorial. I'm not quite sure how you work out how many unique URL combinations you can get given a number of possible characters and string length, but using the same logic as they used in that sitepoint article (number of possible characters to the power string length), I get 625 = 916,132,832 combinations for a 5 character length string composed of upper and lower case alpha numeric characters.

So, you can probably tell that it will likely be a long time until generating unique URLs becomes a problem. You would have to have 458,066,416 unique short URLs before generating a unique URL on the first go becomes a one in two chance.

I did have a few problems writing my function, which took me quite a long time to figure out. The first was that I was getting an error in phpMyAdmin when I tried to create the function:
There seems to be an error in your SQL query. The MySQL server error output below, if there is any, may also help you in diagnosing the problem

ERROR: Unknown Punctuation String @ 33
STR: //
SQL: DROP FUNCTION IF EXISTS makesURL//# MySQL returned an empty result set (i.e. zero rows).
Eventually I figured out the problem - I had a WHILE loop, but I had written it
WHILE condition
//do stuff
END WHILE;
When it should be
WHILE condition DO
//do stuff
END WHILE;


After figuring that out, I had a problem in that my function to generate a random string wasn't working, it would always return NULL instead of a random string. The function looked like this
DROP FUNCTION IF EXISTS randStr//
CREATE FUNCTION randStr(length INT)
RETURNS CHAR(255) DETERMINISTIC
BEGIN
DECLARE chars CHAR(62);
DECLARE str CHAR(255);
DECLARE i INT DEFAULT 0;
SET chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
WHILE i < length DO
SET str = CONCAT(
str, SUBSTRING(
chars, FLOOR(
1 + (RAND() * 62)
), 1
)
);
SET i = i+1;
END WHILE;
RETURN str;
END
The reason why it wasn't working, and the fix is pretty obvious, but it took me ages to realise what the problem was.

The problem is that str is NULL, so when you CONCAT with anything, you will always get NULL. So the solution is simply to set str to an empty string before you start the WHILE loop: SET str = '';

I also googled for creating a random string in MySQL, and most of the solutions suggested substringing an md5 hash of a random number. Unfortunately, this isn't as good as the proper random string method for the number of possible unique combinations, as the characters are always lower case. Using a 5 character string, an md5 implementation would give 60,466,176 unique combinations compared to 916,132,832 unique combinations (that's 855,666,656 more) for a method that includes uppercase characters.

One of the threads also had a different suggestion, so I tried that, both in one function and also in separate functions as they suggested in that thread.

Here follows the code for my own MySQL random string generating function (randStr), an all-in-one function using the method suggested in the thread mentioned above (randStr2), and lastly the separate random character generating (generateAlpha) and random string generating (randStr3) functions suggested in the thread mentioned above.
DROP FUNCTION IF EXISTS randStr//
CREATE FUNCTION randStr(length INT)
RETURNS CHAR(255) NOT DETERMINISTIC
BEGIN
DECLARE chars CHAR(62);
DECLARE str CHAR(255);
DECLARE i INT DEFAULT 0;
SET chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
SET str = '';
WHILE i < length DO
SET str = CONCAT(
str, SUBSTRING(
chars, FLOOR(
1 + (RAND() * 62)
), 1
)
);
SET i = i+1;
END WHILE;
RETURN str;
END//

DROP FUNCTION IF EXISTS randStr2//
CREATE FUNCTION randStr2(length INT)
RETURNS CHAR(255) NOT DETERMINISTIC
BEGIN
DECLARE str CHAR(255);
DECLARE i INT DEFAULT 0;
SET str = '';
WHILE i < length DO
SET str = CONCAT(
str, ELT(FLOOR(1 + (RAND() * 62)), 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
0,1,2,3,4,5,6,7,8,9)
);
SET i = i+1;
END WHILE;
RETURN str;
END//

CREATE FUNCTION generateAlpha()
RETURNS CHAR(1) NOT DETERMINISTIC
RETURN ELT(FLOOR(1 + (RAND() * 62)), 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
0,1,2,3,4,5,6,7,8,9);//

CREATE FUNCTION randStr3(length INT)
RETURNS CHAR(255) NOT DETERMINISTIC
BEGIN
DECLARE str CHAR(255);
DECLARE i INT DEFAULT 0;
SET str = '';
WHILE i < length DO
SET str = CONCAT(str, generateAlpha());
SET i = i+1;
END WHILE;
RETURN str;
END//

And the benchmarks for each of these functions (I ran each benchmark a few times and took an average of the time taken for each one):
SELECT BENCHMARK(10000, (SELECT randStr(5))); #0.60s

SELECT BENCHMARK(10000, (SELECT randStr2(5))); #0.74s

SELECT BENCHMARK(10000, (SELECT randStr3(5))); #0.95s

SELECT BENCHMARK(10000, (SELECT SUBSTRING(MD5(RAND()), 5))); #0.08s

As you can see, the md5 hash is 750% faster than the next fastest method, and running the generateAlpha() function in a loop inside another function is slower than just running the same code as part of the loop.

So I wrote my function to generate the short URL like so:
DROP FUNCTION IF EXISTS makesURL//
CREATE FUNCTION makesURL (lURL CHAR(255))
RETURNS CHAR(5) NOT DETERMINISTIC
BEGIN
DECLARE sURL CHAR(5);
WHILE sURL IS NULL DO

#First check if the URL we are trying to shorten has already been shortened
SET sURL = (SELECT shortURL.sURL FROM shortURL WHERE shortURL.lURL = lURL LIMIT 1);
IF sURL IS NOT NULL THEN
RETURN sURL;
END IF;

#Otherwise try and insert it
SET @sURL = randStr(5);
INSERT INTO shortURL VALUES(NULL, @sURL, lURL);
IF (SELECT LAST_INSERT_ID()) THEN
RETURN @sURL;
END IF;
END WHILE;
END
If you're wondering why I try and retrieve the sURL using the lURL inside the loop, the reason is this: It is possible that a record with the lURL may not exist when first looking for it, but between the time of looking for it and trying to insert the record, someone else may add a record for the same lURL to the table. If you didn't check for lURL inside the loop, and this happened, then your function would get caught in an endless loop since the record insert would never occur due to the unique key constraint on the lURL column.

This function has a problem though - if you try and insert a record with an lURL or sURL that already exists, an error will be thrown, and the function will exit, rather than the loop restarting.

To fix this you need to add the following line:
DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' SET @x2 = LAST_INSERT_ID(NULL);
This ensures that the function won't exit if a value already exists, and also that LAST_INSERT_ID() will be 0. Since I am checking the value of LAST_INSERT_ID() to see if the INSERT was successful or not, it needs setting to 0 if the INSERT was not successful, as otherwise if any successful INSERT operations had previously occurred on the same connection, LAST_INSERT_ID() would be set that INSERT operations id.

So, after a few more code changes, my shortURL generating function looks like this:
CREATE FUNCTION makesURL (lURL CHAR(255))
RETURNS CHAR(5) NOT DETERMINISTIC
BEGIN
DECLARE sURL CHAR(5);
DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' BEGIN SET sURL = NULL; END;
LOOP

#First check if the URL we are trying to shorten has already been shortened
SET sURL = (SELECT shortURL.sURL FROM shortURL WHERE shortURL.lURL = lURL LIMIT 1);
IF sURL IS NOT NULL THEN
RETURN sURL;
END IF;

#Otherwise try and insert it
SET sURL = randStr(5);
INSERT INTO shortURL VALUES(NULL, sURL, lURL);
IF sURL IS NOT NULL THEN
RETURN sURL;
END IF;
END LOOP;
END


I spent the evening trying to work out how to write a function that would get a string based on a number (for generating a url based on an auto increment number rather than a random number). I came up with the following:
DROP FUNCTION IF EXISTS strFromNum//
CREATE FUNCTION strFromNum(num INT)
RETURNS CHAR(5) NOT DETERMINISTIC
BEGIN
DECLARE chars CHAR(62);
DECLARE str CHAR(5);
DECLARE i INT DEFAULT 0;
DECLARE j INT;
SET chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
SET str = '';
WHILE num > 0 DO
SET j = num%62 +1;
SET str = CONCAT(str, SUBSTRING(chars, j, 1));
SET num = FLOOR( num/62 );
END WHILE;
RETURN str;
END//
I then spent the rest of the evening trying to make it so that it would be 0 for 62, rather than 01, as it seemed to me that the string returned should be in single digits until after the number passed was over 62. But no matter how I changed the function, I couldn't get it to generate 0 for 62 and still work properly with other numbers.

About 10.30pm I remembered that HexCode works in the same way, so I decided to see if I could find a guide on converting decimal to hex.

After reading that and changing my code, it's the same as the function above except that I was concatenating the string the wrong way round - the new value each time the loop runs through should be appended to the start of the string rather than the end.

I'm kind of glad that I had nearly got the function right on my first try, but also annoyed that I thought it wasn't right and spent hours trying to fix it.

My string from number hex code style function now looks like this, which allows 0-61 (so 62 numbers) before it will go on to two digits. If you wanted to get the full number of possible values using this in combination with an auto increment column, you'd need to feed it the auto increment value minus 1.
DROP FUNCTION IF EXISTS strFromNum//
CREATE FUNCTION strFromNum(num INT)
RETURNS CHAR(5) NOT DETERMINISTIC
BEGIN
DECLARE chars CHAR(62);
DECLARE str CHAR(5);
DECLARE i INT DEFAULT 0;
DECLARE j INT;
SET chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
SET str = '';
REPEAT
SET j = num%62 +1;
SET str = CONCAT(SUBSTRING(chars, j, 1), str);
SET num = FLOOR( num/62 );
UNTIL num = 0;
END REPEAT
RETURN str;
END//

I'll have to carry on looking at this and hopefully get my short URL service up and running tomorrow.

The weather today was overcast all day. It started foggy, and the fog gradually cleared through the day. It also rained a bit in the morning. The cloud cover was still thick at sunset, so you couldn't see any sign of the sun.

Food
Breakfast: Bowl of Asda Golden Balls Cereal; Cup o' tea.
Lunch: Sliced pickled beetroot sandwich; 2x Ibuprofen; A few Japanese Doritos (can't read what flavour they are, but they have a picture of a Taco or sumat on the front); Clementine; Banana; Small slice of home-made fruit cake; Rocky; cup o' tea.
Dinner: Chicken and vegetable curry; Rice; sultanas. Pudding was 2x home-made doughnuts with custard. Coffee; 2x pieces of Sainsbury's Caramel Chocolate.

No comments: