Tuesday 7 September 2010

Geoclustering

Today I was just working my geo clustering script. A lot of the time I was trying to work out how get the Haversine distance between 2 points so that the distance would be relative to the zoom level of the map.

In my efforts I realised that the mercator distance used in Introduction to Marker Clustering With Google Maps (which my script is based on) is suitable for clustering for google maps, but may not be suitable for google earth. This is because google earth uses a spherical projection instead of mercator. In mercator projection the sphere becomes stretched horizontally near the top.


$clusters_by_id = array(
array(array( 'lat' => 52.477631239722, 'lon' => -0.92094183),
array( 'lat' => 52.478271239722, 'lon' => -0.92094183), 15
),
array(array( 'lat' => 52.477631239722, 'lon' => -0.92094183),
array( 'lat' => 52.477641239722, 'lon' => -0.92094183), 21
),
array(array( 'lat' => 0, 'lon' => 180),
array( 'lat' => 0, 'lon' => -180), 21
),
array(array( 'lat' => 52.477631239722, 'lon' => -0.92094183),
array( 'lat' => 52.477631239722, 'lon' => -0.92095183), 21
),
array(array( 'lat' => 0, 'lon' => -0.92094183),
array( 'lat' => 0, 'lon' => -0.92095183), 21
),
array(array( 'lat' => 0, 'lon' => 180),
array( 'lat' => 0, 'lon' => 0), 10
)
);
foreach($clusters_by_id as $val){
$this->_zoom = $val[2];
$totalPixWidth = ((self::OFFSET * 2) >> (21 - $this->_zoom));
echo "

".$totalPixWidth;
echo '
pixelDistance = ' . $this->pixelDistance($val[0]['lat'], $val[0]['lon'], $val[1]['lat'], $val[1]['lon'], $this->_zoom);
$haversineDistance = $this->haversineDistance($val[0]['lat'], $val[0]['lon'], $val[1]['lat'], $val[1]['lon'], $this->_zoom);
echo "
haversineDistance = " . $haversineDistance;
$sphericalCosineDistance = $this->sphericalCosineDistance($val[0]['lat'], $val[0]['lon'], $val[1]['lat'], $val[1]['lon'], $this->_zoom);
echo "
sphericalCosineDistance = $sphericalCosineDistance";
}


Gives the following results:

8388608
pixelDistance = 24
haversineDistance = 14
sphericalCosineDistance = 854

536870912
pixelDistance = 24
haversineDistance = 14
sphericalCosineDistance = 854

536870912
pixelDistance = 0
haversineDistance = 0
sphericalCosineDistance = 158795416

536870912
pixelDistance = 15
haversineDistance = 9
sphericalCosineDistance = 511

536870912
pixelDistance = 15
haversineDistance = 14
sphericalCosineDistance = 854

262144
pixelDistance = 131072
haversineDistance = 131072
sphericalCosineDistance = 923030.0017659664154053

As you can see, near the equator there isn't a large difference between mercator (cylindrical) and spherical projections, but as you get further north (or south) the differences become larger.

So I may end up having two clustering scripts (and two sets of clustering tables in the database), one that uses mercator projection (for google maps), and the other that uses spherical projection (for google earth). But I intend to see how the points actually look in google earth and maps first before I go to the trouble of doubling up.

The spherical cosine distance seems to give a totally different value, not sure why. I got it from this page: Calculate distance, bearing and more between Latitude/Longitude points.

After making various different versions of my geo clustering script, I ran a very basic benchmark to see what difference in speed the changes made.
dk_geo_clusters_create Comparison

Each version is the same as the previous version, except with a few changes made (detailed in the Difference column). So the %age difference column is how faster one version is compared to the previous version, not how much faster it is compared to the base version.

I think the difference in speed between versions 2, 3, 4, and 5 can probably be discounted. Each version was only run 5 times, and the difference is small enough that I'd want to run each version more like 100 times to get a more accurate figure for those.

Once I have clustering working well, I may look again at those changes to see if they have any effect.

The conclusions I think that can be drawn is that:
  1. Performing fewer database queries doesn't make a lot of difference (v1 has 43 database queries while v2 has 3 queries, yet is only a tad faster).
  2. Feeding the clustered points back into the cluster function for each successive zoom level is much faster than clustering all points for each successive zoom level (v1.5 vs v2).
  3. Performing the geo coordinates to mercator pixel coordinates transformation when the records are retrieved from the database instead of when calculating the distance between two points (within the clustering function) has a massive benefit (v6 vs v5).
  4. Using Haversine distance (v7) is faster than v5, but slower than v6.


The last two items I haven't done yet, I need to get google earth and google maps working with the v6 version to check it is okay. And I need to see if using the mercator pixel distance based clustering is okay for use with Google Earth. Once I've done that, I'll work on the last two, and then possibly test the v2, 3, 4 and 5 changes again.

In the evening I also wrote this blog post and watched an episode each of Sentai Jetman and Power Rangers with L.

And a useful page I found yesterday, but forgot to mention is: InnoDB vs MyISAM vs Falcon benchmarks – part 1. I was wondering whether to use InnoDB or MyISAM for my cluster table, based on the results there I decided to go for InnoDB.

No comments: