Latest Tweets

 

Clustering Angles

I was recently asked via email for a suggestion of how to adapt my GVM algorithm to cluster angular values. The example that was given in the email makes the problem clear:

Correct clustering: { 1°, 359°, 355° }, { 85°, 89°, 96° }

Incorrect clustering: { 1°, 85°, 89°, 96° }, { 359°, 355° }

Unfortunately, directly adapting the algorithm so that it works with angular data is not possible because the underlying mathematics requires a metric space. This assumption is violated in the case of a circle because there are generally two distances between any two points.

To use GVM (or any other spatial clustering algorithm) to cluster angular data, my approach would be to take each angle, and use it to plot a point on the unit circle in two dimensions, ie. given the angle a, plot x = cos a, y = sin a, and then cluster those coordinates.

Angular clustering

Though the centroids of the resulting clusters may not lie on the circle (the algorithm cannot be constrained to ensure this), the centroids of tight clusters will fall close to it and using arctan (Math.atan2(y,x) in Java) you will be able to recover an angle for the cluster. This with the exception of the degenerate case where the centroid lies at the origin, but in any case you will need to discard clusters who’s centroids fall close to it since these are effectively ‘false’ clusters that arise because the algorithm is unaware of the true constraints of the problem.

It’s may be worth considering the presence of a cluster close to the origin as an indication that the number of clusters is too low and using it as a condition to re-cluster with a greater maximum number of clusters. Though I haven’t tried this myself, it might provide a good heuristic.

  1. tomgibara posted this
blog comments powered by Disqus