I’ve been qualitatively comparing GVM with k-means by producing some plots using R.
As I anticipated (having used both a number of times), the two perform about equally well. The difference is that GVM needs only a single pass through the dataset, a big advantage with datasets that don’t fit in memory, and has a robust upper bound on its execution time.
The first plot in each set contains the points prior to clustering. The last plot in each set was obtained using EM.
Waiting time between eruptions and the duration of the eruption for the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA.




Two 2D normal distributions crossing at the origin.
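For anyone wanting to reproduce a dataset of this shape, here is a minimal Python sketch. It is not the code used for the plots (those were made in R), and it assumes that "crossing at the origin" means two elongated Gaussians centred on (0, 0) and rotated by +45 and -45 degrees. The function name, the standard deviations and the sample count are all illustrative choices.

```python
import math
import random


def crossing_gaussians(n_per_cluster=500, long_sd=2.0, short_sd=0.2, seed=0):
    """Sample two elongated 2D normal distributions centred on the origin.

    Each cluster starts as an axis-aligned Gaussian (wide along one axis,
    narrow along the other) and is rotated by +45 or -45 degrees, so the
    two clusters together form an X shape. All parameters are assumptions
    made for illustration, not values taken from the original plots.

    Returns a list of (x, y, label) tuples with label 0 or 1.
    """
    rng = random.Random(seed)
    points = []
    for label, angle in enumerate((math.pi / 4, -math.pi / 4)):
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        for _ in range(n_per_cluster):
            u = rng.gauss(0.0, long_sd)   # coordinate along the long axis
            v = rng.gauss(0.0, short_sd)  # coordinate along the short axis
            # Rotate (u, v) by `angle` about the origin.
            x = u * cos_a - v * sin_a
            y = u * sin_a + v * cos_a
            points.append((x, y, label))
    return points
```

Dropping the label column gives the unlabelled input a clustering algorithm would see; the labels are only kept so the result can be compared with the generating distributions.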




Points uniformly distributed within three circles.
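A dataset like this one can be generated with a few lines of Python. Again, this is a sketch rather than the original R code: the circle centres, radii and point counts below are made up, since the post does not state them. The one detail worth noting is the square root on the radius, without which the points would bunch up near each centre instead of being uniform over the disc.

```python
import math
import random


def uniform_in_circles(circles, n_per_circle=500, seed=0):
    """Sample points uniformly from the interior of each circle.

    `circles` is a list of (centre_x, centre_y, radius) tuples.
    Returns a list of (x, y) points, n_per_circle for each circle,
    in the same order as `circles`.
    """
    rng = random.Random(seed)
    points = []
    for cx, cy, radius in circles:
        for _ in range(n_per_circle):
            # Area grows with r squared, so take the square root of a
            # uniform variate to get a uniform density over the disc.
            r = radius * math.sqrt(rng.random())
            theta = rng.uniform(0.0, 2.0 * math.pi)
            points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return points


# Hypothetical layout: three circles of different sizes.
CIRCLES = [(0.0, 0.0, 1.0), (3.0, 0.0, 1.0), (1.5, 2.5, 0.5)]
points = uniform_in_circles(CIRCLES)
```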



