Landmark graph size does not match n_landmark #60

stanleyjs · 2022-01-03T17:33:02Z

Hi,

I noticed that master fails test_landmark.test_landmark_knn_graph and test_landmark.test_landmark_knn_pygsp_graph.

These tests fail because of the shape of the landmark operators. The tests expect the landmark transition operator to have shape (data.shape[0], n_landmark) and the landmark operator itself to have shape (n_landmark, n_landmark). I looked into why this does not hold. It turns out that the current clustering method, MiniBatchKMeans, does not always assign samples to all n_landmark clusters, so a landmark graph can end up with fewer than n_landmark nodes. This happens in the tests. I verified this by changing the random seed and checking len(np.unique(G.clusters)): the number of distinct cluster labels changes with the seed.
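The diagnostic check can be sketched in a few lines of numpy. The helper name landmark_shape_report is hypothetical. It takes any 1-D array of integer cluster labels, such as G.clusters, and reports which of the n_landmark labels are empty. The reported shapes assume that empty clusters are simply dropped, i.e. that n_landmark is treated as an upper bound.

```python
import numpy as np

def landmark_shape_report(clusters, n_landmark):
    """Hypothetical helper: compare non-empty clusters with n_landmark.

    The shapes returned assume empty clusters are dropped, so the landmark
    operators are sized by the number of distinct labels actually used.
    """
    clusters = np.asarray(clusters)
    used = np.unique(clusters)
    # Labels in range(n_landmark) that no sample was assigned to.
    empty = np.setdiff1d(np.arange(n_landmark), used)
    n_actual = used.size
    return {
        "n_actual": n_actual,
        "empty_labels": empty.tolist(),
        "transitions_shape": (clusters.size, n_actual),
        "landmark_op_shape": (n_actual, n_actual),
    }

# Toy labels: 8 samples, 5 requested landmarks, but label 3 is never used.
labels = np.array([0, 0, 1, 2, 2, 4, 4, 1])
report = landmark_shape_report(labels, n_landmark=5)
print(report)
# {'n_actual': 4, 'empty_labels': [3], 'transitions_shape': (8, 4), 'landmark_op_shape': (4, 4)}
```

The next step would be to run the same check on G.clusters across several seeds. That shows whether the shape mismatch comes from empty clusters rather than from how the operators are built.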

There are two ways to fix this bug, depending on the intended behavior:

1. It's working as intended: n_landmark is an upper bound rather than an exact target. In this case, we just need to change the tests to reflect this.
2. It's not working as intended: n_landmark should be the exact size of the landmark graph. In this case, we need to either a) replace MiniBatchKMeans with an algorithm that assigns all clusters 100% of the time, or b) figure out which MiniBatchKMeans parameters ensure this.
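For option 2, one possible repair is to post-process the cluster labels so that all n_landmark clusters are non-empty. For each empty label, the sketch moves the sample farthest from its current cluster mean, taking it only from a cluster with at least two members. This is a swapped-in heuristic, not something MiniBatchKMeans or graphtools is known to do. The function name fill_empty_clusters is hypothetical. It assumes the labels are integers in range(n_landmark) and that there are at least n_landmark samples.

```python
import numpy as np

def fill_empty_clusters(X, labels, n_landmark):
    """Hypothetical post-processing: make all n_landmark labels non-empty.

    For each empty label, move the sample farthest from its current cluster
    mean, taken only from clusters with 2+ members so no new empty cluster is
    created. Assumes labels are ints in range(n_landmark).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).copy()
    if X.shape[0] < n_landmark:
        raise ValueError("need at least n_landmark samples")
    for empty in range(n_landmark):
        if np.any(labels == empty):
            continue
        counts = np.bincount(labels, minlength=n_landmark)
        # Mean of each currently non-empty cluster.
        centroids = np.zeros((n_landmark, X.shape[1]))
        for k in np.flatnonzero(counts):
            centroids[k] = X[labels == k].mean(axis=0)
        # Distance of each sample to the mean of its own cluster.
        dist = np.linalg.norm(X - centroids[labels], axis=1)
        # Samples in singleton clusters must not be moved.
        dist[counts[labels] < 2] = -np.inf
        labels[np.argmax(dist)] = empty
    return labels

X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [9.0]])
labels = np.array([0, 0, 0, 1, 1, 1])  # label 2 is empty
fixed = fill_empty_clusters(X, labels, n_landmark=3)
print(fixed)  # [0 0 0 1 1 2]
```

Suppose a label is empty and there are at least n_landmark samples. Then fewer than n_landmark labels cover more than n_landmark - 1 samples, so some cluster has two or more members. Each pass can therefore fill one label without emptying another. The trade-off is that each moved point becomes a singleton cluster, which may not make a good landmark.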


stanleyjs mentioned this issue Jan 3, 2022

Patches: tasklogger log_x, randomized_svd arguments, deprecated graph_shortest_path #62 (Open)
