You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I noticed that master fails test_landmark.test_landmark_knn_graph and test_landmark.test_landmark_knn_pygsp_graph.
The reason these tests fails is due to the shape of the landmark operator. The tests expect the landmark transitions operator to be (data.shape[0], n_landmark) and the landmark operator itself to be (n_landmark, n_landmark). I looked into why this is not true. It turns out that the current way of building clusters, MiniBatchKMeans, is not assigning to all n_landmark clusters, so you can have landmark graphs with <= n_landmark nodes. This happens in the tests. I verified by changing the random seed and checking len(np.unique(G.clusters)), and indeed the size of the cluster assignments changes based on the seed.
There's two ways to fix this bug.
It's working as intended: n_landmarks is an upper bound, rather than an exact target. In this case, we just need to change the tests to reflect this.
It's not working as intended: n_landmarks should be the exact size of the landmark graph. In this case, we need to either a) change MinibatchKMeans to an algorithm that assigns all clusters 100% of the time, or figure out which parameters of MinibatchKMeans ensures this.
The text was updated successfully, but these errors were encountered:
Hi,
I noticed that master fails
test_landmark.test_landmark_knn_graph
andtest_landmark.test_landmark_knn_pygsp_graph
.The reason these tests fails is due to the shape of the landmark operator. The tests expect the landmark transitions operator to be
(data.shape[0], n_landmark)
and the landmark operator itself to be(n_landmark, n_landmark)
. I looked into why this is not true. It turns out that the current way of building clusters, MiniBatchKMeans, is not assigning to alln_landmark
clusters, so you can have landmark graphs with <=n_landmark
nodes. This happens in the tests. I verified by changing the random seed and checkinglen(np.unique(G.clusters))
, and indeed the size of the cluster assignments changes based on the seed.There's two ways to fix this bug.
n_landmarks
is an upper bound, rather than an exact target. In this case, we just need to change the tests to reflect this.n_landmarks
should be the exact size of the landmark graph. In this case, we need to either a) changeMinibatchKMeans
to an algorithm that assigns all clusters 100% of the time, or figure out which parameters ofMinibatchKMeans
ensures this.The text was updated successfully, but these errors were encountered: