Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Landmark graph size does not match n_landmark #60

Open
stanleyjs opened this issue Jan 3, 2022 · 0 comments
Open

Landmark graph size does not match n_landmark #60

stanleyjs opened this issue Jan 3, 2022 · 0 comments

Comments

@stanleyjs
Copy link
Collaborator

Hi,

I noticed that master fails test_landmark.test_landmark_knn_graph and test_landmark.test_landmark_knn_pygsp_graph.

The reason these tests fails is due to the shape of the landmark operator. The tests expect the landmark transitions operator to be (data.shape[0], n_landmark) and the landmark operator itself to be (n_landmark, n_landmark). I looked into why this is not true. It turns out that the current way of building clusters, MiniBatchKMeans, is not assigning to all n_landmark clusters, so you can have landmark graphs with <= n_landmark nodes. This happens in the tests. I verified by changing the random seed and checking len(np.unique(G.clusters)), and indeed the size of the cluster assignments changes based on the seed.

There's two ways to fix this bug.

  1. It's working as intended: n_landmarks is an upper bound, rather than an exact target. In this case, we just need to change the tests to reflect this.
  2. It's not working as intended: n_landmarks should be the exact size of the landmark graph. In this case, we need to either a) change MinibatchKMeans to an algorithm that assigns all clusters 100% of the time, or figure out which parameters of MinibatchKMeans ensures this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant