Fix logic in _get_algorithm_definitions to avoid skipping algorithm definitions #498

Merged
merged 1 commit into from
Mar 19, 2024

Conversation

alexklibisz
Contributor

Maybe I'm missing something, but it seems like the logic in _get_algorithm_definitions leads to incorrectly skipping algorithm definitions, which I've attempted to fix here.

For example, elastiknn has definitions for the "point types" any and euclidean: https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/algorithms/elastiknn/config.yml

But if I run python run.py --algorithm elastiknn-l2lsh --dataset random-xs-20-euclidean --run-disabled --timeout 30 --local --force --runs 1, I get the "Nothing to run" exception. That doesn't make sense IMO: elastiknn has definitions for the euclidean point type, so there is something to run.

It seems that the non-any point type is skipped because of the logic in _get_algorithm_definitions. If an algorithm has definitions for any, they take precedence, and the definitions for a specific point type (here, euclidean) are never reached. We can fix this by changing the logic so that it accumulates all matching point types, rather than just taking the any type and skipping the rest. In other words, we change the elif to a second if.
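
To illustrate the difference between the elif and the second if, here is a minimal sketch. It is not the actual code from ann-benchmarks: the function names select_before and select_after, the CONFIG dictionary, and its flat layout (point type mapped to named run groups) are all hypothetical simplifications made up for this example.

```python
from typing import Any, Dict

# Hypothetical, simplified config: point type -> run-group name -> settings.
# The real config.yml files have a richer structure; this is only a sketch.
CONFIG: Dict[str, Dict[str, Any]] = {
    "any": {"fallback-group": {"args": [1]}},
    "euclidean": {"l2lsh-group": {"args": [2]}},
}


def select_before(config: Dict[str, Dict[str, Any]], point_type: str) -> Dict[str, Any]:
    """Behaviour as described in this PR: 'any' wins, the specific type is skipped."""
    selected: Dict[str, Any] = {}
    if "any" in config:
        selected.update(config["any"])
    elif point_type in config:  # never reached when 'any' is present
        selected.update(config[point_type])
    return selected


def select_after(config: Dict[str, Dict[str, Any]], point_type: str) -> Dict[str, Any]:
    """Proposed behaviour: accumulate 'any' and the specific point type."""
    selected: Dict[str, Any] = {}
    if "any" in config:
        selected.update(config["any"])
    if point_type in config:  # second 'if' instead of 'elif'
        selected.update(config[point_type])
    return selected


if __name__ == "__main__":
    print(sorted(select_before(CONFIG, "euclidean")))
    print(sorted(select_after(CONFIG, "euclidean")))
```

With the elif, the euclidean group in this toy config is dropped whenever an any section exists. With two independent if statements, both sections contribute.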

@maumueller
Collaborator

Interesting, this seems to have been broken for a long time, and it meant that we excluded many of the implementations of the nmslib library. Thanks for the fix, @alexklibisz!

I sampled a few implementations and only pynndescent has a somewhat strange structure for euclidean/angular/any. @lmcinnes Could you check if the any entry of https://github.com/erikbern/ann-benchmarks/blob/c4155055ee45a0dc46ee5bf1a90f6fbde927c50d/ann_benchmarks/algorithms/pynndescent/config.yml is useful?

@maumueller maumueller merged commit df8083a into erikbern:main Mar 19, 2024
34 of 41 checks passed
@lmcinnes
Contributor

I think the any case is a "fallback" option in case the other matches didn't work out, so it works as an "I don't know what else to do; try this" approach. If that is not how any is being used, then perhaps we should just remove the any option for pynndescent? How is any intended to work?

@maumueller
Collaborator

Thanks for the quick reply! It seems that any took precedence over all other configurations, so all your pynndescent runs may have been using these parameter settings.

With the fix from @alexklibisz, the any configurations will now be merged with the euclidean or angular configurations, depending on the dataset.

@alexklibisz alexklibisz deleted the fix-get-algorithm-definitions branch March 19, 2024 14:24
@lmcinnes
Contributor

I think removing the any option for pynndescent is probably the best option then. Those aren't really optimal parameters for anything, just a reasonable in-between choice to cover possibilities. Best to rely on the specific values for the individual metric types.

maumueller added a commit that referenced this pull request Apr 2, 2024
3 participants