Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format of Search CV parameters #1231

Open
anuprulez opened this issue Aug 8, 2022 · 12 comments
Open

Format of Search CV parameters #1231

anuprulez opened this issue Aug 8, 2022 · 12 comments

Comments

@anuprulez
Copy link
Contributor

anuprulez commented Aug 8, 2022

As reported by one of our students, the tabular dataset returned by the search CV tool is populating the list differently as documented earlier in many ML tutorials. Do we know what needs to be changed in the tool parameter selection to make the search CV tool run? I remember we used to get a list of all hyperparameters and then we could select those whose values need to be optimized.

search_cv1

I tried a few ways to make it work - by changing tool's version, using different options in Choose a parameter name (with current value) but the jobs are failing. Do you know what should be the correct setting, then I will update the tutorials accordingly. I am not sure if this is a bug or it has been formatted like this (above image).

The hyperparameters were listed previously as:

searchcv2.

link to the history: https://usegalaxy.eu/u/kumara/h/age-prediction

ping @qiagu

Thanks a lot!

@qiagu
Copy link
Contributor

qiagu commented Aug 8, 2022

@anuprulez Thanks for exposing the issue. My intuition is that changes of Galaxy's tool infrastructure caused the issue. I will look deep into it soon.

@qiagu
Copy link
Contributor

qiagu commented Aug 8, 2022

@anuprulez Unfortunately, I couldn't reproduce the issue shown in the shared history. I have correct list of parameters in the dropdown in my local instance. One thing worth mentioning: use only the list in the Search list, having parameter name in the front, like selectkbest__k: [5880, 5890, 5895, 5900], would lead to failure.

@qiagu
Copy link
Contributor

qiagu commented Aug 8, 2022

As wrapped here https://github.com/bgruening/galaxytools/blob/master/tools/sklearn/search_model_validation.xml#L11-L17, lines starting with @ should be selected to be search parameters, i.e., showing as the dropdown options. The EU instance seems to do the opposite, excluding lines having @. I have no idea what causes that, since I can't reproduce the issue elsewhere.

@anuprulez
Copy link
Contributor Author

anuprulez commented Aug 10, 2022

@qiagu thanks a lot for looking into it. I found a fix here: #1232
Without this change, most of the tool tests (that are expected to pass) were failing in my local instance. Two changes worked: either removing the entire attribute startswith="@" or adding quotes for @: startswith=" '@' ". I chose the latter option. I hope it works on the EU server. It seems to be related to parsing of special symbols I guess but not 100% sure.

@anuprulez
Copy link
Contributor Author

Adding quotes to @ symbol did not work, unfortunately. But, removing it did and the tests are now passing. Removing it will add a few extra params in the list of estimator params of which the most we already show using @ as a startwith filter. It looks like there is some issue with parsing of this special symbol on EU server in the tool form, not 100% sure though.

@anuprulez
Copy link
Contributor Author

Same issue on Galaxy Main as well. We should keep this issue open until it is fixed

params_galaxy_main

main_diff

@qiagu
Copy link
Contributor

qiagu commented Aug 12, 2022

Ha, the startswith issue got fixed 29 days ago.
See: galaxyproject/galaxy@d642ce3

@bgruening
Copy link
Owner

Do you want to ask for a backport?

@qiagu
Copy link
Contributor

qiagu commented Aug 12, 2022

My bad. I'd correct myself. The issue seems to be caused by the commit 29 days ago. We may need to revert the change.
galaxyproject/galaxy@d642ce3#diff-7a49a06712755de974dbcb43a317151e9ceee2af1e58914e28f2f4c7de9bdf9aL648-R648

@bgruening
Copy link
Owner

I guess you need to open an issue then.

ping @bernt-matthias we well.

Thanks @qiagu for finding this.

@bernt-matthias
Copy link
Collaborator

bernt-matthias commented Aug 12, 2022

As far as I see this repo is using 21.05 for testing PRs. So it should not be affected by galaxyproject/galaxy#14332 which was for 22.01, or?

Anyway, galaxyproject/galaxy#14332 brought the code in line with the docs. In the Galaxy tool xml docs you will find: startswith: Ignore lines starting with the given string. Unfortunately that was never what was implemented in Galaxy. So I changed the code. As alternative we might revert the code change (ie fix) the docs.

Edit: Actually the docs were wrong (just added approx a year ago .. but the code is approx 15 years old)

@bernt-matthias
Copy link
Collaborator

Will be fixed in galaxyproject/galaxy#14440

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants