Permutation_test (1/4): add comments and docstring of the functions #111
base: main
Conversation
My comments for step_down_max_t need to be verified because they are based on AI.
Codecov Report: all modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main      #111       +/-  ##
============================================
+ Coverage   84.25%   100.00%   +15.74%
============================================
  Files          43        21       -22
  Lines        2452       889     -1563
============================================
- Hits         2066       889     -1177
+ Misses        386         0      -386
```

View full report in Codecov by Sentry.
The PR looks mostly good. I have 2 comments:
- I don't see why we no longer allow running CV
- the Westfall-Young "maxT" procedure seems overly complicated
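For reference, the Westfall-Young step-down maxT procedure under discussion can be written quite compactly. This is a minimal sketch only: the function name, array layout, and the omission of the usual `(1 + count) / (1 + n_permutations)` correction are assumptions for illustration, not the package's actual implementation.

```python
import numpy as np

def step_down_max_t(stats, perm_stats):
    """Sketch of step-down maxT adjusted p-values.

    stats : (n_features,) observed test statistics.
    perm_stats : (n_permutations, n_features) statistics under permuted y.
    """
    n_perm, n_feat = perm_stats.shape
    order = np.argsort(np.abs(stats))[::-1]  # most extreme feature first
    abs_perm = np.abs(perm_stats)[:, order]
    # successive maxima over the remaining (less extreme) features
    succ_max = np.maximum.accumulate(abs_perm[:, ::-1], axis=1)[:, ::-1]
    pvals = (succ_max >= np.abs(stats)[order]).mean(axis=0)
    # enforce monotonicity of the step-down procedure
    pvals = np.maximum.accumulate(pvals)
    out = np.empty(n_feat)
    out[order] = pvals
    return out
```

The step-down refinement only helps the less extreme features; the most extreme one is still compared against the full max-distribution, as in the single-step maxT.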
To re-open the discussion on the API: when I re-implemented the PermutationImportance class (which differs in that X columns are permuted there instead of y here), I found it convenient to have a class with .fit and .score rather than a function like here.
Would it be clearer to be consistent and have both as functions, or both as classes? I am open to returning to functions if you find that better.
Here again, a difference is that this function does not require a train/test split, which might be the criterion for using classes.
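For concreteness, the class-style API being contrasted could look roughly like this. This is a sketch only: the class name, the stored attributes, and the reliance on `coef_` are assumptions for illustration, not the actual implementation in either package.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.utils import check_random_state

class PermutationTestEstimator:
    """Sketch of a class-style API with .fit, mirroring PermutationImportance."""

    def __init__(self, estimator, n_permutations=100, random_state=None):
        self.estimator = estimator
        self.n_permutations = n_permutations
        self.random_state = random_state

    def fit(self, X, y):
        rng = check_random_state(self.random_state)
        # weights fitted on the original data
        self.weights_ = clone(self.estimator).fit(X, y).coef_
        # null distribution of weights under permutations of y
        self.null_weights_ = np.array([
            clone(self.estimator).fit(X, rng.permutation(y)).coef_
            for _ in range(self.n_permutations)
        ])
        return self

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X[:, 0] + 0.1 * rng.standard_normal(50)
pt = PermutationTestEstimator(Ridge(), n_permutations=10, random_state=0).fit(X, y)
```

The class form stores state (the null distribution) between calls, which is the main argument for it when there is a train/test split; without one, a function returning the same arrays is arguably just as clear.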
examples/plot_fmri_data_example.py (outdated)

```diff
@@ -152,18 +153,23 @@ def preprocess_haxby(subject=2, memory=None):
 SVR_permutation_test_inference = False
 if SVR_permutation_test_inference:
     # We computed the regularization parameter by CV (C = 0.1)
-    pval_corr_svr_perm_test, one_minus_pval_corr_svr_perm_test = permutation_test_cv(
-        X, y, n_permutations=50, C=0.1
+    estimator = LinearSVR(C=0.1)
```
C is the regularization parameter; it is not optimized via CV here. Or am I missing something?
I suggest using something like randomized search.
Suggested change:

```diff
-estimator = LinearSVR(C=0.1)
+estimator = RandomizedSearchCV(
+    LinearSVR(random_state=42),
+    param_distributions={"C": np.logspace(-3, 3, 10)},
+    n_iter=10,
+    n_jobs=5,
+    random_state=42,
+)
```
I didn't include optimisation by CV because the original example didn't have it.
Nevertheless, we need to be careful with optimisation because it will increase the running time of the examples.
Do you have an estimate of the compute time?
On my computer, the example takes 5m39s without CV and 7m16s with CV, so CV adds roughly 2 minutes of computation.
One solution could be to store the best value of the parameter and avoid refitting each time.
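The caching idea above could look like this: run the search once, then reuse the selected C without refitting the search. This is a sketch on synthetic data; the data shapes and parameter grid are illustrative assumptions, not the example's actual setup.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))
y = X[:, 0] + 0.1 * rng.standard_normal(60)

# Run the (expensive) search once ...
search = RandomizedSearchCV(
    LinearSVR(max_iter=10_000),
    param_distributions={"C": np.logspace(-3, 3, 10)},
    n_iter=5,
    cv=3,
    random_state=42,
).fit(X, y)
best_C = search.best_params_["C"]

# ... then reuse the stored value without refitting the search,
# e.g. inside a permutation loop.
estimator = LinearSVR(C=best_C, max_iter=10_000)
```

In a documentation example, `best_C` could then be hard-coded with a comment pointing to the search that produced it.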
Also, there's L153 `SVR_permutation_test_inference = False`, so is this even running in the CI?
If not, I suggest leaving the CV in to show this possibility to the user without increasing CI time.
For the time being, we should not change this if there is no explicit reason for that (e.g. significant reduction of documentation generation time).
My feeling is that this is too much time for a method that does not enjoy any theoretical guarantee. Out of curiosity, what do you get if you replace the SVR with a ridge regression?
This is already done in the example.
Please use a RidgeCV, so that there is no magic alpha parameter hanging around.
@bthirion Do we do the cross-validation for Ridge and linearSVR or only for Ridge?
Thx for this work. Minor details pending.
Force-pushed bd6debe to 0299315.
2 minor things, and then it's good to go!
```
estimator = Ridge()
pval_corr_ridge_perm_test, one_minus_pval_corr_ridge_perm_test = permutation_test(
# We computed the parameter from a cross validation (alpha = 0.0215)
estimator = Ridge(alpha=0.0215)
```
Please use a RidgeCV, so that there is no magic alpha parameter hanging around.
See comment above.
Sorry, I didn't get the point. How much more time does RidgeCV take?
If you want reasonable computation time, then we should simply get rid of the SVR (especially because we use a hard-coded C parameter, which is unclean).
- Cross-validation of LinearSVR: 56 s
- Instantiation of LinearSVR: >1 ms
- Permutation test with LinearSVR: 220 s
- Permutation test with RidgeCV: 81.6 s
- Permutation test with Ridge: 10.7 s
Then another possibility would be to call RidgeCV once to learn the hyperparameter and then plug it into the permutation test. Does this make the computation time more reasonable?
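That proposal could be sketched as follows, on synthetic data (shapes, grid, and permutation count are illustrative assumptions): learn alpha once on the un-permuted data, then keep it fixed across permutations.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 10))
y = X[:, 0] + rng.standard_normal(80)

# Learn alpha once on the un-permuted data ...
alpha = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y).alpha_

# ... then keep it fixed for every permutation, so the CV cost
# is paid once instead of once per permutation.
estimator = Ridge(alpha=alpha)
null_coefs = np.array([
    estimator.fit(X, rng.permutation(y)).coef_ for _ in range(20)
])
```

This trades a per-permutation CV (the RidgeCV timing above) for a single CV plus cheap Ridge fits, at the cost of using a possibly sub-optimal alpha on the permuted datasets.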
Not exactly. RidgeCV uses an arbitrary grid that may not contain the correct range of values.
When you use Ridge, changing the regularizer by a factor of 2 or so has limited impact; what you need is just to pick the right order of magnitude.
The default value of alpha may not be of the right order of magnitude (?)
Permuting y only has a big impact on the optimal alpha if the fraction of variance fit by the model is large (e.g. 90%): I guess that is not the case here?
As you started the example a while ago with an alpha close to 0.02, can you confirm using grid search that this is the value to pick?
For alpha equal to 1.0 (the default value), the algorithm converges for every permutation.
For alpha equal to 0.01 (the value from cross-validation on the data), the algorithm doesn't converge for some permutations.
I forget how I got the value of 0.02; I probably changed the range of values of the CV, but with this alpha I didn't get any warnings.
Looking in detail at the values of alpha chosen by RidgeCV, I find that CV on the data gives alpha = 0.1, while CV on the data with a permuted y gives alpha = 10.0. As I expected, shuffling y changes the optimal alpha; here the change spans two orders of magnitude.
What value do you want to pick to get the right order of magnitude?
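This effect is easy to reproduce on synthetic data with a strong signal. The sketch below is illustrative (shapes, noise level, and grid are assumptions, not values from the fMRI example):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = 3.0 * X[:, 0] + 0.3 * rng.standard_normal(100)  # strong signal

alphas = np.logspace(-3, 3, 13)
alpha_data = RidgeCV(alphas=alphas).fit(X, y).alpha_
alpha_perm = RidgeCV(alphas=alphas).fit(X, rng.permutation(y)).alpha_
# With a strong signal, permuting y typically pushes the optimal
# alpha upward: there is nothing left to fit, so maximal shrinkage wins.
```

This is why a single alpha learned on the original data is only a heuristic inside the permutation loop.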
OK, then the question becomes: does it make a true difference in terms of results to use Ridge vs RidgeCV?
If yes, I think that we should rely on RidgeCV.
Using Ridge or RidgeCV in this example gives the same result: there are no significant voxels.
Ha, ha. Then I'd go with default parameters, without CV, and add a comment stating that ideally you would use RidgeCV, but this is not done here for the sake of time.
I removed the function permutation_test_cv. This function was a permutation_test with cross-validation for finding the best parameter C for the LinearSVR. In this context, it could be used as a global surrogate based on LinearSVR, but I judged it too specific; the user can do it by hand for the moment.
I moved the function step_down_max_t to stat_tool because it is a function used for computing the p-values.
I modified the function permutation_test accordingly to return the weights and the distribution of weights obtained from permutations of y. I think this intermediate step is better adapted to a general API.
I updated the tests accordingly for the new function signatures.
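A minimal sketch of what such a function-style API could look like (the exact signature, parameter names, and reliance on `coef_` here are assumptions for illustration, not the merged code):

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge

def permutation_test(X, y, estimator, n_permutations=100, seed=0):
    """Sketch: return the fitted weights and the null distribution
    of weights obtained by refitting on permutations of y."""
    rng = np.random.default_rng(seed)
    weights = clone(estimator).fit(X, y).coef_
    null_weights = np.array([
        clone(estimator).fit(X, rng.permutation(y)).coef_
        for _ in range(n_permutations)
    ])
    return weights, null_weights

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 4))
y = rng.standard_normal(40)
weights, null_weights = permutation_test(X, y, Ridge(), n_permutations=15)
```

Returning the raw weights and their null distribution, rather than p-values, lets the caller plug in whichever correction (e.g. step-down maxT) they prefer.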