Update unsequa to fully support the latest version of SALib #886

Merged

30 commits merged into develop from feature/unsequa_salib_update on Jun 11, 2024

Conversation

@luseverin (Collaborator) commented May 30, 2024:

Changes proposed in this PR:

  • Update SALib sensitivity and sampling methods from newest version (SALib 1.4.7) #828
  • Allow for computation of relative and absolute delta impacts in CalcDeltaClimate

This PR fixes #828

PR Author Checklist

PR Reviewer Checklist

@luseverin requested review from simonameiler and chahank on May 30, 2024 at 12:04
@luseverin (Collaborator, PR author):

A few more comments on this PR:

  • Three new sensitivity methods required specific treatment to be included in unsequa: ff, hdmr, and delta.
  • The output of the ff and hdmr methods from SALib required refactoring to fit unsequa's _sens_df output format (see the sketch after this list). I had to remove a few variables from the hdmr output, as I could not find a way to refactor them into the _sens_df; these notably include all details of the surrogate model, contained in the Em (Emulator) variable. Additionally, all output variables from hdmr are refactored to 2D arrays, even the first-order sensitivity indices.
  • The delta method systematically failed with a matrix inversion error: "LinAlgError: Singular matrix detected. This may be due to the sample size (30000) being too small. If this is not the case, check Y values or raise an issue with the SALib team." I could not find the cause, so I removed delta from the compatible sensitivity methods.
  • I also have some doubts about the stability of the hdmr method, as it seems to yield different results between identical model runs. I accordingly loosened the tolerance of the assertEqual checks in the unit tests so that they should not fail.
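For illustration, a minimal sketch of the refactoring described in the second bullet, assuming the analyzer returns a dict mapping index names to per-parameter arrays; flatten_sens_output and the param/si/value layout are hypothetical names, not unsequa's exact _sens_df schema.

import numpy as np
import pandas as pd

def flatten_sens_output(sens_indices, param_labels):
    """Flatten a SALib-style result dict into one long-format frame."""
    records = []
    for si_name, values in sens_indices.items():
        # hdmr-style outputs come as 2D arrays; force 1D entries to 2D too
        values = np.atleast_2d(np.asarray(values))
        if values.shape[-1] != len(param_labels):
            continue  # drop entries that are not per-parameter (e.g. 'Em')
        for row in values:
            for param, val in zip(param_labels, row):
                records.append({"param": param, "si": si_name, "value": val})
    return pd.DataFrame.from_records(records)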

@@ -152,6 +154,7 @@ Changed:

- `geopandas` >=0.13 → >=0.14
- `pandas` >=1.5,<2.0 → >=2.1
- `salib` >=1.3.0 → >=1.4.7
(Member):

@emanuel-schmid: why would we indicate two different greater-than versions (as for geopandas)?

@emanuel-schmid (Collaborator) commented May 30, 2024:

This is about dependency version constraint changes. They start from the given minimal version of the last release (left hand side) and end at the given minimal version of this release (right hand side).
However: the whole version change section is eventually done at release time by a script I wrote. Don't bother, it will be overwritten!

Comment on lines 256 to 258
if param not in self.distr_dict:
    # dummy_0 param added to uniform_base_sample when using ff method;
    # need to ignore it?
    continue
(Member):

I think we should rather explicitly catch the case of dummy_ variables and ignore only these, instead of all possible params.

@luseverin (Collaborator, PR author):

Do we still need this line of code? We now raise a ValueError when the number of parameters is not a power of 2 and require the user to input their own dummy parameter beforehand. If the user inputs their own dummy parameter, the dummy parameters should be covered by df_samples[param] = df_samples[param].apply(self.distr_dict[param].ppf), correct?
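For illustration, a minimal sketch of the explicit catch suggested above, assuming the ff dummy parameters are named with a dummy_ prefix (as in dummy_0); apply_ppf is a hypothetical helper mirroring the snippet under discussion.

def apply_ppf(df_samples, distr_dict):
    """Map uniform base samples through each parameter's inverse CDF,
    skipping only the ff dummy parameters rather than any unknown param."""
    for param in df_samples.columns:
        if param.startswith("dummy_"):
            continue  # dummy_* columns are padding added for the ff sampler
        df_samples[param] = df_samples[param].apply(distr_dict[param].ppf)
    return df_samples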

@@ -500,10 +536,44 @@ def _calc_sens_df(method, problem_sa, sensitivity_kwargs, param_labels, X, unc_d
else:
    sens_indices = method.analyze(problem_sa, Y,
                                  **sensitivity_kwargs)
# refactor inconsistent SALib output
nparams = len(param_labels)
if method.__name__ == 'SALib.analyze.ff':
(Member):

Why not use the attribute sensitivity_method? This would be more stable, I think.

@luseverin (Collaborator, PR author):

Because it is not available from within _calc_sens_df, I think? But we can pass it as an argument of _calc_sens_df if you think it is worth it.

(Member):

Hmm, not sure. But maybe then just check for the trailing ff, because I vaguely remember that SALib played around with changing the module structure, such that it would not be SALib.analyze.ff but something like SALib.xxx.ff.

@luseverin (Collaborator, PR author):

OK, I changed it to check only whether the last characters of the method name string match the target method name, instead of comparing against the entire SALib.analyze.ff.
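For illustration, a sketch of the suffix check, assuming method is the SALib analyzer module so that method.__name__ is e.g. 'SALib.analyze.ff'; _method_suffix is a hypothetical helper.

import SALib.analyze.ff

def _method_suffix(method):
    """Return the last component of the analyzer module name, e.g. 'ff',
    so the dispatch survives a reshuffling of SALib's module structure."""
    return method.__name__.split(".")[-1]

assert _method_suffix(SALib.analyze.ff) == "ff"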

@luseverin (Collaborator, PR author), quoting the review:

> Great, thanks!
>
> I think there are a few small things to update, and maybe one general idea for the tests. Since the tests for the different methods repeat themselves, it would be better to define a function and then call it in a loop for all methods instead of copying large parts of the code.

OK, so I included your suggested changes and added a prototype of a generic testing function for the different sensitivity methods. I store all the parameters and expected test results in a dict, which I pass to the function that gets looped over. It is maybe not the most efficient way to do it, so please let me know if you have any suggestions for improvements.
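For illustration, a sketch of that dict-driven pattern; the configuration entries and helper names are hypothetical, not the PR's actual test data.

import unittest

# One entry per sensitivity method: analyzer kwargs plus the indices the
# analyzer is expected to report (illustrative values).
SENS_CONFIGS = {
    "sobol": {"kwargs": {"calc_second_order": False},
              "expected_si": ["S1", "S1_conf", "ST", "ST_conf"]},
    "morris": {"kwargs": {},
               "expected_si": ["mu", "mu_star", "sigma", "mu_star_conf"]},
}

class TestSensMethods(unittest.TestCase):
    def test_all_methods(self):
        for method, config in SENS_CONFIGS.items():
            # subTest keeps methods independent: one failure does not stop
            # the loop, and the report names the failing method
            with self.subTest(method=method):
                self._check_method(method, config)

    def _check_method(self, method, config):
        # run sampling + sensitivity analysis for `method` here and compare
        # the resulting sens_df against config["expected_si"] (elided)
        self.assertTrue(config["expected_si"])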

@chahank (Member) commented Jun 4, 2024:

Great job, we are almost there! What is missing:

  • Address all the new linter issues (and any older ones that you solve are bonus points!)
  • The method for the tests is going in the right direction. I would maybe propose to simplify the whole thing a bit; more on that directly in the code.

@luseverin (Collaborator, PR author):

OK, I tried to simplify the tests of the sensitivity methods as much as possible in the generic test functions, and removed the previous method-specific test functions. I wasn't sure about a few lines that appeared only in the test for the morris method, though:

for name, attr in unc_data.__dict__.items():
    if 'sens_df' in name:
        np.testing.assert_array_equal(
            attr.param.unique(),
            np.array(['x_exp', 'x_paa', 'x_mdd'])
        )
        np.testing.assert_array_equal(
            attr.si.unique(),
            np.array(['mu', 'mu_star', 'sigma', 'mu_star_conf'])
        )
        if 'eai' in name:
            self.assertEqual(
                attr.size,
                len(unc_data.param_labels) * 4 * (len(exp_unc.evaluate().gdf) + 3)
            )
        elif 'at_event' in name:
            self.assertEqual(
                attr.size,
                len(unc_data.param_labels) * 4 * (haz.size + 3)
            )
        else:
            self.assertEqual(
                len(attr),
                len(unc_data.param_labels) * 4
            )

I did not include these lines in the generic test function, as they did not seem absolutely necessary to me, but if you think they should be included, just let me know.

Additionally, I still need to fix the few new linter issues that popped up, but that should not be a problem. Otherwise, I think I addressed the rest of your comments.

Comment on lines 588 to 592
# Assume sens_first_order_dict is a dictionary where values are lists/arrays of varying lengths
# !for some reason this makes the plotting methods fail
# sens_first_order_df = pd.DataFrame({k: pd.Series(v, dtype=object)
#                                     for k, v in sens_first_order_dict.items()})

(Member):

Is this something that still needs to be addressed, or is it a left-over?

@luseverin (Collaborator, PR author):

It's a left-over, I cleaned it up.

@@ -36,6 +36,8 @@
from climada.hazard import Hazard
from climada.engine import ImpactCalc
from climada.engine.unsequa import InputVar, CalcImpact, UncOutput, CalcCostBenefit, CalcDeltaImpact
from climada.engine.unsequa.calc_base import LOGGER as ILOG
(Member):

Any particular reason for renaming the logger? Why not just use LOGGER?

@luseverin (Collaborator, PR author):

No, no reason; it was just blind copying of code I saw on Stack Overflow. I'll change that!
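For illustration, a minimal sketch of using the imported LOGGER directly with unittest's assertLogs, which is presumably what the copied snippet was doing; the test body is a stand-in, not the PR's actual test.

import unittest
from climada.engine.unsequa.calc_base import LOGGER

class TestLogs(unittest.TestCase):
    def test_warning_is_logged(self):
        # assertLogs captures records emitted on LOGGER inside the block
        with self.assertLogs(LOGGER, level="WARNING") as captured:
            LOGGER.warning("demo message")  # stand-in for the call under test
        self.assertIn("demo message", captured.output[0])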

@luseverin (Collaborator, PR author):

OK, I incorporated your last suggestions. If there are no further comments from your side, I suggest we merge this PR?

@chahank (Member) commented Jun 11, 2024:

I think it looks good!

Just FYI: you managed to remove 39 pylint warnings and add 16. Maybe you can take a last look at whether the 16 new ones are easy to avoid or not.

Else, this is ready to merge from my point of view.

@luseverin (Collaborator, PR author):

I have checked, and the warnings are mainly "too-many-locals", "too-complex", "invalid-name", etc. Most of them come from lines that I did not directly touch or that got included in the commits because VS Code removed trailing whitespace. For instance, plot_rp_uncertainty is too complex, and there are too many local variables in CalcDeltaClimate.uncertainty. It is not straightforward to me how these should be treated, so I would rather leave them as is for the moment, if that's fine.

@chahank (Member) commented Jun 11, 2024:

Thanks for checking. For the moment these are fine I think.

Feel free to merge, and thanks for the work!

@luseverin merged commit 9faf54b into develop on Jun 11, 2024. 12 checks passed.
@luseverin deleted the feature/unsequa_salib_update branch on June 11, 2024 at 14:12.