Update unsequa to fully support the latest version of SALib #886

Merged

30 commits merged into develop from feature/unsequa_salib_update on Jun 11, 2024

Conversation

@luseverin (Collaborator) commented May 30, 2024:

Changes proposed in this PR:

  • Update SALib sensitivity and sampling methods from newest version (SALib 1.4.7) #828
  • Allow for computation of relative and absolute delta impacts in CalcDeltaClimate

This PR fixes #828

PR Author Checklist

PR Reviewer Checklist

@luseverin requested review from simonameiler and chahank on May 30, 2024 at 12:04
@luseverin (Collaborator, PR author):

A few more comments on this PR:

  • Three new sensitivity methods required specific treatment to be included in unsequa: ff, hdmr, and delta.
  • The output of the ff and hdmr methods from SALib required refactoring to fit unsequa's _sens_df output format (see the sketch after this list). I had to remove a few variables from the hdmr output, as I could not find a way to refactor them into the _sens_df; these notably include all details of the surrogate model, contained in the Em (Emulator) variable. Additionally, all output variables from hdmr are refactored to 2D arrays, even the first-order sensitivity indices.
  • The delta method systematically failed with a matrix inversion error: "LinAlgError: Singular matrix detected. This may be due to the sample size (30000) being too small. If this is not the case, check Y values or raise an issue with the SALib team." I could not find the cause, so I removed delta from the compatible sensitivity methods.
  • I also have some doubts about the stability of the hdmr method, as it seems to yield different results between identical model runs. I accordingly loosened the tolerance of the assertEqual checks in the unit tests so that they should not fail.
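For illustration, a minimal sketch of the refactoring described in the second bullet, assuming the analyzer returns a dict mapping index names to per-parameter arrays; flatten_sens_output and the param/si/value layout are hypothetical names, not unsequa's exact _sens_df schema.

import numpy as np
import pandas as pd

def flatten_sens_output(sens_indices, param_labels):
    """Flatten a SALib-style result dict into one long-format frame."""
    records = []
    for si_name, values in sens_indices.items():
        # hdmr-style outputs come as 2D arrays; force 1D entries to 2D too
        values = np.atleast_2d(np.asarray(values))
        if values.shape[-1] != len(param_labels):
            continue  # drop entries that are not per-parameter (e.g. 'Em')
        for row in values:
            for param, val in zip(param_labels, row):
                records.append({"param": param, "si": si_name, "value": val})
    return pd.DataFrame.from_records(records)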

@@ -152,6 +154,7 @@ Changed:

- `geopandas` >=0.13 → >=0.14
- `pandas` >=1.5,<2.0 → >=2.1
- `salib` >=1.3.0 → >=1.4.7
(Member):

@emanuel-schmid: why would we indicate two different greater-than versions (as for geopandas)?

@emanuel-schmid (Collaborator) commented May 30, 2024:

This is about dependency version constraint changes. They start from the given minimal version of the last release (left hand side) and end at the given minimal version of this release (right hand side).
However: the whole version change section is eventually done at release time by a script I wrote. Don't bother, it will be overwritten!

Comment on lines 256 to 258
if param not in self.distr_dict:
    # dummy_0 param added to uniform_base_sample when using ff method;
    # need to ignore it?
    continue
(Member):

I think we should rather explicitly catch the case of dummy_ variables and ignore only these, instead of all possible params.

@luseverin (Collaborator, PR author):

Do we still need this line of code? We now raise a ValueError when the number of parameters is not a power of 2 and require the user to input their own dummy parameter beforehand. If the user inputs their own dummy parameter, the dummy parameters should be covered by df_samples[param] = df_samples[param].apply(self.distr_dict[param].ppf), correct?
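For illustration, a minimal sketch of the explicit catch suggested above, assuming the ff dummy parameters are named with a dummy_ prefix (as in dummy_0); apply_ppf is a hypothetical helper mirroring the snippet under discussion.

def apply_ppf(df_samples, distr_dict):
    """Map uniform base samples through each parameter's inverse CDF,
    skipping only the ff dummy parameters rather than any unknown param."""
    for param in df_samples.columns:
        if param.startswith("dummy_"):
            continue  # dummy_* columns are padding added for the ff sampler
        df_samples[param] = df_samples[param].apply(distr_dict[param].ppf)
    return df_samples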

@@ -500,10 +536,44 @@ def _calc_sens_df(method, problem_sa, sensitivity_kwargs, param_labels, X, unc_d
else:
    sens_indices = method.analyze(problem_sa, Y,
                                  **sensitivity_kwargs)
# refactor inconsistent SALib output
nparams = len(param_labels)
if method.__name__ == 'SALib.analyze.ff':
(Member):

Why not use the attribute sensitivity_method? This would be more stable, I think.

@luseverin (Collaborator, PR author):

Because it is not available from within _calc_sens_df, I think? But we can pass it as an argument of _calc_sens_df if you think it is worth it.

(Member):

Hmm, not sure. But maybe then just check for the trailing ff, because I vaguely remember that SALib played around with changing the module structure, such that it would not be SALib.analyze.ff but something like SALib.xxx.ff.

@luseverin (Collaborator, PR author):

OK, I changed it to check only whether the last characters of the method name string match the target method name, instead of comparing against the entire SALib.analyze.ff.
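For illustration, a sketch of the suffix check, assuming method is the SALib analyzer module so that method.__name__ is e.g. 'SALib.analyze.ff'; _method_suffix is a hypothetical helper.

import SALib.analyze.ff

def _method_suffix(method):
    """Return the last component of the analyzer module name, e.g. 'ff',
    so the dispatch survives a reshuffling of SALib's module structure."""
    return method.__name__.split(".")[-1]

assert _method_suffix(SALib.analyze.ff) == "ff"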

@luseverin (Collaborator, PR author), quoting the review:

> Great, thanks!
>
> I think there are a few small things to update, and maybe one general idea for the tests. Since the tests for the different methods repeat themselves, it would be better to define a function and then call it in a loop for all methods instead of copying large parts of the code.

OK, so I included your suggested changes and added a prototype of a generic testing function for the different sensitivity methods. I store all the parameters and expected test results in a dict, which I pass to the function that gets looped over. It is maybe not the most efficient way to do it, so please let me know if you have any suggestions for improvements.
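For illustration, a sketch of that dict-driven pattern; the configuration entries and helper names are hypothetical, not the PR's actual test data.

import unittest

# One entry per sensitivity method: analyzer kwargs plus the indices the
# analyzer is expected to report (illustrative values).
SENS_CONFIGS = {
    "sobol": {"kwargs": {"calc_second_order": False},
              "expected_si": ["S1", "S1_conf", "ST", "ST_conf"]},
    "morris": {"kwargs": {},
               "expected_si": ["mu", "mu_star", "sigma", "mu_star_conf"]},
}

class TestSensMethods(unittest.TestCase):
    def test_all_methods(self):
        for method, config in SENS_CONFIGS.items():
            # subTest keeps methods independent: one failure does not stop
            # the loop, and the report names the failing method
            with self.subTest(method=method):
                self._check_method(method, config)

    def _check_method(self, method, config):
        # run sampling + sensitivity analysis for `method` here and compare
        # the resulting sens_df against config["expected_si"] (elided)
        self.assertTrue(config["expected_si"])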

@chahank (Member) commented Jun 4, 2024:

Great job, we are almost there! What is missing:

  • Address all the new linter issues (and any older ones that you solve are bonus points!)
  • The method for the tests is going in the right direction. I would maybe propose to simplify the whole thing a bit; more on that directly in the code.

@luseverin (Collaborator, PR author):

OK, I tried to simplify the tests of the sensitivity methods as much as possible in the generic test functions, and removed the previous method-specific test functions. I wasn't sure about a few lines that appeared only in the test for the morris method, though:

for name, attr in unc_data.__dict__.items():
    if 'sens_df' in name:
        np.testing.assert_array_equal(
            attr.param.unique(),
            np.array(['x_exp', 'x_paa', 'x_mdd'])
        )
        np.testing.assert_array_equal(
            attr.si.unique(),
            np.array(['mu', 'mu_star', 'sigma', 'mu_star_conf'])
        )
        if 'eai' in name:
            self.assertEqual(
                attr.size,
                len(unc_data.param_labels) * 4 * (len(exp_unc.evaluate().gdf) + 3)
            )
        elif 'at_event' in name:
            self.assertEqual(
                attr.size,
                len(unc_data.param_labels) * 4 * (haz.size + 3)
            )
        else:
            self.assertEqual(
                len(attr),
                len(unc_data.param_labels) * 4
            )

I did not include these lines in the generic test function, as they did not seem absolutely necessary to me, but if you think they should be included, just let me know.

Additionally, I still need to fix the few new linter issues that popped up, but that should not be a problem. Otherwise, I think I addressed the rest of your comments.

Comment on lines 588 to 592
# Assume sens_first_order_dict is a dictionary where values are lists/arrays of varying lengths
# !for some reason this makes the plotting methods fail
# sens_first_order_df = pd.DataFrame({k: pd.Series(v, dtype=object)
#                                     for k, v in sens_first_order_dict.items()})

(Member):

Is this something that still needs to be addressed, or is it a left-over?

@luseverin (Collaborator, PR author):

It's a left-over, I cleaned it up.

@@ -36,6 +36,8 @@
from climada.hazard import Hazard
from climada.engine import ImpactCalc
from climada.engine.unsequa import InputVar, CalcImpact, UncOutput, CalcCostBenefit, CalcDeltaImpact
from climada.engine.unsequa.calc_base import LOGGER as ILOG
(Member):

Any particular reason for renaming the logger? Why not just use LOGGER?

@luseverin (Collaborator, PR author):

No, no reason; it was just blind copying of code I saw on Stack Overflow. I'll change that!
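For illustration, a minimal sketch of using the imported LOGGER directly with unittest's assertLogs, which is presumably what the copied snippet was doing; the test body is a stand-in, not the PR's actual test.

import unittest
from climada.engine.unsequa.calc_base import LOGGER

class TestLogs(unittest.TestCase):
    def test_warning_is_logged(self):
        # assertLogs captures records emitted on LOGGER inside the block
        with self.assertLogs(LOGGER, level="WARNING") as captured:
            LOGGER.warning("demo message")  # stand-in for the call under test
        self.assertIn("demo message", captured.output[0])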

@luseverin (Collaborator, PR author):

OK, I incorporated your last suggestions. If there are no further comments from your side, I suggest we merge this PR?

@chahank (Member) commented Jun 11, 2024:

I think it looks good!

Just FYI: you managed to remove 39 pylint warnings and add 16. Maybe you can take a last look at whether the 16 new ones are easy to avoid or not.

Else, this is ready to merge from my point of view.

@luseverin (Collaborator, PR author):

I have checked, and the warnings are mainly "too-many-locals", "too-complex", "invalid-name", etc. Most of them come from lines that I did not directly touch or that got included in the commits because VS Code removed trailing whitespace. For instance, plot_rp_uncertainty is too complex, and there are too many local variables in CalcDeltaClimate.uncertainty. It is not straightforward to me how these should be treated, so I would rather leave them as is for the moment, if that's fine.

@chahank (Member) commented Jun 11, 2024:

Thanks for checking. For the moment these are fine I think.

Feel free to merge, and thanks for the work!

@luseverin merged commit 9faf54b into develop on Jun 11, 2024. 12 checks passed.
@luseverin deleted the feature/unsequa_salib_update branch on June 11, 2024 at 14:12.