Commit

Merge branch 'develop' into feature/set_geometry

emanuel-schmid committed Jul 9, 2024
2 parents c9d7c95 + f825ca5 commit 2cdcdb7
Showing 20 changed files with 1,074 additions and 415 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -16,15 +16,23 @@ Code freeze date: YYYY-MM-DD

### Changed

- Use GeoPandas `GeoDataFrame.plot()` for the centroids plotting function [#896](https://github.com/CLIMADA-project/climada_python/pull/896)
- Update SALib sensitivity and sampling methods to the newest version (SALib 1.4.7) [#828](https://github.com/CLIMADA-project/climada_python/issues/828)
- Allow for computation of relative and absolute delta impacts in `CalcDeltaClimate`
- Remove content tables and make minor improvements (fix typos and readability) in
CLIMADA tutorials. [#872](https://github.com/CLIMADA-project/climada_python/pull/872)
- Complete overhaul of `Centroids`. Most functions should be backward compatible. Internal data is stored in a GeoDataFrame attribute. Rasters are now stored as points, and the `meta` attribute is removed. Several methods were deprecated or removed. [#787](https://github.com/CLIMADA-project/climada_python/pull/787)
- Improved error messages produced by `ImpactCalc.impact()` in case an impact function referenced in the exposures is not found in the `impf_set` [#863](https://github.com/CLIMADA-project/climada_python/pull/863)
- Update the Holland et al. 2010 TC windfield model and introduce `model_kwargs` parameter to adjust model parameters [#846](https://github.com/CLIMADA-project/climada_python/pull/846)
- Changed module structure: `climada.hazard.Hazard` has been split into the modules `base`, `io` and `plot` [#871](https://github.com/CLIMADA-project/climada_python/pull/871)
- `Impact.from_hdf5` now calls `str` on `event_name` data that is not strings, and issues a warning when doing so [#894](https://github.com/CLIMADA-project/climada_python/pull/894)
- `Impact.write_hdf5` now throws an error if `event_name` does not contain strings exclusively [#894](https://github.com/CLIMADA-project/climada_python/pull/894)

### Fixed

- Throw an error instead of returning a `Hazard` subselection whose fraction matrix contains only zeros [#866](https://github.com/CLIMADA-project/climada_python/pull/866)
- Allow downgrading the Python bugfix version to improve environment compatibility [#900](https://github.com/CLIMADA-project/climada_python/pull/900)
- Fix broken links in `CONTRIBUTING.md` [#900](https://github.com/CLIMADA-project/climada_python/pull/900)

### Added

@@ -158,6 +166,7 @@ Changed:

- `geopandas` >=0.13 → >=0.14
- `pandas` >=1.5,<2.0 → >=2.1
- `salib` >=1.3.0 → >=1.4.7

Removed:

13 changes: 7 additions & 6 deletions CONTRIBUTING.md
@@ -22,7 +22,7 @@ Please contact the [lead developers](https://wcr.ethz.ch/research/climada.html)

## Minimal Steps to Contribute

Before you start, please have a look at our [Developer Guide][devguide].
Before you start, please have a look at our Developer Guide section in the [CLIMADA Docs][docs].

To contribute follow these steps:

@@ -65,21 +65,22 @@ To contribute follow these steps:
## Resources
The CLIMADA documentation provides a [Developer Guide][devguide].
The [CLIMADA documentation][docs] provides several Developer Guides.
Here's a selection of the commonly required information:

* How to use Git and GitHub for CLIMADA development: [Development and Git and CLIMADA](https://climada-python.readthedocs.io/en/latest/guide/Guide_Git_Development.html)
* Coding instructions for CLIMADA: [Python Dos and Don'ts](https://climada-python.readthedocs.io/en/latest/guide/Guide_PythonDos-n-Donts.html), [Performance Tips](https://climada-python.readthedocs.io/en/latest/guide/Guide_Py_Performance.html), [CLIMADA Conventions](https://climada-python.readthedocs.io/en/latest/guide/Guide_Miscellaneous.html)
* How to execute tests in CLIMADA: [Testing and Continuous Integration][testing]
* Coding instructions for CLIMADA: [Python Dos and Don'ts](https://climada-python.readthedocs.io/en/latest/guide/Guide_PythonDos-n-Donts.html), [Performance Tips](https://climada-python.readthedocs.io/en/latest/guide/Guide_Py_Performance.html), [CLIMADA Conventions](https://climada-python.readthedocs.io/en/latest/guide/Guide_CLIMADA_conventions.html)
* How to execute tests in CLIMADA: [Testing][testing] and [Continuous Integration](https://climada-python.readthedocs.io/en/latest/guide/Guide_continuous_integration_GitHub_actions.html)
## Pull Requests
After developing a new feature, fixing a bug, or updating the tutorials, you can create a [pull request](https://docs.github.com/en/pull-requests) to have your changes reviewed and then merged into the CLIMADA code base.
To ensure that your pull request can be reviewed quickly and easily, please have a look at the _Resources_ above before opening a pull request.
In particular, please check out the [Pull Request instructions](https://climada-python.readthedocs.io/en/latest/guide/Guide_Git_Development.html#Pull-requests).
In particular, please check out the [Pull Request instructions](https://climada-python.readthedocs.io/en/latest/guide/Guide_Git_Development.html#pull-requests).
We provide a description template for pull requests that helps you provide the essential information for reviewers.
It also contains a checklist for both pull request authors and reviewers to guide the review process.
[docs]: https://climada-python.readthedocs.io/en/latest/
[devguide]: https://climada-python.readthedocs.io/en/latest/#developer-guide
[testing]: https://climada-python.readthedocs.io/en/latest/guide/Guide_Continuous_Integration_and_Testing.html
[testing]: https://climada-python.readthedocs.io/en/latest/guide/Guide_Testing.html
26 changes: 18 additions & 8 deletions climada/engine/impact.py
@@ -937,11 +937,6 @@ def write_hdf5(self, file_path: Union[str, Path], dense_imp_mat: bool=False):
The impact matrix can be stored in a sparse or dense format.
Notes
-----
This writer does not support attributes with variable types. Please make sure
that ``event_name`` is a list of equally-typed values, e.g., all ``str``.
Parameters
----------
file_path : str or Path
@@ -950,6 +945,11 @@ def write_hdf5(self, file_path: Union[str, Path], dense_imp_mat: bool=False):
If ``True``, write the impact matrix as a dense matrix that can be more easily
interpreted by common H5 file readers but takes up (vastly) more space.
Defaults to ``False``.
Raises
------
TypeError
If :py:attr:`event_name` does not contain strings exclusively.
"""
# Define writers for all types (will be filled later)
type_writers = dict()
@@ -983,7 +983,7 @@ def write(group: h5py.Group, name: str, value: Any):

def _str_type_helper(values: Collection):
"""Return string datatype if we assume 'values' contains strings"""
if isinstance(next(iter(values)), str):
if all((isinstance(val, str) for val in values)):
return h5py.string_dtype()
return None
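For illustration (not part of the commit), a minimal sketch of why the one-element check was replaced; the helper names below are hypothetical:

```python
import h5py

def str_type_first_only(values):
    # Old check: only the first element is inspected, so a mixed list
    # like ["a", 1] is wrongly classified as all-string.
    if isinstance(next(iter(values)), str):
        return h5py.string_dtype()
    return None

def str_type_all(values):
    # New check: every element must be a string.
    if all(isinstance(val, str) for val in values):
        return h5py.string_dtype()
    return None

print(str_type_first_only(["a", 1]))  # string dtype -- false positive
print(str_type_all(["a", 1]))         # None -- mixed types detected
```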

@@ -1037,6 +1037,8 @@ def write_csr(group, name, value):
# Now write all attributes
# NOTE: Remove leading underscore to write '_tot_value' as regular attribute
for name, value in self.__dict__.items():
if name == "event_name" and _str_type_helper(value) is None:
raise TypeError("'event_name' must be a list of strings")
write(file, name.lstrip("_"), value)
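As a hedged usage sketch (file name and attribute values are made up, assuming an `Impact` can be constructed with only `event_name`), the new guard turns a previously opaque h5py dtype error into an explicit `TypeError`:

```python
from climada.engine import Impact

imp = Impact(event_name=["storm_a", 2, 3.5])  # mixed types: invalid

try:
    imp.write_hdf5("impact_demo.h5")  # hypothetical output path
except TypeError as err:
    print(err)  # 'event_name' must be a list of strings
```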

def write_sparse_csr(self, file_name):
@@ -1240,10 +1242,18 @@ def from_hdf5(cls, file_path: Union[str, Path]):
).intersection(file.keys())
kwargs.update({attr: file[attr][:] for attr in array_attrs})

# Special handling for 'event_name' because it's a list of strings
# Special handling for 'event_name' because it should be a list of strings
if "event_name" in file:
# pylint: disable=no-member
kwargs["event_name"] = list(file["event_name"].asstr()[:])
try:
event_name = file["event_name"].asstr()[:]
except TypeError:
LOGGER.warning(
"'event_name' is not stored as strings. Trying to decode "
"values with 'str()' instead."
)
event_name = map(str, file["event_name"][:])
kwargs["event_name"] = list(event_name)

# Create the impact object
return cls(**kwargs)
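A minimal sketch of the read-side fallback, mirroring the test further down (the file name is hypothetical): a numeric `event_name` dataset is coerced with `str()` and a warning is logged:

```python
import h5py

# Overwrite 'event_name' with numeric data to trigger the fallback
with h5py.File("impact_demo.h5", "r+") as file:
    del file["event_name"]
    file.create_dataset("event_name", data=[1.2, 2])

# Impact.from_hdf5("impact_demo.h5") now logs
#   "'event_name' is not stored as strings. ..."
# and yields event_name == ["1.2", "2.0"]
```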
19 changes: 15 additions & 4 deletions climada/engine/test/test_impact.py
@@ -779,7 +779,8 @@ def test_select_event_identity_pass(self):
ent.exposures.assign_centroids(hazard)

# Compute the impact over the whole exposures
imp = ImpactCalc(ent.exposures, ent.impact_funcs, hazard).impact(save_mat=True, assign_centroids=False)
imp = ImpactCalc(ent.exposures, ent.impact_funcs, hazard).impact(
save_mat=True, assign_centroids=False)

sel_imp = imp.select(event_ids=imp.event_id,
event_names=imp.event_name,
@@ -1019,10 +1020,11 @@ def test_write_hdf5_without_imp_mat(self):

def test_write_hdf5_type_fail(self):
"""Test that writing attributes with varying types results in an error"""
self.impact.event_name = [1, "a", 1.0, "b", "c", "d"]
with self.assertRaises(TypeError) as cm:
self.impact.event_name = ["a", 1, 1.0, "b", "c", "d"]
with self.assertRaisesRegex(
TypeError, "'event_name' must be a list of strings"
):
self.impact.write_hdf5(self.filepath)
self.assertIn("No conversion path for dtype", str(cm.exception))

def test_cycle_hdf5(self):
"""Test writing and reading the same object"""
@@ -1120,6 +1122,15 @@ def test_read_hdf5_full(self):
impact = Impact.from_hdf5(self.filepath)
npt.assert_array_equal(impact.imp_mat.toarray(), [[0, 1, 2], [3, 0, 0]])

# Check with non-string event_name
event_name = [1.2, 2]
with h5py.File(self.filepath, "r+") as file:
del file["event_name"]
file.create_dataset("event_name", data=event_name)
with self.assertLogs("climada.engine.impact", "WARNING") as cm:
impact = Impact.from_hdf5(self.filepath)
self.assertIn("'event_name' is not stored as strings", cm.output[0])
self.assertListEqual(impact.event_name, ["1.2", "2.0"])

# Execute Tests
if __name__ == "__main__":
99 changes: 85 additions & 14 deletions climada/engine/unsequa/calc_base.py
@@ -203,8 +203,8 @@ def make_sample(self, N, sampling_method='saltelli',
Number of samples as used in the sampling method from SALib
sampling_method : str, optional
The sampling method as defined in SALib. Possible choices:
'saltelli', 'fast_sampler', 'latin', 'morris', 'dgsm', 'ff'
https://salib.readthedocs.io/en/latest/api.html
'saltelli', 'latin', 'morris', 'dgsm', 'fast_sampler', 'ff', 'finite_diff',
https://salib.readthedocs.io/en/latest/api.html
The default is 'saltelli'.
sampling_kwargs : kwargs, optional
Optional keyword arguments passed on to the SALib sampling_method.
@@ -215,6 +215,17 @@
unc_output : climada.engine.uncertainty.unc_output.UncOutput()
Uncertainty data object with the samples
Notes
-----
The 'ff' sampling method does not require a value for the N parameter;
the input N value is therefore ignored when this method is used.
The 'ff' sampling method requires the number of uncertainty parameters to be
a power of 2. Users can generate dummy variables to meet this
requirement (see the sketch below). Please refer to
https://salib.readthedocs.io/en/latest/api.html for more details.
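A sketch of the dummy-variable workaround mentioned above; this is not a CLIMADA helper, and all names are illustrative:

```python
import math

param_labels = ["x1", "x2", "x3"]  # 3 parameters -> pad to 4
n_target = 2 ** math.ceil(math.log2(len(param_labels)))
dummies = [f"dummy_{i}" for i in range(n_target - len(param_labels))]

problem_sa = {
    "num_vars": n_target,
    "names": param_labels + dummies,
    "bounds": [[0, 1]] * n_target,
}
print(problem_sa["names"])  # ['x1', 'x2', 'x3', 'dummy_0']
```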
See Also
--------
SALib.sample : sampling methods from SALib
@@ -231,11 +242,17 @@
'names' : param_labels,
'bounds' : [[0, 1]]*len(param_labels)
}

#for the ff sampler, no value of N is needed. For API consistency the user
#must input a value that is ignored and a warning is given.
if sampling_method == 'ff':
LOGGER.warning("You are using the 'ff' sampler which does not require "
"a value for N. The entered N value will be ignored"
"in the sampling process.")
uniform_base_sample = self._make_uniform_base_sample(N, problem_sa,
sampling_method,
sampling_kwargs)
df_samples = pd.DataFrame(uniform_base_sample, columns=param_labels)

for param in list(df_samples):
df_samples[param] = df_samples[param].apply(
self.distr_dict[param].ppf
@@ -271,7 +288,7 @@ def _make_uniform_base_sample(self, N, problem_sa, sampling_method,
SALib sampling method.
sampling_method: string
The sampling method as defined in SALib. Possible choices:
'saltelli', 'fast_sampler', 'latin', 'morris', 'dgsm', 'ff'
'saltelli', 'latin', 'morris', 'dgsm', 'fast_sampler', 'ff', 'finite_diff',
https://salib.readthedocs.io/en/latest/api.html
sampling_kwargs: dict()
Optional keyword arguments passed on to the SALib sampling method.
@@ -292,8 +309,20 @@
#c.f. https://stackoverflow.com/questions/2724260/why-does-pythons-import-require-fromlist
import importlib # pylint: disable=import-outside-toplevel
salib_sampling_method = importlib.import_module(f'SALib.sample.{sampling_method}')
sample_uniform = salib_sampling_method.sample(
problem = problem_sa, N = N, **sampling_kwargs)

if sampling_method == 'ff': #the ff sampling has a fixed sample size and
#does not require the N parameter
if problem_sa['num_vars'] & (problem_sa['num_vars'] - 1) != 0:
raise ValueError("The number of parameters must be a power of 2. "
"To use the ff sampling method, you can generate "
"dummy parameters to overcome this limitation."
" See https://salib.readthedocs.io/en/latest/api.html")

sample_uniform = salib_sampling_method.sample(
problem = problem_sa, **sampling_kwargs)
else:
sample_uniform = salib_sampling_method.sample(
problem = problem_sa, N = N, **sampling_kwargs)
return sample_uniform
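The power-of-two check above relies on a standard bit trick; a standalone sketch:

```python
def is_power_of_two(n: int) -> bool:
    # n & (n - 1) clears the lowest set bit, so the result is zero
    # exactly when n has a single set bit, i.e. n is a power of two.
    return n >= 1 and n & (n - 1) == 0

print([n for n in range(1, 17) if is_power_of_two(n)])  # [1, 2, 4, 8, 16]
```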

def sensitivity(self, unc_output, sensitivity_method = 'sobol',
@@ -323,17 +352,21 @@ def sensitivity(self, unc_output, sensitivity_method = 'sobol',
unc_output : climada.engine.unsequa.UncOutput
Uncertainty data object in which to store the sensitivity indices
sensitivity_method : str, optional
sensitivity analysis method from SALib.analyse
Possible choices:
'fast', 'rbd_fact', 'morris', 'sobol', 'delta', 'ff'
The default is 'sobol'.
Note that in Salib, sampling methods and sensitivity analysis
methods should be used in specific pairs.
Sensitivity analysis method from SALib.analyse. Possible choices: 'sobol', 'fast',
'rbd_fast', 'morris', 'dgsm', 'ff', 'pawn', 'rhdm', 'rsa', 'discrepancy', 'hdmr'.
Note that in Salib, sampling methods and sensitivity
analysis methods should be used in specific pairs:
https://salib.readthedocs.io/en/latest/api.html
sensitivity_kwargs: dict, optional
Keyword arguments of the chosen SALib analyse method.
The default is to use SALib's default arguments.
Notes
-----
The variables 'Em','Term','X','Y' are removed from the output of the
'hdmr' method to ensure compatibility with unsequa.
The 'Delta' method is currently not supported.
Returns
-------
sens_output : climada.engine.unsequa.UncOutput
Expand All @@ -360,7 +393,7 @@ def sensitivity(self, unc_output, sensitivity_method = 'sobol',

sens_output = copy.deepcopy(unc_output)

#Certaint Salib method required model input (X) and output (Y), others
#Certain Salib methods require model input (X) and output (Y), others
#need only output (Y)
salib_kwargs = method.analyze.__code__.co_varnames # obtain all kwargs of the salib method
X = unc_output.samples_df.to_numpy() if 'X' in salib_kwargs else None
@@ -500,10 +533,47 @@ def _calc_sens_df(method, problem_sa, sensitivity_kwargs, param_labels, X, unc_d
else:
sens_indices = method.analyze(problem_sa, Y,
**sensitivity_kwargs)
#refactor incoherent SALib output
nparams = len(param_labels)
if method.__name__[-3:] == '.ff': #ff method
if sensitivity_kwargs['second_order']:
#parse interaction terms of sens_indices to a square matrix
#to ensure consistency with unsequa
interaction_names = sens_indices.pop('interaction_names')
interactions = np.full((nparams, nparams), np.nan)
#loop over interaction names and extract each param pair,
#then match to the corresponding param from param_labels
for i,interaction_name in enumerate(interaction_names):
interactions[param_labels.index(interaction_name[0]),
param_labels.index(interaction_name[1])] = sens_indices['IE'][i]
sens_indices['IE'] = interactions
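To see what this reshaping does, a self-contained sketch with made-up values (SALib's 'ff' analyser reports pairwise interactions as a flat list plus name pairs):

```python
import numpy as np

param_labels = ["a", "b", "c", "d"]
interaction_names = [("a", "b"), ("c", "d")]  # hypothetical SALib output
ie_values = [0.12, 0.05]

nparams = len(param_labels)
interactions = np.full((nparams, nparams), np.nan)
for (p1, p2), val in zip(interaction_names, ie_values):
    interactions[param_labels.index(p1), param_labels.index(p2)] = val

print(interactions)  # 0.12 at [0, 1], 0.05 at [2, 3], NaN elsewhere
```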

if method.__name__[-5:] == '.hdmr': #hdmr method
#first, remove variables that are incompatible with unsequa output
keys_to_remove = ['Em','Term','select', 'RT', 'Y_em', 'idx', 'X', 'Y']
sens_indices = {k: v for k, v in sens_indices.items()
if k not in keys_to_remove}
names = sens_indices.pop('names') #names of terms

#second, refactor to 2D
for si, si_val_array in sens_indices.items():
if (np.array(si_val_array).ndim == 1 and #for everything that is 1d and has
np.array(si_val_array).size > nparams): #length > n params, refactor to 2D
si_new_array = np.full((nparams, nparams), np.nan)
np.fill_diagonal(si_new_array, si_val_array[0:nparams]) #simple terms go on diag
for i,interaction_name in enumerate(names[nparams:]):
t1, t2 = interaction_name.split('/') #interaction terms
si_new_array[param_labels.index(t1),
param_labels.index(t2)] = si_val_array[nparams+i]
sens_indices[si] = si_new_array
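Analogously for 'hdmr', a sketch with made-up numbers: SALib returns one flat vector of first-order terms followed by 'p1/p2' interaction terms, which the loop above moves onto the diagonal and off-diagonal cells of a square matrix:

```python
import numpy as np

param_labels = ["a", "b"]
names = ["a", "b", "a/b"]        # first-order terms, then interactions
si_val_array = [0.6, 0.3, 0.05]  # hypothetical index values

nparams = len(param_labels)
si_new_array = np.full((nparams, nparams), np.nan)
np.fill_diagonal(si_new_array, si_val_array[:nparams])
for i, interaction_name in enumerate(names[nparams:]):
    t1, t2 = interaction_name.split("/")
    si_new_array[param_labels.index(t1),
                 param_labels.index(t2)] = si_val_array[nparams + i]

print(si_new_array)  # [[0.6, 0.05], [nan, 0.3]]
```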


sens_first_order = np.array([
np.array(si_val_array)
for si, si_val_array in sens_indices.items()
if (np.array(si_val_array).ndim == 1 and si!='names') # dirty trick due to Salib incoherent output
if (np.array(si_val_array).ndim == 1 # dirty trick due to Salib incoherent output
and si!='names'
and np.array(si_val_array).size == len(param_labels))
]).ravel()
sens_first_order_dict[submetric_name] = sens_first_order

Expand All @@ -515,6 +585,7 @@ def _calc_sens_df(method, problem_sa, sensitivity_kwargs, param_labels, X, unc_d
sens_second_order_dict[submetric_name] = sens_second_order

sens_first_order_df = pd.DataFrame(sens_first_order_dict, dtype=np.number)

if not sens_first_order_df.empty:
si_names_first_order, param_names_first_order = _si_param_first(param_labels, sens_indices)
sens_first_order_df.insert(0, 'si', si_names_first_order)