Failing unit tests when the CLIMADA API is not responding #978

bguillod · 2024-11-20T14:48:40Z

Describe the bug
Some unit tests rely on the CLIMADA API. This is a problem when automated workflows (e.g. github workflows) are being run which run these tests, as the workflow might fail if the data API happens to be down or not responding at that moment.

I would like to be able to run the climada.engine.test.test_impact unit tests independently of the CLIMADA API.

To Reproduce
Steps to reproduce the behavior/error:

Run the code below while the climada API is not responding.

Code example:

Run python -m unittest climada.engine.test.test_impact

The following error is raised::

  warnings.warn(
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/__main__.py", line 18, in <module>
    main(module=None)
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/main.py", line 1, in __init__
    self.parseArgs(argv)
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/main.py", line 150, in parseArgs
    self.createTests()
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/main.py", line 161, in createTests
    self.test = self.testLoader.loadTestsFromNames(self.testNames,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/loader.py", line 232, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/loader.py", line 232, in <listcomp>
    suites = [self.loadTestsFromName(name, module) for name in names]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/loader.py", line [16]2, in loadTestsFromName
    module = __import__(module_name)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/site-packages/climada/engine/test/test_impact.py", line 38, in <module>
    from climada.hazard.test.test_base import HAZ_TEST_TC
  File "/usr/share/miniconda/envs/test/lib/python3.11/site-packages/climada/hazard/test/test_base.py", line 44, in <module>
    HAZ_TEST_TC :Path = get_test_file('test_tc_florida')
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/site-packages/climada/test/__init__.py", line 50, in get_test_file
    client.list_dataset_infos(name=ds_name, status='test_dataset', version='ANY'),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/site-packages/climada/util/api_client.py", line 4, in list_dataset_infos
    DatasetInfo.from_json(ds) for ds in self._request_200(url, params=params)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/site-packages/climada/util/api_client.py", line 341, in _request_0
    raise Client.NoConnection(
climada.util.api_client.Client.NoConnection: there is no internet connection and the client has not found any cached result for this request.
Error: Process completed with exit code 1.

Expected behavior
The climada.engine.test.test_impact tests (and imho all unit tests) should ideally not rely on the climada API or any internet connection. The latter should only happen in integration tests.

Climada Version: 5.0.0

System Information (please complete the following information):
This issue should be system-independent.

The text was updated successfully, but these errors were encountered:

bguillod · 2024-11-27T07:57:35Z

@emanuel-schmid I had a closer look and I understand where this comes from, but I am not sure how to go about it because I do not fully understand the rationale behind the test files lying in the climada API. I am sure you can enlighten me on this one 😄

My initial guess (but from my current understanding it is wrong) was that HAZ_TEST_TC in climada/hazard/test/test_base.py, which calls from climada.test.get_test_file(), would only download the file if it is not yet downloaded, in which case I would store it elsewhere and add it, hence removing the dependency on the data API for this unit test. However, it seems like the file is always downloaded via the client. Is this because you can change the test files on the API side whenever you adjust a unit test?

Would it not make more sense to have all test files (beside maybe for integration tests) within the git repository, rather than relying on the API? I am sure there are good reasons for the latter. How would you go about solving my issue? I am happy to create a PR if there are small adjustments which would enable to go around this. Let me know what you think.

emanuel-schmid · 2024-11-27T11:49:44Z

@bguillod : thanks for the note.

Your initial guess has been correct. The Client in get_test_file only downloads files that have not been downloaded before.
To determine, whether a file has been downloaded before the Client looks up the sqlite database ~/climada/data/.downloads.db for entries with the same source url and target path as requested, in this case the entry might look something like:

id	url	path	startdownload	enddownload
42	https://data.iac.ethz.ch/climada/e10f3cdb-2fcb-4f64-921e-3fc628a96a49/test_tc_florida.hdf5	/home/x/climada/data/test/test_tc_florida/v1/test_tc_florida.hdf5	2023-09-06 13:02:11.284926	2023-09-06 13:02:13.064562

The crucial values are url and path, both must match exactly. So if one moves the climada/data directory by moving it and also setting the { "local_data : { "system": "/custom/path" }} value, the client doesn't realize that.

I'd suggest to replace the target pathes in the sqlite database. You can do that, e.g., like this:

import pandas as pd
import sqlite3

old_download_db = "/home/x/climada/data/.downloads.db"
new_download_db = "/home/y/climada/data/.downloads.db"

old_sysdir = "/home/x/climada/data/"
new_sysdir = "/home/y/climada/data/"

downloads = pd.read_sql("select * from download", sqlite3.connect(old_download_db), index_col='id')
downloads.path = downloads.path.apply(lambda x: x.replace(old_sysdir, new_sysdir))
downloads.to_sql("download", sqlite3.connect(new_download_db))

emanuel-schmid · 2024-11-27T11:57:29Z

The reason why we try to avoid having binary files in the git repository is mainly because we try to avoid binary files, as they are useless for git just increase the size of the repo.
The solution with files from the api is not ideal either (as this issue illustrates), but we consider it to be less harm.
The proper solution is to define micro datasets through text/code within the test modules.

bguillod added bug conventions Coding conventions or style external resources Problems encountered when using external resources labels Nov 20, 2024

bguillod assigned emanuel-schmid Nov 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failing unit tests when the CLIMADA API is not responding #978

Failing unit tests when the CLIMADA API is not responding #978

bguillod commented Nov 20, 2024 •

edited

Loading

bguillod commented Nov 27, 2024

emanuel-schmid commented Nov 27, 2024 •

edited

Loading

emanuel-schmid commented Nov 27, 2024

Failing unit tests when the CLIMADA API is not responding #978

Failing unit tests when the CLIMADA API is not responding #978

Comments

bguillod commented Nov 20, 2024 • edited Loading

bguillod commented Nov 27, 2024

emanuel-schmid commented Nov 27, 2024 • edited Loading

emanuel-schmid commented Nov 27, 2024

bguillod commented Nov 20, 2024 •

edited

Loading

emanuel-schmid commented Nov 27, 2024 •

edited

Loading