Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing unit tests when the CLIMADA API is not responding #978

Open
bguillod opened this issue Nov 20, 2024 · 3 comments
Open

Failing unit tests when the CLIMADA API is not responding #978

bguillod opened this issue Nov 20, 2024 · 3 comments
Assignees
Labels
bug conventions Coding conventions or style external resources Problems encountered when using external resources

Comments

@bguillod
Copy link
Collaborator

bguillod commented Nov 20, 2024

Describe the bug
Some unit tests rely on the CLIMADA API. This is a problem when automated workflows (e.g. github workflows) are being run which run these tests, as the workflow might fail if the data API happens to be down or not responding at that moment.

I would like to be able to run the climada.engine.test.test_impact unit tests independently of the CLIMADA API.

To Reproduce
Steps to reproduce the behavior/error:

  1. Run the code below while the climada API is not responding.

Code example:

Run python -m unittest climada.engine.test.test_impact

The following error is raised::

  warnings.warn(
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/__main__.py", line 18, in <module>
    main(module=None)
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/main.py", line 1, in __init__
    self.parseArgs(argv)
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/main.py", line 150, in parseArgs
    self.createTests()
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/main.py", line 161, in createTests
    self.test = self.testLoader.loadTestsFromNames(self.testNames,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/loader.py", line 232, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/loader.py", line 232, in <listcomp>
    suites = [self.loadTestsFromName(name, module) for name in names]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/unittest/loader.py", line [16]2, in loadTestsFromName
    module = __import__(module_name)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/site-packages/climada/engine/test/test_impact.py", line 38, in <module>
    from climada.hazard.test.test_base import HAZ_TEST_TC
  File "/usr/share/miniconda/envs/test/lib/python3.11/site-packages/climada/hazard/test/test_base.py", line 44, in <module>
    HAZ_TEST_TC :Path = get_test_file('test_tc_florida')
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/site-packages/climada/test/__init__.py", line 50, in get_test_file
    client.list_dataset_infos(name=ds_name, status='test_dataset', version='ANY'),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/site-packages/climada/util/api_client.py", line 4, in list_dataset_infos
    DatasetInfo.from_json(ds) for ds in self._request_200(url, params=params)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/test/lib/python3.11/site-packages/climada/util/api_client.py", line 341, in _request_0
    raise Client.NoConnection(
climada.util.api_client.Client.NoConnection: there is no internet connection and the client has not found any cached result for this request.
Error: Process completed with exit code 1.

Expected behavior
The climada.engine.test.test_impact tests (and imho all unit tests) should ideally not rely on the climada API or any internet connection. The latter should only happen in integration tests.

Climada Version: 5.0.0

System Information (please complete the following information):
This issue should be system-independent.

@bguillod bguillod added bug conventions Coding conventions or style external resources Problems encountered when using external resources labels Nov 20, 2024
@bguillod
Copy link
Collaborator Author

@emanuel-schmid I had a closer look and I understand where this comes from, but I am not sure how to go about it because I do not fully understand the rationale behind the test files lying in the climada API. I am sure you can enlighten me on this one 😄

My initial guess (but from my current understanding it is wrong) was that HAZ_TEST_TC in climada/hazard/test/test_base.py, which calls from climada.test.get_test_file(), would only download the file if it is not yet downloaded, in which case I would store it elsewhere and add it, hence removing the dependency on the data API for this unit test. However, it seems like the file is always downloaded via the client. Is this because you can change the test files on the API side whenever you adjust a unit test?

Would it not make more sense to have all test files (beside maybe for integration tests) within the git repository, rather than relying on the API? I am sure there are good reasons for the latter. How would you go about solving my issue? I am happy to create a PR if there are small adjustments which would enable to go around this. Let me know what you think.

@emanuel-schmid
Copy link
Collaborator

emanuel-schmid commented Nov 27, 2024

@bguillod : thanks for the note.

Your initial guess has been correct. The Client in get_test_file only downloads files that have not been downloaded before.
To determine, whether a file has been downloaded before the Client looks up the sqlite database ~/climada/data/.downloads.db for entries with the same source url and target path as requested, in this case the entry might look something like:

id url path startdownload enddownload
42 https://data.iac.ethz.ch/climada/e10f3cdb-2fcb-4f64-921e-3fc628a96a49/test_tc_florida.hdf5 /home/x/climada/data/test/test_tc_florida/v1/test_tc_florida.hdf5 2023-09-06 13:02:11.284926 2023-09-06 13:02:13.064562

The crucial values are url and path, both must match exactly. So if one moves the climada/data directory by moving it and also setting the { "local_data : { "system": "/custom/path" }} value, the client doesn't realize that.

I'd suggest to replace the target pathes in the sqlite database. You can do that, e.g., like this:

import pandas as pd
import sqlite3

old_download_db = "/home/x/climada/data/.downloads.db"
new_download_db = "/home/y/climada/data/.downloads.db"

old_sysdir = "/home/x/climada/data/"
new_sysdir = "/home/y/climada/data/"

downloads = pd.read_sql("select * from download", sqlite3.connect(old_download_db), index_col='id')
downloads.path = downloads.path.apply(lambda x: x.replace(old_sysdir, new_sysdir))
downloads.to_sql("download", sqlite3.connect(new_download_db))

@emanuel-schmid
Copy link
Collaborator

The reason why we try to avoid having binary files in the git repository is mainly because we try to avoid binary files, as they are useless for git just increase the size of the repo.
The solution with files from the api is not ideal either (as this issue illustrates), but we consider it to be less harm.
The proper solution is to define micro datasets through text/code within the test modules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug conventions Coding conventions or style external resources Problems encountered when using external resources
Projects
None yet
Development

No branches or pull requests

2 participants