-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cleaning up data api file cache #737
Conversation
… files from ~/climada/data
@chahank: let me know if this acceptably solves the ever increasing api data heap problem. If so I'll add documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a very quick fix, thanks! Just a few small comments, but otherwise it looks great to me (when the documentation will be added :)).
climada/util/api_client.py
Outdated
" or remove cache entry from database by calling" | ||
f" `Client.purge_cache_db(Path('{local_path}'))`!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bit to difficult to understand. When should a user purge the cache? When should a user wait for the download? Any way to help the user better here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception is more verbose now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very clear now!
climada/util/api_client.py
Outdated
with the API client, if they are beneath the given directory and if one of the following | ||
is the case: | ||
- there status is neither 'active' nor 'test_dataset' | ||
= their status is 'test_dataset' and keep_testfiles is set to False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= their status is 'test_dataset' and keep_testfiles is set to False | |
- their status is 'test_dataset' and keep_testfiles is set to False |
climada/util/api_client.py
Outdated
|
||
# collect urls from datasets that should not be removed | ||
test_datasets = self.list_dataset_infos(status='test_dataset') if keep_testfiles else [] | ||
test_urls = set(filinf.url for dsinf in test_datasets for filinf in dsinf.files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand the short variables dsinf
and filinf
. Are these some standard abbreviations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they're called ds_info
and file_info
now
climada/util/api_client.py
Outdated
rm_empty_dirs(subdir) | ||
try: | ||
directory.rmdir() | ||
except OSError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What error is ignored here? I do not understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an inline comment # raised when directory is not empty
Path(temp_dir).joinpath('hazard/tropical_cyclone/rename_files2/v1').is_dir() | ||
) | ||
self.assertEqual( # test files are still there | ||
3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there 3 test files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's just the nature of that test dataset. datasets can have any number of files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. I meant, why does this particular test result in 3 files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beats me. I picked it for being small (in file size) and expired. I suspect it's an experimental dataset that was used to explore the data api itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a better-known test file then? The test looks rather mysterious like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Wrong. Sorry. The one with 3 files is used in TestStormEurope.test_icon_read
. Reading icon files takes a directory as input and collects data from there. Having more than one makes complete sense. And the test is fairly known. Apart from that I think it doesn't really matter which dataset we pick as long as size is acceptable and status and version make a difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, if it is clear to you, I am fine with it.
@chahank many thanks for the equally quick review! 🙌 |
Changes proposed in this PR:
This PR fixes #
PR Author Checklist
develop
)PR Reviewer Checklist