cleaning up data api file cache #737

emanuel-schmid · 2023-06-16T15:36:26Z

Changes proposed in this PR:

Introduce a purge method in the api_client that removes outdated files from the API client's file cache.

This PR fixes #

PR Author Checklist

PR Reviewer Checklist

… files from ~/climada/data

emanuel-schmid · 2023-06-16T15:40:52Z

@chahank: let me know if this acceptably solves the ever-increasing API data heap problem. If so, I'll add documentation.

chahank

That was a very quick fix, thanks! Just a few small comments, but otherwise it looks great to me (once the documentation is added :)).

chahank · 2023-06-16T16:04:20Z

climada/util/api_client.py

+                                      " or remove cache entry from database by calling"
+                                      f" `Client.purge_cache_db(Path('{local_path}'))`!")


I think this is a bit too difficult to understand. When should a user purge the cache? When should a user wait for the download? Is there any way to help the user better here?

The exception is more verbose now.

Very clear now!

chahank · 2023-06-16T16:05:59Z

climada/util/api_client.py

+        with the API client, if they are beneath the given directory and if one of the following
+        is the case:
+        - there status is neither 'active' nor 'test_dataset'
+        = their status is 'test_dataset' and keep_testfiles is set to False


Suggested change

-        = their status is 'test_dataset' and keep_testfiles is set to False
+        - their status is 'test_dataset' and keep_testfiles is set to False
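For readers following the docstring under review, its two conditions can be sketched as a small predicate. This is an illustration only, not the code from the PR: the function name `should_purge` and the plain-string `status` argument are hypothetical. Only the status values 'active' and 'test_dataset' and the `keep_testfiles` flag are taken from the docstring, and 'expired' is used merely as an example of another status.

```python
def should_purge(status: str, keep_testfiles: bool) -> bool:
    """Decide whether a cached file should be removed, given its dataset status.

    Hypothetical helper that mirrors the two bullet points of the docstring.
    """
    # Case 1: the status is neither 'active' nor 'test_dataset' (e.g. 'expired').
    if status not in ("active", "test_dataset"):
        return True
    # Case 2: the status is 'test_dataset' and test files are not to be kept.
    return status == "test_dataset" and not keep_testfiles


# Example: which of a few (status, keep_testfiles) combinations would be purged.
decisions = {
    ("active", True): should_purge("active", True),
    ("expired", True): should_purge("expired", True),
    ("test_dataset", True): should_purge("test_dataset", True),
    ("test_dataset", False): should_purge("test_dataset", False),
}
```

Under this reading, files of active datasets are never removed, whatever `keep_testfiles` is set to.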

chahank · 2023-06-16T16:07:14Z

climada/util/api_client.py

+
+        # collect urls from datasets that should not be removed
+        test_datasets = self.list_dataset_infos(status='test_dataset') if keep_testfiles else []
+        test_urls = set(filinf.url for dsinf in test_datasets for filinf in dsinf.files)


I do not understand the short variables dsinf and filinf. Are these some standard abbreviations?

they're called ds_info and file_info now
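With the renamed loop variables, the comprehension from the diff reads roughly as below. The sketch is self-contained: `FileInfo` and `DatasetInfo` here are minimal stand-in dataclasses carrying only the attributes the snippet uses, `collect_urls` is a hypothetical helper name, and the example URLs are made up.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class FileInfo:
    """Stand-in for a file record; only the `url` attribute is modelled."""
    url: str


@dataclass
class DatasetInfo:
    """Stand-in for a dataset record; only the `files` attribute is modelled."""
    files: List[FileInfo] = field(default_factory=list)


def collect_urls(datasets: Iterable[DatasetInfo]) -> Set[str]:
    """Collect the URLs of all files belonging to the given datasets."""
    return {file_info.url for ds_info in datasets for file_info in ds_info.files}


# Two made-up datasets that share one file URL; the set removes the duplicate.
test_datasets = [
    DatasetInfo(files=[FileInfo("https://example.org/a.hdf5"),
                       FileInfo("https://example.org/b.hdf5")]),
    DatasetInfo(files=[FileInfo("https://example.org/a.hdf5")]),
]
test_urls = collect_urls(test_datasets)
```

Collecting into a set makes the later membership check (is this cached file's URL one to keep?) cheap and ignores files that are listed by more than one dataset.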

chahank · 2023-06-16T16:09:08Z

climada/util/api_client.py

+                    rm_empty_dirs(subdir)
+            try:
+                directory.rmdir()
+            except OSError:


What error is ignored here? I do not understand.

I've added an inline comment # raised when directory is not empty
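To make the ignored error concrete: `Path.rmdir()` only removes empty directories and raises `OSError` otherwise, so catching it amounts to "leave non-empty directories alone". The following is a sketch of such a helper, reconstructed from the few lines visible in the diff; the loop over subdirectories is an assumption, and the merged code may differ.

```python
from pathlib import Path


def rm_empty_dirs(directory: Path) -> None:
    """Remove `directory` and any of its subdirectories that are empty, bottom-up."""
    # Snapshot the entries first, since subdirectories may be deleted while looping.
    for subdir in list(directory.iterdir()):
        if subdir.is_dir():
            rm_empty_dirs(subdir)
    try:
        directory.rmdir()
    except OSError:  # raised when directory is not empty
        pass
```

Because the recursion runs before `rmdir()`, a directory that contained only empty subdirectories is itself empty by the time it is tried, and so is removed as well.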

chahank · 2023-06-16T16:10:50Z

climada/test/test_api_client.py

+                Path(temp_dir).joinpath('hazard/tropical_cyclone/rename_files2/v1').is_dir()
+            )
+            self.assertEqual(  # test files are still there
+                3,


Why are there 3 test files?

that's just the nature of that test dataset. datasets can have any number of files.

thanks. I meant, why does this particular test result in 3 files?

Beats me. I picked it for being small (in file size) and expired. I suspect it's an experimental dataset that was used to explore the data api itself.

Should we use a better-known test file then? The test looks rather mysterious like this.

Oh. Wrong. Sorry. The one with 3 files is used in TestStormEurope.test_icon_read. Reading icon files takes a directory as input and collects data from there, so having more than one file makes complete sense. And the test is fairly well known. Apart from that, I think it doesn't really matter which dataset we pick, as long as the size is acceptable and status and version make a difference.

All right, if it is clear to you, I am fine with it.

emanuel-schmid · 2023-06-19T09:31:07Z

@chahank many thanks for the equally quick review! 🙌
Amending the docs....

api_client.Client: introduce purge_cache method that deletes obsolete…

e35768a

… files from ~/climada/data

emanuel-schmid requested a review from chahank June 16, 2023 15:36

api_client.purge_cache: simplication

e6fb801

chahank requested changes Jun 16, 2023


util.api_client: improved readabliity

3afcd26

emanuel-schmid added 2 commits June 19, 2023 12:06

doc: Client.purge_cache

ab768b5

doc.api_client: cosmetics

ff06a16

emanuel-schmid merged commit c15fc53 into develop Jun 19, 2023

emanuel-schmid deleted the feature/cleanup_dataapi_filecache branch June 20, 2023 07:44
