New data store preload API #1097

Merged
merged 9 commits into from
Dec 27, 2024
8 changes: 8 additions & 0 deletions CHANGES.md
@@ -2,6 +2,14 @@

### Enhancements

* Added a new _preload API_ to xcube data stores:
  - Enhanced the `xcube.core.store.DataStore` class to optionally support
    preloading of datasets via an API represented by the
    new `xcube.core.store.DataPreloader` interface.
  - Added handy default implementations `NullPreloadHandle` and `ExecutorPreloadHandle`
    to be returned by implementations of the `prepare_data()` method of a
    given data store.

* A `xy_res` keyword argument was added to the `transform()` method of
`xcube.core.gridmapping.GridMapping`, enabling users to set the grid-mapping
resolution directly, which speeds up the method by avoiding time-consuming
1 change: 1 addition & 0 deletions environment.yml
@@ -38,6 +38,7 @@ dependencies:
- s3fs >=2021.6
- setuptools >=41.0
- shapely >=1.6
- tabulate >=0.9
- tornado >=6.0
- urllib3 >=1.26
- xarray >=2022.6
368 changes: 368 additions & 0 deletions examples/notebooks/datastores/preload.ipynb
@@ -0,0 +1,368 @@
{
We can also add the context manager example in the notebook.

"cells": [
{
"cell_type": "markdown",
"id": "7f55e16a-46f3-4865-879a-a7cae151daa6",
"metadata": {},
"source": [
"### `ExecutorPreloadHandle` Demo\n",
"\n",
"This notebook is intended for developers who want to enhance their\n",
"data store implementation with the new _data store preload API_.\n",
"This API was added to `xcube.core.store.DataStore` in xcube 1.8.\n",
"It demonstrates the utility class `xcube.core.store.preload.ExecutorPreloadHandle`\n",
"for cases where the preload process can run concurrently for each individual data resource."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "7e27d98a-6519-4c4f-9e59-ae6369e102b9",
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"import time\n",
"\n",
"from xcube.core.store.preload import ExecutorPreloadHandle\n",
"from xcube.core.store.preload import PreloadHandle\n",
"from xcube.core.store.preload import PreloadState\n",
"from xcube.core.store.preload import PreloadStatus"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "92b57a80-14aa-4820-8acc-ab78fa45be0a",
"metadata": {},
"outputs": [],
"source": [
"data_ids = (\n",
" \"tt-data/tinky-winky.nc\", \n",
" \"tt-data/dipsy.zarr\", \n",
" \"tt-data/laa-laa.tiff\", \n",
" \"tt-data/po.zarr.zip\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c3175fc6-084d-4da5-b7af-fe3f03758a7d",
"metadata": {},
"outputs": [],
"source": [
"def preload_data(handle: PreloadHandle, data_id: str):\n",
"    duration = random.randint(5, 15)  # seconds\n",
"    num_ticks = 100\n",
"    for i in range(num_ticks):\n",
"        time.sleep(duration / num_ticks)\n",
"        if handle.cancelled:\n",
"            # TODO: Not clear why future.cancel() doesn't do the job\n",
"            handle.notify(PreloadState(data_id, status=PreloadStatus.cancelled))\n",
"            return\n",
"        handle.notify(PreloadState(data_id, progress=i / num_ticks))\n",
"        if i % 10 == 0:\n",
"            handle.notify(PreloadState(data_id, message=f\"Step #{i // 10 + 1}\"))\n",
"    handle.notify(PreloadState(data_id, progress=1.0, message=\"Done.\"))"
]
},
{
"cell_type": "markdown",
"id": "48792654-20e5-46db-aa47-37c03f46b0a7",
"metadata": {},
"source": [
"---\n",
"Synchronous / blocking call"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c8bb6812-af18-4e5f-8df0-d77889552e99",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f88a78a4ffff41cd917f73405a1183e6",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(HTML(value='<table>\\n<thead>\\n<tr><th>Data ID </th><th>Status </th><th>Progress …"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"handle = ExecutorPreloadHandle(data_ids, preload_data=preload_data)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "8911ae1b-765c-4030-aeca-fc997e7a880d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th>Data ID </th><th>Status </th><th>Progress </th><th>Message </th><th>Exception </th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>tt-data/tinky-winky.nc</td><td>STOPPED </td><td>100% </td><td>Done. </td><td>- </td></tr>\n",
"<tr><td>tt-data/dipsy.zarr </td><td>STOPPED </td><td>100% </td><td>Done. </td><td>- </td></tr>\n",
"<tr><td>tt-data/laa-laa.tiff </td><td>STOPPED </td><td>100% </td><td>Done. </td><td>- </td></tr>\n",
"<tr><td>tt-data/po.zarr.zip </td><td>STOPPED </td><td>100% </td><td>Done. </td><td>- </td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
"Data ID                 Status    Progress    Message    Exception\n",
"----------------------  --------  ----------  ---------  -----------\n",
"tt-data/tinky-winky.nc  STOPPED   100%        Done.      -\n",
"tt-data/dipsy.zarr      STOPPED   100%        Done.      -\n",
"tt-data/laa-laa.tiff    STOPPED   100%        Done.      -\n",
"tt-data/po.zarr.zip     STOPPED   100%        Done.      -"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"handle"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "5fe9a8e6-1543-469f-bb29-f1c7b3462953",
"metadata": {},
"outputs": [],
"source": [
"handle.close()"
]
},
{
"cell_type": "markdown",
"id": "fe45c10a-e20f-439f-85bb-4aa95a68f3d4",
"metadata": {},
"source": [
"---\n",
"Asynchronous / non-blocking call"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "69cb1b3d-3fcc-4c66-b077-d474a0083e6e",
"metadata": {},
"outputs": [],
"source": [
"async_handle = ExecutorPreloadHandle(data_ids, blocking=False, preload_data=preload_data)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "d5a50ec1-addf-46a4-90e1-abfc5b2e95de",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "eec94f7b3cdd4d5e96411f980aa55a92",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(HTML(value='<table>\\n<thead>\\n<tr><th>Data ID </th><th>Status </th><th>Progress …"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"async_handle.show()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "7a2f1f9a-9ea0-43b5-82ed-b90a851f938a",
"metadata": {},
"outputs": [],
"source": [
"time.sleep(2)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "436e7042-abf2-4450-a334-eed75f045c37",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PreloadState(data_id='tt-data/dipsy.zarr', status=PreloadStatus.started, progress=0.17, message='Step #2')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"async_handle.get_state(\"tt-data/dipsy.zarr\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "07d81416-38e8-4b23-9220-d1f57160d9b6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"data_id=tt-data/dipsy.zarr, status=STARTED, progress=0.17, message=Step #2\n"
]
}
],
"source": [
"print(async_handle.get_state(\"tt-data/dipsy.zarr\"))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f17aeecb-6650-43d5-9c41-cedb46baaa3f",
"metadata": {},
"outputs": [],
"source": [
"time.sleep(2)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "a911b93a-fb76-4d2f-88e8-0fd4e635246c",
"metadata": {},
"outputs": [],
"source": [
"async_handle.cancel()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "df2f5a5c-82a8-43ae-ae84-5c9d882e3757",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Data ID                 Status    Progress    Message    Exception\n",
"----------------------  --------  ----------  ---------  -----------\n",
"tt-data/tinky-winky.nc  STARTED   38%         Step #4    -\n",
"tt-data/dipsy.zarr      STARTED   35%         Step #4    -\n",
"tt-data/laa-laa.tiff    STARTED   55%         Step #6    -\n",
"tt-data/po.zarr.zip     STARTED   29%         Step #3    -\n"
]
}
],
"source": [
"print(async_handle)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "769857c9-bad8-4824-92fb-a195e4d5ccb3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th>Data ID </th><th>Status </th><th>Progress </th><th>Message </th><th>Exception </th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>tt-data/tinky-winky.nc</td><td>CANCELLED</td><td>38% </td><td>Step #4 </td><td>- </td></tr>\n",
"<tr><td>tt-data/dipsy.zarr </td><td>STARTED </td><td>35% </td><td>Step #4 </td><td>- </td></tr>\n",
"<tr><td>tt-data/laa-laa.tiff </td><td>STARTED </td><td>55% </td><td>Step #6 </td><td>- </td></tr>\n",
"<tr><td>tt-data/po.zarr.zip </td><td>STARTED </td><td>29% </td><td>Step #3 </td><td>- </td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
"Data ID                 Status     Progress    Message    Exception\n",
"----------------------  ---------  ----------  ---------  -----------\n",
"tt-data/tinky-winky.nc  CANCELLED  38%         Step #4    -\n",
"tt-data/dipsy.zarr      STARTED    35%         Step #4    -\n",
"tt-data/laa-laa.tiff    STARTED    55%         Step #6    -\n",
"tt-data/po.zarr.zip     STARTED    29%         Step #3    -"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"async_handle"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "4d0d42ee-5caa-4eef-913a-1f0417670001",
"metadata": {},
"outputs": [],
"source": [
"async_handle.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c3d89bf-f58f-495b-a5d1-c9fe0b9610a7",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
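
The TODO in the notebook's `preload_data()` asks why `future.cancel()` alone does not stop a job. A likely explanation, assuming `ExecutorPreloadHandle` runs its jobs on a standard `concurrent.futures` executor (the class name suggests this, but the diff shown here does not confirm it): `Future.cancel()` can only cancel a call that has not started yet, and it returns `False` for one that is already running. A running job therefore has to poll a flag itself, which is what the `handle.cancelled` check does. The sketch below reproduces that behaviour with the standard library only; `worker`, `run_demo`, and both events are hypothetical names for illustration and are not part of xcube.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def worker(cancel_event: threading.Event, started: threading.Event) -> str:
    """Simulated preload job that polls a shared flag between work steps."""
    started.set()  # signal that the job is now running
    for _ in range(200):
        time.sleep(0.01)  # stand-in for one unit of real work
        if cancel_event.is_set():
            # Cooperative cancellation: the job itself decides to stop.
            return "cancelled"
    return "done"


def run_demo() -> tuple[bool, str]:
    """Return (result of future.cancel(), outcome reported by the job)."""
    cancel_event = threading.Event()
    started = threading.Event()
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(worker, cancel_event, started)
        started.wait()  # make sure the job is running, not just queued
        cancel_accepted = future.cancel()  # has no effect on a running call
        cancel_event.set()  # ask the job to stop at its next check
        outcome = future.result()
    return cancel_accepted, outcome


if __name__ == "__main__":
    print(run_demo())
```

Here `future.cancel()` reports `False` because the call is already executing, and the job only ends early once it sees the event. The same reasoning would explain why the notebook's job function checks `handle.cancelled` on every tick instead of relying on the future.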
1 change: 1 addition & 0 deletions pyproject.toml
@@ -52,6 +52,7 @@ dependencies = [
"s3fs>=2021.6",
"setuptools>=41.0",
"shapely>=1.6",
"tabulate>=0.9",
"tornado>=6.0",
"urllib3>=1.26",
"xarray>=2022.6,<=2024.6",