Add artificial information filter #273

Closed
68 commits
bfa0192
add keepbits parameter
Ishaanj18 Apr 5, 2023
f1733d9
Merge branch 'observingClouds:main' into main
Ishaanj18 Sep 14, 2023
2f08096
Merge branch 'observingClouds:main' into main
Ishaanj18 Sep 22, 2023
638f1a7
first_step_information_Filter
Ishaanj18 Sep 22, 2023
59a75e8
corrected_indentationError
Ishaanj18 Sep 22, 2023
2a8ba56
fixed-indentationError
Ishaanj18 Sep 22, 2023
0630abe
added information filter
Ishaanj18 Sep 27, 2023
679866f
Updated the docstring, generalised the code for every data type, wrot…
Ishaanj18 Oct 3, 2023
087c997
Merge branch 'main' into add_artifical-information_filter
Ishaanj18 Oct 3, 2023
30673c9
Merge branch 'observingClouds:main' into main
Ishaanj18 Oct 3, 2023
a21cf21
updated docstring
Ishaanj18 Oct 3, 2023
4e91637
Merge branch 'add_artifical-information_filter' of https://github.com…
Ishaanj18 Oct 3, 2023
d3cdc66
updated docstring
Ishaanj18 Oct 3, 2023
8578df7
updated docstring
Ishaanj18 Oct 6, 2023
b5845ea
updated tests and get_cdf_without_artificial_information function
Ishaanj18 Oct 7, 2023
9eb49cb
Merge branch 'observingClouds:main' into main
Ishaanj18 Oct 13, 2023
872042b
Update xbitinfo/xbitinfo.py
Ishaanj18 Oct 19, 2023
2bfde26
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 19, 2023
0d4b9a4
Update xbitinfo/xbitinfo.py
Ishaanj18 Oct 20, 2023
a875052
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 20, 2023
6cd513f
Merge branch 'main' of https://github.com/Ishaanj18/xbitinfo into add…
Ishaanj18 Oct 20, 2023
12bed64
Merge branch 'add_artifical-information_filter' of https://github.com…
Ishaanj18 Oct 20, 2023
6033c02
changed none from string to object
Ishaanj18 Oct 20, 2023
18e4680
Merge branch 'observingClouds:main' into add_artifical-information_fi…
Ishaanj18 Oct 20, 2023
e81dfe7
Merge branch 'add_artifical-information_filter' of https://github.com…
Ishaanj18 Oct 20, 2023
3156d30
updated docstring
Ishaanj18 Oct 20, 2023
7e9e3c1
Updated Documentation
Ishaanj18 Nov 6, 2023
ecee5c4
Updated Documentation
Ishaanj18 Nov 6, 2023
ef909c2
updated docs
Ishaanj18 Nov 6, 2023
611017c
updated docs
Ishaanj18 Nov 6, 2023
6aae1db
Merge branch 'main' into add_artifical-information_filter
Ishaanj18 Nov 17, 2023
9ab26a9
removed libraries not required
Ishaanj18 Nov 17, 2023
cd8d451
Merge branch 'add_artifical-information_filter' of https://github.com…
Ishaanj18 Nov 17, 2023
2eed4c7
added metpy and intake in enviorment
Ishaanj18 Nov 17, 2023
bab376f
added intake and metpy in enviornment.yml
Ishaanj18 Nov 17, 2023
0befec6
Merge branch 'main' into add_artifical-information_filter
observingClouds Dec 14, 2023
0649284
rename menu subsection
observingClouds Dec 14, 2023
a934aa5
add notebook requirements
observingClouds Dec 14, 2023
784f151
add missing packages for notebook test
observingClouds Dec 14, 2023
7e57f8e
add missing requirement for notebook execution
observingClouds Dec 15, 2023
1dfbbd0
add missing requirement for notebook execution
observingClouds Dec 15, 2023
e6c4efa
variable non existing in dataset
observingClouds Dec 15, 2023
ba5609d
switch to python implementation due to #212
observingClouds Dec 15, 2023
974c3d1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 15, 2023
87c1a9e
explicitly use python implementation
observingClouds Dec 15, 2023
6706fcb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 15, 2023
8c6232a
remove masked_value option
observingClouds Dec 15, 2023
d955ee0
fix syntax
observingClouds Dec 15, 2023
e433105
Delete docs/InformationFilter.ipynb
observingClouds Dec 15, 2023
f59b751
change kernelname
observingClouds Dec 15, 2023
8aaf7e1
Update environment.yml
observingClouds Dec 15, 2023
4130fc8
add missing requirement
observingClouds Dec 15, 2023
1438e72
refactoring notebook
observingClouds Feb 5, 2024
30696a1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 5, 2024
cb9d2b0
fix issue with intake 2.0.0
observingClouds Feb 5, 2024
69395ee
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 5, 2024
9cba082
change to python implementation
observingClouds Feb 5, 2024
0f6cf09
remove masked_value
observingClouds Feb 5, 2024
4adee89
change variable
observingClouds Feb 6, 2024
0f7304a
Merge branch 'observingClouds:main' into main
Ishaanj18 Mar 28, 2024
f3f677e
restoring
Ishaanj18 Mar 28, 2024
4cdfe15
refactoring code
Ishaanj18 Mar 28, 2024
2a51323
Restored lost commits
Ishaanj18 Apr 1, 2024
f767f0c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 1, 2024
f1c5799
Removed informationfilter.ipynb
Ishaanj18 Apr 1, 2024
444afb2
Chnaged example in notebook
Ishaanj18 Apr 13, 2024
243c017
Merge branch 'main' into add_artifical-information_filter
observingClouds Apr 15, 2024
c462b6e
Merge branch 'add_artifical-information_filter' of https://github.com…
observingClouds Apr 16, 2024
335 changes: 335 additions & 0 deletions docs/ArtificialInformation_Filter.ipynb
@@ -0,0 +1,335 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1842f792",
"metadata": {},
"source": [
"# Artificial information filtering\n",
"\n",
"In simple terms, the bitwise information is retrieved by checking how variable a bit pattern is. However, this approach cannot distinguish between real and artificial information content. By studying the distribution of the information content, the user can often identify a clear cut-off between real and artificial information content.\n",
"\n",
"The following example shows what such a separation of real and artificial information can look like. To demonstrate it, artificial information is added to an example dataset by applying linear quantization. Linear quantization is often applied to climate datasets (e.g. ERA5) and needs to be accounted for in order to retrieve meaningful bitwise information content. An algorithm that aims at detecting this artificial information is then introduced."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb998fbb",
"metadata": {},
"outputs": [],
"source": [
"import xarray as xr\n",
"import xbitinfo as xb\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "32ac97e0",
"metadata": {},
"source": [
"## Loading example dataset\n",
"Here we use the openly accessible CONUS404 dataset, which is available at full precision."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9639a618",
"metadata": {},
"outputs": [],
"source": [
"ds = xr.open_zarr(\n",
" \"s3://hytest/conus404/conus404_monthly.zarr\",\n",
" storage_options={\n",
" \"anon\": True,\n",
" \"requester_pays\": False,\n",
" \"client_kwargs\": {\"endpoint_url\": \"https://usgs.osn.mghpcc.org\"},\n",
" },\n",
")\n",
"# select accumulated downwelling shortwave flux at the top of the atmosphere\n",
"data = ds[\"ACSWDNT\"]\n",
"# select subset of data for demonstration purposes\n",
"chunk = data.isel(time=slice(0, 2), y=slice(0, 1015), x=slice(0, 1050))\n",
"chunk"
]
},
{
"cell_type": "markdown",
"id": "3d735e4b",
"metadata": {},
"source": [
"## Creating dataset copy with artificial information\n",
"### Functions to encode and decode"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3a7c7ae",
"metadata": {},
"outputs": [],
"source": [
"# Encoding function to compress data\n",
"def encode(chunk, scale, offset, dtype, astype):\n",
" enc = (chunk - offset) * scale\n",
" enc = np.around(enc)\n",
" enc = enc.astype(astype, copy=False)\n",
" return enc\n",
"\n",
"\n",
"# Decoding function to decompress data\n",
"def decode(enc, scale, offset, dtype, astype):\n",
" dec = (enc / scale) + offset\n",
" dec = dec.astype(dtype, copy=False)\n",
" return dec"
]
},
{
"cell_type": "markdown",
"id": "fa6f26c7",
"metadata": {},
"source": [
"### Transform dataset to introduce artificial information"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c09e3cf3",
"metadata": {},
"outputs": [],
"source": [
"xmin = np.min(chunk)\n",
"xmax = np.max(chunk)\n",
"scale = (2**16 - 1) / (xmax - xmin)\n",
"offset = xmin\n",
"enc = encode(chunk, scale, offset, \"f4\", \"u2\")\n",
"dec = decode(enc, scale, offset, \"f4\", \"u2\")"
]
},
{
"cell_type": "markdown",
"id": "7126810d",
"metadata": {},
"source": [
"## Comparison of bitinformation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05ef8a94",
"metadata": {},
"outputs": [],
"source": [
"# original dataset without artificial information\n",
"orig_info = xb.get_bitinformation(\n",
" xr.Dataset({\"w/o artif. info\": chunk}),\n",
" dim=\"x\",\n",
" implementation=\"python\",\n",
")\n",
"\n",
"# dataset with artificial information\n",
"arti_info = xb.get_bitinformation(\n",
" xr.Dataset({\"w artif. info\": dec}),\n",
" dim=\"x\",\n",
" implementation=\"python\",\n",
")\n",
"\n",
"# plotting distribution of bitwise information content\n",
"info = xr.merge([orig_info, arti_info])\n",
"plot = xb.plot_bitinformation(info)"
]
},
{
"cell_type": "markdown",
"id": "de1ecb7e",
"metadata": {},
"source": [
"The figure reveals that applying linear quantization introduces artificial information."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8600d4b8",
"metadata": {},
"outputs": [],
"source": [
"keepbits = xb.get_keepbits(info, inflevel=[0.99])\n",
"print(\n",
" f\"The number of keepbits increased from {keepbits['w/o artif. info'].item(0)} bits in the original dataset to {keepbits['w artif. info'].item(0)} bits in the dataset with artificial information.\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "fa80f988",
"metadata": {},
"source": [
"In the following, a gradient-based filter is introduced to remove this artificial information, so that the number of keepbits remains similar even when artificial information is present in a dataset."
]
},
{
"cell_type": "markdown",
"id": "3f7a7c2e",
"metadata": {},
"source": [
"## Artificial information filter\n",
"The `Gradient` filter works as follows:\n",
"\n",
"1. It determines the cumulative distribution function (CDF) of the bitwise information content.\n",
"2. It computes the gradient of the CDF to identify the point where the gradient drops to a given `tolerance`, indicating a drop in information.\n",
"3. Simultaneously, it checks that the CDF exceeds a minimum value, the `threshold`, i.e. at least this fraction of the total information must be retained.\n",
"4. The bit at which the gradient has reached the tolerance while the CDF exceeds the threshold marks the true number of keepbits. All bits beyond this index are assumed to contain artificial information, and their information content is set to zero to cut them off.\n",
"5. This concept is implemented in the function `get_cdf_without_artificial_information` in `xbitinfo.py`.\n",
"\n",
"Please note that this filter relies on a clear separation between real and artificial information content and might not work in all cases."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b0ab6633",
"metadata": {},
"outputs": [],
"source": [
"xb.get_keepbits(\n",
" arti_info,\n",
" inflevel=[0.99],\n",
" information_filter=\"Gradient\",\n",
" threshold=0.7,\n",
" tolerance=0.001,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "21c6369d",
"metadata": {},
"source": [
"With the filter applied, the keepbits are closer to, or identical with, their values in the dataset without artificial information. The plot of the bitwise information visualizes this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e9183b2",
"metadata": {},
"outputs": [],
"source": [
"plot = xb.plot_bitinformation(arti_info, information_filter=\"Gradient\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
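The five steps of the `Gradient` filter described in the notebook can be sketched in plain NumPy. This is a minimal, hypothetical illustration and not xbitinfo's `get_cdf_without_artificial_information`. It makes the following assumptions:

- The function names `gradient_filter_keepbits` and `keepbits_from_cdf` are made up here.
- The input is a 1-D array of bitwise information ordered from the most to the least significant bit.
- The CDF gradient is approximated with `np.diff`.
- The toy information profile is invented to mimic real information followed by artificial information in the trailing bits.

```python
import numpy as np


def gradient_filter_keepbits(info_per_bit, threshold=0.7, tolerance=0.001):
    """Hypothetical sketch of a gradient-based artificial-information filter.

    `info_per_bit` is assumed to be a 1-D array of bitwise information,
    ordered from the most to the least significant bit. Returns the number
    of bits kept and a copy of the input with all later bits set to zero.
    """
    info = np.asarray(info_per_bit, dtype=float)
    # Step 1: normalized cumulative distribution of the information content
    cdf = np.cumsum(info) / info.sum()
    # Step 2: discrete gradient of the CDF, i.e. the share of bit i + 1
    gradient = np.diff(cdf)
    # Steps 3-4: first bit after which the gradient is below the tolerance
    # while the CDF already exceeds the threshold
    keep = info.size  # fallback: no cut-off found, keep every bit
    for i, grad in enumerate(gradient):
        if grad < tolerance and cdf[i] > threshold:
            keep = i + 1
            break
    filtered = info.copy()
    filtered[keep:] = 0.0  # treat trailing bits as artificial information
    return keep, filtered


def keepbits_from_cdf(info_per_bit, inflevel=0.99):
    """Hypothetical helper: bits needed to retain `inflevel` of the information."""
    info = np.asarray(info_per_bit, dtype=float)
    cdf = np.cumsum(info) / info.sum()
    return int(np.argmax(cdf >= inflevel)) + 1


# Invented toy profile: real information decays over the first five bits,
# then artificial information reappears in the trailing bits
info = np.array([1.0, 0.9, 0.6, 0.3, 0.1, 0.0005, 0.0002, 0.2, 0.3, 0.25])
keep, filtered = gradient_filter_keepbits(info, threshold=0.7, tolerance=0.001)
print("keepbits without filter:", keepbits_from_cdf(info, inflevel=0.99))
print("keepbits with filter:", keepbits_from_cdf(filtered, inflevel=0.99))
```

In this toy profile, the unfiltered CDF only reaches 99 % at the last bit. After the cut-off, the same information level is reached within the first five bits. A real implementation also has to deal with sign and exponent bits and with `xarray` objects; this sketch does not attempt that.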
3 changes: 3 additions & 0 deletions docs/environment.yml
@@ -21,6 +21,9 @@ dependencies:
- sphinx-book-theme>=0.1.7
- myst-nb
- numcodecs>=0.10.0
- intake-xarray
- metpy
- s3fs
- pip
- pip:
- -e ../.
11 changes: 5 additions & 6 deletions docs/index.rst
@@ -96,17 +96,16 @@ Credits

quick-start.ipynb


**User Guide**

* :doc:`chunking`
* :doc:`ArtificialInformation_Filter`

.. toctree::
:maxdepth: 1
:hidden:
:caption: User Guide
:maxdepth: 1
:hidden:
:caption: User Guide

chunking.ipynb
ArtificialInformation_Filter.ipynb

**Help & Reference**

3 changes: 3 additions & 0 deletions environment.yml
@@ -28,6 +28,9 @@ dependencies:
- sphinx-book-theme
- myst-nb
- numcodecs>=0.10.0
- pytest-lazy-fixture
- aiohttp
- s3fs
- pip
- pip:
- -e .