add tutorial 8
zain-sohail committed Jun 12, 2024
1 parent 0c1bf90 commit 41e1392
Showing 2 changed files with 386 additions and 2 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/documentation.yml
@@ -76,8 +76,8 @@ jobs:
- name: Move cached executed notebooks
run: python docs/move_changed_notebooks.py

-# - name: Download datasets
-# run: poetry run python docs/download_datasets.py
+- name: Download datasets
+run: poetry run python docs/download_datasets.py

# - name: Build Flash parquet files
# run: |
384 changes: 384 additions & 0 deletions tutorial/8_jittering_tutorial.ipynb
@@ -0,0 +1,384 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "8ad4167a-e4e7-498d-909a-c04da9f177ed",
"metadata": {
"tags": []
},
"source": [
"# Correct use of Jittering\n",
"This tutorial discusses the background and correct use of the jittering/dithering method implemented in the package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fb045e17-fa89-4c11-9d51-7f06e80d96d5",
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import sed\n",
"from sed.dataset import dataset\n",
"\n",
"%matplotlib widget"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "42a6afaa-17dd-4637-ba75-a28c4ead1adf",
"metadata": {},
"source": [
"## Load Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "34f46d54",
"metadata": {},
"outputs": [],
"source": [
"dataset.get(\"WSe2\") # Put in a path to storage with at least 20 GB of free space.\n",
"data_path = dataset.dir # This is the path to the data\n",
"scandir, _ = dataset.subdirs # scandir contains the data, _ contains the calibration files"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1f82054",
"metadata": {},
"outputs": [],
"source": [
"# create sed processor using the config file:\n",
"sp = sed.SedProcessor(folder=scandir, config=\"../sed/config/mpes_example_config.yaml\")"
]
},
{
"cell_type": "markdown",
"id": "fb8c50c6",
"metadata": {},
"source": [
"After loading, the dataframe contains the four columns `X`, `Y`, `t`, and `ADC`, all of which hold integer values. They originate from a time-to-digital converter and correspond to digital \"bins\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50d0a3b3",
"metadata": {},
"outputs": [],
"source": [
"sp.dataframe.head()"
]
},
{
"cell_type": "markdown",
"id": "121cb571",
"metadata": {},
"source": [
"Let's bin these data along the `t` dimension within a small range:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c5675401",
"metadata": {},
"outputs": [],
"source": [
"axes = ['t']\n",
"bins = [150]\n",
"ranges = [[66000, 67000]]\n",
"res01 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n",
"plt.figure()\n",
"res01.plot()"
]
},
{
"cell_type": "markdown",
"id": "de1d411c",
"metadata": {},
"source": [
"We notice some oscillations on top of the data. These are re-binning artefacts that originate from a non-integer number of machine bins per histogram bin (here 1000/150, about 6.67), as we can verify by binning with a different number of steps:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21d89f2c",
"metadata": {},
"outputs": [],
"source": [
"axes = ['t']\n",
"bins = [100]\n",
"ranges = [[66000, 67000]]\n",
"res02 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n",
"plt.figure()\n",
"res01.plot(label=\"6.66/bin\")\n",
"res02.plot(label=\"10/bin\")\n",
"plt.legend()"
]
},
{
"cell_type": "markdown",
"id": "31855799",
"metadata": {},
"source": [
"If we take a very detailed look, with step sizes smaller than one, we see the digital nature of the original data behind this issue:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e95053bf",
"metadata": {},
"outputs": [],
"source": [
"axes = ['t']\n",
"bins = [200]\n",
"ranges = [[66600, 66605]]\n",
"res11 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n",
"plt.figure()\n",
"res11.plot()"
]
},
{
"cell_type": "markdown",
"id": "c2d82124",
"metadata": {},
"source": [
"To mitigate this problem, we can add some randomness to the data and redistribute events into the gaps between bins. This is also termed `dithering` and is known, for example, from image manipulation. The important factor is to add the right amount and the right type of random distribution, so that we end up with a quasi-continuous uniform distribution without losing information.\n",
"\n",
"We can use the `add_jitter` function for this. We can pass it the columns to add jitter to, and the amplitude of a uniform jitter. Importantly, this step should be taken at the very beginning, as the first step before any other dataframe operations are added.\n",
"\n",
"Let's try with a value of 0.2 for the amplitude:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f728aa81",
"metadata": {},
"outputs": [],
"source": [
"df_backup = sp.dataframe\n",
"sp.add_jitter(cols=[\"t\"], amps=[0.2])"
]
},
{
"cell_type": "markdown",
"id": "ae6a2f93",
"metadata": {},
"source": [
"We see that the `t` column is no longer integer-valued:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e60e6c35",
"metadata": {},
"outputs": [],
"source": [
"sp.dataframe.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10e28a62",
"metadata": {},
"outputs": [],
"source": [
"axes = ['t']\n",
"bins = [200]\n",
"ranges = [[66600, 66605]]\n",
"res12 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n",
"plt.figure()\n",
"res11.plot(label=\"not jittered\")\n",
"res12.plot(label=\"amplitude 0.2\")\n",
"plt.legend()"
]
},
{
"cell_type": "markdown",
"id": "dad0f31f",
"metadata": {},
"source": [
"This is clearly not enough jitter to close the gaps. The ideal (and default) amplitude is 0.5, which exactly fills the gaps:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84879f75",
"metadata": {},
"outputs": [],
"source": [
"sp.dataframe = df_backup\n",
"sp.add_jitter(cols=[\"t\"], amps=[0.5])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2fcb6361",
"metadata": {},
"outputs": [],
"source": [
"axes = ['t']\n",
"bins = [200]\n",
"ranges = [[66600, 66605]]\n",
"res13 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n",
"plt.figure()\n",
"res11.plot(label=\"not jittered\")\n",
"res12.plot(label=\"amplitude 0.2\")\n",
"res13.plot(label=\"amplitude 0.5\")\n",
"plt.legend()"
]
},
{
"cell_type": "markdown",
"id": "282994b8",
"metadata": {},
"source": [
"This jittering fills the gaps and produces a continuous uniform distribution. Let's check again the longer-range binning that gave us the oscillations initially:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "801a1e3e",
"metadata": {},
"outputs": [],
"source": [
"axes = ['t']\n",
"bins = [150]\n",
"ranges = [[66000, 67000]]\n",
"res03 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n",
"plt.figure()\n",
"res01.plot(label=\"not jittered\")\n",
"res03.plot(label=\"jittered\")\n",
"plt.legend()"
]
},
{
"cell_type": "markdown",
"id": "bc64928b",
"metadata": {},
"source": [
"Now the artefacts are absent, and they will likewise be absent in any dataframe columns derived from a column jittered in this way. Note that this only applies to data present in digital (i.e. machine-binned) format, and not to data that are intrinsically continuous.\n",
"\n",
"Also note that too large or not well-aligned jittering amplitudes will\n",
"- deteriorate your resolution along the jittered axis\n",
"- not solve the problem entirely:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "26cbbfce",
"metadata": {},
"outputs": [],
"source": [
"sp.dataframe = df_backup\n",
"sp.add_jitter(cols=[\"t\"], amps=[0.7])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "748509e2",
"metadata": {},
"outputs": [],
"source": [
"axes = ['t']\n",
"bins = [200]\n",
"ranges = [[66600, 66605]]\n",
"res14 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n",
"plt.figure()\n",
"res13.plot(label=\"Amplitude 0.5\")\n",
"res14.plot(label=\"Amplitude 0.7\")\n",
"plt.legend()"
]
},
{
"cell_type": "markdown",
"id": "307557a9",
"metadata": {},
"source": [
"If the step size of digitization is different from 1, the corresponding jitter amplitude (half the distance between digitized values) can be adjusted as shown above.\n",
"\n",
"Alternatively, normally distributed noise can be added. It is less sensitive to the exact amplitude, but leads to mixing of neighboring voxels and thus to a loss of resolution. Normally distributed noise is also substantially more computation-intensive to generate. It can nevertheless be helpful in situations where, for example, the step size is non-uniform."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c5603f47",
"metadata": {},
"outputs": [],
"source": [
"sp.dataframe = df_backup\n",
"sp.add_jitter(cols=[\"t\"], amps=[0.7], jitter_type=\"normal\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b64fd59b",
"metadata": {},
"outputs": [],
"source": [
"axes = ['t']\n",
"bins = [200]\n",
"ranges = [[66600, 66605]]\n",
"res15 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n",
"plt.figure()\n",
"res14.plot(label=\"Uniform, Amplitude 0.7\")\n",
"res15.plot(label=\"Normal, Amplitude 0.7\")\n",
"plt.legend()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "48ebf393",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"interpreter": {
"hash": "728003ee06929e5fa5ff815d1b96bf487266025e4b7440930c6bf4536d02d243"
},
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
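
Note on the notebook above: the effect it describes can be reproduced without the WSe2 dataset. The following is a minimal sketch using only NumPy, not the package's own implementation. The synthetic `t_digital` array, the helper names `jitter_values`, `bin_counts` and `relative_spread`, and the event counts are all assumptions made for illustration. It bins integer-valued data with a non-integer number of digital steps per bin, then repeats the binning after adding uniform jitter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the digitized `t` column: integer values spread
# uniformly over a range slightly wider than the binning window, with about
# 2000 events per digital step.
t_digital = rng.integers(65990, 67010, size=2_040_000).astype(float)


def jitter_values(values, amp):
    """Add uniform noise in [-amp, amp) to every value (illustrative helper)."""
    return values + rng.uniform(-amp, amp, size=values.shape)


def bin_counts(values, bins, value_range):
    """Histogram the values into equally sized bins and return the counts."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    return counts


def relative_spread(counts):
    """Standard deviation of the bin counts divided by their mean."""
    return counts.std() / counts.mean()


# Coarse binning: 1000 digital steps into 150 bins, so each bin collects
# either 6 or 7 integer values and the counts oscillate.
coarse = dict(bins=150, value_range=(66000, 67000))
spread_raw = relative_spread(bin_counts(t_digital, **coarse))
spread_jittered = relative_spread(bin_counts(jitter_values(t_digital, 0.5), **coarse))

# Fine binning: 200 bins over 5 digital steps exposes the gaps between them.
fine = dict(bins=200, value_range=(66600, 66605))
empty_02 = np.mean(bin_counts(jitter_values(t_digital, 0.2), **fine) == 0)
empty_05 = np.mean(bin_counts(jitter_values(t_digital, 0.5), **fine) == 0)

print(f"coarse relative spread, raw:      {spread_raw:.3f}")
print(f"coarse relative spread, jittered: {spread_jittered:.3f}")
print(f"fraction of empty fine bins, amplitude 0.2: {empty_02:.2f}")
print(f"fraction of empty fine bins, amplitude 0.5: {empty_05:.2f}")
```

Under these assumptions, an amplitude of 0.2 spreads each digital value over only 0.4 of a step, so a large share of the fine bins stays empty, while an amplitude of 0.5 covers exactly one full step per digital value and leaves no gaps. In the coarse binning, the remaining spread after jittering should be dominated by counting noise rather than by the 6-versus-7 pattern.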
