Commit
1 parent 0c1bf90 commit 41e1392
Showing 2 changed files with 386 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,384 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "8ad4167a-e4e7-498d-909a-c04da9f177ed", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"source": [ | ||
"# Correct use of Jittering\n", | ||
"This tutorial discusses the background and correct use of the jittering/dithering method (test) implemented in the package." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "fb045e17-fa89-4c11-9d51-7f06e80d96d5", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%load_ext autoreload\n", | ||
"%autoreload 2\n", | ||
"import matplotlib.pyplot as plt\n", | ||
"\n", | ||
"import sed\n", | ||
"from sed.dataset import dataset\n", | ||
"\n", | ||
"%matplotlib widget" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "42a6afaa-17dd-4637-ba75-a28c4ead1adf", | ||
"metadata": {}, | ||
"source": [ | ||
"## Load Data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "34f46d54", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"dataset.get(\"WSe2\") # Provide a path to a storage location with at least 20 GB of free space.\n", | ||
"data_path = dataset.dir # This is the path to the data\n", | ||
"scandir, _ = dataset.subdirs # scandir contains the data, _ contains the calibration files" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f1f82054", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# create sed processor using the config file:\n", | ||
"sp = sed.SedProcessor(folder=scandir, config=\"../sed/config/mpes_example_config.yaml\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "fb8c50c6", | ||
"metadata": {}, | ||
"source": [ | ||
"After loading, the dataframe contains the four columns `X`, `Y`, `t`, `ADC`, all of which hold integer values. They originate from a time-to-digital converter and correspond to digital \"bins\"." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "50d0a3b3", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"sp.dataframe.head()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "121cb571", | ||
"metadata": {}, | ||
"source": [ | ||
"Let's bin these data along the `t` dimension within a small range:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "c5675401", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"axes = ['t']\n", | ||
"bins = [150]\n", | ||
"ranges = [[66000, 67000]]\n", | ||
"res01 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", | ||
"plt.figure()\n", | ||
"res01.plot()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "de1d411c", | ||
"metadata": {}, | ||
"source": [ | ||
"We notice some oscillations on top of the data. These are re-binning artefacts, originating from a non-integer number of machine bins per bin, as we can verify by binning with a different number of steps:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "21d89f2c", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"axes = ['t']\n", | ||
"bins = [100]\n", | ||
"ranges = [[66000, 67000]]\n", | ||
"res02 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", | ||
"plt.figure()\n", | ||
"res01.plot(label=\"6.67/bin\")\n", | ||
"res02.plot(label=\"10/bin\")\n", | ||
"plt.legend()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "31855799", | ||
"metadata": {}, | ||
"source": [ | ||
"If we take a very detailed look, with step sizes smaller than one, we see the digital nature of the original data behind this issue:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "e95053bf", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"axes = ['t']\n", | ||
"bins = [200]\n", | ||
"ranges = [[66600, 66605]]\n", | ||
"res11 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", | ||
"plt.figure()\n", | ||
"res11.plot()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c2d82124", | ||
"metadata": {}, | ||
"source": [ | ||
"To mitigate this problem, we can add some randomness to the data and redistribute events into the gaps between bins. This is also termed `dithering` and is known, e.g., from image manipulation. The important factor is to add the right amount and the right type of random distribution, so that we end up with a quasi-continuous uniform distribution without losing information.\n", | ||
"\n", | ||
"We can use the `add_jitter` function for this. We can pass it the columns to add jitter to, and the amplitude of a uniform jitter. Importantly, this step should be taken at the very beginning, as the first step before any other dataframe operations are added.\n", | ||
"\n", | ||
"Let's try with a value of 0.2 for the amplitude:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f728aa81", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"df_backup = sp.dataframe\n", | ||
"sp.add_jitter(cols=[\"t\"], amps=[0.2])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "ae6a2f93", | ||
"metadata": {}, | ||
"source": [ | ||
"We see that the `t` column is no longer integer-valued:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "e60e6c35", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"sp.dataframe.head()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "10e28a62", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"axes = ['t']\n", | ||
"bins = [200]\n", | ||
"ranges = [[66600, 66605]]\n", | ||
"res12 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", | ||
"plt.figure()\n", | ||
"res11.plot(label=\"not jittered\")\n", | ||
"res12.plot(label=\"amplitude 0.2\")\n", | ||
"plt.legend()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "dad0f31f", | ||
"metadata": {}, | ||
"source": [ | ||
"This is clearly not enough jitter to close the gaps. The ideal (and default) amplitude is 0.5, which exactly fills the gaps:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "84879f75", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"sp.dataframe = df_backup\n", | ||
"sp.add_jitter(cols=[\"t\"], amps=[0.5])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "2fcb6361", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"axes = ['t']\n", | ||
"bins = [200]\n", | ||
"ranges = [[66600, 66605]]\n", | ||
"res13 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", | ||
"plt.figure()\n", | ||
"res11.plot(label=\"not jittered\")\n", | ||
"res12.plot(label=\"amplitude 0.2\")\n", | ||
"res13.plot(label=\"amplitude 0.5\")\n", | ||
"plt.legend()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "282994b8", | ||
"metadata": {}, | ||
"source": [ | ||
"This jittering fills the gaps and produces a continuous uniform distribution. Let's check again the longer-range binning that gave us the oscillations initially:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "801a1e3e", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"axes = ['t']\n", | ||
"bins = [150]\n", | ||
"ranges = [[66000, 67000]]\n", | ||
"res03 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", | ||
"plt.figure()\n", | ||
"res01.plot(label=\"not jittered\")\n", | ||
"res03.plot(label=\"jittered\")\n", | ||
"plt.legend()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "bc64928b", | ||
"metadata": {}, | ||
"source": [ | ||
"Now the artefacts are absent, and they will likewise be absent from any dataframe column derived from a column jittered in this way. Note that this only applies to data present in digital (i.e. machine-binned) format, and not to data that are intrinsically continuous.\n", | ||
"\n", | ||
"Also note that jittering amplitudes that are too large or not well aligned will\n", | ||
"- deteriorate your resolution along the jittered axis\n", | ||
"- not solve the problem entirely:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "26cbbfce", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"sp.dataframe = df_backup\n", | ||
"sp.add_jitter(cols=[\"t\"], amps=[0.7])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "748509e2", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"axes = ['t']\n", | ||
"bins = [200]\n", | ||
"ranges = [[66600, 66605]]\n", | ||
"res14 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", | ||
"plt.figure()\n", | ||
"res13.plot(label=\"Amplitude 0.5\")\n", | ||
"res14.plot(label=\"Amplitude 0.7\")\n", | ||
"plt.legend()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "307557a9", | ||
"metadata": {}, | ||
"source": [ | ||
"If the step size of digitization is different from 1, the jitter amplitude (half the distance between digitized values) can be adjusted accordingly, as shown above.\n", | ||
"\n", | ||
"Alternatively, normally distributed noise can be added, which is less sensitive to the exact amplitude, but leads to mixing of neighboring voxels and thus to a loss of resolution. Normally distributed noise is also substantially more computation-intensive to generate. It can nevertheless be helpful in situations where, e.g., the step size is non-uniform." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "c5603f47", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"sp.dataframe = df_backup\n", | ||
"sp.add_jitter(cols=[\"t\"], amps=[0.7], jitter_type=\"normal\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b64fd59b", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"axes = ['t']\n", | ||
"bins = [200]\n", | ||
"ranges = [[66600, 66605]]\n", | ||
"res15 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", | ||
"plt.figure()\n", | ||
"res14.plot(label=\"Uniform, Amplitude 0.7\")\n", | ||
"res15.plot(label=\"Normal, Amplitude 0.7\")\n", | ||
"plt.legend()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "48ebf393", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"interpreter": { | ||
"hash": "728003ee06929e5fa5ff815d1b96bf487266025e4b7440930c6bf4536d02d243" | ||
}, | ||
"kernelspec": { | ||
"display_name": "python3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
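The effect described in the notebook can be reproduced without the WSe2 dataset. The following is a minimal sketch using only NumPy and synthetic data; it does not call `add_jitter`, and it assumes that "uniform jitter with amplitude 0.5" means adding noise drawn uniformly from [-0.5, 0.5). The helper names `bin_counts` and `relative_spread` are hypothetical and exist only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for a digitized column: integer-valued, flat distribution.
# The generated range is wider than the binning range to avoid edge effects.
t = rng.integers(65900, 67100, size=1_200_000).astype(float)


def bin_counts(values, n_bins, lo=66000.0, hi=67000.0):
    """Histogram `values` into `n_bins` equal-width bins on [lo, hi]."""
    counts, _ = np.histogram(values, bins=n_bins, range=(lo, hi))
    return counts


def relative_spread(counts):
    """Standard deviation of the bin counts divided by their mean."""
    return counts.std() / counts.mean()


# 1000 machine bins / 150 bins = 6.67 machine bins per bin: each bin holds
# either 6 or 7 integer values, which shows up as an oscillation.
raw_150 = bin_counts(t, 150)

# 1000 machine bins / 100 bins = 10 machine bins per bin: no oscillation.
raw_100 = bin_counts(t, 100)

# Uniform jitter with amplitude 0.5 (assumed meaning: noise in [-0.5, 0.5)).
t_jittered = t + rng.uniform(-0.5, 0.5, size=t.size)
jit_150 = bin_counts(t_jittered, 150)

print("150 bins, raw:     ", relative_spread(raw_150))
print("100 bins, raw:     ", relative_spread(raw_100))
print("150 bins, jittered:", relative_spread(jit_150))
```

In this sketch, the relative spread of the un-jittered 150-bin histogram should come out several times larger than that of the other two, which are limited by counting statistics alone.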