diff --git a/.github/workflows/documentation.yml b/.github/workflows/documentation.yml index 421789ff..867c8628 100644 --- a/.github/workflows/documentation.yml +++ b/.github/workflows/documentation.yml @@ -76,8 +76,8 @@ jobs: - name: Move cached executed notebooks run: python docs/move_changed_notebooks.py - - name: Download datasets - run: poetry run python docs/download_datasets.py + # - name: Download datasets + # run: poetry run python docs/download_datasets.py # - name: Build Flash parquet files # run: | @@ -96,7 +96,14 @@ jobs: docs/tutorial/*ipynb docs/tutorial/tutorial_checksums.txt key: ${{ runner.os }}-docs-build-${{ hashFiles('docs/tutorial/tutorial_checksums.txt') }} - + # upload the cache as artifact + - name: Upload cache + uses: actions/upload-artifact@v2 + with: + path: | + _build/ + docs/tutorial/*ipynb + docs/tutorial/tutorial_checksums.txt - name: Setup Pages uses: actions/configure-pages@v4 diff --git a/tutorial/2_conversion_pipeline_for_example_time-resolved_ARPES_data.ipynb b/tutorial/2_conversion_pipeline_for_example_time-resolved_ARPES_data.ipynb deleted file mode 100644 index af695478..00000000 --- a/tutorial/2_conversion_pipeline_for_example_time-resolved_ARPES_data.ipynb +++ /dev/null @@ -1,669 +0,0 @@ -{ - "cells": [ - { - "attachments": {}, - "cell_type": "markdown", - "id": "8ad4167a-e4e7-498d-909a-c04da9f177ed", - "metadata": { - "tags": [] - }, - "source": [ - "# Demonstration of the conversion pipeline using time-resolved ARPES data stored on Zenodo\n", - "In this example, we pull some time-resolved ARPES data from Zenodo, and load it into the sed package using functions of the mpes package. Then, we run a conversion pipeline on it, containing steps for visualizing the channels, correcting image distortions, calibrating the momentum space, correcting for energy distortions and calibrating the energy axis. Finally, the data are binned in calibrated axes.\n", - "For performance reasons, best store the data on a locally attached storage (no network drive). This can also be achieved transparently using the included MirrorUtil class." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fb045e17-fa89-4c11-9d51-7f06e80d96d5", - "metadata": {}, - "outputs": [], - "source": [ - "%load_ext autoreload\n", - "%autoreload 2\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "import sed\n", - "from sed.dataset import dataset\n", - "\n", - "%matplotlib widget" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "42a6afaa-17dd-4637-ba75-a28c4ead1adf", - "metadata": {}, - "source": [ - "## Load Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "34f46d54", - "metadata": {}, - "outputs": [], - "source": [ - "dataset.get(\"WSe2\") # Put in Path to a storage of at least 20 Gbyte free space.\n", - "data_path = dataset.dir # This is the path to the data\n", - "scandir, caldir = dataset.subdirs # scandir contains the data, caldir contains the calibration files" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f1f82054", - "metadata": {}, - "outputs": [], - "source": [ - "# create sed processor using the config file:\n", - "sp = sed.SedProcessor(folder=scandir, config=\"../sed/config/mpes_example_config.yaml\", verbose=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "50d0a3b3", - "metadata": {}, - "outputs": [], - "source": [ - "# Apply jittering to X, Y, t, ADC columns.\n", - "# Columns are defined in the config, or can be provided as list.\n", - "sp.add_jitter()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0a0d336b", - "metadata": {}, - "outputs": [], - "source": [ - "# Plot of the count rate through the scan\n", - "rate, secs = sp.loader.get_count_rate(range(100))\n", - "plt.plot(secs, rate)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "dfb42777", - "metadata": {}, - "outputs": [], - "source": [ - "# The time elapsed in the scan\n", - "sp.loader.get_elapsed_time()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fb074f29", - "metadata": {}, - "outputs": [], - "source": [ - "# Inspect data in dataframe Columns:\n", - "# axes = ['X', 'Y', 't', 'ADC']\n", - "# bins = [100, 100, 100, 100]\n", - "# ranges = [(0, 1800), (0, 1800), (130000, 140000), (0, 9000)]\n", - "# sp.view_event_histogram(dfpid=1, axes=axes, bins=bins, ranges=ranges)\n", - "sp.view_event_histogram(dfpid=2)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "70aa4343", - "metadata": {}, - "source": [ - "## Distortion correction and Momentum Calibration workflow\n", - "### Distortion correction\n", - "#### 1. step: \n", - "Bin and load part of the dataframe in detector coordinates, and choose energy plane where high-symmetry points can well be identified. Either use the interactive tool, or pre-select the range:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "76bf8aad", - "metadata": {}, - "outputs": [], - "source": [ - "#sp.bin_and_load_momentum_calibration(df_partitions=20, plane=170)\n", - "sp.bin_and_load_momentum_calibration(df_partitions=100, plane=33, width=10, apply=True)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "fee3ca76", - "metadata": {}, - "source": [ - "#### 2. Step:\n", - "Next, we select a number of features corresponding to the rotational symmetry of the material, plus the center. These can either be auto-detected (for well-isolated points), or provided as a list (these can be read-off the graph in the cell above).\n", - "These are then symmetrized according to the rotational symmetry, and a spline-warping correction for the x/y coordinates is calculated, which corrects for any geometric distortions from the perfect n-fold rotational symmetry." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fd9666c5", - "metadata": {}, - "outputs": [], - "source": [ - "#features = np.array([[203.2, 341.96], [299.16, 345.32], [350.25, 243.70], [304.38, 149.88], [199.52, 152.48], [154.28, 242.27], [248.29, 248.62]])\n", - "#sp.define_features(features=features, rotation_symmetry=6, include_center=True, apply=True)\n", - "# Manual selection: Use a GUI tool to select peaks:\n", - "#sp.define_features(rotation_symmetry=6, include_center=True)\n", - "# Autodetect: Uses the DAOStarFinder routine to locate maxima.\n", - "# Parameters are:\n", - "# fwhm: Full-width at half maximum of peaks.\n", - "# sigma: Number of standard deviations above the mean value of the image peaks must have.\n", - "# sigma_radius: number of standard deviations around a peak that peaks are fitted\n", - "sp.define_features(rotation_symmetry=6, auto_detect=True, include_center=True, fwhm=10, sigma=12, sigma_radius=4, apply=True)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "f7519ff8", - "metadata": {}, - "source": [ - "#### 3. Step: \n", - "Generate nonlinear correction using splinewarp algorithm. If no landmarks have been defined in previous step, default parameters from the config are used" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d27cd7a4", - "metadata": {}, - "outputs": [], - "source": [ - "# Option whether a central point shall be fixed in the determiantion fo the correction\n", - "sp.generate_splinewarp(include_center=True)" - ] - }, - { - "cell_type": "markdown", - "id": "4211ac21", - "metadata": {}, - "source": [ - "#### Optional (Step 3a): \n", - "Save distortion correction parameters to configuration file in current data folder: " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7f32988f", - "metadata": {}, - "outputs": [], - "source": [ - "# Save generated distortion correction parameters for later reuse\n", - "sp.save_splinewarp()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "b5e69ffa", - "metadata": {}, - "source": [ - "#### 4. Step:\n", - "To adjust scaling, position and orientation of the corrected momentum space image, you can apply further affine transformations to the distortion correction field. Here, first a postential scaling is applied, next a translation, and finally a rotation around the center of the image (defined via the config). One can either use an interactive tool, or provide the adjusted values and apply them directly." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "62abfa41", - "metadata": {}, - "outputs": [], - "source": [ - "#sp.pose_adjustment(xtrans=14, ytrans=18, angle=2)\n", - "sp.pose_adjustment(xtrans=8, ytrans=7, angle=-4, apply=True)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "a78a68e9", - "metadata": {}, - "source": [ - "#### 5. Step:\n", - "Finally, the momentum correction is applied to the dataframe, and corresponding meta data are stored" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "845f002d", - "metadata": {}, - "outputs": [], - "source": [ - "sp.apply_momentum_correction()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "d9810488", - "metadata": {}, - "source": [ - "### Momentum calibration workflow\n", - "#### 1. Step:\n", - "First, the momentum scaling needs to be calibtrated. Either, one can provide the coordinates of one point outside the center, and provide its distane to the Brillouin zone center (which is assumed to be located in the center of the image), one can specify two points on the image and their distance (where the 2nd point marks the BZ center),or one can provide absolute k-coordinates of two distinct momentum points.\n", - "\n", - "If no points are provided, an interactive tool is created. Here, left mouse click selectes the off-center point (brillouin_zone_cetnered=True) or toggle-selects the off-center and center point." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a358f07d", - "metadata": {}, - "outputs": [], - "source": [ - "k_distance = 2/np.sqrt(3)*np.pi/3.28 # k-distance of the K-point in a hexagonal Brilloiun zone\n", - "#sp.calibrate_momentum_axes(k_distance = k_distance)\n", - "point_a = [308, 345]\n", - "sp.calibrate_momentum_axes(point_a=point_a, k_distance = k_distance, apply=True)\n", - "#point_b = [247, 249]\n", - "#sp.calibrate_momentum_axes(point_a=point_a, point_b = point_b, k_coord_a = [.5, 1.1], k_coord_b = [0, 0], equiscale=False)" - ] - }, - { - "cell_type": "markdown", - "id": "1a3697b1", - "metadata": {}, - "source": [ - "#### Optional (Step 1a): \n", - "Save momentum calibration parameters to configuration file in current data folder: " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "86bedfa7", - "metadata": {}, - "outputs": [], - "source": [ - "# Save generated momentum calibration parameters for later reuse\n", - "sp.save_momentum_calibration()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "c2f8a513", - "metadata": {}, - "source": [ - "#### 2. Step:\n", - "Now, the distortion correction and momentum calibration needs to be applied to the dataframe." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f9ae5066", - "metadata": {}, - "outputs": [], - "source": [ - "sp.apply_momentum_calibration()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "0bce2388", - "metadata": {}, - "source": [ - "## Energy Correction (optional)\n", - "The purpose of the energy correction is to correct for any momentum-dependent distortion of the energy axis, e.g. from geometric effects in the flight tube, or from space charge" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "5289de59", - "metadata": {}, - "source": [ - "#### 1st step:\n", - "Here, one can select the functional form to be used, and adjust its parameters. The binned data used for the momentum calibration is plotted around the Fermi energy (defined by tof_fermi), and the correction function is plotted ontop. Possible correction functions are: \"sperical\" (parameter: diameter), \"Lorentzian\" (parameter: gamma), \"Gaussian\" (parameter: sigma), and \"Lorentzian_asymmetric\" (parameters: gamma, amplitude2, gamma2).\n", - "\n", - "One can either use an interactive alignment tool, or provide parameters directly." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "72c6c8f9", - "metadata": {}, - "outputs": [], - "source": [ - "#sp.adjust_energy_correction(amplitude=2.5, center=(730, 730), gamma=920, tof_fermi = 66200)\n", - "sp.adjust_energy_correction(amplitude=2.5, center=(730, 730), gamma=920, tof_fermi = 66200, apply=True)" - ] - }, - { - "cell_type": "markdown", - "id": "e43fbf33", - "metadata": {}, - "source": [ - "#### Optional (Step 1a): \n", - "Save energy correction parameters to configuration file in current data folder: " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7699e690", - "metadata": {}, - "outputs": [], - "source": [ - "# Save generated energy correction parameters for later reuse\n", - "sp.save_energy_correction()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "41a6a3e6", - "metadata": {}, - "source": [ - "#### 2. Step\n", - "After adjustment, the energy correction is directly applied to the TOF axis." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "eb1e2bee", - "metadata": {}, - "outputs": [], - "source": [ - "sp.apply_energy_correction()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "8b571b4c", - "metadata": {}, - "source": [ - "## 3. Energy calibration\n", - "For calibrating the energy axis, a set of data taken at different bias voltages around the value where the measurement was taken is required." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "6bc28642", - "metadata": {}, - "source": [ - "#### 1. Step:\n", - "In a first step, the data are loaded, binned along the TOF dimension, and normalized. The used bias voltages can be either provided, or read from attributes in the source files if present." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "9f44a586", - "metadata": {}, - "outputs": [], - "source": [ - "# Load energy calibration EDCs\n", - "energycalfolder = caldir\n", - "scans = np.arange(1,12)\n", - "voltages = np.arange(12,23,1)\n", - "files = [energycalfolder + r'/Scan' + str(num).zfill(3) + '_' + str(num+11) + '.h5' for num in scans]\n", - "sp.load_bias_series(data_files=files, normalize=True, biases=voltages, ranges=[(64000, 75000)])" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "314a79c8", - "metadata": {}, - "source": [ - "#### 2. Step:\n", - "Next, the same peak or feature needs to be selected in each curve. For this, one needs to define \"ranges\" for each curve, within which the peak of interest is located. One can either provide these ranges manually, or provide one range for a \"reference\" curve, and infer the ranges for the other curves using a dynamic time warping algorithm." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2f843244", - "metadata": {}, - "outputs": [], - "source": [ - "# Option 1 = specify the ranges containing a common feature (e.g an equivalent peak) for all bias scans\n", - "# rg = [(129031.03103103103, 129621.62162162163), (129541.54154154155, 130142.14214214214), (130062.06206206206, 130662.66266266267), (130612.61261261262, 131213.21321321322), (131203.20320320321, 131803.8038038038), (131793.7937937938, 132384.38438438438), (132434.43443443443, 133045.04504504506), (133105.10510510512, 133715.71571571572), (133805.8058058058, 134436.43643643643), (134546.54654654654, 135197.1971971972)]\n", - "# sp.find_bias_peaks(ranges=rg, infer_others=False)\n", - "# Option 2 = specify the range for one curve and infer the others\n", - "# This will open an interactive tool to select the correct ranges for the curves.\n", - "# IMPORTANT: Don't choose the range too narrow about a peak, and choose a refid\n", - "# somewhere in the middle or towards larger biases!\n", - "rg = (66100, 67000)\n", - "sp.find_bias_peaks(ranges=rg, ref_id=5, infer_others=True, apply=True)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "b2638818", - "metadata": {}, - "source": [ - "#### 3. Step:\n", - "Next, the detected peak positions and bias voltages are used to determine the calibration function. This can be either done by fitting the functional form d^2/(t-t0)^2 via lmfit (\"lmfit\"), or using a polynomial approxiamtion (\"lstsq\" or \"lsqr\"). Here, one can also define a reference id, and a reference energy. Those define the absolute energy position of the feature used for calibration in the \"reference\" trace, at the bias voltage where the final measurement has been performed. The energy scale can be either \"kinetic\" (decreasing energy with increasing TOF), or \"binding\" (increasing energy with increasing TOF).\n", - "\n", - "After calculating the calibration, all traces corrected with the calibration are plotted ontop of each other, the calibration function together with the extracted features is plotted." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "22e15f5d", - "metadata": {}, - "outputs": [], - "source": [ - "# use the refid of the bias that the measurement was taken at\n", - "# Eref can be used to set the absolute energy (kinetic energy, E-EF) of the feature used for energy calibration (if known)\n", - "refid=4\n", - "Eref=-0.5\n", - "# the lmfit method uses a fit of (d/(t-t0))**2 to determine the energy calibration\n", - "# limits and starting values for the fitting parameters can be provided as dictionaries\n", - "sp.calibrate_energy_axis(\n", - " ref_id=refid,\n", - " ref_energy=Eref,\n", - " method=\"lmfit\",\n", - " energy_scale='kinetic',\n", - " d={'value':1.0,'min': .7, 'max':1.2, 'vary':True},\n", - " t0={'value':8e-7, 'min': 1e-7, 'max': 1e-6, 'vary':True},\n", - " E0={'value': 0., 'min': -100, 'max': 0, 'vary': True},\n", - " verbose=True,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "id": "df63c6c7", - "metadata": {}, - "source": [ - "#### Optional (Step 3a): \n", - "Save energy calibration parameters to configuration file in current data folder: " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b870293c", - "metadata": {}, - "outputs": [], - "source": [ - "# Save generated energy calibration parameters for later reuse\n", - "sp.save_energy_calibration()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "563709c7", - "metadata": {}, - "source": [ - "#### 4. Step:\n", - "Finally, the the energy axis is added to the dataframe." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c470ffd9", - "metadata": {}, - "outputs": [], - "source": [ - "sp.append_energy_axis()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "b2d8cdf9", - "metadata": {}, - "source": [ - "## 4. Delay calibration:\n", - "The delay axis is calculated from the ADC input column based on the provided delay range. ALternatively, the delay scan range can also be extracted from attributes inside a source file, if present." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0943d349", - "metadata": {}, - "outputs": [], - "source": [ - "#from pathlib import Path\n", - "#datafile = \"file.h5\"\n", - "#print(datafile)\n", - "#sp.calibrate_delay_axis(datafile=datafile)\n", - "delay_range = (-500, 1500)\n", - "sp.calibrate_delay_axis(delay_range=delay_range, preview=True)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "d9d0b018", - "metadata": {}, - "source": [ - "## 5. Visualization of calibrated histograms\n", - "With all calibrated axes present in the dataframe, we can visualize the corresponding histograms, and determine the respective binning ranges" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c330da64", - "metadata": {}, - "outputs": [], - "source": [ - "axes = ['kx', 'ky', 'energy', 'delay']\n", - "ranges = [[-3, 3], [-3, 3], [-6, 2], [-600, 1600]]\n", - "sp.view_event_histogram(dfpid=1, axes=axes, ranges=ranges)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "6902fd56-1456-4da6-83a4-0f3f6b831eb6", - "metadata": {}, - "source": [ - "## Define the binning ranges and compute calibrated data volume" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a7601cd7-cd51-40a9-8fc7-8b7d32ff15d0", - "metadata": {}, - "outputs": [], - "source": [ - "axes = ['kx', 'ky', 'energy', 'delay']\n", - "bins = [100, 100, 200, 50]\n", - "ranges = [[-2, 2], [-2, 2], [-4, 2], [-600, 1600]]\n", - "res = sp.compute(bins=bins, axes=axes, ranges=ranges, normalize_to_acquisition_time=\"delay\")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "523794dc", - "metadata": {}, - "source": [ - "## Some visualization:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "99d7d136-b677-4c16-bc8f-31ba8216579c", - "metadata": {}, - "outputs": [], - "source": [ - "fig, axs = plt.subplots(4, 1, figsize=(6, 18), constrained_layout=True)\n", - "res.loc[{'energy':slice(-.1, 0)}].sum(axis=(2,3)).T.plot(ax=axs[0])\n", - "res.loc[{'kx':slice(-.8, -.5)}].sum(axis=(0,3)).T.plot(ax=axs[1])\n", - "res.loc[{'ky':slice(-.2, .2)}].sum(axis=(1,3)).T.plot(ax=axs[2])\n", - "res.loc[{'kx':slice(-.8, -.5), 'energy':slice(.5, 2)}].sum(axis=(0,1)).plot(ax=axs[3])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "596a3217", - "metadata": {}, - "outputs": [], - "source": [ - "fig, ax = plt.subplots(1,1)\n", - "(sp._normalization_histogram*90000).plot(ax=ax)\n", - "sp._binned.sum(axis=(0,1,2)).plot(ax=ax)\n", - "plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "05488944", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "interpreter": { - "hash": "728003ee06929e5fa5ff815d1b96bf487266025e4b7440930c6bf4536d02d243" - }, - "kernelspec": { - "display_name": "python3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.6" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/tutorial/8_jittering_tutorial.ipynb b/tutorial/8_jittering_tutorial.ipynb deleted file mode 100644 index c2788a4b..00000000 --- a/tutorial/8_jittering_tutorial.ipynb +++ /dev/null @@ -1,384 +0,0 @@ -{ - "cells": [ - { - "attachments": {}, - "cell_type": "markdown", - "id": "8ad4167a-e4e7-498d-909a-c04da9f177ed", - "metadata": { - "tags": [] - }, - "source": [ - "# Correct use of Jittering\n", - "This tutorial discusses the background and correct use of the jittering/dithering method implemented in the package." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fb045e17-fa89-4c11-9d51-7f06e80d96d5", - "metadata": {}, - "outputs": [], - "source": [ - "%load_ext autoreload\n", - "%autoreload 2\n", - "import matplotlib.pyplot as plt\n", - "\n", - "import sed\n", - "from sed.dataset import dataset\n", - "\n", - "%matplotlib widget" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "42a6afaa-17dd-4637-ba75-a28c4ead1adf", - "metadata": {}, - "source": [ - "## Load Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "34f46d54", - "metadata": {}, - "outputs": [], - "source": [ - "dataset.get(\"WSe2\") # Put in Path to a storage of at least 20 Gbyte free space.\n", - "data_path = dataset.dir # This is the path to the data\n", - "scandir, _ = dataset.subdirs # scandir contains the data, _ contains the calibration files" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f1f82054", - "metadata": {}, - "outputs": [], - "source": [ - "# create sed processor using the config file:\n", - "sp = sed.SedProcessor(folder=scandir, config=\"../sed/config/mpes_example_config.yaml\")" - ] - }, - { - "cell_type": "markdown", - "id": "fb8c50c6", - "metadata": {}, - "source": [ - "After loading, the dataframe contains the four columns `X`, `Y`, `t`, `ADC`, which have all integer values. They originate from a time-to-digital converter, and correspond to digital \"bins\"." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "50d0a3b3", - "metadata": {}, - "outputs": [], - "source": [ - "sp.dataframe.head()" - ] - }, - { - "cell_type": "markdown", - "id": "121cb571", - "metadata": {}, - "source": [ - "Let's bin these data along the `t` dimension within a small range:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c5675401", - "metadata": {}, - "outputs": [], - "source": [ - "axes = ['t']\n", - "bins = [150]\n", - "ranges = [[66000, 67000]]\n", - "res01 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", - "plt.figure()\n", - "res01.plot()" - ] - }, - { - "cell_type": "markdown", - "id": "de1d411c", - "metadata": {}, - "source": [ - "We notice some oscillation ontop of the data. These are re-binning artefacts, originating from a non-integer number of machine-bins per bin, as we can verify by binning with a different number of steps:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "21d89f2c", - "metadata": {}, - "outputs": [], - "source": [ - "axes = ['t']\n", - "bins = [100]\n", - "ranges = [[66000, 67000]]\n", - "res02 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", - "plt.figure()\n", - "res01.plot(label=\"6.66/bin\")\n", - "res02.plot(label=\"10/bin\")\n", - "plt.legend()" - ] - }, - { - "cell_type": "markdown", - "id": "31855799", - "metadata": {}, - "source": [ - "If we have a very detailed look, with step-sizes smaller than one, we see the digital nature of the original data behind this issue:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e95053bf", - "metadata": {}, - "outputs": [], - "source": [ - "axes = ['t']\n", - "bins = [200]\n", - "ranges = [[66600, 66605]]\n", - "res11 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", - "plt.figure()\n", - "res11.plot()" - ] - }, - { - "cell_type": "markdown", - "id": "c2d82124", - "metadata": {}, - "source": [ - "To mitigate this problem, we can add some randomness to the data, and re-distribute events into the gaps in-between bins. This is also termed `dithering` and e.g. known from image manipulation. The important factor is to add the right amount and right type of random distribution, to end up at a quasi-continous uniform distribution, but not lose information.\n", - "\n", - "We can use the add_jitter function for this. We can pass it the colums to add jitter to, and the amplitude of a uniform jitter. Importantly, this step should be taken in the very beginning as first step before any dataframe operations are added.\n", - "\n", - "Let's try with a value of 0.2 for the amplitude:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f728aa81", - "metadata": {}, - "outputs": [], - "source": [ - "df_backup = sp.dataframe\n", - "sp.add_jitter(cols=[\"t\"], amps=[0.2])" - ] - }, - { - "cell_type": "markdown", - "id": "ae6a2f93", - "metadata": {}, - "source": [ - "We see that the `t` column is no longer integer-valued:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e60e6c35", - "metadata": {}, - "outputs": [], - "source": [ - "sp.dataframe.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "10e28a62", - "metadata": {}, - "outputs": [], - "source": [ - "axes = ['t']\n", - "bins = [200]\n", - "ranges = [[66600, 66605]]\n", - "res12 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", - "plt.figure()\n", - "res11.plot(label=\"not jittered\")\n", - "res12.plot(label=\"amplitude 0.2\")\n", - "plt.legend()" - ] - }, - { - "cell_type": "markdown", - "id": "dad0f31f", - "metadata": {}, - "source": [ - "This is clearly not enough jitter to close the gaps. The ideal (and default) amplitude is 0.5, which exactly fills the gaps:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "84879f75", - "metadata": {}, - "outputs": [], - "source": [ - "sp.dataframe = df_backup\n", - "sp.add_jitter(cols=[\"t\"], amps=[0.5])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2fcb6361", - "metadata": {}, - "outputs": [], - "source": [ - "axes = ['t']\n", - "bins = [200]\n", - "ranges = [[66600, 66605]]\n", - "res13 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", - "plt.figure()\n", - "res11.plot(label=\"not jittered\")\n", - "res12.plot(label=\"amplitude 0.2\")\n", - "res13.plot(label=\"amplitude 0.5\")\n", - "plt.legend()" - ] - }, - { - "cell_type": "markdown", - "id": "282994b8", - "metadata": {}, - "source": [ - "This jittering fills the gaps, and produces a continous uniform distribution. Let's check again the longer-range binning that gave us the oscillations initially:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "801a1e3e", - "metadata": {}, - "outputs": [], - "source": [ - "axes = ['t']\n", - "bins = [150]\n", - "ranges = [[66000, 67000]]\n", - "res03 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", - "plt.figure()\n", - "res01.plot(label=\"not jittered\")\n", - "res03.plot(label=\"jittered\")\n", - "plt.legend()" - ] - }, - { - "cell_type": "markdown", - "id": "bc64928b", - "metadata": {}, - "source": [ - "Now, the artefacts are absent, and similarly will they be in any dataframe columns derived from a column jittered in such a way. Note that this only applies to data present in digital (i.e. machine-binned) format, and not to data that are intrinsically continous. \n", - "\n", - "Also note that too large or not well-aligned jittering amplitudes will\n", - "- deteriorate your resolution along the jittered axis\n", - "- will not solve the problem entirely:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "26cbbfce", - "metadata": {}, - "outputs": [], - "source": [ - "sp.dataframe = df_backup\n", - "sp.add_jitter(cols=[\"t\"], amps=[0.7])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "748509e2", - "metadata": {}, - "outputs": [], - "source": [ - "axes = ['t']\n", - "bins = [200]\n", - "ranges = [[66600, 66605]]\n", - "res14 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", - "plt.figure()\n", - "res13.plot(label=\"Amplitude 0.5\")\n", - "res14.plot(label=\"Amplitude 0.7\")\n", - "plt.legend()" - ] - }, - { - "cell_type": "markdown", - "id": "307557a9", - "metadata": {}, - "source": [ - "If the step-size of digitization is different from 1, the corresponding stepssize (half the distance between digitized values) can be adjusted as shown above.\n", - "\n", - "Also, alternatively also normally distributed noise can be added, which is less sensitive to the exact right amplitude, but will lead to mixing of neighboring voxels, and thus loss of resolution. Also, normally distributed noise is substantially more computation-intensive to generate. It can nevertheless be helpful in situations where e.g. the stepsize is non-uniform." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c5603f47", - "metadata": {}, - "outputs": [], - "source": [ - "sp.dataframe = df_backup\n", - "sp.add_jitter(cols=[\"t\"], amps=[0.7], jitter_type=\"normal\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b64fd59b", - "metadata": {}, - "outputs": [], - "source": [ - "axes = ['t']\n", - "bins = [200]\n", - "ranges = [[66600, 66605]]\n", - "res15 = sp.compute(bins=bins, axes=axes, ranges=ranges, df_partitions=20)\n", - "plt.figure()\n", - "res14.plot(label=\"Uniform, Amplitude 0.7\")\n", - "res15.plot(label=\"Normal, Amplitude 0.7\")\n", - "plt.legend()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "48ebf393", - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "interpreter": { - "hash": "728003ee06929e5fa5ff815d1b96bf487266025e4b7440930c6bf4536d02d243" - }, - "kernelspec": { - "display_name": "python3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.6" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -}