Merge pull request #454 from OpenCOMPES/spelling_fixes_for_main
Spelling fixes for main
rettigl authored Jul 2, 2024
2 parents 66603d8 + 14c7ebb commit 2bf9a13
Showing 49 changed files with 656 additions and 217 deletions.
405 changes: 405 additions & 0 deletions .cspell/custom-dictionary.txt

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion .github/workflows/benchmark.yml
@@ -34,7 +34,7 @@ jobs:
- name: Install project dependencies
run: poetry install

-# Run benchmakrs
+# Run benchmarks
- name: Run benchmarks on python 3.8
run: |
poetry run pytest --full-trace --show-capture=no -sv benchmarks/benchmark_*.py
11 changes: 9 additions & 2 deletions .github/workflows/linting.yml
@@ -22,15 +22,22 @@ jobs:
python-version: 3.8
poetry-version: 1.2.2

-# Linting steps, excute all linters even if one fails
+# Linting steps, execute all linters even if one fails
- name: ruff
run:
poetry run ruff sed tests
-- name: ruff formating
+- name: ruff formatting
if: ${{ always() }}
run:
poetry run ruff format --check sed tests
- name: mypy
if: ${{ always() }}
run:
poetry run mypy sed tests
+- name: spellcheck
+  if: ${{ always() }}
+  uses: streetsidesoftware/cspell-action@v6
+  with:
+    check_dot_files: false
+    incremental_files_only: false
+    config: './cspell.json'
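
For contributors, the same checks can be reproduced locally with the commands the workflow itself invokes:

    poetry run ruff sed tests
    poetry run ruff format --check sed tests
    poetry run mypy sed tests

The new spellcheck step has no poetry equivalent; it runs through the cspell GitHub action, configured by the cspell.json file added later in this diff.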
2 changes: 1 addition & 1 deletion .github/workflows/update_dependencies.yml
@@ -1,4 +1,4 @@
-name: Update depencies in poetry lockfile
+name: Update dependencies in poetry lockfile

on:
schedule:
4 changes: 4 additions & 0 deletions .pre-commit-config.yaml
@@ -42,3 +42,7 @@ repos:
rev: 0.6.0
hooks:
- id: nbstripout
+- repo: https://github.com/streetsidesoftware/cspell-cli
+  rev: v6.31.1
+  hooks:
+    - id: cspell
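
With this hook in place, the spell checker also runs locally on each commit. A typical manual invocation (standard pre-commit CLI usage, not part of this diff) would be: pre-commit run cspell --all-files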
4 changes: 2 additions & 2 deletions benchmarks/Binning Benchmarks.ipynb
@@ -10,7 +10,7 @@
"source": [
"# Binning demonstration on locally generated fake data\n",
"In this example, we generate a table with random data simulating a single event dataset.\n",
"We showcase the binning method, first on a simple single table using the bin_partition method and then in the distributed mehthod bin_dataframe, using daks dataframes.\n",
"We showcase the binning method, first on a simple single table using the bin_partition method and then in the distributed method bin_dataframe, using daks dataframes.\n",
"The first method is never really called directly, as it is simply the function called by the bin_dataframe on each partition of the dask dataframe."
]
},
@@ -200,7 +200,7 @@
"metadata": {},
"outputs": [],
"source": [
"data_path = '../../' # Put in Path to a storage of at least 20 Gbyte free space.\n",
"data_path = '../../' # Put in Path to a storage of at least 20 GByte free space.\n",
"if not os.path.exists(data_path + \"/WSe2.zip\"):\n",
" os.system(f\"curl --output {data_path}/WSe2.zip https://zenodo.org/record/6369728/files/WSe2.zip\")\n",
"if not os.path.isdir(data_path + \"/Scan049_1\") or not os.path.isdir(data_path + \"energycal_2019_01_08/\"):\n",
4 changes: 2 additions & 2 deletions benchmarks/mpes_sed_benchmarks.ipynb
@@ -48,7 +48,7 @@
"metadata": {},
"outputs": [],
"source": [
"dataPath = '../../' # Put in Path to a storage of at least 20 Gbyte free space.\n",
"dataPath = '../../' # Put in Path to a storage of at least 20 GByte free space.\n",
"if not os.path.exists(dataPath + \"/WSe2.zip\"):\n",
" os.system(f\"curl --output {dataPath}/WSe2.zip https://zenodo.org/record/6369728/files/WSe2.zip\")\n",
"if not os.path.isdir(dataPath + \"/Scan049_1\") or not os.path.isdir(dataPath + \"energycal_2019_01_08/\"):\n",
@@ -106,7 +106,7 @@
"metadata": {},
"source": [
"## compute distributed binning on the partitioned dask dataframe\n",
"We generated 100 dataframe partiions from the 100 files in the dataset, which we will bin parallelly with the dataframe binning function into a 3D grid"
"We generated 100 dataframe partitions from the 100 files in the dataset, which we will bin parallelly with the dataframe binning function into a 3D grid"
]
},
{
22 changes: 22 additions & 0 deletions cspell.json
@@ -0,0 +1,22 @@
+{
+  "version": "0.2",
+  "ignorePaths": [
+    "./tests/data/*",
+    "*.toml",
+    "Makefile",
+    "*.bat"
+  ],
+  "dictionaryDefinitions": [
+    {
+      "name": "custom-dictionary",
+      "path": "./.cspell/custom-dictionary.txt",
+      "addWords": true
+    }
+  ],
+  "dictionaries": [ "custom-dictionary"
+  ],
+  "words": [],
+  "ignoreWords": [],
+  "import": [],
+  "language": "en-GB, en-US"
+}
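
To run the same spell check outside CI, the cspell CLI can be pointed at this configuration, e.g. npx cspell --config ./cspell.json "sed/**" (the glob is an illustrative assumption; only the config file itself is defined here).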
2 changes: 1 addition & 1 deletion docs/misc/contributing.rst
@@ -73,7 +73,7 @@ Development Workflow
3. **Write Tests:** If your contribution introduces new features or fixes a bug, add tests to cover your changes.

-4. **Run Tests:** To ensure no funtionality is broken, run the tests:
+4. **Run Tests:** To ensure no functionality is broken, run the tests:

.. code-block:: bash
2 changes: 1 addition & 1 deletion docs/misc/maintain.rst
@@ -140,7 +140,7 @@ To create a release, follow these steps:
c. **If you don't see update on PyPI:**

- Visit the GitHub Actions page and monitor the Release workflow (https://github.com/OpenCOMPES/sed/actions/workflows/release.yml).
-- Check if errors occured.
+- Check if errors occurred.


**Understanding the Release Workflow**
6 changes: 3 additions & 3 deletions docs/sed/config.rst
@@ -1,11 +1,11 @@
Config
===================================================
-The config module contains a mechanis to collect configuration parameters from various sources and configuration files, and to combine them in a hierachical manner into a single, consistent configuration dictionary.
+The config module contains a mechanics to collect configuration parameters from various sources and configuration files, and to combine them in a hierarchical manner into a single, consistent configuration dictionary.
It will load an (optional) provided config file, or alternatively use a passed python dictionary as initial config dictionary, and subsequently look for the following additional config files to load:

* ``folder_config``: A config file of name :file:`sed_config.yaml` in the current working directory. This is mostly intended to pass calibration parameters of the workflow between different notebook instances.
-* ``user_config``: A config file provided by the user, stored as :file:`.sed/config.yaml` in the current user's home directly. This is intended to give a user the option for individual configuration modifications of system settings.
-* ``system_config``: A config file provided by the system administrator, stored as :file:`/etc/sed/config.yaml` on Linux-based systems, and :file:`%ALLUSERPROFILE%/sed/config.yaml` on Windows. This should provide all necessary default parameters for using the sed processor with a given setup. For an example for an mpes setup, see :ref:`example_config`
+* ``user_config``: A config file provided by the user, stored as :file:`.config/sed/config.yaml` in the current user's home directly. This is intended to give a user the option for individual configuration modifications of system settings.
+* ``system_config``: A config file provided by the system administrator, stored as :file:`/etc/sed/config.yaml` on Linux-based systems, and :file:`%ALLUSERSPROFILE%/sed/config.yaml` on Windows. This should provide all necessary default parameters for using the sed processor with a given setup. For an example for an mpes setup, see :ref:`example_config`
* ``default_config``: The default configuration shipped with the package. Typically, all parameters here should be overwritten by any of the other configuration files.

The config mechanism returns the combined dictionary, and reports the loaded configuration files. In order to disable or overwrite any of the configuration files, they can be also given as optional parameters (path to a file, or python dictionary).
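
A minimal sketch of how this hierarchy might be consumed in practice, assuming the config module exposes a parse_config entry point with keyword arguments matching the file levels described above (the import path and argument names are assumptions, not taken from this diff):

    from sed.core.config import parse_config  # assumed import path

    # Values passed explicitly take precedence over folder_config, which in
    # turn overrides user_config, system_config, and default_config.
    config = parse_config(
        config={"core": {"num_cores": 4}},  # hypothetical in-memory override
        folder_config="./sed_config.yaml",  # hypothetical notebook-local file
    )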
2 changes: 1 addition & 1 deletion docs/sed/dataset.rst
@@ -64,7 +64,7 @@ Setting the “use_existing” keyword to False allows to download the data in a
Interrupting extraction has similar behavior to download and just continues from where it stopped.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

-Or if user deletes the extracted documents, it reextracts from zip file
+Or if user deletes the extracted documents, it re-extracts from zip file
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

.. code:: python
14 changes: 7 additions & 7 deletions sed/binning/binning.py
@@ -52,7 +52,7 @@ def bin_partition(
- an integer describing the number of bins for all dimensions. This
requires "ranges" to be defined as well.
- A sequence containing one entry of the following types for each
-dimenstion:
+dimension:
- an integer describing the number of bins. This requires "ranges"
to be defined as well.
@@ -83,14 +83,14 @@
jittering. To specify the jitter amplitude or method (normal or uniform
noise) a dictionary can be passed. This should look like
jitter={'axis':{'amplitude':0.5,'mode':'uniform'}}.
-This example also shows the default behaviour, in case None is
+This example also shows the default behavior, in case None is
passed in the dictionary, or jitter is a list of strings.
Warning: this is not the most performing approach. Applying jitter
on the dataframe before calling the binning is much faster.
Defaults to None.
return_edges (bool, optional): If True, returns a list of D arrays
describing the bin edges for each dimension, similar to the
-behaviour of ``np.histogramdd``. Defaults to False.
+behavior of ``np.histogramdd``. Defaults to False.
skip_test (bool, optional): Turns off input check and data transformation.
Defaults to False as it is intended for internal use only.
Warning: setting this True might make error tracking difficult.
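
As a usage sketch of the jitter format documented above (the bins, ranges, jitter, and return_edges parameters appear in this docstring; the positional dataframe argument and the axes keyword are assumptions):

    import numpy as np
    import pandas as pd
    from sed.binning.binning import bin_partition  # module path from this diff

    df = pd.DataFrame({"x": np.random.rand(10_000), "y": np.random.rand(10_000)})
    hist, edges = bin_partition(
        df,
        bins=[50, 50],
        axes=["x", "y"],  # assumed keyword for the binned columns
        ranges=[(0.0, 1.0), (0.0, 1.0)],
        jitter={"x": {"amplitude": 0.5, "mode": "uniform"}},
        return_edges=True,  # also return the D bin-edge arrays
    )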
@@ -134,7 +134,7 @@
else:
bins = cast(List[int], bins)
# shift ranges by half a bin size to align the bin centers to the given ranges,
-# as the histogram functions interprete the ranges as limits for the edges.
+# as the histogram functions interpret the ranges as limits for the edges.
for i, nbins in enumerate(bins):
halfbinsize = (ranges[i][1] - ranges[i][0]) / (nbins) / 2
ranges[i] = (
@@ -234,7 +234,7 @@
- an integer describing the number of bins for all dimensions. This
requires "ranges" to be defined as well.
- A sequence containing one entry of the following types for each
-dimenstion:
+dimension:
- an integer describing the number of bins. This requires "ranges"
to be defined as well.
@@ -273,7 +273,7 @@
jittering. To specify the jitter amplitude or method (normal or uniform
noise) a dictionary can be passed. This should look like
jitter={'axis':{'amplitude':0.5,'mode':'uniform'}}.
-This example also shows the default behaviour, in case None is
+This example also shows the default behavior, in case None is
passed in the dictionary, or jitter is a list of strings.
Warning: this is not the most performing approach. applying jitter
on the dataframe before calling the binning is much faster.
@@ -479,7 +479,7 @@ def normalization_histogram_from_timed_dataframe(
bin_centers: np.ndarray,
time_unit: float,
) -> xr.DataArray:
"""Get a normalization histogram from a timed datafram.
"""Get a normalization histogram from a timed dataframe.
Args:
df (dask.dataframe.DataFrame): a dask.DataFrame on which to perform the
8 changes: 4 additions & 4 deletions sed/binning/numba_bin.py
@@ -24,7 +24,7 @@ def _hist_from_bin_range(
bit integers.
Args:
-sample (np.ndarray): The data to be histogrammed with shape N,D.
+sample (np.ndarray): The data to be histogram'd with shape N,D.
bins (Sequence[int]): The number of bins for each dimension D.
ranges (np.ndarray): A sequence of length D, each an optional (lower,
upper) tuple giving the outer bin edges to be used if the edges are
@@ -49,7 +49,7 @@

for i in range(ndims):
delta[i] = 1 / ((ranges[i, 1] - ranges[i, 0]) / bins[i])
-strides[i] = hist.strides[i] // hist.itemsize # pylint: disable=E1136
+strides[i] = hist.strides[i] // hist.itemsize

for t in range(sample.shape[0]):
is_inside = True
@@ -157,7 +157,7 @@ def numba_histogramdd(
bins: Union[int, Sequence[int], Sequence[np.ndarray], np.ndarray],
ranges: Sequence = None,
) -> Tuple[np.ndarray, List[np.ndarray]]:
"""Multidimensional histogramming function, powered by Numba.
"""Multidimensional histogram function, powered by Numba.
Behaves in total much like numpy.histogramdd. Returns uint32 arrays.
This was chosen because it has a significant performance improvement over
@@ -167,7 +167,7 @@
sizes.
Args:
-sample (np.ndarray): The data to be histogrammed with shape N,D
+sample (np.ndarray): The data to be histogram'd with shape N,D
bins (Union[int, Sequence[int], Sequence[np.ndarray], np.ndarray]): The number
of bins for each dimension D, or a sequence of bin edges on which to calculate
the histogram.
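
Since numba_histogramdd behaves much like numpy.histogramdd (per the docstring above), a minimal sketch following the signature shown in this diff:

    import numpy as np
    from sed.binning.numba_bin import numba_histogramdd  # module path from this diff

    sample = np.random.rand(100_000, 2)  # N=100000 samples, D=2 dimensions
    hist, edges = numba_histogramdd(
        sample,
        bins=[200, 200],
        ranges=[(0.0, 1.0), (0.0, 1.0)],
    )
    # hist is a uint32 array of shape (200, 200); edges holds the two edge arrays.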
4 changes: 2 additions & 2 deletions sed/binning/utils.py
@@ -39,7 +39,7 @@ def simplify_binning_arguments(
- an integer describing the number of bins for all dimensions. This
requires "ranges" to be defined as well.
- A sequence containing one entry of the following types for each
-dimenstion:
+dimension:
- an integer describing the number of bins. This requires "ranges"
to be defined as well.
@@ -123,7 +123,7 @@ def simplify_binning_arguments(
f"Ranges must be a sequence, not {type(ranges)}.",
)

-# otherwise, all bins should by np.ndarrays here
+# otherwise, all bins should be of type np.ndarray here
elif all(isinstance(x, np.ndarray) for x in bins):
bins = cast(List[np.ndarray], list(bins))
else:
4 changes: 2 additions & 2 deletions sed/calibrator/delay.py
@@ -103,7 +103,7 @@ def append_delay_axis(
Returns:
Union[pd.DataFrame, dask.dataframe.DataFrame]: dataframe with added column
-and delay calibration metdata dictionary.
+and delay calibration metadata dictionary.
"""
# pylint: disable=duplicate-code
if calibration is None:
@@ -407,7 +407,7 @@ def mm_to_ps(
delay_mm: Union[float, np.ndarray],
time0_mm: float,
) -> Union[float, np.ndarray]:
"""Converts a delaystage position in mm into a relative delay in picoseconds
"""Converts a delay stage position in mm into a relative delay in picoseconds
(double pass).
Args:
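
For orientation: in a double-pass geometry each millimetre of stage travel changes the optical path by two millimetres, so the conversion is roughly delay_ps = 2 * (delay_mm - time0_mm) / c with c ≈ 0.29979 mm/ps, i.e. about 6.67 ps per mm. (This worked form is inferred from the docstring; the function body is not shown in this hunk.)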
14 changes: 7 additions & 7 deletions sed/calibrator/energy.py
@@ -446,7 +446,7 @@ def add_ranges(
traces (np.ndarray, optional): Collection of energy dispersion curves.
Defaults to self.traces_normed.
infer_others (bool, optional): Option to infer the feature detection range
-in other traces from a given one using a time warp algorthm.
+in other traces from a given one using a time warp algorithm.
Defaults to True.
mode (str, optional): Specification on how to change the feature ranges
('append' or 'replace'). Defaults to "replace".
@@ -1157,7 +1157,7 @@ def common_apply_func(apply: bool): # noqa: ARG001
update(correction["amplitude"], x_center, y_center, diameter=correction["diameter"])
except KeyError as exc:
raise ValueError(
"Parameter 'diameter' required for correction type 'sperical', ",
"Parameter 'diameter' required for correction type 'spherical', ",
"but not present!",
) from exc

@@ -1339,7 +1339,7 @@ def apply_energy_correction(
Defaults to config["energy"]["correction_type"].
amplitude (float, optional): Amplitude of the time-of-flight correction
term. Defaults to config["energy"]["correction"]["correction_type"].
-correction (dict, optional): Correction dictionary containing paramters
+correction (dict, optional): Correction dictionary containing parameters
for the correction. Defaults to self.correction or
config["energy"]["correction"].
verbose (bool, optional): Option to print out diagnostic information.
@@ -1939,7 +1939,7 @@ def _datacheck_peakdetect(
x_axis: np.ndarray,
y_axis: np.ndarray,
) -> Tuple[np.ndarray, np.ndarray]:
"""Input format checking for 1D peakdtect algorithm
"""Input format checking for 1D peakdetect algorithm
Args:
x_axis (np.ndarray): x-axis array
@@ -2109,7 +2109,7 @@ def fit_energy_calibration(
binwidth (float): Time width of each original TOF bin in ns.
binning (int): Binning factor of the TOF values.
ref_id (int, optional): Reference dataset index. Defaults to 0.
-ref_energy (float, optional): Energy value of the feature in the refence
+ref_energy (float, optional): Energy value of the feature in the reference
trace (eV). required to output the calibration. Defaults to None.
t (Union[List[float], np.ndarray], optional): Array of TOF values. Required
to calculate calibration trace. Defaults to None.
@@ -2131,7 +2131,7 @@
Returns:
dict: A dictionary of fitting parameters including the following,
- "coeffs": Fitted function coefficents.
- "coeffs": Fitted function coefficients.
- "axis": Fitted energy axis.
"""
vals = np.asarray(vals)
@@ -2248,7 +2248,7 @@ def poly_energy_calibration(
each EDC.
order (int, optional): Polynomial order of the fitting function. Defaults to 3.
ref_id (int, optional): Reference dataset index. Defaults to 0.
-ref_energy (float, optional): Energy value of the feature in the refence
+ref_energy (float, optional): Energy value of the feature in the reference
trace (eV). required to output the calibration. Defaults to None.
t (Union[List[float], np.ndarray], optional): Array of TOF values. Required
to calculate calibration trace. Defaults to None.