Commit

Merge pull request #13 from AxFoundation/master

merge upstream changes

jmosbacher authored May 19, 2021
2 parents 9f84b26 + c1469c8 commit 62473a4
Showing 28 changed files with 1,733 additions and 628 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.13.9
current_version = 0.15.1
files = setup.py strax/__init__.py docs/source/conf.py
commit = True
tag = True
11 changes: 6 additions & 5 deletions .travis.yml
@@ -24,13 +24,14 @@ jobs:
include:
- name: "Python 3.7"
env: PYTHON=3.7 DEPLOY_ME=true
- name: "Python 3.7 numbaless (for coverage)"
env: PYTHON=3.7 NUMBA_DISABLE_JIT=1
- name: "Python 3.8 numbaless (for coverage)"
env: PYTHON=3.8 NUMBA_DISABLE_JIT=1
- name: "Python 3.6 (legacy)"
env: PYTHON=3.6
- name: "Python 3.8 (beta)"
- name: "Python 3.8"
env: PYTHON=3.8

- name: "Python 3.9 (beta)"
env: PYTHON=3.9
install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- chmod +x miniconda.sh
@@ -45,7 +46,7 @@ install:
- echo "download requirements from base_environment"
- wget -O pre_requirements.txt https://raw.githubusercontent.com/XENONnT/base_environment/master/requirements.txt &> /dev/null
- echo "select important dependencies for strax(en)"
- cat pre_requirements.txt | grep 'numpy\|numba\|scikit-learn\|coveralls\|pandas' &> sel_pre_requirements.txt
- cat pre_requirements.txt | grep 'numpy\|numba\|scikit-learn\|coveralls\|pandas\|zstd' &> sel_pre_requirements.txt
- echo "Will pre-install:"
- cat sel_pre_requirements.txt
- echo "Start preinstall and rm pre-requirements:"
33 changes: 33 additions & 0 deletions HISTORY.md
@@ -1,3 +1,36 @@
0.15.1 / 2021-05-04
---------------------
- Refactor hitlets (#430, #436)
- Update classifiers for pypi #437
- Allow Py39 in travis tests (#427)

0.15.0 / 2021-04-16
---------------------
- Use int32 for peak dt, fix #397 (#403, #426)
- max peak duration (#420)
- Loopplugin touching windows + plugin documentation (#424)
- Move apply selection from context to utils (#425)
- Context testing functions + copy_to_frontend documented (#423)
- Apply function to data & test (#422)

0.14.0 / 2021-04-09
---------------------
- Check data availability for single run (#416)

0.13.11 / 2021-04-02
---------------------
- Allow re-compression at copy to frontend (#407)
- Bug fix, in processing hitlets (#411)
- Cleanup requirements for boto3 (#414)

0.13.10 / 2021-03-24
---------------------
- Allow multiple targets to be computed simultaneously (#408, #409)
- Numbafy split by containment (#402)
- Infer start/stop from any dtype (#405)
- Add property provided_dtypes to Context (#404)
- Updated OverlapWindowPlugin docs (#401)

0.13.9 / 2021-02-22
---------------------
- Clip progress progressbar (#399)
159 changes: 146 additions & 13 deletions docs/source/advanced/plugin_dev.rst
@@ -6,7 +6,7 @@ Special time fields
The ``time``, ``endtime``, ``dt`` and ``length`` fields have special meaning for strax.

It is useful for most plugins to output a ``time`` and ``endtime`` field, indicating the
start and (exclusive) end time of the entitities you are producing.
start and (exclusive) end time of the entities you are producing.
If you do not do this, your plugin cannot be loaded for part of a run (e.g. with ``seconds_range``).

Both ``time`` and ``endtime`` should be 64-bit integer timestamps in nanoseconds since the unix epoch. Instead of ``endtime``, you can provide ``dt`` (an integer time resolution in ns) and ``length`` (integer); strax will then compute the endtime as ``time + dt * length``. Lower-level datatypes commonly use this.
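
Both conventions can be illustrated with a short sketch (not part of the diff above; it assumes the ``strax.time_fields`` and ``strax.time_dt_fields`` dtype helpers and the ``strax.endtime`` utility behave as described here):

.. code-block:: python

    import numpy as np
    import strax

    # Convention 1: explicit time and (exclusive) endtime fields
    explicit = np.zeros(1, dtype=strax.time_fields)
    explicit['time'] = 1_600_000_000_000_000_000   # ns since the unix epoch
    explicit['endtime'] = explicit['time'] + 100

    # Convention 2: time, dt and length, as lower-level datatypes use
    sampled = np.zeros(1, dtype=strax.time_dt_fields)
    sampled['time'] = 1_600_000_000_000_000_000
    sampled['dt'] = 10       # time resolution in ns per sample
    sampled['length'] = 10   # number of samples

    # strax.endtime accepts both conventions
    assert strax.endtime(sampled)[0] == strax.endtime(explicit)[0]
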
@@ -28,7 +28,7 @@ To return multiple outputs from a plugin:
Options and defaults
----------------------

You can specify options using the `strax.takes_config` decorator and the `strax.Option` objects. See any plugin source code for example (todo: don't be lazy and explain).
You can specify options using the ``strax.takes_config`` decorator and the ``strax.Option`` objects. See any plugin source code for example (todo: don't be lazy and explain).

There is a single configuration dictionary in a strax context, shared by all plugins. Be judicious in how you name your options to avoid clashes. "Threshold" is probably a bad name, "peak_min_channels" is better.

@@ -40,25 +40,158 @@ You can specify defaults in several ways:

- ``default``: Use the given value as default.
- ``default_factory``: Call the given function (with no arguments) to produce a default. Use for mutable values such as lists.
- ``default_per_run``: Specify a list of 2-tuples: ``(start_run, default)``. Here start_run is a numerized run name (e.g 170118_1327; note the underscore is valid in integers since python 3.6) and ``default`` the option that applies from that run onwards.
- ``default_per_run``: Specify a list of 2-tuples: ``(start_run, default)``. Here start_run is a numerized run name (e.g. ``170118_1327``; note the underscore is valid in integers since python 3.6) and ``default`` the option that applies from that run onwards.
- The ``strax_defaults`` dictionary in the run metadata. This overrides any defaults specified in the plugin code, but take care -- if you change a value here, there will be no record anywhere of what value was used previously, so you cannot reproduce your results anymore!
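
As a minimal sketch of the above (the option names here are hypothetical, not taken from an existing plugin), declaring options with the decorator and reading them back via ``self.config`` could look like:

.. code-block:: python

    import numpy as np
    import strax

    @strax.takes_config(
        strax.Option('peak_min_channels', type=int, default=2,
                     help='Minimum number of contributing channels'),
        strax.Option('peak_blacklist', default_factory=list,
                     help='Channels to ignore; mutable default via a factory'),
    )
    class ExamplePeaks(strax.Plugin):
        depends_on = 'records'
        provides = 'example_peaks'
        dtype = strax.time_fields + [(('Number of channels', 'n_channels'), np.int16)]

        def compute(self, records):
            # Options are read back through self.config
            min_channels = self.config['peak_min_channels']
            result = np.zeros(len(records), dtype=self.dtype)
            result['time'] = records['time']
            result['endtime'] = strax.endtime(records)
            # Dummy value, just to show an option being used
            result['n_channels'] = min_channels
            return result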


Plugin types
----------------------

There are several plugin types:
* `Plugin`: The general type of plugin. Should contain at least `depends_on = <datakind>`, `provides = <datatype>`, `def compute(self, <datakind>)`, and `dtype = <dtype> ` or `def infer_dtype(): <>`.
* `OverlapWindowPlugin`: Allows a plugin to look for data in adjacent chunks. A OverlapWindowPlugin assumes: all inputs are sorted by *endtime*. This only works for disjoint intervals such as peaks or events, but NOT records! The user has to define get_window_size(self) along with the plugin which returns the required chunk extension in nanoseconds.
* `LoopPlugin`: Allows user to loop over a given datakind and find the corresponding data of a lower datakind using for example `def compute_loop(self, events, peaks)` where we loop over events and get the corresponding peaks that are within the time range of the event. Currently the second argument (`peaks`) must be fully contained in the first argument (`events` ).
* `CutPlugin`: Plugin type where using `def cut_by(self, <datakind>)` inside the plugin a user can return a boolean array that can be used to select data.
* `MergeOnlyPlugin`: This is for internal use and only merges two plugins into a new one. See as an example in straxen the `EventInfo` plugin where the following datatypes are merged `'events', 'event_basics', 'event_positions', 'corrected_areas', 'energy_estimates'`.
* `ParallelSourcePlugin`: For internal use only to parallelize the processing of low level plugins. This can be activated using stating `parallel = 'process'` in a plugin.
* ``Plugin``: The general type of plugin. Should contain at least ``depends_on = <datakind>``, ``provides = <datatype>``, ``def compute(self, <datakind>)``, and ``dtype = <dtype>`` or ``def infer_dtype(): <>``.
* ``OverlapWindowPlugin``: Allows a plugin to look for data in adjacent chunks. An ``OverlapWindowPlugin`` assumes all inputs are sorted by *endtime*. This only works for disjoint intervals such as peaks or events, but NOT records! The user has to define ``get_window_size(self)`` along with the plugin, which returns the required chunk extension in nanoseconds.
* ``LoopPlugin``: Allows the user to loop over a given datakind and find the corresponding data of a lower datakind using for example ``def compute_loop(self, events, peaks)``, where we loop over events and get the corresponding peaks that are within the time range of the event. By default the second argument (``peaks``) must be fully contained in the first argument (``events``). If a touching time window is desired, set the class attribute ``time_selection`` to ``'touching'``.
* ``CutPlugin``: Plugin type where, using ``def cut_by(self, <datakind>)`` inside the plugin, a user can return a boolean array that can be used to select data.
* ``MergeOnlyPlugin``: This is for internal use and only merges two plugins into a new one. See, as an example in straxen, the ``EventInfo`` plugin, where the following datatypes are merged: ``'events', 'event_basics', 'event_positions', 'corrected_areas', 'energy_estimates'``.
* ``ParallelSourcePlugin``: For internal use only to parallelize the processing of low-level plugins. This can be activated by stating ``parallel = 'process'`` in a plugin.


Minimal examples
----------------------
Below, each of the plugin types is minimally worked out. Each plugin can be worked
out in much greater detail; see e.g. the
`plugins in straxen <https://github.com/XENONnT/straxen/tree/master/straxen/plugins>`_.

strax.Plugin
____________
.. code-block:: python

    # To test, one can use these dummy Peaks and Records from strax
    import strax
    import numpy as np
    from strax.testutils import Records, Peaks, run_id

    st = strax.Context(register=[Records, Peaks])

    class BasePlugin(strax.Plugin):
        """The most common plugin where computations on data are performed in strax"""
        depends_on = 'records'

        # For good practice, always specify the version and provides argument
        provides = 'simple_data'
        __version__ = '0.0.0'

        # We need to specify the dtype; for this example, we are
        # going to calculate some areas
        dtype = strax.time_fields + [(("Total ADC counts", 'area'), np.int32)]

        def compute(self, records):
            result = np.zeros(len(records), dtype=self.dtype)

            # All data in strax must have some sort of time fields
            result['time'] = records['time']
            result['endtime'] = strax.endtime(records)

            # For this example, we calculate the total sum of the records-data
            result['area'] = np.sum(records['data'], axis=1)
            return result

    st.register(BasePlugin)
    st.get_df(run_id, 'simple_data')

strax.OverlapWindowPlugin
_________________________
.. code-block:: python

    class OverlapPlugin(strax.OverlapWindowPlugin):
        """
        Extend the chunk by get_window_size() ns left and right
        to get the peaks within that time range
        """
        depends_on = 'peaks'
        provides = 'overlap_data'
        dtype = strax.time_fields + [(("total peaks", 'n_peaks'), np.int16)]

        def get_window_size(self):
            # Look 10 ns left and right of each chunk
            return 10

        def compute(self, peaks):
            result = np.zeros(1, dtype=self.dtype)
            result['time'] = np.min(peaks['time'])
            result['endtime'] = np.max(strax.endtime(peaks))
            result['n_peaks'] = len(peaks)
            return result

    st.register(OverlapPlugin)
    st.get_df(run_id, 'overlap_data')

strax.LoopPlugin
________________
.. code-block:: python

    class LoopData(strax.LoopPlugin):
        """Loop over peaks and find the records within each of those peaks."""
        depends_on = 'peaks', 'records'
        provides = 'looped_data'
        dtype = strax.time_fields + [(("total records", 'n_records'), np.int16)]

        # The LoopPlugin-specific requirements
        time_selection = 'fully_contained'  # other option is 'touching'
        loop_over = 'peaks'

        # Use compute_loop() instead of compute()
        def compute_loop(self, peaks, records):
            result = np.zeros(len(peaks), dtype=self.dtype)
            result['time'] = np.min(peaks['time'])
            result['endtime'] = np.max(strax.endtime(peaks))
            result['n_records'] = len(records)
            return result

    st.register(LoopData)
    st.get_df(run_id, 'looped_data')

strax.CutPlugin
_________________________
.. code-block:: python

    class CutData(strax.CutPlugin):
        """
        Create a boolean array that is True if an entry passes a given cut,
        in this case if the peak has a positive area
        """
        depends_on = 'peaks'
        provides = 'cut_data'

        # Use cut_by() instead of compute() to generate a boolean array
        def cut_by(self, peaks):
            return peaks['area'] > 0

    st.register(CutData)
    st.get_df(run_id, 'cut_data')

strax.MergeOnlyPlugin
_____________________
.. code-block:: python

    class MergeData(strax.MergeOnlyPlugin):
        """Merge datatypes of the same datakind into a single datatype"""
        depends_on = ('peaks', 'cut_data')
        provides = 'merged_data'

        # You only need to specify the dependencies; those are merged.

    st.register(MergeData)
    st.get_array(run_id, 'merged_data')

Plugin inheritance
----------------------
It is possible to inherit the `compute()` method of an already existing plugin with another plugin. We call these types of plugins child plugins. Child plugins are recognized by strax when the `child_plugin` attribute of the plugin is set to `True`. Below you can find a simple example of a child plugin with its parent plugin:
It is possible to inherit the ``compute()`` method of an already existing plugin with another plugin. We call these types of plugins child plugins. Child plugins are recognized by strax when the ``child_plugin`` attribute of the plugin is set to ``True``. Below you can find a simple example of a child plugin with its parent plugin:

.. code-block:: python
@@ -103,10 +236,10 @@ It is possible to inherit the `compute()` method of an already existing plugin w
res['width'] = self.config['option_unique_child']
return res
The `super().compute()` statement in the `compute` method of `ChildPlugin` allows us to execute the code of the parent's compute method without duplicating it. Additionally, if needed, we can extend the code with some for the child-plugin unique computation steps.
The ``super().compute()`` statement in the ``compute`` method of ``ChildPlugin`` allows us to execute the code of the parent's compute method without duplicating it. Additionally, if needed, we can extend the code with computation steps unique to the child plugin.

To allow for the child plugin to have different settings then its parent (e.g. `'by_child_overwrite_option'` in `self.config['by_child_overwrite_option']` of the parent's `compute` method), we have to use specific child option. These options will be recognized by strax and overwrite the config values of the parent parameter during the initialization of the child-plugin. Hence, these changes only affect the child, but not the parent.
To allow the child plugin to have different settings than its parent (e.g. ``'by_child_overwrite_option'`` in ``self.config['by_child_overwrite_option']`` of the parent's ``compute`` method), we have to use specific child options. These options will be recognized by strax and overwrite the config values of the parent parameter during the initialization of the child plugin. Hence, these changes only affect the child, but not the parent.

An option can be flagged as a child option if the corresponding option attribute is set `child_option=True`. Further, the option name which should be overwritten must be specified via the option attribute `parent_option_name`.
An option can be flagged as a child option if the corresponding option attribute is set ``child_option=True``. Further, the option name which should be overwritten must be specified via the option attribute ``parent_option_name``.
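
As a hedged sketch (reusing the option names quoted above; ``ParentPlugin`` and the ``'child_data'`` target are placeholders), flagging such a child option could look like:

.. code-block:: python

    @strax.takes_config(
        strax.Option('option_unique_child', default=2,
                     child_option=True,
                     parent_option_name='by_child_overwrite_option',
                     help="Overwrites the parent's 'by_child_overwrite_option' for the child only"),
    )
    class ChildPlugin(ParentPlugin):
        provides = 'child_data'
        child_plugin = True
        __version__ = '0.0.1'

        def compute(self, peaks):
            res = super().compute(peaks)
            res['width'] = self.config['option_unique_child']
            return res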

The lineage of a child plugin contains in addition to its options the name and version of the parent plugin.