merge upstream changes #13

Merged
merged 33 commits into from
May 19, 2021
f39b73b
bumpversion
JoranAngevaare Mar 24, 2021
3881354
Bump version: 0.13.9 → 0.13.10
JoranAngevaare Mar 24, 2021
8ccb170
Allow re-compression at copy to frontend (#407)
JoranAngevaare Mar 25, 2021
c0f128c
Patch concat hits (#411)
WenzDaniel Mar 29, 2021
40ddfe3
Cleanup requirements for boto3 (#414)
JoranAngevaare Mar 30, 2021
535fea3
Update HISTORY.md
WenzDaniel Apr 2, 2021
d1d9f42
Bump version: 0.13.10 → 0.13.11
WenzDaniel Apr 2, 2021
2710156
Merge pull request #415 from AxFoundation/make_release
WenzDaniel Apr 2, 2021
85e1610
Check data availability for single run (#416)
JoranAngevaare Apr 7, 2021
33f37ed
Update HISTORY.md
WenzDaniel Apr 9, 2021
e440b91
Bump version: 0.13.11 → 0.14.0
WenzDaniel Apr 9, 2021
cd30a4b
Merge pull request #419 from AxFoundation/make_release_2021_04_09
WenzDaniel Apr 9, 2021
8beb0ac
update apply function to data & test (#422)
JoranAngevaare Apr 12, 2021
6773c7d
context testing functions + copy_to_frontend documented (#423)
JoranAngevaare Apr 12, 2021
dfc6260
Move apply selection from context to utils (#425)
JoranAngevaare Apr 12, 2021
9e68d20
Loopplugin touching windows + plugin documentation (#424)
JoranAngevaare Apr 12, 2021
1ed4811
keep copy of targets for selection
JoranAngevaare Apr 12, 2021
25c823e
Add failing dt test - and solve it with max peak duration (#420)
JoranAngevaare Apr 12, 2021
d547825
Use int32 for peak dt, fix #397 (#403)
JoranAngevaare Apr 12, 2021
58544a7
quick fix for find_peak_groups (#426)
JoranAngevaare Apr 12, 2021
fd423cf
bumpversion
JoranAngevaare Apr 16, 2021
7d88f88
Bump version: 0.14.0 → 0.15.0
JoranAngevaare Apr 16, 2021
adf4b62
Allow Py39 in travis tests (#427)
JoranAngevaare Apr 23, 2021
e5b0b42
Refactor concat and get data (#430)
WenzDaniel Apr 30, 2021
56a3807
Refactor hitlets (#436)
WenzDaniel May 3, 2021
ce201b7
Update setup.py (#437)
JoranAngevaare May 4, 2021
5702998
Make release 20210504 (#438)
WenzDaniel May 4, 2021
1c6dc2b
Use zstd for from base-env for testing (#441)
JoranAngevaare May 17, 2021
ca734a3
use zstd v1.4
JoranAngevaare May 17, 2021
a0aa24d
Speed up run selection by ~100x for fixed defaults (#440)
JoranAngevaare May 17, 2021
95479b3
use zstd 1.4
JoranAngevaare May 17, 2021
a6b2b93
Update requirements.txt
JoranAngevaare May 17, 2021
c1469c8
Add MB/s pbar (#442)
JoranAngevaare May 18, 2021
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.13.9
current_version = 0.15.1
files = setup.py strax/__init__.py docs/source/conf.py
commit = True
tag = True
11 changes: 6 additions & 5 deletions .travis.yml
@@ -24,13 +24,14 @@ jobs:
include:
- name: "Python 3.7"
env: PYTHON=3.7 DEPLOY_ME=true
- name: "Python 3.7 numbaless (for coverage)"
env: PYTHON=3.7 NUMBA_DISABLE_JIT=1
- name: "Python 3.8 numbaless (for coverage)"
env: PYTHON=3.8 NUMBA_DISABLE_JIT=1
- name: "Python 3.6 (legacy)"
env: PYTHON=3.6
- name: "Python 3.8 (beta)"
- name: "Python 3.8"
env: PYTHON=3.8

- name: "Python 3.9 (beta)"
env: PYTHON=3.9
install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- chmod +x miniconda.sh
@@ -45,7 +46,7 @@ install:
- echo "download requirements from base_environment"
- wget -O pre_requirements.txt https://raw.githubusercontent.com/XENONnT/base_environment/master/requirements.txt &> /dev/null
- echo "select important dependencies for strax(en)"
- cat pre_requirements.txt | grep 'numpy\|numba\|scikit-learn\|coveralls\|pandas' &> sel_pre_requirements.txt
- cat pre_requirements.txt | grep 'numpy\|numba\|scikit-learn\|coveralls\|pandas\|zstd' &> sel_pre_requirements.txt
- echo "Will pre-install:"
- cat sel_pre_requirements.txt
- echo "Start preinstall and rm pre-requirements:"
33 changes: 33 additions & 0 deletions HISTORY.md
@@ -1,3 +1,36 @@
0.15.1 / 2021-05-04
---------------------
- Refactor hitlets (#430, #436)
- Update classifiers for PyPI (#437)
- Allow Py39 in travis tests (#427)

0.15.0 / 2021-04-16
---------------------
- Use int32 for peak dt, fix #397 (#403, #426)
- max peak duration (#420)
- Loopplugin touching windows + plugin documentation (#424)
- Move apply selection from context to utils (#425)
- Context testing functions + copy_to_frontend documented (#423)
- Apply function to data & test (#422)

0.14.0 / 2021-04-09
---------------------
- Check data availability for single run (#416)

0.13.11 / 2021-04-02
---------------------
- Allow re-compression at copy to frontend (#407)
- Bug fix, in processing hitlets (#411)
- Cleanup requirements for boto3 (#414)

0.13.10 / 2021-03-24
---------------------
- Allow multiple targets to be computed simultaneously (#408, #409)
- Numbafy split by containment (#402)
- Infer start/stop from any dtype (#405)
- Add property provided_dtypes to Context (#404)
- Updated OverlapWindowPlugin docs (#401)

0.13.9 / 2021-02-22
---------------------
- Clip progress progressbar (#399)
159 changes: 146 additions & 13 deletions docs/source/advanced/plugin_dev.rst
@@ -6,7 +6,7 @@ Special time fields
The ``time``, ``endtime``, ``dt`` and ``length`` fields have special meaning for strax.

It is useful for most plugins to output a ``time`` and ``endtime`` field, indicating the
start and (exclusive) end time of the entitities you are producing.
start and (exclusive) end time of the entities you are producing.
If you do not do this, your plugin cannot be loaded for part of a run (e.g. with ``seconds_range``).

Both ``time`` and ``endtime`` should be 64-bit integer timestamps in nanoseconds since the unix epoch. Instead of ``endtime``, you can provide ``dt`` (an integer time resolution in ns) and ``length`` (integer); strax will then compute the endtime as ``time + dt * length``. Lower-level datatypes commonly use this.
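
As a minimal sketch of the arithmetic behind the ``dt`` and ``length`` fields: the (exclusive) end of an entity is its start time plus the number of samples times the sample duration. The helper name ``exclusive_endtime`` and the numbers below are illustrative assumptions, not part of strax.

```python
def exclusive_endtime(time_ns, dt_ns, length):
    """Return the exclusive end time in ns of an entity that starts at
    time_ns and spans `length` samples of `dt_ns` nanoseconds each.

    Hypothetical helper for illustration only; not a strax function.
    """
    return time_ns + dt_ns * length


# Illustrative values: a record of 110 samples, 10 ns per sample,
# starting at some nanosecond timestamp since the unix epoch.
start = 1_621_382_400_000_000_000
end = exclusive_endtime(start, dt_ns=10, length=110)
duration = end - start
```

Because the end is exclusive, an entity that ends at ``end`` and another that starts at ``end`` do not overlap.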
@@ -28,7 +28,7 @@ To return multiple outputs from a plugin:
Options and defaults
----------------------

You can specify options using the `strax.takes_config` decorator and the `strax.Option` objects. See any plugin source code for example (todo: don't be lazy and explain).
You can specify options using the ``strax.takes_config`` decorator and the ``strax.Option`` objects. See any plugin source code for example (todo: don't be lazy and explain).

There is a single configuration dictionary in a strax context, shared by all plugins. Be judicious in how you name your options to avoid clashes. "Threshold" is probably a bad name, "peak_min_channels" is better.

@@ -40,25 +40,158 @@ You can specify defaults in several ways:

- ``default``: Use the given value as default.
- ``default_factory``: Call the given function (with no arguments) to produce a default. Use for mutable values such as lists.
- ``default_per_run``: Specify a list of 2-tuples: ``(start_run, default)``. Here start_run is a numerized run name (e.g 170118_1327; note the underscore is valid in integers since python 3.6) and ``default`` the option that applies from that run onwards.
- ``default_per_run``: Specify a list of 2-tuples: ``(start_run, default)``. Here ``start_run`` is a numerized run name (e.g. ``170118_1327``; note that underscores are valid in integer literals since Python 3.6) and ``default`` is the option value that applies from that run onwards.
- The ``strax_defaults`` dictionary in the run metadata. This overrides any defaults specified in the plugin code, but take care -- if you change a value here, there will be no record anywhere of what value was used previously, so you cannot reproduce your results anymore!
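
The run-dependent default can be pictured as a lookup over the ``(start_run, default)`` pairs: the value that applies is the one from the last pair whose ``start_run`` is not later than the run being processed. The sketch below only illustrates that rule under this assumption. The function ``pick_default`` and the example values are hypothetical and do not reproduce strax's internal code.

```python
def pick_default(run_number, per_run_defaults):
    """Pick the default that applies to run_number.

    per_run_defaults is a list of (start_run, default) 2-tuples.
    Hypothetical helper for illustration; not part of strax.
    """
    found = False
    chosen = None
    # Walk the pairs in order of start_run and keep the last one that
    # starts at or before the requested run.
    for start_run, default in sorted(per_run_defaults, key=lambda pair: pair[0]):
        if start_run > run_number:
            break
        chosen = default
        found = True
    if not found:
        raise ValueError(f"No default applies to run {run_number}")
    return chosen


# Underscores in integer literals are purely visual (Python 3.6+).
per_run = [(170118_1327, 5), (170301_0000, 8)]
early = pick_default(170200_0000, per_run)
late = pick_default(170301_0000, per_run)
```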


Plugin types
----------------------

There are several plugin types:
* `Plugin`: The general type of plugin. Should contain at least `depends_on = <datakind>`, `provides = <datatype>`, `def compute(self, <datakind>)`, and `dtype = <dtype> ` or `def infer_dtype(): <>`.
* `OverlapWindowPlugin`: Allows a plugin to look for data in adjacent chunks. A OverlapWindowPlugin assumes: all inputs are sorted by *endtime*. This only works for disjoint intervals such as peaks or events, but NOT records! The user has to define get_window_size(self) along with the plugin which returns the required chunk extension in nanoseconds.
* `LoopPlugin`: Allows user to loop over a given datakind and find the corresponding data of a lower datakind using for example `def compute_loop(self, events, peaks)` where we loop over events and get the corresponding peaks that are within the time range of the event. Currently the second argument (`peaks`) must be fully contained in the first argument (`events` ).
* `CutPlugin`: Plugin type where using `def cut_by(self, <datakind>)` inside the plugin a user can return a boolean array that can be used to select data.
* `MergeOnlyPlugin`: This is for internal use and only merges two plugins into a new one. See as an example in straxen the `EventInfo` plugin where the following datatypes are merged `'events', 'event_basics', 'event_positions', 'corrected_areas', 'energy_estimates'`.
* `ParallelSourcePlugin`: For internal use only to parallelize the processing of low level plugins. This can be activated using stating `parallel = 'process'` in a plugin.
* ``Plugin``: The general type of plugin. Should contain at least ``depends_on = <datakind>``, ``provides = <datatype>``, ``def compute(self, <datakind>)``, and ``dtype = <dtype>`` or ``def infer_dtype(): <>``.
* ``OverlapWindowPlugin``: Allows a plugin to look for data in adjacent chunks. An ``OverlapWindowPlugin`` assumes all inputs are sorted by *endtime*. This only works for disjoint intervals such as peaks or events, but NOT records! The user has to define ``get_window_size(self)`` along with the plugin, which returns the required chunk extension in nanoseconds.
* ``LoopPlugin``: Allows the user to loop over a given datakind and find the corresponding data of a lower datakind, using for example ``def compute_loop(self, events, peaks)``, where we loop over events and get the corresponding peaks that are within the time range of the event. By default the second argument (``peaks``) must be fully contained in the first argument (``events``). If a touching time window is desired, set the class attribute ``time_selection`` to ``'touching'``.
* ``CutPlugin``: Plugin type where, using ``def cut_by(self, <datakind>)`` inside the plugin, a user can return a boolean array that can be used to select data.
* ``MergeOnlyPlugin``: This is for internal use and only merges two plugins into a new one. See as an example in straxen the ``EventInfo`` plugin, where the following datatypes are merged: ``'events', 'event_basics', 'event_positions', 'corrected_areas', 'energy_estimates'``.
* ``ParallelSourcePlugin``: For internal use only, to parallelize the processing of low-level plugins. This can be activated by stating ``parallel = 'process'`` in a plugin.
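
The difference between the two ``LoopPlugin`` time selections can be sketched with plain ``(time, endtime)`` intervals. This is a simplified illustration that assumes exclusive end times, as described for the special time fields. The functions ``fully_contained`` and ``touching`` below are hypothetical stand-ins and not strax's own implementation.

```python
def fully_contained(things, container):
    """Keep intervals that lie entirely inside the container interval."""
    start, end = container
    return [(t0, t1) for t0, t1 in things if t0 >= start and t1 <= end]


def touching(things, container):
    """Keep intervals that overlap the container (end times are exclusive)."""
    start, end = container
    return [(t0, t1) for t0, t1 in things if t0 < end and t1 > start]


# One event and four peaks, all as (time, endtime) in ns.
event = (100, 200)
peaks = [(90, 110), (120, 150), (190, 210), (200, 220)]

inside = fully_contained(peaks, event)   # only the peak at (120, 150)
overlapping = touching(peaks, event)     # also the peaks crossing an edge
```

The last peak starts exactly at the event's exclusive end, so neither selection returns it.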


Minimal examples
----------------------
Below, each of the plugin types is worked out minimally. Each plugin can be
worked out in much greater detail; see e.g. the
`plugins in straxen <https://github.com/XENONnT/straxen/tree/master/straxen/plugins>`_.

strax.Plugin
____________
.. code-block:: python

    # To test, one can use these dummy Peaks and Records from strax
    import strax
    import numpy as np
    from strax.testutils import Records, Peaks, run_id
    st = strax.Context(register=[Records, Peaks])

    class BasePlugin(strax.Plugin):
        """The most common plugin where computations on data are performed in strax"""
        depends_on = 'records'

        # As good practice, always specify the version and the provides argument
        provides = 'simple_data'
        __version__ = '0.0.0'

        # We need to specify the datatype, for this example, we are
        # going to calculate some areas
        dtype = strax.time_fields + [(("Total ADC counts", 'area'), np.int32)]

        def compute(self, records):
            result = np.zeros(len(records), dtype=self.dtype)

            # All data in strax must have some sort of time fields
            result['time'] = records['time']
            result['endtime'] = strax.endtime(records)

            # For this example, we calculate the total sum of the records-data
            result['area'] = np.sum(records['data'], axis=1)
            return result

    st.register(BasePlugin)
    st.get_df(run_id, 'simple_data')


strax.OverlapWindowPlugin
_________________________
.. code-block:: python

    class OverlapPlugin(strax.OverlapWindowPlugin):
        """
        Let peaks look get_window_size() ns to the left and right
        to find other peaks within that time range
        """
        depends_on = 'peaks'
        provides = 'overlap_data'

        dtype = strax.time_fields + [(("total peaks", 'n_peaks'), np.int16)]

        def get_window_size(self):
            # Look 10 ns left and right of each peak
            return 10

        def compute(self, peaks):
            result = np.zeros(1, dtype=self.dtype)
            result['time'] = np.min(peaks['time'])
            result['endtime'] = np.max(strax.endtime(peaks))
            result['n_peaks'] = len(peaks)
            return result

    st.register(OverlapPlugin)
    st.get_df(run_id, 'overlap_data')


strax.LoopPlugin
________________
.. code-block:: python

    class LoopData(strax.LoopPlugin):
        """Loop over peaks and find the records within each of those peaks."""
        depends_on = 'peaks', 'records'
        provides = 'looped_data'

        dtype = strax.time_fields + [(("total records", 'n_records'), np.int16)]

        # The LoopPlugin specific requirements
        time_selection = 'fully_contained'  # other option is 'touching'
        loop_over = 'peaks'

        # Use the compute_loop() instead of compute()
        def compute_loop(self, peaks, records):
            result = np.zeros(len(peaks), dtype=self.dtype)
            result['time'] = np.min(peaks['time'])
            result['endtime'] = np.max(strax.endtime(peaks))
            result['n_records'] = len(records)
            return result

    st.register(LoopData)
    st.get_df(run_id, 'looped_data')


strax.CutPlugin
_________________________
.. code-block:: python

    class CutData(strax.CutPlugin):
        """
        Create a boolean array if an entry passes a given cut,
        in this case if the peak has a positive area
        """
        depends_on = 'peaks'
        provides = 'cut_data'

        # Use cut_by() instead of compute() to generate a boolean array
        def cut_by(self, peaks):
            return peaks['area'] > 0

    st.register(CutData)
    st.get_df(run_id, 'cut_data')


strax.MergeOnlyPlugin
_____________________
.. code-block:: python

    class MergeData(strax.MergeOnlyPlugin):
        """Merge datatypes of the same datakind into a single datatype"""
        depends_on = ('peaks', 'cut_data')
        provides = 'merged_data'

        # You only need to specify the dependencies; those are merged.

    st.register(MergeData)
    st.get_array(run_id, 'merged_data')


Plugin inheritance
----------------------
It is possible to inherit the `compute()` method of an already existing plugin with another plugin. We call these types of plugins child plugins. Child plugins are recognized by strax when the `child_plugin` attribute of the plugin is set to `True`. Below you can find a simple example of a child plugin with its parent plugin:
It is possible to inherit the ``compute()`` method of an already existing plugin with another plugin. We call these types of plugins child plugins. Child plugins are recognized by strax when the ``child_plugin`` attribute of the plugin is set to ``True``. Below you can find a simple example of a child plugin with its parent plugin:

.. code-block:: python

@@ -103,10 +236,10 @@ It is possible to inherit the `compute()` method of an already existing plugin w
res['width'] = self.config['option_unique_child']
return res

The `super().compute()` statement in the `compute` method of `ChildPlugin` allows us to execute the code of the parent's compute method without duplicating it. Additionally, if needed, we can extend the code with some for the child-plugin unique computation steps.
The ``super().compute()`` statement in the ``compute`` method of ``ChildPlugin`` allows us to execute the code of the parent's compute method without duplicating it. Additionally, if needed, we can extend the code with computation steps that are unique to the child plugin.

To allow for the child plugin to have different settings then its parent (e.g. `'by_child_overwrite_option'` in `self.config['by_child_overwrite_option']` of the parent's `compute` method), we have to use specific child option. These options will be recognized by strax and overwrite the config values of the parent parameter during the initialization of the child-plugin. Hence, these changes only affect the child, but not the parent.
To allow the child plugin to have different settings than its parent (e.g. ``'by_child_overwrite_option'`` in ``self.config['by_child_overwrite_option']`` of the parent's ``compute`` method), we have to use specific child options. These options will be recognized by strax and overwrite the config values of the parent parameter during the initialization of the child plugin. Hence, these changes only affect the child, but not the parent.

An option can be flagged as a child option if the corresponding option attribute is set `child_option=True`. Further, the option name which should be overwritten must be specified via the option attribute `parent_option_name`.
An option can be flagged as a child option if the corresponding option attribute is set ``child_option=True``. Further, the option name which should be overwritten must be specified via the option attribute ``parent_option_name``.

The lineage of a child plugin contains in addition to its options the name and version of the parent plugin.
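
Conceptually, the overwrite described above can be pictured as copying the shared configuration for the child and replacing each parent option's value with the value of the matching child option. The sketch below is only a mental model under that assumption. The function ``config_seen_by_child``, the option name ``by_child_overwrite_option_child`` and all values are hypothetical; strax's actual mechanism lives in its plugin initialization.

```python
def config_seen_by_child(shared_config, child_to_parent):
    """Return the configuration as the child plugin would see it.

    child_to_parent maps each child option name to the name of the
    parent option it overwrites (the role of parent_option_name).
    Hypothetical helper for illustration; not part of strax.
    """
    # Work on a copy, so the parent keeps seeing the original values.
    config = dict(shared_config)
    for child_name, parent_name in child_to_parent.items():
        if child_name in config:
            config[parent_name] = config[child_name]
    return config


shared = {
    'by_child_overwrite_option': 5,         # value used by the parent
    'by_child_overwrite_option_child': 10,  # value meant for the child
}
child_view = config_seen_by_child(
    shared,
    {'by_child_overwrite_option_child': 'by_child_overwrite_option'},
)
```

Since only the copy is modified, code inherited from the parent reads the child's value through the parent's option name, while the parent plugin itself is unaffected.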