basic template for hpge post processing #2

Closed
wants to merge 91 commits into from
Changes from 69 commits
578de78
basic template for hpge post processing
tdixon97 Oct 21, 2024
a6d3d12
style: pre-commit fixes
pre-commit-ci[bot] Oct 21, 2024
05b42e6
pc fixes
tdixon97 Oct 21, 2024
3dfbf23
style improvements
tdixon97 Oct 21, 2024
0bd5d6f
fix merge
tdixon97 Oct 21, 2024
8b1fd87
basic functionality for time-windowing
tdixon97 Oct 22, 2024
223a7d0
detectors file to config
tdixon97 Oct 22, 2024
5ded4c9
[fix] sort by time
tdixon97 Oct 22, 2024
4261348
[wip] adding functions for DL calculations
tdixon97 Oct 24, 2024
2f888fb
[wip] adding functions for DL calc and __init__ for hpge subpackage
tdixon97 Oct 24, 2024
ddea2d5
style: pre-commit fixes
pre-commit-ci[bot] Oct 24, 2024
ca94f10
Update __init__.py
tdixon97 Oct 24, 2024
adb1f6e
style: pre-commit fixes
pre-commit-ci[bot] Oct 24, 2024
534882c
Update src/reboost/hpge/processors.py
tdixon97 Oct 28, 2024
3d5a535
Update src/reboost/hpge/hit.py
tdixon97 Oct 28, 2024
39c587c
style: pre-commit fixes
pre-commit-ci[bot] Oct 28, 2024
78e130a
[wip] starting functionality for DLs
tdixon97 Oct 28, 2024
592bdfc
Merge branch 'main' of github.com:tdixon97/reboost into main
tdixon97 Oct 28, 2024
a2e2599
style: pre-commit fixes
pre-commit-ci[bot] Oct 28, 2024
16550ae
fix some parts of the docs
tdixon97 Oct 28, 2024
4c4f603
style: pre-commit fixes
pre-commit-ci[bot] Oct 28, 2024
f8778cc
generate proccesing chain from config file
tdixon97 Nov 7, 2024
0e7aaf5
bit of clean up / improved docs
tdixon97 Nov 7, 2024
5a9595b
[tests] add test for the windowing
tdixon97 Nov 7, 2024
63ca42b
style: pre-commit fixes
pre-commit-ci[bot] Nov 7, 2024
d65ffad
[tests] adding more tests
tdixon97 Nov 8, 2024
bfe9782
style: pre-commit fixes
pre-commit-ci[bot] Nov 8, 2024
670f5df
[tests] test merging arrays
tdixon97 Nov 8, 2024
4747230
Merge branch 'main' of github.com:tdixon97/reboost into main
tdixon97 Nov 8, 2024
cdc8767
style: pre-commit fixes
pre-commit-ci[bot] Nov 8, 2024
8e8de77
update to be able to read json or yaml
tdixon97 Nov 8, 2024
6cf37d6
fix merge
tdixon97 Nov 8, 2024
3dcc0f6
processor for distance to surface
tdixon97 Nov 10, 2024
0d22b95
[docs] improved documentation
tdixon97 Nov 12, 2024
83be05d
add hpges and pyg4ometry to the dependencies
tdixon97 Nov 12, 2024
a2e8d4a
add pyg4ometry
tdixon97 Nov 12, 2024
58e6f36
[docs] add legendtestdata to deps
tdixon97 Nov 12, 2024
8ec0df6
[tests] test on the whole of build_hit (IO)
tdixon97 Nov 12, 2024
ee048fa
precommit
tdixon97 Nov 12, 2024
410bb22
remove dependency
tdixon97 Nov 12, 2024
a4140bf
[tests] fix the test data
tdixon97 Nov 12, 2024
b615949
add the option to just read n evtid starting at a particular index.
tdixon97 Nov 14, 2024
42390b8
trying to fix tests
tdixon97 Nov 14, 2024
40329f5
style: pre-commit fixes
pre-commit-ci[bot] Nov 14, 2024
432bd70
add awkward to dependencies
tdixon97 Nov 14, 2024
8616562
Merge branch 'main' of github.com:tdixon97/reboost into main
tdixon97 Nov 14, 2024
dea22dd
update main.yaml
tdixon97 Nov 14, 2024
6237681
improving documentation
tdixon97 Nov 14, 2024
1370087
change FileInfo into class (cleaner)
tdixon97 Nov 14, 2024
54663e0
[docs] improve documentation and start working on locals option
tdixon97 Nov 14, 2024
097b9e6
add option to specify local objects in config
tdixon97 Nov 15, 2024
133da5e
ak.min to np.min for 1D array
tdixon97 Nov 15, 2024
9f0a9ba
style: pre-commit fixes
pre-commit-ci[bot] Nov 15, 2024
f075594
style fixes
tdixon97 Nov 15, 2024
9f70f59
Merge branch 'main' of github.com:tdixon97/reboost into main
tdixon97 Nov 15, 2024
cd16454
ak -> np to fix CI failures
tdixon97 Nov 15, 2024
a28d1a4
style: pre-commit fixes
pre-commit-ci[bot] Nov 15, 2024
b9bc3a4
[docs] adding a basic tutorial
tdixon97 Nov 16, 2024
28115f1
Merge branch 'main' of github.com:tdixon97/reboost into main
tdixon97 Nov 16, 2024
50017cc
style: pre-commit fixes
pre-commit-ci[bot] Nov 16, 2024
e301bd7
[docs] fix spelling
tdixon97 Nov 16, 2024
7293579
Merge branch 'main' of github.com:tdixon97/reboost into main
tdixon97 Nov 16, 2024
03e6088
[docs] fix
tdixon97 Nov 16, 2024
90112a2
[docs] remove nbspinx
tdixon97 Nov 16, 2024
e4feb8f
[docs] update conf.p
tdixon97 Nov 16, 2024
66c63fc
Update pyproject.toml
tdixon97 Nov 17, 2024
22c4b78
[docs] fix tutorial
tdixon97 Nov 17, 2024
c2bc963
[docs] fix
tdixon97 Nov 17, 2024
387ac50
[docs] more format fixes
tdixon97 Nov 17, 2024
e78e90e
clean up build hit
tdixon97 Nov 18, 2024
9530b65
first version of building tcm
tdixon97 Nov 18, 2024
78116a4
remove timing debug (cleanup)
tdixon97 Nov 18, 2024
0abd90f
[evt] first version of build_tcm code
tdixon97 Nov 18, 2024
d9509bc
[docs] small fix
tdixon97 Nov 18, 2024
abd9450
pre-commit
tdixon97 Nov 18, 2024
01cc000
[docs] fix build-hit docsring
tdixon97 Nov 19, 2024
f0d13cb
change evtid to _evtid and global_evtid to _global_evtid since its no…
tdixon97 Nov 20, 2024
f594231
additions to documentation
tdixon97 Nov 20, 2024
9509c2a
[docs] switch tutorials from rst to ipynb (easier to mantain)
tdixon97 Nov 20, 2024
df7e6d9
change to notebook for docs
tdixon97 Nov 20, 2024
ae1cea5
[docs] switch back to rst (dont want to run the notebooks)
tdixon97 Nov 20, 2024
ecd90e7
[evt] adding build_tcm functionality
tdixon97 Nov 21, 2024
d6b6d27
[docs] documentation for event tier
tdixon97 Nov 22, 2024
235aefb
[docs] documentation for event tier
tdixon97 Nov 22, 2024
c61ef07
[docs] remove nbsphinx
tdixon97 Nov 22, 2024
2d8c634
[docs] fix
tdixon97 Nov 22, 2024
0a97068
style: pre-commit fixes
pre-commit-ci[bot] Nov 22, 2024
5790828
[docs] ipython --> python
tdixon97 Nov 22, 2024
6383880
Merge branch 'main' of github.com:tdixon97/reboost into main
tdixon97 Nov 22, 2024
b779c4d
[docs] small fixes
tdixon97 Nov 22, 2024
e608760
Update .pre-commit-config.yaml
tdixon97 Nov 22, 2024
4 changes: 4 additions & 0 deletions .github/workflows/main.yml
Expand Up @@ -31,6 +31,10 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install non-python (homebrew) dependencies
if: ${{ matrix.os == 'macOS-latest' }}
run: |
brew install opencascade cgal gmp mpfr boost
- name: Get dependencies and install reboost
run: |
python -m pip install --upgrade pip wheel setuptools
1 change: 1 addition & 0 deletions docs/source/conf.py
Expand Up @@ -22,6 +22,7 @@
"myst_parser",
]


source_suffix = {
".rst": "restructuredtext",
".md": "markdown",
48 changes: 46 additions & 2 deletions docs/source/index.rst
@@ -1,10 +1,54 @@
Welcome to reboost's documentation!
==========================================

Table of Contents
-----------------
*reboost* is a Python package for the post-processing of `remage <https://remage.readthedocs.io/en/stable/>`_ Monte Carlo simulations.

Getting started
---------------

*reboost* can be installed with *pip*:

.. code-block:: console

    $ git clone git@github.com:legend-exp/reboost.git
    $ cd reboost
    $ pip install .

*reboost* is currently divided into two programs:

- *reboost-optical* for processing optical simulations,
- *reboost-hpge* for processing HPGe detector simulations.

Both can be run on the command line with:

.. code-block:: console

    $ reboost-optical -h
    $ reboost-hpge -h

Next steps
----------

.. toctree::
   :maxdepth: 2

   User Manual <manual/index>

.. toctree::
   :maxdepth: 1

   tutorial

.. toctree::
   :maxdepth: 1

   Package API reference <api/modules>


See also
--------
- `remage <https://remage.readthedocs.io/en/stable/>`_: Modern *Geant4* application for HPGe and LAr experiments,
- `legend-pygeom-hpges <https://legend-pygeom-hpges.readthedocs.io/en/latest/>`_: Package for handling HPGe detector geometry in Python,
- `pyg4ometry <https://pyg4ometry.readthedocs.io/en/stable/>`_: Package to create simulation geometry in Python,
- `legend-pygeom-optics <https://legend-pygeom-optics.readthedocs.io/en/stable/>`_: Package to handle optical properties in Python,
- `legend-pygeom-l200 <https://github.com/legend-exp/legend-pygeom-l200>`_: Implementation of the LEGEND-200 experiment (**private**),
- `pyvertexgen <https://github.com/tdixon97/pyvertexgen/>`_: Generation of vertices for simulations.
285 changes: 285 additions & 0 deletions docs/source/manual/hpge.rst
@@ -0,0 +1,285 @@
HPGe detector simulations
=========================

*reboost-hpge* is a sub-package for post-processing the high-purity germanium (HPGe) detector part of *remage* simulations.
It provides a flexible framework for implementing user-customised post-processing.

Command line interface
----------------------

The *reboost-hpge* command line tool runs the processing.

.. code-block:: console

    $ reboost-hpge -h

Different modes are implemented to run the individual tiers. For example, the *hit* tier processing (described in more detail below) is run with:

.. code-block:: console

    $ reboost-hpge hit -h


*remage* lh5 output format
--------------------------

The simulation output of *remage* is described in the `remage documentation <https://remage.readthedocs.io/en/stable/output.html>`_.
By default two kinds of ``lgdo.Table`` (`docs <https://legend-pydataobj.readthedocs.io/en/stable/api/lgdo.types.html#lgdo.types.table.Table>`_) are stored, with the
following format:

.. code-block:: console

    /
    └── hit · HDF5 group
        ├── det000 · table{evtid,particle,edep,time,xloc,yloc,zloc}
        │   ├── edep · array<1>{real}
        │   ├── evtid · array<1>{real}
        │   ├── particle · array<1>{real}
        │   ├── time · array<1>{real}
        │   ├── xloc · array<1>{real}
        │   ├── yloc · array<1>{real}
        │   └── zloc · array<1>{real}
        ├── det001 · table{evtid,particle,edep,time,xloc,yloc,zloc}
        │   ....
        │   ....
        └── vertices · table{evtid,time,xloc,yloc,zloc,n_part}
            ├── evtid · array<1>{real}
            ├── n_part · array<1>{real}
            ├── time · array<1>{real}
            ├── xloc · array<1>{real}
            ├── yloc · array<1>{real}
            └── zloc · array<1>{real}



One table is stored per sensitive germanium detector, together with a table of the vertices.
All the data is stored as flat 1D arrays:

- *edep*: energy deposited in the germanium (in keV),
- *evtid*: index of the simulated event,
- *particle*: Geant4 code for the particle type,
- *time*: time relative to the start of the event,
- *xloc/yloc/zloc*: position of the interaction / vertex,
- *n_part*: number of particles emitted.

However, this format is not directly comparable to experimental data.


Data tiers
----------

The processing is defined in terms of several *tiers*, mirroring the logic of the `pygama <https://pygama.readthedocs.io/en/stable/>`_ data processing software used for LEGEND:

- **stp** or "step": the raw *remage* outputs corresponding to Geant4 steps,
- **hit**: the data from each channel independently, after grouping the steps into discrete physical interactions in the detector,
- **evt** or "event": the data combining the information from various detectors.

The processing is divided into two steps: :func:`build_hit` and ``build_evt`` (work in progress).

Hit tier processing
-------------------

The hit tier converts the raw *remage* file, based on Geant4 steps, into a file corresponding to the physical interactions in the detectors.
Only processing that acts on individual detectors is performed in this tier.
The processing is based on a YAML or JSON configuration file. For example:

.. code-block:: json

    {
        "channels": [
            "det000",
            "det001",
            "det002",
            "det003"
        ],
        "outputs": [
            "t0",
            "truth_energy_sum",
            "smeared_energy_sum",
            "evtid"
        ],
        "step_group": {
            "description": "group steps by time and evtid.",
            "expression": "reboost.hpge.processors.group_by_time(stp,window=10)"
        },
        "locals": {
            "hpge": "reboost.hpge.utils(meta_path=meta,pars=pars,detector=detector)"
        },
        "operations": {
            "t0": {
                "description": "first time in the hit.",
                "mode": "eval",
                "expression": "ak.fill_none(ak.firsts(hit.time,axis=-1),np.nan)"
            },
            "truth_energy_sum": {
                "description": "truth summed energy in the hit.",
                "mode": "eval",
                "expression": "ak.sum(hit.edep,axis=-1)"
            },
            "smeared_energy_sum": {
                "description": "summed energy after convolution with energy response.",
                "mode": "function",
                "expression": "reboost.hpge.processors.smear_energies(hit.truth_energy_sum,reso=pars.reso)"
            }
        }
    }

The configuration file must provide several blocks:

- **channels**: list of HPGe channels to process,
- **outputs**: list of fields for the output file,
- **locals**: objects used by the processors (passed as ``locals`` to ``LGDO.Table.eval``), more details below,
- **step_group**: the function that groups the Geant4 steps into physical *hits*,
- **operations**: further computations / manipulations to apply.

The **step_group** block sets the structure of the output file. Its function reformats the flat input table into a table
with a jagged structure, where each row corresponds to a physical hit in the detector. For example, the flat table:

.. code-block:: console

    evtid: [0    , 0    , 1    , ... ]
    edep:  [101.2, 201.2, 303.7, ... ]
    time:  [0    , 0.1  , 0    , ... ]
    ....

becomes a table of ``VectorOfVectors`` with a jagged structure:

.. code-block:: console

    evtid: [[0    , 0    ], [1    ], [...], ... ]
    edep:  [[101.2, 201.2], [303.7], [...], ... ]
    time:  [[0    , 0.1  ], [0    ], [...], ... ]
    ....

The recommended tool to manipulate jagged arrays is `awkward <https://awkward-array.org/doc/main/>`_, on which much of *reboost* is based.


A function must be chosen to perform this step grouping. It takes the *remage* output table as input and returns
a table where all the input arrays are converted to ``LGDO.VectorOfVectors`` with a jagged structure. In the expression of the function, *stp* is an alias
for the input *remage* table. The function must return the original LH5 table, with the same fields as above, restructured so that each field is a ``VectorOfVectors``.
In addition, a ``global_evtid`` field is added, which represents the index of the event over all input files.
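
The grouping step can be pictured with a small pure-Python sketch. It is not the ``reboost.hpge.processors.group_by_time`` implementation: the function name ``group_steps``, the use of plain lists in place of ``VectorOfVectors``, and the rule that a new hit starts when the *evtid* changes or when the gap to the previous step exceeds *window* are all assumptions made for illustration.

```python
def group_steps(stp, window=10.0):
    """Group flat step arrays into jagged per-hit lists.

    stp: dict mapping field name -> flat list, all of the same length.
    """
    n = len(stp["evtid"])
    # sort step indices by (evtid, time) so the steps of one hit are contiguous
    order = sorted(range(n), key=lambda i: (stp["evtid"][i], stp["time"][i]))

    hit = {name: [] for name in stp}
    prev = None
    for i in order:
        # assumed rule: new hit on a new evtid or on a time gap above the window
        new_hit = (
            prev is None
            or stp["evtid"][i] != stp["evtid"][prev]
            or stp["time"][i] - stp["time"][prev] > window
        )
        for name, values in stp.items():
            if new_hit:
                hit[name].append([])
            hit[name][-1].append(values[i])
        prev = i
    return hit


stp = {
    "evtid": [0, 0, 1],
    "edep": [101.2, 201.2, 303.7],
    "time": [0.0, 0.1, 0.0],
}
hit = group_steps(stp, window=10.0)
print(hit["edep"])  # [[101.2, 201.2], [303.7]]
```

Every field keeps the same jagged shape, which is what allows later operations to act on each hit independently.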

Next, a set of operations can be specified. These can perform any operation that does not change the length of the data. They can be either basic numerical expressions
(including awkward or numpy calls) or be specified by a function. The expressions can reference several variables:

- **hit**: the output table of step grouping (the table is updated after every operation, so the order of operations is important),
- **pars**: a named tuple of parameters (more details later) for this detector,
- **hpge**: the ``legendhpges.HPGe`` object for this detector,
- **phy_vol**: the ``pyg4ometry`` physical volume for the detector.

Finally, the **outputs** field specifies the columns of the table to include in the output file.
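
Why the order of operations matters can be seen in a minimal sketch of the evaluation loop. This is only an illustration of the idea, not the *reboost* code: plain lists stand in for ``VectorOfVectors``, Python's built-in ``eval`` stands in for ``Table.eval``, and the operation ``energy_in_MeV`` is a hypothetical example.

```python
from types import SimpleNamespace

# jagged hit table after step grouping (plain lists stand in for VectorOfVectors)
hit = SimpleNamespace(
    edep=[[101.2, 201.2], [303.7]],
    time=[[0.0, 0.1], [0.0]],
)

# hypothetical operations, evaluated in the order in which they are listed
operations = {
    "t0": "[row[0] for row in hit.time]",
    "truth_energy_sum": "[sum(row) for row in hit.edep]",
    # depends on the column created by the previous operation
    "energy_in_MeV": "[e / 1000 for e in hit.truth_energy_sum]",
}

for name, expression in operations.items():
    # each result becomes a new column, visible to the operations that follow
    setattr(hit, name, eval(expression, {"hit": hit}))

# only the requested columns are written out
outputs = ["t0", "truth_energy_sum"]
result = {name: getattr(hit, name) for name in outputs}
print(result)
```

Moving ``energy_in_MeV`` before ``truth_energy_sum`` would fail, because the column it reads would not exist yet.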

lh5 i/o operations
^^^^^^^^^^^^^^^^^^

:func:`build_hit` contains several options to handle i/o of lh5 files.

Raw Geant4 output files can be very large (many GB), so it is often neither desirable nor feasible to read the full file into memory.
Instead, the :class:`lgdo.lh5.LH5Iterator` is used to iterate over chunks of the files, keeping memory use reasonable. The *buffer* keyword argument
to :func:`build_hit` controls the size of the buffer.

It is possible to specify a list of files or to use wildcards. The *merge_input_files* argument controls whether the outputs are merged or kept as separate files.

Finally, it is sometimes desirable to process a subset of the simulated events, for example to split the simulation by run or period. The *start_evtid* and *n_evtid*
keyword arguments control the first simulation index to process and the number of events, respectively. Note that the indices refer to the *global* evtid when multiple files are used.
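
The interplay of chunked reading and the global event index can be sketched as follows. The helper names ``iter_chunks`` and ``select_events`` are hypothetical, plain lists replace the LH5 files, and the offset between files is assumed to be the largest *evtid* of the file plus one; the actual bookkeeping in :func:`build_hit` may differ.

```python
def iter_chunks(evtids, buffer):
    """Yield successive chunks of at most `buffer` entries."""
    for start in range(0, len(evtids), buffer):
        yield evtids[start : start + buffer]


def select_events(files, start_evtid=0, n_evtid=None, buffer=2):
    """Return the global evtids kept when reading `files` chunk by chunk.

    files: list of per-file evtid lists, each file numbering its events from 0.
    """
    stop = None if n_evtid is None else start_evtid + n_evtid
    kept = []
    offset = 0  # number of events in the files read so far (assumption, see above)
    for evtids in files:
        for chunk in iter_chunks(evtids, buffer):
            for evtid in chunk:
                global_evtid = evtid + offset
                if global_evtid >= start_evtid and (stop is None or global_evtid < stop):
                    kept.append(global_evtid)
        offset += max(evtids) + 1 if evtids else 0
    return kept


files = [[0, 0, 1, 2], [0, 1, 1]]  # evtid of every step in two input files
print(select_events(files, start_evtid=2, n_evtid=2))  # [2, 3]
```

Only one chunk is held in memory at a time, while the selection is expressed in terms of the index over all files.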

parameters and other *local* variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Processors often depend on parameters which may vary by detector. To enable this, the user can specify a dictionary of
parameters with the *pars* keyword. This should contain a sub-dictionary per detector, for example:

.. code-block:: json

    {
        "det000": {
            "reso": 1,
            "fccd": 0.1,
            "phy_vol_name": "det_phy",
            "meta_name": "icpc.json"
        }
    }

This dictionary is internally converted into a Python ``NamedTuple`` to give a cleaner syntax. The named tuple for each detector is then passed in the
``locals`` dictionary to the evaluation of the operations, with the name ``pars``.
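
A minimal sketch of such a conversion, using only the standard library, is shown here. The helper name ``to_namedtuple`` is hypothetical and the internal *reboost* conversion may be written differently; the point is only that ``pars.reso`` is easier to write in an expression than ``pars["reso"]``.

```python
from collections import namedtuple

pars_dict = {
    "det000": {
        "reso": 1,
        "fccd": 0.1,
        "phy_vol_name": "det_phy",
        "meta_name": "icpc.json",
    }
}


def to_namedtuple(name, mapping):
    """Build a named tuple whose fields are the keys of `mapping`."""
    return namedtuple(name, mapping.keys())(**mapping)


pars = to_namedtuple("Pars", pars_dict["det000"])
print(pars.reso, pars.fccd)  # 1 0.1
```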

In addition, for many post-processing applications the processor functions need to know the geometry. This is made possible
by passing the path to the GDML file and the path to the metadata ("diodes" folder) with the *gdml* and *meta_path* arguments to :func:`build_hit`.
The ``pyg4ometry.geant4.Registry`` is extracted from the GDML file.

To allow the flexibility to write processors depending on arbitrary (more complicated) Python objects, it is possible to add a *locals* dictionary
to the config file. The code then evaluates the supplied expression for each entry. These expressions can depend on:

- **detector**: the *remage* detector name,
- **meta**: the path to the metadata,
- **reg**: the Geant4 registry,
- **pars**: the parameters for this detector.

These expressions are evaluated once per detector and the results are added to the *locals* dictionary of ``Table.eval``, so they can be referenced in the operation expressions.

For example, one useful object for post-processing is the `legendhpges.base.HPGe <https://legend-pygeom-hpges.readthedocs.io/en/latest/api/legendhpges.html#legendhpges.base.HPGe>`_ object for the detector.
This can be constructed from the metadata using:

.. code-block:: json

    {"hpge": "reboost.hpge.utils(meta_path=meta,pars=pars,detector=detector)"}

This creates the ``hpge`` object for each detector and adds it to the ``locals`` mapping of ``eval``, so that it can be used in the operations.
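
The per-detector evaluation of the *locals* block can be sketched as follows. Everything here is illustrative: ``load_meta`` is a hypothetical stand-in for whatever builds the detector object, the metadata path is a placeholder, and the built-in ``eval`` is used in place of the actual *reboost* machinery.

```python
from collections import namedtuple

Pars = namedtuple("Pars", ["reso", "meta_name"])


def load_meta(meta_path, pars, detector):
    """Hypothetical helper standing in for the construction of a detector object."""
    return {"detector": detector, "file": f"{meta_path}/{pars.meta_name}"}


# the "locals" block of the config: name -> expression
config_locals = {"hpge": "load_meta(meta_path=meta, pars=pars, detector=detector)"}
all_pars = {"det000": Pars(reso=1, meta_name="icpc.json")}

local_objects = {}
for detector, pars in all_pars.items():
    # names that the expressions are allowed to reference
    scope = {
        "detector": detector,
        "meta": "path/to/diodes",  # placeholder path
        "reg": None,  # would be the Geant4 registry
        "pars": pars,
        "load_meta": load_meta,
    }
    # evaluate every expression once per detector
    local_objects[detector] = {
        name: eval(expression, scope) for name, expression in config_locals.items()
    }

print(local_objects["det000"]["hpge"])
```

The resulting objects are built once and can then be reused by every operation for that detector.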

Possible use cases of this functionality are:

- extracting detector mappings (e.g. drift time maps),
- extracting the kernel of a machine learning model,
- any more complicated (non-JSON-serialisable) objects.

Adding new processors
^^^^^^^^^^^^^^^^^^^^^

Any Python function can be a ``reboost.hit`` processor. The only requirement is that it should return a:

- :class:`VectorOfVectors`,
- :class:`Array` or
- :class:`ArrayOfEqualSizedArrays`

with the same length as the hit table. This means processors can act on subarrays (``axis=-1`` in awkward syntax) but should not combine multiple rows of the hit table.

Most of the current and envisaged post-processing fits in this framework. For example:

- clustering hits would result in a new ``VectorOfVectors`` with the same number of rows but fewer entries per vector,
- pulse shape simulations to produce waveforms (or ML emulation of this) would give an ``ArrayOfEqualSizedArrays``,
- processing many parameters in parallel (e.g. for systematic studies) would give a nested ``VectorOfVectors``.
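
As an illustration of the length requirement, here is a sketch of a processor that reduces each hit to a single number. The function ``energy_weighted_mean`` is a hypothetical example and not part of *reboost*; plain lists stand in for ``VectorOfVectors`` and ``Array``, whereas a real processor would typically use awkward arrays.

```python
def energy_weighted_mean(values, edep):
    """Energy-weighted mean of `values` within each hit.

    Both arguments are jagged (list of lists). One number is returned per hit,
    so the output has the same length as the hit table.
    """
    out = []
    for row_values, row_edep in zip(values, edep):
        total = sum(row_edep)
        if total > 0:
            out.append(sum(v * e for v, e in zip(row_values, row_edep)) / total)
        else:
            out.append(float("nan"))  # empty hit or hit with no energy
    return out


edep = [[100.0, 300.0], [50.0], []]
zloc = [[1.0, 3.0], [2.0], []]
z_mean = energy_weighted_mean(zloc, edep)
print(len(z_mean) == len(edep))  # True
```

The function only ever looks inside one row at a time, which is the sense in which a processor acts on ``axis=-1`` without combining rows.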

Event tier processing (work in progress)
----------------------------------------

The event tier combines the information from various detector systems, in future including the optical detector channels. This step is thus only necessary for experiments with
many output channels.

The processing is again based on a YAML or JSON configuration file. For example:

.. code-block:: json

    {
        "channels": {
            "geds_usable": [
                "det000",
                "det001",
                "det002"
            ],
            "geds_ac": [
                "det003"
            ]
        },
        "outputs": [
            "energy",
            "detector",
            "is_good_hit",
            "multiplicity"
        ],
        "event_group": {
            "description": "group hits by time and evtid.",
            "expression": "reboost.hpge.processors.group_by_time(stp,window=10)"
        }
    }
8 changes: 8 additions & 0 deletions docs/source/manual/index.rst
@@ -0,0 +1,8 @@
User Manual
===========

.. toctree::
   :maxdepth: 2

   optical
   hpge
4 changes: 4 additions & 0 deletions docs/source/manual/optical.rst
@@ -0,0 +1,4 @@
Optical (SiPM) simulations processing
=====================================

Come back later for more complete documentation.
Binary file added docs/source/notebooks/images/output_20_1.png
Binary file added docs/source/notebooks/images/output_24_0.png
Binary file added docs/source/notebooks/images/output_27_1.png
Binary file added docs/source/notebooks/images/output_28_1.png
Binary file added docs/source/notebooks/images/output_31_0.png
Binary file added docs/source/notebooks/images/output_34_0.png
Binary file added docs/source/notebooks/images/output_35_0.png
Binary file added docs/source/notebooks/images/output_37_0.png