Merge pull request #5 from msmk0/v2
Changes for version 2
dhrou authored Apr 27, 2018
2 parents 568cc23 + 71fe6ea commit 8e4bc0d
Showing 3 changed files with 64 additions and 22 deletions.
65 changes: 54 additions & 11 deletions README.md
@@ -2,7 +2,7 @@ TrackML utility library
=======================

A python library to simplify working with the
[High Energy Physics Tracking Machine Learning challenge](kaggle_trackml)
[High Energy Physics Tracking Machine Learning challenge][kaggle_trackml]
dataset.

Installation
@@ -96,11 +96,10 @@ some hits can be left unassigned). The training dataset contains the recorded
hits, their truth association to particles, and the initial parameters of those
particles. The test dataset contains only the recorded hits.

The dataset is provided as a set of plain `.csv` files (`.csv.gz` or `.csv.bz2`
are also allowed). Each event has four associated files that contain hits, hit
cells, particles, and the ground truth association between them. The common
prefix (like `event000000000`) is fully constrained to be `event` followed by 9
digits.
The dataset is provided as a set of plain `.csv` files. Each event has four
associated files that contain hits, hit cells, particles, and the ground truth
association between them. The common prefix, e.g. `event000000010`, is always
`event` followed by 9 digits.
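
As a rough illustration of the naming rule above, the following sketch splits a file name into its prefix, numeric event id, and file kind. The helper name `parse_event_file` is hypothetical and not part of the library. The `-<kind>.csv` suffix layout is an assumption read off the file listing in this README.

```python
import re

# "event" followed by exactly 9 digits, then "-<kind>.csv".
# The suffix layout is an assumption based on the file listing in the README.
EVENT_FILE = re.compile(r"^(event(\d{9}))-(\w+)\.csv$")

def parse_event_file(name):
    """Return (prefix, event_id, kind), or None if the name does not match."""
    match = EVENT_FILE.match(name)
    if match is None:
        return None
    prefix, digits, kind = match.groups()
    return prefix, int(digits), kind

print(parse_event_file("event000000010-hits.csv"))
# → ('event000000010', 10, 'hits')
```

Names with fewer or more than 9 digits are rejected, which matches the statement that the prefix is always `event` followed by 9 digits.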

event000000000-hits.csv
event000000000-cells.csv
@@ -122,7 +121,7 @@ a name starting with `submission`, e.g.
The hits file contains the following values for each hit/entry:

* **hit_id**: numerical identifier of the hit inside the event.
* **x, y, z**: measured x, y, z position (in millimeters) of the hit in
* **x, y, z**: measured x, y, z position (in millimeter) of the hit in
global coordinates.
* **volume_id**: numerical identifier of the detector group.
* **layer_id**: numerical identifier of the detector layer inside the
@@ -159,7 +158,7 @@ The particles files contains the following values for each particle/entry:
coordinates.
* **px, py, pz**: initial momentum (in GeV/c) along each global axis.
* **q**: particle charge (as multiple of the absolute electron charge).
* **nhits**: number of hits generated by this particle
* **nhits**: number of hits generated by this particle.

All entries contain the generated information or ground truth.

@@ -171,7 +170,8 @@ particle/track.

* **hit_id**: numerical identifier of the hit as defined in the hits file.
* **particle_id**: numerical identifier of the generating particle as defined
in the particles file.
in the particles file. A value of 0 means that the hit did not originate
from a reconstructible particle, but e.g. from detector noise.
* **tx, ty, tz** true intersection point in global coordinates (in
millimeters) between the particle trajectory and the sensitive surface.
* **tpx, tpy, tpz** true particle momentum (in GeV/c) in the global
@@ -186,14 +186,57 @@ The submission file must associate each hit in each event to one and only one
reconstructed particle track. The reconstructed tracks must be uniquely
identified only within each event. Participants are advised to compress the
submission file (with zip, bzip2, gzip) before submission to the
[Kaggle site](kaggle_trackml).
[Kaggle site][kaggle_trackml].

* **event_id**: numerical identifier of the event; corresponds to the number
found in the per-event file name prefix.
* **hit_id**: numerical identifier of the hit inside the event as defined in
the per-event hits file.
* **track_id**: user-defined numerical identifier (non-negative integer) of
the track
the track.
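
The constraints above (one track per hit, non-negative track ids, ids unique only within an event) can be checked while writing the file. The sketch below uses only the Python standard library. The function name `write_submission` and the presence of a header row with the three field names are assumptions, not something this README specifies.

```python
import csv
import io

def write_submission(rows, stream):
    """Write (event_id, hit_id, track_id) rows as CSV, rejecting invalid input."""
    seen = set()
    writer = csv.writer(stream, lineterminator="\n")
    # Assumed header: the three fields described in the README.
    writer.writerow(["event_id", "hit_id", "track_id"])
    for event_id, hit_id, track_id in rows:
        key = (event_id, hit_id)
        if key in seen:
            # Each hit must belong to one and only one track.
            raise ValueError(f"hit {hit_id} in event {event_id} assigned twice")
        if track_id < 0:
            raise ValueError("track_id must be a non-negative integer")
        seen.add(key)
        writer.writerow([event_id, hit_id, track_id])

# Hypothetical assignments: track ids only need to be unique within an event,
# so track 1 in event 0 and track 1 in event 1 are unrelated.
rows = [(0, 1, 1), (0, 2, 1), (0, 3, 2), (1, 1, 1), (1, 2, 2)]
buffer = io.StringIO()
write_submission(rows, buffer)
print(buffer.getvalue())
```

Writing to a stream rather than a path keeps the sketch independent of the compression step. The same function could be pointed at a file opened with `gzip.open(..., "wt")`.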

### Additional detector geometry information

The detector modules that measure particles and generate the hits are organized
into detector groups or volumes identified by a volume id. Inside a volume they
are further grouped into layers identified by a layer id. Each layer can contain
an arbitrary number of detector modules, the smallest geometrically distinct
detector object, each identified by a module id. Within each group, detector
modules are of the same type and have e.g. the same granularity. All simulated
detector modules are so-called semiconductor sensors that are built from thin
silicon sensor chips. Each module can be represented by a two-dimensional,
planar, bounded sensitive surface. These sensitive surfaces are subdivided into
regular grids that define the detector cells, the smallest granularity within
the detector.

Each module has a different position and orientation described in the detectors
file. A local, right-handed coordinate system is defined on each sensitive
surface such that the first two coordinates u and v are on the sensitive surface
and the third coordinate w is normal to the surface. The orientation and
position are defined by the following transformation

pos_xyz = rotation_matrix * pos_uvw + offset

that transforms a position described in local coordinates u,v,w into the
equivalent position x,y,z in global coordinates using a rotation matrix and
an offset.
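
The transformation above, and its inverse, can be sketched with NumPy as follows. The function names and the numeric values are illustrative only. The matrix layout (rows x, y, z and columns u, v, w) is an assumption suggested by component names such as `rot_xu`.

```python
import numpy as np

def local_to_global(pos_uvw, rotation_matrix, offset):
    """Apply pos_xyz = rotation_matrix * pos_uvw + offset.

    pos_uvw may have shape (3,) for one point or (n, 3) for many points.
    """
    pos_uvw = np.asarray(pos_uvw, dtype=float)
    rotation_matrix = np.asarray(rotation_matrix, dtype=float)
    offset = np.asarray(offset, dtype=float)
    # For points stored as rows, R @ p is written as p @ R.T.
    return pos_uvw @ rotation_matrix.T + offset

def global_to_local(pos_xyz, rotation_matrix, offset):
    """Invert the transformation; a rotation matrix's inverse is its transpose."""
    pos_xyz = np.asarray(pos_xyz, dtype=float)
    rotation_matrix = np.asarray(rotation_matrix, dtype=float)
    offset = np.asarray(offset, dtype=float)
    return (pos_xyz - offset) @ rotation_matrix

# Illustrative module: rotated by 90 degrees about the global z axis and
# shifted to (10, 20, 30) mm. These numbers are made up for the example.
rotation = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
center = np.array([10.0, 20.0, 30.0])

print(local_to_global([1.0, 0.0, 0.0], rotation, center))
# → [10. 21. 30.]
```

A point one millimeter along the local u axis ends up one millimeter along global y from the module center, as expected for this rotation.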

* **volume_id**: numerical identifier of the detector group.
* **layer_id**: numerical identifier of the detector layer inside the
group.
* **module_id**: numerical identifier of the detector module inside
the layer.
* **cx, cy, cz**: position of the local origin described in the global
coordinate system (in millimeter).
* **rot_xu, rot_xv, rot_xw, rot_yu, ...**: components of the rotation matrix
to rotate from local u,v,w to global x,y,z coordinates.
* **module_t**: thickness of the detector module (in millimeter).
* **module_minhu, module_maxhu**: the minimum/maximum half-length of the
module boundary along the local u direction (in millimeter).
* **module_hv**: the half-length of the module boundary along the local v
direction (in millimeter).
* **pitch_u, pitch_v**: the size of detector cells along the local u and v
directions (in millimeter).


[cern]: https://home.cern
2 changes: 1 addition & 1 deletion setup.py
@@ -11,7 +11,7 @@

setup(
name='trackml',
version='1',
version='2',
description='TrackML utility library',
long_description=long_description,
long_description_content_type='text/markdown',
19 changes: 9 additions & 10 deletions trackml/randomize.py
@@ -6,12 +6,11 @@
import numpy
import numpy.random

def _make_submission(mapping, track_ids, renumber=True):
def _make_submission(hit_ids, track_ids, renumber=True):
"""Create a submission DataFrame with hit_id and track_id columns.
Optionally renumbers the track_id to random small integers.
"""
hit_ids = mapping['hit_id']
if renumber:
unique_ids, inverse = numpy.unique(track_ids, return_inverse=True)
numbers = numpy.arange(1, len(unique_ids) + 1, dtype=unique_ids.dtype)
@@ -23,18 +22,18 @@ def set_seed(seed):
"""Set the random seed used for randomness in this module."""
numpy.random.seed(seed)

def random_solution(truth, ntracks):
"""Generate a completely random solution with the given number of particles.
def random_solution(hits, ntracks):
"""Generate a completely random solution with the given number of tracks.
Parameters
----------
truth : pandas.DataFrame
Truth mapping must contain hit_id and particle_id columns.
hits : pandas.DataFrame
Hits information must contain hit_id column.
ntracks : int
Number of tracks the submission should contain.
"""
ids = numpy.random.randint(1, nparticles + 1, size=len(mapping), dtype='i4')
return _make_submission(truth, ids, renumber=False)
ids = numpy.random.randint(1, ntracks + 1, size=len(hits), dtype='i4')
return _make_submission(hits['hit_id'], ids, renumber=False)

def drop_hits(truth, probability):
"""Drop hits from each track with a certain probability.
@@ -55,7 +54,7 @@ def drop_hits(truth, probability):
fakeids = numpy.arange(fakeid0, fakeid0 + dropped_count, dtype='i8')
# replace masked particle ids with fakes ones
numpy.place(out, dropped_mask, fakeids)
return _make_submission(truth, out)
return _make_submission(truth['hit_id'], out)

def shuffle_hits(truth, probability):
"""Randomly assign hits to a wrong particle with a certain probability.
@@ -73,4 +72,4 @@ def shuffle_hits(truth, probability):
wrongparticles = numpy.random.choice(numpy.unique(out), size=shuffled_count)
# replace masked particle ids with random valid ids
numpy.place(out, shuffled_mask, wrongparticles)
return _make_submission(truth, out)
return _make_submission(truth['hit_id'], out)
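
The renumbering step in `_make_submission` is only partly visible in this diff: the hunk shows the `numpy.unique(..., return_inverse=True)` call and the `numpy.arange` of new labels, but not what follows. The sketch below is one plausible completion, not the library's actual code. The function name `renumber_tracks`, the `seed` parameter, and the use of a shuffled permutation are assumptions.

```python
import numpy

def renumber_tracks(track_ids, seed=None):
    """Map arbitrary track ids to random small integers 1..n_tracks.

    Hits that shared a track id before still share one afterwards.
    """
    track_ids = numpy.asarray(track_ids)
    # inverse[i] is the index of track_ids[i] within unique_ids.
    unique_ids, inverse = numpy.unique(track_ids, return_inverse=True)
    numbers = numpy.arange(1, len(unique_ids) + 1, dtype=unique_ids.dtype)
    # Assumed step: shuffle the new labels so they carry no ordering information.
    rng = numpy.random.default_rng(seed)
    numbers = rng.permutation(numbers)
    return numbers[inverse]

# Hypothetical input: five hits belonging to three tracks with large ids.
original_ids = numpy.array([500, 7, 500, 42, 7])
renumbered = renumber_tracks(original_ids, seed=1)
print(renumbered)
```

Indexing the shuffled labels with `inverse` preserves the grouping of hits into tracks while discarding the original id values.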
