Merge pull request #5 from msmk0/v2

Changes for version 2
LAL · Apr 27, 2018 · 8e4bc0d
2 parents 568cc23 + 71fe6ea
commit 8e4bc0d

Showing 3 changed files with 64 additions and 22 deletions.
diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@ TrackML utility library
 =======================
 
 A python library to simplify working with the
-[High Energy Physics Tracking Machine Learning challenge](kaggle_trackml)
+[High Energy Physics Tracking Machine Learning challenge][kaggle_trackml]
 dataset.
 
 Installation
@@ -96,11 +96,10 @@ some hits can be left unassigned). The training dataset contains the recorded
 hits, their truth association to particles, and the initial parameters of those
 particles. The test dataset contains only the recorded hits.
 
-The dataset is provided as a set of plain `.csv` files (`.csv.gz` or `.csv.bz2`
-are also allowed). Each event has four associated files that contain hits, hit
-cells, particles, and the ground truth association between them. The common
-prefix (like `event000000000`) is fully constrained to be `event` followed by 9
-digits.
+The dataset is provided as a set of plain `.csv` files. Each event has four
+associated files that contain hits, hit cells, particles, and the ground truth
+association between them. The common prefix, e.g. `event000000010`, is always
+`event` followed by 9 digits.
 
     event000000000-hits.csv
     event000000000-cells.csv
@@ -122,7 +121,7 @@ a name starting with `submission`, e.g.
 The hits file contains the following values for each hit/entry:
 
 *   **hit_id**: numerical identifier of the hit inside the event.
-*   **x, y, z**: measured x, y, z position (in millimeters) of the hit in
+*   **x, y, z**: measured x, y, z position (in millimeter) of the hit in
     global coordinates.
 *   **volume_id**: numerical identifier of the detector group.
 *   **layer_id**: numerical identifier of the detector layer inside the
@@ -159,7 +158,7 @@ The particles files contains the following values for each particle/entry:
     coordinates.
 *   **px, py, pz**: initial momentum (in GeV/c) along each global axis.
 *   **q**: particle charge (as multiple of the absolute electron charge).
-*   **nhits**: number of hits generated by this particle
+*   **nhits**: number of hits generated by this particle.
 
 All entries contain the generated information or ground truth.
 
@@ -171,7 +170,8 @@ particle/track.
 
 *   **hit_id**: numerical identifier of the hit as defined in the hits file.
 *   **particle_id**: numerical identifier of the generating particle as defined
-    in the particles file.
+    in the particles file. A value of 0 means that the hit did not originate
+    from a reconstructible particle, but e.g. from detector noise.
 *   **tx, ty, tz**: true intersection point in global coordinates (in
     millimeters) between the particle trajectory and the sensitive surface.
 *   **tpx, tpy, tpz**: true particle momentum (in GeV/c) in the global
@@ -186,14 +186,57 @@ The submission file must associate each hit in each event to one and only one
 reconstructed particle track. The reconstructed tracks must be uniquely
 identified only within each event.  Participants are advised to compress the
 submission file (with zip, bzip2, gzip) before submission to the
-[Kaggle site](kaggle_trackml).
+[Kaggle site][kaggle_trackml].
 
 *   **event_id**: numerical identifier of the event; corresponds to the number
     found in the per-event file name prefix.
 *   **hit_id**: numerical identifier of the hit inside the event as defined in
     the per-event hits file.
 *   **track_id**: user-defined numerical identifier (non-negative integer) of
-    the track
+    the track.
+
+### Additional detector geometry information
+
+The detector modules that measure particles and generate the hits are
+organized into detector groups or volumes, identified by a volume id. Inside a
+volume they are further grouped into layers, identified by a layer id. Each
+layer can contain an arbitrary number of detector modules, the smallest
+geometrically distinct detector objects, each identified by a module id. Within
+each group, detector modules are of the same type and have, e.g., the same
+granularity. All simulated detector modules are so-called semiconductor sensors
+built from thin silicon sensor chips. Each module can be represented by a
+two-dimensional, planar, bounded sensitive surface. These sensitive surfaces are
+subdivided into regular grids that define the detector cells, the smallest
+granularity within the detector.
+
+Each module has a different position and orientation described in the detectors
+file. A local, right-handed coordinate system is defined on each sensitive
+surface such that the first two coordinates u and v are on the sensitive surface
+and the third coordinate w is normal to the surface. The orientation and
+position are defined by the following transformation
+
+    pos_xyz = rotation_matrix * pos_uvw + offset
+
+that transforms a position described in local coordinates u, v, w into the
+equivalent position x, y, z in global coordinates using a rotation matrix and
+an offset.
+
+*   **volume_id**: numerical identifier of the detector group.
+*   **layer_id**: numerical identifier of the detector layer inside the
+    group.
+*   **module_id**: numerical identifier of the detector module inside
+    the layer.
+*   **cx, cy, cz**: position of the local origin in the global coordinate
+    system (in millimeter).
+*   **rot_xu, rot_xv, rot_xw, rot_yu, ...**: components of the rotation matrix
+    to rotate from local u,v,w to global x,y,z coordinates.
+*   **module_t**: thickness of the detector module (in millimeter).
+*   **module_minhu, module_maxhu**: the minimum/maximum half-length of the
+    module boundary along the local u direction (in millimeter).
+*   **module_hv**: the half-length of the module boundary along the local v
+    direction (in millimeter).
+*   **pitch_u, pitch_v**: the size of the detector cells along the local u and v
+    directions (in millimeter).
 
 
 [cern]: https://home.cern
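The local-to-global transformation described in the geometry section above can be sketched with NumPy and pandas, assuming one row of the detectors file has been loaded as a `pandas.Series` with the columns listed above. Two assumptions to flag: the nine rotation columns are taken to follow the `rot_<global><local>` pattern implied by `rot_xu, rot_xv, rot_xw, rot_yu, ...` (so `rot_xu` is the row-x, column-u matrix entry), and the helper names `rotation_matrix`, `local_to_global`, and `global_to_local` are hypothetical, not part of the trackml library. The module values in the example are made up for illustration.

```python
import numpy
import pandas

# Assumption: the nine rotation columns are named rot_<global axis><local axis>,
# i.e. rot_xu, rot_xv, rot_xw, rot_yu, ..., rot_zw (row-major order).
ROT_COLUMNS = [f"rot_{g}{l}" for g in "xyz" for l in "uvw"]
OFFSET_COLUMNS = ["cx", "cy", "cz"]


def rotation_matrix(module):
    """3x3 matrix rotating local (u, v, w) into global (x, y, z) coordinates."""
    return module[ROT_COLUMNS].to_numpy(dtype=float).reshape(3, 3)


def local_to_global(module, pos_uvw):
    """pos_xyz = rotation_matrix * pos_uvw + offset, for one detector module."""
    offset = module[OFFSET_COLUMNS].to_numpy(dtype=float)
    return rotation_matrix(module) @ numpy.asarray(pos_uvw, dtype=float) + offset


def global_to_local(module, pos_xyz):
    """Inverse transform: a rotation matrix is orthogonal, so its inverse is its transpose."""
    offset = module[OFFSET_COLUMNS].to_numpy(dtype=float)
    return rotation_matrix(module).T @ (numpy.asarray(pos_xyz, dtype=float) - offset)


# Made-up module for illustration: local u points along global +y,
# local v along global -x, and w (the surface normal) along global +z.
module = pandas.Series({
    "cx": 100.0, "cy": 0.0, "cz": 50.0,
    "rot_xu": 0.0, "rot_xv": -1.0, "rot_xw": 0.0,
    "rot_yu": 1.0, "rot_yv": 0.0, "rot_yw": 0.0,
    "rot_zu": 0.0, "rot_zv": 0.0, "rot_zw": 1.0,
})
print(local_to_global(module, [1.0, 2.0, 0.0]).tolist())  # [98.0, 1.0, 50.0]
```

The inverse uses the transpose because a pure rotation matrix is orthogonal; if the stored matrices are only approximately orthogonal due to rounding, solving the linear system with `numpy.linalg.solve` would be a more robust choice.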

diff --git a/setup.py b/setup.py
@@ -11,7 +11,7 @@
 
 setup(
     name='trackml',
-    version='1',
+    version='2',
     description='TrackML utility library',
     long_description=long_description,
     long_description_content_type='text/markdown',

diff --git a/trackml/randomize.py b/trackml/randomize.py
@@ -6,12 +6,11 @@
 import numpy
 import numpy.random
 
-def _make_submission(mapping, track_ids, renumber=True):
+def _make_submission(hit_ids, track_ids, renumber=True):
     """Create a submission DataFrame with hit_id and track_id columns.
 
     Optionally renumbers the track_id to random small integers.
     """
-    hit_ids = mapping['hit_id']
     if renumber:
         unique_ids, inverse = numpy.unique(track_ids, return_inverse=True)
         numbers = numpy.arange(1, len(unique_ids) + 1, dtype=unique_ids.dtype)
@@ -23,18 +22,18 @@ def set_seed(seed):
     """Set the random seed used for randomness in this module."""
     numpy.random.seed(seed)
 
-def random_solution(truth, ntracks):
-    """Generate a completely random solution with the given number of particles.
+def random_solution(hits, ntracks):
+    """Generate a completely random solution with the given number of tracks.
 
     Parameters
     ----------
-    truth : pandas.DataFrame
-        Truth mapping must contain hit_id and particle_id columns.
+    hits : pandas.DataFrame
+        Hits information; must contain a hit_id column.
     ntracks : int
         Number of tracks the submission should contain.
     """
-    ids = numpy.random.randint(1, nparticles + 1, size=len(mapping), dtype='i4')
-    return _make_submission(truth, ids, renumber=False)
+    ids = numpy.random.randint(1, ntracks + 1, size=len(hits), dtype='i4')
+    return _make_submission(hits['hit_id'], ids, renumber=False)
 
 def drop_hits(truth, probability):
     """Drop hits from each track with a certain probability.
@@ -55,7 +54,7 @@ def drop_hits(truth, probability):
     fakeids = numpy.arange(fakeid0, fakeid0 + dropped_count, dtype='i8')
     # replace masked particle ids with fake ones
     numpy.place(out, dropped_mask, fakeids)
-    return _make_submission(truth, out)
+    return _make_submission(truth['hit_id'], out)
 
 def shuffle_hits(truth, probability):
     """Randomly assign hits to a wrong particle with a certain probability.
@@ -73,4 +72,4 @@ def shuffle_hits(truth, probability):
     wrongparticles = numpy.random.choice(numpy.unique(out), size=shuffled_count)
     # replace masked particle ids with random valid ids
     numpy.place(out, shuffled_mask, wrongparticles)
-    return _make_submission(truth, out)
+    return _make_submission(truth['hit_id'], out)
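After this change `random_solution` needs only the hits table, so it can also produce a random baseline for the test dataset, which has no truth file. The self-contained sketch below mirrors the updated functions under some assumptions: the lines of `_make_submission` hidden by the diff are assumed to shuffle the new numbers and map them back through `inverse`, a local `numpy.random.Generator` replaces the module-wide seed set by `set_seed`, and the names `make_submission`, `random_solution_sketch`, and `event_id_from_prefix` are hypothetical. The last helper turns a per-event file prefix into the `event_id` column that the README requires in the submission file.

```python
import numpy
import pandas


def make_submission(hit_ids, track_ids, renumber=True, rng=None):
    """Create a submission DataFrame with hit_id and track_id columns.

    If renumber is True, the distinct track ids are replaced by 1..n in random
    order (assumed behavior for the lines the diff does not show).
    """
    rng = numpy.random.default_rng() if rng is None else rng
    track_ids = numpy.asarray(track_ids)
    if renumber:
        unique_ids, inverse = numpy.unique(track_ids, return_inverse=True)
        numbers = numpy.arange(1, len(unique_ids) + 1, dtype=unique_ids.dtype)
        rng.shuffle(numbers)
        # all hits that shared an original id get the same new number
        track_ids = numbers[inverse]
    return pandas.DataFrame({"hit_id": numpy.asarray(hit_ids), "track_id": track_ids})


def random_solution_sketch(hits, ntracks, rng=None):
    """Assign each hit to one of ntracks random tracks; only hits['hit_id'] is used."""
    rng = numpy.random.default_rng() if rng is None else rng
    # integers() excludes the upper bound, so the ids lie in 1..ntracks
    ids = rng.integers(1, ntracks + 1, size=len(hits), dtype=numpy.int32)
    return make_submission(hits["hit_id"], ids, renumber=False, rng=rng)


def event_id_from_prefix(prefix):
    """Parse a per-event prefix, 'event' followed by 9 digits: 'event000000010' -> 10."""
    digits = prefix[len("event"):]
    if not prefix.startswith("event") or len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"unexpected event prefix: {prefix!r}")
    return int(digits)


# Toy hits table with ten hits; no truth information is needed.
hits = pandas.DataFrame({"hit_id": numpy.arange(1, 11)})
submission = random_solution_sketch(hits, ntracks=3, rng=numpy.random.default_rng(42))
submission.insert(0, "event_id", event_id_from_prefix("event000000010"))
print(submission.head())
```

Passing `renumber=False` keeps the randomly drawn ids unchanged, matching the updated `random_solution` call in the diff; per-event DataFrames built this way could then be combined with `pandas.concat` before writing a single submission file.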