additional section in README for the analyze (statistics) module

#143
monocongo · Feb 7, 2020 · 6f39dee
1 parent 7d21613
commit 6f39dee
Showing 1 changed file with 24 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -255,6 +255,30 @@ $ cvdata_mask --images /data/images --masks /data/masks \
 >       --tfrecords /data/tfrecords \
 >       --shards 4 -- train_pct 0.8
 ```
+## Dataset statistics
+Basic statistics about a dataset are available via the script `cvdata/analyze.py` 
+or the corresponding script entry point `cvdata_analyze`.
+
+For example, to count the examples in a collection of TFRecord files, point
+`--annotations` at a directory containing only TFRecord files:
+```bash
+$ cvdata_analyze --format tfrecord --annotations /data/animals/tfrecord
+Total number of examples: 100
+```
+The same count is available from Python:
+```python
+from cvdata.analyze import count_tfrecord_examples
+tfrecords_dir = "/data/animals/tfrecord"
+number_of_examples = count_tfrecord_examples(tfrecords_dir)
+print(f"Number of examples: {number_of_examples}")
+```
+For datasets with annotation files in COCO, Darknet (YOLO), KITTI, or PASCAL
+format, we can get the number of images per class label. For example:
+```bash
+$ cvdata_analyze --format kitti --annotations /data/scissors/kitti --images /data/scissors/images
+Label: scissors   Count: 100 
+```
+
 ## Visualize annotations
 In order to visualize images and corresponding annotations use the script 
 `cvdata/visualize.py` or the corresponding script entry point `cvdata_visualize`.
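
A note on the per-class counts in the new "Dataset statistics" section: the TFRecord count comes with a Python snippet, but the per-label count is only shown through the CLI. For readers who want to see what such a count involves, here is a minimal standard-library sketch for the KITTI case. It is not the code in `cvdata/analyze.py`; the function name `count_images_per_label` is hypothetical, and it assumes one `.txt` annotation file per image with the class label as the first whitespace-separated field on each line, as in the KITTI label layout.

```python
from collections import Counter
from pathlib import Path


def count_images_per_label(kitti_dir):
    """Count how many annotation files (one per image) mention each label.

    Hypothetical helper for illustration only; not part of cvdata.
    Assumes KITTI-style files where each non-blank line starts with the label.
    """
    counts = Counter()
    for annotation_file in sorted(Path(kitti_dir).glob("*.txt")):
        labels_in_image = set()
        for line in annotation_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                labels_in_image.add(fields[0])
        # A set ensures each label is counted at most once per image.
        counts.update(labels_in_image)
    return counts
```

Calling `count_images_per_label("/data/scissors/kitti")` returns a `Counter` keyed by label. Collecting labels in a set per file is what makes this a count of images rather than a count of bounding boxes; dropping the set would give the latter.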