From 6f39deee524215e5dad453f57df7f0ace0cfdb69 Mon Sep 17 00:00:00 2001
From: James Adams
Date: Fri, 7 Feb 2020 14:55:02 -0500
Subject: [PATCH] additional section in README for the analyze (statistics) module #143

---
 README.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/README.md b/README.md
index e06b7c4..ce16ad8 100644
--- a/README.md
+++ b/README.md
@@ -255,6 +255,30 @@ $ cvdata_mask --images /data/images --masks /data/masks \
 > --tfrecords /data/tfrecords \
 > --shards 4 -- train_pct 0.8
 ```
+## Dataset statistics
+Basic statistics about a dataset are available via the script `cvdata/analyze.py`
+or the corresponding script entry point `cvdata_analyze`.
+
+For example, we can count the number of examples in a collection of TFRecord files
+(specify a directory containing only TFRecord files):
+```bash
+$ cvdata_analyze --format tfrecord --annotations /data/animals/tfrecord
+Total number of examples: 100
+```
+The above functionality can be utilized within Python code like so:
+```python
+from cvdata.analyze import count_tfrecord_examples
+tfrecords_dir = "/data/animals/tfrecord"
+number_of_examples = count_tfrecord_examples(tfrecords_dir)
+print(f"Number of examples: {number_of_examples}")
+```
+For datasets containing annotation files in COCO, Darknet (YOLO), KITTI, or PASCAL
+formats we can get the number of images per class label. For example:
+```bash
+$ cvdata_analyze --format kitti --annotations /data/scissors/kitti --images /data/scissors/images
+Label: scissors Count: 100
+```
+
 ## Visualize annotations
 In order to visualize images and corresponding annotations use the script
 `cvdata/visualize.py` or the corresponding script entry point `cvdata_visualize`.