docs: typo in PERFORMANCE.md
[skip ci]
jqnatividad committed Oct 28, 2024
1 parent ee86754 commit c0b6b55
Showing 1 changed file with 1 addition and 1 deletion: docs/PERFORMANCE.md
@@ -30,7 +30,7 @@ export QSV_AUTOINDEX_SIZE=10000000
## Stats Cache
`stats` is the primary reason qsv was created. When we first started working on it in 2021, several of our projects required GUARANTEED data type inferences at speed. As we iterated and took on additional projects, we needed more capabilities to enable the ["automagical metadata"](https://dathere.com/2023/11/automagical-metadata/) inferencing workflow we wanted for our data ingestion pipelines.

From the original 11 summary statistics in xsv (type, sum, min/max, min/max length, mean, stddev, median, mode & cardinality), 27 more were added incrementally over time (is_ascii, range, sort_order, sum_length, avg_length, mean_length, sem, variance, cv, nullcount, max_precision, sparsity, mad, lower outer/inner fence, q1, q2_median, q3, iqr, upper inner/outer fence, skewness, mode_count, mode_occurrences, antimode, antimode_count, antimode_occurrences). Check the [Wiki](https://github.com/jqnatividad/qsv/wiki/Supplemental#stats-command-output-explanation) for more info.
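Several of the added statistics (q1, q2_median, q3, iqr and the inner/outer fences) build on one another. The sketch below shows how they relate, assuming the conventional Tukey multipliers (1.5 * IQR for the inner fences, 3 * IQR for the outer fences) and a linear-interpolation quartile method. Both are assumptions for illustration; qsv's own quartile definition may differ, and the `quantile`, `fences` and `Fences` names are hypothetical, not part of qsv.

```rust
/// Quantile by linear interpolation between closest ranks on a sorted slice.
/// ASSUMPTION: one common quartile method; qsv may compute quartiles differently.
fn quantile(sorted: &[f64], p: f64) -> f64 {
    assert!(!sorted.is_empty(), "quantile of an empty slice");
    let pos = p * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    // Interpolate between the two neighbouring ranks.
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64)
}

/// Hypothetical container for the quartile-derived stats.
#[derive(Debug)]
struct Fences {
    q1: f64,
    q2_median: f64,
    q3: f64,
    iqr: f64,
    lower_outer: f64,
    lower_inner: f64,
    upper_inner: f64,
    upper_outer: f64,
}

fn fences(values: &[f64]) -> Fences {
    let mut sorted = values.to_vec();
    // total_cmp gives a total order over f64 (NaNs sort to the ends).
    sorted.sort_by(|a, b| a.total_cmp(b));
    let q1 = quantile(&sorted, 0.25);
    let q2_median = quantile(&sorted, 0.5);
    let q3 = quantile(&sorted, 0.75);
    let iqr = q3 - q1;
    Fences {
        q1,
        q2_median,
        q3,
        iqr,
        // ASSUMED Tukey multipliers: 3 * IQR (outer), 1.5 * IQR (inner).
        lower_outer: q1 - 3.0 * iqr,
        lower_inner: q1 - 1.5 * iqr,
        upper_inner: q3 + 1.5 * iqr,
        upper_outer: q3 + 3.0 * iqr,
    }
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0];
    println!("{:?}", fences(&data));
}
```

For the values 1 through 9 this yields q1 = 3, median = 5, q3 = 7 and iqr = 4, so the inner fences sit at -3 and 13 and the outer fences at -9 and 19. Values beyond the inner fences are conventionally treated as mild outliers, and values beyond the outer fences as extreme ones.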

Some of these stats were relatively expensive to compute, so qsv started caching them to avoid recomputing them when a file hasn't changed (most of the files we were working on were historical data).
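The caching idea can be illustrated with a small in-memory sketch: fingerprint the input file, and only rerun the expensive pass when the fingerprint differs from the cached one. This is NOT qsv's caching code. The fingerprint (file length plus last-modified time), the `StatsCache` type and the toy `compute_stats` record counter are all hypothetical stand-ins chosen for brevity.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// HYPOTHETICAL fingerprint: file length + last-modified time.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Fingerprint {
    len: u64,
    modified: SystemTime,
}

fn fingerprint(path: &Path) -> io::Result<Fingerprint> {
    let meta = fs::metadata(path)?;
    Ok(Fingerprint { len: meta.len(), modified: meta.modified()? })
}

/// Stand-in for an expensive stats pass: just counts records after the header.
#[derive(Clone, Debug, PartialEq)]
struct Stats {
    records: usize,
}

fn compute_stats(path: &Path) -> io::Result<Stats> {
    let text = fs::read_to_string(path)?;
    let records = text.lines().skip(1).filter(|l| !l.is_empty()).count();
    Ok(Stats { records })
}

/// Hypothetical in-memory cache keyed by path.
#[derive(Default)]
struct StatsCache {
    entries: HashMap<PathBuf, (Fingerprint, Stats)>,
    computations: usize, // number of times compute_stats actually ran
}

impl StatsCache {
    fn get(&mut self, path: &Path) -> io::Result<Stats> {
        let fp = fingerprint(path)?;
        // Cache hit: same file state as when the stats were computed.
        if let Some((cached_fp, stats)) = self.entries.get(path) {
            if *cached_fp == fp {
                return Ok(stats.clone());
            }
        }
        // Miss or stale entry: recompute and store alongside the new fingerprint.
        let stats = compute_stats(path)?;
        self.computations += 1;
        self.entries.insert(path.to_path_buf(), (fp, stats.clone()));
        Ok(stats)
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("stats_cache_demo.csv");
    fs::write(&path, "a,b\n1,2\n3,4\n")?;
    let mut cache = StatsCache::default();
    let first = cache.get(&path)?;
    let second = cache.get(&path)?; // unchanged file: served from the cache
    assert_eq!(first, second);
    fs::write(&path, "a,b\n1,2\n3,4\n5,6\n")?; // length changes, so the entry is stale
    let third = cache.get(&path)?;
    println!("records={} computations={}", third.records, cache.computations);
    Ok(())
}
```

A length-plus-mtime fingerprint is cheap but can miss a same-length rewrite within the filesystem's timestamp granularity; a content hash is more robust at the cost of reading the whole file, which partly defeats the purpose for large inputs.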
