
[FEA] Add file size counter to cuIO benchmarks #10154

Merged
4 commits merged into rapidsai:branch-22.04 from fea-cuio-bm-file-size on Jan 29, 2022

Conversation

@vuule (Contributor) commented on Jan 28, 2022

Most cuIO benchmarks use dataframes of a fixed size as input. Once the data is written to a file in the given format, the file size can vary greatly depending on the encoding and compression.
This PR adds a counter that outputs the file size, since it is often correlated with reader/writer performance.
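For reference, the counter relies on Google Benchmark's user-defined counters (state.counters), the same mechanism already used for the peak_memory_usage counter. The sketch below is illustrative only: the BM_write_and_report_size benchmark and its temporary file are hypothetical, while the PR itself reports source_sink.size() from the existing cuIO benchmark fixtures.

#include <benchmark/benchmark.h>

#include <cstdio>
#include <vector>

// Hypothetical benchmark: write a fixed-size buffer to a file and report the
// resulting file size as a user-defined counter. The cuIO benchmarks report
// source_sink.size() instead, but the counter mechanism is the same.
static void BM_write_and_report_size(benchmark::State& state)
{
  std::vector<char> payload(1 << 20, 'x');  // 1 MiB of dummy data
  long file_size = 0;
  for (auto _ : state) {
    std::FILE* f = std::fopen("bench_output.bin", "wb");
    if (f == nullptr) {
      state.SkipWithError("failed to open output file");
      break;
    }
    std::fwrite(payload.data(), 1, payload.size(), f);
    file_size = std::ftell(f);  // size of the data just written
    std::fclose(f);
  }
  // Appears as an extra column in the output, e.g. "file_size=1.04858M".
  state.counters["file_size"] = static_cast<double>(file_size);
}
BENCHMARK(BM_write_and_report_size);
BENCHMARK_MAIN();

Counters registered this way are printed as extra columns next to bytes_per_second, which is where the file_size and peak_memory_usage values in the sample output below come from.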

@vuule added labels: feature request (New feature or request), cuIO (cuIO issue), Performance (Performance related issue), non-breaking (Non-breaking change) on Jan 28, 2022
@vuule self-assigned this on Jan 28, 2022
@github-actions bot added label: libcudf (Affects libcudf (C++/CUDA) code) on Jan 28, 2022
@vuule (Contributor, Author) commented on Jan 28, 2022

Sample output:

OrcRead/integral_file_input/30/0/1/1/0/manual_time               97.2 ms         63.0 ms            7 bytes_per_second=5.1425G/s file_size=389.961M peak_memory_usage=1096.41M
OrcRead/integral_file_input/30/1000/1/1/0/manual_time             112 ms         82.1 ms            6 bytes_per_second=4.47958G/s file_size=331.69M peak_memory_usage=1.14553G
OrcRead/integral_file_input/30/0/32/1/0/manual_time              69.9 ms         67.2 ms           10 bytes_per_second=7.15121G/s file_size=22.2183M peak_memory_usage=683.382M
OrcRead/integral_file_input/30/1000/32/1/0/manual_time           69.6 ms         67.2 ms           10 bytes_per_second=7.18037G/s file_size=20.5541M peak_memory_usage=683.241M
OrcRead/integral_file_input/30/0/1/0/0/manual_time               81.7 ms         47.1 ms            8 bytes_per_second=6.12364G/s file_size=396.36M peak_memory_usage=951.602M
OrcRead/integral_file_input/30/1000/1/0/0/manual_time            78.0 ms         43.5 ms            9 bytes_per_second=6.40907G/s file_size=396.308M peak_memory_usage=951.549M
OrcRead/integral_file_input/30/0/32/0/0/manual_time              65.1 ms         62.0 ms           11 bytes_per_second=7.6859G/s file_size=24.9541M peak_memory_usage=580.196M
OrcRead/integral_file_input/30/1000/32/0/0/manual_time           64.5 ms         61.4 ms           11 bytes_per_second=7.75274G/s file_size=24.916M peak_memory_usage=580.158M
OrcRead/integral_buffer_input/30/0/1/1/1/manual_time              105 ms          105 ms            7 bytes_per_second=4.76301G/s file_size=389.961M peak_memory_usage=1096.41M
OrcRead/integral_buffer_input/30/1000/1/1/1/manual_time           118 ms          118 ms            6 bytes_per_second=4.24934G/s file_size=331.69M peak_memory_usage=1.14553G
OrcRead/integral_buffer_input/30/0/32/1/1/manual_time            68.8 ms         68.8 ms           10 bytes_per_second=7.27146G/s file_size=22.2183M peak_memory_usage=683.382M
OrcRead/integral_buffer_input/30/1000/32/1/1/manual_time         68.6 ms         68.7 ms           10 bytes_per_second=7.28465G/s file_size=20.5542M peak_memory_usage=683.241M
OrcRead/integral_buffer_input/30/0/1/0/1/manual_time             88.9 ms         88.9 ms            7 bytes_per_second=5.62704G/s file_size=396.36M peak_memory_usage=951.602M
OrcRead/integral_buffer_input/30/1000/1/0/1/manual_time          87.2 ms         87.2 ms            8 bytes_per_second=5.73492G/s file_size=396.308M peak_memory_usage=951.549M
OrcRead/integral_buffer_input/30/0/32/0/1/manual_time            63.9 ms         63.9 ms           11 bytes_per_second=7.8284G/s file_size=24.9541M peak_memory_usage=580.196M
OrcRead/integral_buffer_input/30/1000/32/0/1/manual_time         64.1 ms         64.1 ms           11 bytes_per_second=7.80321G/s file_size=24.916M peak_memory_usage=580.158M

@@ -132,6 +130,7 @@ void BM_csv_read_varying_options(benchmark::State& state)
   auto const data_processed = data_size * cols_to_read.size() / view.num_columns();
   state.SetBytesProcessed(data_processed * state.iterations());
   state.counters["peak_memory_usage"] = mem_stats_logger.peak_memory_usage();
+  state.counters["file_size"] = source_sink.size();
@vuule (Contributor, Author) commented on the diff: or "encoded_size" maybe?

A reviewer (Contributor) replied: "encoded_file_size" ?

@codecov bot commented on Jan 28, 2022

Codecov Report

Merging #10154 (c426ce9) into branch-22.04 (e24fa8f) will increase coverage by 0.10%.
The diff coverage is n/a.


@@               Coverage Diff                @@
##           branch-22.04   #10154      +/-   ##
================================================
+ Coverage         10.37%   10.48%   +0.10%     
================================================
  Files               119      122       +3     
  Lines             20149    20493     +344     
================================================
+ Hits               2091     2148      +57     
- Misses            18058    18345     +287     
Impacted Files Coverage Δ
python/cudf/cudf/errors.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/csv.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/hdf.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/_version.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/abc.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/dlpack.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
... and 66 more

Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5dd1c39...c426ce9.

@vuule vuule marked this pull request as ready for review January 28, 2022 02:46
@vuule vuule requested a review from a team as a code owner January 28, 2022 02:46
@vuule (Contributor, Author) commented on Jan 28, 2022

CC @GregoryKimball who sort of asked for this feature

@rgsl888prabhu (Contributor) reviewed and left a comment:
Rest looks good

@@ -132,6 +130,7 @@ void BM_csv_read_varying_options(benchmark::State& state)
   auto const data_processed = data_size * cols_to_read.size() / view.num_columns();
   state.SetBytesProcessed(data_processed * state.iterations());
   state.counters["peak_memory_usage"] = mem_stats_logger.peak_memory_usage();
+  state.counters["file_size"] = source_sink.size();
@rgsl888prabhu (Contributor) commented on the diff: "encoded_file_size" ?

@vuule (Contributor, Author) commented on Jan 29, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit cf81b1a into rapidsai:branch-22.04 Jan 29, 2022
@vuule vuule deleted the fea-cuio-bm-file-size branch January 29, 2022 09:24