
[WIP] Adaptive fragment sizes in Parquet writer #12627

Closed

etseidl wants to merge 10 commits

Conversation

@etseidl (Contributor) commented Jan 26, 2023

Description

Writing Parquet files where rows are very wide can result in pages that are much too large, due to the default fragment size of 5000 rows. This in turn can have an adverse effect on file size when using Zstandard compression. This PR addresses the issue by shrinking the fragment size to a value at which each fragment should still fit within the desired page size.

Fixes #12613
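
For orientation, the mechanism is roughly the following. This is a sketch consolidating the two hunks quoted in the review threads below, not the PR's exact code; `column_size` is the helper added by this PR, and `max_page_fragment_size_` is assumed to start at the 5000-row default.

```cpp
// Estimate the average encoded row width, then shrink the fragment size so
// one fragment's worth of rows should still fit in max_page_size_bytes.
auto const avg_len = column_size(column, stream) / num_rows;
if (avg_len > 0) {
  size_type const frag_size = max_page_size_bytes / avg_len;
  max_page_fragment_size_   = std::min(frag_size, max_page_fragment_size_);
}
```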

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rapids-bot (bot) commented Jan 26, 2023

Pull requests from external contributors require approval from a rapidsai organization member with write or admin permissions before CI can begin.

@github-actions bot added the libcudf label Jan 26, 2023
@etseidl (Contributor, Author) commented Jan 26, 2023

@vuule you had mentioned a while back that you thought there was a function somewhere that calculates a column's size. I couldn't find it, so I implemented my own. If you can point me to something better, I'll use it.

@etseidl (Contributor, Author) commented Jan 26, 2023

A side benefit is better performance in the list-column benchmarks, without the performance regression seen when simply setting the default fragment size to 1000.

Before this PR:

## parquet_write_encode

### [0] NVIDIA RTX A6000

| data_type | cardinality | run_length | Samples |  CPU Time  | Noise |  GPU Time  | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|-----------|-------------|------------|---------|------------|-------|------------|-------|------------------|-------------------|-------------------|
|  INTEGRAL |           0 |          1 |      5x | 111.840 ms | 0.32% | 111.835 ms | 0.31% |       4800565202 |         2.146 GiB |       498.123 MiB |
|  INTEGRAL |        1000 |          1 |     12x |  45.089 ms | 0.38% |  45.083 ms | 0.38% |      11908481790 |         2.770 GiB |       161.438 MiB |
|  INTEGRAL |           0 |         32 |     21x |  34.929 ms | 0.50% |  34.924 ms | 0.50% |      15372537807 |         2.770 GiB |        27.720 MiB |
|  INTEGRAL |        1000 |         32 |     17x |  29.901 ms | 0.14% |  29.896 ms | 0.14% |      17957667626 |         2.770 GiB |        14.403 MiB |
|     FLOAT |           0 |          1 |      5x | 106.580 ms | 0.08% | 106.575 ms | 0.08% |       5037494184 |         1.100 GiB |       510.303 MiB |
|     FLOAT |        1000 |          1 |    112x |  30.835 ms | 0.58% |  30.829 ms | 0.58% |      17414302741 |         1.765 GiB |       110.206 MiB |
|     FLOAT |           0 |         32 |     21x |  23.975 ms | 0.20% |  23.970 ms | 0.20% |      22397993527 |         1.765 GiB |        23.640 MiB |
|     FLOAT |        1000 |         32 |     26x |  19.871 ms | 0.24% |  19.866 ms | 0.24% |      27024975803 |         1.765 GiB |         9.888 MiB |
|   DECIMAL |           0 |          1 |     10x |  51.763 ms | 0.14% |  51.758 ms | 0.14% |      10372678425 |       811.156 MiB |       221.308 MiB |
|   DECIMAL |        1000 |          1 |     28x |  18.359 ms | 0.31% |  18.354 ms | 0.31% |      29251120183 |         1.145 GiB |        48.997 MiB |
|   DECIMAL |           0 |         32 |     33x |  15.604 ms | 0.31% |  15.598 ms | 0.31% |      34418188596 |         1.145 GiB |        10.298 MiB |
|   DECIMAL |        1000 |         32 |     37x |  13.608 ms | 0.32% |  13.603 ms | 0.31% |      39468368025 |         1.145 GiB |         4.717 MiB |
| TIMESTAMP |           0 |          1 |      5x | 131.251 ms | 0.07% | 131.245 ms | 0.07% |       4090588948 |         1.170 GiB |       458.580 MiB |
| TIMESTAMP |        1000 |          1 |     19x |  26.401 ms | 0.37% |  26.396 ms | 0.37% |      20339412646 |         1.474 GiB |        92.808 MiB |
| TIMESTAMP |           0 |         32 |    272x |  27.217 ms | 0.57% |  27.211 ms | 0.57% |      19729735749 |         1.474 GiB |        20.948 MiB |
| TIMESTAMP |        1000 |         32 |     29x |  17.745 ms | 0.42% |  17.740 ms | 0.41% |      30263290620 |         1.474 GiB |         8.718 MiB |
|  DURATION |           0 |          1 |      7x |  79.987 ms | 0.10% |  79.982 ms | 0.10% |       6712426044 |       957.340 MiB |       355.521 MiB |
|  DURATION |        1000 |          1 |     34x |  25.773 ms | 0.50% |  25.768 ms | 0.50% |      20835191262 |         1.474 GiB |        90.214 MiB |
|  DURATION |           0 |         32 |     24x |  21.270 ms | 0.40% |  21.264 ms | 0.40% |      25247594804 |         1.474 GiB |        17.107 MiB |
|  DURATION |        1000 |         32 |     30x |  16.814 ms | 0.48% |  16.809 ms | 0.48% |      31939875607 |         1.474 GiB |         8.113 MiB |
|    STRING |           0 |          1 |      5x | 130.180 ms | 0.22% | 130.174 ms | 0.22% |       4124244709 |         1.342 GiB |       597.486 MiB |
|    STRING |        1000 |          1 |    568x |  26.363 ms | 1.36% |  26.358 ms | 1.36% |      20368583961 |       677.964 MiB |        46.473 MiB |
|    STRING |           0 |         32 |      5x | 130.524 ms | 0.05% | 130.519 ms | 0.05% |       4113363887 |         1.342 GiB |       597.486 MiB |
|    STRING |        1000 |         32 |     35x |  14.426 ms | 0.48% |  14.421 ms | 0.48% |      37228695840 |       677.964 MiB |         8.504 MiB |
|      LIST |           0 |          1 |      5x | 534.754 ms | 0.04% | 534.747 ms | 0.04% |       1003971377 |         1.602 GiB |       498.003 MiB |
|      LIST |        1000 |          1 |      5x | 355.656 ms | 0.03% | 355.649 ms | 0.03% |       1509550797 |         2.752 GiB |       166.640 MiB |
|      LIST |           0 |         32 |      5x | 270.222 ms | 0.05% | 270.217 ms | 0.05% |       1986816725 |         2.752 GiB |        37.257 MiB |
|      LIST |        1000 |         32 |      5x | 274.311 ms | 0.11% | 274.306 ms | 0.11% |       1957200557 |         2.752 GiB |        24.421 MiB |
|    STRUCT |           0 |          1 |      5x | 136.164 ms | 0.09% | 136.159 ms | 0.09% |       3942984412 |         1.283 GiB |       569.525 MiB |
|    STRUCT |        1000 |          1 |    369x |  40.609 ms | 0.63% |  40.603 ms | 0.63% |      13222497588 |         1.324 GiB |        90.699 MiB |
|    STRUCT |           0 |         32 |      5x | 109.019 ms | 0.48% | 109.013 ms | 0.48% |       4924833685 |         1.473 GiB |       409.317 MiB |
|    STRUCT |        1000 |         32 |     19x |  27.182 ms | 0.32% |  27.176 ms | 0.32% |      19755007841 |         1.324 GiB |        15.400 MiB |

With these changes:

| data_type | cardinality | run_length | Samples |  CPU Time  | Noise |  GPU Time  | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|-----------|-------------|------------|---------|------------|-------|------------|-------|------------------|-------------------|-------------------|
|  INTEGRAL |           0 |          1 |      5x | 111.332 ms | 0.10% | 111.327 ms | 0.10% |       4822460334 |         2.146 GiB |       498.123 MiB |
|  INTEGRAL |        1000 |          1 |     43x |  44.586 ms | 0.50% |  44.581 ms | 0.50% |      12042558020 |         2.770 GiB |       161.438 MiB |
|  INTEGRAL |           0 |         32 |     15x |  34.555 ms | 0.30% |  34.550 ms | 0.29% |      15538786092 |         2.770 GiB |        27.720 MiB |
|  INTEGRAL |        1000 |         32 |     17x |  29.718 ms | 0.45% |  29.712 ms | 0.46% |      18069054636 |         2.770 GiB |        14.403 MiB |
|     FLOAT |           0 |          1 |      5x | 106.595 ms | 0.14% | 106.590 ms | 0.14% |       5036783215 |         1.100 GiB |       510.303 MiB |
|     FLOAT |        1000 |          1 |     17x |  30.503 ms | 0.36% |  30.498 ms | 0.36% |      17603747240 |         1.765 GiB |       110.206 MiB |
|     FLOAT |           0 |         32 |     21x |  23.825 ms | 0.26% |  23.820 ms | 0.26% |      22538378841 |         1.765 GiB |        23.640 MiB |
|     FLOAT |        1000 |         32 |     26x |  19.719 ms | 0.50% |  19.714 ms | 0.50% |      27233359575 |         1.765 GiB |         9.888 MiB |
|   DECIMAL |           0 |          1 |     10x |  51.820 ms | 0.38% |  51.815 ms | 0.38% |      10361253329 |       811.156 MiB |       221.308 MiB |
|   DECIMAL |        1000 |          1 |     28x |  18.281 ms | 0.49% |  18.276 ms | 0.49% |      29375372176 |         1.145 GiB |        48.997 MiB |
|   DECIMAL |           0 |         32 |     33x |  15.538 ms | 0.33% |  15.532 ms | 0.33% |      34564602643 |         1.145 GiB |        10.298 MiB |
|   DECIMAL |        1000 |         32 |    608x |  13.573 ms | 0.58% |  13.567 ms | 0.58% |      39570530602 |         1.145 GiB |         4.717 MiB |
| TIMESTAMP |           0 |          1 |      5x | 129.445 ms | 0.19% | 129.440 ms | 0.19% |       4147628465 |         1.170 GiB |       458.580 MiB |
| TIMESTAMP |        1000 |          1 |     20x |  26.116 ms | 0.21% |  26.111 ms | 0.21% |      20561446173 |         1.474 GiB |        92.808 MiB |
| TIMESTAMP |           0 |         32 |    528x |  26.924 ms | 0.62% |  26.918 ms | 0.62% |      19944485006 |         1.474 GiB |        20.948 MiB |
| TIMESTAMP |        1000 |         32 |     56x |  17.680 ms | 0.50% |  17.674 ms | 0.50% |      30375588647 |         1.474 GiB |         8.718 MiB |
|  DURATION |           0 |          1 |      7x |  79.813 ms | 0.14% |  79.808 ms | 0.14% |       6727042871 |       957.340 MiB |       355.521 MiB |
|  DURATION |        1000 |          1 |     20x |  25.534 ms | 0.22% |  25.529 ms | 0.22% |      21029918810 |         1.474 GiB |        90.214 MiB |
|  DURATION |           0 |         32 |     24x |  21.126 ms | 0.34% |  21.121 ms | 0.34% |      25418552644 |         1.474 GiB |        17.108 MiB |
|  DURATION |        1000 |         32 |    138x |  16.675 ms | 0.50% |  16.670 ms | 0.50% |      32206674557 |         1.474 GiB |         8.113 MiB |
|    STRING |           0 |          1 |      5x | 129.812 ms | 0.24% | 129.807 ms | 0.24% |       4135924927 |         1.342 GiB |       597.486 MiB |
|    STRING |        1000 |          1 |    560x |  26.538 ms | 0.86% |  26.533 ms | 0.86% |      20234323957 |       677.964 MiB |        46.473 MiB |
|    STRING |           0 |         32 |      5x | 130.361 ms | 0.13% | 130.356 ms | 0.13% |       4118511404 |         1.342 GiB |       597.486 MiB |
|    STRING |        1000 |         32 |     72x |  14.652 ms | 0.50% |  14.647 ms | 0.50% |      36654441681 |       677.964 MiB |         8.504 MiB |
|      LIST |           0 |          1 |      9x | 263.635 ms | 0.48% | 263.630 ms | 0.48% |       2036458592 |         1.602 GiB |       498.638 MiB |
|      LIST |        1000 |          1 |      5x | 141.951 ms | 0.21% | 141.945 ms | 0.21% |       3782243064 |         2.752 GiB |       167.795 MiB |
|      LIST |           0 |         32 |     13x |  94.726 ms | 0.50% |  94.721 ms | 0.50% |       5667941431 |         2.752 GiB |        39.319 MiB |
|      LIST |        1000 |         32 |    163x |  92.155 ms | 0.58% |  92.149 ms | 0.58% |       5826129787 |         2.752 GiB |        25.456 MiB |
|    STRUCT |           0 |          1 |      5x | 136.310 ms | 0.28% | 136.304 ms | 0.28% |       3938768656 |         1.283 GiB |       569.525 MiB |
|    STRUCT |        1000 |          1 |     13x |  40.940 ms | 0.32% |  40.935 ms | 0.32% |      13115251265 |         1.324 GiB |        90.700 MiB |
|    STRUCT |           0 |         32 |      5x | 109.057 ms | 0.35% | 109.052 ms | 0.35% |       4923061272 |         1.473 GiB |       409.317 MiB |
|    STRUCT |        1000 |         32 |    528x |  27.551 ms | 0.59% |  27.546 ms | 0.59% |      19490103598 |         1.324 GiB |        15.400 MiB |

@vuule (Contributor) commented Jan 26, 2023

> A side benefit is better performance in the list-column benchmarks

Another side benefit is faster reading of files with list columns, when written with this change :)

@etseidl (Contributor, Author) commented Jan 26, 2023

> Another side benefit is faster reading of files with list columns, when written with this change :)

😃 Right you are

Before:

## parquet_read_decode

### [0] NVIDIA RTX A6000

| data_type | cardinality | run_length | Samples |  CPU Time  | Noise |  GPU Time  | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|-----------|-------------|------------|---------|------------|-------|------------|-------|------------------|-------------------|-------------------|
|      LIST |           0 |          1 |     38x | 403.717 ms | 2.24% | 403.711 ms | 2.24% |       1329839927 |         1.004 GiB |       498.003 MiB |
|      LIST |        1000 |          1 |     10x | 345.366 ms | 0.49% | 345.360 ms | 0.49% |       1554524822 |       698.783 MiB |       166.640 MiB |
|      LIST |           0 |         32 |      5x | 263.115 ms | 0.16% | 263.109 ms | 0.16% |       2040488804 |       567.905 MiB |        37.258 MiB |
|      LIST |        1000 |         32 |      5x | 264.765 ms | 0.08% | 264.759 ms | 0.08% |       2027769035 |       555.236 MiB |        24.422 MiB |

This PR:

| data_type | cardinality | run_length | Samples |  CPU Time  | Noise |  GPU Time  | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|-----------|-------------|------------|---------|------------|-------|------------|-------|------------------|-------------------|-------------------|
|      LIST |           0 |          1 |     84x | 180.044 ms | 1.38% | 180.039 ms | 1.38% |       2981977419 |         1.004 GiB |       498.638 MiB |
|      LIST |        1000 |          1 |    126x | 119.529 ms | 0.95% | 119.523 ms | 0.95% |       4491779278 |       699.813 MiB |       167.795 MiB |
|      LIST |           0 |         32 |    174x |  86.238 ms | 0.80% |  86.233 ms | 0.80% |       6225825854 |       569.638 MiB |        39.319 MiB |
|      LIST |        1000 |         32 |    177x |  85.076 ms | 0.87% |  85.070 ms | 0.87% |       6310920104 |       556.266 MiB |        25.456 MiB |

```cpp
auto const avg_len = column_size(column, stream) / num_rows;

if (avg_len > 0) {
  size_type frag_size = max_page_size_bytes / avg_len;
```
Contributor:

Is this too large? IIUC, `max_page_size_bytes / avg_len` is the average number of rows that fit in a page. That means any deviation in size between rows would cause us to overshoot the max page size.

Contributor Author:

Yes, the average will tend to overshoot. But in the deeply nested cases we overshoot anyway; now it's just by much less. 😅 I could perhaps try the max row length, but that will be trickier to calculate than total size in the nested case.

Contributor:

I tried to suggest a better option, but anything small enough to get really precise page sizes in the largest column would significantly degrade performance for other columns. The current implementation is actually a good compromise.
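
To make the overshoot concrete, a worked example with hypothetical numbers (512 KiB is libcudf's default page size limit, mentioned later in this thread):

```cpp
#include <cstdint>

// Average-based fragment sizing overshoots when row widths are skewed.
int64_t constexpr max_page_size_bytes = 512 * 1024;  // default page size limit
int64_t constexpr avg_len             = 128;         // column-wide average bytes per row

auto const frag_size = max_page_size_bytes / avg_len;  // 4096 rows per fragment

// A fragment whose rows actually average 256 bytes encodes to about
// 4096 * 256 bytes = 1 MiB, double the target. Still far better than the
// fixed 5000-row default, which would yield a ~1.2 MiB page for those rows.
```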


```cpp
if (avg_len > 0) {
  size_type frag_size = max_page_size_bytes / avg_len;
  max_page_fragment_size_ = std::min(frag_size, max_page_fragment_size_);
```
Contributor:

I see, so we use the fragment size based on the largest column. Do you expect (perf) issues when we have a single large column and many small columns? The benchmarks show the best-case scenario, where each table has columns of similar size.

When we talked about dynamic fragment sizes I envisioned a per-column fragment size. That seems more optimal than the static size in all cases. I'm trying to figure out whether we can claim that this PR is also always better than the (current) static option.

Contributor Author:

Yes, this will probably be bad in the single-large-column/many-small-columns case. I'm interested to see what this does with @jbrennan333's user data, which seems more mixed than the test data he generated.

I really see this as a POC to demonstrate the value of changing the fragment size. I agree that a per-column fragment size would be best, but that's a heavier lift too. Maybe with these numbers it can be justified.

Contributor:

Have you run the `parquet_write_io_compression` group of benchmarks? That one has an even mix of all supported types.

Contributor Author:

I have, but my workstation is shut down now. I can post the results tomorrow, but IIRC there was not a big difference, with this code being maybe a percent or two faster in most cases. I actually want to run that benchmark with ZSTD too, since ZSTD has issues with the run_length=32 cases.

Contributor:

I will try this with the customer data. Have you already verified it with the test data I provided in #12613?

Contributor:

Here is the `parquet-tools inspect` output for the GPU file:

```
############ file meta data ############
created_by:
num_columns: 7
num_rows: 17105
num_row_groups: 1
format_version: 1.0
serialized_size: 5922


############ Columns ############
format
hash
data
id
part
offset
relayTs

############ Column(format) ############
name: format
path: format
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: None
converted_type (legacy): NONE
compression: ZSTD (space_saved: 100%)

############ Column(hash) ############
name: hash
path: hash
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: None
converted_type (legacy): NONE
compression: ZSTD (space_saved: 47%)

############ Column(data) ############
name: data
path: data
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: None
converted_type (legacy): NONE
compression: ZSTD (space_saved: 85%)

############ Column(id) ############
name: id
path: origin.id
max_definition_level: 2
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE
compression: UNCOMPRESSED (space_saved: 0%)

############ Column(part) ############
name: part
path: origin.part
max_definition_level: 2
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE
compression: UNCOMPRESSED (space_saved: 0%)

############ Column(offset) ############
name: offset
path: origin.offset
max_definition_level: 2
max_repetition_level: 0
physical_type: INT64
logical_type: None
converted_type (legacy): NONE
compression: ZSTD (space_saved: 89%)

############ Column(relayTs) ############
name: relayTs
path: relayTs
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: None
converted_type (legacy): NONE
compression: ZSTD (space_saved: 54%)
```

Contributor Author:

Hmm, that's somewhat disappointing. Can you also run `parquet-tools dump -d -n` on the GPU and CPU files?

Contributor:

I don't have those options in my version of parquet-tools. Here is the washed `inspect --detail` output for each file.
cust-inspect-detail-cpu.txt
cust-inspect-detail-gpu.txt

Contributor Author:

Sorry, I didn't realize parquet-tools was an overloaded name. I was referring to the (now deprecated, it seems) jar that comes with parquet-mr. Thanks for the extra details... combing through them now.

Contributor Author:

@jbrennan333 So compression is less effective for the 'data' column ("only" 85% vs 92%). That may be down to there being fewer pages, given the 1 MB default page size for parquet-mr vs 512 KB for libcudf. But it seems the fragment sizes are allowing ZSTD to compress things, so that's good news.
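
One way to test that hypothesis would be to raise libcudf's page size limit to match parquet-mr's 1 MB default and re-compress. A sketch, assuming the writer's `max_page_size_bytes` option and a hypothetical `table` to write:

```cpp
#include <cudf/io/parquet.hpp>

// Write with a 1 MiB page size limit instead of the 512 KiB default, to see
// whether the 'data' column's ZSTD ratio approaches the parquet-mr result.
auto const sink = cudf::io::sink_info{"out.parquet"};
auto const opts = cudf::io::parquet_writer_options::builder(sink, table->view())
                    .compression(cudf::io::compression_type::ZSTD)
                    .max_page_size_bytes(1024 * 1024)
                    .build();
cudf::io::write_parquet(opts);
```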

@etseidl (Contributor, Author) commented Jan 27, 2023

@vuule here are the `parquet_write_io_compression` benchmarks:

Before:

|    io    | compression | cardinality | run_length | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|----------|-------------|-------------|------------|---------|----------|-------|----------|-------|------------------|-------------------|-------------------|
| FILEPATH |      SNAPPY |           0 |          1 |      5x |  3.466 s | 1.42% |  3.466 s | 1.42% |        154889730 |         1.556 GiB |       493.950 MiB |
| FILEPATH |      SNAPPY |        1000 |          1 |      8x |  2.019 s | 1.27% |  2.019 s | 1.27% |        265967925 |         2.536 GiB |       161.238 MiB |
| FILEPATH |      SNAPPY |           0 |         32 |      5x |  1.597 s | 0.35% |  1.597 s | 0.35% |        336229281 |         2.532 GiB |        49.703 MiB |
| FILEPATH |      SNAPPY |        1000 |         32 |     10x |  1.592 s | 0.55% |  1.592 s | 0.55% |        337129111 |         2.536 GiB |        23.416 MiB |
| FILEPATH |        NONE |           0 |          1 |      9x |  1.752 s | 0.74% |  1.752 s | 0.74% |        306399699 |         1.556 GiB |       501.137 MiB |
| FILEPATH |        NONE |        1000 |          1 |     10x |  1.277 s | 0.49% |  1.277 s | 0.49% |        420275736 |         2.536 GiB |       169.774 MiB |
| FILEPATH |        NONE |           0 |         32 |     11x |  1.373 s | 0.75% |  1.373 s | 0.75% |        390934488 |         2.532 GiB |        56.544 MiB |
| FILEPATH |        NONE |        1000 |         32 |      5x |  1.323 s | 0.48% |  1.323 s | 0.48% |        405657348 |         2.536 GiB |        30.410 MiB |

This PR:

|    io    | compression | cardinality | run_length | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|----------|-------------|-------------|------------|---------|----------|-------|----------|-------|------------------|-------------------|-------------------|
| FILEPATH |      SNAPPY |           0 |          1 |      5x |  3.485 s | 1.38% |  3.485 s | 1.38% |        154041364 |         1.559 GiB |       494.432 MiB |
| FILEPATH |      SNAPPY |        1000 |          1 |      5x |  2.008 s | 0.32% |  2.008 s | 0.32% |        267352736 |         2.539 GiB |       161.603 MiB |
| FILEPATH |      SNAPPY |           0 |         32 |      5x |  1.598 s | 0.30% |  1.598 s | 0.30% |        336051961 |         2.535 GiB |        50.053 MiB |
| FILEPATH |      SNAPPY |        1000 |         32 |      5x |  1.591 s | 0.21% |  1.591 s | 0.21% |        337388398 |         2.539 GiB |        23.724 MiB |
| FILEPATH |        NONE |           0 |          1 |      5x |  1.742 s | 0.34% |  1.742 s | 0.34% |        308227480 |         1.559 GiB |       501.113 MiB |
| FILEPATH |        NONE |        1000 |          1 |      5x |  1.275 s | 0.42% |  1.275 s | 0.42% |        421134277 |         2.539 GiB |       169.721 MiB |
| FILEPATH |        NONE |           0 |         32 |      5x |  1.357 s | 0.17% |  1.357 s | 0.17% |        395569655 |         2.535 GiB |        56.571 MiB |
| FILEPATH |        NONE |        1000 |         32 |      5x |  1.307 s | 0.20% |  1.307 s | 0.20% |        410754253 |         2.539 GiB |        30.415 MiB |


```cpp
if (column.type().id() == type_id::STRING) {
  auto scol = strings_column_view(column);
  size_type colsize = cudf::detail::get_value<size_type>(scol.offsets(), column.size(), stream);
```
Contributor:

What if there are sliced rows at the start of a column? I think we would need to subtract the first offset from this.

Contributor Author:

I'm frankly amazed this code worked at all 😉 Yes, I'll subtract `offsets[0]`.
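
A sketch of the corrected computation discussed above: read the offsets at both ends of the view, since `column.offset()` is nonzero for a sliced column.

```cpp
auto const first = cudf::detail::get_value<size_type>(scol.offsets(), column.offset(), stream);
auto const last  = cudf::detail::get_value<size_type>(
  scol.offsets(), column.offset() + column.size(), stream);
size_type const colsize = last - first;  // bytes spanned by this (possibly sliced) view
```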

@etseidl (Contributor, Author) commented Jan 30, 2023

Ack. The size calculation is all wrong... it should use the leaf columns rather than the whole nested mess. I'm closing this to look at ways to do the per-column fragment sizes we were talking about.

@etseidl etseidl closed this Jan 30, 2023
@etseidl etseidl deleted the feature/frag_sizev2 branch March 6, 2023 18:51
Labels

  • cuIO: cuIO issue
  • feature request: New feature or request
  • libcudf: Affects libcudf (C++/CUDA) code.
  • non-breaking: Non-breaking change