delayed column opening during merge #2132

PSeitz · 2023-07-21T07:39:00Z

This is the first part of addressing quickwit-oss/quickwit#3633.
Instead of loading every Column into memory for the merge, only the columns of the current (column_name, column_category) group are loaded.

The second part would be to replace mmap calls with file reads, for better control over releasing memory.

common/src/file_slice.rs

columnar/src/columnar/reader/mod.rs

columnar/src/columnar/merge/mod.rs

codecov-commenter · 2023-07-21T08:39:01Z

Codecov Report

Merging #2132 (1bfbfda) into main (5fafe4b) will decrease coverage by 0.02%.
The diff coverage is 88.83%.

@@            Coverage Diff             @@
##             main    #2132      +/-   ##
==========================================
- Coverage   94.37%   94.36%   -0.02%     
==========================================
  Files         321      319       -2     
  Lines       60821    60874      +53     
==========================================
+ Hits        57401    57442      +41     
- Misses       3420     3432      +12

Impacted Files	Coverage Δ
common/src/file_slice.rs	`72.38% <0.00%> (-9.34%)`	⬇️
src/lib.rs	`99.05% <ø> (ø)`
columnar/src/columnar/merge/mod.rs	`94.81% <99.00%> (+0.73%)`	⬆️
columnar/src/columnar/merge/tests.rs	`98.48% <100.00%> (+0.04%)`	⬆️
columnar/src/columnar/reader/mod.rs	`79.41% <100.00%> (+1.15%)`	⬆️
columnar/src/dynamic_column.rs	`70.52% <100.00%> (ø)`
common/src/group_by.rs	`100.00% <100.00%> (ø)`
src/core/index_meta.rs	`96.14% <100.00%> (ø)`
src/store/compressors.rs	`97.70% <100.00%> (+5.51%)`	⬆️
src/store/decompressors.rs	`97.82% <100.00%> (-0.39%)`	⬇️
... and 1 more

... and 4 files with indirect coverage changes

This is the first part of addressing #3633. Instead of loading all columns into memory for the merge, only the current column_name group is loaded. This is possible because the sstable streams the columns in lexicographical order.
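The grouping idea can be illustrated with a minimal sketch. It assumes the handles arrive already sorted by column name (as the sstable ordering suggests), and uses a hypothetical `Handle` type and `GroupsByName` iterator that are not part of the actual codebase. Only one group of handles is materialized at a time, so the caller could open the columns of that group and drop them before moving on.

```rust
use std::iter::Peekable;

/// Hypothetical stand-in for a column handle: cheap to hold, opened later.
#[derive(Debug, Clone, PartialEq)]
pub struct Handle {
    pub column_name: String,
    pub segment: usize,
}

/// Yields one group of handles per column name.
/// Assumption: the input iterator is sorted by `column_name`.
pub struct GroupsByName<I: Iterator<Item = Handle>> {
    inner: Peekable<I>,
}

impl<I: Iterator<Item = Handle>> GroupsByName<I> {
    pub fn new(sorted: I) -> Self {
        GroupsByName {
            inner: sorted.peekable(),
        }
    }
}

impl<I: Iterator<Item = Handle>> Iterator for GroupsByName<I> {
    type Item = (String, Vec<Handle>);

    fn next(&mut self) -> Option<Self::Item> {
        // Start a new group with the next handle, if any.
        let first = self.inner.next()?;
        let name = first.column_name.clone();
        let mut group = vec![first];
        // Consume handles only while they belong to the same column name.
        while let Some(handle) = self.inner.next_if(|h| h.column_name == name) {
            group.push(handle);
        }
        Some((name, group))
    }
}
```

Because each call to `next` returns a single group, peak memory in this sketch is bounded by the largest group rather than by the total number of columns.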

columnar/src/columnar/merge/mod.rs

fulmicoton · 2023-07-27T09:01:51Z

-                continue;
+/// Iterates over the columns of the columnar readers, grouped by column name.
+/// Key functionality is that `open` of the Columns is done lazy per group.
+fn group_columns_for_merge_iter<'a>(


The code is much more complicated. Could we have kept the previous code, worked off DynamicColumnHandle instead of DynamicColumn, and gotten the same benefit?

The columns need to be loaded to access things like min/max values. The alternative would be to change the code so that we can access only the metadata, with something like open_metadata.

We can lighten the work of group_columns_for_merge and change it so that it does not:

filter out columns that end up being empty.

identify the resulting ColumnType.

It would just stop at returning a

fn group_columns_for_merge(
    columnar_readers: &[&ColumnarReader],
    required_columns: &[(String, ColumnType)],
    merge_row_order: &MergeRowOrder,
) -> io::Result<BTreeMap<(String, ColumnTypeCategory), Vec<Option<DynamicColumnHandle>>>> {

At the call site, we would then iterate through these handles, open the DynamicColumn, and convert the (ColumnTypeCategory, Vec<Option<DynamicColumnHandle>>) into a (ColumnType, Vec<Option<DynamicHandle>>) in which every column that is Some contains at least one element.

The rest is then similar to what exists today.
The code should not be more complicated than it is today.

One tricky part to consider is how to deal with required columns.
If all columns for this (String, ColumnTypeCategory) are empty (following deletes), whether or not we want a column depends on whether the column is marked as required or not.
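The proposal above can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: `Category`, `Handle`, `group_handles`, and `keep_group` are hypothetical names, `num_values` stands in for information that would only be known after opening a column, and required columns are keyed by category here (rather than by ColumnType) to keep the example short.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced version of a column type category.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Category {
    Numerical,
    Str,
}

/// Hypothetical handle; `num_values` models what opening the column would reveal.
#[derive(Debug, Clone)]
pub struct Handle {
    pub num_values: usize,
}

type Groups = BTreeMap<(String, Category), Vec<Option<Handle>>>;

/// Groups unopened handles by (name, category), with one slot per reader.
/// Required columns get an entry even if no reader has them.
pub fn group_handles(
    readers: &[Vec<(String, Category, Handle)>],
    required: &[(String, Category)],
) -> Groups {
    let mut groups: Groups = BTreeMap::new();
    for (reader_id, reader) in readers.iter().enumerate() {
        for (name, category, handle) in reader {
            let slots = groups
                .entry((name.clone(), *category))
                .or_insert_with(|| vec![None; readers.len()]);
            slots[reader_id] = Some(handle.clone());
        }
    }
    for key in required {
        groups
            .entry(key.clone())
            .or_insert_with(|| vec![None; readers.len()]);
    }
    groups
}

/// Call-site decision: keep a group if any column has values,
/// or if the column is marked as required.
pub fn keep_group(
    key: &(String, Category),
    slots: &[Option<Handle>],
    required: &[(String, Category)],
) -> bool {
    let has_values = slots.iter().flatten().any(|h| h.num_values > 0);
    has_values || required.contains(key)
}
```

In this sketch the map only holds handles, so the expensive open step can happen one group at a time at the call site, and the empty-but-required case is decided there as well.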

In the high-cardinality num column case, the BTreeMap variant may cost more than 100MB of heap allocation, which is why I opted for the Iterator version.
In terms of code complexity, the main difference is that the required columns are mixed into the Column stream (sorted by column name) instead of simply being inserted into the BTreeMap.
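To make the extra complexity of the Iterator variant concrete, here is a minimal sketch of mixing a sorted list of required column names into a sorted column stream. It assumes both inputs are sorted and free of internal duplicates; `MergeSorted` and `merge_sorted` are hypothetical names, and real keys would also carry the column type.

```rust
use std::cmp::Ordering;
use std::iter::Peekable;

/// Merges two sorted streams of names into one sorted stream,
/// emitting a name only once when it appears in both.
pub struct MergeSorted<A: Iterator<Item = String>, B: Iterator<Item = String>> {
    left: Peekable<A>,
    right: Peekable<B>,
}

pub fn merge_sorted<A, B>(left: A, right: B) -> MergeSorted<A, B>
where
    A: Iterator<Item = String>,
    B: Iterator<Item = String>,
{
    MergeSorted {
        left: left.peekable(),
        right: right.peekable(),
    }
}

impl<A, B> Iterator for MergeSorted<A, B>
where
    A: Iterator<Item = String>,
    B: Iterator<Item = String>,
{
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // Compare the heads of both streams without consuming them.
        let order = match (self.left.peek(), self.right.peek()) {
            (Some(l), Some(r)) => l.cmp(r),
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (None, None) => return None,
        };
        match order {
            Ordering::Less => self.left.next(),
            Ordering::Greater => self.right.next(),
            Ordering::Equal => {
                // Same name on both sides: drop the duplicate from the right.
                self.right.next();
                self.left.next()
            }
        }
    }
}
```

With a BTreeMap, the same effect comes from inserting the required keys into the map, which is the simplification the review asks for, at the cost of holding all keys in memory at once.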

This is too complex. Can you go with the BTreeMap solution?

I switched to a BTreeMap.

common/src/file_slice.rs

adamreichold reviewed Jul 21, 2023

common/src/file_slice.rs (outdated)

adamreichold reviewed Jul 21, 2023

common/src/file_slice.rs (outdated)

adamreichold reviewed Jul 21, 2023

columnar/src/columnar/reader/mod.rs

adamreichold reviewed Jul 21, 2023