Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable setting compression level #1378

Merged
merged 2 commits into from
Jun 10, 2022
Merged

enable setting compression level #1378

merged 2 commits into from
Jun 10, 2022

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented May 30, 2022

No description provided.

@PSeitz PSeitz force-pushed the test_compression branch 2 times, most recently from 4328b96 to 834cd4d Compare May 30, 2022 08:06
pub fn new(writer: WritePtr, compressor: Compressor, block_size: usize) -> StoreWriter {
pub fn new(
writer: WritePtr,
compressor: Compressor,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't that part of the compressor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compressor is currently constructed by deserializing from the compressor name. To pass more parameters we would need much more complicated deserialization.
And there are cases where the Compressor is only used for decompression.

Copy link
Collaborator

@fulmicoton fulmicoton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't compression part of the compressor?

If we have parameters that are compressor specific, it seems like an antipattern to have the union of these parameters in a list of options.

@codecov-commenter
Copy link

Codecov Report

Merging #1378 (834cd4d) into main (f0a2b1c) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1378   +/-   ##
=======================================
  Coverage   94.22%   94.23%           
=======================================
  Files         235      235           
  Lines       42893    42922   +29     
=======================================
+ Hits        40418    40449   +31     
+ Misses       2475     2473    -2     
Impacted Files Coverage Δ
src/core/index_meta.rs 93.02% <100.00%> (+0.03%) ⬆️
src/indexer/segment_serializer.rs 98.14% <100.00%> (+0.03%) ⬆️
src/indexer/segment_writer.rs 96.37% <100.00%> (+0.03%) ⬆️
src/store/compression_zstd_block.rs 90.47% <100.00%> (+1.00%) ⬆️
src/store/compressors.rs 87.09% <100.00%> (+2.19%) ⬆️
src/store/mod.rs 99.17% <100.00%> (ø)
src/store/writer.rs 99.00% <100.00%> (+0.09%) ⬆️
src/fastfield/reader.rs 89.91% <0.00%> (+0.84%) ⬆️
src/directory/watch_event_router.rs 96.63% <0.00%> (+0.84%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f0a2b1c...834cd4d. Read the comment docs.

@PSeitz
Copy link
Contributor Author

PSeitz commented May 30, 2022

Why isn't compression part of the compressor?

We have decompress only compressor

    /// Opens a store reader
    pub fn open(store_file: FileSlice) -> io::Result<StoreReader> {
        let (footer, data_and_offset) = DocStoreFooter::extract_footer(store_file)?;

        let (data_file, offset_index_file) = data_and_offset.split(footer.offset as usize);
        let index_data = offset_index_file.read_bytes()?;
        let space_usage = StoreSpaceUsage::new(data_file.len(), offset_index_file.len());
        let skip_index = SkipIndex::open(index_data);
        Ok(StoreReader {
            compressor: footer.compressor,
            data: data_file,
            cache: Arc::new(Mutex::new(LruCache::new(LRU_CACHE_CAPACITY))),
            cache_hits: Default::default(),
            cache_misses: Default::default(),
            skip_index: Arc::new(skip_index),
            space_usage,
        })
    }

If we have parameters that are compressor specific, it seems like an antipattern to have the union of these parameters in a list of options.

We could have zstd_compression_level or similar instead ofdocstore_compression_level, but I think we can use that parameter for multiple compressors. It's usually just a number.

@PSeitz PSeitz requested a review from fulmicoton May 31, 2022 03:17
@fulmicoton
Copy link
Collaborator

@PSeitz Can you investigate if a refactoring to make that nicer?

So the compressor enum would contain these params.
It then needs to be serialized/deserialized in the meta.json, as well as in the footer.

The footer part is a bit lame, because the level is not necessary, but that's fine.

@PSeitz PSeitz force-pushed the test_compression branch 4 times, most recently from 0b1196f to bf461ba Compare June 2, 2022 15:20
use custom de/serializer for compressor
accept parameters like zstd(compression_level=5) as compressor
@PSeitz PSeitz force-pushed the test_compression branch from bf461ba to 4d9d2b6 Compare June 2, 2022 15:29
pub fn compress(
uncompressed: &[u8],
compressed: &mut Vec<u8>,
compression_level: Option<i32>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it be negative?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for more speed

@PSeitz PSeitz merged commit 328bd96 into main Jun 10, 2022
@PSeitz PSeitz deleted the test_compression branch June 10, 2022 03:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants