ZStandard produces surprisingly large population files #2764

Closed
neuma opened this issue Sep 13, 2023 · 0 comments · Fixed by #2766
Labels: enhancement, performance (performance-related issues)

neuma commented Sep 13, 2023

Implementations of PopulationWriterHandler, e.g. PopulationWriterHandlerImplV6, flush the buffered writer at the end of writePerson, i.e. after each person.
As a result, only small buffers are written to the file at a time whenever a person has only a few, or very short, plans.
Dictionary-based compression algorithms such as zstd (zst) seem to have trouble building efficient dictionaries from such small buffers, and hence produce larger compressed files than regular deflate-based gzip.
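zstd is not part of the JDK, so the following sketch uses raw deflate from java.util.zip as a stand-in (an assumption; this is not the zst codec used in the benchmark below) to illustrate the general effect: forcing a compressor to flush after every small record makes it close a block each time, which adds per-flush overhead and prevents it from compressing across records. The record format and the sampleRecords helper are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class SmallFlushSketch {
    // Hypothetical person records resembling a plans file with empty legs.
    static String[] sampleRecords(int count) {
        String[] records = new String[count];
        for (int i = 0; i < count; i++) {
            records[i] = "<person id=\"" + i + "\"><plan selected=\"yes\"><leg mode=\"car\"/></plan></person>\n";
        }
        return records;
    }

    // Compresses the records with raw deflate and returns the compressed size in bytes.
    // If flushEachRecord is true, a SYNC_FLUSH after every record forces the compressor
    // to end the current block and emit everything it has buffered so far.
    static int compressedSize(String[] records, boolean flushEachRecord) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        byte[] buf = new byte[64 * 1024];
        int total = 0;
        try {
            for (String record : records) {
                deflater.setInput(record.getBytes(StandardCharsets.UTF_8));
                if (flushEachRecord) {
                    int n;
                    do {
                        n = deflater.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
                        total += n;
                    } while (n == buf.length); // a full buffer means more output may be pending
                } else {
                    while (!deflater.needsInput()) {
                        total += deflater.deflate(buf, 0, buf.length, Deflater.NO_FLUSH);
                    }
                }
            }
            deflater.finish();
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
        } finally {
            deflater.end(); // release native zlib resources
        }
        return total;
    }

    public static void main(String[] args) {
        String[] records = sampleRecords(1000);
        System.out.println("no flush:           " + compressedSize(records, false) + " bytes");
        System.out.println("flush every record: " + compressedSize(records, true) + " bytes");
    }
}
```

Whether a given output stream's flush() actually reaches its compressor depends on the stream implementation, so this only shows what happens when it does.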

Proposal
Move the flush from the end of writePerson to endPlans.
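The proposal can be sketched with a hypothetical, heavily simplified handler whose writePerson and endPlans methods only mirror the names used in this issue; they are not the real MATSim PopulationWriterHandler signatures. A counting stream records how often a flush reaches the layer below the BufferedWriter, which in a real writer would be the compressor or the file.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class FlushPlacementSketch {
    // Counts how many times flush() reaches the underlying stream.
    static final class CountingStream extends FilterOutputStream {
        int flushes = 0;
        CountingStream(OutputStream out) { super(out); }
        @Override public void flush() throws IOException { flushes++; super.flush(); }
    }

    // Hypothetical stand-in for a population writer handler (not the MATSim API).
    static final class SketchHandler {
        private final boolean flushInWritePerson;
        SketchHandler(boolean flushInWritePerson) { this.flushInWritePerson = flushInWritePerson; }

        void writePerson(String personId, BufferedWriter out) throws IOException {
            out.write("<person id=\"" + personId + "\"/>\n");
            if (flushInWritePerson) {
                out.flush(); // current behaviour: flush after every person
            }
        }

        void endPlans(BufferedWriter out) throws IOException {
            out.write("</population>\n");
            out.flush(); // proposed behaviour: one flush after all persons are written
        }
    }

    static int countFlushes(boolean flushInWritePerson, int persons) {
        CountingStream counting = new CountingStream(new ByteArrayOutputStream());
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(counting, StandardCharsets.UTF_8))) {
            SketchHandler handler = new SketchHandler(flushInWritePerson);
            for (int i = 0; i < persons; i++) {
                handler.writePerson(Integer.toString(i), out);
            }
            handler.endPlans(out);
            return counting.flushes; // read before close(), which flushes once more
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("flush in writePerson: " + countFlushes(true, 100));
        System.out.println("flush in endPlans:    " + countFlushes(false, 100));
    }
}
```

With 100 persons, the current placement pushes 101 separate flushes down to the compressor, whereas the proposed placement pushes a single one and lets the buffers fill up in between.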

Possible Pitfalls
Flushing after each person guarantees a recoverable file in case of a crash, at least for uncompressed files.
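The trade-off for uncompressed output can be illustrated with a minimal sketch that simulates a crash by never flushing or closing the writer. An in-memory ByteArrayOutputStream stands in for the file, and the record format is hypothetical.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class CrashVisibilitySketch {
    // Writes `persons` records, then "crashes": endPlans/close are never called.
    // Returns how many bytes had already reached the (simulated) file at that point.
    static int bytesOnDiskAfterCrash(boolean flushAfterEachPerson, int persons) {
        ByteArrayOutputStream disk = new ByteArrayOutputStream(); // stands in for the file
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(disk, StandardCharsets.UTF_8));
        try {
            for (int i = 0; i < persons; i++) {
                out.write("<person id=\"" + i + "\"/>\n");
                if (flushAfterEachPerson) {
                    out.flush(); // every completed person is pushed to the file immediately
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Intentionally no flush() or close(): whatever is still buffered is lost.
        return disk.size();
    }

    public static void main(String[] args) {
        System.out.println("flush per person: " + bytesOnDiskAfterCrash(true, 10) + " bytes on disk");
        System.out.println("flush at the end: " + bytesOnDiskAfterCrash(false, 10) + " bytes on disk");
    }
}
```

Each record here is 17 bytes, so after ten persons the per-person variant has all 170 bytes on disk, while the end-only variant still holds them in its buffers and has written nothing.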


Some Benchmarks
Given an uncompressed plans file with empty legs that is read in via StreamingPopulationReader and immediately streamed out via StreamingPopulationWriter, the output is written once as standard gz and once as zst with the default compression level of 6.


With the current flush at the end of writePerson, zst produces a file that is 15% larger than with MATSim's regular gz implementation. Moving the flush to endPlans yields the expected reduction in file size compared to gz. Note that gz appears to be unaffected by the flush at the end of writePerson.
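One plausible explanation for gz being unaffected, assuming MATSim's gz output goes through java.util.zip.GZIPOutputStream created with the single-argument constructor (the issue does not say): such a stream has syncFlush = false, and its flush() then only flushes the underlying stream without flushing the compressor. The sketch below compares that default with syncFlush = true, flushing after every hypothetical person record.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipFlushSketch {
    // Writes `persons` small records through a GZIPOutputStream and returns the gzip size.
    // flushAfterEachPerson mirrors a flush at the end of writePerson.
    static int gzipSize(int persons, boolean syncFlush, boolean flushAfterEachPerson) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes, syncFlush)) {
            for (int i = 0; i < persons; i++) {
                String person = "<person id=\"" + i + "\"><plan><leg mode=\"car\"/></plan></person>\n";
                gz.write(person.getBytes(StandardCharsets.UTF_8));
                if (flushAfterEachPerson) {
                    // With syncFlush == false this does not touch the deflater at all.
                    gz.flush();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size(); // the stream is closed here, so the gzip trailer is included
    }

    public static void main(String[] args) {
        System.out.println("no flush:                   " + gzipSize(1000, false, false));
        System.out.println("flush, syncFlush = false:   " + gzipSize(1000, false, true));
        System.out.println("flush, syncFlush = true:    " + gzipSize(1000, true, true));
    }
}
```

The first two sizes are identical, while the third is larger. A zst stream whose flush() does reach the compressor would presumably behave like the third case.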
