The PopulationWriterHandler, e.g. PopulationWriterHandlerImplV6, flushes the buffered writer after each person.
As a result, only small buffers are written to the file when a person has few or very short plans.
Dictionary-based compression algorithms like zst seem to have a problem building efficient dictionaries for small buffers and hence produce larger compressed files than regular deflate gzip.
Proposal
Move the flush to endPlans.
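The proposal could look roughly like the sketch below. It is not the MATSim code: `Handler`, its method signatures, and `CountingWriter` are hypothetical, simplified stand-ins used only to show that the underlying stream then sees a single flush, no matter how many persons are written.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class DeferredFlushSketch {

    /** Helper sink that counts how often flush() reaches the underlying writer. */
    static class CountingWriter extends StringWriter {
        int flushes = 0;

        @Override
        public void flush() {
            flushes++;
            super.flush();
        }
    }

    /** Hypothetical, simplified stand-in for a population writer handler. */
    static class Handler {
        void writePerson(String personXml, BufferedWriter out) throws IOException {
            out.write(personXml);
            out.newLine();
            // Deliberately no out.flush() here: the buffer drains on its own when full.
        }

        void endPlans(BufferedWriter out) throws IOException {
            out.write("</population>");
            out.newLine();
            out.flush(); // the only explicit flush, issued once at the end
        }
    }

    /** Writes nPersons dummy persons and returns the number of flushes seen by the sink. */
    static int flushesFor(int nPersons) {
        try {
            CountingWriter sink = new CountingWriter();
            BufferedWriter out = new BufferedWriter(sink);
            Handler handler = new Handler();
            for (int i = 0; i < nPersons; i++) {
                handler.writePerson("<person id=\"" + i + "\"/>", out);
            }
            handler.endPlans(out);
            return sink.flushes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("flushes seen by sink: " + flushesFor(1000));
    }
}
```

A full `BufferedWriter` buffer is handed to the sink via `write`, not `flush`, so a compressing stream underneath can choose its own block boundaries.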
Possible Pitfalls
Flushing after each person guarantees a recoverable file in case of a crash - at least for uncompressed files.
Some Benchmarks
The benchmark takes an uncompressed plans file with empty legs, reads it in via StreamingPopulationReader, and immediately streams it out via StreamingPopulationWriter. The output is written once as standard gz and once as zst with the default compression level of 6.
With the current flush at the end of writePerson, zst produces a file 15% larger than the regular gz implementation of MATSim. Moving the flush to endPlans produces the expected file size reduction compared to gz. Note that gz seems to be unaffected by the flush at the end of writePerson.
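One possible explanation for gz being unaffected can be reproduced with the JDK alone. `java.util.zip.GZIPOutputStream` ignores `flush()` for compression purposes unless it is constructed with `syncFlush = true`; with sync flush enabled, every flush closes the current deflate block and the output grows. The sketch below uses that switch as a stand-in for a compressor that honours each flush. It does not use zstd, the person records are made up, and whether MATSim's gz and zst streams behave exactly this way is an assumption.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class FlushGranularityDemo {

    /**
     * Writes nPersons small, made-up person records through a gzip stream
     * and returns the compressed size in bytes.
     *
     * @param flushPerPerson flush the writer after every record
     * @param syncFlush      let flush() force the deflater to emit a block
     */
    static int compressedSize(int nPersons, boolean flushPerPerson, boolean syncFlush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(sink, syncFlush), StandardCharsets.UTF_8))) {
            for (int i = 0; i < nPersons; i++) {
                writer.write("<person id=\"" + i + "\"><plan selected=\"yes\">"
                        + "<activity type=\"home\"/><leg mode=\"car\"/>"
                        + "<activity type=\"work\"/></plan></person>\n");
                if (flushPerPerson) {
                    writer.flush();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The try-with-resources block has closed the stream, so the gzip trailer is included.
        return sink.size();
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("flush at end only:               " + compressedSize(n, false, true));
        System.out.println("flush per person, no sync flush: " + compressedSize(n, true, false));
        System.out.println("flush per person, sync flush:    " + compressedSize(n, true, true));
    }
}
```

Comparing the three sizes on a real plans file, and repeating the experiment with the zst stream actually used, would show whether the per-flush overhead accounts for the reported difference.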