Add options for different compression levels for parquet files #153

Merged
tomjholland merged 4 commits into main from add-parquet-compression-options on Nov 29, 2024

tomjholland
Collaborator

@tomjholland tomjholland commented Oct 16, 2024

This pull request adds parquet compression options to PyProBE and refactors the file conversion code to improve clarity and maintainability.

New Functionality:

  • Added support for different compression priorities (performance, file size, uncompressed) when converting files to PyProBE format. ([[1]](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-441aec132d927ef70cd9824112874874d1fb11d9d8bc0604c3b492ed3ad9a7c8L64-R110), [[2]](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-441aec132d927ef70cd9824112874874d1fb11d9d8bc0604c3b492ed3ad9a7c8R122-R207), [[3]](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-441aec132d927ef70cd9824112874874d1fb11d9d8bc0604c3b492ed3ad9a7c8R217-R219), [[4]](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-441aec132d927ef70cd9824112874874d1fb11d9d8bc0604c3b492ed3ad9a7c8R241-L162))
  • Updated the process_cycler_file and process_generic_file methods to include the compression_priority parameter. ([[1]](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-441aec132d927ef70cd9824112874874d1fb11d9d8bc0604c3b492ed3ad9a7c8L64-R110), [[2]](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-441aec132d927ef70cd9824112874874d1fb11d9d8bc0604c3b492ed3ad9a7c8R217-R219))
  • Added the ability to skip regenerating parquet files that already exist.
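
As a rough illustration of the compression priorities listed above, the sketch below maps each priority to a parquet codec name. The priority strings, the codec choices (`lz4`, `zstd`) and the helper name `resolve_compression` are assumptions made for this example; they are not taken from the PR diff.

```python
from typing import Dict

# Hypothetical table: the codec chosen for each priority is an assumption,
# picked from codec names that common parquet writers accept.
_CODEC_BY_PRIORITY: Dict[str, str] = {
    "performance": "lz4",            # favour fast reads and writes
    "file size": "zstd",             # favour smaller files on disk
    "uncompressed": "uncompressed",  # no compression at all
}


def resolve_compression(priority: str) -> str:
    """Return the codec name for a compression priority.

    Raises ValueError for an unknown priority so that a typo fails loudly
    instead of silently falling back to a default codec.
    """
    try:
        return _CODEC_BY_PRIORITY[priority]
    except KeyError:
        valid = ", ".join(sorted(_CODEC_BY_PRIORITY))
        raise ValueError(
            f"Unknown compression priority {priority!r}; expected one of: {valid}"
        ) from None
```

Keeping the table in one place means the public methods only need to pass a priority string through, and the writer only ever sees a codec name.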

Code Refactoring:

  • Extracted the file conversion logic into a new private method _convert_to_parquet to simplify the process_cycler_file and process_generic_file methods. ([[1]](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-441aec132d927ef70cd9824112874874d1fb11d9d8bc0604c3b492ed3ad9a7c8L64-R110), [[2]](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-441aec132d927ef70cd9824112874874d1fb11d9d8bc0604c3b492ed3ad9a7c8R217-R219))
  • Modified the _write_parquet method to accept a compression parameter and use it when writing the parquet file. ([pyprobe/cell.pyR323-R335](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-441aec132d927ef70cd9824112874874d1fb11d9d8bc0604c3b492ed3ad9a7c8R323-R335))
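
The refactoring described above can be pictured with the following minimal sketch, in which both public methods delegate to one private helper that decides whether to write and which codec to pass on. `CellSketch`, `fake_writer`, the method signatures, the default argument values and the codec table are all hypothetical stand-ins; the real `Cell` class in `pyprobe/cell.py` may differ in every one of these details.

```python
from pathlib import Path
from typing import Callable, Dict, List

Rows = List[dict]


class CellSketch:
    """Illustrative stand-in, not the PyProBE Cell class."""

    # Assumed priority-to-codec table; the codec names are not taken from the PR.
    _CODECS: Dict[str, str] = {
        "performance": "lz4",
        "file size": "zstd",
        "uncompressed": "uncompressed",
    }

    def __init__(self, write_parquet: Callable[[Rows, Path, str], None]) -> None:
        # The writer is injected so the sketch runs without a parquet library.
        self._write_parquet = write_parquet

    def process_cycler_file(self, rows: Rows, output_path,
                            compression_priority: str = "performance",
                            overwrite_existing: bool = False) -> bool:
        """Entry point for data that already uses the expected column names."""
        return self._convert_to_parquet(
            lambda: rows, Path(output_path), compression_priority, overwrite_existing
        )

    def process_generic_file(self, rows: Rows, column_map: Dict[str, str], output_path,
                             compression_priority: str = "performance",
                             overwrite_existing: bool = False) -> bool:
        """Entry point for generic data whose columns are renamed first."""
        def load() -> Rows:
            return [{column_map.get(k, k): v for k, v in row.items()} for row in rows]

        return self._convert_to_parquet(
            load, Path(output_path), compression_priority, overwrite_existing
        )

    def _convert_to_parquet(self, load: Callable[[], Rows], output_path: Path,
                            compression_priority: str, overwrite_existing: bool) -> bool:
        """Write the file unless it exists and overwriting is off.

        Returns True if a file was written, False if the write was skipped.
        """
        if output_path.exists() and not overwrite_existing:
            return False  # skip both the load and the write
        codec = self._CODECS[compression_priority]
        self._write_parquet(load(), output_path, codec)
        return True


def fake_writer(rows: Rows, path: Path, compression: str) -> None:
    """Stand-in writer: records the codec and row count instead of real parquet."""
    path.write_text(f"{compression}:{len(rows)}")
```

Passing the loader as a callable means an existing output file short-circuits before any input is read, which is where most of the time saved by skipping regeneration would come from.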

Documentation and Examples:

  • Added new code cells and markdown explanations to the comparing-pyprobe-performance.ipynb notebook to demonstrate the new compression options and their performance. ([[1]](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-f97df501eeb86d9944940d597c2c80a87b666e5ecd0715eb1cd37238964dc9e1R326-R334), [[2]](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-f97df501eeb86d9944940d597c2c80a87b666e5ecd0715eb1cd37238964dc9e1R357-R397))

Testing:

  • Updated the test_process_cycler_file test case to include the compression_priority and overwrite_existing parameters. ([tests/test_cell.pyL76-R83](https://github.com/ImperialCollegeLondon/PyProBE/pull/153/files#diff-5f57142fa1c2be045e3012de7f735da962b090fe45c8957a3deef79bb4819677L76-R83))

@tomjholland tomjholland added the feature (Adding a new functionality, small or large) and optimisation (Improvements in the performance of the code) labels Oct 16, 2024
@tomjholland tomjholland marked this pull request as ready for review November 29, 2024 15:25
@tomjholland tomjholland force-pushed the add-parquet-compression-options branch from a4382a2 to c38e2a3 November 29, 2024 16:12
@tomjholland tomjholland added the refactor (Refactoring existing code without significantly changing functionality) label Nov 29, 2024
@tomjholland tomjholland merged commit 9bcfbd6 into main Nov 29, 2024
2 checks passed
@tomjholland tomjholland deleted the add-parquet-compression-options branch November 29, 2024 17:11