Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use enum_dispatch #615

Merged
merged 3 commits into from
Nov 26, 2024
Merged

use enum_dispatch #615

merged 3 commits into from
Nov 26, 2024

Conversation

yuunlimm
Copy link
Contributor

@yuunlimm yuunlimm commented Nov 22, 2024

Description

we were initially considering using dyn trait to address the concerns around having too many match statements. however, since dyn trait has a performance concern, we decided to go with enum_dispatch, which is 10x performant

Test Plan

Screenshot 2024-11-22 at 11.59.24 AM.png

@yuunlimm yuunlimm marked this pull request as ready for review November 22, 2024 19:53
@larry-aptos
Copy link
Collaborator

do you consider/evaluate enum_dispatch? which do you prefer(dispatch vs downcast)?

@yuunlimm
Copy link
Contributor Author

do you consider/evaluate enum_dispatch? which do you prefer(dispatch vs downcast)?

yeah I think enum_dispatch also works since we know there will be only fixed number of variants. and i assume enum_diospatch will be more performant than downcast?


/// Struct for handling Parquet types dynamically.
#[derive(Debug)]
pub struct ParquetTypeStructs {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this struct ParquetTypeStructs needed if it only contains 1 field: data? Maybe we can use Box<dyn ParquetTypeTrait> directly.

@@ -100,6 +61,10 @@ pub fn create_new_writer(schema: Arc<Type>) -> anyhow::Result<SerializedFileWrit
SerializedFileWriter::new(Vec::new(), schema, props_arc).context("Failed to create new writer")
}

fn determine_parquet_type(buffer: &ParquetTypeStructs) -> ParquetTypeEnum {
Copy link
Collaborator

@larry-aptos larry-aptos Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you Box<dyn trait> and need the type still, I assume dispatch is better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I think enum dispatch is the better choice here

@yuunlimm yuunlimm force-pushed the 11-22-parquet-structs branch from 5b9608c to 9422589 Compare November 22, 2024 23:17
Copy link

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 29.09091% with 39 lines in your changes missing coverage. Please review.

Project coverage is 48.8%. Comparing base (3d22e3b) to head (045d873).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
rust/sdk-processor/src/parquet_processors/mod.rs 32.6% 33 Missing ⚠️
...ust/sdk-processor/src/steps/common/gcs_uploader.rs 0.0% 4 Missing ⚠️
...-processor/src/steps/common/parquet_buffer_step.rs 0.0% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main    #615     +/-   ##
=======================================
+ Coverage   48.7%   48.8%   +0.1%     
=======================================
  Files        186     186             
  Lines      24904   24890     -14     
=======================================
+ Hits       12131   12151     +20     
+ Misses     12773   12739     -34     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yuunlimm yuunlimm changed the title replace enums with struct with dyn trait to address perofrmance risk with having too many enums replace enums with struct with dyn trait Nov 25, 2024
@yuunlimm yuunlimm changed the title replace enums with struct with dyn trait replace dyn trait with enum_dispatch Nov 25, 2024
Copy link
Contributor

@dermanyang dermanyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spoke offline; using enum_dispatch here for better performance and worth the extensibility trade-off. This crate also gives some nice code dedupe. See enum_dispatch docs for more info.

@yuunlimm can you update the PR desc please?

@@ -146,3 +146,4 @@ serde_canonical_json = "1.0.0"
allocative = "0.3.3"
allocative_derive = "0.3.3"
mockall = "0.12.1"
downcast-rs = "1.2.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants