Optimize performance of table writers and refactor table model #74
Conversation
Fixes to make use of the new buffer_table driven model
Fix probe_meta and measurement_meta passing
Fix tests
Update tests
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #74 +/- ##
==========================================
+ Coverage 83.62% 84.06% +0.43%
==========================================
Files 74 74
Lines 5893 6086 +193
==========================================
+ Hits 4928 5116 +188
- Misses 965 970 +5
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
The places where it's placed might not be 100% accurate, but at least we get some kind of ballpark figure and we can iterate on it.
Co-authored-by: DecFox <[email protected]>
…able-model-refactor-rb * 'table-model-refactor-rb' of github.com:ooni/data: Update oonidata/src/oonidata/models/experiment_result.py
LGTM!
* v5.0.0-rc.0:
  - Add simple redirector
  - Tidy up the layout of the analysis viewer
  - Add an observations viewer
  - Get rid of all the dataviz that isn't the analysis visualizer
  - Release/5.0.0 alpha3 (#81)
  - Offset analysis schedule by 6 hours
  - Add support for temporal cloud (#79)
  - fix: support for sorting network_events using transaction_id (#51)
  - Optimize performance of table writers and refactor table model (#74)
  - Improvements related to deployment (#69)
  - Add .codecov file
  - Update jsonl sync example
  - Setup workflow to publish ooni data docs (#73)
  - OONI Pipeline v5 alpha (#64)
  - Fix codecov (#62)
  - Temporal workflows (#61)
  - OONI Data, OONI Pipeline split (#60)
  - Add support for caching netinfodb
This is an important refactor of the table models.
It moves the ProbeMeta and MeasurementMeta into nested composed classes, which is nicer because you don't get lost in the complicated class inheritance, but most importantly it significantly boosts performance because we don't have to make copies of each MeasurementMeta to pass it around.
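For a rough picture of what the composition looks like, here is a minimal sketch (the `WebObservation`-style container and its fields are illustrative, not the exact classes touched by this PR); the key point is that every row holds a reference to the same `ProbeMeta`/`MeasurementMeta` instance instead of inheriting and duplicating their fields:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProbeMeta:
    # Metadata describing the probe that ran the measurement
    probe_asn: int
    probe_cc: str


@dataclass
class MeasurementMeta:
    # Metadata shared by every row derived from a single measurement
    measurement_uid: str
    measurement_start_time: datetime


@dataclass
class WebObservation:
    # Composition: the same ProbeMeta/MeasurementMeta objects are referenced
    # by all observation rows of a measurement, so nothing is copied around.
    probe_meta: ProbeMeta
    measurement_meta: MeasurementMeta
    hostname: str


probe_meta = ProbeMeta(probe_asn=12345, probe_cc="IT")
measurement_meta = MeasurementMeta(
    measurement_uid="example-uid", measurement_start_time=datetime(2024, 1, 1)
)
observations = [
    WebObservation(probe_meta, measurement_meta, hostname=h)
    for h in ("example.com", "example.org")
]
```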
I also introduced two better patterns for handling the TableModels. Basically you decorate a table that should end up inside of the database via the `table_model` decorator, and then when it's used type safety is enforced by the `TableModelProtocol`.
Thanks to this refactoring it's also possible to improve the way in which we handle both the buffering and serialization of writes and the creation of the `CREATE TABLE` queries, by using Python type hints. Some of these features require recent-ish versions of Python (i.e. >= 3.10), however we have already decided that backward compatibility is not a priority for the pipeline.
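To make this more concrete, here is a minimal sketch of how such a decorator-plus-protocol pattern can fit together and how a `CREATE TABLE` statement can be derived from type hints. Apart from the `table_model` and `TableModelProtocol` names, everything here (the `make_create_query` helper, the type mapping, the example table) is illustrative and not taken from the actual implementation:

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class TableModelProtocol(Protocol):
    # Anything decorated with @table_model is expected to expose these
    __table_name__: str
    __table_index__: tuple


def table_model(table_name: str, table_index: tuple):
    """Mark a dataclass as a table that should end up in the database."""

    def decorator(cls):
        cls.__table_name__ = table_name
        cls.__table_index__ = table_index
        return dataclass(cls)

    return decorator


# Very rough mapping from python type hints to ClickHouse column types
TYPE_MAP = {
    str: "String",
    int: "Int64",
    float: "Float64",
    datetime: "DateTime64(3)",
    Optional[str]: "Nullable(String)",
}


def make_create_query(model: type) -> str:
    # Derive the CREATE TABLE statement from the dataclass type hints
    columns = ", ".join(
        f"{f.name} {TYPE_MAP.get(f.type, 'String')}" for f in fields(model)
    )
    order_by = ", ".join(model.__table_index__)
    return (
        f"CREATE TABLE IF NOT EXISTS {model.__table_name__} ({columns}) "
        f"ENGINE = ReplacingMergeTree ORDER BY ({order_by})"
    )


@table_model(table_name="obs_example", table_index=("measurement_uid", "hostname"))
class ExampleObservationRow:
    measurement_uid: str
    hostname: str
    measurement_start_time: datetime


row = ExampleObservationRow("example-uid", "example.com", datetime(2024, 1, 1))
assert isinstance(row, TableModelProtocol)
print(make_create_query(ExampleObservationRow))
```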
We might however need some kind of compatibility layer if some of these functions need to be used by oonidata (though we might also drop older python support there too at some point if it gets too complex to manage).
There are still several parts which need to be refactored, but I suggest doing that later; they are marked as TODO(art).
This also adds support for making use of buffer tables, which gives a significant performance boost in a parallelized
workflow and avoids the issue outlined here: #68
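For illustration, a ClickHouse buffer table can be paired with its destination table roughly like this (a sketch only: the table names, thresholds, and the use of `clickhouse_driver` are assumptions, not necessarily what the pipeline does):

```python
from clickhouse_driver import Client

client = Client(host="localhost")

# Writers insert into the Buffer table; ClickHouse flushes rows to the
# destination table in the background once the time/row/byte thresholds
# are hit, so many parallel workers don't hammer the MergeTree with a
# flood of tiny inserts.
client.execute(
    """
    CREATE TABLE IF NOT EXISTS buffer_obs_example AS obs_example
    ENGINE = Buffer(
        currentDatabase(), obs_example,
        16,                  -- num_layers
        10, 100,             -- min_time, max_time (seconds)
        10000, 1000000,      -- min_rows, max_rows
        10000000, 100000000  -- min_bytes, max_bytes
    )
    """
)

# Parallel workers write to the buffer table instead of the real one.
client.execute(
    "INSERT INTO buffer_obs_example (measurement_uid, hostname) VALUES",
    [("example-uid", "example.com")],
)
```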
Moreover, we came up with a better pattern to wait for table buffers to be flushed before starting the dependent workflow. This can be implemented using Temporal primitives.
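One possible shape for this, sketched with the Temporal Python SDK (the activity, workflow, and table names here are hypothetical and only meant to show the primitive, not the pipeline's actual code):

```python
import asyncio
from datetime import timedelta

from temporalio import activity, workflow


@activity.defn
async def buffer_is_flushed(table_name: str) -> bool:
    # Hypothetical check, e.g. comparing row counts between the buffer
    # table and its destination table to see if the flush completed.
    return True


@workflow.defn
class DependentAnalysisWorkflow:
    @workflow.run
    async def run(self, table_name: str) -> None:
        # Keep checking until the buffer table reports as flushed; the
        # sleep is a durable Temporal timer, so the wait survives worker
        # restarts.
        while not await workflow.execute_activity(
            buffer_is_flushed,
            table_name,
            start_to_close_timeout=timedelta(minutes=1),
        ):
            await asyncio.sleep(30)
        # ...start the dependent analysis steps here...
```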
We also enrich the columns with new processing-time metadata for performance monitoring.
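As a trivial illustration of what recording this looks like (the field and function names here are hypothetical, not the actual columns added by the PR):

```python
import time
from dataclasses import dataclass


@dataclass
class ProcessingMeta:
    # Hypothetical column recording how long processing took, so slow
    # measurements can be spotted when monitoring the pipeline.
    measurement_processing_time_s: float


def process_measurement(raw_measurement: bytes) -> ProcessingMeta:
    t0 = time.monotonic()
    # ...parse the measurement and generate the observation rows here...
    return ProcessingMeta(measurement_processing_time_s=time.monotonic() - t0)
```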