Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize performance of table writers and refactor table model #74

Merged
merged 24 commits into from
Jun 27, 2024

Conversation

hellais
Copy link
Member

@hellais hellais commented May 3, 2024

This is an important refactor of the table models.

It moves the ProbeMeta and MeasurementMeta into nested composed classes, which is nicer because you don't get lost in the complicated class inheritance, but most importantly it significantly boosts performance because we don't have to make copies of each MeasurementMeta to pass it around.

I also introduced to better patterns for handling the TableModels. Basically you decorate a table that should end up inside of the database via the table_model decorator and then when it's used type safety is enforced by the TableModelProtocol.

Thanks to this refactoring it's also possible to improve the way in which we handle both the buffering and serialization of writes, but also the creation of the CREATE table queries by using python type hints.

Some of these features require recentish versions of python (i.e. >=3.10), however we have already decided that backward compatibility is not a priority for the pipeline.

We might however need some kind of compatibility layer if some of these functions need to be used by oonidata (though we might also drop older python support there too at some point if it gets too complex to manage).

There are still several parts which need to be refactored, but I suggest doing that later and they are marked as TODO(art).

This also adds support for making use of buffer tables, which has a significant performance boost in a parallalized
workflow avoiding the issue outlined in here: #68

Moreover, we come up with better pattern to wait for table buffers being flushed before starting the dependant workflow. this can be implemented using primitives of temporal.

We also enrich columns with the new processing time metadata for performance monitoring.

Copy link

codecov bot commented May 3, 2024

Codecov Report

Attention: Patch coverage is 94.72296% with 20 lines in your changes missing coverage. Please review.

Project coverage is 84.06%. Comparing base (74a9b76) to head (730b8f5).

Files Patch % Lines
oonidata/src/oonidata/models/base.py 80.76% 5 Missing ⚠️
oonipipeline/src/oonipipeline/db/create_tables.py 93.93% 4 Missing ⚠️
...pipeline/src/oonipipeline/analysis/web_analysis.py 77.77% 2 Missing ⚠️
oonipipeline/src/oonipipeline/cli/commands.py 85.71% 2 Missing ⚠️
...onipipeline/analysis/website_experiment_results.py 66.66% 1 Missing ⚠️
oonipipeline/src/oonipipeline/db/connections.py 98.41% 1 Missing ⚠️
...c/oonipipeline/temporal/activities/observations.py 85.71% 1 Missing ⚠️
...oonipipeline/transforms/measurement_transformer.py 93.75% 1 Missing ⚠️
...nipipeline/transforms/nettests/web_connectivity.py 92.30% 1 Missing ⚠️
...peline/src/oonipipeline/transforms/observations.py 85.71% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #74      +/-   ##
==========================================
+ Coverage   83.62%   84.06%   +0.43%     
==========================================
  Files          74       74              
  Lines        5893     6086     +193     
==========================================
+ Hits         4928     5116     +188     
- Misses        965      970       +5     
Flag Coverage Δ
oonidata 77.41% <90.00%> (-0.03%) ⬇️
oonipipeline 87.10% <95.44%> (+0.55%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hellais hellais changed the title Table model refactor rb Optimize performance of table writers and refactor table model Jun 20, 2024
@hellais hellais requested a review from DecFox June 27, 2024 05:09
@hellais hellais marked this pull request as ready for review June 27, 2024 05:09
hellais and others added 3 commits June 27, 2024 09:45
…able-model-refactor-rb

* 'table-model-refactor-rb' of github.com:ooni/data:
  Update oonidata/src/oonidata/models/experiment_result.py
@hellais hellais requested a review from DecFox June 27, 2024 09:08
Copy link
Contributor

@DecFox DecFox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@hellais hellais merged commit c797c26 into main Jun 27, 2024
9 checks passed
@hellais hellais deleted the table-model-refactor-rb branch June 27, 2024 13:02
hellais added a commit that referenced this pull request Aug 30, 2024
* v5.0.0-rc.0:
  Add simple redirector
  Tidy up the layout of the analysis viewer
  Add an observations viewer
  Get rid of all the dataviz that isn't the analysis visualizer
  Release/5.0.0 alpha3 (#81)
  Offset analysis schedule by 6 hours
  Add support for temporal cloud (#79)
  fix: support for sorting network_events using transaction_id (#51)
  Optimize performance of table writers and refactor table model (#74)
  Improvements related to deployment (#69)
  Add .codecov file
  Update jsonl sync example
  Setup workflow to publish ooni data docs (#73)
  OONI Pipeline v5 alpha (#64)
  Fix codecov (#62)
  Temporal workflows (#61)
  OONI Data, OONI Pipeline split (#60)
  Add support for caching netinfodb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants