-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test(ingest/bigquery): Add performance testing framework for bigquery usage #7690
test(ingest/bigquery): Add performance testing framework for bigquery usage #7690
Conversation
Codecov ReportPatch coverage:
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more Additional details and impacted files@@ Coverage Diff @@
## master #7690 +/- ##
==========================================
- Coverage 74.39% 66.59% -7.81%
==========================================
Files 353 353
Lines 35386 35433 +47
==========================================
- Hits 26327 23596 -2731
- Misses 9059 11837 +2778
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 87 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
|
||
@pytest.fixture | ||
def report_log_level_info(): | ||
yield from set_log_level(report_logger, logging.INFO) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this stuff necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see any error logs which made debugging these tests pretty annoying. Is this not usually the case and I'm doing something wrong? I also want to see INFO logs from the report class, so that when you run these tests, you can see how long each stage took
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this stuff is built into pytest: https://docs.pytest.org/en/7.1.x/how-to/logging.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice. I don't think it's unreasonable to set the log level for a specific logger for a specific test, so e.g. you can set overall log level to DEBUG but INFO for a certain logger. What do you think about removing the root logger changes (although, how is it possible that by default we don't show error logs?) but keeping the call to set the report log level. I also don't want someone running these tests and not seeing half the reporting it's supposed to show because they forgot to call pytest with --log-level INFO
or something
|
||
report.set_project_state("All", "Event Ingestion") | ||
with PerfTimer() as timer: | ||
assert usage_extractor, table_refs # TODO: Replace with call to usage extractor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's going on here? don't we need to call the usage_extractor
to actually test it's performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but I wrote this to work with the cross-project usage implementation that uses FileBackedDict, or at least the in-memory cross-project usage implementation that I have living in a branch. I don't think the current bigquery code is set up to take in a list of events and table_refs
like those 2 are, so just leaving the call out here. Then I can add it in in those other branches, once I merge these changes in
Added changes to metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_report.py that I forgot to add before |
from datahub.utilities.perf_timer import PerfTimer | ||
|
||
|
||
def set_log_level(logger: logging.Logger, level: int) -> Generator[None, None, None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll probably need to add a rule to exclude this file from our docs site - can add that in the generateDocsDir script in docs-website
… usage (#7690) - Creates metadata-ingestion/tests/performance directory - Excludes metadata-ingestion/tests from docs generation - Updates bigquery reporting around project state
Performance testing framework that contains:
Query
s into BigQueryAuditEvent
sChecklist