test(ingest/bigquery): Add performance testing framework for bigquery usage #7690

asikowitz · 2023-03-24T21:15:53Z

Performance testing framework that contains:

- A simplified model of a data warehouse's metadata (tables, views, queries, etc.)
- A function for generating examples of such a model at different sizes
- A function for converting generic Querys into BigQuery AuditEvents
- An unfinished test of bigquery usage performance
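
As a rough illustration of the first two items, a simplified warehouse model and a size-parameterized generator could look like the sketch below. Every name here (`Table`, `Query`, `Warehouse`, `generate_warehouse`) is hypothetical and chosen for illustration; the classes in the PR may be shaped differently, and the conversion into BigQuery AuditEvents is left out on purpose.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Table:
    # Hypothetical stand-in for a warehouse table
    name: str
    columns: List[str]


@dataclass(frozen=True)
class Query:
    # Hypothetical stand-in for a generic query against the warehouse
    text: str
    tables_accessed: List[Table]
    user: str


@dataclass
class Warehouse:
    tables: List[Table]
    queries: List[Query]


def generate_warehouse(num_tables: int, num_queries: int, seed: int = 0) -> Warehouse:
    """Build a synthetic warehouse of the requested size (num_tables must be >= 1).

    A seeded RNG keeps runs reproducible, so timings taken at the same
    size are comparable between runs.
    """
    rng = random.Random(seed)
    tables = [
        Table(name=f"table_{i}", columns=[f"col_{j}" for j in range(rng.randint(1, 5))])
        for i in range(num_tables)
    ]
    queries = []
    for _ in range(num_queries):
        # Each query reads between one and three distinct tables
        accessed = rng.sample(tables, k=rng.randint(1, min(3, num_tables)))
        queries.append(
            Query(
                text="SELECT * FROM " + ", ".join(t.name for t in accessed),
                tables_accessed=accessed,
                user=f"user_{rng.randint(0, 9)}",
            )
        )
    return Warehouse(tables=tables, queries=queries)
```

Scaling `num_tables` and `num_queries` independently makes it possible to see which dimension dominates run time.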

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub


codecov-commenter · 2023-03-24T21:31:09Z

Codecov Report

Patch coverage: 93.33% and project coverage change: -7.81 ⚠️

Comparison is base (301c861) 74.39% compared to head (7d7bb67) 66.59%.


Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7690      +/-   ##
==========================================
- Coverage   74.39%   66.59%   -7.81%     
==========================================
  Files         353      353              
  Lines       35386    35433      +47     
==========================================
- Hits        26327    23596    -2731     
- Misses       9059    11837    +2778

Flag	Coverage Δ
pytest-testIntegration	`?`
pytest-testIntegrationBatch1	`36.50% <40.00%> (+0.03%)`	⬆️
pytest-testQuick	`63.60% <93.33%> (+0.05%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
...hub/ingestion/source/bigquery_v2/bigquery_audit.py	`54.88% <87.50%> (+0.83%)`	⬆️
...ub/ingestion/source/bigquery_v2/bigquery_report.py	`100.00% <100.00%> (+4.76%)`	⬆️

... and 87 files with indirect coverage changes


hsheth2 · 2023-03-27T22:45:11Z

metadata-ingestion/tests/performance/test_bigquery_usage.py

+
+@pytest.fixture
+def report_log_level_info():
+    yield from set_log_level(report_logger, logging.INFO)


Why is this stuff necessary?

asikowitz: I didn't see any error logs, which made debugging these tests pretty annoying. Is this not usually the case, and am I doing something wrong? I also want to see INFO logs from the report class, so that when you run these tests you can see how long each stage took.

hsheth2: All this stuff is built into pytest: https://docs.pytest.org/en/7.1.x/how-to/logging.html

asikowitz: Ah nice. I don't think it's unreasonable to set the log level for a specific logger in a specific test, e.g. so you can set the overall log level to DEBUG but INFO for a certain logger. What do you think about removing the root logger changes (although how is it possible that we don't show error logs by default?) but keeping the call to set the report log level? I also don't want someone running these tests and missing half the reporting they're supposed to show because they forgot to call pytest with --log-level INFO or something.
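
For readers following the thread, here is a minimal sketch of how a generator helper with the quoted signature could set one logger's level and restore it afterwards. The body is an assumption, not the code from the PR; only the signature and the `yield from set_log_level(report_logger, logging.INFO)` fixture line come from the diff.

```python
import logging
from typing import Generator


def set_log_level(logger: logging.Logger, level: int) -> Generator[None, None, None]:
    # Assumed implementation: remember the old level, apply the new one,
    # and restore the old level when the generator is resumed or closed.
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(previous)
```

A pytest fixture can delegate to this with `yield from`, as the quoted diff does, so the level change lasts only for the duration of the test. pytest's built-in `caplog.at_level(level, logger=...)` gives a similar per-logger scope without a custom helper.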

hsheth2 · 2023-03-28T04:01:02Z

metadata-ingestion/tests/performance/test_bigquery_usage.py

+
+    report.set_project_state("All", "Event Ingestion")
+    with PerfTimer() as timer:
+        assert usage_extractor, table_refs  # TODO: Replace with call to usage extractor


What's going on here? Don't we need to call the usage_extractor to actually test its performance?

asikowitz: Yeah, but I wrote this to work with the cross-project usage implementation that uses FileBackedDict, or at least the in-memory cross-project usage implementation that I have living in a branch. I don't think the current bigquery code is set up to take in a list of events and table_refs the way those two are, so I'm just leaving the call out here. I can add it in those other branches once these changes are merged.
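
To illustrate what the timed block would eventually need to do, the sketch below times a stage with a small stand-in timer. `StageTimer` and `time_stage` are hypothetical names written for this example; they are not DataHub's `PerfTimer` or the usage extractor. The point of the sketch is that a lazy generator has to be fully consumed inside the `with` block, otherwise only the cost of creating it is measured.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


class StageTimer:
    """Hypothetical stand-in for a context-manager timer (not DataHub's PerfTimer)."""

    def __init__(self) -> None:
        self._start: Optional[float] = None
        self._end: Optional[float] = None

    def __enter__(self) -> "StageTimer":
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc_info: object) -> None:
        self._end = time.perf_counter()

    def elapsed_seconds(self) -> float:
        if self._start is None:
            return 0.0
        # While the block is still running, measure up to "now"
        end = self._end if self._end is not None else time.perf_counter()
        return end - self._start


def time_stage(work: Callable[[], Iterable[object]]) -> Tuple[int, float]:
    """Run `work`, drain its output inside the timer, and return (count, seconds)."""
    with StageTimer() as timer:
        results = list(work())  # force lazy generators to do their work here
    return len(results), timer.elapsed_seconds()
```

For example, `time_stage(lambda: (i * i for i in range(1000)))` returns the number of items produced together with the wall-clock time spent producing them.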

asikowitz · 2023-03-28T11:35:22Z

Added changes to metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_report.py that I forgot to add before

treff7es · 2023-03-28T13:16:08Z

metadata-ingestion/tests/performance/test_bigquery_usage.py

+from datahub.utilities.perf_timer import PerfTimer
+
+
+def set_log_level(logger: logging.Logger, level: int) -> Generator[None, None, None]:


Why is this needed?

asikowitz: See comment above.


hsheth2 · 2023-03-29T18:01:53Z

metadata-ingestion/tests/performance/README.md

We'll probably need to add a rule to exclude this file from our docs site. That can go in the generateDocsDir script in docs-website.

… usage (#7690)
- Creates metadata-ingestion/tests/performance directory
- Excludes metadata-ingestion/tests from docs generation
- Updates bigquery reporting around project state

test(ingest/bigquery): Add performance testing framework for bigquery…

503fb16

… usage

github-actions bot added the ingestion label (PR or Issue related to the ingestion of metadata) Mar 24, 2023

asikowitz requested review from hsheth2 and jjoyce0510 March 24, 2023 21:16

vercel bot deployed to Preview March 24, 2023 21:27

lint

8c976e7

vercel bot deployed to Preview March 27, 2023 22:27

hsheth2 reviewed Mar 28, 2023

update project state reporting

9d0194c

vercel bot deployed to Preview March 28, 2023 11:46

treff7es reviewed Mar 28, 2023

fix bugs; simplify logger logic; revert some report changes; fix ci; …

6a4de92

…add readme

asikowitz requested a review from hsheth2 March 29, 2023 17:25

vercel bot had a problem deploying to Preview March 29, 2023 17:32 Failure

hsheth2 approved these changes Mar 29, 2023

exclude metadata-ingestion/tests/ from docs generation

7d7bb67

vercel bot deployed to Preview March 29, 2023 18:56

asikowitz merged commit 54a3727 into datahub-project:master Mar 29, 2023

asikowitz deleted the ingestion-performance-testing-bq branch March 29, 2023 21:13
