feat(ingest): add bigquery-queries source #10994
Conversation
Note: Review skipped. Auto reviews are disabled on this repository; check the settings in the CodeRabbit UI.

Walkthrough

The recent changes significantly enhance metadata ingestion processes for Google BigQuery and Snowflake.
Force-pushed from 505911c to 9e1b62b (Compare)
Actionable comments posted: 2
Outside diff range, codebase verification and nitpick comments (5)
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py (1)
138-142: Ensure email domain validation.

The email_domain field is used in various parts of the codebase, but there is no explicit validation to ensure it is in a valid format. Please add validation logic to ensure that the email domain, if provided, is valid.

Files to review:
- metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py
- Other related configuration and usage files where email_domain is used.

Analysis chain
Ensure email domain validation.
The email_domain field is optional and used for user display purposes. Ensure that the domain is valid if provided.

Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify that email domains are valid if provided.
# Test: Search for email domain usage. Expect: Valid domain format.
rg --type python -A 5 $'email_domain'
```
Length of output: 67
Script:
```bash
#!/bin/bash
# Description: Verify that email domains are valid if provided.
# Test: Search for email domain usage. Expect: Valid domain format.
rg --type py -A 5 $'email_domain'
```
Length of output: 22863
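For reference, a minimal sketch of what such a check could look like, assuming a pydantic v1 style validator like the rest of this config code; the regex and class name here are illustrative, not the repository's actual implementation.

```python
import re
from typing import Optional

from pydantic import BaseModel, validator

# Rough shape of a DNS domain such as "acme.com"; adjust as needed.
_DOMAIN_REGEX = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9.-]*[A-Za-z0-9])?\.[A-Za-z]{2,}$")


class UsageConfigSketch(BaseModel):
    # Hypothetical stand-in for the real config class; only the field under discussion is shown.
    email_domain: Optional[str] = None

    @validator("email_domain")
    def email_domain_is_well_formed(cls, v: Optional[str]) -> Optional[str]:
        # The field is optional, so only validate when a value is provided.
        if v is not None and not _DOMAIN_REGEX.match(v):
            raise ValueError(f"email_domain does not look like a valid domain: {v!r}")
        return v
```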
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py (3)
61-65: Deprecation warning for sharded_table_pattern.

The sharded_table_pattern field is deprecated. Ensure that users are aware of this and provide alternatives if necessary.

```diff
- sharded_table_pattern: str = Field(
-     deprecated=True,
-     default=_BIGQUERY_DEFAULT_SHARDED_TABLE_REGEX,
-     description="The regex pattern to match sharded tables and group as one table. This is a very low level config parameter, only change if you know what you are doing, ",
- )
+ sharded_table_pattern: str = Field(
+     deprecated=True,
+     default=_BIGQUERY_DEFAULT_SHARDED_TABLE_REGEX,
+     description="The regex pattern to match sharded tables and group as one table. This is a very low level config parameter, only change if you know what you are doing. Please use the new configuration options provided.",
+ )
```
232-244: Handle deprecated schema_pattern field.

The schema_pattern field is deprecated in favor of dataset_pattern. Ensure that users are aware of this and provide alternatives if necessary.

```diff
- if (
-     dataset_pattern == AllowDenyPattern.allow_all()
-     and schema_pattern != AllowDenyPattern.allow_all()
- ):
-     logging.warning(
-         "dataset_pattern is not set but schema_pattern is set, using schema_pattern as dataset_pattern. schema_pattern will be deprecated, please use dataset_pattern instead."
-     )
-     values["dataset_pattern"] = schema_pattern
-     dataset_pattern = schema_pattern
+ if (
+     dataset_pattern == AllowDenyPattern.allow_all()
+     and schema_pattern != AllowDenyPattern.allow_all()
+ ):
+     logging.warning(
+         "dataset_pattern is not set but schema_pattern is set, using schema_pattern as dataset_pattern. schema_pattern will be deprecated, please use dataset_pattern instead. Please update your configuration to use dataset_pattern."
+     )
+     values["dataset_pattern"] = schema_pattern
+     dataset_pattern = schema_pattern
```
140-145: Ensure secure handling of temporary credential files.

The create_credential_temp_file method creates a temporary file for credentials, but there is no evidence of secure deletion of these files after use. Please ensure that the temporary credential files are securely deleted after they are no longer needed.

- File: metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
- Lines: 140-145

Analysis chain

Ensure secure handling of temporary credential files.

The create_credential_temp_file method creates a temporary file for credentials. Ensure that the file is securely deleted after use.

Let's correct the rg command to properly search for Python files and check for secure deletion of temporary files created by the create_credential_temp_file method.

Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify secure handling of temporary credential files.
# Test: Search for secure deletion of temporary credential files. Expect: Secure deletion after use.
rg --type python -A 5 $'create_credential_temp_file'
```
Length of output: 82
Script:
```bash
#!/bin/bash
# Description: Verify secure handling of temporary credential files.
# Test: Search for secure deletion of temporary credential files. Expect: Secure deletion after use.
rg -t py -A 5 'create_credential_temp_file'
```
Length of output: 1545
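One way to make that guarantee is to scope the file's lifetime with a context manager so deletion happens even on error paths. This is an illustrative sketch; the function name and credential shape are assumptions, not the repository's create_credential_temp_file.

```python
import json
import os
import tempfile
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def temp_credential_file(credentials: Dict[str, str]) -> Iterator[str]:
    # mkstemp creates the file with owner-only permissions on POSIX systems.
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(credentials, f)
        yield path
    finally:
        # Best-effort deletion so the key material does not linger on disk.
        if os.path.exists(path):
            os.remove(path)
```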
metadata-ingestion/setup.py (1)
662-662: Potential Issue: Unused Entry Point

The bigquery-queries entry point is defined in setup.py and the BigQueryQueriesSource class is implemented in bigquery_queries.py, but there are no instances of this class being used elsewhere in the codebase.

- metadata-ingestion/setup.py
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_queries.py

Please review and ensure that the BigQueryQueriesSource class is utilized correctly within the codebase.

Analysis chain
LGTM! But verify the usage of the entry point in the codebase.
The addition of the bigquery-queries entry point is approved. However, ensure that the entry point is used correctly in the codebase.
Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify the usage of the `bigquery-queries` entry point.
# Test: Search for the entry point usage. Expect: Correct usage of the added entry point.
rg --type python -A 5 $'bigquery-queries'
```
Length of output: 71
Script:
```bash
#!/bin/bash
# Description: Verify the usage of the `bigquery-queries` entry point.
# Test: Search for the entry point usage. Expect: Correct usage of the added entry point.
rg --type py -A 5 'bigquery-queries'
```
Length of output: 1223
Script:
```bash
#!/bin/bash
# Description: Verify the usage of the BigQueryQueriesSource class in the codebase.
# Test: Search for the usage of BigQueryQueriesSource. Expect: Correct usage of the added entry point.
rg --type py -A 5 'BigQueryQueriesSource'
```
Length of output: 4378
Script:
```bash
#!/bin/bash
# Description: Verify the usage of the BigQueryQueriesSource class in the codebase.
# Test: Search for the usage of BigQueryQueriesSource. Expect: Correct usage of the added entry point.
rg --type py -A 5 'BigQueryQueriesSource' | grep -v 'bigquery_queries.py'
```
Length of output: 754
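For context, entry points like this are resolved at runtime by plugin name rather than by direct imports, so the absence of in-repo references to BigQueryQueriesSource is not necessarily a problem. Below is a rough sketch of how such a lookup works on Python 3.10+; the group string is an assumption, so check setup.py for the exact value.

```python
from importlib.metadata import entry_points

# Assumed plugin group name; confirm against the entry_points section of setup.py.
SOURCE_PLUGIN_GROUP = "datahub.ingestion.source.plugins"


def load_source_class(name: str):
    # Look up a source class (e.g. "bigquery-queries") by its registered entry point name.
    for ep in entry_points(group=SOURCE_PLUGIN_GROUP):
        if ep.name == name:
            return ep.load()
    raise KeyError(f"no ingestion source registered under {name!r}")
```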
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (22)
- metadata-ingestion/setup.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery.py (9 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_audit.py (5 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py (8 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_queries.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_report.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema.py (4 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema_gen.py (16 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_test_connection.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py (4 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py (7 hunks)
- metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_queries.py (1 hunks)
- metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py (2 hunks)
- metadata-ingestion/tests/integration/fivetran/test_fivetran.py (1 hunks)
- metadata-ingestion/tests/performance/bigquery/test_bigquery_usage.py (2 hunks)
- metadata-ingestion/tests/unit/test_bigquery_lineage.py (3 hunks)
- metadata-ingestion/tests/unit/test_bigquery_source.py (11 hunks)
- metadata-ingestion/tests/unit/test_bigquery_usage.py (21 hunks)
- metadata-ingestion/tests/unit/test_bigqueryv2_usage_source.py (2 hunks)
Files skipped from review due to trivial changes (1)
- metadata-ingestion/tests/integration/fivetran/test_fivetran.py
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
72-74: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling. (B904)
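For reference, the pattern B904 asks for looks like the following generic example (not the actual lines flagged in bigquery_config.py):

```python
def parse_port(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as err:
        # "from err" chains the original exception so the root cause stays visible in tracebacks.
        raise ValueError(f"invalid port value: {raw!r}") from err
```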
Gitleaks
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
109-109: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
Additional comments not posted (82)
metadata-ingestion/tests/performance/bigquery/test_bigquery_usage.py (2)
14-14: Import BigQueryIdentifierBuilder

The import of BigQueryIdentifierBuilder is necessary for the changes made in the run_test function.

52-52: Utilize BigQueryIdentifierBuilder for identifier generation

The usage_extractor now uses BigQueryIdentifierBuilder for generating identifiers, which improves the clarity and maintainability of the code.

metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_queries.py (7)
1-7: New imports

The imports are necessary for the new classes and methods introduced in this file.

34-40: Define BigQueryQueriesSourceReport class

The BigQueryQueriesSourceReport class extends SourceReport and includes additional fields specific to BigQuery queries.

43-48: Define BigQueryQueriesSourceConfig class

The BigQueryQueriesSourceConfig class extends multiple configuration classes and includes a connection configuration field.

51-72: Define BigQueryQueriesSource class

The BigQueryQueriesSource class initializes various components, including the BigQueryQueriesExtractor, and sets up the configuration and connection.

73-76: Implement create method

The create method parses the configuration dictionary and returns an instance of BigQueryQueriesSource.

78-83: Implement get_workunits_internal method

The get_workunits_internal method retrieves work units from the queries_extractor.

85-86: Implement get_report method

The get_report method returns the report generated during the ingestion process.

metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py (10)
1-2: New imports

The imports are necessary for the new classes and methods introduced in this file.

30-42: Define BigQueryIdentifierBuilder class

The BigQueryIdentifierBuilder class encapsulates methods for generating various URNs related to BigQuery datasets and users.

43-56: Implement gen_dataset_urn method

The gen_dataset_urn method generates a dataset URN based on the provided parameters.

57-63: Implement gen_dataset_urn_from_raw_ref method

The gen_dataset_urn_from_raw_ref method generates a dataset URN from a raw reference.

65-67: Implement gen_user_urn method

The gen_user_urn method generates a user URN based on the provided email.

68-70: Implement make_data_platform_urn method

The make_data_platform_urn method generates a data platform URN.

71-76: Implement make_dataplatform_instance_urn method

The make_dataplatform_instance_urn method generates a data platform instance URN based on the project ID and configuration.
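To make the shape of BigQueryIdentifierBuilder concrete, here is a simplified, illustrative sketch of a URN builder of this kind. The URN string formats follow DataHub's usual conventions, but the class itself is not the actual BigQueryIdentifierBuilder.

```python
from dataclasses import dataclass


@dataclass
class IdentifierBuilderSketch:
    env: str = "PROD"

    def gen_dataset_urn(self, project: str, dataset: str, table: str) -> str:
        # BigQuery tables are addressed as project.dataset.table on the bigquery platform.
        name = f"{project}.{dataset}.{table}"
        return f"urn:li:dataset:(urn:li:dataPlatform:bigquery,{name},{self.env})"

    def gen_user_urn(self, email: str) -> str:
        # Drop the domain in this sketch so the same person maps to a single corpuser URN.
        username = email.split("@", 1)[0]
        return f"urn:li:corpuser:{username}"
```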
79-85: Define BigQueryFilter class

The BigQueryFilter class handles filtering logic for tables based on the provided configuration.

86-98: Implement is_allowed method

The is_allowed method determines if a table is allowed based on project IDs and dataset patterns.

100-103: Implement is_project_allowed method

The is_project_allowed method determines if a project is allowed based on the provided configuration.

metadata-ingestion/tests/unit/test_bigquery_lineage.py (2)
111-111: LGTM! Verify the usage of BigQueryIdentifierBuilder.

The change to use BigQueryIdentifierBuilder enhances readability and maintainability. Ensure that BigQueryIdentifierBuilder is correctly utilized across the codebase.

12-12: LGTM! Verify the usage of BigQueryIdentifierBuilder.

The change to use BigQueryIdentifierBuilder enhances readability and maintainability. Ensure that BigQueryIdentifierBuilder is correctly utilized across the codebase. Also applies to: 86-86

Verification successful

Verified: Consistent usage of BigQueryIdentifierBuilder

The BigQueryIdentifierBuilder is used consistently across multiple files in the codebase, indicating that the change is well-integrated and should not cause any issues.
metadata-ingestion/tests/unit/test_bigquery_usage.py
metadata-ingestion/tests/unit/test_bigqueryv2_usage_source.py
metadata-ingestion/tests/unit/test_bigquery_lineage.py
metadata-ingestion/tests/performance/bigquery/test_bigquery_usage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema_gen.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_test_connection.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_queries.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery.py
Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify the usage of `BigQueryIdentifierBuilder` across the codebase.
# Test: Search for the usage of `BigQueryIdentifierBuilder`. Expect: Consistent usage across the codebase.
rg --type python -A 5 $'BigQueryIdentifierBuilder'
```
Length of output: 80
Script:
```bash
#!/bin/bash
# Description: Verify the usage of `BigQueryIdentifierBuilder` across the codebase.
# Test: Search for the usage of `BigQueryIdentifierBuilder`. Expect: Consistent usage across the codebase.
rg --type py -A 5 $'BigQueryIdentifierBuilder'
```
Length of output: 18757
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_report.py (2)
34-35: LGTM!

The renaming of performance timers enhances readability and understanding of their purpose.

173-174: LGTM!

The addition of the sql_aggregator attribute enhances the reporting features related to SQL data ingestion.

metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_test_connection.py (2)
15-15: LGTM!

The change to use BigQueryIdentifierBuilder enhances the identifier handling mechanism. Also applies to: 138-138

162-162: LGTM!

The change to use BigQueryIdentifierBuilder enhances the identifier handling mechanism.

metadata-ingestion/tests/unit/test_bigqueryv2_usage_source.py (1)
121-126: Improved readability and structure.

The changes improve the readability and structure by defining the report variable separately before it is passed as an argument. This ensures that both the extractor and the identifier builder share the same report instance.

metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery.py (2)

123-124: Improved modularity and separation of concerns.

The changes improve modularity and separation of concerns by instantiating BigQueryFilter and BigQueryIdentifierBuilder in the constructor.

234-238: Improved efficiency and clarity in project retrieval.

The changes improve efficiency and clarity by using the new get_projects function, which integrates filtering logic into the project retrieval process.

metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py (7)
1-3: Ensure proper import ordering and usage.

The imports are correctly ordered and necessary for the functionality.

78-88: Ensure proper configuration handling.

The BigQueryQueriesExtractorConfig class handles various configurations, including time windows and local paths for audit logs. Ensure that all configurations are properly documented and validated.

110-122: Well-structured class initialization.

The BigQueryQueriesExtractor class initialization is well-structured, ensuring all necessary components are properly instantiated.

186-220: Efficient work unit generation.

The get_workunits_internal method efficiently generates metadata work units by leveraging cached audit logs and fetching query logs as needed.

221-260: Robust query log fetching.

The fetch_query_log method robustly fetches query logs from BigQuery, handling errors and logging progress.

261-282: Accurate audit log row parsing.

The _parse_audit_log_row method accurately parses audit log rows into ObservedQuery or PreparsedQuery objects, ensuring proper handling of timestamps and user information.

285-323: Comprehensive query log query building.

The _build_enriched_query_log_query function comprehensively builds query log queries, filtering by creation time and excluding information schema queries.

metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py (1)
143-145: Clarify the behavior of apply_view_usage_to_tables.

The apply_view_usage_to_tables field determines whether usage statistics from views should be applied to their base tables. Clarify if this behavior is consistent across different use cases.

Verification successful

Consistent behavior of apply_view_usage_to_tables confirmed.

The apply_view_usage_to_tables field is used across different configurations and tests, indicating consistent behavior. It determines whether usage statistics from views should be applied to their base tables, and this behavior is documented and tested in both Snowflake and BigQuery configurations.

- Files and lines:
  - metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py: Lines 143-145
  - metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py: Similar configuration for BigQuery
- Tests and usages:
  - metadata-ingestion/tests/unit/test_bigquery_usage.py
  - metadata-ingestion/tests/performance/bigquery/test_bigquery_usage.py
  - metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
  - metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_usage_v2.py
Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify the behavior of `apply_view_usage_to_tables`.
# Test: Search for the usage of `apply_view_usage_to_tables`. Expect: Consistent behavior across use cases.
rg --type python -A 5 $'apply_view_usage_to_tables'
```
Length of output: 81
Script:
```bash
#!/bin/bash
# Description: Verify the behavior of `apply_view_usage_to_tables`.
# Test: Search for the usage of `apply_view_usage_to_tables`. Expect: Consistent behavior across use cases.
rg --type-add 'py:*.py' -A 5 'apply_view_usage_to_tables'
```
Length of output: 4777
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_queries.py (1)
222-224: Enhance logging for better observability.

The logging statement provides progress updates every 1000 queries. Ensure that this granularity is appropriate for the expected query volume.
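For reference, this kind of periodic progress logging usually reduces to a modulo check inside the fetch loop; a generic sketch with the interval pulled out as a tunable constant (not the exact code under review):

```python
import logging

logger = logging.getLogger(__name__)

PROGRESS_LOG_INTERVAL = 1000  # tune to the expected query volume


def process_queries(rows):
    for i, row in enumerate(rows, start=1):
        if i % PROGRESS_LOG_INTERVAL == 0:
            logger.info("Processed %d query log rows so far", i)
        yield row
```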
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py (4)
47-50: Ensure rate limiting is handled correctly.

The rate_limit field controls whether API requests should be rate-limited. Ensure that the rate limiting logic is implemented correctly.

Verification successful

Let's correct the rg command and re-run the script to search for the rate_limit logic in Python files.

Rate limiting logic is correctly implemented.

The rate_limit field in BigQueryBaseConfig is used to control API request rates, and the logic for rate limiting is correctly implemented using the RateLimiter class across various files.
bigquery_schema_gen.py
bigquery_schema.py
Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify rate limiting logic.
# Test: Search for rate limiting logic. Expect: Correct implementation of rate limiting.
rg --type python -A 5 $'rate_limit'
```
Length of output: 65
Script:
```bash
#!/bin/bash
# Description: Verify rate limiting logic.
# Test: Search for rate limiting logic. Expect: Correct implementation of rate limiting.
rg --type-add 'python:*.py' -A 5 'rate_limit'
```
Length of output: 17093
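As background for the verification above, a rate limiter of this kind can be as small as a context manager that spaces out successive calls. This is a generic sketch, not DataHub's actual RateLimiter class:

```python
import threading
import time


class SimpleRateLimiter:
    """Allow at most `max_calls` entries per `period` seconds, sleeping otherwise."""

    def __init__(self, max_calls: int, period: float = 1.0) -> None:
        self.min_interval = period / max_calls
        self._lock = threading.Lock()
        self._last_call = 0.0

    def __enter__(self) -> "SimpleRateLimiter":
        with self._lock:
            wait = self._last_call + self.min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            self._last_call = time.monotonic()
        return self

    def __exit__(self, *exc) -> None:
        pass


# Usage: wrap each API request so calls are spaced out when rate_limit is enabled.
# limiter = SimpleRateLimiter(max_calls=5)
# with limiter:
#     client.list_datasets(...)
```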
202-210: Ensure proper filtering of project IDs.

The project_ids field specifies project IDs to be ingested. Ensure that the filtering logic is correctly implemented.

Verification successful

Verified proper filtering of project IDs.

The project_ids field is correctly integrated into the filtering logic across multiple components, including unit tests and schema generation.

- Unit tests: test_bigquery_source.py
- Schema generation: bigquery_schema_gen.py
- Common filtering logic: common.py
- Configuration handling: bigquery_config.py
- Connection testing: bigquery_test_connection.py
Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify filtering logic for project IDs.
# Test: Search for filtering logic for project IDs. Expect: Correct implementation of filtering logic.
rg --type python -A 5 $'project_ids'
```
Length of output: 66
Script:
```bash
#!/bin/bash
# Description: Verify filtering logic for project IDs.
# Test: Search for filtering logic for project IDs. Expect: Correct implementation of filtering logic.
rg --type py -A 5 $'project_ids'
```
Length of output: 16739
303-304: Ensure proper validation of bigquery_audit_metadata_datasets.

The validate_bigquery_audit_metadata_datasets method validates the bigquery_audit_metadata_datasets field. Ensure that the validation logic is correctly implemented.

Verification successful

Validation logic for bigquery_audit_metadata_datasets is correctly implemented.

The validate_bigquery_audit_metadata_datasets method ensures that when use_exported_bigquery_audit_metadata is set to True, bigquery_audit_metadata_datasets must be a non-empty list. This validation logic is correctly implemented.

- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py: Lines 303-304
Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify validation logic for `bigquery_audit_metadata_datasets`.
# Test: Search for validation logic for `bigquery_audit_metadata_datasets`. Expect: Correct implementation of validation logic.
rg --type python -A 5 $'validate_bigquery_audit_metadata_datasets'
```
Length of output: 96
Script:
```bash
#!/bin/bash
# Description: Verify validation logic for `bigquery_audit_metadata_datasets`.
# Test: Search for validation logic for `bigquery_audit_metadata_datasets`. Expect: Correct implementation of validation logic.
rg --type py -A 5 'def validate_bigquery_audit_metadata_datasets'
```
Length of output: 789
Script:
```bash
#!/bin/bash
# Description: Retrieve the complete implementation of the `validate_bigquery_audit_metadata_datasets` method.
# Test: Search for the complete implementation of `validate_bigquery_audit_metadata_datasets`.
rg --type py -A 20 'def validate_bigquery_audit_metadata_datasets' metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
```
Length of output: 757
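Based on the behavior described above, the validator presumably resembles the following pydantic v1 sketch; the field names match the review, but the class and exact logic are illustrative rather than the repository's code:

```python
from typing import List, Optional

from pydantic import BaseModel, root_validator


class AuditMetadataConfigSketch(BaseModel):
    use_exported_bigquery_audit_metadata: bool = False
    bigquery_audit_metadata_datasets: Optional[List[str]] = None

    @root_validator(skip_on_failure=True)
    def validate_bigquery_audit_metadata_datasets(cls, values):
        # Exported audit metadata can only be read if at least one dataset is configured.
        if values.get("use_exported_bigquery_audit_metadata") and not values.get(
            "bigquery_audit_metadata_datasets"
        ):
            raise ValueError(
                "bigquery_audit_metadata_datasets must be a non-empty list when "
                "use_exported_bigquery_audit_metadata is True"
            )
        return values
```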
303-304: Consolidate filtering capabilities.

The BigQueryV2Config class now inherits from BigQueryFilterConfig, consolidating filtering capabilities. Ensure that the new filtering logic is correctly implemented.

metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_audit.py (3)
379-398: LGTM! But verify the usage of the method in the codebase.

The code changes improve readability and are approved. However, ensure that the method from_exported_bigquery_audit_metadata is used correctly in the codebase.

Verification successful
The usage of the
from_exported_bigquery_audit_metadata
method has been verified in the codebase.The method is used in the following files:
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py
The changes improve readability and are consistent with the updated method implementation.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the `from_exported_bigquery_audit_metadata` method. # Test: Search for the method usage. Expect: Correct usage of the updated method. rg --type python -A 5 $'from_exported_bigquery_audit_metadata'Length of output: 92
Script:
#!/bin/bash # Description: Verify the usage of the `from_exported_bigquery_audit_metadata` method. # Test: Search for the method usage. Expect: Correct usage of the updated method. rg --type py -A 5 $'from_exported_bigquery_audit_metadata'Length of output: 3598
289-309
: LGTM! But verify the usage of the method in the codebase.The code changes improve readability and are approved.
However, ensure that the method
from_entry
is used correctly in the codebase.
456-475
: LGTM! But verify the usage of the method in the codebase.The code changes improve readability and are approved.
However, ensure that the method
from_entry_v2
is used correctly in the codebase.Verification successful
The usage of the
from_entry_v2
method is correct and consistent with the recent changes.The method is invoked appropriately in both
lineage.py
andusage.py
files, ensuring that the readability improvements do not impact functionality.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the `from_entry_v2` method. # Test: Search for the method usage. Expect: Correct usage of the updated method. rg --type python -A 5 $'from_entry_v2'Length of output: 68
Script:
#!/bin/bash # Description: Verify the usage of the `from_entry_v2` method. # Test: Search for the method usage. Expect: Correct usage of the updated method. rg --type py -A 5 from_entry_v2Length of output: 1980
Script:
#!/bin/bash # Description: Extract relevant lines from `lineage.py` and `usage.py` to verify the usage of `from_entry_v2`. # Extract lines around the usage of `from_entry_v2` in `lineage.py` rg --type py -A 10 -B 5 'from_entry_v2' metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py # Extract lines around the usage of `from_entry_v2` in `usage.py` rg --type py -A 10 -B 5 'from_entry_v2' metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.pyLength of output: 1447
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema.py (2)
586-616
: LGTM! But verify the usage of the function in the codebase.The code changes improve error handling and robustness and are approved.
However, ensure that the function
query_project_list
is used correctly in the codebase.Verification successful
The function
query_project_list
is used correctly within the codebase.The usage aligns with the intended functionality and the changes made to the function.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the `query_project_list` function. # Test: Search for the function usage. Expect: Correct usage of the updated function. rg --type python -A 5 $'query_project_list'Length of output: 73
Script:
#!/bin/bash # Description: Verify the usage of the `query_project_list` function. # Test: Search for the function usage. Expect: Correct usage of the updated function. rg --type py -A 5 $'query_project_list'Length of output: 1277
Script:
#!/bin/bash # Description: Extract surrounding lines of the `query_project_list` usage to verify correctness. # Test: Search for the function usage and extract additional context. rg --type py -A 10 -B 10 $'query_project_list' metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema.pyLength of output: 1304
618-636
: LGTM! But verify the usage of the function in the codebase.The code changes improve modularity and reusability and are approved.
However, ensure that the function
get_projects
is used correctly in the codebase.Verification successful
The
get_projects
function is used correctly in the codebase.The function is integrated and tested in multiple places, indicating its correct usage.
metadata-ingestion/tests/unit/test_bigquery_source.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the `get_projects` function. # Test: Search for the function usage. Expect: Correct usage of the updated function. rg --type python -A 5 $'get_projects'Length of output: 67
Script:
#!/bin/bash # Description: Verify the usage of the `get_projects` function. # Test: Search for the function usage. Expect: Correct usage of the updated function. rg --type py -A 5 $'get_projects'Length of output: 15417
metadata-ingestion/setup.py (1)
354-354: LGTM! But verify the usage of the dependency in the codebase.

The addition of the bigquery-queries dependency is approved. However, ensure that the dependency is used correctly in the codebase.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py (3)
231-236
: Verify the correctness of the newidentifiers
parameter.Ensure that the
BigQueryIdentifierBuilder
class is correctly instantiated and used throughout the class methods.Verification successful
The new
identifiers
parameter is correctly instantiated and used.The
BigQueryIdentifierBuilder
class is properly instantiated and utilized across various files, including tests and source files, confirming the correctness of the changes.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_queries.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the instantiation and usage of the `BigQueryIdentifierBuilder` class. # Test: Search for the instantiation of `BigQueryIdentifierBuilder`. Expect: Correct instantiation. rg --type python -A 5 $'BigQueryIdentifierBuilder'Length of output: 80
Script:
#!/bin/bash # Description: Verify the instantiation and usage of the `BigQueryIdentifierBuilder` class. # Test: Search for the instantiation of `BigQueryIdentifierBuilder`. Expect: Correct instantiation. rg 'BigQueryIdentifierBuilder' -t py -A 5Length of output: 18753
433-433
: Verify the correctness of the new URN generation method.Ensure that
self.identifiers.gen_dataset_urn_from_raw_ref
correctly replaces the old callable function.Verification successful
The new URN generation method is correctly integrated and used.
The method
self.identifiers.gen_dataset_urn_from_raw_ref
is consistently used across the codebase and correctly replaces the old callable function.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
metadata-ingestion/tests/unit/test_bigquery_usage.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg --type python -A 5 $'gen_dataset_urn_from_raw_ref'Length of output: 83
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg 'gen_dataset_urn_from_raw_ref' --glob '*.py' -A 5Length of output: 7930
876-878
: Verify the correctness of the new URN generation method.Ensure that
self.identifiers.gen_dataset_urn_from_raw_ref
correctly replaces the old callable function.Verification successful
Let's correct the file type and re-run the search for the
gen_dataset_urn_from_raw_ref
method usage.
To ensure the correctness of the new URN generation method, let's verify the definition and usage of the
gen_dataset_urn
method incommon.py
.
Verification successful for the new URN generation method.
The method
self.identifiers.gen_dataset_urn_from_raw_ref
correctly replaces the old callable function and is used appropriately across the codebase.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg --type python -A 5 $'gen_dataset_urn_from_raw_ref'Length of output: 83
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg --type py -A 5 'gen_dataset_urn_from_raw_ref'Length of output: 7926
Script:
#!/bin/bash # Description: Verify the definition and usage of `gen_dataset_urn` method. # Test: Search for the definition and usage of `gen_dataset_urn`. Expect: Correct definition and usage. rg --type py -A 5 'def gen_dataset_urn'Length of output: 4331
metadata-ingestion/tests/unit/test_bigquery_usage.py (4)
171-171
: Verify the correctness of the new URN generation method.Ensure that
identifiers.gen_dataset_urn_from_raw_ref
is correctly used to generate URNs.Verification successful
The
gen_dataset_urn_from_raw_ref
method is defined and used in multiple places within the codebase. Here are the key points of interest:
Definition:
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py: def gen_dataset_urn_from_raw_ref(self, ref: BigQueryTableRef) -> str:
- This method generates a dataset URN from a raw reference.
Usages:
metadata-ingestion/tests/unit/test_bigquery_usage.py
(multiple lines)metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
The method appears to be correctly defined and used across the codebase. Based on the gathered information, the usage of
gen_dataset_urn_from_raw_ref
intest_bigquery_usage.py
seems consistent with its intended purpose.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method in `make_usage_workunit`. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg --type python -A 5 $'gen_dataset_urn_from_raw_ref'Length of output: 83
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method in `make_usage_workunit`. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg --type py -A 5 'gen_dataset_urn_from_raw_ref'Length of output: 7926
178-178
: Ensure the newresource_urn
parameter is correctly used.Verify that the
resource_urn
parameter is correctly passed and used in the function.Verification successful
The
resource_urn
parameter is correctly used in themake_operational_workunit
function.
- The
resource_urn
is correctly passed as theentityUrn
in theMetadataChangeProposalWrapper
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the `resource_urn` parameter in `make_operational_workunit`. # Test: Search for the usage of `resource_urn` in `make_operational_workunit`. Expect: Correct usage. rg --type python -A 5 $'def make_operational_workunit'Length of output: 86
Script:
#!/bin/bash # Description: Verify the usage of the `resource_urn` parameter in `make_operational_workunit`. # Test: Search for the usage of `resource_urn` in `make_operational_workunit`. Expect: Correct usage. rg --type py -A 5 $'def make_operational_workunit'Length of output: 594
214-217
: Ensure the newidentifiers
parameter is correctly used.Verify that the
identifiers
parameter is correctly passed and used in the function.Verification successful
The new
identifiers
parameter is correctly used in themake_zero_usage_workunit
function.
- The
identifiers
parameter is passed to themake_usage_workunit
function, ensuring proper usage.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the `identifiers` parameter in `make_zero_usage_workunit`. # Test: Search for the usage of `identifiers` in `make_zero_usage_workunit`. Expect: Correct usage. rg --type python -A 5 $'def make_zero_usage_workunit'Length of output: 85
Script:
#!/bin/bash # Description: Verify the usage of the `identifiers` parameter in `make_zero_usage_workunit`. # Test: Search for the usage of `identifiers` in `make_zero_usage_workunit`. Expect: Correct usage. rg --type py -A 5 $'def make_zero_usage_workunit'Length of output: 561
Script:
#!/bin/bash # Description: Verify the usage of the `identifiers` parameter in `make_zero_usage_workunit`. # Test: Search for the usage of `identifiers` in `make_zero_usage_workunit` and capture more lines of context. rg --type py -A 50 $'def make_zero_usage_workunit'Length of output: 4302
165-167
: Ensure the newidentifiers
parameter is correctly used.Verify that the
identifiers
parameter is correctly passed and used in the function.Verification successful
The new
identifiers
parameter is correctly used.The
identifiers
parameter is utilized to generate theentityUrn
within themake_usage_workunit
function, ensuring correct and consistent behavior.
metadata-ingestion/tests/unit/test_bigquery_usage.py
, lines 165-167Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the `identifiers` parameter in `make_usage_workunit`. # Test: Search for the usage of `identifiers` in `make_usage_workunit`. Expect: Correct usage. rg --type python -A 5 $'def make_usage_workunit'Length of output: 80
Script:
#!/bin/bash # Description: Verify the usage of the `identifiers` parameter in `make_usage_workunit`. # Test: Search for the usage of `identifiers` in `make_usage_workunit`. Expect: Correct usage. rg --type py -A 5 $'def make_usage_workunit'Length of output: 1853
Script:
#!/bin/bash # Description: Verify the usage of the `identifiers` parameter within the `make_usage_workunit` function. # Test: Search for the usage of `identifiers` within the `make_usage_workunit` function in `metadata-ingestion/tests/unit/test_bigquery_usage.py`. rg --type py -A 20 'def make_usage_workunit' metadata-ingestion/tests/unit/test_bigquery_usage.pyLength of output: 850
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py (6)
409-411
: Verify the correctness of the new URN generation method.Ensure that
self.identifiers.gen_dataset_urn_from_raw_ref
correctly replaces the old callable function.Verification successful
The new URN generation method
gen_dataset_urn_from_raw_ref
is correctly used and consistent across the codebase.The method is defined in
common.py
and is used appropriately inusage.py
and other relevant files.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py
: Method definition.metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
: Multiple usages.metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py
: Multiple usages.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg --type python -A 5 $'gen_dataset_urn_from_raw_ref'Length of output: 83
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg --type py -A 5 $'gen_dataset_urn_from_raw_ref'Length of output: 7926
542-542
: Verify the correctness of the new URN generation method.Ensure that
self.identifiers.gen_dataset_urn_from_raw_ref
correctly replaces the old callable function.Verification successful
Verify the correctness of the new URN generation method.
The method
self.identifiers.gen_dataset_urn_from_raw_ref
is used correctly and consistently across the codebase, confirming that it correctly replaces the old callable function.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py
metadata-ingestion/tests/unit/test_bigquery_usage.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg --type python -A 5 $'gen_dataset_urn_from_raw_ref'Length of output: 83
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg -A 5 'gen_dataset_urn_from_raw_ref'Length of output: 7916
Line range hint
1117-1117
: Verify the correctness of the new method usage.Ensure that
_get_parsed_bigquery_log_events
correctly replaces the oldget_exported_log_entries
method.Verification successful
Verification successful: The
_get_parsed_bigquery_log_events
method is used correctly in themetadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
file, replacing the oldget_exported_log_entries
method. The method is defined and utilized properly within the context.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `_get_parsed_bigquery_log_events` method. # Test: Search for the usage of `_get_parsed_bigquery_log_events`. Expect: Correct usage. rg --type python -A 5 $'_get_parsed_bigquery_log_events'Length of output: 86
Script:
#!/bin/bash # Description: Verify the usage of `_get_parsed_bigquery_log_events` method. # Test: Search for the usage of `_get_parsed_bigquery_log_events`. Expect: Correct usage. rg -A 5 '_get_parsed_bigquery_log_events'Length of output: 2177
718-719
: Verify the correctness of the new URN generation method.Ensure that
self.identifiers.gen_dataset_urn_from_raw_ref
correctly replaces the old callable function.Verification successful
The method
gen_dataset_urn_from_raw_ref
is correctly replacing the old callable function.The method is defined in
bigquery_v2/common.py
and is used consistently across the codebase, including in unit tests, indicating proper integration.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py
metadata-ingestion/tests/unit/test_bigquery_usage.py
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg --type python -A 5 $'gen_dataset_urn_from_raw_ref'Length of output: 83
Script:
#!/bin/bash # Description: Verify the usage of `gen_dataset_urn_from_raw_ref` method. # Test: Search for the usage of `gen_dataset_urn_from_raw_ref`. Expect: Correct usage. rg -A 5 $'gen_dataset_urn_from_raw_ref'Length of output: 7916
318-324
: Verify the correctness of the newidentifiers
parameter.Ensure that the
BigQueryIdentifierBuilder
class is correctly instantiated and used throughout the class methods.Verification successful
The new
identifiers
parameter is correctly instantiated and used throughout the class methods.The
BigQueryIdentifierBuilder
class is consistently instantiated and utilized across various files in the codebase.
- Files:
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_test_connection.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema_gen.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_queries.py
- Various test files
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the instantiation and usage of the `BigQueryIdentifierBuilder` class. # Test: Search for the instantiation of `BigQueryIdentifierBuilder`. Expect: Correct instantiation. rg --type python -A 5 $'BigQueryIdentifierBuilder'Length of output: 80
Script:
#!/bin/bash # Description: Verify the instantiation and usage of the `BigQueryIdentifierBuilder` class. # Test: Search for the instantiation of `BigQueryIdentifierBuilder`. Expect: Correct instantiation. rg --type py -A 5 'BigQueryIdentifierBuilder'Length of output: 18757
724-724
: Verify the correctness of the new user URN generation method.Ensure that
self.identifiers.gen_user_urn
correctly replaces the oldmake_user_urn
function.Verification successful
Verified the correctness of the new user URN generation method.
The
gen_user_urn
methods in bothunity/source.py
andbigquery_v2/common.py
correctly preprocess the input and call themake_user_urn
function to generate the user URN.
gen_user_urn
inmetadata-ingestion/src/datahub/ingestion/source/unity/source.py
gen_user_urn
inmetadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py
Both methods appropriately replace the old
make_user_urn
function by adding necessary preprocessing steps before callingmake_user_urn
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `gen_user_urn` method. # Test: Search for the usage of `gen_user_urn`. Expect: Correct usage. rg --type python -A 5 $'gen_user_urn'Length of output: 67
Script:
#!/bin/bash # Description: Find the implementation of `gen_user_urn` method in the codebase. # Search for the definition of `gen_user_urn` method ast-grep --lang python --pattern 'def gen_user_urn($_$) { $$$ }'Length of output: 518
Script:
#!/bin/bash # Description: Find the implementation of `make_user_urn` function in the codebase. # Search for the definition of `make_user_urn` function ast-grep --lang python --pattern 'def make_user_urn($_$) { $$$ }'Length of output: 520
Script:
#!/bin/bash # Description: Find the implementation of `gen_user_urn` method in the codebase. # Search for the definition of `gen_user_urn` method ast-grep --lang python --pattern 'def gen_user_urn($_$): $$$'Length of output: 63
Script:
#!/bin/bash # Description: Find the implementation of `make_user_urn` function in the codebase. # Search for the definition of `make_user_urn` function ast-grep --lang python --pattern 'def make_user_urn($_$): $$$'Length of output: 64
Script:
#!/bin/bash # Description: Find the implementation of `gen_user_urn` method in the codebase. # Search for the definition of `gen_user_urn` method rg 'def gen_user_urn' --type py -A 5Length of output: 1305
Script:
#!/bin/bash # Description: Find the implementation of `make_user_urn` function in the codebase. # Search for the definition of `make_user_urn` function rg 'def make_user_urn' --type py -A 5Length of output: 540
metadata-ingestion/tests/unit/test_bigquery_source.py (10)
183-187
: LGTM!The function correctly calls
get_projects
and asserts the expected output.
197-201
: LGTM!The function correctly calls
get_projects
and asserts the expected output for the overridden project ID pattern.
219-223
: LGTM!The function correctly calls
get_projects
and asserts the expected output.
236-239
: LGTM!The function correctly initializes
BigQueryV2Config
and asserts the expected output.
286-290
: LGTM!The function correctly calls
get_projects
and asserts the expected output.
322-326
: LGTM!The function correctly calls
get_projects
and asserts the expected output.
347-351
: LGTM!The function correctly calls
get_projects
and asserts the expected output.
366-370
: LGTM!The function correctly calls
get_projects
and asserts the expected output.
391-395
: LGTM!The function correctly calls
get_projects
and asserts the expected output.
411-415
: LGTM!The function correctly calls
get_projects
and asserts the expected output.metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema_gen.py (10)
160-168
: LGTM!The
__init__
method is correctly updated to useBigQueryIdentifierBuilder
, improving code clarity and maintainability.
204-205
: LGTM!The function correctly uses
BigQueryIdentifierBuilder
for generating URNs, improving code clarity and maintainability.
215-215
: LGTM!The function correctly uses
BigQueryIdentifierBuilder
for generating platform URNs, improving code clarity and maintainability.
223-223
: LGTM!The function correctly uses
BigQueryIdentifierBuilder
for generating platform URNs, improving code clarity and maintainability.
738-740
: LGTM!The function correctly uses
BigQueryIdentifierBuilder
for generating dataset URNs, improving code clarity and maintainability.
779-781
: LGTM!The function correctly uses
BigQueryIdentifierBuilder
for generating dataset URNs, improving code clarity and maintainability.
949-949
: LGTM!The function correctly uses
BigQueryIdentifierBuilder
for generating platform URNs, improving code clarity and maintainability.
Line range hint
994-1004
:
LGTM!The function correctly uses
BigQuerySchemaApi
for retrieving tables, improving code clarity and maintainability.
1023-1023
: LGTM!The function correctly uses
BigQuerySchemaApi
for listing tables, improving code clarity and maintainability.
280-281: LGTM!

The function correctly uses BigQuerySchemaApi for retrieving datasets, improving code clarity and maintainability.

metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py (2)
86-86: LGTM! The alias ObservedQuery improves code clarity.

The alias ObservedQuery for LoggedQuery enhances readability and simplifies type annotations.
475-494: LGTM! The add method now supports ObservedQuery.

The modification to accept ObservedQuery and handle it by invoking add_observed_query is consistent with the existing pattern. Verify the correctness of add_observed_query handling.
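The dispatch being reviewed here typically reduces to an isinstance check that routes each input type to its handler. A simplified sketch of that pattern (class and method bodies are placeholders, not the aggregator's real implementation):

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ObservedQuery:
    query: str


@dataclass
class PreparsedQuery:
    query: str


class AggregatorSketch:
    def add(self, item: Union[ObservedQuery, PreparsedQuery]) -> None:
        # Route each supported input type to its dedicated handler.
        if isinstance(item, ObservedQuery):
            self.add_observed_query(item)
        elif isinstance(item, PreparsedQuery):
            self.add_preparsed_query(item)
        else:
            raise TypeError(f"unsupported item type: {type(item).__name__}")

    def add_observed_query(self, item: ObservedQuery) -> None:
        ...

    def add_preparsed_query(self, item: PreparsedQuery) -> None:
        ...
```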
Resolved review thread: metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
Resolved review thread (outdated): metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_queries.py (2 hunks)
Files skipped from review due to trivial changes (1)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_queries.py
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
Actionable comments posted: 2
Outside diff range, codebase verification and nitpick comments (1)
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py (1)
12-16: Consider grouping related imports together.

Group related imports for better readability.

```diff
- from datahub.configuration.time_window_config import (
-     BaseTimeWindowConfig,
-     get_time_bucket,
- )
+ from datahub.configuration.time_window_config import BaseTimeWindowConfig, get_time_bucket
```
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py (1 hunks)
- metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py (5 hunks)
- metadata-ingestion/tests/unit/sql_parsing/test_sql_aggregator.py (1 hunks)
Additional comments not posted (20)
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py (14)
54-57: LGTM!

The BigQueryTableReference class definition looks good.

60-63: LGTM!

The DMLJobStatistics class definition looks good.

66-84: LGTM!

The BigQueryJob class definition looks good.

87-113: LGTM!

The BigQueryQueriesExtractorConfig class definition looks good.

116-126: LGTM!

The BigQueryQueriesExtractorReport class definition looks good.

128-174: Ensure proper handling of missing configurations.

The constructor initializes various attributes but does not handle potential missing configurations, which could lead to runtime errors. Ensure that all required configurations are provided and handle missing configurations appropriately.

177-180: LGTM!

The structured_report property looks good.

192-199: LGTM!

The is_temp_table method looks good.

201-209: LGTM!

The is_allowed_table method looks good.
211-257: Ensure proper handling of stale audit logs.

The get_workunits_internal method mentions a TODO comment about checking if the cached audit log is stale. This should be addressed to avoid potential issues with stale data; ensure that logic is added to check whether the cached audit log is stale, along the lines of the sketch below.
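One possible shape for that staleness check, assuming the cached audit log is a local file and the configured time window is available (illustrative only):

```python
import os
from datetime import datetime, timezone


def cached_audit_log_is_stale(path: str, window_end: datetime, max_age_hours: float = 24.0) -> bool:
    # Treat a missing file as stale so the caller falls back to fetching fresh logs.
    if not os.path.exists(path):
        return True
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    # Stale if the cache predates the requested window end by more than the allowed age.
    return (window_end - mtime).total_seconds() > max_age_hours * 3600
```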
259-287: LGTM!

The deduplicate_queries method looks good.

289-324: LGTM!

The fetch_query_log method looks good.

326-346: LGTM!

The _parse_audit_log_row method looks good.

349-389: LGTM!

The _build_enriched_query_log_query function looks good.

metadata-ingestion/tests/unit/sql_parsing/test_sql_aggregator.py (2)
Line range hint
1-13
: LGTM!The imports look good.
504-526
: LGTM!The
test_create_table_query_mcps
function looks good.metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py (4)
86-90
: LGTM!The
ObservedQuery
class definition looks good.
478-499
: LGTM!The
add
method implementation looks good.
Line range hint 642-686: LGTM! The `add_observed_query` method implementation looks good.
1158-1160: LGTM! The guard clause in the `_gen_lineage_for_downstream` method looks good.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
cdfb94f to fa2db0a
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- metadata-ingestion/setup.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery.py (9 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_audit.py (5 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py (8 hunks)
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
72-74: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling. (B904)
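For reference, a minimal example of the pattern B904 asks for, using generic names rather than the actual validator:

```python
import re

def compile_pattern(pattern: str):
    try:
        return re.compile(pattern)
    except re.error as e:
        # Chaining with "from e" keeps the original re.error in the traceback,
        # so the root cause is not masked by the new exception.
        raise ValueError(f"Invalid regex pattern: {pattern!r}") from e
```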
Gitleaks
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
109-109: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
Additional comments not posted (14)
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery.py (5)
123-124: Initialization of `BigQueryFilter` and `BigQueryIdentifierBuilder` looks good. The changes are consistent and correctly initialize the new objects.
234-238: Call to `get_projects` and handling of the projects list looks good. The changes correctly call `get_projects` and handle the projects list as intended.
Line range hint 15-15: Verify the removal of `gen_dataset_urn`. The method `gen_dataset_urn` has been removed. Ensure that this change is intentional and does not break any functionality.
Line range hint 15-15: Verify the removal of `gen_dataset_urn_from_raw_ref`. The method `gen_dataset_urn_from_raw_ref` has been removed. Ensure that this change is intentional and does not break any functionality.
Line range hint 15-15: Verify the removal of `_get_projects` and `_query_project_list`. The methods `_get_projects` and `_query_project_list` have been removed. Ensure that this change is intentional and does not break any functionality.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py (4)
47-75: New fields and validator method in `BigQueryBaseConfig` look good. The new fields are correctly defined and the validator method is correctly implemented.
Tools
Ruff
72-74: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling. (B904)
105-145: New fields and methods in `BigQueryCredential` look good. The new fields are correctly defined and the methods are correctly implemented.
Tools
Gitleaks
109-109: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
202-281: New fields and root validator method in `BigQueryFilterConfig` look good. The new fields are correctly defined and the root validator method is correctly implemented.
Line range hint 299-372: New fields and root validator method in `BigQueryV2Config` look good. The new fields are correctly defined and the root validator method is correctly implemented.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_audit.py (3)
289-309: Changes to parameter assignments in `from_entry` look good. The changes are correctly implemented and maintain the intended functionality.
379-398: Changes to parameter assignments in `from_exported_bigquery_audit_metadata` look good. The changes are correctly implemented and maintain the intended functionality.
456-475: Changes to parameter assignments in `from_entry_v2` look good. The changes are correctly implemented and maintain the intended functionality.
metadata-ingestion/setup.py (2)
354-354: Approved: Addition of `bigquery-queries` plugin. The addition of the `bigquery-queries` plugin with the appropriate dependencies (`sql_common`, `bigquery_common`, `sqlglot_lib`) is consistent with the goal of enhancing BigQuery query handling.
662-662: Approved: Addition of `bigquery-queries` entry point. The addition of the `bigquery-queries` entry point, linking to `BigQueryQueriesSource`, is consistent with the goal of enhancing BigQuery query handling.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py (1 hunks)
- metadata-ingestion/tests/unit/sql_parsing/aggregator_goldens/test_create_table_query_mcps.json (1 hunks)
- metadata-ingestion/tests/unit/sql_parsing/aggregator_goldens/test_lineage_via_temp_table_disordered_add.json (1 hunks)
- metadata-ingestion/tests/unit/sql_parsing/test_sql_aggregator.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
Additional comments not posted (4)
metadata-ingestion/tests/unit/sql_parsing/aggregator_goldens/test_create_table_query_mcps.json (1)
1-21: JSON structure is valid and well-formed. The JSON structure for the metadata change proposal (MCP) is correctly formatted and aligns with the expected schema for representing dataset operations. It includes necessary fields such as `entityType`, `entityUrn`, `changeType`, `aspectName`, and `aspect`.
metadata-ingestion/tests/unit/sql_parsing/aggregator_goldens/test_lineage_via_temp_table_disordered_add.json (1)
1-78: JSON structure is valid and well-formed. The JSON structure for the metadata change proposals (MCPs) is correctly formatted and aligns with the expected schema for representing lineage and query properties. It includes necessary fields such as `entityType`, `entityUrn`, `changeType`, `aspectName`, and `aspect`.
metadata-ingestion/tests/unit/sql_parsing/test_sql_aggregator.py (2)
505-526: New test function `test_create_table_query_mcps` is well-structured. The test function validates the `SqlParsingAggregator` for processing a SQL query that creates a table in a BigQuery environment. It correctly initializes the aggregator, adds an observed query, and checks the generated metadata against a golden file. The test enhances coverage for create table operations.
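Schematically, such a test follows the sketch below; the constructor arguments, `ObservedQuery` fields, and the golden-file helper are assumptions for illustration rather than the exact APIs used in the suite:

```python
from datahub.sql_parsing.sql_parsing_aggregator import ObservedQuery, SqlParsingAggregator

def test_create_table_query_mcps_sketch():
    # Hypothetical sketch: build the aggregator, feed it one observed query,
    # and diff the emitted MCPs against a checked-in golden file.
    aggregator = SqlParsingAggregator(platform="bigquery")  # arguments assumed
    aggregator.add(
        ObservedQuery(  # field names assumed
            query="create table `proj.dataset.foo` as select * from `proj.dataset.bar`",
            default_db="proj",
        )
    )
    mcps = list(aggregator.gen_metadata())  # method name assumed
    assert_matches_golden(  # hypothetical golden-file comparison helper
        mcps, "aggregator_goldens/test_create_table_query_mcps.json"
    )
```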
530-559: New test function `test_lineage_via_temp_table_disordered_add` is well-structured. The test function validates the `SqlParsingAggregator` for handling lineage through temporary tables in a Redshift environment. It correctly sets up schema information, adds observed queries, and checks the generated metadata against a golden file. The test enhances coverage for lineage tracking through temporary tables.
- update filtering in bigquery-queries source to use allow deny patterns
otherwise an exception is thrown during parsing, e.g. "list index out of range"
a569bca to d997bb6
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (25)
- metadata-ingestion/setup.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery.py (9 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_audit.py (5 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py (8 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_queries.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_report.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema.py (4 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema_gen.py (16 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_test_connection.py (3 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py (4 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py (1 hunks)
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py (7 hunks)
- metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py (2 hunks)
- metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_queries.py (2 hunks)
- metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py (5 hunks)
- metadata-ingestion/tests/integration/fivetran/test_fivetran.py (1 hunks)
- metadata-ingestion/tests/performance/bigquery/test_bigquery_usage.py (2 hunks)
- metadata-ingestion/tests/unit/sql_parsing/aggregator_goldens/test_create_table_query_mcps.json (1 hunks)
- metadata-ingestion/tests/unit/sql_parsing/aggregator_goldens/test_lineage_via_temp_table_disordered_add.json (1 hunks)
- metadata-ingestion/tests/unit/sql_parsing/test_sql_aggregator.py (1 hunks)
- metadata-ingestion/tests/unit/test_bigquery_lineage.py (3 hunks)
- metadata-ingestion/tests/unit/test_bigquery_source.py (11 hunks)
- metadata-ingestion/tests/unit/test_bigquery_usage.py (21 hunks)
- metadata-ingestion/tests/unit/test_bigqueryv2_usage_source.py (2 hunks)
Files skipped from review due to trivial changes (2)
- metadata-ingestion/tests/integration/fivetran/test_fivetran.py
- metadata-ingestion/tests/unit/sql_parsing/aggregator_goldens/test_lineage_via_temp_table_disordered_add.json
Files skipped from review as they are similar to previous changes (5)
- metadata-ingestion/setup.py
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_audit.py
- metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
- metadata-ingestion/tests/unit/sql_parsing/aggregator_goldens/test_create_table_query_mcps.json
- metadata-ingestion/tests/unit/sql_parsing/test_sql_aggregator.py
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
72-74: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling. (B904)
Gitleaks
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
109-109: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
Additional comments not posted (55)
metadata-ingestion/tests/performance/bigquery/test_bigquery_usage.py (1)
52-52
: Refactor approved: Use ofBigQueryIdentifierBuilder
.The refactor enhances maintainability and readability by encapsulating identifier logic within
BigQueryIdentifierBuilder
.Ensure that the integration of
BigQueryIdentifierBuilder
is consistent and correct throughout the codebase.Verification successful
Integration of
BigQueryIdentifierBuilder
is consistent and correct.The
BigQueryIdentifierBuilder
is widely used across multiple files, including both test and source files, indicating a well-integrated refactor.
- Verified files include:
metadata-ingestion/tests/unit/test_bigquery_usage.py
metadata-ingestion/tests/unit/test_bigquery_lineage.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py
- And several others.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the integration of `BigQueryIdentifierBuilder` across the codebase. # Test: Search for the class usage. Expect: Consistent and correct instantiation and usage. rg --type python -A 5 $'BigQueryIdentifierBuilder'Length of output: 80
Script:
#!/bin/bash # Description: Verify the integration of `BigQueryIdentifierBuilder` across the codebase. # Test: Search for the class usage. Expect: Consistent and correct instantiation and usage. rg --glob '*.py' -A 5 'BigQueryIdentifierBuilder'Length of output: 18036
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_queries.py (2)
34-40
: Class structure approved:BigQueryQueriesSourceReport
.The class is well-structured for encapsulating query extraction and schema performance metrics.
51-55
: Class design approved:BigQueryQueriesSource
.The class effectively manages the extraction process with clear separation of concerns.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/common.py (2)
30-55
: Class approved:BigQueryIdentifierBuilder
.The class provides a comprehensive approach to URN generation, ensuring consistent identifier handling.
79-103
: Class approved:BigQueryFilter
.The filtering logic is robust, effectively utilizing patterns and configurations to enforce rules.
Ensure that all necessary filtering criteria are covered and correctly implemented.
metadata-ingestion/tests/unit/test_bigquery_lineage.py (2)
86-86
: Use ofBigQueryIdentifierBuilder
improves maintainability.The introduction of
BigQueryIdentifierBuilder
encapsulates URN generation, enhancing clarity and maintainability.
111-111
: Consistent use ofBigQueryIdentifierBuilder
enhances code clarity.The change aligns with the goal of encapsulating identifier logic within a dedicated class.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_report.py (2)
34-35
: Renaming attributes improves clarity.Renaming
list_projects
andlist_datasets
tolist_projects_timer
andlist_datasets_timer
clarifies their purpose as performance timers.
174-174
: Addition ofsql_aggregator
enhances reporting capabilities.The
sql_aggregator
attribute is a valuable addition for SQL query aggregation and reporting.metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_test_connection.py (2)
138-138
: Use ofBigQueryIdentifierBuilder
improves maintainability.The introduction of
BigQueryIdentifierBuilder
encapsulates identifier logic, enhancing clarity and maintainability.
162-162
: Consistent use ofBigQueryIdentifierBuilder
enhances code clarity.The change aligns with the goal of encapsulating identifier logic within a dedicated class.
metadata-ingestion/tests/unit/test_bigqueryv2_usage_source.py (1)
121-126
: Improved readability and functionality in test setup.The refactoring to reuse the
report
instance and update theidentifiers
parameter withBigQueryIdentifierBuilder
enhances the readability and functionality of the test setup.metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery.py (2)
123-124
: Centralized filtering and identification logic.The introduction of
BigQueryFilter
andBigQueryIdentifierBuilder
centralizes the logic for filtering and identification, enhancing code clarity and maintainability.
234-238
: Simplified project retrieval logic.The direct invocation of
get_projects
streamlines the project retrieval process, reducing redundancy and improving code clarity.metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py (1)
138-146
: Enhanced configurability withSnowflakeUsageConfig
.The addition of
SnowflakeUsageConfig
with fieldsemail_domain
andapply_view_usage_to_tables
enhances configurability for Snowflake usage settings, allowing for more tailored tracking.metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_queries.py (2)
222-224
: Improved logging granularity inget_workunits_internal
.The logging now tracks the number of query log entries added to the SQL aggregator every 1000 entries, providing better visibility into the process.
280-281
: Modify logging condition infetch_query_log
.The logging now starts after the first row, reducing unnecessary log entries and focusing on the progress of subsequent rows.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py (6)
38-44
: New regex pattern for sharded tables.The
_BIGQUERY_DEFAULT_SHARDED_TABLE_REGEX
provides a pattern to identify sharded tables by checking for valid date suffixes. This improves the detection of sharded tables.
47-75
: Enhance exception handling insharded_table_pattern_is_a_valid_regexp
.Consider using
raise ... from err
to distinguish exceptions from errors in exception handling.Tools
Ruff
72-74: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
105-110
: Security concern: Identified a private key.Ensure that the
private_key
field is handled securely and not exposed in logs or error messages.Tools
Gitleaks
109-109: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
202-231
: IntroduceBigQueryFilterConfig
for flexible filtering.The
BigQueryFilterConfig
class provides regex patterns for filtering projects, datasets, and table snapshots, enhancing flexibility in data ingestion configurations.
284-297
: AddBigQueryIdentifierConfig
for identifier management.The class introduces fields for managing data platform instances and legacy sharded table support, improving identifier configuration.
299-303
: UpdateBigQueryV2Config
to include new configurations.The
BigQueryV2Config
class now includesBigQueryFilterConfig
andBigQueryIdentifierConfig
, enhancing its configuration capabilities.metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema.py (2)
586-616
: Addquery_project_list
function for project retrieval.This function retrieves a list of projects with error handling and filtering based on project ID patterns. It enhances the robustness of project data retrieval.
618-636
: Introduceget_projects
function for simplified project access.The function provides a straightforward interface to obtain projects, either by specific IDs or by querying the project list, centralizing project retrieval logic.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/lineage.py (3)
231-236
: Constructor Update: UseBigQueryIdentifierBuilder
.The constructor now uses
BigQueryIdentifierBuilder
for generating identifiers. This change improves modularity and encapsulates identifier logic, enhancing maintainability.
433-433
: Useidentifiers
for URN generation.The line now uses
identifiers.gen_dataset_urn_from_raw_ref(table_ref)
to generate dataset URNs. This centralizes URN generation logic, improving consistency and readability.
876-878
: Useidentifiers
for upstream table URN generation.The use of
identifiers.gen_dataset_urn_from_raw_ref(upstream_table)
ensures consistent URN generation for upstream tables, aligning with the new identifier management approach.metadata-ingestion/tests/unit/test_bigquery_usage.py (7)
165-167
: Addidentifiers
parameter tomake_usage_workunit
.The function now requires
identifiers
to generate URNs, enhancing consistency with the new identifier management approach.
178-181
: Updatemake_operational_workunit
to useresource_urn
.The function now takes
resource_urn
directly, simplifying the interface and aligning with the new URN generation strategy.
214-217
: Addidentifiers
parameter tomake_zero_usage_workunit
.The function now includes
identifiers
for URN generation, ensuring consistency with the updated URN management approach.
209-209
: InstantiateBigQueryIdentifierBuilder
inusage_extractor
.The
usage_extractor
fixture now includesidentifiers
, aligning with the new URN generation strategy and ensuring consistent identifier management.
301-304
: Update test case to useidentifiers
.The test case now includes
identifiers
when callingmake_usage_workunit
, reflecting the updated function signature and ensuring proper URN generation.
385-385
: Update test cases to useidentifiers
.Test cases are updated to include
identifiers
when callingmake_usage_workunit
, ensuring consistency with the new function signature.Also applies to: 413-413, 445-445, 490-490, 511-511, 545-545, 636-636, 679-679, 729-729, 781-781, 811-811, 890-890, 1010-1010
1056-1060
: Useidentifiers
for URN generation in operational stats test.The test now uses
identifiers.gen_dataset_urn_from_raw_ref
to generate URNs, aligning with the new identifier management strategy.Also applies to: 1077-1081, 1088-1090
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/usage.py (4)
318-324
: Constructor Update: UseBigQueryIdentifierBuilder
.The constructor now uses
BigQueryIdentifierBuilder
, centralizing URN generation logic and improving maintainability.
409-411
: Useidentifiers
for URN generation in_get_workunits_internal
.The method now generates dataset URNs using
identifiers.gen_dataset_urn_from_raw_ref
, ensuring consistency and readability.
717-719
: Useidentifiers
for URN generation in_create_operation_workunit
.The method now uses
identifiers
for generating URNs for affected datasets and destination tables, aligning with the new identifier management strategy.Also applies to: 738-738
724-724
: Useidentifiers
for user URN generation.The method now uses
identifiers.gen_user_urn
for generating user URNs, centralizing identifier logic and improving consistency.metadata-ingestion/tests/unit/test_bigquery_source.py (8)
183-187: LGTM! The test correctly verifies the behavior of `get_projects` with `project_ids`, ensuring no unnecessary API calls are made.
219-223: LGTM! The test accurately checks the override behavior of `project_ids` over `project_id_pattern`.
236-239: LGTM! The test correctly verifies the backward compatibility of `project_ids` with `project_id`.
286-290: LGTM! The test correctly verifies the behavior of `get_projects` with a single `project_id`.
322-326: LGTM! The test correctly verifies the behavior of `get_projects` with a paginated list of projects.
347-351: LGTM! The test accurately checks the filtering behavior of `get_projects` using `project_id_pattern`.
366-370: LGTM! The test correctly verifies the behavior of `get_projects` when no projects are returned.
391-395: LGTM! The test correctly verifies the error handling behavior of `get_projects` during API call failures.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_schema_gen.py (4)
160-168: LGTM! The refactoring to use an `identifiers` object centralizes identifier generation, improving maintainability and readability.
204-205: LGTM! The use of the `identifiers` object for URN generation aligns with the refactoring goals, maintaining functionality and improving clarity.
215-215: LGTM! The method's use of the `identifiers` object for platform information is consistent with the refactoring goals.
223-223: LGTM! The method's use of the `identifiers` object for platform information is consistent with the refactoring goals.
metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py (4)
86-89: LGTM! The `ObservedQuery` class is well-defined. The class correctly extends `LoggedQuery` and introduces new attributes with appropriate defaults.
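In outline, the relationship looks like the sketch below; the extra fields are illustrative rather than the exact attribute list:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class LoggedQuerySketch:
    query: str
    session_id: Optional[str] = None
    timestamp: Optional[datetime] = None
    user: Optional[str] = None
    default_db: Optional[str] = None
    default_schema: Optional[str] = None

@dataclass
class ObservedQuerySketch(LoggedQuerySketch):
    # Additional fields with defaults, in the spirit of the reviewed class.
    query_hash: Optional[str] = None
    usage_multiplier: int = 1
```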
478-499: LGTM! The `add` method handles `ObservedQuery` instances appropriately. The method correctly delegates processing to `add_observed_query` for `ObservedQuery` instances.
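The delegation is simple type-based dispatch; a self-contained toy version of the pattern:

```python
class ObservedQueryStub:
    """Stand-in for the real ObservedQuery class."""

class PreparsedQueryStub:
    """Stand-in for another supported input type."""

class AggregatorSketch:
    def add(self, item) -> None:
        # Route each supported input to its dedicated handler.
        if isinstance(item, ObservedQueryStub):
            self.add_observed_query(item)
        elif isinstance(item, PreparsedQueryStub):
            self.add_preparsed_query(item)
        else:
            raise ValueError(f"Unsupported item type: {type(item)}")

    def add_observed_query(self, query: ObservedQueryStub) -> None:
        print("handling observed query")

    def add_preparsed_query(self, query: PreparsedQueryStub) -> None:
        print("handling preparsed query")

AggregatorSketch().add(ObservedQueryStub())
```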
642-642: LGTM! The `add_observed_query` method is effectively optimized. The inclusion of `query_hash` for conditional assignment of the query fingerprint enhances the method's efficiency.
1158-1160: LGTM! The guard clause in `_gen_lineage_for_downstream` enhances robustness. The addition prevents unnecessary processing when there are no upstream aspects or fine-grained lineages.
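The guard clause is just an early return before any aspect is emitted; schematically (attribute names assumed):

```python
def gen_lineage_sketch(upstreams: list, fine_grained_lineages: list):
    # Early return: nothing to emit when there is neither table-level
    # nor column-level lineage for this downstream.
    if not upstreams and not fine_grained_lineages:
        return
    yield {"upstreams": upstreams, "fineGrainedLineages": fine_grained_lineages}
```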
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_queries.py
Overall looking pretty good.
I would like to think about how we might effectively test this code. I suspect the `local_temp_path` might come in handy.
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/bigquery_config.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
metadata-ingestion/src/datahub/ingestion/source/snowflake/snowflake_config.py
metadata-ingestion/src/datahub/ingestion/source/bigquery_v2/queries_extractor.py
metadata-ingestion/tests/unit/sql_parsing/aggregator_goldens/test_create_table_query_mcps.json
metadata-ingestion/tests/unit/sql_parsing/test_sql_aggregator.py
metadata-ingestion/src/datahub/sql_parsing/sql_parsing_aggregator.py
6d14175 to 4153ac5
metadata-ingestion/tests/integration/bigquery_v2/audit_log.sqlite
- address review comments
- support adding extra_info for debugging with queries
- fix usage issue, add unit test for sql aggregator usage
Uses queries from `INFORMATION_SCHEMA.JOBS`, along with the `SqlParsingAggregator`, to generate the "Query" entity and its aspects, plus each Dataset's `datasetUsageStatistics`, lineage, and operation aspects.
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests