fix(Ingestor/Glue): Implement column upstream lineage between S3 and Glue #10895
Conversation
Walkthrough
The recent changes enhance AWS Glue data sources in DataHub by introducing fine-grained column-level lineage extraction. A new configuration option,
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant GlueSource
participant GlueClient
participant S3
participant DataHubGraph
User->>GlueSource: Initiate ingestion with column lineage enabled
GlueSource->>GlueClient: Fetch database and table metadata
GlueClient-->>GlueSource: Return metadata
GlueSource->>S3: Fetch S3 lineage data for columns
S3-->>GlueSource: Return column lineage data
GlueSource->>DataHubGraph: Emit lineage and metadata to DataHub
DataHubGraph-->>GlueSource: Acknowledge receipt
GlueSource-->>User: Ingestion complete
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/ingestion/source/aws/glue.py (7 hunks)
Additional context used
Ruff
metadata-ingestion/src/datahub/ingestion/source/aws/glue.py
27-27: `datahub.emitter.mce_builder` imported but unused (F401)
Remove unused import: `datahub.emitter.mce_builder`
Additional comments not posted (3)
metadata-ingestion/src/datahub/ingestion/source/aws/glue.py (3)
183-186: LGTM! New field `include_column_lineage` added to `GlueSourceConfig`.
The new field `include_column_lineage` has been added to enable column-level lineage extraction.
732-768: LGTM! Updates to `get_lineage_if_enabled` method.
The updates to the `get_lineage_if_enabled` method include logic for extracting schema metadata and creating fine-grained lineage if `include_column_lineage` is enabled.
790-828: LGTM! New method `get_fine_grained_lineages` added.
The new method `get_fine_grained_lineages` creates fine-grained lineage between schema fields of Glue and S3 datasets.
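As a rough illustration of what "fine-grained lineage between schema fields" means, the sketch below pairs same-named columns of an upstream S3 dataset and a downstream Glue dataset and expresses each side as a `schemaField` URN. It is not the PR's `get_fine_grained_lineages`; the helper names `schema_field_urn` and `pair_columns`, the S3 bucket path, and the column names are all assumptions made for the example.

```python
from typing import List, Tuple


def schema_field_urn(dataset_urn: str, field_path: str) -> str:
    """Build a schemaField URN for one column of a dataset."""
    return f"urn:li:schemaField:({dataset_urn},{field_path})"


def pair_columns(
    upstream_urn: str,
    upstream_fields: List[str],
    downstream_urn: str,
    downstream_fields: List[str],
) -> List[Tuple[str, str]]:
    """Pair each downstream column with the upstream column of the same name.

    Hypothetical helper: assumes both sides use simple (v1-style) field names.
    """
    upstream_names = set(upstream_fields)
    return [
        (schema_field_urn(upstream_urn, name), schema_field_urn(downstream_urn, name))
        for name in downstream_fields
        if name in upstream_names
    ]


# The S3 location and the column names are made up for illustration.
s3_urn = "urn:li:dataset:(urn:li:dataPlatform:s3,example-bucket/flights/avro,PROD)"
glue_urn = "urn:li:dataset:(urn:li:dataPlatform:glue,flights-database.avro,PROD)"

pairs = pair_columns(s3_urn, ["yr", "flightdate"], glue_urn, ["yr", "carrier"])
for upstream, downstream in pairs:
    print(upstream, "->", downstream)
```

Only `yr` exists on both sides here, so a single upstream/downstream pair is produced; columns without a counterpart are simply left out of the lineage.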
55c38db
to
daabcfb
Compare
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/ingestion/source/aws/glue.py (6 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/ingestion/source/aws/glue.py
e687f8b
to
16d80dd
Compare
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- metadata-ingestion/src/datahub/ingestion/source/aws/glue.py (6 hunks)
- metadata-ingestion/tests/unit/glue/glue_mces_golden_table_column_lineage.json (1 hunks)
- metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json (1 hunks)
- metadata-ingestion/tests/unit/test_glue_source.py (5 hunks)
- metadata-ingestion/tests/unit/test_glue_source_stubs.py (1 hunks)
Files not summarized due to errors (1)
- metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/ingestion/source/aws/glue.py
Additional comments not posted (29)
metadata-ingestion/tests/unit/glue/glue_mces_golden_table_column_lineage.json (1)
1-373: LGTM! JSON structure is well-formed and consistent.
The JSON data correctly represents the various aspects related to the Glue platform, including container properties, status, data platform instance, subtypes, dataset properties, schema metadata, ownership, upstream lineage, and fine-grained lineages.
metadata-ingestion/tests/unit/test_glue_source.py (6)
70-91: Verify the integration of new parameters in `glue_source` function.
The function includes new parameters: `mock_datahub_graph`, `emit_s3_lineage`, `include_column_lineage`, and `extract_transforms`. Ensure these parameters are correctly integrated and used within the function.
444-536: Verify new test cases in `test_glue_ingest_include_table_lineage`.
The function includes a new test case for table lineage with different platform instances and MCE files. Ensure the test cases are correctly defined and cover all necessary scenarios.
543-643: Verify new test function `test_glue_ingest_include_column_lineage`.
The function is a new test function for testing column lineage. Ensure the function is correctly defined and covers all necessary scenarios for column lineage.
Line range hint 291-303: Verify stateful ingestion tests in `test_glue_stateful`.
The function includes stateful ingestion tests for Glue. Ensure the tests are correctly defined and cover all necessary scenarios.
Line range hint 401-419: Verify delta schema ingestion tests in `test_glue_with_delta_schema_ingest`.
The function includes tests for ingesting delta schemas. Ensure the tests are correctly defined and cover all necessary scenarios.
Line range hint 421-439: Verify malformed delta schema ingestion tests in `test_glue_with_malformed_delta_schema_ingest`.
The function includes tests for ingesting malformed delta schemas. Ensure the tests are correctly defined and cover all necessary scenarios.
metadata-ingestion/tests/unit/test_glue_source_stubs.py (2)
883-901: Ensure database parameters are accurate.
The dictionary `get_databases_response_for_lineage` appears to be well-structured. Verify that each parameter, particularly `LocationUri` and `Parameters`, accurately reflects the intended database setup for lineage purposes.
903-973: Ensure table parameters and lineage details are accurate.
The list `tables_lineage_1` and the dictionary `get_tables_lineage_response_1` are well-structured. Verify that each parameter, particularly `Columns`, `Location`, and `Parameters`, accurately reflects the intended table setup for lineage purposes. Consider adding more realistic comments and metadata to better simulate real-world scenarios.
metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json (20)
2-22: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `container` entity.
23-33: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `container` entity with a `status` aspect.
34-44: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `container` entity with a `dataPlatformInstance` aspect.
45-57: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `container` entity with a `subTypes` aspect.
58-75: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `container` entity with various aspects.
76-86: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `container` entity with a `status` aspect.
87-97: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `container` entity with a `dataPlatformInstance` aspect.
98-110: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `container` entity with a `subTypes` aspect.
111-366: Field `urn` and various aspect fields are consistent and correct.
The `urn` and various aspect fields match the expected values for a `DatasetSnapshot` entity.
367-380: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `dataset` entity with a `subTypes` aspect.
381-390: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `dataset` entity with a `container` aspect.
391-410: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `dataset` entity with an `upstreamLineage` aspect.
411-572: Field `urn` and various aspect fields are consistent and correct.
The `urn` and various aspect fields match the expected values for a `DatasetSnapshot` entity.
573-585: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `dataset` entity with a `subTypes` aspect.
586-595: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `dataset` entity with a `container` aspect.
596-615: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `dataset` entity with an `upstreamLineage` aspect.
616-778: Field `urn` and various aspect fields are consistent and correct.
The `urn` and various aspect fields match the expected values for a `DatasetSnapshot` entity.
779-790: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `dataset` entity with a `subTypes` aspect.
791-801: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `dataset` entity with a `container` aspect.
802-821: Field `entityUrn` and `aspectName` values are consistent and correct.
The `entityUrn` and `aspectName` fields match the expected values for a `dataset` entity with an `upstreamLineage` aspect.
(
    f
    for f in schema_metadata_for_s3.fields
    if f.fieldPath.split(".")[-1] == field.fieldPath.split(".")[-1]
Will this work for nested fields?
For example:
a.b.c != a.c
Here is an example fieldPath for the above:
unique_v2_field_paths = [
"[version=2.0].[type=union].[type=A].[type=string].f",
"[version=2.0].[type=union].[type=B].[type=string].f"
]
Hi @treff7es, it will work only for top-level fields. Based on my understanding, S3 itself doesn’t handle nested levels directly.
Here is the output for a nested level:
That is true, but if I'm not mistaken, you will connect these two fields, which you shouldn't:
Let's say we have this column for an s3 file:
[version=2.0].[type=int].yr
and we have this column for the glue table:
[version=2.0].[type=union].[type=A].[type=string].yr
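The concern can be reproduced with a few lines of Python. This is a minimal sketch of the last-segment comparison from the diff under review, applied to the two field paths quoted above; the helper names `last_segment` and `naive_match` are made up for the example.

```python
def last_segment(field_path: str) -> str:
    """Return the text after the final dot, as the reviewed comparison does."""
    return field_path.split(".")[-1]


def naive_match(upstream_path: str, downstream_path: str) -> bool:
    """Compare two field paths by their last segment only."""
    return last_segment(upstream_path) == last_segment(downstream_path)


s3_field = "[version=2.0].[type=int].yr"
glue_field = "[version=2.0].[type=union].[type=A].[type=string].yr"

# Both paths end in "yr", so they are treated as the same column even though
# one is a plain int and the other sits inside a union member.
print(naive_match(s3_field, glue_field))

# The same happens for the simpler example from the first comment.
print(naive_match("a.b.c", "a.c"))
```

Both calls report a match, which is exactly the false connection the comment warns about: the comparison ignores everything except the final segment.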
Hi @treff7es,
I have tested the scenario, and the lineage connects only top-level fields. The lineage mapping does not work for nested columns.
For example:
Glue Schema
The field path in Glue at the nested level is:
[version=2.0].[type=struct].[type=union].[type=struct0].union_type.[type=string].mapping
S3 Schema
The top-level field path in S3 is:
"fieldPath": "mapping"
Nested Field Details
{
  "fieldPath": "union_type",
  "nullable": false,
  "type": {
    "type": {
      "com.linkedin.schema.RecordType": {}
    }
  },
  "nativeDataType": "struct<int_value: int32, double_value: double, array_value: list<element: string>, struct_value: struct<employee_id: int32, mapping: string>>",
  "recursive": false,
  "isPartOfKey": false
}
The lineage does not link the field "mapping" because there is no corresponding top-level field in Glue. As a result, the lineage mapping fails.
Output:
Thanks!
Hi @treff7es, yes, the issue is related only to v2 vs. v1 field paths. I'll use the above to verify the lineage.
Hi @treff7es, I have implemented the changes mentioned above.
Output:
Let me know if anything else is required.
Thanks!
Thanks. It's much better and safer now. I left one small comment, and I think it will be fine.
I have fixed that. Thanks!
thanks, it looks good now
16d80dd
to
6c45be8
Compare
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- metadata-ingestion/src/datahub/ingestion/source/aws/glue.py (7 hunks)
- metadata-ingestion/tests/unit/glue/glue_mces_golden_table_column_lineage.json (1 hunks)
- metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json (1 hunks)
- metadata-ingestion/tests/unit/test_glue_source.py (5 hunks)
- metadata-ingestion/tests/unit/test_glue_source_stubs.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- metadata-ingestion/src/datahub/ingestion/source/aws/glue.py
- metadata-ingestion/tests/unit/glue/glue_mces_golden_table_column_lineage.json
Additional comments not posted (37)
metadata-ingestion/tests/unit/test_glue_source.py (4)
443-536: New test function added for table lineage inclusion.
The new test function `test_glue_ingest_include_table_lineage` validates the inclusion of table lineage during the ingestion process.
540-643: New test function added for column lineage inclusion.
The new test function `test_glue_ingest_include_column_lineage` validates the inclusion of column lineage during the ingestion process.
565-618: New helper function added for schema metadata.
The new helper function `fake_schema_metadata` provides a mock implementation for schema metadata, which is useful for testing the column lineage feature.
70-91: LGTM! But verify the function usage in the codebase.
The code changes are approved.
However, ensure that all function calls to `glue_source` match the new signature.
Verification successful
Verified: All function calls to `glue_source` match the new signature.
The function calls in `metadata-ingestion/tests/unit/test_glue_source.py` have been updated to include the new parameters, ensuring consistency with the modified function signature.
- `metadata-ingestion/tests/unit/test_glue_source.py`: Verified multiple instances of `glue_source` calls with the updated parameters.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all function calls to `glue_source` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'glue_source'
Length of output: 66
Script:
#!/bin/bash
# Description: Verify all function calls to `glue_source` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'glue_source'
Length of output: 11046
Script:
#!/bin/bash
# Description: Extract and display all `glue_source` function calls with additional context lines.
rg --type py -A 10 -B 2 $'glue_source('
Length of output: 115
Script:
#!/bin/bash
# Description: Extract and display all `glue_source` function calls with additional context lines.
rg --type py -A 10 -B 2 'glue_source\('
Length of output: 6689
metadata-ingestion/tests/unit/test_glue_source_stubs.py (2)
883-901: New mock response added for database lineage.
The new mock response `get_databases_response_for_lineage` provides a mock implementation for database lineage, which is useful for testing the lineage feature.
903-973: New mock response added for table lineage.
The new mock response `get_tables_lineage_response_1` provides a mock implementation for table lineage, which is useful for testing the lineage feature.
metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json (31)
7-21: Check for completeness of custom properties.
Ensure that all necessary custom properties are included and correctly populated.
27-32: Ensure correct status aspect.
The status aspect indicates that the container is not removed. Verify that this status is correct.
38-43: Ensure correct data platform instance.
The data platform instance is set to `urn:li:dataPlatform:glue`. Verify that this is correct.
3-21: Ensure consistent URN usage.
The URN `urn:li:container:0b9f1f731ecf6743be6207fec3dc9cba` is used consistently across multiple aspects. Verify that this URN is correct and consistent with other parts of the project.
Verification successful
URN usage is consistent across the project.
The URN `urn:li:container:0b9f1f731ecf6743be6207fec3dc9cba` is used consistently across multiple files and aspects within the project.
- metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json
- metadata-ingestion/tests/unit/glue/glue_deleted_actor_mces_golden.json
- metadata-ingestion/tests/unit/glue/glue_mces_golden.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the URN usage consistency across the project.
# Test: Search for the URN usage. Expect: Only occurances of the specified URN.
rg --type json -A 5 $'urn:li:container:0b9f1f731ecf6743be6207fec3dc9cba'
Length of output: 6426
49-56: Ensure correct subTypes.
The subTypes aspect indicates that the container is of type `Database`. Verify that this is correct.
63-74: Check for completeness of custom properties.
Ensure that all necessary custom properties are included and correctly populated.
80-85: Ensure correct status aspect.
The status aspect indicates that the container is not removed. Verify that this status is correct.
91-96: Ensure correct data platform instance.
The data platform instance is set to `urn:li:dataPlatform:glue`. Verify that this is correct.
102-109: Ensure correct subTypes.
The subTypes aspect indicates that the container is of type `Database`. Verify that this is correct.
59-74: Ensure consistent URN usage.
The URN `urn:li:container:bdf4342ea6899d162eae685bfe9074a7` is used consistently across multiple aspects. Verify that this URN is correct and consistent with other parts of the project.
Verification successful
URN usage is consistent and correct.
The URN `urn:li:container:bdf4342ea6899d162eae685bfe9074a7` is used consistently across multiple files and aspects within the project. No discrepancies were found.
- metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json
- metadata-ingestion/tests/unit/glue/glue_mces_golden.json
- metadata-ingestion/tests/unit/glue/glue_deleted_actor_mces_golden.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the URN usage consistency across the project.
# Test: Search for the URN usage. Expect: Only occurances of the specified URN.
rg --type json -A 5 $'urn:li:container:bdf4342ea6899d162eae685bfe9074a7'
Length of output: 12703
115-127: Check for completeness of custom properties.
Ensure that all necessary custom properties are included and correctly populated.
133-138: Ensure correct status aspect.
The status aspect indicates that the container is not removed. Verify that this status is correct.
144-149: Ensure correct data platform instance.
The data platform instance is set to `urn:li:dataPlatform:glue`. Verify that this is correct.
155-162: Ensure correct subTypes.
The subTypes aspect indicates that the container is of type `Database`. Verify that this is correct.
170-173: Ensure correct status aspect.
The status aspect indicates that the dataset is not removed. Verify that this status is correct.
175-201: Check for completeness of custom properties.
Ensure that all necessary custom properties are included and correctly populated.
333-335: Ensure correct data platform instance.
The data platform instance is set to `urn:li:dataPlatform:glue`. Verify that this is correct.
353-361: Ensure correct global tags.
The global tags aspect includes tags associated with the dataset. Verify that all tags are correctly defined.
369-378: Ensure correct subTypes.
The subTypes aspect indicates that the dataset is of type `Table`. Verify that this is correct.
383-390: Ensure correct container aspect.
The container aspect indicates the container URN. Verify that this is correct.
418-421: Ensure correct status aspect.
The status aspect indicates that the dataset is not removed. Verify that this status is correct.
423-448: Check for completeness of custom properties.
Ensure that all necessary custom properties are included and correctly populated.
538-540: Ensure correct data platform instance.
The data platform instance is set to `urn:li:dataPlatform:glue`. Verify that this is correct.
113-127: Ensure consistent URN usage.
The URN `urn:li:container:110bc08849d1c1bde5fc345dab5c3ae7` is used consistently across multiple aspects. Verify that this URN is correct and consistent with other parts of the project.
Verification successful
URN usage is consistent.
The URN `urn:li:container:110bc08849d1c1bde5fc345dab5c3ae7` is used consistently across multiple aspects in the project.
- metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json
- metadata-ingestion/tests/unit/glue/glue_mces_golden.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the URN usage consistency across the project.
# Test: Search for the URN usage. Expect: Only occurances of the specified URN.
rg --type json -A 5 $'urn:li:container:110bc08849d1c1bde5fc345dab5c3ae7'
Length of output: 4784
338-350: Ensure correct ownership.
The ownership aspect includes owners and their types. Verify that all owners and types are correctly defined.
543-555: Ensure correct ownership.
The ownership aspect includes owners and their types. Verify that all owners and types are correctly defined.
#!/bin/bash
# Description: Verify the ownership aspect correctness.
# Test: Search

<details>
<summary>Verification successful</summary>

---

<!--
[monologue]
[task review_file_1/1 lines 543-555]
[observations] The search results show three instances of the "Ownership" structure in the JSON file. Each instance has the same structure:
- An "owners" array with one owner having the type "DATAOWNER".
- An empty "ownerTypes" object.
- A "lastModified" object with a time and actor.
[analysis] The consistency across all instances suggests that the ownership structure is correctly defined in terms of the JSON schema. However, the "ownerTypes" object is empty in all instances, which might be a point of concern depending on the expected data.
[actions] Since the structure is consistent and appears correct, we can conclude the verification process. If the empty "ownerTypes" is intentional or acceptable, then the ownership aspect is correctly defined.
-->

**Ownership aspect verified.**

The ownership structure is consistent across all instances in the JSON file, with correctly defined owners and types. The "ownerTypes" object is empty, which may be acceptable depending on the expected data.

- Instances found: 3
- Owners and types: Correctly defined
- ownerTypes: Empty but consistent

</details>

<details>
<summary>Scripts executed</summary>

The following scripts were executed for the analysis:

---

Script:

```shell
#!/bin/bash
# Description: Search for all instances of "Ownership" in the JSON file and display their structure.
# Search for "Ownership" and display the surrounding lines for context.
rg '"com.linkedin.pegasus2avro.common.Ownership":' -A 10 metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json
```

Length of output: 1620

</details>
204-329: Ensure correct schema metadata.
The schema metadata includes fields and their properties. Verify that all fields and properties are correctly defined.
Verification successful
Schema metadata verification completed.
The schema metadata in the specified JSON file appears to be correctly defined. Each field includes all necessary attributes, and the values are consistent with the expected schema.
- `schemaName`, `platform`, `version`, `created`, `lastModified`, `hash`, `platformSchema`, and `fields` are all present and correctly structured.
- Each field within `fields` contains `fieldPath`, `nullable`, `description` (where applicable), `type`, `nativeDataType`, `recursive`, `isPartOfKey`, and `jsonProps`.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the schema metadata correctness.
# Test: Search for the schema metadata. Expect: All fields and properties are correctly defined.
rg --type json -A 5 $'schemaMetadata'
Length of output: 81650
Script:
#!/bin/bash
# Description: Extract and display the relevant portion of the JSON file to verify the schema metadata correctness.
# Test: Extract lines 204-329 from the specified JSON file and display them.
# Extract and display lines 204-329 from the JSON file
sed -n '204,329p' metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json
Length of output: 6828
165-364: Ensure consistent URN usage.
The URN `urn:li:dataset:(urn:li:dataPlatform:glue,flights-database.avro,PROD)` is used consistently across multiple aspects. Verify that this URN is correct and consistent with other parts of the project.
393-410: Ensure correct upstream lineage.
The upstream lineage aspect includes upstream datasets. Verify that all upstream datasets are correctly defined.
Verification successful
Let's run a script to extract the "dataset" URNs from the "upstreams" arrays in the JSON files and validate their format.
Upstream lineage datasets are correctly defined.
The extracted dataset URNs from the JSON files follow the expected format and correctly represent the upstream datasets.
- Example URNs:
  - urn:li:dataset:(urn:li:dataPlatform:dbt,pagila.pagila_dev.customer_details,PROD)
  - urn:li:dataset:(urn:li:dataPlatform:bigquery,bigquery-public-data.covid19_geotab_mobility_impact.commercial_traffic_by_industry,PROD)
  - urn:li:dataset:(urn:li:dataPlatform:postgres,pagila.public.customer,PROD)
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the upstream lineage correctness.
# Test: Search for the upstream lineage aspect. Expect: All upstream datasets are correctly defined.
rg --type json -A 5 $'upstreamLineage'
Length of output: 193876
Script:
#!/bin/bash
# Description: Extract and validate dataset URNs from upstreamLineage aspects.
# Extract dataset URNs from upstreamLineage aspects in JSON files.
rg --type json -o -r '$1' '"dataset":\s*"([^"]+)"' | sort | uniq
Length of output: 127687
451-535: Ensure correct schema metadata.
The schema metadata includes fields and their properties. Verify that all fields and properties are correctly defined.
Verification successful
Schema metadata verification successful.
The schema metadata in `metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json` is correctly defined and includes all necessary fields and properties.
- `schemaName`: Correctly specified.
- `platform`: Correctly specified.
- `version`: Correctly specified.
- `created` and `lastModified`: Correctly specified with `time` and `actor`.
- `hash`: Correctly specified.
- `platformSchema`: Correctly specified with `tableSchema`.
- `fields`: Correctly specified with `fieldPath`, `nullable`, `type`, `nativeDataType`, `recursive`, `isPartOfKey`, and `jsonProps`.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the schema metadata correctness.
# Test: Search for the schema metadata. Expect: All fields and properties are correctly defined.
rg --type json -A 5 $'schemaMetadata'
Length of output: 81650
Script:
#!/bin/bash
# Description: Search for the schema metadata and its detailed contents in the specified file.
# Test: Search for the schema metadata object and its fields and properties in the specified file.
rg --type json -A 20 $'schemaMetadata' metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json
Length of output: 108
Script:
#!/bin/bash
# Description: Search for the schema metadata and its detailed contents in the specified file.
# Test: Search for the term `SchemaMetadata` in the specified file.
rg --type json -A 20 $'SchemaMetadata' metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json
Length of output: 3102
413-567
: Ensure consistent URN usage.The URN
urn:li:dataset:(urn:li:dataPlatform:glue,test-database.test_jsons_markers,PROD)
is used consistently across multiple aspects. Verify that this URN is correct and consistent with other parts of the project.Verification successful
Ensure consistent URN usage.
The URN
urn:li:dataset:(urn:li:dataPlatform:glue,test-database.test_jsons_markers,PROD)
is used consistently across multiple test files in themetadata-ingestion/tests/unit/glue
directory. No inconsistencies were found in its usage.
metadata-ingestion/tests/unit/glue/glue_mces_golden_table_lineage.json
metadata-ingestion/tests/unit/glue/glue_deleted_actor_mces_golden.json
metadata-ingestion/tests/unit/glue/glue_mces_golden.json
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the URN usage consistency across the project.
# Test: Search for the URN usage. Expect: Only occurrences of the specified URN.
rg --type json -A 5 $'urn:li:dataset:(urn:li:dataPlatform:glue,test-database.test_jsons_markers,PROD)'
Length of output: 103
Script:
#!/bin/bash
# Description: Verify the URN usage consistency across the entire codebase.
# Test: Search for the URN usage. Expect: Occurrences of the specified URN in various contexts.
rg -A 5 $'urn:li:dataset:(urn:li:dataPlatform:glue,test-database.test_jsons_markers,PROD)'
Length of output: 91
Script:
#!/bin/bash
# Description: Verify the URN usage consistency across the entire codebase, including all file types.
# Test: Search for the URN usage. Expect: Occurrences of the specified URN in various contexts.
rg -A 5 'urn:li:dataset:(urn:li:dataPlatform:glue,test-database.test_jsons_markers,PROD)'
Length of output: 91
Script:
#!/bin/bash
# Description: Verify the URN usage consistency across the entire codebase using fd and grep.
# List all files in the codebase and search for the URN within those files.
fd -t f -x grep -Hn 'urn:li:dataset:(urn:li:dataPlatform:glue,test-database.test_jsons_markers,PROD)' {}
Length of output: 2018
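The URN being searched for contains parentheses, which regular-expression engines treat as grouping rather than as literal characters, so a plain substring search is a safer way to run this kind of consistency check. The following is a minimal Python sketch of that idea, not part of the PR: the helper name `find_literal` and its parameters are hypothetical, and the directory in the usage guard is simply the one named in the review comment.

```python
from pathlib import Path
from typing import Dict, List

# URN copied from the review comment above.
URN = "urn:li:dataset:(urn:li:dataPlatform:glue,test-database.test_jsons_markers,PROD)"


def find_literal(root: Path, needle: str, pattern: str = "*.json") -> Dict[str, List[int]]:
    """Hypothetical helper: map each matching file to the 1-based line numbers
    on which `needle` appears as a literal substring (no regex involved)."""
    hits: Dict[str, List[int]] = {}
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        lines = [
            number
            for number, line in enumerate(text.splitlines(), start=1)
            if needle in line
        ]
        if lines:
            hits[str(path)] = lines
    return hits


if __name__ == "__main__":
    # Assumed location of the golden files, taken from the review comment.
    for file_name, line_numbers in find_literal(
        Path("metadata-ingestion/tests/unit/glue"), URN
    ).items():
        print(file_name, line_numbers)
```

Because the match is a substring test, the parentheses, dots, and commas in the URN need no escaping.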
6c45be8
to
50e1f92
Compare
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- metadata-ingestion/src/datahub/ingestion/source/aws/glue.py (7 hunks)
Files skipped from review as they are similar to previous changes (1)
- metadata-ingestion/src/datahub/ingestion/source/aws/glue.py
(
    f
    for f in schema_metadata_for_s3.fields
    if simplify_field_path(f.fieldPath) == field_path_v1
All of this downgrading to V1 field paths. Is that the best practice going forward here?
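For context on the question above, here is a minimal, self-contained sketch of what matching on simplified (V1-style) field paths looks like. It is an illustration under stated assumptions, not the PR's implementation: `Field` is a hypothetical stand-in for a schema field, and this `simplify_field_path` is a local approximation that assumes V2 paths start with `[version=2.0]` and carry bracketed annotation tokens such as `[type=string]`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for a schema field; only fieldPath is modelled."""

    fieldPath: str


def simplify_field_path(field_path: str) -> str:
    """Assumed behaviour: drop bracketed annotation tokens from a V2 path."""
    if not field_path.startswith("[version=2.0]"):
        return field_path  # treated as already being a simple (V1) path
    # "[version=2.0]" itself splits into "[version=2" and "0]", so a token is
    # dropped when it starts with "[" OR ends with "]".
    tokens = [
        token
        for token in field_path.split(".")
        if not (token.startswith("[") or token.endswith("]"))
    ]
    return ".".join(tokens)


def find_matching_field(fields: List[Field], field_path_v1: str) -> Optional[Field]:
    """Return the first field whose simplified path equals field_path_v1, else None."""
    return next(
        (f for f in fields if simplify_field_path(f.fieldPath) == field_path_v1),
        None,
    )


# Made-up example paths, for illustration only.
s3_fields = [
    Field("[version=2.0].[type=struct].[type=string].markers"),
    Field("[version=2.0].[type=struct].[type=struct].address.[type=string].city"),
]
match = find_matching_field(s3_fields, "address.city")
```

The trade-off the reviewer is pointing at is visible here: simplification discards the type annotations, so two V2 paths that differ only in their annotations would collapse to the same V1 path.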
Checklist
Summary by CodeRabbit
New Features
Tests
Chores