feat(nested avro): Adding support for nested avro schema ingestion #3079

Merged (57 commits) on Aug 11, 2021

rslanka
Contributor

@rslanka rslanka commented Aug 11, 2021

  1. Adds support for nested AVRO schema ingestion.
  2. Defines a new fieldPath specification (v2) to support complex schemas (more details in docs/advanced/field-path-spec-v2.md).
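
To make "nested" concrete, here is a minimal sketch of walking an Avro record schema that contains another record. The `User`/`Address` schema and the `leaf_paths` helper are hypothetical and only for illustration. The plain dotted paths are not the v2 fieldPath encoding, which is described in docs/advanced/field-path-spec-v2.md.

```python
import json

# Hypothetical nested Avro schema, used only for illustration.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "address", "type": {
      "type": "record",
      "name": "Address",
      "fields": [
        {"name": "city", "type": "string"},
        {"name": "zip", "type": "string"}
      ]
    }}
  ]
}
""")


def leaf_paths(schema, prefix=""):
    """Return dotted paths of all leaf fields in a (possibly nested) record."""
    paths = []
    for field in schema["fields"]:
        path = f"{prefix}.{field['name']}" if prefix else field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "record":
            # Recurse into the nested record, carrying the parent path along.
            paths.extend(leaf_paths(ftype, path))
        else:
            paths.append(path)
    return paths


print(leaf_paths(SCHEMA))  # ['id', 'address.city', 'address.zip']
```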

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

rslanka and others added 30 commits July 29, 2021 21:01
1. Added type annotations to fields
2. Emitting non-leaf level fields as well.
1. Enhanced to emit all intermediate AVRO nodes also as MCE schema
   fields.
2. Converted SchemaField list generation logic to use Python generator
   functions for better memory utilization.
3. Update unit tests.
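
The generator approach mentioned in this commit can be sketched roughly as follows. This is not the PR's code: `iter_fields` and the `Order` schema are hypothetical, and the output is a simple `(path, type)` tuple rather than a real SchemaField. The sketch emits an entry for each intermediate record as well as for its leaves, and produces entries lazily instead of building the whole list up front.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_fields(schema: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Lazily yield (path, type_name) for every field, including intermediate records."""
    for field in schema["fields"]:
        path = f"{prefix}.{field['name']}" if prefix else field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "record":
            yield path, "record"                 # the intermediate (non-leaf) node
            yield from iter_fields(ftype, path)  # then its children, lazily
        elif isinstance(ftype, dict):
            yield path, ftype["type"]            # other complex types, e.g. array or map
        elif isinstance(ftype, list):
            yield path, "union"                  # unions are not expanded in this sketch
        else:
            yield path, ftype                    # primitive type name


# Hypothetical schema for illustration.
ORDER = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "customer", "type": {
            "type": "record",
            "name": "Customer",
            "fields": [{"name": "email", "type": "string"}],
        }},
    ],
}

for path, type_name in iter_fields(ORDER):
    print(path, type_name)
```

Because `iter_fields` is a generator, a caller can stop early or stream entries without holding every field in memory at once.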
1. Skipping the generation of intermediate unions that are originally
   optional fields i.e. union(null, SomeType).
2. Eliminated redundant `[member=T]` and `[value=T]` annotations.
shirshanka and others added 26 commits August 8, 2021 19:57
1. Skip emitting SchemaFields for 'null' member of unions.
2. Harden tests to ensure uniqueness of the fieldPath for SchemaFields
   corresponding to the actual AVRO fields (non-intermediate).
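
The union handling described in these commits could look roughly like the sketch below. All three helper names are hypothetical and not taken from the PR. An optional field, union(null, T), is treated as its underlying type T. For wider unions the `null` member is skipped. A small check enforces that generated paths are unique.

```python
def unwrap_optional(avro_type):
    """If avro_type is a two-member union of null and T, return T; else return it unchanged."""
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        if len(avro_type) == 2 and len(non_null) == 1:
            return non_null[0]
    return avro_type


def union_members(avro_type):
    """Return the non-null members of a union, or [avro_type] if it is not a union."""
    if isinstance(avro_type, list):
        return [t for t in avro_type if t != "null"]
    return [avro_type]


def assert_unique(paths):
    """Raise ValueError on the first duplicated path."""
    seen = set()
    for path in paths:
        if path in seen:
            raise ValueError(f"duplicate fieldPath: {path}")
        seen.add(path)


print(unwrap_optional(["null", "string"]))        # string
print(union_members(["null", "string", "long"]))  # ['string', 'long']
assert_unique(["id", "address.city", "address.zip"])
```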
… types before a field, since they don't offer much value.
…ying type for optionals, instead of a union that Avro parser converts these to
Contributor

@shirshanka shirshanka left a comment

LGTM!

@shirshanka shirshanka merged commit 8844240 into datahub-project:master Aug 11, 2021
3 participants