chore(structured-properties): add cli validation for entity types #11863

shirshanka
Contributor

Adds CLI validation for entity types referenced in structured property definition YAML files.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions bot added the ingestion and smoke_test labels on Nov 15, 2024

VALID_ENTITY_TYPES_PREFIX_STRING = ", ".join(
    [
        f"urn:li:entityType:datahub.{x}"
        for x in ["dataset", "dashboard", "dataFlow", "schemaField"]
    ]
)
Collaborator

david-leifker, Nov 15, 2024

Just a temporary example?

Collaborator

Does the Python code have access to the entity registry, or is that server side only?

Collaborator

Hmm, I wonder if you might parse this http://localhost:9002/openapi/v3/api-docs/10-openapi-v3 as a workaround.

Collaborator

That's the openapi spec that is generated from the entity registry on the running server as a proxy for the entity registry's data.

Contributor Author

The problem is that, technically, one could register a new entityType completely dynamically. So generating a valid list of entityTypes by consulting the server requires talking to the data plane, not just the entity registry / OpenAPI spec.

Contributor Author

This list of entities is just a sample... the error message looks like this:

Input urn:li:entityType:dataset is not a valid entity type urn. Valid entity type urns are urn:li:entityType:datahub.dataset, urn:li:entityType:datahub.dashboard, urn:li:entityType:datahub.dataFlow, urn:li:entityType:datahub.schemaField, etc... Ensure that the entity type is valid. (type=value_error)
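
For readers following along, here is a minimal sketch of how a check producing a message like the one above might look. It is not the merged implementation: the helper name `validate_entity_type_urn` and the constants `SAMPLE_ENTITY_TYPES` and `ENTITY_TYPE_URN_PREFIX` are hypothetical, and the choice to check only the `urn:li:entityType:datahub.` prefix (rather than membership in a fixed list) is an assumption based on the point above that entity types can be registered dynamically. The trailing `(type=value_error)` in the real message appears to come from the validation framework and is not reproduced here.

```python
# Sample of entity types, copied from the diff excerpt in this PR.
# It only feeds the error message; it is not treated as exhaustive.
SAMPLE_ENTITY_TYPES = ["dataset", "dashboard", "dataFlow", "schemaField"]

# Hypothetical constant: the prefix every accepted entity type urn must carry.
ENTITY_TYPE_URN_PREFIX = "urn:li:entityType:datahub."

VALID_ENTITY_TYPES_PREFIX_STRING = ", ".join(
    f"{ENTITY_TYPE_URN_PREFIX}{x}" for x in SAMPLE_ENTITY_TYPES
)


def validate_entity_type_urn(urn: str) -> str:
    """Hypothetical helper: return the urn unchanged, or raise ValueError.

    Only the prefix is checked (assumption), because new entity types may be
    registered dynamically and a hardcoded list would reject them.
    """
    has_prefix = urn.startswith(ENTITY_TYPE_URN_PREFIX)
    has_name = len(urn) > len(ENTITY_TYPE_URN_PREFIX)
    if not (has_prefix and has_name):
        raise ValueError(
            f"Input {urn} is not a valid entity type urn. "
            f"Valid entity type urns are {VALID_ENTITY_TYPES_PREFIX_STRING}, etc... "
            "Ensure that the entity type is valid."
        )
    return urn
```

Under these assumptions, `validate_entity_type_urn("urn:li:entityType:dataset")` raises a `ValueError`, while a urn with the `datahub.` prefix, including one for a dynamically registered type, passes through unchanged.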

shirshanka merged commit 3e128f4 into datahub-project:master on Nov 16, 2024
74 checks passed