Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(structured-properties): add cli validation for entity types #11863

Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
entity_types:
- dataset # or urn:li:entityType:datahub.dataset
- dataFlow
description: "Retention Time is used to figure out how long to retain records in a dataset"
description: 'Retention Time is used to figure out how long to retain records in a dataset'
allowed_values:
- value: 30
description: 30 days, usually reserved for datasets that are ephemeral and contain pii
Expand All @@ -18,7 +18,7 @@
- id: io.acryl.dataManagement.replicationSLA
type: number
display_name: Replication SLA
description: "SLA for how long data can be delayed before replicating to the destination cluster"
description: 'SLA for how long data can be delayed before replicating to the destination cluster'
entity_types:
- dataset
- id: io.acryl.dataManagement.deprecationDate
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,25 +41,28 @@ class AllowedValue(ConfigModel):
description: Optional[str] = None


VALID_ENTITY_TYPES_PREFIX_STRING = ", ".join(
[
f"urn:li:entityType:datahub.{x}"
for x in ["dataset", "dashboard", "dataFlow", "schemaField"]
Copy link
Collaborator

@david-leifker david-leifker Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a temporary example?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the python code have access to the entity registry or is that server side only?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wonder if you might parse this http://localhost:9002/openapi/v3/api-docs/10-openapi-v3 as a workaround.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the openapi spec that is generated from the entity registry on the running server as a proxy for the entity registry's data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is that technically - one could register a new entityType completely dynamically. So generating a valid list of entityTypes by consulting the server requires talking to the data plane (not just the entity registry / openapi spec)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list of entities is just a sample... the error message looks like this:

Input urn:li:entityType:dataset is not a valid entity type urn. Valid entity type urns are urn:li:entityType:datahub.dataset, urn:li:entityType:datahub.dashboard, urn:li:entityType:datahub.dataFlow, urn:li:entityType:datahub.schemaField, etc... Ensure that the entity type is valid. (type=value_error)

]
)
VALID_ENTITY_TYPES_STRING = f"Valid entity type urns are {VALID_ENTITY_TYPES_PREFIX_STRING}, etc... Ensure that the entity type is valid."


class TypeQualifierAllowedTypes(ConfigModel):
allowed_types: List[str]

@validator("allowed_types")
@validator("allowed_types", each_item=True)
def validate_allowed_types(cls, v):
validated_entity_type_urns = []
if v:
with get_default_graph() as graph:
for et in v:
validated_urn = Urn.make_entity_type_urn(et)
if graph.exists(validated_urn):
validated_entity_type_urns.append(validated_urn)
else:
logger.warn(
f"Input {et} is not a valid entity type urn. Skipping."
)
v = validated_entity_type_urns
if not v:
logger.warn("No allowed_types given within type_qualifier.")
validated_urn = Urn.make_entity_type_urn(v)
if not graph.exists(validated_urn):
raise ValueError(
f"Input {v} is not a valid entity type urn. {VALID_ENTITY_TYPES_STRING}"
)
v = str(validated_urn)
return v


Expand All @@ -77,6 +80,18 @@ class StructuredProperties(ConfigModel):
type_qualifier: Optional[TypeQualifierAllowedTypes] = None
immutable: Optional[bool] = False

@validator("entity_types", each_item=True)
def validate_entity_types(cls, v):
if v:
with get_default_graph() as graph:
validated_urn = Urn.make_entity_type_urn(v)
if not graph.exists(validated_urn):
raise ValueError(
f"Input {v} is not a valid entity type urn. {VALID_ENTITY_TYPES_STRING}"
)
v = str(validated_urn)
return v

@property
def fqn(self) -> str:
assert self.urn is not None
Expand Down
6 changes: 6 additions & 0 deletions smoke-test/tests/structured_properties/bad_entity_type.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
- id: clusterTypeBad
type: STRING
display_name: Cluster's type
description: 'Test Cluster Type Property'
entity_types:
- urn:li:entityType:dataset # should fail because this is not a valid entity type
Original file line number Diff line number Diff line change
Expand Up @@ -371,6 +371,24 @@ def test_structured_property_schema_field(ingest_cleanup_data, graph_client):
raise e


def test_structured_properties_yaml_load_with_bad_entity_type(
ingest_cleanup_data, graph_client
):
try:
StructuredProperties.create(
"tests/structured_properties/bad_entity_type.yaml",
graph=graph_client,
)
raise AssertionError(
"Should not be able to create structured properties with bad entity type"
)
except Exception as e:
if "urn:li:entityType:dataset is not a valid entity type urn" in str(e):
pass
else:
raise e


def test_dataset_yaml_loader(ingest_cleanup_data, graph_client):
StructuredProperties.create(
"tests/structured_properties/test_structured_properties.yaml",
Expand Down
Original file line number Diff line number Diff line change
@@ -1,23 +1,23 @@
- id: clusterType
type: STRING
display_name: Cluster's type
description: "Test Cluster Type Property"
description: 'Test Cluster Type Property'
entity_types:
- dataset
- id: clusterName
type: STRING
display_name: Cluster's name
description: "Test Cluster Name Property"
description: 'Test Cluster Name Property'
entity_types:
- dataset
- id: projectNames
type: STRING
cardinality: MULTIPLE
display_name: Project Name
entity_types:
- dataset # or urn:li:logicalEntity:metamodel.datahub.dataset
- dataflow
description: "Test property for project name"
- dataset # or urn:li:entityType:datahub.dataset
- dataFlow
description: 'Test property for project name'
allowed_values:
- value: Tracking
description: test value 1 for project
Expand Down
Loading