Adds an RFC to implement lineage backend #32
Signed-off-by: verdan <[email protected]>
This is somewhat under-specified on its own (it could specify the exact models, etc.). However, given the prior RFC that went out and the existing code, I think figuring out the details during implementation should be fine.
No new concepts/definitions are being introduced as a part of this RFC.

Databuilder already has the table lineage model, which creates an upstream/downstream relation to add to the Neo4j graph.
The column lineage model, however, still needs to be developed as a part of this RFC.
It would be reasonable to define this in the RFC. That said, given that table lineage already exists and the metadata response formats are defined, I think the solution space is small enough that it's probably OK to figure it out in the implementation.
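To make the size of that solution space concrete, here is a minimal sketch of what a column lineage record could look like if it mirrors the table-level upstream/downstream idea. Everything in it is an assumption for illustration: the class name `ColumnLineageSketch`, its fields, and the key strings are hypothetical and are not Databuilder's actual model; the `UPSTREAM`/`DOWNSTREAM` labels are borrowed from the draft Cypher later in this thread.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Hypothetical relation labels, borrowed from the draft Cypher in this thread.
UPSTREAM = "UPSTREAM"
DOWNSTREAM = "DOWNSTREAM"


@dataclass
class ColumnLineageSketch:
    """Hypothetical record: one column plus the columns derived from it."""

    column_key: str
    downstream_keys: List[str] = field(default_factory=list)

    def relations(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (start_key, relation_type, end_key) in both directions."""
        for dep in self.downstream_keys:
            # The source column points at the column derived from it ...
            yield (self.column_key, DOWNSTREAM, dep)
            # ... and the derived column points back at its source.
            yield (dep, UPSTREAM, self.column_key)


if __name__ == "__main__":
    # Illustrative keys only; the real key format is not defined here.
    record = ColumnLineageSketch("db/orders/id", ["db/order_facts/order_id"])
    for relation in record.relations():
        print(relation)
```

Emitting both directions per pair keeps the read side simple, at the cost of storing two edges for each dependency.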
I agree, because we have started to define Neo4j queries based on the previous RFC and the lineage model already available in amundsen-common. These queries are not yet optimized or complete, but I was planning to add queries along these lines:
def get_lineage(self, *,
                id: str,
                resource_type: ResourceType, direction: str, depth: int) -> Lineage:
    get_both_lineage_query = textwrap.dedent(u"""
        MATCH (down_parent:Table)<-[downstream_len:DOWNSTREAM*..{depth_key}]-(child:Table {{key: $query_key }})-[upstream_len:UPSTREAM*..{depth_key}]->(up_parent:Table)
        WITH
        child.key as child_key
        ,collect(distinct{{level:LENGTH(upstream_len),source:'hive',key:up_parent.key}}) AS upstream_entities
        ,collect(distinct{{level:LENGTH(downstream_len),source:'hive',key:down_parent.key}}) AS downstream_entities
        RETURN
        collect({{
        key:child_key,direction:"both",depth:1
        ,upstream_entities:upstream_entities
        ,downstream_entities:downstream_entities
        }}) AS lineageOutput
    """).format(depth_key=depth)
    get_upstream_lineage_query = textwrap.dedent(u"""
        MATCH (child:Table {{key: $query_key }})-[upstream_len:UPSTREAM*..{depth_key}]->(up_parent:Table)
        WITH
        child.key as child_key
        ,collect(distinct{{level:LENGTH(upstream_len),source:'hive',key:up_parent.key}}) AS upstream_entities
        RETURN
        collect({{
        key:child_key,direction:"upstream",depth:1
        ,upstream_entities:upstream_entities
        }}) AS lineageOutput
    """).format(depth_key=depth)
    get_downstream_lineage_query = textwrap.dedent(u"""
        MATCH (down_parent:Table)<-[downstream_len:DOWNSTREAM*..{depth_key}]-(child:Table {{key: $query_key }})
        WITH
        child.key as child_key
        ,collect(distinct{{level:LENGTH(downstream_len),source:'hive',key:down_parent.key}}) AS downstream_entities
        RETURN
        collect({{
        key:child_key,direction:"downstream",depth:1
        ,downstream_entities:downstream_entities
        }}) AS lineageOutput
    """).format(depth_key=depth)
    if direction == 'upstream':
        records = self._execute_cypher_query(statement=get_upstream_lineage_query,
                                             param_dict={'query_key': id})
    elif direction == 'downstream':
        records = self._execute_cypher_query(statement=get_downstream_lineage_query,
                                             param_dict={'query_key': id})
    else:
        records = self._execute_cypher_query(statement=get_both_lineage_query,
                                             param_dict={'query_key': id})
    result = records.single()['lineageOutput'][0]
    return result
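Since the three query strings in the draft differ only in which side of the pattern they include, one possible consolidation is a single builder function. The sketch below is only an illustration under stated assumptions: `build_lineage_query` is a hypothetical helper, not part of the metadata service, and it echoes the requested depth in the returned map where the draft hard-codes `depth:1`, which is a guess about the intent. The table key stays a query parameter (`$query_key`); only the validated integer depth is interpolated into the string.

```python
def build_lineage_query(direction: str, depth: int) -> str:
    """Build a Cypher string for 'upstream', 'downstream', or 'both'.

    Hypothetical helper: it consolidates the three draft queries into one.
    """
    if direction not in ("upstream", "downstream", "both"):
        raise ValueError(f"unsupported direction: {direction!r}")
    if not isinstance(depth, int) or depth < 1:
        raise ValueError("depth must be a positive integer")

    # Start from the queried table and extend the pattern on either side.
    pattern = "(child:Table {key: $query_key})"
    collects = []
    fields = []
    if direction in ("upstream", "both"):
        pattern += f"-[upstream_len:UPSTREAM*..{depth}]->(up_parent:Table)"
        collects.append(
            "collect(distinct {level: LENGTH(upstream_len), source: 'hive', "
            "key: up_parent.key}) AS upstream_entities"
        )
        fields.append("upstream_entities: upstream_entities")
    if direction in ("downstream", "both"):
        pattern = (
            f"(down_parent:Table)<-[downstream_len:DOWNSTREAM*..{depth}]-"
            + pattern
        )
        collects.append(
            "collect(distinct {level: LENGTH(downstream_len), source: 'hive', "
            "key: down_parent.key}) AS downstream_entities"
        )
        fields.append("downstream_entities: downstream_entities")

    return "\n".join([
        f"MATCH {pattern}",
        "WITH child.key AS child_key,",
        "     " + ",\n     ".join(collects),
        f'RETURN collect({{key: child_key, direction: "{direction}", '
        f"depth: {depth}, " + ", ".join(fields) + "}) AS lineageOutput",
    ])


if __name__ == "__main__":
    print(build_lineage_query("both", 2))
```

One thing to keep in mind for the `both` case: a single `MATCH` with both sides in one pattern returns no rows when a table has lineage in only one direction, so `OPTIONAL MATCH` may be worth considering there.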

ref: https://github.com/amundsen-io/rfcs/blob/master/rfcs/025-lineage-stage-0.md

## Guide-level Explanation (aka Product Details)
Could you describe whether the backend is only for graph, or also for Atlas or MySQL? Which ones does it not plan to support?
added
rfcs/032-lineage-backend.md

No new concepts/definitions are being introduced as a part of this RFC.

Databuilder already has the table lineage model, which creates an upstream/downstream relation to adding to the Neo4j graph.
Do you plan to change the existing model?
If so, what will the new model interface be?
The current one doesn't account for the job/application that generates the lineage in between; do you plan to add those?
clarified
Signed-off-by: Dorian Johnson <[email protected]>
@feng-tao Added some clarifications to answer your questions, thanks for the feedback!
* Adds an rfc to implement lineage backend
Signed-off-by: verdan <[email protected]>
* lineage rfc: review feedback
Signed-off-by: Dorian Johnson <[email protected]>
Co-authored-by: Dorian Johnson <[email protected]>
Signed-off-by: Allison Suarez Miranda <[email protected]>