Support orm in databuilder #10

xuan616 · 2020-10-26T17:02:01Z

No description provided.

Golodhros · 2020-10-27T16:23:51Z

000-orm-in-databuilder.md

@@ -0,0 +1,85 @@
+- Feature Name: Support ORM in databuilder
+- Start Date: 2020-10-26
+- RFC PR: [amundsen-io/rfcs#0000](https://github.com/amundsen-io/rfcs/pull/0000) (after opening the RFC PR, update this with a link to it and update the file name)


Suggested change

- RFC PR: [amundsen-io/rfcs#0000](https://github.com/amundsen-io/rfcs/pull/0000) (after opening the RFC PR, update this with a link to it and update the file name)

- RFC PR: [amundsen-io/rfcs#0010](https://github.com/amundsen-io/rfcs/pull/10)

feng-tao · 2020-11-04T17:03:58Z

@crazy-2020 Is the rfc ready to take a look (e.g how's ORM model look like compared to the graph model)? If so, I will take another look. Sorry for the late.

xuan616 · 2020-11-04T18:00:12Z

Thanks @feng-tao yes, it is ready for review. Please let me know if you have any concern.

criccomini

This looks like a good draft! Will wait for Amundsen folks to weigh in.

criccomini · 2020-11-04T21:34:03Z

000-orm-in-databuilder.md

+- RFC PR: [amundsen-io/rfcs#0010](https://github.com/amundsen-io/rfcs/pull/10)
+- Amundsen Issue: [amundsen-io/amundsen#0000](https://github.com/amundsen-io/amundsen/issues/0000) (leave this empty for now)
+
+# <RFC title>


Nit: needs a title

feng-tao

also to make development easier, it would be good to put the ORM/alembic into a separate repo like amundsen-io/amundsenrds (name could be you decided)? I could add a team for you to have access.

feng-tao · 2020-11-07T05:06:14Z

000-orm-in-databuilder.md

+N/A
+## Reference-level Explanation (aka Technical Details)
+
+- Still use the existing models but add model translators to support ORM schema definition and map node/relation data to the record that can be better consumed by relational DBs. 


FYI amundsen-io/amundsendatabuilder#380 which makes databuilder record more generic is just merged

feng-tao · 2020-11-07T05:11:07Z

000-orm-in-databuilder.md

+
+    - metric(metric_key, metric_name, metric_description_key, metric_description, metric_type_key, metric_type_name, dashboard_key, dashboard_name)
+
+    - table(table_key, database_key, database_name, cluster_key, cluster_name, schema_key, schema_name, schema_description_key, schema_description, schema_programmatic_description_key, schema_programmatic_description, table_name, table_is_view, table_description_key, table_description, table_programmatic_description_key, table_programmatic_description, last_updated_key, last_updated_timestamp, last_updated_timestamp_name,


is the table single table with that schema? what is the primary key? also some of the fields may be gathered from other sources e.g programmatic descriptions?

I wonder how do you get all the attributes in single extraction?

One alternative is that instead of having a denormalized schema like this. We could have the same schema as graph model:
E.G table model will only have https://github.com/amundsen-io/amundsendatabuilder/blob/d91c0c562c6e560dbc2a687e5dfb0d4cbc4a1c88/databuilder/models/table_metadata.py#L408-L422 which the model stays the same as graph model.

Pro side: 1) less change for databuilder; We build PK on label like what we have in graph model, we put FK relationship when serde the graph relationship. 2) Easier to implement
Con side: it will have multiple joins with less performance. I think it should be ok given we only need all the metadata for single table / single entity every time.

table_key is the pk of 'table'. Metadata from multiple extraction will update table.
I agree with you on keeping orm model same as graph model for those Pros if the use case is only about a single entity every time. I will update the orm schema accordingly in rfc.

Do you mean treating each relationship as an entity to make implementation easier?(e.g., table(key, name, is_view), schema_table(schema_key, table_key), schema(key, name))? Thanks @feng-tao

For relationship, I am thinking of just solving it with FK instead of a separate table.

feng-tao · 2020-11-07T05:11:33Z

000-orm-in-databuilder.md

+
+    - tag(tag_key, tag_type, entity_key)
+
+    - lineage(upstream_key, downstream_key)


don't worry about this one for now as it is still being updated.

feng-tao · 2020-11-07T05:11:59Z

000-orm-in-databuilder.md

+
+    - badge(badge_key, badge_category, entity_key)
+
+- Due to the reuse of graph db models, add new transformers for entities to convert 'node' and 'relationship' data to 'record' with model translators. 


will you convert the graph relationship into FK?

yes, will add FK from relationship

feng-tao · 2020-11-07T05:14:33Z

000-orm-in-databuilder.md

+## Unresolved questions
+
+- The initial relational DB schema plan was made according to the current graph DB models and it may need some polishment from Amundsen staff who are more familiar with real use cases.
+- Regarding how to map graph db models to ORM models, it would be great if there is a dynamic way for the model conversion


+1, but aside from performance, some of the metadata can't be fetched/gathered in single extraction as they could come from different sources.

I am mostly curious on how to get all those info into single table row with just one extraction.

feng-tao · 2020-11-07T05:16:46Z

also the main unknown questions would be how do you gather all the metadata just with single extraction for the ORM model as it is pretty denormalized. Or what will be the PK for model? and how do you update / cleanup the tables.

feng-tao · 2020-11-07T05:23:26Z

another question is that will you run the db upgrade/db migration in metadata proxy?

xuan616 · 2020-11-09T06:01:23Z

@feng-tao Thanks for view. I will update the orm model to make it same as the existing graph db models and it will be with marked PKs. For db migration, may I know the pro of running db upgrade in metadata proxy? or should it be in databuilder or the coming separate orm repo?

feng-tao · 2020-11-10T05:53:44Z

normally you bootstrap the db upgrade in a service (e..g metadata) bootstrap stage while databuilder is used mostly as a library in Airflow or any other workflow ETL orchestration . I'am thinking of the ORM db repo as a library which we could reference it in both databuilder and metadata.

@feng-tao Thanks for view. I will update the orm model to make it same as the existing graph db models and it will be with marked PKs. For db migration, may I know the pro of running db upgrade in metadata proxy? or should it be in databuilder or the coming separate orm repo?

feng-tao · 2020-11-10T05:58:43Z

Based on offline discussion, we may also want to consider how to clean up the db ORM tables instead of pure upsert to handle stale metadata.

feng-tao · 2020-11-11T07:26:00Z

https://github.com/amundsen-io/amundsenrds is created

Golodhros

LGTM

dorianj · 2020-12-03T01:13:45Z

Am I correct in reading that the overall intention is to support RDBMS's as a backend alongside neo4j, Atlas, and Neptune?

If so, is it possible to start RFCing the changes needed for the rest of the systems to support such a world, before we get deep into implementation on this particular portion? The databuilder piece is a totally reasonable way to start, but I think some discussion on how this fits into the architecture, and what commitments we're making by doing so, would be worth considering.

WDYT?

feng-tao · 2020-12-03T01:37:29Z

@Golodhros I think @crazy-2020 is still iterating the RFC, but in high level, we shouldn't merge it until it is finalized.

xuan616 · 2020-12-03T02:36:21Z

yeah, I am still working on proving the viability of possible changes in databuilder to support mysql as backend store. According to the comments in this PR, I am making orm models similar to the existing graph db ones to make the implementation easier. Since the generic graph db relationship/node/serializer was recently merged, I will change the implementation details accordingly. I am going to add RFCs for different modules(e.g., orm models, new serializer/dataloader/publisher). @Golodhros Could you help undo the merge, or you can remove the merged file and I will add
a new RFC for orm models in next PR. No worries, I am okay with either way. Thanks.

Golodhros · 2020-12-03T03:57:35Z

@Golodhros I think @crazy-2020 is still iterating the RFC, but in high level, we shouldn't merge it until it is finalized.

Oh, I see, my misunderstanding as I saw it as "Active".
I will revert it and reopen.

This reverts commit b327e7d.

This reverts commit b327e7d. Signed-off-by: Marcos Iglesias <[email protected]>

Golodhros · 2020-12-03T04:06:53Z

Sorry @crazy-2020, I have reverted the commit but I don't know how to restore this PR keeping the comments. I guess you would need to create a new PR. Sorry for the inconvenience!

xuan616 · 2020-12-03T17:54:38Z

Sorry @crazy-2020, I have reverted the commit but I don't know how to restore this PR keeping the comments. I guess you would need to create a new PR. Sorry for the inconvenience!

Np, I can add a new one

* Initial proposal to support orm in databuilder * Add owner and usage in the schema section * Add dashboard and badge in the schema section * Fix some typos Signed-off-by: Allison Suarez Miranda <[email protected]>

This reverts commit b327e7d. Signed-off-by: Marcos Iglesias <[email protected]> Signed-off-by: Allison Suarez Miranda <[email protected]>

* Initial proposal to support orm in databuilder * Add owner and usage in the schema section * Add dashboard and badge in the schema section * Fix some typos Signed-off-by: Allison Suarez Miranda <[email protected]>

This reverts commit b327e7d. Signed-off-by: Marcos Iglesias <[email protected]> Signed-off-by: Allison Suarez Miranda <[email protected]>

Initial proposal to support orm in databuilder

ea0c3bc

xuan616 requested a review from a team as a code owner October 26, 2020 17:02

Golodhros reviewed Oct 27, 2020

View reviewed changes

xuan616 added 2 commits November 2, 2020 16:56

Add owner and usage in the schema section

d1a229c

Add dashboard and badge in the schema section

36c2854

criccomini reviewed Nov 6, 2020

View reviewed changes

xuan616 changed the title ~~Initial proposal to support orm in databuilder~~ Support orm in databuilder Nov 6, 2020

Fix some typos

8ccfc40

feng-tao reviewed Nov 7, 2020

View reviewed changes

Golodhros added Status: Draft Status: Pending and removed Status: Draft labels Nov 12, 2020

feng-tao added Status: Active and removed Status: Pending labels Dec 3, 2020

feng-tao mentioned this pull request Dec 3, 2020

docs: Updated Roadmap document amundsen-io/amundsen#827

Merged

3 tasks

Golodhros approved these changes Dec 3, 2020

View reviewed changes

Golodhros merged commit b327e7d into amundsen-io:master Dec 3, 2020

Golodhros pushed a commit that referenced this pull request Dec 3, 2020

Revert "Support orm in databuilder (#10)"

9b94a98

This reverts commit b327e7d.

Golodhros pushed a commit that referenced this pull request Dec 3, 2020

Revert "Support orm in databuilder (#10)"

f1633aa

This reverts commit b327e7d. Signed-off-by: Marcos Iglesias <[email protected]>

Golodhros added a commit that referenced this pull request Dec 3, 2020

Revert "Support orm in databuilder (#10)" (#16)

660fcc4

This reverts commit b327e7d. Signed-off-by: Marcos Iglesias <[email protected]>

allisonsuarez pushed a commit that referenced this pull request Feb 23, 2021

Revert "Support orm in databuilder (#10)" (#16)

df3f135

This reverts commit b327e7d. Signed-off-by: Marcos Iglesias <[email protected]> Signed-off-by: Allison Suarez Miranda <[email protected]>

allisonsuarez pushed a commit that referenced this pull request May 5, 2021

Revert "Support orm in databuilder (#10)" (#16)

b490c47

This reverts commit b327e7d. Signed-off-by: Marcos Iglesias <[email protected]> Signed-off-by: Allison Suarez Miranda <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support orm in databuilder #10

Support orm in databuilder #10

xuan616 commented Oct 26, 2020

Golodhros Oct 27, 2020

feng-tao commented Nov 4, 2020 •

edited

Loading

xuan616 commented Nov 4, 2020

criccomini left a comment

criccomini Nov 4, 2020

feng-tao left a comment

feng-tao Nov 7, 2020

feng-tao Nov 7, 2020

feng-tao Nov 7, 2020

xuan616 Nov 9, 2020 •

edited

Loading

feng-tao Nov 10, 2020 •

edited

Loading

feng-tao Nov 7, 2020

feng-tao Nov 7, 2020

xuan616 Nov 9, 2020

feng-tao Nov 7, 2020

feng-tao commented Nov 7, 2020

feng-tao commented Nov 7, 2020

xuan616 commented Nov 9, 2020

feng-tao commented Nov 10, 2020

feng-tao commented Nov 10, 2020 •

edited

Loading

feng-tao commented Nov 11, 2020

Golodhros left a comment

dorianj commented Dec 3, 2020

feng-tao commented Dec 3, 2020 •

edited

Loading

xuan616 commented Dec 3, 2020

Golodhros commented Dec 3, 2020

Golodhros commented Dec 3, 2020

xuan616 commented Dec 3, 2020

	- RFC PR: [amundsen-io/rfcs#0000](https://github.com/amundsen-io/rfcs/pull/0000) (after opening the RFC PR, update this with a link to it and update the file name)
	- RFC PR: [amundsen-io/rfcs#0010](https://github.com/amundsen-io/rfcs/pull/10)


		- metric(metric_key, metric_name, metric_description_key, metric_description, metric_type_key, metric_type_name, dashboard_key, dashboard_name)

		- table(table_key, database_key, database_name, cluster_key, cluster_name, schema_key, schema_name, schema_description_key, schema_description, schema_programmatic_description_key, schema_programmatic_description, table_name, table_is_view, table_description_key, table_description, table_programmatic_description_key, table_programmatic_description, last_updated_key, last_updated_timestamp, last_updated_timestamp_name,


		- tag(tag_key, tag_type, entity_key)

		- lineage(upstream_key, downstream_key)


		- badge(badge_key, badge_category, entity_key)

		- Due to the reuse of graph db models, add new transformers for entities to convert 'node' and 'relationship' data to 'record' with model translators.

Support orm in databuilder #10

Support orm in databuilder #10

Conversation

xuan616 commented Oct 26, 2020

Choose a reason for hiding this comment

feng-tao commented Nov 4, 2020 • edited Loading

xuan616 commented Nov 4, 2020

criccomini left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

feng-tao left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xuan616 Nov 9, 2020 • edited Loading

Choose a reason for hiding this comment

feng-tao Nov 10, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

feng-tao commented Nov 7, 2020

feng-tao commented Nov 7, 2020

xuan616 commented Nov 9, 2020

feng-tao commented Nov 10, 2020

feng-tao commented Nov 10, 2020 • edited Loading

feng-tao commented Nov 11, 2020

Golodhros left a comment

Choose a reason for hiding this comment

dorianj commented Dec 3, 2020

feng-tao commented Dec 3, 2020 • edited Loading

xuan616 commented Dec 3, 2020

Golodhros commented Dec 3, 2020

Golodhros commented Dec 3, 2020

xuan616 commented Dec 3, 2020

feng-tao commented Nov 4, 2020 •

edited

Loading

xuan616 Nov 9, 2020 •

edited

Loading

feng-tao Nov 10, 2020 •

edited

Loading

feng-tao commented Nov 10, 2020 •

edited

Loading

feng-tao commented Dec 3, 2020 •

edited

Loading