[Spike] Explore integration with Dagster #3180
Comments
I'm not clear what the ticket here is for. Is this documentation along the lines of #2817 ? |
I think it involves more of a spike to work out how it would actually work. I think Flyte (LFAI), Dagster and Metaflow all fall into the modern orchestrator space, which isn't served by Kedro. I would also push for addressing some of the fundamentals outlined in #3094 before doing this. |
Thanks! But in that case, it's not a docs ticket so I'll remove the label. |
Thanks both - yeah, initially I thought about it as a docs ticket (even though the phrasing didn't match) but you're right, this should be a spike first. And good point @datajoely on looking at Flyte and Metaflow too (let's call them Tier 3), although both have about 0.1x the PyPI downloads of Dagster, so I wouldn't consider them on the same level of adoption. For reference, Dagster and Prefect (Tier 2) have about the same number of downloads, and both have about 0.05x the downloads of Airflow (Tier 1). Kedro lies between Tier 2 and Tier 3 at the moment. |
Aligned - I also think Dagster is closer to Kedro than the others in terms of granularity. In recent years they've really invested in their dbt integration and perhaps we can take inspiration in how they've done that. |
I never explored Dagster as much as I should have. I really like the idea of software-defined assets. However, Dagster looks complicated, as it has many concepts to understand. I'm also not sure how individual tasks run (especially in a Kubernetes context). |
@gtauzin experimenting with Kedro & Dagster! https://github.com/gtauzin/kedro-spaceflights-dagster |
Hey! Thanks @astrojuanlu for pinging me, it's nice to see some interest in a dagster integration! It seems to me kedro and dagster are nicely complementary:
I also feel, like @MatthiasRoels, that dagster has a lot of concepts. Each of them separately is not necessarily complex, but the way they relate to each other is not always clear to me from the documentation (and the chatbot in there has confused me more than anything else so far). For example, there are several ways of mapping kedro to dagster because dagster has many concepts around generic tasks:
In practice, to map kedro nodes, I believe multi assets would make sense even in the case of a node that does not have any outputs (and therefore does not define any assets). This is because ops are second-rate citizens in dagster: they do not even appear on the DAG visualization (the global asset lineage) in the UI, but are presented in the form of a list lost somewhere in a menu. In the case of the spaceflights example, the last node, "evaluate_model_node", does not have any outputs. Defining it as a multi_asset with a corresponding intangible asset makes it part of the asset DAG. This small project is a way for me to deepen my understanding of both kedro and dagster, and it is also something I am planning on using for work in the near future. So if you're interested or are also looking into it, don't hesitate to ping me on the kedro slack, I'd be happy to discuss it more. |
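To make the mapping idea above a bit more concrete, here is a minimal, library-free sketch in plain Python. It deliberately does not call the Kedro or Dagster APIs: `NodeSpec`, `plan_assets` and the `__done` suffix for the placeholder asset are all hypothetical names made up for illustration, and the dataset names are assumed to match the spaceflights starter. It only shows the planning step: every node becomes one multi-asset-like entry, and a node without outputs is given a single intangible asset so that it would still show up in an asset graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical stand-in for a Kedro node: a name plus dataset names."""

    name: str
    inputs: tuple
    outputs: tuple


def plan_assets(nodes):
    """Map each node to the assets it would define and the assets it depends on."""
    plan = {}
    for node in nodes:
        # A node with no outputs gets one placeholder ("intangible") asset,
        # so it can still be represented as part of the asset DAG.
        defined = node.outputs or (f"{node.name}__done",)
        plan[node.name] = {
            "defines": list(defined),
            "depends_on": list(node.inputs),
        }
    return plan


# Last two nodes of the spaceflights example (dataset names are assumptions).
spaceflights_tail = [
    NodeSpec("train_model_node", ("X_train", "y_train"), ("regressor",)),
    NodeSpec("evaluate_model_node", ("regressor", "X_test", "y_test"), ()),
]

plan = plan_assets(spaceflights_tail)
for node_name, entry in plan.items():
    print(node_name, entry)
```

In an actual integration, each entry of such a plan would presumably be turned into a Dagster multi-asset definition; how best to do that is exactly what the spike would need to work out. |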
Super cool work! |
Description
I have heard from several data people that they're happy with Dagster, which is probably the only "modern", widely used orchestrator that is not mentioned in our docs.
There was a request upstream to add a Kedro integration to Dagster (dagster-io/dagster#2062), but it's unclear what finally happened.