Rewrite how dag to dataset / dataset alias are stored #41987
Cautious approve :)
Doesn’t airflow/airflow/datasets/__init__.py Lines 248 to 251 in cb7697f
Yep, but |
I was thinking it’s weird we’re using DatasetModel here… (DatasetManager actually converts them back to Dataset for a user hook too) I was having problems with this function when adding |
I think the main reason is that we need to check the stored dataset models. Do you want me to continue working on this one, or wait for your PR? Without it, some edge cases might fail.
Hmm, wait, this still does not make sense even considering we’re using DatasetModel. It’s probably best to come up with a test case here first to identify the actual issue. I don’t think calling |
This is the DAG that fails, and yep, I think the solution is incorrect. That's why I converted it back to draft.
I think I found the root cause. Will push shortly.
This happened because some dataset / dataset alias subclasses might not be hashable. I rewrote how the dag ref is handled.
If this is already passing, maybe we can also try to merge it first and do the related changes in a different PR. Do you think this is ready to be merged?
Yep, I think this one is ready to be merged.
…s, we should use sanitized uri instead
…es in bulk_write_to_db some dataset model variables were named as datasets which could be confusing
it used to save the whole object in a set, but in some edge cases a customized dataset might not be hashable, and adding it to the set will fail
One nit (actually two but on the same topic), looks good to me.
Meh, I’m going to just merge this since I have plans to refactor the entire function anyway. I can change the thing I want when that happens.
Thanks! @uranusjr I think this fix is needed for 2.10, so I'm going to backport now.
Why
If someone subclasses a dataset or a dataset alias and makes the subclass unhashable, the dag ref cannot be handled, because the whole object used to be stored in a set.
What
Save the data needed in the dag ref in (type, URI or name) format instead of saving the whole object.
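A minimal sketch of the idea, not the actual Airflow code: `Dataset`, `DatasetAlias`, `CustomDataset`, and `dag_refs` below are hypothetical stand-ins, and the `"dataset"` / `"dataset-alias"` type strings are assumptions made for illustration. It shows why a subclass that overrides `__eq__` without `__hash__` cannot be put in a set, and how storing plain (type, URI or name) tuples sidesteps that.

```python
class Dataset:
    """Stand-in for a dataset identified by its URI (hypothetical)."""

    def __init__(self, uri):
        self.uri = uri

    def __eq__(self, other):
        return isinstance(other, Dataset) and self.uri == other.uri

    def __hash__(self):
        return hash(self.uri)


class DatasetAlias:
    """Stand-in for a dataset alias identified by its name (hypothetical)."""

    def __init__(self, name):
        self.name = name


class CustomDataset(Dataset):
    # Overriding __eq__ without defining __hash__ sets __hash__ to None,
    # so instances of this subclass are unhashable.
    def __eq__(self, other):
        return isinstance(other, CustomDataset) and self.uri == other.uri


def dag_refs(dag_id, outlets):
    """Collect refs as tuples of strings rather than the objects themselves."""
    refs = set()
    for obj in outlets:
        if isinstance(obj, DatasetAlias):
            refs.add((dag_id, "dataset-alias", obj.name))
        else:
            refs.add((dag_id, "dataset", obj.uri))
    return refs


custom = CustomDataset("s3://bucket/key")

# Putting the object itself in a set fails for the unhashable subclass.
try:
    {custom}
    hashable = True
except TypeError:
    hashable = False

# Tuples of plain strings are always hashable, whatever the subclass does.
refs = dag_refs("my_dag", [Dataset("s3://a"), custom, DatasetAlias("alias-1")])
```

Because only strings end up in the set, the behaviour no longer depends on how a user-defined subclass implements `__eq__` or `__hash__`.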