Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIP-84 Refactor Handling of Insert Duplicates #44322

Conversation

jason810496
Copy link
Contributor

@jason810496 jason810496 commented Nov 24, 2024

related: #44121 (comment)

It seems that only the Post Connection endpoint has an extra query to handle duplicate entries. Other 409_CONFLICT cases don't handle inserting duplicate entries on unique constraint columns.

@jason810496
Copy link
Contributor Author

jason810496 commented Nov 27, 2024

In summary, this PR will only add orig and statement to the response error details.

Regexp can be not super reliable, also it will depends on the error message of the database (which can change between releases and versions).

I will still explore the regex or string-matching approach in a follow-up PR.
From my perspective, database error messages do not change frequently ( also we won't update the database driver version very often I think ). Even if the error message changes after upgrading the database driver, fixing it in one global handler would not be a significant issue.

@jason810496 jason810496 marked this pull request as draft November 29, 2024 07:10
@jason810496 jason810496 force-pushed the feature/AIP-84/refactor-database-unique-constraint branch from da86894 to e7c68b3 Compare December 7, 2024 17:31
@jason810496 jason810496 marked this pull request as ready for review December 8, 2024 08:45
@jason810496 jason810496 changed the title AIP-84 Refactor Handling of Insert Duplicates for Post Connection AIP-84 Refactor Handling of Insert Duplicates Dec 8, 2024
@jason810496
Copy link
Contributor Author

jason810496 commented Dec 8, 2024

Hi @pierrejeambrun, @ephraimbuddy,

I’ve refactored the _UniqueConstraintErrorHandler to provide unified error details across all database dialects by parsing errors with re2. If the parsing fails, it defaults to returning the original database error.

I’ve also added tests for the unique constraint handler, covering the following cases:

  • Inserting a duplicate entry for a single-column unique constraint (e.g., Pool, Variable).
  • Inserting a duplicate entry for a multi-column unique constraint (e.g., dag_id and run_id pair in DagRun).
  • Handling cases where parsing the database error message fails.

Looking forward to your feedback and suggestions!

@jason810496 jason810496 force-pushed the feature/AIP-84/refactor-database-unique-constraint branch from 1498fc1 to 99de599 Compare December 12, 2024 16:19
@jason810496
Copy link
Contributor Author

jason810496 commented Dec 12, 2024

Rebase to latest main, looking forward to feedback.
cc @pierrejeambrun , @ephraimbuddy

Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a lot of hardcoded values, regexp and manual string manipulation etc.
This code will likely break when the db version update the message or when a column name / values contains special character that we did not account for. The parsing could still fail.

Also that's a lot of code to maintain, and it's not super welcoming.

I'm not in favour of this change.

I still think that at first just returning the plain orig and statement is enough for a user to understand. And for tests we just assert that those keys are present in the response.

airflow/api_fastapi/common/exceptions.py Outdated Show resolved Hide resolved
@kaxil
Copy link
Member

kaxil commented Dec 13, 2024

I think there are a lot of hardcoded values, regexp and manual string manipulation etc. This code will likely break when the db version update the message or when a column name / values contains special character that we did not account for. The parsing could still fail.

Also that's a lot of code to maintain, and it's not super welcoming.

I'm not in favour of this change.

I still think that at first just returning the plain orig and statement is enough for a user to understand. And for tests we just assert that those keys are present in the response.

Yup, I agree, it looks too brittle unfortunately.

Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the effort @jason810496 in having these error messages parsed.

I don't think that's the best approach and I wouldn't invest more time trying to make this more reliable because it's very unlikely we do.

The best we can do at this point is to return the plain orig and maybe also statement in the response details. This will enable power users to actually understand why the resquest is conflicting. (parameters fields should not be exposed in the response)

@jason810496 jason810496 force-pushed the feature/AIP-84/refactor-database-unique-constraint branch from 99de599 to 2880fe0 Compare December 14, 2024 07:18
@jason810496
Copy link
Contributor Author

jason810496 commented Dec 14, 2024

Thank you for the feedback, @pierrejeambrun and @kaxil!
I realize I overlooked the maintenance effort required for directly parsing errors from different dialects.
( Hope we’ll see a PEP someday that standardizes the error details across various DBAPI drivers, then we could adopt the parsing solution. )

Just refactored the implementation to include only the statement and orig in the error response.

Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@pierrejeambrun pierrejeambrun merged commit b3a384e into apache:main Dec 16, 2024
49 checks passed
LefterisXefteris pushed a commit to LefterisXefteris/airflow that referenced this pull request Jan 5, 2025
* Refactor PostConnection handling insert duplicate

* Add parsing error detail to error handler with test

* Add parsing multiple columns case in unique constraint

* Add multiple columns, parsing error cases in test

* Fix public API already existed test cases

* Refactor: remove parsing, return statement and orig

* Refactor test_exceptions with statement and orign

* Fix unique constriant violation response in api test
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
* Refactor PostConnection handling insert duplicate

* Add parsing error detail to error handler with test

* Add parsing multiple columns case in unique constraint

* Add multiple columns, parsing error cases in test

* Fix public API already existed test cases

* Refactor: remove parsing, return statement and orig

* Refactor test_exceptions with statement and orign

* Fix unique constriant violation response in api test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants