Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Models] [Postgres] Check if the dynamically-added index is in the table schema before adding #32731

Merged
merged 5 commits into from
Aug 14, 2023

Conversation

ohaibbq
Copy link
Contributor

@ohaibbq ohaibbq commented Jul 20, 2023

We are utilizing the new dag.test() functionality with a Postgres-backed SQLAlchemy engine/session. We use the airflow.utils.initdb() and airflow.utils.resetdb() functions for test case setup/teardown.

When the schema is created for the second time, these event listeners add duplicate indexes to the SQLAlchemy Table definition, which causes the metadata.create_all() function to fail with a DatabaseError as it issues multiple CREATE UNIQUE INDEX statements for idx_ab_user_username and idx_ab_register_user_username.

This updates the listeners to check if the indexes have already been added before adding them to the Table definition.

@boring-cyborg boring-cyborg bot added the area:webserver Webserver related Issues label Jul 20, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 20, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@ohaibbq ohaibbq requested a review from uranusjr July 21, 2023 16:30
@ohaibbq
Copy link
Contributor Author

ohaibbq commented Jul 21, 2023

Hi @uranusjr, thanks for the review!

I wanted to add some context to this error while I have some eyes on the issue. We're manually configuring the engine in test via AIRFLOW__DATABASE__SQL_ALCHEMY_CONN and settings.initialize().

This step seems superfluous as dag.test is supposed to accept a session (#29803), however, I was running into an error where a TaskInstance was not found since the session provided to TaskInstance._run_raw_task was not used in get_template_context. Is this a known issue?

https://github.com/apache/airflow/blob/main/airflow/models/taskinstance.py#L1499

@uranusjr
Copy link
Member

Please fix the static check failures. (I think other failures are unrelated)

@eladkal
Copy link
Contributor

eladkal commented Aug 7, 2023

@ohaibbq can you resolve conflicts and rebase?

@eladkal eladkal added this to the Airflow 2.7.1 milestone Aug 7, 2023
@eladkal eladkal added the type:bug-fix Changelog: Bug Fixes label Aug 7, 2023
@ohaibbq ohaibbq force-pushed the dan/table-index-check branch from 66b95d3 to 0246e74 Compare August 14, 2023 05:19
@ohaibbq ohaibbq force-pushed the dan/table-index-check branch from 7f8fba3 to 597d0b8 Compare August 14, 2023 06:50
@potiuk potiuk merged commit 2950fd7 into apache:main Aug 14, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Aug 14, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

ferruzzi pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Aug 17, 2023
…ble schema before adding (apache#32731)

* Check if the index is in the table schema before adding

* add pre-condition assertion

* static checks

* Update test_models.py

* integrate upstream auth manager changes
ephraimbuddy pushed a commit that referenced this pull request Aug 28, 2023
…ble schema before adding (#32731)

* Check if the index is in the table schema before adding

* add pre-condition assertion

* static checks

* Update test_models.py

* integrate upstream auth manager changes

(cherry picked from commit 2950fd7)
helperbot-recidiviz pushed a commit to Recidiviz/pulse-data that referenced this pull request May 29, 2024
…iviz/recidiviz-data#29712)

## Description of the change

Recidiviz/recidiviz-data#27852 upgraded to `composer-2.6.2-airflow-2.6.3` which failed during
the tf step and was reverted in Recidiviz/recidiviz-data#29696

The commits in this PR are as follows --
1. un-reverts Recidiviz/recidiviz-data#27852
2. upgrades to `composer-2.7.1-airflow-2.7.3`, copying requirements over
from [version
page](https://cloud.google.com/composer/docs/concepts/versioning/composer-versions#images)
3.  airflow pipenv lock workflow
4. removes a few imports and checks since
[Recidiviz/recidiviz-data#32731](apache/airflow#32731) opened by our
very own @ohaibbq is included in 2.7.1

read through the [release
notes](https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html#airflow-2-7-0-2023-08-18),
nothing in our usage seems to be deprecated; there are a few things that
might be of interest to folks:
- _Create metrics to track Scheduled->Queued->Running task state
transition times
([Recidiviz/recidiviz-data#30612](apache/airflow#30612 will be nice
for us to be able to see this re: k8s spin up time
- _Add OpenTelemetry to Airflow
([AIP-49](https://github.com/apache/airflow/pulls?q=is%3Apr+is%3Amerged+label%3AAIP-49+milestone%3A%22Airflow+2.7.0%22))_
- _The trigger UI form is skipped in web UI if no parameters are defined
in a DAG ([Recidiviz/recidiviz-data#33351](apache/airflow#33351: not
sure if folks run ad-hoc dags using the UI instead of
`trigger_state_specific_calculation/ingest_dag` but we need to add
`show_trigger_form_if_no_params` if we want to still be able to trigger
via the UI

would love a second set of eyes around these warning logs that occur
after tests run, see
[here](https://github.com/Recidiviz/recidiviz-data/actions/runs/9071731702/job/24926024866?pr=29712#step:6:53).
also occurred locally but i couldn't isolate the cause of the issue
```
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/logging/__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/runner/.local/share/virtualenvs/airflow-jCIRjM3K/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 509, in <lambda>
    and _finalize_fairy(
  File "/home/runner/.local/share/virtualenvs/airflow-jCIRjM3K/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 791, in _finalize_fairy
    pool.logger.error(
Message: 'Exception during reset or similar'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/home/runner/.local/share/virtualenvs/airflow-jCIRjM3K/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 763, in _finalize_fairy
    fairy._reset(pool, transaction_was_reset)
  File "/home/runner/.local/share/virtualenvs/airflow-jCIRjM3K/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 1038, in _reset
    pool._dialect.do_rollback(self)
  File "/home/runner/.local/share/virtualenvs/airflow-jCIRjM3K/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 683, in do_rollback
    dbapi_connection.rollback()
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.11.8/x64/lib/python3.11/logging/__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/runner/.local/share/virtualenvs/airflow-jCIRjM3K/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 509, in <lambda>
    and _finalize_fairy(
  File "/home/runner/.local/share/virtualenvs/airflow-jCIRjM3K/lib/python3.11/site-packages/sqlalchemy/pool/base.py", line 791, in _finalize_fairy
    pool.logger.error(
Message: 'Exception during reset or similar'
Arguments: ()
```

## Type of change

> All pull requests must have at least one of the following labels
applied (otherwise the PR will fail):

| Label | Description |
|-----------------------------
|-----------------------------------------------------------------------------------------------------------
|
| Type: Bug | non-breaking change that fixes an issue |
| Type: Feature | non-breaking change that adds functionality |
| Type: Breaking Change | fix or feature that would cause existing
functionality to not work as expected |
| Type: Non-breaking refactor | change addresses some tech debt item or
prepares for a later change, but does not change functionality |
| Type: Configuration Change | adjusts configuration to achieve some end
related to functionality, development, performance, or security |
| Type: Dependency Upgrade | upgrades a project dependency - these
changes are not included in release notes |

## Related issues

Closes Recidiviz/recidiviz-data#29554

## Checklists

### Development

**This box MUST be checked by the submitter prior to merging**:
- [x] **Double- and triple-checked that there is no Personally
Identifiable Information (PII) being mistakenly added in this pull
request**

These boxes should be checked by the submitter prior to merging:
- [ ] Tests have been written to cover the code changed/added as part of
this pull request

### Code review

These boxes should be checked by reviewers prior to merging:

- [x] This pull request has a descriptive title and information useful
to a reviewer
- [x] Potential security implications or infrastructural changes have
been considered, if relevant

---------

Co-authored-by: Helper Bot <[email protected]>
GitOrigin-RevId: 355d050265f490df1779ff773533f5aefd41de6b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:webserver Webserver related Issues type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants