Webserver fails to pull secrets from Hashicorp Vault on start up #26185
Thanks for opening your first issue here! Be sure to follow the issue template!
@notatallshaw-work @Wilder60 I don't use approle, but this is mysterious. Are there any other clues you can provide? Is all other configuration at default values? Do you get a stack trace when the workers crash?
Yes, we do get a stack trace. Here is the full exception log (I've had to sanitize some of the values): https://gist.github.com/notatallshaw-work/6c1a77b4341caa653a231190bd64cdaa Running just 1 worker or reverting the PR both fix this exception.
Yeah, I don't really understand how/why that might be happening, but that's clearly a problem. It's possible that this change is too fundamental to be safe for all configurations, and we should just get rid of this caching and deal with the multiple logins. @potiuk - what do you think? Should I just revert this caching change altogether?
I think so. It has proven to be more problematic than we both thought, and the benefit is limited. I think that using the external caching mechanisms of the respective secrets providers is a much more robust approach. We should treat it as a failed experiment.
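The suspected failure mode can be sketched outside Airflow (a minimal illustration only, assuming the problem is a client cached in the gunicorn master and inherited by forked workers; `get_client_naive` and `get_client_per_process` are hypothetical stand-ins, not Airflow code):

```python
import os
from functools import lru_cache


# Naive module-level cache: the object is built once in the parent
# (e.g. the gunicorn master) and then inherited by every forked worker.
# If it wraps a live, authenticated connection, all workers end up
# sharing one socket/token, which can surface as workers crashing.
@lru_cache(maxsize=None)
def get_client_naive():
    # A dict stands in for an authenticated Vault client.
    return {"owner_pid": os.getpid()}


# Safer alternative: key the cache by PID, so each forked worker
# performs its own login and owns its own connection.
_clients = {}


def get_client_per_process():
    pid = os.getpid()
    if pid not in _clients:
        _clients[pid] = {"owner_pid": pid}  # fresh client per worker
    return _clients[pid]
```

After a fork, the naive cache still hands back the parent's client, while the per-process cache builds a fresh one for the child; removing the caching entirely, as #26223 does, sidesteps the problem in the simplest way.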
Okay, just opened #26223 to remove this caching. |
Merged. I think good learning for everyone :) |
FYI I wonder if the behavior would change with this PR: #27297 With each gunicorn worker preloaded perhaps the secrets would be sent back to the correct worker? |
Worth checking. |
Apache Airflow version
2.3.4
What happened
Since upgrading to Airflow 2.3.4, our webserver fails to pull secrets from our Vault instance on startup. Setting AIRFLOW__WEBSERVER__WORKERS = 1 allowed the webserver to start up successfully, but reverting the change added here https://github.com/apache/airflow/pull/25556 was the only way we found to fix the issue without adjusting the webserver's worker count.
What you think should happen instead
The Airflow webserver should be able to successfully read from Vault with AIRFLOW__WEBSERVER__WORKERS > 1.
How to reproduce
Start a webserver instance configured to authenticate with Vault using the approle method, with AIRFLOW__DATABASE__SQL_ALCHEMY_CONN_SECRET and AIRFLOW__WEBSERVER__SECRET_KEY_SECRET set. The webserver should fail to initialize all of the gunicorn workers and exit.
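For reference, a secrets-backend configuration exercising this path would look roughly like the following `airflow.cfg` fragment (a sketch only; the URL, mount point, and approle credentials are placeholders, not values from this report):

```ini
[secrets]
backend = airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs = {"url": "http://vault:8200", "auth_type": "approle", "role_id": "<role-id>", "secret_id": "<secret-id>", "mount_point": "airflow"}
```

The AIRFLOW__DATABASE__SQL_ALCHEMY_CONN_SECRET and AIRFLOW__WEBSERVER__SECRET_KEY_SECRET environment variables then name the keys that each gunicorn worker fetches from this backend at startup, which is the point where the failure occurs.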
Operating System
Fedora 29
Versions of Apache Airflow Providers
apache-airflow-providers-hashicorp==3.1.0
Deployment
Docker-Compose
Deployment details
Python 3.9.13
Vault 1.9.4
Anything else
None
Are you willing to submit PR?
Code of Conduct