orms: support for Apache Airflow #32537

Open
miry opened this issue Nov 21, 2018 · 13 comments
Labels
C-investigation: Further steps needed to qualify. C-label will change.
meta-issue: Contains a list of several other issues.
O-community: Originated from the community
T-sql-foundations: SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@miry

miry commented Nov 21, 2018

I tested python cockroachdb version 2.1 with Airflow.

Collecting cockroachdb
  Downloading https://files.pythonhosted.org/packages/1c/7d/063bd5cc3ffe13561ded9eddbe736d795e984a7a9dfe5d0eca0986134f6e/cockroachdb-0.2.1.tar.gz

It fails on migration: https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/2e541a1dcfed_task_duration.py

With exception: sqlalchemy construct has no default compilation handler.

Jira issue: CRDB-4737

@rytaft rytaft added C-investigation Further steps needed to qualify. C-label will change. O-community Originated from the community labels Nov 26, 2018
@rytaft
Collaborator

rytaft commented Nov 26, 2018

Hi @miry, sorry to hear you are having issues with the CockroachDB SQLAlchemy integration. Can you please list in detail the steps you took when attempting the migration that produced the error message above?

@miry
Author

miry commented Nov 26, 2018

@rytaft thanks for the response.

I followed this documentation with small modifications:

  1. Updated Dockerfile to install cockroachdb via pip
  2. Installed cockroachdb via Helm: helm install --name db stable/cockroachdb
  3. Updated https://github.com/apache/incubator-airflow/blob/master/scripts/ci/kubernetes/kube/secrets.yaml to cockroachdb://airflow@db-cockroachdb-public:26257/airflow
  4. Created Kube job resource to create User airflow, DB airflow and Grant permissions
  5. Then manually created the Kube resources for Airflow one by one, based on https://github.com/apache/incubator-airflow/blob/master/scripts/ci/kubernetes/kube/deploy.sh
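
For reference, the connection string from step 3 can be broken into its parts as sketched below. This uses only the Python standard library; `describe_conn` is a hypothetical helper written for this illustration and is not part of Airflow, SQLAlchemy, or the cockroachdb package. The scheme (`cockroachdb`) is the part SQLAlchemy uses to pick a dialect, which is why the pip package from step 1 is needed.

```python
from urllib.parse import urlsplit

# Connection string taken from step 3 of the list above.
SQL_ALCHEMY_CONN = "cockroachdb://airflow@db-cockroachdb-public:26257/airflow"


def describe_conn(url: str) -> dict:
    """Split a SQLAlchemy-style URL into the parts relevant to this setup.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,               # selects the SQLAlchemy dialect
        "user": parts.username,                # SQL user created in step 4
        "host": parts.hostname,                # Kubernetes service name
        "port": parts.port,                    # port from the URL
        "database": parts.path.lstrip("/"),    # database created in step 4
    }


if __name__ == "__main__":
    for key, value in describe_conn(SQL_ALCHEMY_CONN).items():
        print(f"{key}: {value}")
```

If any of these parts does not match what the Kube job in step 4 created (user, database, grants), the init container would be expected to fail before reaching the migrations.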

Afterwards, I checked the logs for the init container: https://github.com/apache/incubator-airflow/blob/master/scripts/ci/kubernetes/kube/airflow.yaml#L66

@knz knz changed the title SQLAlchemy integration orms: SQLAlchemy integration with Airflow Nov 26, 2018
@knz knz added the meta-issue Contains a list of several other issues. label Nov 26, 2018
@knz knz changed the title orms: SQLAlchemy integration with Airflow orms: support for Apache Airflow Nov 26, 2018
@knz
Contributor

knz commented Nov 26, 2018

@miry thanks for the extra details.

  1. Have you checked that you installed Airflow with the postgres extension? This is not enabled by default. By default, Airflow is configured to operate with MySQL semantics, and would become confused with a postgres-like database.

Before we look at this it would be best if you could ensure that your Airflow setup works with a dummy PostgreSQL database, to verify the issue is indeed with CockroachDB.

  2. Can you collect the SQL sent to CockroachDB during the Airflow initialization? For this, you would run SET CLUSTER SETTING sql.trace.log_statement_execute = true once after cluster initialization, run your Airflow test, then collect the cockroach-exec logs in the data directory.

Thanks in advance.

@miry
Author

miry commented Nov 26, 2018

@rytaft It works fine with PostgreSQL.

@miry
Author

miry commented Nov 26, 2018

@rytaft Thanks for the tip. I will try to collect more logs.

> can you collect the SQL sent to CockroachDB during the airflow initialization. For this you would run set cluster setting sql.trace.log_statement_execute = true once after cluster initialization, run your Airflow test, then collect the cockroach-exec logs in the data directory.

@jasonmay

I encountered this exact scenario.

I already have Airflow fully running with Postgres under docker-compose. I added the cockroachdb/cockroach image from Docker Hub and pointed SQLAlchemy at it. It seems to run some statements but not others in its Alembic migration path.

Attached is an exec log. In Cockroach, I enabled the exec trace and then ran:

DROP DATABASE airflow CASCADE;
CREATE DATABASE airflow;

Then, in my airflow-webserver container, I ran airflow initdb. These logs are the result.

cockroach-airflow.log

@jordanlewis
Member

I made a little more progress with Airflow, after the work in #38318. The next thing we don't support is an ALTER TABLE ALTER COLUMN TYPE from int to float. I'm not sure if this is possible for us to support without actually writing a datatype migration schema change, but it's possible that that migration runs in Airflow before data is available, in which case we could do it.
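
One common way to emulate a column type change on a database that rejects ALTER COLUMN TYPE is to add a new column, copy the data across with a cast, drop the old column, and rename the new one. The sketch below only generates those statements; it is not what was done in this thread and is untested against CockroachDB. `retype_column_statements`, `example_table`, and `duration_col` are hypothetical names chosen for illustration, and the `_new` suffix is an arbitrary assumption.

```python
from typing import List


def retype_column_statements(table: str, column: str, new_type: str) -> List[str]:
    """Return SQL statements that emulate ALTER COLUMN TYPE in four steps.

    Hypothetical helper for illustration; identifiers are interpolated
    directly, so only pass trusted, already-validated names.
    """
    tmp = f"{column}_new"  # temporary column name (assumption)
    return [
        # 1. Add a nullable column with the target type.
        f"ALTER TABLE {table} ADD COLUMN {tmp} {new_type}",
        # 2. Copy existing values, casting to the target type.
        f"UPDATE {table} SET {tmp} = CAST({column} AS {new_type})",
        # 3. Drop the original column.
        f"ALTER TABLE {table} DROP COLUMN {column}",
        # 4. Give the new column the original name.
        f"ALTER TABLE {table} RENAME COLUMN {tmp} TO {column}",
    ]


if __name__ == "__main__":
    # Placeholder table and column names, not taken from the Airflow schema.
    for stmt in retype_column_statements("example_table", "duration_col", "FLOAT"):
        print(stmt + ";")
```

If the migration really does run before any data exists, step 2 copies nothing and the sequence reduces to a drop and re-add of the column.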

@jordanlewis
Member

I manually changed ALTER TABLE ALTER COLUMN TYPE to permit this type change, to see what else would break, but after that change, airflow initdb runs correctly.

@jasonmay what are some other good smoke tests I could run against airflow?

@jasonmay

@jordanlewis airflow list_dags is a good one. That will populate job info in the db. There should also be an example DAG provided. I don't remember the name of the dag ID, but I think you should be able to do: airflow run $DAG_ID $(date +%Y-%m-%d) to have it populate dag_runs.

@jasonmay

Also airflow resetdb for deletes/drops perhaps?
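
The smoke tests suggested so far can be collected into one ordered list, sketched below. The subcommands and their arguments are copied from this thread as written and have not been checked against any particular Airflow version; `smoke_commands` and the `example_dag` ID are hypothetical. The script only prints the commands and does not run them.

```python
from datetime import date
from typing import List, Optional


def smoke_commands(dag_id: str, day: Optional[date] = None) -> List[List[str]]:
    """Build the Airflow CLI invocations mentioned in this thread, in order.

    Hypothetical helper; arguments mirror the thread, not the Airflow docs.
    """
    day = day or date.today()
    return [
        ["airflow", "initdb"],                        # run the migrations
        ["airflow", "list_dags"],                     # populates job info
        ["airflow", "run", dag_id, day.isoformat()],  # same as date +%Y-%m-%d
        ["airflow", "resetdb"],                       # exercises deletes/drops
    ]


if __name__ == "__main__":
    # "example_dag" is a placeholder; substitute a DAG ID from list_dags.
    for cmd in smoke_commands("example_dag"):
        print(" ".join(cmd))
```

To execute them, each inner list could be passed to `subprocess.run(cmd, check=True)` so the sequence stops at the first failing command.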

@vlizanae

I don't know whether this was recently introduced, but I can't run create_user. I did the manual table alteration in order to run the initdb command, and it ran successfully, but apparently the primary keys for most of the tables are not SERIAL, so when I run create_user it complains repeatedly about a missing "id" primary key column.

@vlizanae

Maybe it should be on SQLAlchemy's side to swap AUTO_INCREMENT for SERIAL?

@rafiss rafiss added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label May 12, 2021
@fire

fire commented Jul 20, 2023

Is this Apache Airflow issue stale?

8 participants