-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upgrading from 2.2.3 or 2.2.5 to 2.3.2 fails on migration-job #24526
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! |
Is this somethign similar to MariaDB #24247 but for RDS ? @ephraimbuddy @dstandish - we seem to have som consistent problems with missing indexes |
We ran into the same issue when upgrading from 2.2.5 to 2.3.2. What happened Database: MySQL 8.0.19
|
based on this line:
@ephraimbuddy @uranusjr @potiuk could this be due to collation incompatibility? i.e. at some point in past it seems we changed to use |
|
Hi @miimsam, yesterday I tested migration with changed -- 1. change charset and collate
ALTER TABLE task_instance MODIFY task_id VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin;
-- 2. drop table session
drop table session;
-- 3. drop index
drop index idx_job_dag_id on job;
-- 4. drop column
ALTER TABLE dag drop COLUMN timetable_description;
-- 5. drop table log_template
ALTER TABLE dag_run DROP FOREIGN KEY task_instance_log_template_id_fkey;
DROP TABLE log_template;
... I also have an older Backup before the the failed migration Job, maybe I should try this if the list of manual steps growth. |
Hey Good people here @milimsam @flrn77 ..... Is this possible to make a PR to https://airflow.apache.org/docs/apache-airflow/stable/installation/upgrading.html#migration-best-practices and describe those issues in a new "Troubleshooting MySQL Exceptions at upgrade" chapter (for example)? It would be a great contribution and giving back to the community - I think together with @dstandish we will be super happy to help with review of that and revise it and guide in the process, but it would be super, super helpful if we can have such a documentation. And starting it is easy - you just click "Suggest a change on this page" and it will open a Pull Request, where you are able to add such documentation via GitHub UI. I think, personally it would be best if you - as users who experienced it - describe it in the way that you would find helpful if you looked for a solution to a similar problem. Simply we - as maintainers - know the system too well and we have many assumptions in our head that our users don't - and maybe we don't even know where the users like would look for similar problem/issue and how phrase the "solution" steps to make it easy to follow. Would you like to become one of more than 2100 contributors to Airlfow ? This is an easy (ans super-helpful to others) way! |
my migration failed as well, same setup, RDS MySql 8.0.20, upgrading from 2.2.5 --> 2.3.2, failed on
and we are currently stuck, trying to recover this step manually :( |
Hey @potiuk, I would like to do that if I could fix it manually. |
I don't want to mix things here, not sure if that's a similar case - but we managed to overcome the issue by taking the scheduler down and running the following:
Then we start the scheduler and it pass this migration step and all the following. |
@itayB I think you had a different issue, the log you provided says |
Hello, @miimsam have a look at my modified sql dry-run code and into my pull request 24926 you should ba able fix it manual. UPDATE alembic_version SET version_num='e655c0453f75' WHERE alembic_version.version_num = 'f9da662e7089';
-- Running upgrade e655c0453f75 -> a3bcd0914482
ALTER TABLE serialized_dag MODIFY data JSON NULL;
ALTER TABLE serialized_dag ADD COLUMN data_compressed BLOB;
UPDATE alembic_version SET version_num='a3bcd0914482' WHERE alembic_version.version_num = 'e655c0453f75';
-- Running upgrade a3bcd0914482 -> c306b5b5ae4a
CREATE TABLE __airflow_tmp_xcom (
dag_run_id INTEGER NOT NULL,
task_id VARCHAR(250) COLLATE utf8mb3_bin NOT NULL,
`key` VARCHAR(512) COLLATE utf8mb3_bin NOT NULL,
value BLOB,
timestamp TIMESTAMP(6) NOT NULL,
dag_id VARCHAR(250) COLLATE utf8mb3_bin NOT NULL,
run_id VARCHAR(250) COLLATE utf8mb3_bin NOT NULL,
map_index INTEGER NOT NULL DEFAULT '-1'
);
/* it took 5min */
INSERT INTO __airflow_tmp_xcom SELECT dag_run.id, xcom.task_id, xcom.`key`, xcom.value, xcom.timestamp, xcom.dag_id, dag_run.run_id, -1
FROM xcom INNER JOIN dag_run ON xcom.dag_id = dag_run.dag_id AND xcom.execution_date = dag_run.execution_date;
DROP TABLE xcom;
ALTER TABLE __airflow_tmp_xcom RENAME TO xcom;
/* don't know why but had to delete duplicates */
create table tmp_xcom as select * from xcom limit 1;
truncate table tmp_xcom;
ALTER TABLE tmp_xcom ADD CONSTRAINT xcom_pkey PRIMARY KEY (dag_run_id, task_id, map_index, `key`);
INSERT IGNORE INTO tmp_xcom SELECT * FROM xcom;
DROP TABLE xcom;
RENAME TABLE tmp_xcom TO xcom;
/* done on tmp_xcom before renaming */
-- ALTER TABLE xcom ADD CONSTRAINT xcom_pkey PRIMARY KEY (dag_run_id, task_id, map_index, `key`);
CREATE INDEX idx_xcom_key ON xcom (`key`);
ALTER TABLE xcom ADD CONSTRAINT xcom_task_instance_fkey FOREIGN KEY(dag_id, task_id, run_id, map_index) REFERENCES task_instance (dag_id, task_id, run_id, map_index) ON DELETE CASCADE;
UPDATE alembic_version SET version_num='c306b5b5ae4a' WHERE alembic_version.version_num = 'a3bcd0914482';
-- Running upgrade c306b5b5ae4a -> c97c2ab6aa23
CREATE TABLE callback_request (
id INTEGER NOT NULL AUTO_INCREMENT,
created_at TIMESTAMP(6) NOT NULL,
priority_weight INTEGER NOT NULL,
callback_data JSON NOT NULL,
callback_type VARCHAR(20) NOT NULL,
dag_directory VARCHAR(1000),
PRIMARY KEY (id)
);
UPDATE alembic_version SET version_num='c97c2ab6aa23' WHERE alembic_version.version_num = 'c306b5b5ae4a';
-- Running upgrade c97c2ab6aa23 -> 4eaab2fe6582
ALTER TABLE rendered_task_instance_fields ADD COLUMN map_index INTEGER NOT NULL DEFAULT '-1';
ALTER TABLE rendered_task_instance_fields ADD COLUMN run_id VARCHAR(250) COLLATE utf8mb3_bin;
UPDATE rendered_task_instance_fields, dag_run SET rendered_task_instance_fields.run_id=dag_run.run_id WHERE dag_run.dag_id = rendered_task_instance_fields.dag_id AND dag_run.execution_date = rendered_task_instance_fields.execution_date;
ALTER TABLE rendered_task_instance_fields DROP PRIMARY KEY;
ALTER TABLE rendered_task_instance_fields MODIFY run_id VARCHAR(250) COLLATE utf8mb3_bin NOT NULL;
ALTER TABLE rendered_task_instance_fields DROP COLUMN execution_date;
ALTER TABLE rendered_task_instance_fields ADD CONSTRAINT rendered_task_instance_fields_pkey PRIMARY KEY (dag_id, task_id, run_id, map_index);
/* change cahrset caused by following error*/
ALTER TABLE rendered_task_instance_fields MODIFY task_id VARCHAR(255) CHARACTER SET utf8 COLLATE utf8mb3_bin;
ALTER TABLE rendered_task_instance_fields MODIFY dag_id VARCHAR(255) CHARACTER SET utf8 COLLATE utf8mb3_bin;
/* [Code: 3780, SQL State: HY000] Referencing column 'dag_id' and referenced column 'dag_id' in foreign key constraint 'rtif_ti_fkey' are incompatible. */
ALTER TABLE rendered_task_instance_fields ADD CONSTRAINT rtif_ti_fkey FOREIGN KEY(dag_id, task_id, run_id, map_index) REFERENCES task_instance (dag_id, task_id, run_id, map_index) ON DELETE CASCADE;
UPDATE alembic_version SET version_num='4eaab2fe6582' WHERE alembic_version.version_num = 'c97c2ab6aa23';
-- Running upgrade 4eaab2fe6582 -> 48925b2719cb
DROP INDEX idx_task_fail_dag_task_date ON task_fail;
ALTER TABLE task_fail ADD COLUMN map_index INTEGER NOT NULL DEFAULT '-1';
ALTER TABLE task_fail ADD COLUMN run_id VARCHAR(250) COLLATE utf8mb3_bin;
UPDATE task_fail, dag_run SET task_fail.run_id=dag_run.run_id WHERE dag_run.dag_id = task_fail.dag_id AND dag_run.execution_date = task_fail.execution_date;
ALTER TABLE task_fail MODIFY run_id VARCHAR(250) COLLATE utf8mb3_bin NOT NULL;
ALTER TABLE task_fail DROP COLUMN execution_date;
/* change cahrset caused by following error*/
ALTER TABLE task_fail MODIFY task_id VARCHAR(255) CHARACTER SET utf8 COLLATE utf8mb3_bin;
ALTER TABLE task_fail MODIFY dag_id VARCHAR(255) CHARACTER SET utf8 COLLATE utf8mb3_bin;
ALTER TABLE task_fail ADD CONSTRAINT task_fail_ti_fkey FOREIGN KEY(dag_id, task_id, run_id, map_index) REFERENCES task_instance (dag_id, task_id, run_id, map_index) ON DELETE CASCADE;
UPDATE alembic_version SET version_num='48925b2719cb' WHERE alembic_version.version_num = '4eaab2fe6582';
-- Running upgrade 48925b2719cb -> 909884dea523
ALTER TABLE ab_register_user MODIFY username VARCHAR(256) NOT NULL;
ALTER TABLE ab_register_user MODIFY email VARCHAR(256) NOT NULL;
ALTER TABLE ab_user MODIFY username VARCHAR(256) NOT NULL;
ALTER TABLE ab_user MODIFY email VARCHAR(256) NOT NULL;
UPDATE alembic_version SET version_num='909884dea523' WHERE alembic_version.version_num = '48925b2719cb';
-- Running upgrade 909884dea523 -> 75d5ed6c2b43
ALTER TABLE log ADD COLUMN map_index INTEGER;
UPDATE alembic_version SET version_num='75d5ed6c2b43' WHERE alembic_version.version_num = '909884dea523';
-- Running upgrade 75d5ed6c2b43 -> b1b348e02d07
UPDATE dag SET default_view='grid' WHERE dag.default_view = 'tree';
UPDATE alembic_version SET version_num='b1b348e02d07' WHERE alembic_version.version_num = '75d5ed6c2b43';
-- Running upgrade b1b348e02d07 -> 1de7bc13c950
CREATE INDEX idx_log_event ON log (event);
UPDATE alembic_version SET version_num='1de7bc13c950' WHERE alembic_version.version_num = 'b1b348e02d07';
-- Running upgrade 1de7bc13c950 -> 3c94c427fdf6
ALTER TABLE dag_tag DROP FOREIGN KEY dag_tag_ibfk_1;
ALTER TABLE dag_tag ADD CONSTRAINT dag_tag_dag_id_fkey FOREIGN KEY(dag_id) REFERENCES dag (dag_id) ON DELETE CASCADE;
UPDATE alembic_version SET version_num='3c94c427fdf6' WHERE alembic_version.version_num = '1de7bc13c950'; |
Hi, I managed to run migrations with some small manual changes to database before running -- remove foreign key temporarily as it is used by one of modified columns
ALTER TABLE task_reschedule DROP FOREIGN KEY task_reschedule_ti_fkey;
ALTER TABLE task_instance MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_reschedule MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE rendered_task_instance_fields MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE rendered_task_instance_fields MODIFY dag_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_fail MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_fail MODIFY dag_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
-- reintroduce the key (otherwise migrations should fail)
ALTER TABLE task_reschedule ADD CONSTRAINT task_reschedule_ti_fkey FOREIGN KEY (dag_id, task_id, run_id) REFERENCES task_instance (dag_id, task_id, run_id); |
Upgrading from 2.2.5 to 2.3.3 I'm running into these issues too. First I'm trying to fix the the
❓ Am I doing the right thing with the character set and collation? ❓ If anybody has a more convenient script/SQL command set to fix the collections that'd be awesome! After that I'm trying to do the |
Yes. But you should also do it for all colums that use different collation (the different CHARACTER seet for thos should be
Also I think the above comment sets another trap on you using utf8 and utf8_bin is very bad idea. MySQL 8 set a trap on you here, because if you migrate to the next version, the index will likely not increase but utf8 will become an alias to utf8mb4 (in MySQL 8 it is utf8mb3): https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html CHARSET/COLLATION for the "id" fields should be explicitly `utf8mb3' and collation 'utf8mb3_bin' NOT 'utf8', 'utf8_bin' The trap is a bit deeper, because is (as they warn) they remove utf8mb3 in the future then you will be locked witth MySQL8. My recommendation is as usual here: Switch to Postgres. |
Thanks, @potiuk from the documentation on Also thank you for clarifying the recommended CHARSET/COLLATION: I found the easiest way to fix this is to fix it right in my When applying the modified
First things first.. If there's a way to migrate the data I'll consider it once I recovered from todays incident.. |
Glad I could help.
PRs to the docs are most welcome. I think it would be great if after your experience you help others describing what you have done and update the docs. Airflow is created by > 2100 contributors - most of them users like you, and actually contributing docs to make them clearer and more helpful to others is one of the best contribution you can make - paying back by the 100% free software you use. In fact, you are probably the best person in the world now to describe it in the way that will be useful for the others, because you've JUST gone through it, remember all the steps you took and you can write it in the way that it will be understandable for users like you who would struggle with similar problem. And it's super easy. Look at the the documentation on sql_engine_collation_for_ids - at the bottom right there is "suggest a change on this page" button that will open up PR which you can edit directly there, in GitHub UI - it's an .rst doc that you can edit as easily as commenting in the issue here. I heartily encourage you to do so! |
@potiuk would you like me to update my previous suggestion with
|
You can do it - sure |
Depending on the way how and when you MySQL DB was created, it might have problems with the right CHARACTER SET and COLLATION used. This PR attempts to describe a process that users can follow to fix such a problem. Also a chapter was added to recommend taking a backup before the migration. Based on discussions and user input from apache#25866, apache#24526 Closes: apache#24526
Depending on the way how and when you MySQL DB was created, it might have problems with the right CHARACTER SET and COLLATION used. This PR attempts to describe a process that users can follow to fix such a problem. Also a chapter was added to recommend taking a backup before the migration. Based on discussions and user input from #25866, #24526 Closes: #24526
Also a chapter was added to recommend taking a backup before the migration. Based on discussions and user input from apache#25866, apache#24526 Closes: apache#24526 Improve cleanup of temporary files in CI (apache#25957) After recent change in Paralell execution, we start to have infrequent "no space left on device" message - likely caused by the /tmp/ generated files clogging the filesystem from multiple runs. We could fix it by simply running cleanup after parallel job always, but this is not good due to diagnostics needed when debugging parallel runs locally so we need to have a way to skip /tmp files deletion. This PR fixes the problem twofold: * cleanup breeze instructions which is run at the beginning of every job cleans also /tmp file * the parallel jobs cleans after themselvs unless skipped. Properly check the existence of missing mapped TIs (apache#25788) The previous implementation of missing indexes was not correct. Missing indexes were being checked every time that `task_instance_scheduling_decision` was called. The missing tasks should only be revised after expanding of last resort for mapped tasks have been done. If we find that a task is in schedulable state and has already been expanded, we revise its indexes and ensure they are complete. Missing indexes are marked as removed. This implementation allows the revision to be done in one place Co-authored-by: Tzu-ping Chung <[email protected]> Fix dataset_event_manager resolution (apache#25943) Appears `__init__` is not invoked as part of `_run_raw_task` due to the way TI is refreshed from db. Centralize dataset manager instantiation instead. Fix unhashable issue with secrets.backend_kwargs and caching (apache#25970) Resolves apache#25968 Fix response schema for list-mapped-task-instance (apache#25965) update areActiveRuns, fix states (apache#25962)
For posterity and in case it helps somebody else we fix our schema with the queries below. Note that it assumes that the migrations from 2.2 to 2.3 have not been executed and failed. Since DDL changes are not transaction in MySQL (or at least the way Alembic is configured), the state of the database will be different after a failed migration. ALTER DATABASE airflow CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE dag_tag DROP FOREIGN KEY dag_tag_ibfk_1;
ALTER TABLE dag_tag MODIFY dag_id VARCHAR(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE dag MODIFY dag_id VARCHAR(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE dag_tag ADD CONSTRAINT dag_tag_ibfk_1 FOREIGN KEY (dag_id) REFERENCES dag (dag_id) ON DELETE CASCADE;
ALTER TABLE task_fail MODIFY dag_id VARCHAR(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE job MODIFY dag_id VARCHAR(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE log MODIFY dag_id VARCHAR(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE sensor_instance MODIFY dag_id VARCHAR(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE serialized_dag MODIFY dag_id VARCHAR(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE sla_miss MODIFY dag_id VARCHAR(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
alter table task_instance drop foreign key task_instance_dag_run_fkey;
ALTER TABLE task_reschedule DROP FOREIGN KEY task_reschedule_ti_fkey;
ALTER TABLE task_reschedule MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_instance MODIFY dag_id VARCHAR(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
alter table task_instance add constraint task_instance_dag_run_fkey foreign key (dag_id, run_id) references dag_run (dag_id, run_id) on delete cascade;
ALTER TABLE task_reschedule MODIFY dag_id VARCHAR(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE xcom MODIFY dag_id VARCHAR(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_instance MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_reschedule ADD CONSTRAINT task_reschedule_ti_fkey FOREIGN KEY (dag_id, task_id, run_id) REFERENCES task_instance (dag_id, task_id, run_id);
ALTER TABLE rendered_task_instance_fields MODIFY dag_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE rendered_task_instance_fields MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_fail MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_fail MODIFY dag_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
|
Apache Airflow version
2.3.2 (latest released)
What happened
Upgrade Airflow 2.2.3 or 2.2.5 -> 2.3.2 fails on migration-job.
first time upgrade execution:
after the first failed execution (should be due to the first failed execution):
What you think should happen instead
The migration-job shouldn't fail ;)
How to reproduce
Everytime in my environment just need to create a snapshot from last working DB-Snapshot (Airflow Version 2.2.3)
and then deploy Airflow 2.3.2.
I can update in between to 2.2.5 but ran into the same issue by update to 2.3.2.
Operating System
Debian GNU/Linux 10 (buster) - apache/airflow:2.3.2-python3.8 (hub.docker.com)
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==2.4.0
apache-airflow-providers-celery==2.1.0
apache-airflow-providers-cncf-kubernetes==2.2.0
apache-airflow-providers-docker==2.3.0
apache-airflow-providers-elasticsearch==2.1.0
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-google==6.2.0
apache-airflow-providers-grpc==2.0.1
apache-airflow-providers-hashicorp==2.1.1
apache-airflow-providers-http==2.0.1
apache-airflow-providers-imap==2.0.1
apache-airflow-providers-microsoft-azure==3.4.0
apache-airflow-providers-mysql==2.1.1
apache-airflow-providers-odbc==2.0.1
apache-airflow-providers-postgres==2.4.0
apache-airflow-providers-redis==2.0.1
apache-airflow-providers-sendgrid==2.0.1
apache-airflow-providers-sftp==2.3.0
apache-airflow-providers-slack==4.1.0
apache-airflow-providers-sqlite==2.0.1
apache-airflow-providers-ssh==2.3.0
apache-airflow-providers-tableau==2.1.4
Deployment
Official Apache Airflow Helm Chart
Deployment details
Anything else
Full error Log first execution:
Full error Log after first execution (should caused by first execution):
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: