Skip to content

Commit

Permalink
Add instructions on manually fixing MySQL Charset problems (#25938)
Browse files Browse the repository at this point in the history
Depending on the way how and when you MySQL DB was created, it might
have problems with the right CHARACTER SET and COLLATION used.

This PR attempts to describe a process that users can follow to
fix such a problem.

Also a chapter was added to recommend taking a backup before
the migration.

Based on discussions and user input from #25866, #24526

Closes: #24526
  • Loading branch information
potiuk authored Aug 25, 2022
1 parent 994f188 commit e259262
Show file tree
Hide file tree
Showing 2 changed files with 156 additions and 7 deletions.
162 changes: 155 additions & 7 deletions docs/apache-airflow/installation/upgrading.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,18 @@ Newer Airflow versions can contain database migrations so you must run ``airflow
to upgrade your database with the schema changes in the Airflow version you are upgrading to.
Don't worry, it's safe to run even if there are no migrations to perform.

Upgrade preparation - make a backup of DB
=========================================

It is highly recommended to make a backup of your metadata DB before any migration.
If you do not have a "hot backup" capability for your DB, you should
do it after shutting down your Airflow instances, so that the backup of is consistent.
If you did not make a backup and your migration fails, you might end-up in
a half-migrated state and restoring DB from backup and repeating the
migration might be the only easy way out. This can for example be caused by a broken
network connection between your CLI and the database while the migration happens, so taking
a backup is an important precaution to avoid problems like this.

When you need to upgrade
========================

Expand Down Expand Up @@ -55,16 +67,143 @@ Sample usage:
``airflow db upgrade --revision-range "e959f08ac86c:142555e44c17"``


Migration best practices
========================
Handling migration problems
===========================


Wrong Encoding in MySQL database
................................

If you are using old Airflow 1.10 as a database created initially either manually or with previous version of MySQL,
depending on the original character set of your database, you might have problems with migrating to a newer
version of Airflow and your migration might fail with strange errors ("key size too big", "missing indexes" etc).
The next chapter describes how to fix the problem manually.


Why you might get the error? The recommended character set/collation for MySQL 8 database is
``utf8mb4`` and ``utf8mb4_bin`` respectively. However this has been changing in different versions of
MySQL and you could have custom created database with a different character set. If your database
was created with an old version of Airflow or MySQL, the encoding could have been wrong when the database
was created or broken during migration.

Unfortunately, MySQL limits the index key size and with ``utf8mb4``, Airflow index key sizes might be
too big for MySQL to handle. Therefore in Airflow we force all the "ID" keys to use ``utf8`` character
set (which is equivalent to ``utf8mb3`` in MySQL 8). This limits the size of indexes so that MySQL
can handle them.

Here are the steps you can follow to fix it BEFORE you attempt to migrate
(but you might also choose to do it your way if you know what you are doing).


1. Make a backup of your database so that you can restore it in case of a mistake.


2. Check which of the tables of yours need fixing. Look at those tables:

.. code-block:: sql
SHOW CREATE TABLE task_reschedule;
SHOW CREATE TABLE xcom;
SHOW CREATE TABLE task_fail;
SHOW CREATE TABLE rendered_task_instance_fields;
SHOW CREATE TABLE task_instance;
Make sure to copy the output. You will need it in the last step. Your
``dag_id``, ``run_id``, ``task_id`` and ``key`` columns should have ``utf8`` or ``utf8mb3`` character
set set explicitly, similar to:

.. code-block:: text
``task_id`` varchar(250) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, # correct
or

.. code-block:: text
``task_id`` varchar(250) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin NOT NULL, # correct
The problem is if your fields have no encoding:

.. code-block:: text
``task_id`` varchar(250), # wrong !!
or just collation set to utf8mb4:

.. code-block:: text
``task_id`` varchar(250) COLLATE utf8mb4_unicode_ci DEFAULT NULL, # wrong !!
or character set and collation set to utf8mb4

.. code-block:: text
``task_id`` varchar(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL, # wrong !!
You need to fix those fields that have wrong character set/collation set.


3. Drop foreign key indexes for tables you need to modify (you do not need to drop all of them - do it just
for those tables that you need to modify). You will need to recreate them in the last step (that's why
you need to keep the ``SHOW CREATE TABLE`` output from step 2.

.. code-block:: sql
ALTER TABLE task_reschedule DROP FOREIGN KEY task_reschedule_ti_fkey;
ALTER TABLE xcom DROP FOREIGN KEY xcom_task_instance_fkey;
ALTER TABLE task_fail DROP FOREIGN KEY task_fail_ti_fkey;
ALTER TABLE rendered_task_instance_fields DROP FOREIGN KEY rtif_ti_fkey;
4. Modify your ``ID`` fields to have correct character set/encoding. Only do that for fields that have
wrong encoding (here are all potential commands you might need to use):

.. code-block:: sql
ALTER TABLE task_instance MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_reschedule MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE rendered_task_instance_fields MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE rendered_task_instance_fields MODIFY dag_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_fail MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_fail MODIFY dag_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE sla_miss MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE sla_miss MODIFY dag_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_map MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_map MODIFY dag_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE task_map MODIFY run_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE xcom MODIFY task_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE xcom MODIFY dag_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE xcom MODIFY run_id VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
ALTER TABLE xcom MODIFY key VARCHAR(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin;
5. Recreate the foreign keys dropped in step 3.

Repeat this one for all the indexes you dropped. Note that depending on the version of Airflow you
Have, the indexes might be slightly different (for example ``map_index`` was added in 2.3.0) but if you
keep the ``SHOW CREATE TABLE`` output prepared in step 2., you will find the right ``CONSTRAINT_NAME``
and ``CONSTRAINT`` to use.

.. code-block:: sql
# Here you have to copy the statements from SHOW CREATE TABLE output
ALTER TABLE <TABLE> ADD CONSTRAINT `<CONSTRAINT_NAME>` <CONSTRAINT>
This should bring the database to the state where you will be able to run the migration to the new
Airflow version.

Depending on the size of your database and the actual migration it might take quite some time to migrate it,
so if you have long history and big database, it is recommended to make a copy of the database first and
perform a test migration to assess how long the migration will take. Typically "Major" upgrades might take
longer as adding new features require sometimes restructuring of the database.

Post-upgrade warnings
=====================
.....................

Typically you just need to successfully run ``airflow db upgrade`` command and this is all. However in
some cases, the migration might find some old, stale and probably wrong data in your database and moves it
Expand Down Expand Up @@ -118,3 +257,12 @@ Deleting a table:
.. code-block:: sql
DROP TABLE <table>;
Migration best practices
========================

Depending on the size of your database and the actual migration it might take quite some time to migrate it,
so if you have long history and big database, it is recommended to make a copy of the database first and
perform a test migration to assess how long the migration will take. Typically "Major" upgrades might take
longer as adding new features require sometimes restructuring of the database.
1 change: 1 addition & 0 deletions docs/spelling_wordlist.txt
Original file line number Diff line number Diff line change
Expand Up @@ -956,6 +956,7 @@ infile
influxdb
infoType
infoTypes
ing
ingestions
ini
init
Expand Down

0 comments on commit e259262

Please sign in to comment.