cloud-composer-migration-complexity-assessment

Table Of Contents

  1. Use Case
  2. About
  3. Setup
  4. Results

use-case

This tool is for customers who want to migrate their Google Cloud Composer version 1 environments to Composer 2.


about

Complexity Assessment DAG

Generates the following:

  • Inventory of DAGs, tasks, and operators
  • Airflow upgrade check script results
  • v1 to v2 migration reports

Uses code from the following:


setup

  1. Prepare a Google Cloud Storage bucket for storing results.
  2. Prepare your Cloud Composer environment by adding the following PyPI packages:
       apache-airflow-upgrade-check
       prettytable==3.8.0
       dominate
       pandas
  3. Update the migration-assessment.py DAG with your bucket name.
  4. Upload the airflow-v1-to-v2-migration directory to the DAGs folder (ensure .airflowignore includes this directory).
  5. Upload the migration-assessment.py DAG to the Cloud Composer (v1) DAGs folder.
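
The .airflowignore requirement can be checked locally before uploading. The sketch below assumes Airflow 1.10 semantics, where each line of .airflowignore is a regular expression searched for anywhere in a file's path; the file name `migration.py` and the helper `is_ignored` are hypothetical and used only for illustration.

```python
import re


def is_ignored(relative_path, ignore_patterns):
    """Return True if any non-empty pattern matches somewhere in the path.

    Assumption: patterns are regular expressions, as in Airflow 1.10.
    """
    return any(
        re.search(pattern, relative_path) is not None
        for pattern in ignore_patterns
        if pattern.strip()
    )


# Hypothetical .airflowignore contents: one pattern per line.
airflowignore_text = "airflow-v1-to-v2-migration\n"
patterns = airflowignore_text.splitlines()

# A file inside the migration directory should be skipped by the DAG parser.
print(is_ignored("airflow-v1-to-v2-migration/migration.py", patterns))
# The assessment DAG itself should still be parsed.
print(is_ignored("migration-assessment.py", patterns))
```

If the first call does not report a match for your own .airflowignore contents, the scheduler would likely try to parse the migration scripts as DAG files.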

results

Navigate to your Google Cloud Storage bucket.

inventory/dags/

Inventory of DAGs generated by querying the Airflow database. Output as newline-delimited JSON, one record per DAG.

Sample:

{"dag_id": "airflow_monitoring", "default_view": "tree", "description": "liveness monitoring dag", "fileloc": "/home/airflow/gcs/dags/airflow_monitoring.py", "is_active": 1, "is_paused": 0, "is_subdag": 0, "last_expired": null, "last_pickled": null, "last_scheduler_run": 1690813337, "owners": "airflow", "pickle_id": null, "processed_ts": 20088515, "root_dag_id": null, "schedule_interval": "\"*/10 * * * *\"", "scheduler_lock": null}
{"dag_id": "migration_assessment_v0_0", "default_view": "tree", "description": "assess migration scope for v1 to v2", "fileloc": "/home/airflow/gcs/dags/migration-assessment.py", "is_active": 1, "is_paused": 0, "is_subdag": 0, "last_expired": null, "last_pickled": null, "last_scheduler_run": 1690813103, "owners": "auditing", "pickle_id": null, "processed_ts": 20088515, "root_dag_id": null, "schedule_interval": "\"0 0 * * *\"", "scheduler_lock": null}
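
The DAG inventory can be read one JSON record per line. The following is a minimal sketch, assuming the file has been downloaded from the bucket and that `schedule_interval` is itself a JSON-encoded string, as it appears in the sample records. The records are trimmed to a few fields, and the helper names are hypothetical.

```python
import json

# Two records from the sample above, trimmed to the fields used here.
SAMPLE_DAGS = r'''
{"dag_id": "airflow_monitoring", "is_active": 1, "is_paused": 0, "owners": "airflow", "schedule_interval": "\"*/10 * * * *\""}
{"dag_id": "migration_assessment_v0_0", "is_active": 1, "is_paused": 0, "owners": "auditing", "schedule_interval": "\"0 0 * * *\""}
'''


def load_records(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize_dags(records):
    """Keep the fields most useful for scoping a migration."""
    rows = []
    for record in records:
        raw_schedule = record.get("schedule_interval")
        rows.append({
            "dag_id": record["dag_id"],
            "owners": record.get("owners"),
            # The value is JSON inside JSON, so it is decoded a second time.
            "schedule": json.loads(raw_schedule) if raw_schedule else None,
            "enabled": bool(record.get("is_active")) and not bool(record.get("is_paused")),
        })
    return rows


for row in summarize_dags(load_records(SAMPLE_DAGS)):
    print(row)
```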

inventory/tasks/

Inventory of tasks generated by querying the Airflow database. Output as newline-delimited JSON, one record per task instance.

Sample:

{"dag_id": "airflow_monitoring", "duration": 1.00325, "end_date": 1690816211, "execution_date": 1690815600, "hostname": "airflow-worker-7dc8f98dfc-9j8mr", "job_id": 990, "operator": "BashOperator", "pid": 88919, "pool": "default_pool", "pool_slots": 1, "priority_weight": 2147483647, "processed_ts": 20078879, "queue": "default", "queued_dttm": 1690816202, "start_date": 1690816210, "state": "success", "task_id": "echo", "try_number": 1, "unixname": "airflow"}
{"dag_id": "migration_assessment_v0_1", "duration": 15.4799, "end_date": 1690813358, "execution_date": 1690675200, "hostname": "airflow-worker-7dc8f98dfc-fxnmm", "job_id": 979, "operator": "BashOperator", "pid": 84503, "pool": "default_pool", "pool_slots": 1, "priority_weight": 1, "processed_ts": 20078879, "queue": "default", "queued_dttm": 1690813328, "start_date": 1690813343, "state": "success", "task_id": "assessment", "try_number": 1, "unixname": "airflow"}
{"dag_id": "migration_assessment_v0_1", "duration": 1.56348, "end_date": 1690813339, "execution_date": 1690675200, "hostname": "airflow-worker-7dc8f98dfc-9j8mr", "job_id": 977, "operator": "MySqlToGoogleCloudStorageOperator", "pid": 87783, "pool": "default_pool", "pool_slots": 1, "priority_weight": 1, "processed_ts": 20078879, "queue": "default", "queued_dttm": 1690813328, "start_date": 1690813337, "state": "success", "task_id": "dag_mysql_to_gcs", "try_number": 1, "unixname": "airflow"}
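
Task records can be rolled up per DAG to see which operators each DAG relies on and how long its tasks ran. This is a sketch under the assumption that every line is a JSON object with the fields shown in the sample; the records are trimmed, and `summarize_tasks` is a hypothetical helper.

```python
import json
from collections import defaultdict

# The three sample records above, trimmed to the fields used here.
SAMPLE_TASKS = '''
{"dag_id": "airflow_monitoring", "duration": 1.00325, "operator": "BashOperator", "state": "success", "task_id": "echo"}
{"dag_id": "migration_assessment_v0_1", "duration": 15.4799, "operator": "BashOperator", "state": "success", "task_id": "assessment"}
{"dag_id": "migration_assessment_v0_1", "duration": 1.56348, "operator": "MySqlToGoogleCloudStorageOperator", "state": "success", "task_id": "dag_mysql_to_gcs"}
'''


def summarize_tasks(text):
    """Group task instances by DAG: instance count, total duration, operators used."""
    summary = defaultdict(lambda: {"instances": 0, "total_duration": 0.0, "operators": set()})
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        entry = summary[record["dag_id"]]
        entry["instances"] += 1
        # duration can be null for tasks that never ran; treat it as zero.
        entry["total_duration"] += record.get("duration") or 0.0
        entry["operators"].add(record["operator"])
    return dict(summary)


for dag_id, entry in summarize_tasks(SAMPLE_TASKS).items():
    print(dag_id, entry["instances"], sorted(entry["operators"]))
```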

inventory/operators/

Count of operators generated by querying the Airflow database. Output as newline-delimited JSON, one record per operator class.

Sample:

{"occurrences": 430, "operator": "BashOperator", "processed_ts": 20078890}
{"occurrences": 6, "operator": "MySqlToGoogleCloudStorageOperator", "processed_ts": 20078890}
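
Operator counts give a quick view of how much of the workload touches operators that change in Airflow 2. The sketch below ranks operators by usage and flags renamed ones. The rename table is an assumption that holds only the single rename reported in the upgrade-check sample in this document; extend it for your own environment.

```python
import json

SAMPLE_OPERATORS = '''
{"occurrences": 430, "operator": "BashOperator", "processed_ts": 20078890}
{"occurrences": 6, "operator": "MySqlToGoogleCloudStorageOperator", "processed_ts": 20078890}
'''

# Assumption: illustrative table with the one rename shown in the upgrade-check sample.
RENAMED_IN_AIRFLOW_2 = {
    "MySqlToGoogleCloudStorageOperator": "MySQLToGCSOperator",
}


def operator_usage(text):
    """Rank operators by occurrences and note any known Airflow 2 rename."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    total = sum(record["occurrences"] for record in records)
    rows = []
    for record in sorted(records, key=lambda r: r["occurrences"], reverse=True):
        rows.append({
            "operator": record["operator"],
            "occurrences": record["occurrences"],
            "share": record["occurrences"] / total if total else 0.0,
            "renamed_to": RENAMED_IN_AIRFLOW_2.get(record["operator"]),
        })
    return rows


for row in operator_usage(SAMPLE_OPERATORS):
    print(f'{row["operator"]}: {row["occurrences"]} ({row["share"]:.1%}) -> {row["renamed_to"]}')
```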

upgrade-check/

Results of the open source Airflow upgrade check (the apache-airflow-upgrade-check package).

Sample:

[2023-07-31 15:19:22,687] {configuration.py:732} INFO - Reading the config from /etc/airflow/airflow.cfg
[2023-07-31 15:19:23,746] {configuration.py:732} INFO - Reading the config from /etc/airflow/airflow.cfg

====== STATUS ======

Check for latest versions of apache-airflow and checkerSUCCESS
Remove airflow.AirflowMacroPlugin classSUCCESS
Ensure users are not using custom metaclasses in custom operatorsSUCCESS
Chain between DAG and operator not allowed.SUCCESS
Connection.conn_type is not nullableSUCCESS
Custom Executors now require full pathSUCCESS
Check versions of PostgreSQL, MySQL, and SQLite to ease upgrade to Airflow 2.0SUCCESS
Hooks that run DB functions must inherit from DBApiHookSUCCESS
Fernet is enabled by defaultSUCCESS
GCP service account key deprecationSUCCESS
Unify hostname_callable option in core sectionSUCCESS
Changes in import paths of hooks, operators, sensors and othersFAIL
Legacy UI is deprecated by defaultFAIL
Logging configuration has been moved to new sectionFAIL
Removal of Mesos ExecutorSUCCESS
No additional argument allowed in BaseOperator.FAIL
Rename max_threads to parsing_processesSUCCESS
Users must set a kubernetes.pod_template_file valueSKIPPED
Ensure Users Properly Import conf from AirflowSUCCESS
SendGrid email uses old airflow.contrib moduleFAIL
Check Spark JDBC Operator default connection nameSUCCESS
Changes in import path of remote task handlersSUCCESS
Connection.conn_id is not uniqueSUCCESS
Use CustomSQLAInterface instead of SQLAInterface for custom data models.SUCCESS
Found 11 problems.

= RECOMMENDATIONS ==

Changes in import paths of hooks, operators, sensors and others
---------------------------------------------------------------
Many hooks, operators and other classes has been renamed and moved. Those changes were part of unifying names and imports paths as described in AIP-21.
The `contrib` folder has been replaced by `providers` directory and packages:
https://github.com/apache/airflow#backport-packages

Problems:

  1.  Using `airflow.contrib.operators.mysql_to_gcs.MySqlToGoogleCloudStorageOperator` should be replaced by `airflow.providers.google.cloud.transfers.mysql_to_gcs.MySQLToGCSOperator`. Affected file: /home/airflow/gcs/dags/migration-assessment.py

Legacy UI is deprecated by default
----------------------------------
Legacy UI is deprecated. FAB RBAC is enabled by default in order to increase security.

Problems:

  1.  rbac in airflow.cfg must be explicitly set empty as RBAC mechanism is enabled by default.

Logging configuration has been moved to new section
---------------------------------------------------
The logging configurations have been moved from [core] to the new [logging] section.

Problems:

  1.  base_log_folder has been moved from [core] to a the new [logging] section.
  2.  remote_logging has been moved from [core] to a the new [logging] section.
  3.  remote_log_conn_id has been moved from [core] to a the new [logging] section.
  4.  remote_base_log_folder has been moved from [core] to a the new [logging] section.

No additional argument allowed in BaseOperator.
-----------------------------------------------
Passing unrecognized arguments to operators is not allowed in Airflow 2.0 anymore,
and will cause an exception.
                  

Problems:

  1.  DAG file `/home/airflow/gcs/dags/migration-assessment.py` with task_id `dag_mysql_to_gcs` has unrecognized positional args `()`and keyword args `{'provide_context': True}`
  2.  DAG file `/home/airflow/gcs/dags/migration-assessment.py` with task_id `task_mysql_to_gcs` has unrecognized positional args `()`and keyword args `{'provide_context': True}`
  3.  DAG file `/home/airflow/gcs/dags/migration-assessment.py` with task_id `operator_mysql_to_gcs` has unrecognized positional args `()`and keyword args `{'provide_context': True}`

Users must set a kubernetes.pod_template_file value
---------------------------------------------------
Skipped because this rule applies only to environment using KubernetesExecutor.

SendGrid email uses old airflow.contrib module
----------------------------------------------

The SendGrid module `airflow.contrib.utils.sendgrid` was moved to `airflow.providers.sendgrid.utils.emailer`.
    

Problems:

  1.  Email backend option uses airflow.contrib Sendgrid module. Please use new module: airflow.providers.sendgrid.utils.emailer
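
In the STATUS section of the sample above, each rule title is immediately followed by its status. A small parser can split the two and tally results. This is a sketch, assuming the status is always one of SUCCESS, FAIL, or SKIPPED at the end of the line; `parse_status_line` is a hypothetical helper, not part of the upgrade-check package.

```python
from collections import Counter

STATUSES = ("SUCCESS", "FAIL", "SKIPPED")


def parse_status_line(line):
    """Split 'Rule titleSTATUS' into (title, status); return None for other lines."""
    line = line.strip()
    for status in STATUSES:
        if line.endswith(status):
            # Trailing dots are dropped: this removes dot padding if present,
            # and also a title's own final period.
            title = line[: -len(status)].rstrip(". ")
            return (title, status) if title else None
    return None


# A few lines copied from the sample report above.
SAMPLE_REPORT = """\
====== STATUS ======

Fernet is enabled by defaultSUCCESS
Legacy UI is deprecated by defaultFAIL
Users must set a kubernetes.pod_template_file valueSKIPPED
Found 11 problems.
"""

parsed = [p for p in map(parse_status_line, SAMPLE_REPORT.splitlines()) if p]
counts = Counter(status for _, status in parsed)
failed = [title for title, status in parsed if status == "FAIL"]
print(counts)
print(failed)
```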

v1-to-v2/dags/

Attempted conversions of your DAGs from Airflow 1.10 to Airflow 2.x.

v1-to-v2/reports/Summary-Report.txt

High-level report of the number of changes made when attempting to create v2 versions of your DAGs.

+-------------------------------------------------------------------------------------------+
|                                       SUMMARY REPORT                                      |
+--------------------------------------------------------------------+----------------------+
| DESCRIPTION                                                        | INFO                 |
+--------------------------------------------------------------------+----------------------+
| Total number of DAG's                                              | 2                    |
| Total number of DAG's with changes:                                | 1                    |
| Total number of DAG's with import changes:                         | 1                    |
| Total number of DAG's with import and operator changes:            | 0                    |
| Total number of DAG's with import, operator and argument changes:  | 0                    |
|                                                                    |                      |
|                                                                    |                      |
| Impacted DAG's with import changes                                 | ['upgrade_check.py'] |
| ____________________                                               | ____________________ |
| Impacted DAG's with import and operator changes                    | []                   |
| ____________________                                               | ____________________ |
| Impacted DAG's with import, operator and argument changes          | []                   |
+--------------------------------------------------------------------+----------------------+

v1-to-v2/reports/Detailed-Report.txt

Granular report of the specific changes made in each DAG when attempting to create v2 versions of your DAGs.

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                           DETAILED REPORT                                                                           |
+------------------+-----------+---------------+---------+----------------------------------------------------------+-------------------------------------------------+
| DAG FILE         | AUTOMATED | CHANGE_TYPE   | LINE_NO | OLD_STATEMENT                                            | NEW_STATEMENT                                   |
+------------------+-----------+---------------+---------+----------------------------------------------------------+-------------------------------------------------+
| upgrade_check.py | Y         | Import_Change | 23      | from airflow.operators.bash_operator import BashOperator | from airflow.operators.bash import BashOperator |
+------------------+-----------+---------------+---------+----------------------------------------------------------+-------------------------------------------------+
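
The kind of change listed in the detailed report can be illustrated with a simple line-based rewrite. This is only a sketch of the idea, not the implementation used by the airflow-v1-to-v2-migration code. The rewrite table holds just the one import change shown in the report above, and `rewrite_imports` is a hypothetical helper.

```python
# Assumption: illustrative table with the single import change from the report above.
IMPORT_REWRITES = {
    "from airflow.operators.bash_operator import BashOperator":
        "from airflow.operators.bash import BashOperator",
}


def rewrite_imports(source):
    """Return (new_source, changes); each change is (line_no, old_statement, new_statement)."""
    new_lines = []
    changes = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        replacement = IMPORT_REWRITES.get(stripped)
        if replacement is None:
            new_lines.append(line)
            continue
        # Preserve the original indentation of the import statement.
        indent = line[: len(line) - len(line.lstrip())]
        new_lines.append(indent + replacement)
        changes.append((line_no, stripped, replacement))
    return "\n".join(new_lines), changes


# Hypothetical Airflow 1.10 DAG source used only to exercise the helper.
old_source = (
    "from airflow import DAG\n"
    "from airflow.operators.bash_operator import BashOperator\n"
)
new_source, changes = rewrite_imports(old_source)
print(new_source)
print(changes)
```

A plain text substitution like this covers import changes only; operator and argument changes, which the summary report counts separately, need more than a lookup table.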