[AIRFLOW-3885] ~10x speed-up of SchedulerJobTest suite #4730
The SchedulerJobTest suite now takes ~90 seconds on my laptop (down from ~900 seconds == 15 minutes) on Jenkins.

There are a few optimizations here:

1. Don't sleep() for 1 second every scheduling loop (in unit tests)
2. Don't process the example DAGs
3. Use `subdir` to process only the DAGs we need, for a couple of tests that actually run the scheduler
4. Only load the DagBag once instead of before each test

I've also added a few tables to the list of tables that are cleaned up in between test runs to make the tests re-entrant.
Codecov Report
@@            Coverage Diff            @@
##           master    #4730     +/-   ##
=========================================
- Coverage   74.65%    74.6%   -0.06%
=========================================
  Files         430      430
  Lines       27991    27995      +4
=========================================
- Hits        20897    20885     -12
- Misses       7094     7110     +16
Between this PR and #4726, we're shaving off ~9 minutes from the total CI time.
cc @feng-tao
Nice find, @astahlman!
Could you clarify your change in `utils/dag_processing.py`?
@@ -277,7 +277,7 @@ def get_dag(self, dag_id):

 def list_py_file_paths(directory, safe_mode=True,
-                       include_examples=conf.getboolean('core', 'LOAD_EXAMPLES')):
+                       include_examples=None):
This change makes no difference to me.
Hi @feng-tao, I understand this PR is merged and it does bring a significant improvement (big thanks to @astahlman!).
But could you check my comment? It's a very minor point, but I don't think it's really necessary to add a separate check here (kindly let me know if I missed anything). Thanks.
This was needed because of the way that Python handles default arguments: the default value is evaluated when the function is defined, not when it is called. Therefore, our mocked implementation of `getboolean` wouldn't get called here, and `include_examples` would always be `True`.
This way, we call `getboolean` every time the function is called, so our mock implementation does get used and `getboolean('core', 'LOAD_EXAMPLES')` evaluates to `False`.
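A minimal sketch of the behavior described above. The `Conf` class and the `eager`/`lazy` function names are hypothetical stand-ins for illustration, not Airflow's actual code; only the pattern (a `None` default resolved inside the function body) mirrors the change in this PR.

```python
from unittest import mock


class Conf:
    """Hypothetical stand-in for a config object; not Airflow's conf."""

    def getboolean(self, section, key):
        return True


conf = Conf()


# The default expression runs once, when `def` is executed.
def eager(include_examples=conf.getboolean('core', 'LOAD_EXAMPLES')):
    return include_examples


# The default is a sentinel; the config lookup happens on every call.
def lazy(include_examples=None):
    if include_examples is None:
        include_examples = conf.getboolean('core', 'LOAD_EXAMPLES')
    return include_examples


with mock.patch.object(conf, 'getboolean', return_value=False):
    eager_result = eager()  # still the value captured at definition time
    lazy_result = lazy()    # sees the mocked getboolean
```

Inside the `with` block, `eager()` keeps returning the value captured when the function was defined, while `lazy()` picks up the mocked value. An explicit argument such as `lazy(include_examples=True)` still overrides the lookup.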
Got it. Thanks for the clarification, @astahlman.