Facebook Marketing performance improvement #8385

Closed
wants to merge 36 commits into from

Conversation

avida
Contributor

@avida avida commented Dec 1, 2021

Resolves #8282

Improve performance of fetching Ads Insights metrics via async jobs.

The previous version ran a constant number (10) of async jobs for ads insights simultaneously.

Facebook does not limit the number of concurrent async jobs (there is only an ads insights throttle parameter that shows how much Facebook is throttling our job execution), so the algorithm in this PR is as follows:

  1. Generate jobs over consecutive time ranges, with a 5-day window per range, and start them asynchronously. On each async job start, Facebook sends an "x-fb-ads-insights-throttle" response header representing the current ads insights throttle. Keep adding new jobs until the throttle threshold is reached (0.7 in our case). Every job is stored in a deque and processed in FIFO order.
  2. Take the first job in the deque and check its status. If it is not ready yet, wait 30 seconds and check again. Otherwise proceed to the next step.
  3. If the job is ready, pop it from the queue and repeat step 1 (add jobs until the end of the date range or the throttle limit is hit). Process the result of the ready job.
  4. If there are no jobs left, stop. Otherwise go to step 2.
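
The steps above can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: `FakeJob`, `run_jobs`, `start_job` and `get_throttle` are hypothetical names, and the rule "always keep at least one job in flight, even when the throttle is above the limit" is an assumption added so the loop cannot stall with an empty queue.

```python
import time
from collections import deque
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Deque, Iterator, List, Tuple

DateRange = Tuple[date, date]

THROTTLE_LIMIT = 0.7    # stop adding jobs once the throttle reaches this value
DAYS_PER_JOB = 5        # window covered by a single job
POLL_INTERVAL_SEC = 30  # pause between status checks of the head job


def date_ranges(start: date, end: date, days: int = DAYS_PER_JOB) -> Iterator[DateRange]:
    """Yield consecutive inclusive (since, until) windows covering start..end."""
    since = start
    while since <= end:
        until = min(since + timedelta(days=days - 1), end)
        yield since, until
        since = until + timedelta(days=1)


@dataclass
class FakeJob:
    """Hypothetical stand-in for an async insights job."""
    time_range: DateRange
    polls_until_ready: int = 1

    def is_ready(self) -> bool:
        self.polls_until_ready -= 1
        return self.polls_until_ready <= 0

    def result(self) -> List[DateRange]:
        return [self.time_range]


def run_jobs(start: date, end: date,
             start_job: Callable[[DateRange], FakeJob],
             get_throttle: Callable[[], float],
             sleep: Callable[[float], None] = time.sleep) -> Iterator[DateRange]:
    ranges = date_ranges(start, end)
    queue: Deque[FakeJob] = deque()

    def fill() -> None:
        # Step 1: start jobs until the throttle limit is hit or ranges run out.
        # Assumption: an empty queue always gets one job, whatever the throttle.
        while not queue or get_throttle() < THROTTLE_LIMIT:
            next_range = next(ranges, None)
            if next_range is None:
                return
            queue.append(start_job(next_range))

    fill()
    while queue:                      # step 4: stop when no jobs are left
        job = queue[0]                # step 2: look at the oldest job only
        while not job.is_ready():
            sleep(POLL_INTERVAL_SEC)
        queue.popleft()               # step 3: pop, refill, then emit the result
        fill()
        yield from job.result()
```

Because only the head of the deque is ever polled, results come out in time-range order even if later jobs finish first.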

Notes:

  • If the job in step 2 has "failed" status, schedule it again and wait until it completes (there are 5 attempts to restart a job). Since jobs must be read sequentially according to their time range, we cannot move on to the next job.
  • There is no timeout on how long a single job can run. We wait until the Facebook server updates the job status to "failed", "ready" or "skipped".
  • One possible improvement (not implemented in this PR) is to split the range of a failed job and run two smaller jobs instead of restarting the job with the same parameters.
  • As a consequence of the previous item, if a job has the minimal 1-day range we cannot split it, but we can reduce the number of metrics in the job query. This requires additional investigation into how heavy it would be on memory consumption and how we can unambiguously identify each record and combine multiple results into a single metric, so I would not go this way until we really need it.
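
To make the restart rule and the proposed range split concrete, here is a small sketch. All names (`run_with_restarts`, `split_range`, `JobFailed`, `run_once`) are hypothetical, and treating "5 attempts" as five runs in total is an assumption. The split helper illustrates the improvement that is not implemented in this PR.

```python
from datetime import date, timedelta
from typing import Callable, List, Tuple

DateRange = Tuple[date, date]

MAX_ATTEMPTS = 5  # assumption: five runs in total for one time range


class JobFailed(Exception):
    """Raised when a job still fails after all attempts."""


def run_with_restarts(run_once: Callable[[DateRange], str],
                      time_range: DateRange,
                      max_attempts: int = MAX_ATTEMPTS) -> int:
    """Re-run the same job until it stops failing; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        status = run_once(time_range)  # e.g. "ready", "skipped" or "failed"
        if status != "failed":
            return attempt
    raise JobFailed(f"job for {time_range} failed {max_attempts} times")


def split_range(time_range: DateRange) -> List[DateRange]:
    """Split an inclusive range into two halves; a 1-day range is returned as is."""
    since, until = time_range
    days = (until - since).days + 1
    if days < 2:
        return [time_range]
    middle = since + timedelta(days=days // 2 - 1)
    return [(since, middle), (middle + timedelta(days=1), until)]
```

The two halves stay adjacent and ordered, so they could replace the failed job at the head of the queue without breaking the sequential read order.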

Here is a simple performance test I ran with different "days per job" values; the time range was 2019-05-23 - 2021-11-02. All tests were run once, on a test account with very little data, so this is just a rough estimate of how long a run takes:

Days per job    Execution time (mm:ss)
1               13:14
5               2:37
7               2:01
50              0:56
1000            7:35

I also tried running it on the maximum allowed range of 37 months with 1 day per job. It spawned a maximum of 1109 concurrent jobs and took 16:15 to complete. No failures were detected in any of the test runs.
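
For a rough sense of how the "days per job" value translates into the number of jobs for the tested range, the arithmetic can be sketched like this. `job_count` is a hypothetical helper, and counting both end dates as part of the range is an assumption.

```python
from datetime import date
from math import ceil


def job_count(start: date, end: date, days_per_job: int) -> int:
    """Number of jobs needed to cover start..end (both dates included)."""
    total_days = (end - start).days + 1
    return ceil(total_days / days_per_job)


start, end = date(2019, 5, 23), date(2021, 11, 2)
for days_per_job in (1, 5, 7, 50, 1000):
    print(days_per_job, job_count(start, end, days_per_job))
```

With 1000 days per job the whole tested range fits into a single job, so nothing runs in parallel.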



@github-actions github-actions bot added the area/connectors Connector related issues label Dec 1, 2021
@avida
Contributor Author

avida commented Dec 1, 2021

/test connector=connectors/source-facebook-marketing

🕑 connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1526267519
✅ connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1526267519
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        76      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              235     95    60%
	 source_acceptance_test/tests/test_full_refresh.py       38     27    29%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     24    56%
	 source_acceptance_test/utils/compare.py                 62     25    60%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  945    441    53%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                             Stmts   Miss  Cover
	 --------------------------------------------------------------------
	 source_facebook_marketing/__init__.py                2      0   100%
	 source_facebook_marketing/api.py                    86     21    76%
	 source_facebook_marketing/async_job.py              87     53    39%
	 source_facebook_marketing/async_job_manager.py      55     29    47%
	 source_facebook_marketing/common.py                 37     11    70%
	 source_facebook_marketing/source.py                112     65    42%
	 source_facebook_marketing/streams.py               212     57    73%
	 --------------------------------------------------------------------
	 TOTAL                                              591    236    60%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                             Stmts   Miss  Cover
	 --------------------------------------------------------------------
	 source_facebook_marketing/__init__.py                2      0   100%
	 source_facebook_marketing/api.py                    86     22    74%
	 source_facebook_marketing/async_job.py              87      0   100%
	 source_facebook_marketing/async_job_manager.py      55     29    47%
	 source_facebook_marketing/common.py                 37      1    97%
	 source_facebook_marketing/source.py                112     72    36%
	 source_facebook_marketing/streams.py               212     57    73%
	 --------------------------------------------------------------------
	 TOTAL                                              591    181    69%

@avida avida temporarily deployed to more-secrets December 1, 2021 14:37 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets December 1, 2021 14:38 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets December 1, 2021 14:38 Inactive
@avida avida force-pushed the drezchykov/fb-perf branch from 7ba1169 to 17147fe Compare December 2, 2021 14:30
@avida avida temporarily deployed to more-secrets December 2, 2021 14:32 Inactive
@avida
Contributor Author

avida commented Dec 2, 2021

/test connector=connectors/source-facebook-marketing

🕑 connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1531299395
✅ connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1531299395
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        76      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              235     95    60%
	 source_acceptance_test/tests/test_full_refresh.py       38     27    29%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     24    56%
	 source_acceptance_test/utils/compare.py                 62     25    60%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  945    441    53%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                             Stmts   Miss  Cover
	 --------------------------------------------------------------------
	 source_facebook_marketing/__init__.py                2      0   100%
	 source_facebook_marketing/api.py                    86     21    76%
	 source_facebook_marketing/async_job.py              88     53    40%
	 source_facebook_marketing/async_job_manager.py      56     30    46%
	 source_facebook_marketing/common.py                 37     11    70%
	 source_facebook_marketing/source.py                112     65    42%
	 source_facebook_marketing/streams.py               210     58    72%
	 --------------------------------------------------------------------
	 TOTAL                                              591    238    60%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                             Stmts   Miss  Cover
	 --------------------------------------------------------------------
	 source_facebook_marketing/__init__.py                2      0   100%
	 source_facebook_marketing/api.py                    86     22    74%
	 source_facebook_marketing/async_job.py              88      0   100%
	 source_facebook_marketing/async_job_manager.py      56      0   100%
	 source_facebook_marketing/common.py                 37      1    97%
	 source_facebook_marketing/source.py                112     72    36%
	 source_facebook_marketing/streams.py               210     57    73%
	 --------------------------------------------------------------------
	 TOTAL                                              591    152    74%

@jrhizor jrhizor temporarily deployed to more-secrets December 2, 2021 15:40 Inactive
@avida
Contributor Author

avida commented Dec 2, 2021

Here is a SAT running time comparison for the previous version (top) and this PR (bottom):
[image: res]

Both runs used the same 5-day range window.

@avida avida temporarily deployed to more-secrets December 2, 2021 17:31 Inactive
@avida
Contributor Author

avida commented Dec 2, 2021

/publish connector=connectors/source-facebook-marketing run-test=false

Error: Unexpected inputs provided: ["run-test"]

@avida
Contributor Author

avida commented Dec 2, 2021

/publish connector=connectors/source-facebook-marketing run-tests=false

🕑 connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1531750880
✅ connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1531750880

@avida avida force-pushed the drezchykov/fb-perf branch from 5500f20 to 69b2958 Compare December 3, 2021 11:52
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Dec 3, 2021
@avida avida temporarily deployed to more-secrets December 3, 2021 11:54 Inactive
@avida avida marked this pull request as ready for review December 3, 2021 14:26
@avida
Contributor Author

avida commented Dec 3, 2021

/test connector=connectors/source-facebook-marketing

🕑 connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1535645154
✅ connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1535645154
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        76      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              235     95    60%
	 source_acceptance_test/tests/test_full_refresh.py       38     27    29%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     24    56%
	 source_acceptance_test/utils/compare.py                 62     25    60%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  945    441    53%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                     Stmts   Miss  Cover
	 ----------------------------------------------------------------------------
	 source_facebook_marketing/__init__.py                        2      0   100%
	 source_facebook_marketing/api.py                            86     20    77%
	 source_facebook_marketing/source.py                        112     65    42%
	 source_facebook_marketing/streams/__init__.py                3      0   100%
	 source_facebook_marketing/streams/async_job.py              88     53    40%
	 source_facebook_marketing/streams/async_job_manager.py      58     31    47%
	 source_facebook_marketing/streams/common.py                 35     11    69%
	 source_facebook_marketing/streams/insights_streams.py       87     29    67%
	 source_facebook_marketing/streams/streams.py               131     28    79%
	 ----------------------------------------------------------------------------
	 TOTAL                                                      602    237    61%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                     Stmts   Miss  Cover
	 ----------------------------------------------------------------------------
	 source_facebook_marketing/__init__.py                        2      0   100%
	 source_facebook_marketing/api.py                            86     21    76%
	 source_facebook_marketing/source.py                        112     72    36%
	 source_facebook_marketing/streams/__init__.py                3      0   100%
	 source_facebook_marketing/streams/async_job.py              88      0   100%
	 source_facebook_marketing/streams/async_job_manager.py      58      0   100%
	 source_facebook_marketing/streams/common.py                 35      1    97%
	 source_facebook_marketing/streams/insights_streams.py       87     37    57%
	 source_facebook_marketing/streams/streams.py               131     19    85%
	 ----------------------------------------------------------------------------
	 TOTAL                                                      602    150    75%

@jrhizor jrhizor temporarily deployed to more-secrets December 3, 2021 14:36 Inactive
@avida avida temporarily deployed to more-secrets December 6, 2021 15:32 Inactive
@keu keu temporarily deployed to more-secrets January 22, 2022 23:53 Inactive
@keu keu temporarily deployed to more-secrets January 23, 2022 00:05 Inactive
@keu keu temporarily deployed to more-secrets January 23, 2022 00:14 Inactive
@keu keu temporarily deployed to more-secrets January 23, 2022 15:19 Inactive
@keu keu temporarily deployed to more-secrets January 23, 2022 15:26 Inactive
@keu keu temporarily deployed to more-secrets January 23, 2022 15:48 Inactive
@keu keu temporarily deployed to more-secrets January 23, 2022 15:52 Inactive
@keu keu self-assigned this Jan 23, 2022
@keu keu marked this pull request as draft January 23, 2022 16:18
@keu keu temporarily deployed to more-secrets January 23, 2022 22:15 Inactive
@keu keu temporarily deployed to more-secrets January 24, 2022 03:59 Inactive
@@ -13,21 +13,10 @@ tests:
- config_path: "secrets/config.json"
basic_read:
- config_path: "secrets/config.json"
configured_catalog_path: "integration_tests/configured_catalog.json"
timeout_seconds: 600
empty_streams: ["videos"]
incremental:
- config_path: "secrets/config.json"
configured_catalog_path: "integration_tests/configured_catalog_without_insights.json"
future_state_path: "integration_tests/future_state.json"
full_refresh:
- config_path: "secrets/config.json"
Contributor

This means we test all streams from the catalog, and because we are using the new PK-based comparison we don't need to ignore some fields.
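
As a rough illustration of why a primary-key-based comparison removes the need for ignored fields, a check of this kind could look like the sketch below. This is not the acceptance test suite's actual code: `key_of`, `missing_keys`, the record shape and the choice to compare only key tuples are all assumptions.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, Any]


def key_of(record: Record, primary_key: Sequence[str]) -> Tuple[Any, ...]:
    """Build a hashable key from the primary-key fields of a record."""
    return tuple(record[field] for field in primary_key)


def missing_keys(first_read: Iterable[Record],
                 second_read: Iterable[Record],
                 primary_key: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Keys present in the first read but absent from the second one."""
    seen = {key_of(record, primary_key) for record in second_read}
    return [key_of(record, primary_key)
            for record in first_read
            if key_of(record, primary_key) not in seen]
```

Under these assumptions, a field that changes between two reads (a counter or an updated timestamp, for example) does not produce a mismatch, because only the key fields take part in the comparison.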

@keu
Contributor

keu commented Jan 24, 2022

blocked by #9718
blocked by #9746


Successfully merging this pull request may close these issues.

Source FB Marketing: improve insights jobs reliability & runtime