Source Google Search Console: add slicing by date range #9073
Conversation
/test connector=connectors/source-google-search-console
/test connector=connectors/source-google-search-console
/test connector=connectors/source-google-search-console
What about rate limits? If we sync the stream day by day, will the rate limits be exceeded?
end_date = self._get_end_date()

if start_date > end_date:
    yield from [
You can use `yield` instead of `yield from` with a list.
I removed this yield from.
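The reviewer's suggestion can be illustrated with a minimal sketch (hypothetical function names, not the connector's actual code):

```python
def slices_with_yield_from():
    # builds a throwaway one-element list just to delegate to it
    yield from [{"start_date": "2021-09-01", "end_date": "2021-09-02"}]

def slices_with_yield():
    # equivalent, without constructing the intermediate list
    yield {"start_date": "2021-09-01", "end_date": "2021-09-02"}

# both generators produce the same single slice
assert list(slices_with_yield_from()) == list(slices_with_yield())
```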
start_date = self._get_start_date(stream_state, site_url, search_type)
end_date = self._get_end_date()

if start_date > end_date:
If the start date is greater than the end date, you can set the start date instead of duplicating the yielded dict.
@vitaliizazmic Done.
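The change agreed on above can be sketched roughly as follows (a hypothetical illustration with a made-up helper name, not the connector's exact code):

```python
from datetime import date

def normalize_dates(start_date, end_date):
    # Sketch of the reviewer's suggestion: when the saved state is ahead
    # of the end date, clamp the start date to the end date so the normal
    # slicing loop emits a single slice, instead of handling a duplicated
    # yield dict in a separate branch.
    if start_date > end_date:
        start_date = end_date
    return start_date, end_date

# start ahead of end gets clamped to a single-day range
assert normalize_dates(date(2021, 9, 5), date(2021, 9, 3)) == (date(2021, 9, 3), date(2021, 9, 3))
```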
"end_date": next_end.to_date_string(),
}
# add 1 day so the next slice's start date does not duplicate data from the previous slice's end date.
next_start = next_end + pendulum.Duration(days=1)
Why do you add 1 day instead of the period?
@vitaliizazmic
The period is added here. Without this line:

next_start = next_end + pendulum.Duration(days=1)

the two slices would intersect, and the user would get duplicated records:

{"start_date": "2021-09-01", "end_date": "2021-09-02"}
{"start_date": "2021-09-02", "end_date": "2021-09-03"}
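The off-by-one point can be checked with stdlib dates (the connector itself uses `pendulum`; this is just an equivalent sketch):

```python
from datetime import date, timedelta

next_end = date(2021, 9, 2)              # end of the previous slice
next_start = next_end + timedelta(days=1)  # start of the next slice

# with the +1 day, the next slice starts after the previous one ends,
# so the two slices do not share a date
assert next_start == date(2021, 9, 3)
assert next_start > next_end
```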
/test connector=connectors/source-google-search-console
@vitaliizazmic I did not hit the rate limit. In this PR, the range of days is set to 2, so one query fetches records for 2 days. We can increase it to 3 days.
LGTM, but just in case, check syncing over a long period.
…date-range # Conflicts: # docs/integrations/sources/google-search-console.md
/test connector=connectors/source-google-search-console
/publish connector=connectors/source-google-search-console
What
Resolves #8572.
If we make an API call for a long date range, there is a high chance of a timeout.
How
The solution is to slice streams by N days: for each date-based slice, send a separate request. The smaller the range, the lower the chance of a timeout. By default, the range is 2 days.
For example:
Then our slices will be:
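The slicing described above can be sketched as follows (a hypothetical sketch using stdlib `datetime`; the actual connector uses `pendulum`, and `date_slices` is a made-up name):

```python
from datetime import date, timedelta

def date_slices(start_date, end_date, range_in_days=2):
    # Split [start_date, end_date] into non-overlapping slices of
    # `range_in_days` days each (both slice boundaries are inclusive).
    next_start = start_date
    period = timedelta(days=range_in_days - 1)  # inclusive end date
    while next_start <= end_date:
        next_end = min(next_start + period, end_date)
        yield {"start_date": next_start.isoformat(),
               "end_date": next_end.isoformat()}
        # add 1 day so the next slice does not repeat this slice's end date
        next_start = next_end + timedelta(days=1)

slices = list(date_slices(date(2021, 9, 1), date(2021, 9, 6)))
# → [{'start_date': '2021-09-01', 'end_date': '2021-09-02'},
#    {'start_date': '2021-09-03', 'end_date': '2021-09-04'},
#    {'start_date': '2021-09-05', 'end_date': '2021-09-06'}]
```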
Recommended reading order