Source Google Search Console: add slicing by date range #9073

Merged

augan-rymkhan
Contributor

@augan-rymkhan augan-rymkhan commented Dec 23, 2021

What

Resolves #8572.
An API call over a long date range has a high chance of timing out.

How

The solution is to slice each stream by N days and send a separate request for each date-based slice. The shorter the range, the lower the chance of a timeout. The default range is 2 days.

For example:

start_date = "2021-09-01"
end_date = "2021-09-05"

Then our slices will be:

{"start_date": "2021-09-01", "end_date": "2021-09-02"}
{"start_date": "2021-09-03", "end_date": "2021-09-04"}
{"start_date": "2021-09-05", "end_date": "2021-09-05"}
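
The slicing described above can be sketched as follows. This is a minimal illustration, not the connector's code: it uses the standard library's datetime instead of pendulum, and the function name date_slices and the range_of_days parameter are hypothetical. When the start date is after the end date, this sketch yields nothing, which may differ from what the connector does.

```python
from datetime import date, timedelta
from typing import Dict, Iterator


def date_slices(start_date: str, end_date: str, range_of_days: int = 2) -> Iterator[Dict[str, str]]:
    """Yield inclusive, non-overlapping date slices of at most range_of_days days."""
    if range_of_days < 1:
        raise ValueError("range_of_days must be at least 1")
    end = date.fromisoformat(end_date)
    next_start = date.fromisoformat(start_date)
    while next_start <= end:
        # Both bounds are inclusive, so an N-day slice ends N - 1 days after it starts.
        next_end = min(next_start + timedelta(days=range_of_days - 1), end)
        yield {"start_date": next_start.isoformat(), "end_date": next_end.isoformat()}
        # Start the next slice one day after this one ends, so no day is requested twice.
        next_start = next_end + timedelta(days=1)


slices = list(date_slices("2021-09-01", "2021-09-05"))
```

With the dates from the example, slices holds the same three dictionaries listed above.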

Recommended reading order

  1. connectors/source-google-search-console/source_google_search_console/streams.py

@github-actions github-actions bot added the area/connectors Connector related issues label Dec 23, 2021
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 23, 2021 05:26 Inactive
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 23, 2021 05:56 Inactive
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 23, 2021 06:13 Inactive
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 23, 2021 06:21 Inactive
@augan-rymkhan
Contributor Author

augan-rymkhan commented Dec 23, 2021

/test connector=connectors/source-google-search-console

🕑 connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1614423685
❌ connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1614423685
🐛 https://gradle.com/s/z5lf3ybxreur4
Python short test summary info:

=========================== short test summary info ============================
FAILED test_incremental.py::TestIncremental::test_state_with_abnormally_large_values[inputs0]
======================== 1 failed, 16 passed in 28.54s =========================

@jrhizor jrhizor temporarily deployed to more-secrets December 23, 2021 06:23 Inactive
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 23, 2021 09:11 Inactive
@augan-rymkhan
Contributor Author

augan-rymkhan commented Dec 23, 2021

/test connector=connectors/source-google-search-console

🕑 connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1614926494
✅ connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1614926494
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                            Stmts   Miss  Cover
	 -----------------------------------------------------------------------------------
	 source_google_search_console/__init__.py                            2      0   100%
	 source_google_search_console/service_account_authenticator.py      14      6    57%
	 source_google_search_console/source.py                             37     22    41%
	 source_google_search_console/streams.py                           117     28    76%
	 -----------------------------------------------------------------------------------
	 TOTAL                                                             170     56    67%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      6    92%
	 source_acceptance_test/conftest.py                     109    109     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              242     96    60%
	 source_acceptance_test/tests/test_full_refresh.py       38      0   100%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     17    69%
	 source_acceptance_test/utils/compare.py                 62     23    63%
	 source_acceptance_test/utils/connector_runner.py       110     48    56%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  979    404    59%

@jrhizor jrhizor temporarily deployed to more-secrets December 23, 2021 09:18 Inactive
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 23, 2021 09:32 Inactive
@augan-rymkhan
Contributor Author

augan-rymkhan commented Dec 23, 2021

/test connector=connectors/source-google-search-console

🕑 connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1614991396
✅ connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1614991396
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                            Stmts   Miss  Cover
	 -----------------------------------------------------------------------------------
	 source_google_search_console/__init__.py                            2      0   100%
	 source_google_search_console/service_account_authenticator.py      14      6    57%
	 source_google_search_console/source.py                             37     22    41%
	 source_google_search_console/streams.py                           119     28    76%
	 -----------------------------------------------------------------------------------
	 TOTAL                                                             172     56    67%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      6    92%
	 source_acceptance_test/conftest.py                     109    109     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              242     96    60%
	 source_acceptance_test/tests/test_full_refresh.py       38      0   100%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     17    69%
	 source_acceptance_test/utils/compare.py                 62     23    63%
	 source_acceptance_test/utils/connector_runner.py       110     48    56%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  979    404    59%

@jrhizor jrhizor temporarily deployed to more-secrets December 23, 2021 09:37 Inactive
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Dec 23, 2021
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 23, 2021 10:12 Inactive
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 23, 2021 10:38 Inactive
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 23, 2021 11:35 Inactive
Contributor

@vitaliizazmic vitaliizazmic left a comment

What about rate limits? If we sync the stream day by day, will the rate limits be exceeded?

end_date = self._get_end_date()

if start_date > end_date:
yield from [
Contributor

You can use yield instead of yield from a list.

Contributor Author

I removed this yield from.

start_date = self._get_start_date(stream_state, site_url, search_type)
end_date = self._get_end_date()

if start_date > end_date:
Contributor

If the start date is greater than the end date, you can set the start date instead of duplicating the yielded dict.

Contributor Author

"end_date": next_end.to_date_string(),
}
# add 1 day for the next slice's start date not to duplicate data from previous slice's end date.
next_start = next_end + pendulum.Duration(days=1)
Contributor

Why do you add 1 day instead of the period?

Contributor Author

@augan-rymkhan augan-rymkhan Dec 28, 2021

@vitaliizazmic
The period is added here

Without the line
next_start = next_end + pendulum.Duration(days=1)

consecutive slices would overlap, and the user would get duplicated records:
{"start_date": "2021-09-01", "end_date": "2021-09-02"}
{"start_date": "2021-09-02", "end_date": "2021-09-03"}
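
The effect of the extra day can be checked with a small sketch. It uses standard-library dates rather than pendulum, and the helper names slices_with_step and days_covered are hypothetical, not part of the connector. Starting the next slice on the previous end date (a step of 0) requests boundary days twice, while a step of 1 covers each day exactly once.

```python
from datetime import date, timedelta
from typing import List, Tuple

Slice = Tuple[date, date]


def slices_with_step(start: date, end: date, range_of_days: int, step_after_end: int) -> List[Slice]:
    """Build inclusive slices; each next slice starts step_after_end days after the previous end."""
    if range_of_days - 1 + step_after_end < 1:
        raise ValueError("slices would never advance")
    result = []
    next_start = start
    while next_start <= end:
        next_end = min(next_start + timedelta(days=range_of_days - 1), end)
        result.append((next_start, next_end))
        if next_end == end:
            break  # the whole range is covered
        next_start = next_end + timedelta(days=step_after_end)
    return result


def days_covered(slices: List[Slice]) -> List[date]:
    """List every day requested by the slices, keeping repeats."""
    return [s + timedelta(days=i) for s, e in slices for i in range((e - s).days + 1)]


first, last = date(2021, 9, 1), date(2021, 9, 5)
overlapping = slices_with_step(first, last, range_of_days=2, step_after_end=0)
disjoint = slices_with_step(first, last, range_of_days=2, step_after_end=1)
```

Comparing len(days_covered(...)) with the number of distinct days shows which variant requests a day more than once.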

@augan-rymkhan
Contributor Author

augan-rymkhan commented Dec 28, 2021

/test connector=connectors/source-google-search-console

🕑 connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1630900873
✅ connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1630900873
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                            Stmts   Miss  Cover
	 -----------------------------------------------------------------------------------
	 source_google_search_console/__init__.py                            2      0   100%
	 source_google_search_console/service_account_authenticator.py      14      6    57%
	 source_google_search_console/source.py                             37     22    41%
	 source_google_search_console/streams.py                           119     28    76%
	 -----------------------------------------------------------------------------------
	 TOTAL                                                             172     56    67%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      6    92%
	 source_acceptance_test/conftest.py                     109    109     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              242     96    60%
	 source_acceptance_test/tests/test_full_refresh.py       38      0   100%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     17    69%
	 source_acceptance_test/utils/compare.py                 62     23    63%
	 source_acceptance_test/utils/connector_runner.py       110     48    56%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  979    404    59%

@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 28, 2021 15:02 Inactive
@augan-rymkhan
Contributor Author

What about rate limits? If we sync the stream day by day, will the rate limits be exceeded?

@vitaliizazmic I did not hit the rate limit. In this PR, the range is set to 2 days, so one query fetches records for 2 days:
{"start_date": "2021-09-01", "end_date": "2021-09-02"}

We can increase it to 3 days.
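
As a rough way to weigh the range size against the number of requests, the slice count for a date range can be estimated as below. This is only a sketch: slice_count is a hypothetical helper, and the result is a lower bound on requests per stream, since it assumes one request per slice and ignores pagination and any per-site or per-search-type fan-out.

```python
import math
from datetime import date


def slice_count(start_date: str, end_date: str, range_of_days: int) -> int:
    """Return how many inclusive slices of range_of_days days cover the date range."""
    if range_of_days < 1:
        raise ValueError("range_of_days must be at least 1")
    total_days = (date.fromisoformat(end_date) - date.fromisoformat(start_date)).days + 1
    if total_days <= 0:
        return 0
    return math.ceil(total_days / range_of_days)


two_day = slice_count("2021-01-01", "2021-12-31", 2)
three_day = slice_count("2021-01-01", "2021-12-31", 3)
```

For the whole of 2021, a 2-day range gives 183 slices and a 3-day range gives 122.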

@jrhizor jrhizor temporarily deployed to more-secrets December 28, 2021 15:03 Inactive
Contributor

@vitaliizazmic vitaliizazmic left a comment

LGTM, but just in case, check syncing over a long period.

@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 30, 2021 06:12 Inactive
…date-range

# Conflicts:
#	docs/integrations/sources/google-search-console.md
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 30, 2021 06:19 Inactive
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 30, 2021 06:23 Inactive
@augan-rymkhan
Contributor Author

augan-rymkhan commented Dec 30, 2021

/test connector=connectors/source-google-search-console

🕑 connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1636597979
✅ connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1636597979
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      6    92%
	 source_acceptance_test/conftest.py                     109    109     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              242     96    60%
	 source_acceptance_test/tests/test_full_refresh.py       38      0   100%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     17    69%
	 source_acceptance_test/utils/compare.py                 62     23    63%
	 source_acceptance_test/utils/connector_runner.py       110     48    56%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  979    404    59%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                            Stmts   Miss  Cover
	 -----------------------------------------------------------------------------------
	 source_google_search_console/__init__.py                            2      0   100%
	 source_google_search_console/service_account_authenticator.py      14      6    57%
	 source_google_search_console/source.py                             37     22    41%
	 source_google_search_console/streams.py                           119     28    76%
	 -----------------------------------------------------------------------------------
	 TOTAL                                                             172     56    67%

@jrhizor jrhizor temporarily deployed to more-secrets December 30, 2021 06:47 Inactive
@augan-rymkhan
Contributor Author

augan-rymkhan commented Dec 30, 2021

/publish connector=connectors/source-google-search-console

🕑 connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1636623361
✅ connectors/source-google-search-console https://github.com/airbytehq/airbyte/actions/runs/1636623361

@jrhizor jrhizor temporarily deployed to more-secrets December 30, 2021 06:59 Inactive
@augan-rymkhan augan-rymkhan temporarily deployed to more-secrets December 30, 2021 07:13 Inactive
@augan-rymkhan augan-rymkhan merged commit c135e00 into master Dec 30, 2021
@augan-rymkhan augan-rymkhan deleted the arymkhan/google-search-console-slicing-by-date-range branch December 30, 2021 07:13
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

🐛 Source Google Search Console: DefaultBackoffException while the rate limit is not reached in full refresh
4 participants