Retry requests.exceptions.ConnectionError #1929

Closed
nitishxp opened this issue May 22, 2024 · 6 comments · Fixed by #1930
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: question Request for information or clarification. Not an issue.

Comments

nitishxp commented May 22, 2024

Hi Team,

Could you add a retry for this exception? We are running this code on Cloud Functions and GKE infrastructure, and from time to time we get these errors.

BigQuery SDK: google-cloud-bigquery==3.18.0

Error Type: <class 'requests.exceptions.ConnectionError'> error: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
Traceback (most recent call last):
  File "/workspace/visionCommon/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/workspace/main.py", line 576, in controller
    current_step = check_eligibility()
  File "/workspace/main.py", line 314, in check_eligibility
    total_rows = service.execute_query(query).result().total_rows
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1595, in result
    do_get_result()
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/api_core/retry/retry_unary.py", line 293, in retry_wrapped_func
    return retry_target(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/api_core/retry/retry_unary.py", line 153, in retry_target
    _retry_error_helper(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/api_core/retry/retry_base.py", line 212, in _retry_error_helper
    raise final_exc from source_exc
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target
    result = target()
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1584, in do_get_result
    super(QueryJob, self).result(retry=retry, timeout=timeout)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/cloud/bigquery/job/base.py", line 971, in result
    return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/api_core/future/polling.py", line 256, in result
    self._blocking_poll(timeout=timeout, retry=retry, polling=polling)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1326, in _blocking_poll
    super(QueryJob, self)._blocking_poll(timeout=timeout, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/api_core/future/polling.py", line 137, in _blocking_poll
    polling(self._done_or_raise)(retry=retry)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/api_core/retry/retry_unary.py", line 293, in retry_wrapped_func
    return retry_target(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/api_core/retry/retry_unary.py", line 153, in retry_target
    _retry_error_helper(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/api_core/retry/retry_base.py", line 212, in _retry_error_helper
    raise final_exc from source_exc
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target
    result = target()
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1448, in _done_or_raise
    self._reload_query_results(retry=retry, timeout=transport_timeout)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py", line 1429, in _reload_query_results
    self._query_results = self._client._get_query_results(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 1936, in _get_query_results
    resource = self._call_api(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 827, in _call_api
    return call()
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/cloud/_http/__init__.py", line 482, in api_request
    response = self._make_request(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/cloud/_http/__init__.py", line 341, in _make_request
    return self._do_request(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/cloud/_http/__init__.py", line 379, in _do_request
    return self.http.request(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/google/auth/transport/requests.py", line 541, in request
    response = super(AuthorizedSession, self).request(
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label May 22, 2024
@tswast tswast added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label May 22, 2024
Contributor

tswast commented May 22, 2024

Looks like we already retry this at the API request level here: https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/retry.py#L32

It looks like the failure you're seeing is at the "retry the query" layer, which we need to be much more careful about. If your query is not idempotent (e.g. some DML & DDL queries), we don't want to retry without knowing the job has failed.

I would encourage updating to the latest version, as I made sure more of the API requests in the "wait for the query to finish" code path use our API-level retries in #1900.

@tswast tswast added type: question Request for information or clarification. Not an issue. and removed type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels May 22, 2024
Author

nitishxp commented May 23, 2024

Hi @tswast, the query is idempotent. How can I force it to retry?
I have started to see this issue more frequently, as there was a BigQuery issue in the US region on the 20th.

Contributor

tswast commented May 23, 2024

I would recommend passing in a custom value for the job_retry parameter.

See:

DEFAULT_JOB_RETRY = retry.Retry(
for what it currently is.

That said, I'm not sure why the call to jobs.getQueryResults isn't being retried here. I'll do some more investigation.

Contributor

tswast commented May 23, 2024

I think I'm able to reproduce this at HEAD with the following test:

def test_retry_connection_error_with_default_retry_and_job_retry(monkeypatch, client):
    """
    Make sure ConnectionError can be retried at `is_job_done` level, even if
    retries are exhausted by API-level retry.

    Note: Because restart_query_job is set to True only in the case of a
    confirmed job failure, this should be safe to do even when a job is not
    idempotent.

    Regression test for issue
    https://github.com/googleapis/python-bigquery/issues/1929
    """
    job_counter = 0

    def make_job_id(*args, **kwargs):
        nonlocal job_counter
        job_counter += 1
        return f"{job_counter}"

    monkeypatch.setattr(_job_helpers, "make_job_id", make_job_id)
    conn = client._connection = make_connection()
    project = client.project
    job_reference_1 = {"projectId": project, "jobId": "1", "location": "test-loc"}
    NUM_API_RETRIES = 2

    with freezegun.freeze_time(
        "2024-01-01 00:00:00",
        # Note: because of exponential backoff and a bit of jitter,
        # NUM_API_RETRIES will get less accurate the greater the value.
        # We add 1 because we know there will be at least some additional
        # calls to fetch the time / sleep before the retry deadline is hit.
        auto_tick_seconds=(
            google.cloud.bigquery.retry._DEFAULT_RETRY_DEADLINE / NUM_API_RETRIES
        )
        + 1,
    ):
        conn.api_request.side_effect = [
            # jobs.insert
            {"jobReference": job_reference_1, "status": {"state": "PENDING"}},
            # jobs.get
            {"jobReference": job_reference_1, "status": {"state": "RUNNING"}},
            # jobs.getQueryResults x2
            requests.exceptions.ConnectionError(),
            requests.exceptions.ConnectionError(),
            # jobs.get
            # Job actually succeeded, so we shouldn't be restarting the job,
            # even though we are retrying at the `is_job_done` level.
            {"jobReference": job_reference_1, "status": {"state": "DONE"}},
        ]

        job = client.query("select 1")
        job.result()

It never gets to the final jobs.get call. I think this is because _job_should_retry returns False for a RetryError whose cause is a ConnectionError. Because we have separate logic for when we should restart the query versus retry at this layer, I think it should be safe to retry here. That said, if we get here, some API request has already hit its retry timeout of 600 seconds, so I'm not sure how much the second layer of retries will help.
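
The interaction between the two retry layers can be illustrated with a toy model. This is a sketch using only the standard library, not the library's actual code: `ToyRetryError`, `call_with_api_retry`, `wait_for_job`, and both predicates are hypothetical names, and the built-in `ConnectionError` stands in for `requests.exceptions.ConnectionError`.

```python
class ToyRetryError(Exception):
    """Stand-in for a 'retries exhausted' error that remembers its cause."""

    def __init__(self, message, cause):
        super().__init__(message)
        self.cause = cause


def call_with_api_retry(func, max_attempts):
    """Inner layer: retry one API call, then give up with ToyRetryError."""
    last_exc = None
    for _ in range(max_attempts):
        try:
            return func()
        except ConnectionError as exc:
            last_exc = exc
    raise ToyRetryError("API-level retries exhausted", cause=last_exc)


def wait_for_job(responses, outer_predicate, api_attempts=2, outer_attempts=2):
    """Outer layer: consult outer_predicate when the inner layer gives up."""
    pending = iter(responses)

    def poll_once():
        # Each call consumes one canned response, like a mock side_effect list.
        item = next(pending)
        if isinstance(item, Exception):
            raise item
        return item

    for _ in range(outer_attempts):
        try:
            return call_with_api_retry(poll_once, api_attempts)
        except ToyRetryError as exc:
            if not outer_predicate(exc):
                raise
    raise TimeoutError("outer retries exhausted")


def strict_predicate(exc):
    """Models an outer layer that never retries an exhausted inner retry."""
    return False


def lenient_predicate(exc):
    """Models an outer layer that looks at the cause of the inner failure."""
    return isinstance(getattr(exc, "cause", None), ConnectionError)
```

With the responses `[ConnectionError(), ConnectionError(), "DONE"]`, the strict predicate re-raises `ToyRetryError` before the final response is ever read, while the lenient predicate lets the outer loop poll once more and see `"DONE"`.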

Contributor

tswast commented May 23, 2024

Fix awaiting review: #1930

@nitishxp
Author

@tswast Thank you again for your quick response and resolution to the issue :)
