BigQuery: no public method to retry/resubmit an existing job #5555

Closed
yan-hic opened this issue Jun 29, 2018 · 6 comments
Labels
api: bigquery Issues related to the BigQuery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments


yan-hic commented Jun 29, 2018

There is currently no public method to resubmit an existing job, for instance when a transient error
like 403 rateLimitExceeded is returned.

As a workaround, we currently use the following, which works for query and load jobs:

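from uuid import uuid4

from google.cloud import bigquery

bq = bigquery.Client()
# Fetch the failed job and rebuild its API resource (private method).
job_ref = bq.get_job('jobid_that_failed_because_of_ratelimit')._build_resource()
# Assign a fresh job ID so the resubmission is treated as a new job.
job_ref['jobReference']['jobId'] = str(uuid4())

# clear properties set by job response
if 'query' in job_ref['configuration']:
    del job_ref['configuration']['query']['destinationTable']

new_job = bq.job_from_resource(job_ref)
new_job._begin()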

The request is for a public method that works for any job type and is included in the core package, since the private methods used above may change.
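To make the workaround above less dependent on private methods, the resource clean-up step can be sketched as plain dictionary manipulation. This is only a sketch under assumptions: `make_resubmit_resource`, `SERVER_SET_FIELDS`, and `drop_query_destination` are hypothetical names that do not exist in the library, and the list of fields treated as server-populated is an assumption based on the output-only fields of the REST Job resource. Submitting the resulting resource would still require a client call that is not shown here.

```python
import copy
import uuid

# Assumption: these top-level fields are filled in by the server and should
# not be sent back when inserting a new job.
SERVER_SET_FIELDS = ("status", "statistics", "etag", "selfLink", "id", "user_email")


def make_resubmit_resource(resource, drop_query_destination=True):
    """Return a copy of a job resource dict that can be inserted as a new job.

    Hypothetical helper, not part of google-cloud-bigquery.
    """
    new_resource = copy.deepcopy(resource)  # never mutate the failed job's data

    for field in SERVER_SET_FIELDS:
        new_resource.pop(field, None)

    # A job ID can only be used once, so the copy needs a fresh one.
    new_resource.setdefault("jobReference", {})["jobId"] = str(uuid.uuid4())

    if drop_query_destination:
        # Mirrors the workaround: drop the destination table that the job
        # response added to the query configuration, if there is one.
        query_config = new_resource.get("configuration", {}).get("query")
        if query_config is not None:
            query_config.pop("destinationTable", None)

    return new_resource
```

Pass `drop_query_destination=False` when the original query named an explicit destination table that the resubmitted job should keep.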

@yan-hic yan-hic changed the title BigQuery: add rateLimitExceeded as transient error to retry BigQuery: no public method to retry/resubmit an existing job Jun 29, 2018
@tseaver tseaver added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. api: bigquery Issues related to the BigQuery API. labels Jun 29, 2018
tseaver (Contributor) commented Jun 29, 2018

As luck would have it, PR #5552 (merged yesterday) moves the DEFAULT_RETRY policy into a new, public module: it does indeed retry rateLimitExceeded errors, and can be customized by users, e.g.:

from google.cloud.bigquery import retry

my_retry = retry.DEFAULT_RETRY.with_deadline(30)

@tseaver tseaver closed this as completed Jun 29, 2018
yan-hic (Author) commented Jun 29, 2018

Not sure the PR helps. The retry is for inserting a job, not for resubmitting an existing job.
As per Tim: https://stackoverflow.com/questions/49926546/retry-an-update-in-bigquery-does-not-seem-to-work

@tseaver tseaver reopened this Jun 29, 2018
@dillonjohnson

We have been retrying existing jobs by taking a job config and resubmitting it with a new job ID. This used to work, but we are now running into a new issue.

The _begin() method calls to_api_repr(), which is now abstract and raises NotImplementedError.

    def to_api_repr(self):
        """Generate a resource for the job."""
        raise NotImplementedError("Abstract")

    def _begin(self, client=None, retry=DEFAULT_RETRY):
        """API call:  begin the job via a POST request
        See
        https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/insert
        :type client: :class:`~google.cloud.bigquery.client.Client` or
                      ``NoneType``
        :param client: the client to use.  If not passed, falls back to the
                       ``client`` stored on the current dataset.
        :type retry: :class:`google.api_core.retry.Retry`
        :param retry: (Optional) How to retry the RPC.
        :raises: :exc:`ValueError` if the job has already begin.
        """
        if self.state is not None:
            raise ValueError("Job already begun.")

        client = self._require_client(client)
        path = "/projects/%s/jobs" % (self.project,)

        # jobs.insert is idempotent because we ensure that every new
        # job has an ID.
        api_response = client._call_api(
            retry, method="POST", path=path, data=self.to_api_repr()
        )
        self._set_properties(api_response)

If this is expected behavior, how should this be circumvented?

tseaver (Contributor) commented Mar 19, 2019

@dillonjohnson Rather than _AsyncJob, you need to instantiate one of the concrete job classes (e.g., CopyJob, QueryJob, etc.)
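One way to decide which concrete class applies to a given job is to look at which configuration key its resource carries. The sketch below assumes the resource is a plain dict shaped like the REST Job resource; `concrete_job_class_name` and `JOB_CLASS_BY_CONFIG_KEY` are hypothetical names, not library API, and the function returns class names as strings so that it runs without the library installed.

```python
# Configuration keys of the REST Job resource, mapped to the names of the
# concrete job classes in google.cloud.bigquery.
JOB_CLASS_BY_CONFIG_KEY = {
    "query": "QueryJob",
    "load": "LoadJob",
    "copy": "CopyJob",
    "extract": "ExtractJob",
}


def concrete_job_class_name(resource):
    """Return the name of the concrete job class matching a job resource dict.

    Hypothetical helper, not part of google-cloud-bigquery.
    """
    configuration = resource.get("configuration", {})
    for key, class_name in JOB_CLASS_BY_CONFIG_KEY.items():
        if key in configuration:
            return class_name
    raise ValueError("Unrecognized job configuration: %r" % sorted(configuration))
```

A caller could use the returned name to pick the class to instantiate instead of the abstract base.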

yan-hic (Author) commented Sep 2, 2019

Not much traction, but I wonder whether an optional argument resubmit_on_quota_hit on the result() method would be the ideal solution.
The downside (is it one?) is that it would change underlying job properties such as jobId.
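A rough sketch of what such resubmit-on-quota behavior could look like outside the library is given below. Everything in it is an assumption for illustration: `run_with_resubmit`, `QuotaHit`, and the `submit` callable are hypothetical, and `submit(job_id)` stands in for whatever code starts a new job with the given ID and waits for its result. Because each attempt gets a new job ID, the function returns the ID that finally succeeded, which makes the jobId change visible to the caller.

```python
import time
import uuid


class QuotaHit(Exception):
    """Hypothetical stand-in for a 403 rateLimitExceeded failure."""


def run_with_resubmit(submit, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call submit(job_id) with a fresh job ID until it succeeds.

    Returns (job_id, result) for the attempt that succeeded. Re-raises
    QuotaHit once max_attempts attempts have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        job_id = str(uuid.uuid4())  # every resubmission is a new job
        try:
            return job_id, submit(job_id)
        except QuotaHit:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # exponential backoff between attempts
```

The `sleep` parameter is injectable only so that the backoff can be exercised in tests without waiting.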

tswast (Contributor) commented Nov 12, 2019
