
The http object doesn't need to be service specific #5

Closed

dhermes opened this issue Dec 20, 2015 · 8 comments
dhermes commented Dec 20, 2015

Your README says

The following connections are available:

  • bigquery_connection
  • datastore_connection
  • dns_connection
  • pubsub_connection
  • resource_manager_connection
  • storage_connection

but they all just assume that http.request is defined (with a given signature), so no special-casing needs to be done. (Caveat: storage.Blob.generate_signed_url ALSO requires a credentials object.)
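For illustration, here is a minimal sketch of the shared contract being described: any object exposing an httplib2-style request method (returning a response with a status attribute, plus the body) could back every one of those connections. The names below are hypothetical, not from gcloud_requests:

```python
import requests


class ResponseProxy(dict):
    """Make a requests.Response look like an httplib2 response:
    a dict of headers carrying a ``status`` attribute."""

    def __init__(self, response):
        super(ResponseProxy, self).__init__(response.headers)
        self.status = response.status_code


class RequestsHttp(object):
    """The whole contract: a ``request`` method with httplib2's signature."""

    def request(self, uri, method="GET", body=None, headers=None, **kwargs):
        response = requests.request(method, uri, data=body, headers=headers)
        return ResponseProxy(response), response.content
```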

Bogdanp commented Dec 20, 2015

You are correct. This made more sense when we only supported gcloud 0.7.0, where the Client API only allowed Connection objects to be passed in on init; in 0.8.0 we can now pass the http object directly. It should be fairly easy to fix the API; however, we will still end up with at least one special-cased http class so we can handle certain types of retries for Datastore (see https://github.com/LeadPages/gcloud_requests/blob/master/gcloud_requests/requests_connection.py#L103).

I think credential handling is also not 100% correct right now: if you pass a set of credentials to your client, they will be ignored in favor of the ones generated for our connection objects. This seems harder to solve in a nice way, since Connections only authorize an http object when one doesn't already exist; so, presumably, one should always generate a credentials instance and use it to authorize the http object before passing both into the client.
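For concreteness, that workflow looks roughly like this (a sketch assuming the gcloud 0.8.x API and oauth2client-style credentials):

```python
import httplib2

from gcloud import datastore
from gcloud.credentials import get_credentials

credentials = get_credentials()
# Authorize the http object up front so the Connection never has to.
http = credentials.authorize(httplib2.Http())
client = datastore.Client(credentials=credentials, http=http)
```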

Bogdanp closed this as completed Dec 20, 2015
Bogdanp commented Dec 20, 2015

Er, didn't mean to close it :P

Bogdanp reopened this Dec 20, 2015
dhermes commented Dec 20, 2015

Gotcher.

That class seems to have some very specific values. I'd be interested in bringing some of it upstream, though I'm curious about some of the values. Have you given feedback to the datastore folks that 500s and 503s seem to be common? I'm surprised to see 403 and 409 in there, though, as those should be well-defined and shouldn't need a retry.

bendemaree commented:

I'd be interested in bringing some of it upstream

👍 We'd love to see those handled upstream.

For the most part, we implemented the retries and values as listed in the error handling docs, but the backoff function and retry counts are our own where that doc doesn't specify a maximum. A few specifics (with a rough sketch of the resulting policy after this list):

  • We've seen higher numbers of 500s and 503s in production than we'd like; bumping up this count made our overall error rate drop significantly, though we haven't opened a ticket yet on it (soon!).
  • 403 is awkward because it's handled differently in two different cases. The retries are made for the "deadline exceeded" case and the number of retries is arbitrary right now.
  • 409 for contention is thrown sporadically (expectedly?) when we have a large request load on items in contention. We're not concerned about it at this point, but retrying it did smooth out error rates.
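Roughly, the policy amounts to something like the following sketch; the counts and the backoff function here are illustrative stand-ins, not the library's actual values:

```python
import time

# Maximum retries per status code; the error handling docs don't specify
# maximums, so these counts are made up for illustration.
MAX_RETRIES = {500: 5, 503: 5, 403: 3, 409: 3}


def request_with_retries(http, uri, **kwargs):
    retries = {}
    while True:
        response, content = http.request(uri, **kwargs)
        status = response.status
        retries[status] = retries.get(status, 0) + 1
        if status not in MAX_RETRIES or retries[status] > MAX_RETRIES[status]:
            return response, content
        # 403 and 409 would also need their error reasons checked (see the
        # rest of the thread) before retrying; elided here for brevity.
        time.sleep(0.1 * 2 ** retries[status])  # simple exponential backoff
```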

dhermes commented Dec 22, 2015

I was unaware that 403 was overloaded by the service. That seems scary to me and I'd say if we were to hard-code the logic in, we'd want to do a bit more (like check that the 403 is for a DEADLINE_EXCEEDED rather than for a Forbidden / PERMISSION_DENIED, e.g. a token for the wrong project).

It seems the 409 retry behavior should be built into your code that does transactions rather than the connection itself.

Bogdanp commented Dec 22, 2015

like check that the 403 is for a DEADLINE_EXCEEDED rather than for a Forbidden / PERMISSION_DENIED, e.g. a token for the wrong project

The code currently does this in reverse: it checks if PERMISSION_DENIED is among the error reasons and aborts the retry altogether if it is. I don't think we've encountered a Forbidden (as in a 403 w/o a JSON payload describing the error) yet but I agree that changing it so that we only retry when we get a DEADLINE_EXCEEDED is for the best.

Re the Forbidden case: do you think we should only retry responses according to their error reasons (i.e. only responses that have a well-formed JSON payload)? The docs we mentioned are a bit light on these sorts of details (although their examples do seem to assume that there will always be a valid payload in the response).
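Such a reason check might look like the sketch below. The reason strings are assumptions taken from this discussion, not verified against the service's actual payloads:

```python
import json


def retryable_403(content):
    """Sketch: retry a 403 only for DEADLINE_EXCEEDED, never for
    PERMISSION_DENIED; reason strings are assumptions from this thread."""
    try:
        errors = json.loads(content)["error"]["errors"]
    except (ValueError, KeyError, TypeError):
        # A bare Forbidden with no well-formed JSON payload: don't retry.
        return False
    reasons = set(error.get("reason") for error in errors)
    return "DEADLINE_EXCEEDED" in reasons and "PERMISSION_DENIED" not in reasons
```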

It seems the 409 retry behavior should be built into your code that does transactions rather than the connection itself.

Could gcloud expose a more meaningful exception in that situation rather than just Conflict or is that out of scope? In doing this at the client level right now, one would either have to assume that all Conflicts mean that a Transaction was ABORTED (potentially a fair assumption to make) or check the Conflict's error list for the ABORTED reason.
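The second option might look like this sketch, assuming the Conflict exposes the response's error list (as gcloud-python's exceptions did at the time via an errors attribute):

```python
from gcloud.exceptions import Conflict


def is_aborted_transaction(exc):
    # Client-level check described above: only treat a Conflict as a
    # retryable transaction abort if its error list says ABORTED.
    if not isinstance(exc, Conflict):
        return False
    return any(error.get("reason") == "ABORTED" for error in exc.errors)
```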

bendemaree commented:

So, expanding on what @Bogdanp noted above, the enhancement here would have to involve the end user always specifying both the http and credentials parameters when instantiating a new Client (or subclass), because both must be specified to be useful.

So for example (inventing a new thing called requests_http that would unify all of our *_connections here):

```python
from gcloud import datastore, credentials
from gcloud_requests.connection import requests_http

gcloud_credentials = credentials.get_credentials()
client = datastore.Client(credentials=gcloud_credentials, http=requests_http)
```

The issue then is that if the user (for whatever reason) does not also supply credentials, the Connection will get None, remain None, and never be authorize()'d. It also clutters the setup a bit compared to what it is now. My gut feeling is that the alternative is something like what we have now: pre-bake, in some fashion, the connection on the http object with credentials.

We could parameterize it; something like:

```python
from gcloud import datastore
from gcloud_requests.client import RequestsClient

client = RequestsClient(client=datastore.Client)
```

But we'd need to either special-case datastore (if client is datastore.Client) or provide some way to specify that (client = DatastoreRequestsClient(client=datastore.Client) or client = RequestsClient(client=datastore.Client, connection=RequestsDatastoreConnection)).

But then of course, if all of the datastore-specific work moves upstream, this last point is negated.

Thoughts or preferences from either of you? @Bogdanp @dhermes

Also, I think we should open new issues for any remaining questions on retries and enhancing gcloud-python's exceptions. I want to keep the original issue moving. 😄 Are there any outstanding questions?

dhermes commented Jan 5, 2016

I don't have a preference between the current set-up and your suggestion; I think we'd need to see what it'd look like if we made a custom class. In googleapis/google-cloud-python#1274, @jonparrott has made us reconsider the current set-up, so hopefully after the dust clears there we can have a better idea of how to override without having to have a credentials object to authorize with.

Bogdanp closed this as completed Jun 7, 2017