Add multiprocess file storage. #504
_backends_lock = threading.Lock()


@util.positional(1)
After chatting with @thobrla in person, I have a better grasp of how this really needs to work. I'll work on updating this a bit over the weekend to solve some of the behavior problems. Feedback is still welcome.
Discussed a bit with @jonparrott offline. I don't think taking a binary dependency is feasible, at least not in the near future. It might be possible in the future if we condense to a single install mechanism that can install the binary dependency, but I think that's a long way off. I don't think we need a lot of logic in multistore_file[_storage]. All we really want is an effective and portable cache to limit the number of HTTP requests we need to make to the OAuth endpoint. The logic, loosely, is:
If we make an extra HTTP request here or there due to lock contention or timeout, that's not a big deal. The scenario we want to avoid is sending hundreds of HTTP requests in a short span because all of our processes and threads are trying to refresh credentials at once.
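A minimal sketch of the trade-off described in the comment above, using only the standard library (Python 3). This is an illustration under assumptions, not the code in this PR: `get_token`, the `refresh` callable, and the in-memory `_cache` are hypothetical names, and a `threading.Lock` only covers threads in a single process; a cross-process version would need a file-based lock instead.

```python
import threading
import time

_lock = threading.Lock()
_cache = {}  # key -> (token, expiry_timestamp); hypothetical in-memory cache


def get_token(key, refresh, lock_timeout=1.0):
    """Return a cached token, refreshing at most once under the lock.

    `refresh` is an assumed callable returning (token, expiry_timestamp);
    it stands in for the HTTP request to the OAuth endpoint.
    """
    if not _lock.acquire(timeout=lock_timeout):
        # Lock contention or timeout: one extra refresh is acceptable,
        # so skip the cache entirely rather than block or fail.
        return refresh()[0]
    try:
        entry = _cache.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]  # Cache hit: no HTTP request needed.
        # Refresh while holding the lock so that concurrent callers wait
        # for this result instead of all refreshing at once.
        token, expiry = refresh()
        _cache[key] = (token, expiry)
        return token
    finally:
        _lock.release()
```

Holding the lock during the refresh is what prevents the burst of simultaneous requests; the timeout path is what keeps a stuck lock from turning into a hard failure.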
e3db6aa to 3067a17
@nathanielmanistaatgoogle I am tentatively putting this forward for initial review. For the first round, please focus on high-level thoughts; we can get into details once you're comfortable with the API surface. @thobrla this has been adjusted as per our conversation and this now works quite differently from
This module provides the MultiprocessFileStorage class that:
Process & thread safety guarantees the following behavior:
Approach seems reasonable to me at a high level; I'll save low-level comments until after @nathanielmanistaatgoogle's review. I'd like to test this out with gsutil, but probably will not have an opportunity to do so for a couple of weeks as I'm working on trying to add compatibility across oauth2client 1.5.2 and 2.x+.
# See the License for the specific language governing permissions and
# limitations under the License.

"""File-based storage that supports multiple credentials and cross-process
FYI, testing this out is currently on the gsutil backlog - I want to take it for a spin but it's nontrivial and there are many work items in front of it.
"""
credential = self._backend.locked_get(self._key)

if credential:
I'm wary of a catch-all-exceptions for all the usual reasons to be wary of a catch-all-exceptions: I don't think we should be writing code that catches Exceptions coming out of the underlying
We can likely get away with catching what
The problem is that the intended use of credential storage is to act as a cache. Getting an exception of any kind when accessing storage effectively amounts to a cache miss. Should callers be required to handle that? Do we want to find out in our application much later on that fasteners could throw some exception that we didn't anticipate in rare cases? This will likely crash the application - if we don't know here which exceptions to handle, then how could the application know? In this case, I'd much prefer to log the exception that led to the cache miss and fall back to refreshing the credentials. If that also fails, then we can surface an exception to the caller.
At the locking level: It seems like fasteners broadly catches IOError. This should be at parity with what we do. At the storage level: We catch a subset of Thoughts @nathanielmanistaatgoogle @thobrla? We can always release this and give @thobrla time to test it thoroughly in gsutil and we can always add on later.
I dispute that "[g]etting an exception of any kind when accessing storage effectively amounts to a cache miss" - an exception could mean anything, from interpreter shutdown to local filesystem corruption to a programming defect in the called system to a programming defect in our own system to global thermonuclear conflict. An exception that comes from Now that I look at
if os.path.exists(filename):
return False

# Equivalent to "touch".
Their source (and tests) indicate that they catch all relevant exceptions.
The only experience we have contrariwise in I'm +1 on getting this in and seeing what happens in the real world and I'm happy to work with you @thobrla to get this integrated into gsutil and tested.
Looks nearly good to me.
The point about the types of exceptions thrown is salient. However, what's the exception model for a consuming application in this case? The problem with the "wait and see" approach is that every real world scenario we see translates to a reliability failure from the user's perspective; a failure that generally could have been avoided via refreshing the credential. This is exactly what we saw with I guess the real question is: does the application have control over attempting to refresh the credential as a fallback mechanism? Do they need to wrap every HTTP request with catch-all logic if they want this behavior?
The only class of errors not handled in this new code seem to be
Not directly, but you can with your own storage class that wraps this one.
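As a rough illustration of the wrapping approach suggested above, here is a sketch of a wrapper that logs any storage failure and reports a cache miss instead of raising. The class name `CacheMissOnErrorStorage` is hypothetical, and the only assumption about the wrapped object is that it exposes `get()` and `put(credentials)`; it is not part of oauth2client.

```python
import logging

logger = logging.getLogger(__name__)


class CacheMissOnErrorStorage(object):
    """Hypothetical wrapper that treats storage failures as cache misses."""

    def __init__(self, inner):
        # `inner` is assumed to expose get() and put(credentials).
        self._inner = inner

    def get(self):
        try:
            return self._inner.get()
        except Exception:
            # Deliberately broad: the application opts in to this policy.
            logger.exception('Credential read failed; treating as a miss.')
            return None

    def put(self, credentials):
        try:
            self._inner.put(credentials)
        except Exception:
            # Failing to cache is not fatal; the next get() will miss.
            logger.exception('Credential write failed; not cached.')
```

With this shape the catch-all lives in application code that has chosen it, while the library's own storage class can keep raising the exceptions it does not understand.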
Ready (to me, at least) to merge; make sure to follow the rules with commits that get formed on merge.
@nathanielmanistaatgoogle will do so when I merge it. Do you think it's worthwhile for you to add that into CONTRIBUTING? Or is it unnecessary since you, me, and Danny are in full control over the commit message for merges?
I'm fine with wrapping it, so generally looks good.
We should add that link to
@thobrla I'm happy to help you write whatever glue code is needed, and upstream whatever seems necessary. :)
@nathanielmanistaatgoogle will merge after Travis is happy. There will be a follow-up PR to make
Totes cool.
@thobrla tentative plan: oauth2client 3.0.0 will warn about usage of
Sounds good. Thanks for putting this together, I'm excited to try it out (when I can find time).
Towards #470
This is not ready for merge; this is an initial sketch of what this would look like. Please note:
multistore_file, I am introducing a new module to cover the same functionality named multiprocess_file_storage. We can do a release to deprecate multistore_file in favor of multiprocess_file_storage and then another release to remove it.
get_storage_{} functions, there is now just MultiprocessFileStorage(filename, key). The key argument can be used to emulate how the previous helper methods worked, e.g. key = '{}-{}-{}'.format(client_id, user_agent, scope).
@thobrla, while writing this, I came across some interesting behavioral cases:
put or get, but if at any point in time it fails to acquire the lock it will switch to read-only mode and will never try locking again.
It seems that multistore_file never lived up to its promises, and neither will this class. As I mentioned before, a sqlite-based storage would be a true solution, but it would require the Cloud SDK and gsutil to take on a third-party binary dependency, which may be untenable for you. What are your thoughts?
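For the key argument mentioned in the notes above, a small helper along these lines could build the string. `make_storage_key` is a hypothetical name, and normalizing a list of scopes by sorting and joining is an assumption made here so that the same scopes always map to the same key.

```python
def make_storage_key(client_id, user_agent, scope):
    """Build a storage key in the '{}-{}-{}' form from the description."""
    if not isinstance(scope, str):
        # Assumption: sort and join so scope order does not change the key.
        scope = ' '.join(sorted(scope))
    return '{}-{}-{}'.format(client_id, user_agent, scope)
```

The resulting string would then be passed as the key in MultiprocessFileStorage(filename, key).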