[Cosmos] split proof queries sync client #22237
Conversation
* Removed some stuff
* Looking at constructors
* Updated request
* Added client close
* working client creation
Co-authored-by: simorenoh <[email protected]>
database read works, but an ignored exception is returned: Fatal error on SSL transport, NoneType has no attribute 'send' (_loop._proactor.send), RuntimeError: Event loop is closed, Unclosed connector/connection
Added methods needed to use async with when initializing client, but logs output "Exception ignored... Runtime Error: Event loop is closed"
Move branch into local fork
missing upsert and other resources
missing read_all_items and both query methods for container class
Complete item CRUD functionality - only missing queries
Query functionality and container query methods
also fixed README and added examples_async
# return False to raise error to multi_execution_aggregator and repair document producer
self.client.refresh_routing_map_provider()
return False
if self.current_retry_attempt_count < self._max_retry_attempt_count:
Why are we only checking the current retry attempt count in this if condition, and not when updating the partition key range cache?
If we keep getting a gone exception, this if condition will never be hit, and we will never get out of this retry policy.
I was returning False immediately for sub-status 1002 in order to further handle the exception on a first-try basis, since we need the exception to bubble up in order to repair the execution context.
I think this is the way to go, but I see what you mean. I'll add a flag that only returns False pre-emptively the first time we run into a 1002 with that instance of the gone retry policy; if it's not the first time, the policy continues with the normal retry attempts.
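A minimal sketch of the flag-based behavior described above. The flag name (refresh_partition_key_range_cache), the cache refresh, and the retry-count check come from the diffs in this thread; the class name, constructor, and ShouldRetry signature are assumptions, not the actual SDK code.

class GoneRetryPolicy:
    def __init__(self, client, max_retry_attempt_count=1):
        self.client = client
        self.current_retry_attempt_count = 0
        self._max_retry_attempt_count = max_retry_attempt_count
        # True until the first 1002 (partition key range gone) is seen.
        self.refresh_partition_key_range_cache = True

    def ShouldRetry(self, exception):
        # Sketch only: names outside the diffs above are assumed.
        if self.refresh_partition_key_range_cache:
            # First 1002: refresh the routing map and return False so the
            # exception bubbles up and the execution context can repair
            # its document producers instead of retrying blindly.
            self.client.refresh_routing_map_provider()
            self.refresh_partition_key_range_cache = False
            return False
        # Subsequent 1002s fall back to the normal bounded retries.
        if self.current_retry_attempt_count < self._max_retry_attempt_count:
            self.current_retry_attempt_count += 1
            return True
        return False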
@@ -129,6 +136,32 @@ def fetch_next_block(self):

        raise NotImplementedError("You should use pipeline's fetch_next_block.")

    def _repair_document_producer(self):
It would be good to add some documentation to this function, e.g. what it does and how it repairs the document producers.
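For illustration, a hypothetical shape of the docstring being requested (the wording is an assumption, not text from the PR):

def _repair_document_producer(self):
    """Repair this execution context after a partition split.

    Sketch of the intended behavior: when a query hits a 410 with
    sub-status 1002 (partition key range gone), the target range has been
    split. Re-fetch the now up-to-date partition key ranges covering the
    query's target range and create a fresh document producer for each
    child range, so the cross-partition query can resume.
    """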
@@ -0,0 +1,82 @@
import time
Why have we added this test in samples?
Also, I see the license header is missing for this one and some other samples; we should add it.
I mentioned this in the PR description, but I'm just keeping this here for now while I get approvals, and will remove it afterwards.
Edit: I have now added a test that follows the way the Java SDK tests for a partition split.
Please have a look at the comments I added; apart from that, LGTM.
self.client.refresh_routing_map_provider()
self.refresh_partition_key_range_cache = False
return False
return True
Why do we want to retry after one attempt without refreshing the cache? PartitionKeyRangeGoneRetryPolicy in the V4 Java SDK only retries one time: https://github.com/Azure/azure-sdk-for-java/blob/c4c068cc0d2061df9a4fcc49419acc10be223d4a/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/PartitionKeyRangeGoneRetryPolicy.java#L60
print("created items, waiting 4m before queries") | ||
print("change offer to 11k manually") | ||
print("--------------------------------") | ||
time.sleep(240) |
The current test will not guarantee that your split completed; please refer to https://github.com/Azure/azure-sdk-for-java/blob/c4c068cc0d2061df9a4fcc49419acc10be223d4a/sdk/cosmos/azure-cosmos/src/test/java/com/azure/cosmos/rx/QueryValidationTests.java#L397
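A rough sketch of that polling approach, instead of a fixed sleep. Every name here is an assumption: _ReadPartitionKeyRanges is an internal client-connection call, and client_connection/container_link are taken to be available on the container proxy.

import time

def wait_for_split(container, timeout=1800, interval=30):
    # Poll the container's partition key ranges until the split has
    # produced child ranges, rather than sleeping a fixed 4 minutes.
    # NOTE: _ReadPartitionKeyRanges is internal; this is an assumed
    # sketch, not the referenced test's actual code.
    deadline = time.time() + timeout
    while time.time() < deadline:
        ranges = list(container.client_connection._ReadPartitionKeyRanges(
            container.container_link))
        if len(ranges) > 1:  # split completed: more than one range
            return ranges
        time.sleep(interval)
    raise TimeoutError("partition split did not complete within the timeout")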
…e-sdk-for-python into split-proof-query
from .http_constants import HttpHeaders, StatusCodes, SubStatusCodes

nit: remove extra line
@@ -32,8 +32,10 @@
from . import _resource_throttle_retry_policy
from . import _default_retry_policy
from . import _session_retry_policy
from . import _gone_retry_policy
Question: usually the import statements should be sorted in the Java SDK; do we need to sort them in Python as well?
Yes, but this is something that I would probably need to do in every file; I'll take care of all of it in one PR later.
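For reference, the sorted ordering being discussed would look like this for the imports in the diff above (illustrative only):

from . import _default_retry_policy
from . import _gone_retry_policy
from . import _resource_throttle_retry_policy
from . import _session_retry_policy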
LGTM, thanks for the fix 👍
* initial commit
* Client Constructor (Azure#20310): Removed some stuff; Looking at constructors; Updated request; Added client close; working client creation. Co-authored-by: simorenoh <[email protected]>
* read database: database read works, but an ignored exception is returned: Fatal error on SSL transport, NoneType has no attribute 'send' (_loop._proactor.send), RuntimeError: Event loop is closed, Unclosed connector/connection
* Update simon_testfile.py
* with coroutine: Added methods needed to use async with when initializing client, but logs output "Exception ignored... Runtime Error: Event loop is closed"
* Update simon_testfile.py
* small changes
* async with returns no exceptions
* async read container
* async item read
* cleaning up
* create item/database methods
* item delete working
* docs replace functionality: missing upsert and other resources
* upsert functionality: missing read_all_items and both query methods for container class
* missing query methods
* CRUD for udf, sproc, triggers
* initial query logic + container methods
* missing some execution logic and tests
* oops
* fully working queries
* small fix to query_items(): also fixed README and added examples_async
* Update _cosmos_client_connection_async.py
* Update _cosmos_client_connection.py
* documentation update
* updated MIT dates and get_user_client() description
* Update CHANGELOG.md
* Delete simon_testfile.py
* leftover retry utility
* Update README.md
* docs and removed six package
* changes based on comments: still missing discussion resolution on SSL verification and tests for async functionality under the test module (apart from samples, which are basically end-to-end tests)
* small change in type hints
* updated readme
* fixes based on conversations
* added missing type comments
* update changelog for ci pipeline
* added typehints, moved params into keywords, added decorators, made _connection_policy private
* changes based on sync with central sdk
* remove is_system_key from scripts (only used in execute_sproc): is_system_key verifies that an empty partition key is properly dealt with if ['partitionKey']['systemKey'] exists in the container options; however, we do not allow containers to be created with empty partition key values in the Python SDK, so the functionality is needless
* Revert "remove is_system_key from scripts (only used in execute_sproc)": reverting last commit, will find a way to init is_system_key for now
* async script proxy using composition
* pylint
* capitalized constants
* Apply suggestions from code review: clarifying comments for README. Co-authored-by: Gahl Levy <[email protected]>
* closing python code snippet
* last doc updates
* Update sdk/cosmos/azure-cosmos/CHANGELOG.md. Co-authored-by: Simon Moreno <[email protected]>
* version update
* cosmos updates for release
* current state: gone_retry_policy might end up being unnecessary, based on what we feel is best from an arch standpoint
* working split proof, need to remove prints
* improving comments and removing print statements
* removed last prints and used constants
* Update CHANGELOG.md
* small fixes based on comments
* addressed more comments
* added test, made slight changes
* rename test and small changes
* pylint
* pylintpylintpylint
* moved partition_range_gone check to exceptions since it makes more sense
* re-use code
Co-authored-by: annatisch <[email protected]>
Co-authored-by: Gahl Levy <[email protected]>
Co-authored-by: Travis Prescott <[email protected]>
Addressing the current gap of the Cosmos SDK not being split proof.
The gone_retry_policy is used exclusively to refresh the partition key range cache and raise the exception immediately, in order to repair the document producers rather than waste time retrying several times.
One note on these changes: I still need to remove certain print statements and comments, but I want to get input first so I can keep logging the right things in case further iteration is needed.
I also left the testing file that I was using (test_xq.py) in order to walk through the process for triggering a partition split, but I will remove this file once the changes are approved.
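For context, a minimal usage sketch (an assumed sample, not code from this PR) of the kind of cross-partition query these changes are meant to make split proof; url, key, and the database/container names are placeholders:

from azure.cosmos import CosmosClient

client = CosmosClient(url, credential=key)  # url/key are placeholders
container = client.get_database_client("db").get_container_client("coll")

# A cross-partition query iterates every partition key range; if a range
# is split mid-iteration (410/1002), the execution context should now
# refresh the range cache, repair its document producers, and continue.
for item in container.query_items(
        query="SELECT * FROM c",
        enable_cross_partition_query=True):
    print(item["id"])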