[Cosmos] split proof queries sync client #22237

simorenoh · 2021-12-20T23:26:22Z

Addressing the current gap in the cosmos sdk not being split proof

The gone_retry_policy is used exclusively to refresh the partition key range cache and raise the exception immediately in order to repair the document producers and not waste time retrying several times.

One note on these changes is that I still need to remove certain print statements and comments but want to get input first so I can keep logging the right things in case some further iteration is needed.

I also left the testing file that I was using (test_xq.py) in order to walk through the process for triggering a partition split that I was using, but I will remove this file once the changes are approved.

* Removed some stuff * Looking at constructors * Updated request * Added client close * working client creation Co-authored-by: simorenoh <[email protected]>

database read works, but ignored exception is returned: Fatal error on SSL transport NoneType has no attribute 'send' (_loop._proactor.send) RuntimeError: Event loop is closed Unclosed connector/ connection

Added methods needed to use async with when initializing client, but logs output "Exception ignored... Runtime Error: Event loop is closed"

Move branch into local fork

missing upsert and other resources

missing read_all_items and both query methods for container class

Complete item CRUD functionality - only missing queries

Query functionality and container query methods

also fixed README and added examples_async

sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/document_producer.py

sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/multi_execution_aggregator.py

sdk/cosmos/azure-cosmos/azure/cosmos/_gone_retry_policy.py

kushagraThapar · 2022-01-19T15:42:38Z

sdk/cosmos/azure-cosmos/azure/cosmos/_gone_retry_policy.py

+            # return False to raise error to multi_execution_aggregator and repair document producer
+            self.client.refresh_routing_map_provider()
+            return False
+        if self.current_retry_attempt_count < self._max_retry_attempt_count:


why are we only checking the current retry attempt count in this if condition, and not when updating the partition key range cache.
Let's say if we keep getting gone exception, this if condition will never hit, and we will never get out of this retry policy.

I was returning a False immediately for sub status 1002 in order to further handle the exception on a first-try basis, since we need the exception to bubble up in order to repair the execution context.

I think this is the way to go, but I see what you mean. I'll add a flag that only returns False pre-emptively if it's the first time we run into a 1002 with that instance of the gone retry policy, and if it's not the first time then continues with the normal attempts.

kushagraThapar · 2022-01-19T15:53:01Z

sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/multi_execution_aggregator.py

@@ -129,6 +136,32 @@ def fetch_next_block(self):

        raise NotImplementedError("You should use pipeline's fetch_next_block.")

+    def _repair_document_producer(self):


Would be good to add some documentation to this function, like what it does and how it repairs the document producers.

kushagraThapar · 2022-01-19T15:55:39Z

sdk/cosmos/azure-cosmos/samples/test_xq.py

@@ -0,0 +1,82 @@
+import time


why have we added this test in samples ?
Also, I see license header is missing for this one and some other samples, we should add it.

I mentioned this in the PR description, but I'm just keeping this here for now while I get approvals and will remove afterwards.
Edit: I have now added a test to follow the way the Java SDK tests for a partition split.

sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/document_producer.py

check-enforcer · 2022-01-19T17:48:32Z

This pull request is protected by Check Enforcer.

What is Check Enforcer?

Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass.

Why am I getting this message?

You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged.

What should I do now?

If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows:
/check-enforcer evaluate
Typically evaulation only takes a few seconds. If you know that your pull request is not covered by a pipeline and this is expected you can override Check Enforcer using the following command:
/check-enforcer override
Note that using the override command triggers alerts so that follow-up investigations can occur (PRs still need to be approved as normal).

What if I am onboarding a new service?

Often, new services do not have validation pipelines associated with them, in order to bootstrap pipelines for a new service, you can issue the following command as a pull request comment:
/azp run prepare-pipelines
This will run a pipeline that analyzes the source tree and creates the pipelines necessary to build and validate your pull request. Once the pipeline has been created you can trigger the pipeline using the following comment:
/azp run python - [service] - ci

simplynaveen20

Please have a look to the comments I added , apart from that LGTM

simplynaveen20 · 2022-01-19T22:26:41Z

sdk/cosmos/azure-cosmos/azure/cosmos/_gone_retry_policy.py

+                self.client.refresh_routing_map_provider()
+                self.refresh_partition_key_range_cache = False
+                return False
+            return True


Why we want to retry after one attempt without refreshing the cache . PartitionKeyRangeGoneRetryPolicy in V4 Java SDK only retrying one time https://github.com/Azure/azure-sdk-for-java/blob/c4c068cc0d2061df9a4fcc49419acc10be223d4a/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/PartitionKeyRangeGoneRetryPolicy.java#L60

simplynaveen20 · 2022-01-19T22:31:48Z

sdk/cosmos/azure-cosmos/samples/test_xq.py

+    print("created items, waiting 4m before queries")
+    print("change offer to 11k manually")
+    print("--------------------------------")
+    time.sleep(240)


Current test will not guarantee that your split completed , please refer https://github.com/Azure/azure-sdk-for-java/blob/c4c068cc0d2061df9a4fcc49419acc10be223d4a/sdk/cosmos/azure-cosmos/src/test/java/com/azure/cosmos/rx/QueryValidationTests.java#L397

…e-sdk-for-python into split-proof-query

xinlian12 · 2022-01-21T01:38:23Z

sdk/cosmos/azure-cosmos/azure/cosmos/_retry_utility.py

 from .http_constants import HttpHeaders, StatusCodes, SubStatusCodes

+


nit: remove extra line

xinlian12 · 2022-01-21T01:39:37Z

sdk/cosmos/azure-cosmos/azure/cosmos/_retry_utility.py

@@ -32,8 +32,10 @@
 from . import _resource_throttle_retry_policy
 from . import _default_retry_policy
 from . import _session_retry_policy
+from . import _gone_retry_policy


question: usually the import statements should be sorted in java & SDK, do we need to sort in Python as well?

yes but this is something that I need to do probably in every file, will take care of all of it in one PR later

xinlian12

LGTM, thanks for the fix 👍

* initial commit * Client Constructor (Azure#20310) * Removed some stuff * Looking at constructors * Updated request * Added client close * working client creation Co-authored-by: simorenoh <[email protected]> * read database database read works, but ignored exception is returned: Fatal error on SSL transport NoneType has no attribute 'send' (_loop._proactor.send) RuntimeError: Event loop is closed Unclosed connector/ connection * Update simon_testfile.py * with coroutine Added methods needed to use async with when initializing client, but logs output "Exception ignored... Runtime Error: Event loop is closed" * Update simon_testfile.py * small changes * async with returns no exceptions * async read container * async item read * cleaning up * create item/ database methods * item delete working * docs replace functionality missing upsert and other resources * upsert functionality missing read_all_items and both query methods for container class * missing query methods * CRUD for udf, sproc, triggers * initial query logic + container methods * missing some execution logic and tests * oops * fully working queries * small fix to query_items() also fixed README and added examples_async * Update _cosmos_client_connection_async.py * Update _cosmos_client_connection.py * documentation update * updated MIT dates and get_user_client() description * Update CHANGELOG.md * Delete simon_testfile.py * leftover retry utility * Update README.md * docs and removed six package * changes based on comments still missing discussion resolution on SSL verification and tests for async functionality under test module (apart from samples which are basically end to end tests) * small change in type hints * updated readme * fixes based on conversations * added missing type comments * update changelog for ci pipeline * added typehints, moved params into keywords, added decorators, made _connection_policy private * changes based on sync with central sdk * remove is_system_key from scripts (only used in execute_sproc) is_system_key verifies that an empty partition key is properly dealt with if ['partitionKey']['systemKey'] exists in the container options - however, we do not allow containers to be created with empty partition key values in the python sdk, so the functionality is needless * Revert "remove is_system_key from scripts (only used in execute_sproc)" Reverting last commit, will find way to init is_system_key for now * async script proxy using composition * pylint * capitalized constants * Apply suggestions from code review Clarifying comments for README Co-authored-by: Gahl Levy <[email protected]> * closing python code snippet * last doc updates * Update sdk/cosmos/azure-cosmos/CHANGELOG.md Co-authored-by: Simon Moreno <[email protected]> * version update * cosmos updates for release * current state gone_retry_policy might end up being unneccesary, based on what we feel is best from an arch standpoint * working split proof, need to remove prints * improving comments and removing print statements * removed last prints and used constants * Update CHANGELOG.md * small fixes based on comments * addressed more comments * added test, made slight changes * rename test and small changes * pylint * pylintpylintpylint * moved partition_range_gone check to exceptions since makes more sense * re use code Co-authored-by: annatisch <[email protected]> Co-authored-by: Gahl Levy <[email protected]> Co-authored-by: Travis Prescott <[email protected]>

simorenoh and others added 30 commits August 13, 2021 13:14

initial commit

61ba8d1

Client Constructor (#20310)

15dcceb

* Removed some stuff * Looking at constructors * Updated request * Added client close * working client creation Co-authored-by: simorenoh <[email protected]>

read database

bda95c3

database read works, but ignored exception is returned: Fatal error on SSL transport NoneType has no attribute 'send' (_loop._proactor.send) RuntimeError: Event loop is closed Unclosed connector/ connection

Update simon_testfile.py

c9648ab

with coroutine

80540dc

Added methods needed to use async with when initializing client, but logs output "Exception ignored... Runtime Error: Event loop is closed"

Update simon_testfile.py

1285438

small changes

992b0cd

async with returns no exceptions

47cb688

Merge pull request #1 from Azure/simonmoreno/async

f3fa79f

Move branch into local fork

async read container

0c49739

async item read

47f4af5

cleaning up

c97c946

create item/ database methods

fcd95db

item delete working

36c5b90

docs replace functionality

44db2a2

missing upsert and other resources

upsert functionality

ec5b6ed

missing read_all_items and both query methods for container class

Merge pull request #2 from simorenoh/item-read

d63d052

Complete item CRUD functionality - only missing queries

missing query methods

5d74c8f

CRUD for udf, sproc, triggers

89fc2f7

Merge branch 'Azure:main' into async-client

fdaa880

Merge branch 'Azure:main' into async-client

3f9baf2

Merge branch 'Azure:main' into query-functionality

d6650bc

initial query logic + container methods

043dfe0

Merge branch 'async-client' into query-functionality

befdb41

Merge pull request #3 from simorenoh/query-functionality

8cffbe2

Query functionality and container query methods

missing some execution logic and tests

72de7c8

oops

5b805b8

fully working queries

8d8d0c4

small fix to query_items()

b597ca8

also fixed README and added examples_async

Update _cosmos_client_connection_async.py

18319df

xinlian12 reviewed Jan 14, 2022

View reviewed changes

sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/document_producer.py Outdated Show resolved Hide resolved

xinlian12 reviewed Jan 14, 2022

View reviewed changes

sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/multi_execution_aggregator.py Outdated Show resolved Hide resolved

xinlian12 reviewed Jan 14, 2022

View reviewed changes

sdk/cosmos/azure-cosmos/azure/cosmos/_gone_retry_policy.py Show resolved Hide resolved

xinlian12 reviewed Jan 14, 2022

View reviewed changes

sdk/cosmos/azure-cosmos/azure/cosmos/_gone_retry_policy.py Show resolved Hide resolved

small fixes based on comments

950fbe2

kushagraThapar requested changes Jan 19, 2022

View reviewed changes

xinlian12 reviewed Jan 19, 2022

View reviewed changes

sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/document_producer.py Outdated Show resolved Hide resolved

addressed more comments

10ef2d6

simorenoh requested review from kushagraThapar and removed request for xinlian12 January 19, 2022 18:10

simplynaveen20 approved these changes Jan 19, 2022

View reviewed changes

simorenoh and others added 8 commits January 20, 2022 12:50

Merge branch 'main' into split-proof-query

e0ec985

added test, made slight changes

4eeec22

Merge branch 'split-proof-query' of https://github.com/simorenoh/azur…

988ea5a

…e-sdk-for-python into split-proof-query

rename test and small changes

4f26d91

pylint

5c42486

pylintpylintpylint

154291a

moved partition_range_gone check to exceptions since makes more sense

782268e

re use code

3d352cb

xinlian12 reviewed Jan 21, 2022

View reviewed changes

xinlian12 approved these changes Jan 21, 2022

View reviewed changes

Merge branch 'main' into split-proof-query

90d05ea

kushagraThapar approved these changes Jan 24, 2022

View reviewed changes

simorenoh merged commit 18981ce into Azure:main Jan 25, 2022

simorenoh deleted the split-proof-query branch January 25, 2022 18:21

simorenoh mentioned this pull request Aug 29, 2022

When will limitations be fixed? #20497

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Cosmos] split proof queries sync client #22237

[Cosmos] split proof queries sync client #22237

simorenoh commented Dec 20, 2021 •

edited

Loading

kushagraThapar Jan 19, 2022

simorenoh Jan 19, 2022 •

edited

Loading

kushagraThapar Jan 19, 2022

kushagraThapar Jan 19, 2022

simorenoh Jan 19, 2022 •

edited

Loading

check-enforcer bot commented Jan 19, 2022

simplynaveen20 left a comment

simplynaveen20 Jan 19, 2022

simplynaveen20 Jan 19, 2022

xinlian12 Jan 21, 2022

xinlian12 Jan 21, 2022

simorenoh Jan 21, 2022

xinlian12 left a comment

		@@ -129,6 +136,32 @@ def fetch_next_block(self):

		raise NotImplementedError("You should use pipeline's fetch_next_block.")

		def _repair_document_producer(self):

		from .http_constants import HttpHeaders, StatusCodes, SubStatusCodes

[Cosmos] split proof queries sync client #22237

[Cosmos] split proof queries sync client #22237

Conversation

simorenoh commented Dec 20, 2021 • edited Loading

Choose a reason for hiding this comment

simorenoh Jan 19, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

simorenoh Jan 19, 2022 • edited Loading

Choose a reason for hiding this comment

check-enforcer bot commented Jan 19, 2022

What is Check Enforcer?

Why am I getting this message?

What should I do now?

What if I am onboarding a new service?

simplynaveen20 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xinlian12 left a comment

Choose a reason for hiding this comment

simorenoh commented Dec 20, 2021 •

edited

Loading

simorenoh Jan 19, 2022 •

edited

Loading

simorenoh Jan 19, 2022 •

edited

Loading