Added support for opensearch-py client connections urllib3httpconnection, requestshttpconnection #445

saimedhi · 2024-01-23T06:54:31Z

Description

Added support for the opensearch-py client connection classes Urllib3HttpConnection and RequestsHttpConnection.

Issues Resolved

#437

Follow up Issues

Document how to use urllib3httpconnection, requestshttpconnection client connections.

Usage examples:

The connection_class value passed in --client-options is case insensitive.
opensearch-benchmark execute-test --distribution-version=2.11.1 --workload=geonames --test-mode --kill-running-processes --client-options "connection_class:urllib3"
For urllib3httpconnection, either value works: urllib3 or urllib3httpconnection.
For requestshttpconnection, either value works: requests or requestshttpconnection.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

dblock · 2024-01-23T20:43:15Z

osbenchmark/client.py

+                connection_class = osbenchmark.sync_connection.Urllib3HttpConnection
+        else:
+            connection_class = osbenchmark.sync_connection.Urllib3HttpConnection
+


This seems unnecessarily complicated, and should raise an error if the user specified an invalid class.

connection_class = self.client_options.get("connection_class", "urllib3")
if connection_class == "requests":
    connection_class = osbenchmark.sync_connection.RequestsHttpConnection
elif connection_class == "...":
    ...
else:
    raise ...

Error handling is done here

I will simplify the code

dblock · 2024-01-23T20:44:02Z

osbenchmark/worker_coordinator/worker_coordinator.py

@@ -1449,7 +1449,17 @@ async def run(self):
        def os_clients(all_hosts, all_client_options):
            opensearch = {}
            for cluster_name, cluster_hosts in all_hosts.items():
-                opensearch[cluster_name] = client.OsClientFactory(cluster_hosts, all_client_options[cluster_name]).create_async()
+                if "connection_class" in all_client_options["default"]:


Same as above, collapse the nested if/else.

dblock · 2024-01-23T20:45:25Z

osbenchmark/worker_coordinator/worker_coordinator.py

+                if asyncio.iscoroutinefunction(s.transport.close):
+                    await s.transport.close()
+                else:
+                    s.transport.close()


Ideally there should be a sync and an async worker coordinator that have a close method, or a transport class that wraps the async version so that you can change this code to just s.transport.close().
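
A minimal sketch of the wrapper idea, assuming a transport whose close() may be either a plain method or a coroutine. ClosableTransport, SyncTransport, and AsyncTransport are hypothetical names used only for illustration; they are not part of osbenchmark or opensearch-py.

```python
import asyncio
import inspect


class SyncTransport:
    """Stand-in for a transport with a blocking close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class AsyncTransport:
    """Stand-in for a transport whose close() is a coroutine."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


class ClosableTransport:
    """Hypothetical wrapper that exposes one awaitable close() for both kinds."""

    def __init__(self, transport):
        self._transport = transport

    async def close(self):
        result = self._transport.close()
        # A coroutine method returns an awaitable; a plain method has
        # already done its work and returns a non-awaitable value.
        if inspect.isawaitable(result):
            await result
```

With a wrapper like this the caller would write `await s.transport.close()` in every case. Note that a blocking close() still runs on the event loop here, so this only removes the branch at the call site.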

gkamat · 2024-02-05T07:32:56Z

osbenchmark/client.py

+                connection_class = osbenchmark.sync_connection.Urllib3HttpConnection
+        else:
+            connection_class = osbenchmark.sync_connection.Urllib3HttpConnection
+


Agree with @dblock Since Python does not have a switch or case statement, at least pre-3.10, you can use a structured if-elif block or perhaps a map. Also, a default arm with error handling is good practice, even if there is error handling elsewhere, since a reader of this section might not be aware of that.
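
The map option could look roughly like the sketch below. The two classes are empty stand-ins for the connection classes in osbenchmark.sync_connection, and CONNECTION_CLASSES and resolve_connection_class are hypothetical names; the default of "urllib3" follows the suggestion earlier in this thread.

```python
class Urllib3HttpConnection:
    """Stand-in for osbenchmark.sync_connection.Urllib3HttpConnection."""


class RequestsHttpConnection:
    """Stand-in for osbenchmark.sync_connection.RequestsHttpConnection."""


# Hypothetical alias table: short and long names map to the same class.
CONNECTION_CLASSES = {
    "urllib3": Urllib3HttpConnection,
    "urllib3httpconnection": Urllib3HttpConnection,
    "requests": RequestsHttpConnection,
    "requestshttpconnection": RequestsHttpConnection,
}


def resolve_connection_class(client_options, default="urllib3"):
    """Return the connection class named in client_options, ignoring case."""
    name = str(client_options.get("connection_class", default)).lower()
    try:
        return CONNECTION_CLASSES[name]
    except KeyError:
        # Default arm: fail loudly instead of silently falling back.
        valid = ", ".join(sorted(CONNECTION_CLASSES))
        raise ValueError(
            f"Unknown connection_class '{name}'; expected one of: {valid}"
        ) from None
```

A lookup table keeps the aliases in one place, and the explicit ValueError makes an invalid value visible at the point where the option is read.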

gkamat · 2024-02-05T07:35:21Z

osbenchmark/client.py

        if "amazon_aws_log_in" not in self.client_options:
-            return opensearchpy.OpenSearch(hosts=self.hosts, ssl_context=self.ssl_context, **self.client_options)
+            return self.BenchmarkOpenSearch(hosts=self.hosts, ssl_context=self.ssl_context,


Would suggest renaming this to BenchmarkSyncOpenSearch to match the other class. Likewise with create() vis-a-vis create_async(). Please add comments prior to each indicating the presence of the other.

Actually, you might consider renaming both classes to be more descriptive, something like PythonSyncClient and PythonAsyncClient, so future clients can be named accordingly.

gkamat · 2024-02-05T07:38:09Z

osbenchmark/worker_coordinator/worker_coordinator.py

+                if asyncio.iscoroutinefunction(s.transport.close):
+                    await s.transport.close()
+                else:
+                    s.transport.close()


This could be refactored in a future check-in.

gkamat · 2024-02-05T07:40:19Z

osbenchmark/sync_connection.py

+            request_context_holder.on_request_start()
+            status, headers, raw_data = super().perform_request(method=method, url=url, params=params, body=body, timeout=timeout,
+                                                                allow_redirects=allow_redirects, ignore=ignore, headers=headers)
+            request_context_holder.on_request_end()


There is no error handling here. If the request fails, the on_request_end callback will not be executed, leading to incorrect metrics in the command dispatcher, or possibly an error there.

Note: this may be the wrong location to insert these calls, if they are intended to measure service time. They should be at the lowest level possible just prior to the HTTP socket send call. The aiohttp library has callbacks, but the others might not. In that case, the calls will need to be somewhere within opensearchpy, not here.

Hey @dblock, any ideas on how we can get the exact timing? Should we make changes in opensearch-py?

For the issue that @gkamat is bringing up, the calling code and the implementation should become a with block / __enter__ and __exit__ so that on_request_end is always called, even on error.

On the second issue, the lowest level: I think we are trying to measure "client overhead" that begins "around" the client. I also think it would be good to get lower-level HTTP library metrics for pure transport. Finally, the entire client will become swappable, so keeping the code here makes sense to me.

@gkamat WDYT?

@dblock, yes, either a context manager or exception handling will be an appropriate mechanism to handle the issue.
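
As a rough illustration of the context manager approach, assuming the holder only needs on_request_start and on_request_end callbacks: RequestContextRecorder is a stand-in for request_context_holder, and timed_request and perform_request are hypothetical helpers, not code from this PR.

```python
from contextlib import contextmanager


class RequestContextRecorder:
    """Stand-in for request_context_holder; records callback order."""

    def __init__(self):
        self.events = []

    def on_request_start(self):
        self.events.append("start")

    def on_request_end(self):
        self.events.append("end")


@contextmanager
def timed_request(holder):
    """Call on_request_end even when the wrapped request raises."""
    holder.on_request_start()
    try:
        yield
    finally:
        holder.on_request_end()


def perform_request(holder, send):
    """Run send() inside the timing context and return its result."""
    with timed_request(holder):
        return send()
```

The exception still propagates to the caller; the finally clause only guarantees that the end callback has been recorded first.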

Regarding the second point, I believe those calls were added by @saimedhi to measure service time, which reflects the time between the request being placed on the wire and the response being received from the network (as accurately as is feasible). The aiohttp client provides trace event callbacks that make this possible. The requests library has a response hook here, but I couldn't find a request callback. I have not looked at urllib3.

You are correct about the client processing time in that it measures latency around the client and that location above would be the correct entry point to measure this metric. But I believe it is the other metric that is being captured here and the difference between the two would be pure client overhead.
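
The relationship between the two measurements can be sketched with nested timers. This is an illustration only: measure, prepare, send, and parse are hypothetical names, and the inner timer here wraps a stand-in send call rather than the actual socket operation that a real service time measurement would need.

```python
import time


def measure(prepare, send, parse):
    """Return (time_around_client, service_time, overhead) in seconds."""
    outer_start = time.perf_counter()
    request = prepare()          # client-side work before the request is sent
    inner_start = time.perf_counter()
    raw = send(request)          # stand-in for time spent on the wire
    inner_end = time.perf_counter()
    parse(raw)                   # client-side work after the response arrives
    outer_end = time.perf_counter()

    time_around_client = outer_end - outer_start
    service_time = inner_end - inner_start
    # The difference is the time spent in the client itself.
    return time_around_client, service_time, time_around_client - service_time
```

The closer the inner timer sits to the HTTP send and receive calls, the closer the difference comes to pure client overhead.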

gkamat · 2024-02-05T07:40:31Z

osbenchmark/sync_connection.py

+            request_context_holder.on_request_start()
+            status, headers, raw_data = super().perform_request(method=method, url=url, params=params, body=body,
+                                                                timeout=timeout, ignore=ignore, headers=headers)
+            request_context_holder.on_request_end()


Same comment as above.

gkamat · 2024-02-05T07:45:56Z

osbenchmark/worker_coordinator/runner.py

+        if "BenchmarkOpenSearch" in str(opensearch):
+            if with_action_metadata:
+                api_kwargs.pop("index", None)
+                # only half of the lines are documents
+                response = opensearch.bulk(params=bulk_params, **api_kwargs)
+            else:
+                response = opensearch.bulk(doc_type=params.get("type"), params=bulk_params, **api_kwargs)
        else:
-            response = await opensearch.bulk(doc_type=params.get("type"), params=bulk_params, **api_kwargs)
-
+            if with_action_metadata:
+                api_kwargs.pop("index", None)
+                # only half of the lines are documents
+                response = await opensearch.bulk(params=bulk_params, **api_kwargs)
+            else:
+                response = await opensearch.bulk(doc_type=params.get("type"), params=bulk_params, **api_kwargs)


There is a lot of code duplication in this file that is likely not necessary. It should be possible to indirect the call to the client API via a routine that calls the desired sync or async function by switching on the predicate. The rest of the code can stay the same. Alternatively, a ternary operator is perhaps a better option.

@gkamat, can we do it as below?

Add await conditionally on each line where an opensearch call is made. This prevents code duplication.

if with_action_metadata:
    api_kwargs.pop("index", None)
    response = (opensearch.bulk(params=bulk_params, **api_kwargs)
                if "BenchmarkOpenSearch" in str(opensearch)
                else await opensearch.bulk(params=bulk_params, **api_kwargs))
else:
    response = (opensearch.bulk(doc_type=params.get("type"), params=bulk_params, **api_kwargs)
                if "BenchmarkOpenSearch" in str(opensearch)
                else await opensearch.bulk(doc_type=params.get("type"), params=bulk_params, **api_kwargs))

Yes, that is the ternary operator referred to above. It will be more compact to encapsulate the predicate within a function. Also, it is probably cleaner to define the classes at the top level in client.py, import them and test directly, rather than using str():

opensearch.__class__ == PythonSyncClient

or something like that.
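
One way to encapsulate the predicate is sketched below. PythonSyncClient and PythonAsyncClient are stand-ins using the names proposed in this review, and is_sync_client and call_bulk are hypothetical helpers. The sketch uses isinstance rather than comparing `__class__` directly, which is a small variation on the check suggested above that also accepts subclasses.

```python
import asyncio


class PythonSyncClient:
    """Stand-in for the synchronous benchmark client."""

    def bulk(self, **kwargs):
        return {"errors": False, "mode": "sync"}


class PythonAsyncClient:
    """Stand-in for the asynchronous benchmark client."""

    async def bulk(self, **kwargs):
        return {"errors": False, "mode": "async"}


def is_sync_client(client):
    """Predicate kept in one place instead of repeated string checks."""
    return isinstance(client, PythonSyncClient)


async def call_bulk(client, **kwargs):
    """Call bulk() on either client type and return the response."""
    if is_sync_client(client):
        return client.bulk(**kwargs)
    return await client.bulk(**kwargs)
```

The runner would then call `await call_bulk(opensearch, ...)` in both branches of the with_action_metadata check, so the sync and async paths share the rest of the code.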

gkamat · 2024-02-05T18:19:26Z

osbenchmark/worker_coordinator/worker_coordinator.py

@@ -1449,7 +1449,17 @@ async def run(self):
        def os_clients(all_hosts, all_client_options):
            opensearch = {}
            for cluster_name, cluster_hosts in all_hosts.items():
-                opensearch[cluster_name] = client.OsClientFactory(cluster_hosts, all_client_options[cluster_name]).create_async()
+                if "connection_class" in all_client_options["default"]:


Consider changing the user flag to backend-client or client, since this flag will continue to be used for future clients like a Rust client, C++ client, etc.

I will rename the user flag to backend-client

saimedhi · 2024-04-05T17:13:51Z

Closing this PR. I will resubmit it once opensearch-py has service time metrics.

saimedhi and others added 2 commits January 22, 2024 22:37

Added support for opensearch-py client connections urllib3httpconnect…

fa951e2

…ion, requestshttpconnection Signed-off-by: saimedhi <[email protected]>

Merge branch 'opensearch-project:main' into client_connections

cba1df9

saimedhi requested review from IanHoang and gkamat as code owners January 23, 2024 06:54

Added support for opensearch-py client connections urllib3httpconnect…

e43f9bf

…ion, requestshttpconnection. Signed-off-by: saimedhi <[email protected]>

dblock reviewed Jan 23, 2024

IanHoang added the High Priority label Feb 1, 2024

gkamat requested changes Feb 5, 2024

gkamat reviewed Feb 5, 2024

gkamat removed the High Priority label Feb 5, 2024

gkamat mentioned this pull request Feb 10, 2024

Added a new metric: Client Processing Time #450

Merged

saimedhi closed this Apr 5, 2024

saimedhi mentioned this pull request Oct 1, 2024

Support opensearch-py Client Connections (urllib3httpconnection, requestshttpconnection) #437

Open
