Release Note
This release contains 8 new features, 16 bug fixes and 15 documentation improvements.
🆕 Features
Support multiple protocols at the same time in Flow Gateways (#5435 and #5378)
Prior to this release, a Flow only exposed one server in its Gateway with one of the following protocols: HTTP, gRPC or WebSockets.
Now, you can specify multiple protocols and for each one, a separate server is started. Each server is bound to its own port.
For instance, you can do:
or run `jina flow --uses flow.yml`, where `flow.yml` is:
The `protocol` and `port` parameters can still accept single values rather than a list, so there is no breaking change. The alias parameters `protocols` and `ports` are also defined.
In Kubernetes, this exposes separate services for each protocol.
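As a sketch, a `flow.yml` that serves gRPC and HTTP at the same time might look like this (the port numbers are arbitrary examples):

```yaml
jtype: Flow
with:
  protocol: [grpc, http]
  port: [12345, 12346]
```

Running `jina flow --uses flow.yml` then starts one server per protocol, each bound to its own port.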
Read the Jina Gateway documentation on multiple protocols for more information.
Add Kubernetes information to resource attributes in instrumentation (#5372)
When deploying to Kubernetes, the Gateway and Executors expose the following Kubernetes information as resource attributes in instrumentation:
- `k8s.namespace.name`
- `k8s.pod.name`
- `k8s.deployment.name` / `k8s.statefulset.name`

Besides that, the following resource attributes are set if they are present in the environment variables of the container:
- `k8s.cluster.name` (set the `K8S_CLUSTER_NAME` environment variable)
- `k8s.node.name` (set the `K8S_NODE_NAME` environment variable)

Add option to return requests in order using the `Client` (#5404)
If you use replicated Executors, those which finish processing first return their results to the Gateway, which then returns them to the client. This is useful if you want results as soon as each replicated Executor finishes processing your Documents.
However, this may be inconvenient if you want the Documents you send to the Flow to return in order. In this release, you can retain the order of sent Documents (when using replicated Executors) by passing the `results_in_order` parameter to the `Client`.
For instance, suppose your Flow contains a replicated Executor.
You can then pass `results_in_order=True` in your `client.post(...)` call to keep results in order.
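Conceptually, `results_in_order` makes the Gateway buffer replica results and release them by the index they were sent with, not by the order replicas finish. A plain-Python sketch of that guarantee (not Jina's actual implementation):

```python
def gather_in_order(results_by_arrival):
    """Reorder (request_index, result) pairs that arrive in completion order.

    Mimics the guarantee of `results_in_order=True`: results are yielded
    by the index they were sent with, not by the order replicas finish.
    """
    buffered = dict(results_by_arrival)
    return [buffered[i] for i in range(len(buffered))]

# Replicas finish in a scrambled order...
arrivals = [(2, 'doc-2'), (0, 'doc-0'), (3, 'doc-3'), (1, 'doc-1')]
ordered = gather_in_order(arrivals)  # ['doc-0', 'doc-1', 'doc-2', 'doc-3']
```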
Add `docs_map` parameter to Executor endpoints (#5366)
Executor endpoint signatures are extended with an additional argument.
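As a plain-Python illustration (not Jina's actual endpoint machinery, and with simple lists standing in for DocumentArrays), an Executor endpoint that merges results from several upstream Executors can use `docs_map` like this:

```python
from typing import Dict, List

def merge_endpoint(docs_map: Dict[str, List[str]]) -> List[str]:
    """Combine results from named upstream Executors.

    `docs_map` maps each previous Executor's name to the results it
    produced, so the merger knows where every item came from.
    """
    merged = []
    for executor_name in sorted(docs_map):
        merged.extend(f'{executor_name}:{doc}' for doc in docs_map[executor_name])
    return merged

print(merge_endpoint({'indexer': ['a'], 'ranker': ['b', 'c']}))
# ['indexer:a', 'ranker:b', 'ranker:c']
```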
The new parameter `docs_map` is a dictionary that maps previous Executor names to DocumentArrays. This is useful when you have an Executor that combines results from many previous Executors and you need to know where each resulting DocumentArray comes from.
Add Gateway API (#5342)
Prior to this release, all Gateway configurations were specified in the Flow API, since Flow parameters are commonly inherited by Executors and the Gateway. The Executor already had its own API for customization (either the `add()` method or the `executors` section in Flow YAML).
In this release, we have done the same for the Gateway: it defines its own API in both the Python interface and the YAML interface. In Python, you can configure the Gateway using the `config_gateway()` method.
And in the YAML interface, you can configure the Gateway using the
`gateway` section.
This is useful when you want to apply parameters just to the Gateway. If you want a parameter applied to all pods, continue to use the Flow API.
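For instance, a sketch of a Flow YAML with a dedicated `gateway` section (the specific keys and the executor name shown are illustrative assumptions):

```yaml
jtype: Flow
gateway:
  protocol: http
  port: 12345
executors:
  - name: encoder
```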
Keep in mind that you can still provide Gateway parameters using the Flow API, so there are no breaking changes.
Support UUID in CUDA_VISIBLE_DEVICES round-robin assignment (#5360)
You can specify a comma-separated list of GPU UUIDs in `CUDA_VISIBLE_DEVICES` to assign devices to Executor replicas in a round-robin fashion. For instance, with four GPU UUIDs and five replicas, devices are assigned as shown below.
Check CUDA's documentation for the accepted formats to assign CUDA devices by UUID.
| GPU device | Replica ID |
| --- | --- |
| GPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5 | 0 |
| GPU-0bbbbbbb-74d2-7297-d557-12771b6a79d5 | 1 |
| GPU-0ccccccc-74d2-7297-d557-12771b6a79d5 | 2 |
| GPU-0ddddddd-74d2-7297-d557-12771b6a79d5 | 3 |
| GPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5 | 4 |
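The assignment is a simple round-robin over the listed UUIDs, which can be sketched in plain Python (using the example values from the table):

```python
from itertools import cycle

# Example value of CUDA_VISIBLE_DEVICES with four GPU UUIDs.
cuda_visible_devices = (
    'GPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5,'
    'GPU-0bbbbbbb-74d2-7297-d557-12771b6a79d5,'
    'GPU-0ccccccc-74d2-7297-d557-12771b6a79d5,'
    'GPU-0ddddddd-74d2-7297-d557-12771b6a79d5'
)

# Round-robin assignment: with five replicas, replica 4 wraps around
# to the first GPU again, matching the table above.
devices = cycle(cuda_visible_devices.split(','))
assignment = {replica_id: next(devices) for replica_id in range(5)}
```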
Thanks to our community member @mchaker for submitting this feature request!
Capture shard failures in the head runtime (#5338)
In case you use Executor shards, partially failed requests (those that fail on a subset of the shards) no longer raise an error.
Instead, successful results are returned. An error is raised only when all shards fail to process Documents. In short, the `HeadRuntime`'s behavior is updated to fail only when all shards fail.
Thanks to our community user @soumil1 for submitting this feature request.
Add successful, pending and failed metrics to HeadRuntime (#5374)
More metrics have been added to the Head Pods:
- `jina_number_of_pending_requests`: number of pending requests
- `jina_successful_requests`: number of successful requests
- `jina_failed_requests`: number of failed requests
- `jina_received_request_bytes`: size of received requests in bytes
- `jina_sent_response_bytes`: size of sent responses in bytes

See more in the instrumentation docs.
Add deployment label in grpc stub metrics (#5344)
Executor metrics used to show up aggregated at the Gateway level and users couldn't see separate metrics per Executor. With this release, we have added labels for Executors so that metrics in the Gateway can be generated per Executor or aggregated over all Executors.
🐞 Bug Fixes
Check whether the deployment is in Executor endpoints mapping (#5440)
This release adds an extra check in the Gateway when sending requests to deployments: The Gateway sends requests to the deployment only if it is in the Executor endpoint mapping.
Unblock event loop to allow health service (#5433)
Prior to this release, sync function calls inside Executor endpoints blocked the event loop. This meant that health-checks submitted to Executors failed for long tasks (for instance, inference using a large model).
In this release, such tasks no longer block the event loop. While concurrent requests to the same Executor wait until the sync task finishes, other runtime tasks remain functional, mainly health-checks.
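The general technique (a sketch of the pattern, not Jina's exact code) is to off-load the blocking call to a worker thread so that the event loop stays free for tasks like health checks:

```python
import asyncio
import time

def slow_sync_task():
    """Stands in for a long sync Executor endpoint (e.g. model inference)."""
    time.sleep(0.2)
    return 'done'

async def health_check():
    """Stands in for a health-check handler that must stay responsive."""
    return 'SERVING'

async def main():
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool instead of on the
    # event loop, so concurrent tasks are not starved while it runs.
    pending = loop.run_in_executor(None, slow_sync_task)
    # The health check answers immediately even while the sync task runs.
    status = await asyncio.wait_for(health_check(), timeout=0.1)
    return status, await pending

status, result = asyncio.run(main())
```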
Dump environment variables to string for Kubernetes (#5430)
Environment variables are now cast to strings before dumping them to Kubernetes YAML.
Unpin jina-hubble-sdk version (#5412)
This release frees (unpins) the `jina-hubble-sdk` version. The latest `jina-hubble-sdk` is installed with the latest Jina.
Bind servers to `host` argument instead of `__default_host__` (#5405)
This release makes servers at each Jina pod (head, Gateway, worker) bind to the host address specified by the user, instead of always binding to the
`__default_host__` corresponding to the OS. This lets you, depending on your network interface, restrict or expose your Flow services in your network.
For instance, if you wish to expose all pods to the internet except the last Executor, you can do:
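For example, a sketch in Flow YAML (the executor names are illustrative):

```yaml
jtype: Flow
with:
  host: 0.0.0.0        # Gateway reachable from outside
executors:
  - name: encoder
    host: 0.0.0.0      # exposed
  - name: indexer
    host: 127.0.0.1    # last Executor: reachable only from the host machine
```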
After this fix, Jina respects this syntax and binds the last Executor only to `127.0.0.1` (accessible only from inside the host machine).
Thanks to @wqh17101 for reporting this issue!
Fix `backoff_multiplier` format when using `max_attempts` in the `Client` (#5403)
This release fixes the format of the `backoff_multiplier` parameter when injected into the gRPC request. The issue appeared when you used the `max_attempts` parameter in the `Client`.
Maintain the correct tracing operations chain (#5391)
Tracing spans for Executors used to show up out of order. This has been fixed by using the method `start_as_current_span` instead of `start_span` to maintain the tracing chain in the correct order.
Use Async health servicer for tracing interceptors when tracing is enabled (#5392)
When tracing is enabled, health checks in Docker and Kubernetes deployments used to fail silently until the Flow timed out. This happened because tracing interceptors expected RPC stubs to be coroutines.
This release fixes the issue by using the async `aio.HealthServicer` instead of `grpc_health.HealthServicer`. Health checks submitted to runtimes (Gateway, head, worker) no longer fail when tracing is enabled.
Properly update requests count in case of Exception inside the HeadRuntime (#5383)
When an Exception was raised in the `HeadRuntime`, request counts were not updated properly: pending requests should have been decremented and failed requests incremented. This release fixes that by catching the Exception and updating the request counts.
Fix endpoint binding when inheriting Executors (#5380)
When an Executor was inherited, the bound endpoints of the parent Executor used to be overridden by those of the child Executor. This meant that if you inherited from an Executor but still used the parent Executor in your Flow, the wrong endpoint could be called. This is fixed by making `Executor.requests` a nested dict that also includes the class name, which properly supports Executor inheritance.
Missing recording logic in connection stub metrics (#5363)
Recording of request and response size in bytes is fixed to track all cases. This makes these metrics more accurate for the Gateway.
Move build configs to `pyproject` (#5351)
Build requirements have been moved from `setup.py` to `pyproject.toml`. This suppresses deprecation warnings that show up when installing Jina.
New timer should keep labels (#5341)
The `MetricsTimer` in instrumentation previously created new timers without keeping the histogram metric labels. This is fixed and new timers retain the same labels.
Use non-mutable default for `MetricsTimer` constructor (#5339)
Use `None` instead of an empty dict as the default value for `histogram_metric_labels` in the `MetricsTimer` constructor.
Catch `RpcError`s and show better error messages in `Client` (#5325)
In the `Client`, we catch `RpcError` and show its details instead of a generic error message.
Import OpenTelemetry functions only when tracing is enabled in WorkerRuntime (#5321)
This release ensures OpenTelemetry functions are only imported when tracing is enabled in the worker.
📗 Documentation Improvements
- Add tips for using multiprocessing `fork` in Jina when using macOS (#5379)
- Fix link to the multiprocessing `spawn` section and emphasize the need for entrypoint protection (#5370)

🤟 Contributors
We would like to thank all contributors to this release: