In apps deployed on Managed Runtime, most of the response time is spent waiting on network I/O, so response times vary with the health and responsiveness of upstream APIs.
To improve observability and resilience, it would be valuable to:
Provide visibility into the health of upstream APIs (e.g., SCAPI).
Add tooling to manage backpressure and mitigate the impact of slow or degraded upstream services.
Currently, diagnosing slow API response times is cumbersome. Without detailed telemetry on API performance, it can be difficult to pinpoint the source of delays. For instance:
Is SCAPI generally healthy, or is a specific endpoint struggling?
How often are error rates spiking?
Are certain endpoints consistently slower than others?
API Health Observability
Introduce real-time monitoring of upstream API health with granular filtering capabilities, enabling deeper analysis of response times and status codes. Example use cases:
Overall Health: What’s the p99 response time for SCAPI requests?
Filtered by Status Code: What’s the p99 for HTTP 200 responses?
Endpoint-Level Analysis: What’s the p99 for Shopper Search requests?
Param-Level Insight: What’s the p99 for Shopper Search with no expands?
This visibility would help surface degradation trends and identify where optimization efforts should focus.
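A minimal TypeScript sketch of the filtered p99 queries above. It assumes each upstream call is recorded as a sample tagged with host, endpoint, status code, and query parameters; `LatencySample`, `LatencyRecorder`, and the endpoint labels are hypothetical names, not an existing Managed Runtime or PWA Kit API. For clarity it keeps raw samples in memory and uses the nearest-rank percentile; a production setup would more likely export histograms to a metrics backend.

```typescript
// Hypothetical record of one upstream API call; field names are illustrative.
interface LatencySample {
  host: string;                   // upstream host, e.g. the SCAPI host
  endpoint: string;               // logical endpoint label, e.g. "shopper-search/productSearch"
  status: number;                 // HTTP status code returned by the upstream
  params: Record<string, string>; // query parameters of interest, e.g. { expand: "prices" }
  durationMs: number;             // wall-clock duration of the call
}

type SampleFilter = (sample: LatencySample) => boolean;

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
// Returns undefined when there is nothing to summarize.
function percentile(values: number[], p: number): number | undefined {
  if (values.length === 0) return undefined;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

class LatencyRecorder {
  private readonly samples: LatencySample[] = [];

  record(sample: LatencySample): void {
    this.samples.push(sample);
  }

  // p99 over the samples that match `filter` (all samples by default).
  p99(filter: SampleFilter = () => true): number | undefined {
    return percentile(this.samples.filter(filter).map((s) => s.durationMs), 99);
  }
}
```

The example questions then become filters: `recorder.p99()` for overall health, `recorder.p99((s) => s.status === 200)` for a status code, `recorder.p99((s) => s.endpoint === "shopper-search/productSearch")` for an endpoint, and the same endpoint filter combined with `!("expand" in s.params)` for requests with no expands.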
Backpressure and Fail-Safe Mechanisms
To prevent cascading failures, provide mechanisms to control and adjust how the system behaves when upstream APIs are slow or failing.
Timeout Enforcement
Implement configurable timeouts for upstream API calls (e.g., 10-second hard limit).
Default to SCAPI’s documented timeout of 10 seconds, but allow overrides through environment variables.
Ensure timeouts are only enforced server-side to avoid client-side variability.
Log requests that exceed the timeout with a clear error message to simplify troubleshooting.
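One way to enforce a configurable timeout, sketched in TypeScript under a few assumptions: the override comes from a hypothetical `UPSTREAM_TIMEOUT_MS` environment variable (not an existing Managed Runtime setting), requests use the standard `fetch` and `AbortController` APIs available in Node.js 18+, and the helper is called only from server-side code. `resolveTimeoutMs`, `fetchWithTimeout`, and `UpstreamTimeoutError` are illustrative names.

```typescript
// Default of 10 seconds, matching the SCAPI timeout cited in the issue.
const DEFAULT_UPSTREAM_TIMEOUT_MS = 10_000;

// Hypothetical variable name; not an existing Managed Runtime setting.
const TIMEOUT_ENV_VAR = "UPSTREAM_TIMEOUT_MS";

// Reads the override from an env-like record, falling back to the default when the
// value is missing, blank, non-numeric, or not positive.
function resolveTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env[TIMEOUT_ENV_VAR];
  if (raw === undefined || raw.trim() === "") return DEFAULT_UPSTREAM_TIMEOUT_MS;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_UPSTREAM_TIMEOUT_MS;
}

class UpstreamTimeoutError extends Error {
  constructor(readonly url: string, readonly timeoutMs: number) {
    super(`Upstream request to ${url} exceeded the ${timeoutMs} ms timeout`);
    this.name = "UpstreamTimeoutError";
  }
}

// Aborts the request if response headers have not arrived within `timeoutMs`.
// Note: any `signal` passed in `init` is replaced by the timeout's own signal.
async function fetchWithTimeout(url: string, timeoutMs: number, init: RequestInit = {}): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, { ...init, signal: controller.signal });
  } catch (err) {
    if (controller.signal.aborted) {
      const timeoutError = new UpstreamTimeoutError(url, timeoutMs);
      // One searchable log line per request that hits the limit.
      console.error(`[upstream-timeout] ${timeoutError.message}`);
      throw timeoutError;
    }
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```

On the server, `resolveTimeoutMs(process.env)` would be evaluated once at startup and its result passed to each `fetchWithTimeout` call. Because the timer is cleared as soon as `fetch` resolves, this sketch bounds the time to response headers, not the time spent reading the full body.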
Circuit Breakers
Integrate circuit breakers to temporarily halt traffic to slow or failing upstream APIs.
Allow per-host/proxy circuit breakers, configurable through environment variables.
Maintain circuit breaker state centrally to ensure consistency across executions.
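A per-host circuit breaker could look roughly like the TypeScript sketch below. The state machine (closed, open, half-open with one probe admitted per cooldown window) is the standard circuit-breaker pattern; the class, interface, and function names are assumptions made for illustration. State sits behind a hypothetical `BreakerStore` interface so the in-memory map can be swapped out. The in-memory store shown is per-process and does not by itself give the central state described above, and a networked store would make these methods asynchronous.

```typescript
type BreakerState = "closed" | "open" | "half-open";

interface BreakerRecord {
  state: BreakerState;
  consecutiveFailures: number;
  openedAt: number; // ms timestamp when the breaker last opened or admitted a probe
}

// Storage abstraction; a shared cache or database would replace the in-memory map.
interface BreakerStore {
  get(host: string): BreakerRecord | undefined;
  set(host: string, record: BreakerRecord): void;
}

class InMemoryBreakerStore implements BreakerStore {
  private readonly records = new Map<string, BreakerRecord>();
  get(host: string): BreakerRecord | undefined {
    return this.records.get(host);
  }
  set(host: string, record: BreakerRecord): void {
    this.records.set(host, record);
  }
}

interface BreakerOptions {
  failureThreshold: number; // consecutive failures that open the breaker
  cooldownMs: number;       // time to stay open before admitting a probe request
}

class HostCircuitBreaker {
  constructor(
    private readonly store: BreakerStore,
    private readonly options: BreakerOptions,
    private readonly now: () => number = () => Date.now(),
  ) {}

  private load(host: string): BreakerRecord {
    return this.store.get(host) ?? { state: "closed", consecutiveFailures: 0, openedAt: 0 };
  }

  // True if a request to `host` may proceed. While open or half-open, at most one
  // probe is admitted per cooldown window, so a lost probe cannot wedge the breaker.
  canRequest(host: string): boolean {
    const rec = this.load(host);
    if (rec.state === "closed") return true;
    if (this.now() - rec.openedAt >= this.options.cooldownMs) {
      this.store.set(host, { ...rec, state: "half-open", openedAt: this.now() });
      return true;
    }
    return false;
  }

  recordSuccess(host: string): void {
    this.store.set(host, { state: "closed", consecutiveFailures: 0, openedAt: 0 });
  }

  recordFailure(host: string): void {
    const rec = this.load(host);
    const failures = rec.consecutiveFailures + 1;
    // Any failure while not closed (e.g. a failed probe), or reaching the threshold, (re)opens it.
    if (rec.state !== "closed" || failures >= this.options.failureThreshold) {
      this.store.set(host, { state: "open", consecutiveFailures: failures, openedAt: this.now() });
    } else {
      this.store.set(host, { ...rec, consecutiveFailures: failures });
    }
  }
}

// Example wiring: 5xx responses and network errors count as failures; 4xx do not.
async function fetchThroughBreaker(breaker: HostCircuitBreaker, url: string): Promise<Response> {
  const host = new URL(url).host;
  if (!breaker.canRequest(host)) {
    throw new Error(`Circuit open for ${host}; failing fast`);
  }
  try {
    const res = await fetch(url);
    if (res.status >= 500) breaker.recordFailure(host);
    else breaker.recordSuccess(host);
    return res;
  } catch (err) {
    breaker.recordFailure(host);
    throw err;
  }
}
```

`failureThreshold` and `cooldownMs` are the natural values to expose through environment variables, and keying the store by host gives one breaker per upstream host or proxy. The injectable `now` clock keeps the state transitions easy to test.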