Releases: statsig-io/statsig-forward-proxy
v3.0.0 - New Key Schema and other improvements
Summary of Changes
- Include SFP version in requests to CDN/origin
- Prepare redis cache to support various encoding types
- Add LCUT slice to redis based metrics
- Fix a bug where reading from redis does not check LCUT
- Add a metric for estimated active GRPC streams
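The LCUT check on redis reads can be pictured roughly as follows. This is a minimal sketch rather than the proxy's actual code: it assumes LCUT is the last config update time stored alongside the payload, and the `CachedConfig` type and `select_if_newer` function are hypothetical names used only for illustration.

```rust
/// Hypothetical cached entry: a config payload plus the LCUT
/// (last config update time) it was generated at.
#[derive(Debug, Clone, PartialEq)]
pub struct CachedConfig {
    pub lcut: u64,
    pub payload: String,
}

/// Returns the cached entry only when it is strictly newer than the
/// LCUT the caller already holds. A missing or stale entry yields `None`.
pub fn select_if_newer(cached: Option<CachedConfig>, since_lcut: u64) -> Option<CachedConfig> {
    cached.filter(|entry| entry.lcut > since_lcut)
}
```

Without such a comparison, a reader could accept a cached payload that is older than the one it already has.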
Summary of Breaking Changes
Introduced new key schema
The new key schema matches the one that the SDKs will generate, and it now also includes an encoding type.
Migration Strategy: If you are not reading data from anywhere but the forward proxy, this breaking change does not apply to you.
If you use an external cache, first enable --double-write-cache-for-legacy-key, then update the data adapter code in your SDK to use: https://github.com/statsig-io/statsig-forward-proxy/blob/main/src/datastore/caching/redis_cache.rs#L249-L259 , and then disable --double-write-cache-for-legacy-key.
The legacy key uses the v0.x.x and v1.x.x logic, not the v2.x.x logic.
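The double-write migration can be sketched as below, under stated assumptions. The key formats here are placeholders invented for illustration; the real formats are defined in the linked redis_cache.rs. The `write_config` helper and the in-memory `HashMap` standing in for the external cache are also hypothetical.

```rust
use std::collections::HashMap;

/// Placeholder legacy key format (NOT the proxy's real format).
pub fn legacy_key(hashed_sdk_key: &str) -> String {
    format!("legacy::{}", hashed_sdk_key)
}

/// Placeholder new key format that also carries an encoding type
/// (NOT the proxy's real format).
pub fn new_key(hashed_sdk_key: &str, encoding: &str) -> String {
    format!("new::{}::{}", hashed_sdk_key, encoding)
}

/// Always writes under the new key. While the double-write option is on,
/// also writes under the legacy key so readers that have not migrated
/// yet keep finding data.
pub fn write_config(
    store: &mut HashMap<String, String>,
    hashed_sdk_key: &str,
    encoding: &str,
    payload: &str,
    double_write_legacy: bool,
) {
    store.insert(new_key(hashed_sdk_key, encoding), payload.to_string());
    if double_write_legacy {
        store.insert(legacy_key(hashed_sdk_key), payload.to_string());
    }
}
```

Once every reader uses the new key, the double write can be turned off and only the new key is kept up to date.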
v2.0.0 - Improved scalability and other breaking changes
Summary
- Some bug fixes related to statsd metrics not propagating
Summary of Breaking Changes
Nginx Support
We introduced this to improve performance at scale for larger payloads at high qps
Migration Strategy: This will cause a reduction in overall request volume due to our nginx configuration serving from its internal cache for a majority of requests.
clear_datastore_on_unauthorized
This was changed from clear_external_datastore_on_unauthorized, which used to clear only the external datastores, and has been updated to also control the behavior of the internal datastore.
The rationale for this was that if we have a SEV related to authorization, we don't want to clear the internal store either. The cache will then be cleared on restart.
We understand this is not the ideal solution, but rather a stopgap measure to ensure we don't risk unavailability. We are following up with another change soon, but wanted to mitigate this risk first.
Migration Strategy: If you delete a key and still see it served, but don't want it to be available anymore, just restart the service.
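As a rough illustration of the behavior this setting controls, the sketch below clears a stored entry on an unauthorized response only when the flag is on. It is not the proxy's implementation: the `handle_origin_status` function, the `HashMap` store, and the choice of 401 and 403 as the "unauthorized" statuses are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Drops the stored config for `sdk_key` when the origin reports the key
/// as unauthorized, but only if the operator opted in via the flag.
/// Treating 401 and 403 as "unauthorized" is an assumption of this sketch.
pub fn handle_origin_status(
    store: &mut HashMap<String, String>,
    sdk_key: &str,
    status: u16,
    clear_datastore_on_unauthorized: bool,
) {
    let unauthorized = status == 401 || status == 403;
    if unauthorized && clear_datastore_on_unauthorized {
        store.remove(sdk_key);
    }
}
```

With the flag off, the entry survives an unexpected authorization failure and keeps being served until the process restarts.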
Deprecated idlist
We are moving to a more sustainable solution for serving IDlist payloads. As a result, we no longer plan to serve IDlist from the proxy.
Migration Strategy: On the SDK side, if you overrode the API to send idlist traffic to the forward proxy, please remove that setting before upgrading.
Removed InMemoryCache
This cache was only used for development, but customers were selecting it by accident.
Migration Strategy: Choose a different cache. For equivalent behavior, use "disabled".
Introduced new key schema
We are moving to a new key schema that will allow us to move key generation into the SDKs instead of requiring each user to define it themselves.
Migration Strategy: If you use an external cache, first enable --double-write-cache-for-legacy-key, then update the data adapter code in your SDK to use: https://github.com/statsig-io/statsig-forward-proxy/blob/main/src/datastore/caching/redis_cache.rs#L231C5-L238C6 , and then disable --double-write-cache-for-legacy-key.
v1.1.8 - Recommended Update - Bug Fix, GZIP, and TLS Support
Summary
- Add support for TLS for GRPC Streaming
- Add full support for gzip in HTTP
- Fix issue where HTTP Client gets into a bad state preventing configs from updating
v1.1.7 - Performance Improvements, New Cache mode, and New Logging Mode
Summary
- We identified that our logging path could keep CPU higher than necessary. As a result, we began optimizing this path.
- Introduced a new mode for backup cache, which is Disabled. The rationale is that local is an exact copy of the stores which makes it relatively redundant to use.
- Introduced new logging mode, '--statsig-logging', which logs events to statsig using the SDK
v1.1.6 - Bugs, Perf, and Chores
Summary
We recommend updating to this version if you are on later ones due to significant perf improvements and bug fixes.
Bugs
- Fixed a race condition which could cause more outbound requests on a cache miss than needed (should be bounded at 1, but at high qps, can go to 10s/100s)
- Fix bug where id lists were failing to fetch which could lead to almost infinite 401s
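One common way to bound outbound requests at 1 per cache miss is a "single flight" slot per key. The sketch below shows that idea with the standard library only; it is not the fix that shipped in the proxy (which is async), and `SingleFlightCache` and `get_or_fetch` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Cache in which concurrent misses for the same key share one fetch.
pub struct SingleFlightCache {
    entries: Mutex<HashMap<String, Arc<OnceLock<String>>>>,
}

impl SingleFlightCache {
    pub fn new() -> Self {
        SingleFlightCache { entries: Mutex::new(HashMap::new()) }
    }

    /// Returns the cached value for `key`, calling `fetch` at most once
    /// per key even when many threads miss at the same time.
    pub fn get_or_fetch<F>(&self, key: &str, fetch: F) -> String
    where
        F: FnOnce() -> String,
    {
        // Hold the map lock only long enough to find or create the slot.
        let slot = {
            let mut entries = self.entries.lock().unwrap();
            entries
                .entry(key.to_string())
                .or_insert_with(|| Arc::new(OnceLock::new()))
                .clone()
        };
        // `get_or_init` runs `fetch` for the first caller only; other
        // callers block here until the value is ready, then read it.
        slot.get_or_init(fetch).clone()
    }
}
```

The map lock is released before the fetch runs, so a slow fetch for one key does not block lookups for other keys.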
Performance
- In non-async scenarios, use parking_lot RwLocks which are more performant for these basic situations
- In our core data stores, leverage a dashmap which has internal sharding to reduce lock contention on shared hashmaps which only use a lock for shared storage
- Improve logic for reading and writing to reduce lock contention and lock holding as much as possible by moving almost all logic out of the critical section
- In scenarios where there is an outer and an inner lock, only ever hold the write lock on the outer lock if there is no valid entry that exists yet, else only use a read lock
- Used fixed sized vectors to reduce memory utilization
- Moved sdk_key parsing off of main request thread and into event handler thread
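The outer and inner lock rule above can be sketched as follows. This is an illustration using the standard library's `RwLock`, not the proxy's code (which uses parking_lot and dashmap per the notes above); `NestedStore` and its methods are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Outer lock guards the map of entries; each entry has its own inner lock.
pub struct NestedStore {
    outer: RwLock<HashMap<String, Arc<RwLock<String>>>>,
}

impl NestedStore {
    pub fn new() -> Self {
        NestedStore { outer: RwLock::new(HashMap::new()) }
    }

    pub fn set(&self, key: &str, value: String) {
        // Fast path: a read lock on the outer map is enough when the
        // entry exists. The outer guard is dropped at the end of this
        // statement, before the inner lock is taken.
        let existing = self.outer.read().unwrap().get(key).cloned();
        if let Some(entry) = existing {
            *entry.write().unwrap() = value;
            return;
        }
        // Slow path: take the outer write lock only to create the entry.
        // Re-check first, since another thread may have inserted it.
        let mut outer = self.outer.write().unwrap();
        let raced = outer.get(key).cloned();
        match raced {
            Some(entry) => *entry.write().unwrap() = value,
            None => {
                outer.insert(key.to_string(), Arc::new(RwLock::new(value)));
            }
        }
    }

    pub fn get(&self, key: &str) -> Option<String> {
        let entry = self.outer.read().unwrap().get(key).cloned()?;
        let value = entry.read().unwrap().clone();
        Some(value)
    }
}
```

Updates to existing keys never take the outer write lock, so they do not block readers of other keys.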
Chores
- Cleanup metric collection for log_event
v1.1.5 - Add support for timing metric
Summary
Distribution metrics are a Datadog-specific implementation. As a result, we exposed the capability to replace distribution metrics with standard statsd timing metrics.
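On the wire, the difference between the two metric types is the type code in the statsd datagram: DogStatsD uses `d` for distributions, while standard statsd timers use `ms`. The sketch below formats both; the `LatencyMetricKind` enum and `format_latency_metric` function are hypothetical and do not reflect how the proxy emits metrics internally.

```rust
/// Which statsd type to emit for a latency measurement.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LatencyMetricKind {
    /// Datadog-specific distribution metric (type code `d`).
    Distribution,
    /// Standard statsd timing metric (type code `ms`).
    Timing,
}

/// Builds a statsd datagram of the form `<name>:<value>|<type>`.
pub fn format_latency_metric(name: &str, millis: u64, kind: LatencyMetricKind) -> String {
    let type_code = match kind {
        LatencyMetricKind::Distribution => "d",
        LatencyMetricKind::Timing => "ms",
    };
    format!("{}:{}|{}", name, millis, type_code)
}
```

A statsd server that is not Datadog's agent will generally understand the `ms` form but not the `d` form.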
v1.1.4 - Add support for more endpoints and metric updates
Summary
- Add a new propagation latency metric
- Add support for new endpoints that are unused by SDKs at the moment
- Refactor of core logic to use request context
v1.1.3 - Add flag for clearing external cache
Summary
Prior to this version, the external cache was cleared by default. However, this has reliability implications: if any 4xx responses happen unexpectedly, we still want the external cache to be able to serve backups.
As a result, we made this a flag so that end users can choose this trade-off themselves.
v1.1.2 - More optimizations and grpc improvements
Summary
- Add hostname metadata to grpc stream to improve debugging
- Fix problem which could cause primary to never change if a process is terminated before expiry is set
- Optimize get_id_lists such that we don't always write to redis
- Added support to specify maximum concurrent streams for server
- Implemented graceful shutdown for grpc stream
v1.1.1 - A number of fixes and optimizations
Summary
- Fix /v1/get_id_lists such that it no longer clears cache due to a 411
- Fixed locking logic which affected a number of callsites due to high qps
- Update in memory cache to support idlist
- Add lcut to a few statsd metrics
- Prevent potentially pushing the same dcs payload through a grpc stream multiple times
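The last item, avoiding repeated pushes of the same dcs payload, can be approximated by remembering what was last sent on a stream. The sketch below assumes a payload is identified by its lcut, which is an assumption of this example and not confirmed by the notes; `StreamDeduper` is a hypothetical name.

```rust
/// Tracks the lcut of the last payload pushed on one stream.
pub struct StreamDeduper {
    last_sent_lcut: Option<u64>,
}

impl StreamDeduper {
    pub fn new() -> Self {
        StreamDeduper { last_sent_lcut: None }
    }

    /// Returns true (and records the lcut) when the payload differs from
    /// the one sent last; returns false for an immediate repeat.
    pub fn should_send(&mut self, lcut: u64) -> bool {
        if self.last_sent_lcut == Some(lcut) {
            return false;
        }
        self.last_sent_lcut = Some(lcut);
        true
    }
}
```

A sender would call `should_send` before each push and skip the push when it returns false.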