Commit
metadata: minor changes, & wait a bit when looping for now triggers
Previously, if an immediate trigger failed either due to request failure
or inner partition errors, we would immediately loop and re-load
metadata. This retry could happen twice, resulting in three total
metadata loads that were nearly instantaneous.

This immediate retry is not too beneficial: a failure should imply
*some* backoff. If a user has auto topic creation enabled, a partition
will not have a leader right away, and our immediate triggers would fail
on first produce and the client would stall 10s until the delayed
trigger re-takes and re-loads.

We split the retry into two cases:

- On error, we do not retry. Fetching metadata itself already retries 3x
  on request failure, so we should not retry further when we know the
  request itself failed 3 times. We will just go to the delayed update
  case.

- On non-error, but inner-partition-error, we sleep 250ms and try again
  up to 8x, meaning we try across 2s. This gives Kafka a chance to
  harmonize its internal issues, and allows us to be less immediately
  spammy.

This does mean, however, that we could end up trying a bit more in the
end, for a bit longer. We'd need to retry anyway eventually, so, minor
wash.

This commit also drops the default min metadata reload interval from 10s
to 5s. This speeds up some random cases where an immediate trigger
continues to fail. Hopefully, this does not result in unnecessary
metadata load for users.

Lastly, this removes the min metadata refresh interval, instead just
globally defaulting to the user's configuration. The min interval was
originally 1s for not much reason, then bumped to 2.5s for similar lack
of reasoning.

Actually thinking about it, it does not make sense to allow waitmeta to
trigger a metadata update that will not immediately run. The purpose of
waitmeta is to wait for a metadata update. We should either always
return a cached value, or always immediately trigger and then run.