

Balancer/upstreams: Kong loads upstreams table every 10s in request context when no upstreams exist in database #8970

Closed
1 task done
marc-charpentier opened this issue Jun 16, 2022 · 3 comments · Fixed by #8974

Comments

@marc-charpentier
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

2.8.1

Current Behavior

Hello,
When Kong has only "standalone" service entities in the database and no upstreams, it reloads the whole (empty) upstreams table every 10 seconds (the negative TTL of the balancer:upstreams cache key) in request context, adding an extra ~100 ms to proxy latency.
The cause is that the kong.runloop.balancer.upstreams load_upstreams_dict_into_memory function returns nil instead of an empty table (upstreams_dict) when no key/value pair was inserted.
So I modified the function to return upstreams_dict unconditionally and dropped the local found boolean, and the reload no longer takes place every 10 s at runtime. The cache invalidation mechanism still works: it invalidates the cached dictionary as soon as an upstream is created through the Kong Admin API.
Is there a reason I haven't thought of for not caching an empty dictionary?
Thank you.
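The 10 s cycle follows from how mlcache handles nil: a loader that returns nil is only remembered for neg_ttl, so the database load repeats at that interval. The following Python toy model (illustrative only; mlcache itself is a Lua library, and ToyMLCache and all names here are invented for the sketch) shows why returning nil differs from returning an empty table:

```python
import itertools

class ToyMLCache:
    """Toy model of lua-resty-mlcache semantics: a loader returning None
    (Lua nil) is remembered only for neg_ttl; any other value for ttl."""

    def __init__(self, ttl=60, neg_ttl=10):
        self.ttl, self.neg_ttl = ttl, neg_ttl
        self._store = {}  # key -> (value, expires_at)

    def get(self, key, now, loader):
        hit = self._store.get(key)
        if hit is not None and now < hit[1]:
            return hit[0], False                  # fresh: loader skipped
        value = loader()                          # miss: go to the database
        life = self.neg_ttl if value is None else self.ttl
        self._store[key] = (value, now + life)
        return value, True                        # loader was invoked

calls = itertools.count()

def load_upstreams_buggy():
    next(calls)
    return None   # no upstreams in the DB: nil, so negatively cached

def load_upstreams_fixed():
    next(calls)
    return {}     # empty dict: a real value, positively cached

# One request per second for 30 s against each loader:
cache = ToyMLCache(ttl=60, neg_ttl=10)
buggy_loads = sum(cache.get("balancer:upstreams", t, load_upstreams_buggy)[1]
                  for t in range(30))
print(buggy_loads)   # 3: reloaded at t = 0, 10, 20

cache = ToyMLCache(ttl=60, neg_ttl=10)
fixed_loads = sum(cache.get("balancer:upstreams", t, load_upstreams_fixed)[1]
                  for t in range(30))
print(fixed_loads)   # 1: loaded once, then served from cache
```

With the fixed loader the empty result is cached for the full positive TTL, so the per-request latency spike disappears until an invalidation event occurs.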

Expected Behavior

Kong should not reload the whole upstreams table every 10 s (given traffic frequent enough to trigger the reload at that minimal interval) as long as no upstream entity is created, updated, or deleted.

Steps To Reproduce

1. Provision Kong with a service and an associated route (and delete all upstreams).
2. Curl the created route every second, grepping for the X-Kong-Proxy-Latency header.
3. Every 10th request, X-Kong-Proxy-Latency jumps from 1-2 ms to 100-120 ms.

Anything else?

Kong running with a Cassandra database.

@flrgh
Contributor

flrgh commented Jun 16, 2022

Hi @marc-charpentier 👋

This is a good find! Appreciate the debugging effort and the detailed report. Your reasoning seems sound to me; hopefully another dev will chime in to confirm (or refute it, if there's something we're both unaware of). Looking at the code, one alternative fix might be to change the initial value of found from nil to false, so that mlcache treats it as a cacheable return value; though I agree that returning an empty table ({}) would probably work just as well.

Would you be able to open up a PR for this? Sounds like you know your way around the code. I think the bulk of the effort would be in testing, since we most likely want an integration test case to confirm that cache invalidation still works properly in the case that an upstream is added.

@marc-charpentier
Contributor Author

Hi @flrgh
Thank you for such a quick answer.
About returning an empty table ({}) instead of false: although I hadn't considered the latter option, I think that with the former we could also replace the return upstreams_dict or {} statement at the end of upstreams_M.get_all_upstreams() with return upstreams_dict, since we end up returning an empty table anyway when there are no upstreams in the database.
About adding an integration test case: I'm sorry, but I have no experience writing or running them. I didn't doubt that cache invalidation still worked. I manually tested adding and removing upstreams (including one whose name equals the service's host), with additional log messages, while curling the service's route every second and checking the X-Kong-Proxy-Latency response header, to make sure the cached value was replaced whenever it had to be.
Anyway, I'll open a PR for this, including the modification suggested above, and will try to take any further requirements into account.
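Putting the two changes discussed in this thread together, a sketch in Lua (paraphrased from the descriptions above, not the verbatim Kong source; the iteration details and helper names are assumptions):

```lua
-- Sketch of the proposed fix (assumed shape of the loader, not verbatim):
local function load_upstreams_dict_into_memory()
  local upstreams_dict = {}
  for up, err in db.upstreams:each() do  -- iteration details elided
    if err then
      return nil  -- a genuine DB error may still be negatively cached
    end
    upstreams_dict[up.ws_id .. ":" .. up.name] = up.id
  end
  -- was roughly: return found and upstreams_dict, with found starting
  -- at nil, which yielded nil for an empty table and defeated caching
  return upstreams_dict
end

-- In upstreams_M.get_all_upstreams(), "return upstreams_dict or {}" can
-- then become "return upstreams_dict", since the loader always returns
-- a table once it succeeds.
```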

@esatterwhite

I can confirm. The latency jump happens in DB-less mode as well

3 participants