feat: Adds rate limiting to pull queries #4951
```
@@ -168,6 +172,11 @@ public TableRowsEntity execute(
      + "this feature.");
    }

    if (!rateLimiter.tryAcquire()) {
```
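The `rateLimiter.tryAcquire()` call above is a non-blocking permission check: it returns `false` right away instead of waiting for capacity. The following is a minimal token-bucket sketch of that behavior. It assumes a limiter that holds at most `maxQps` permits and refills at `maxQps` per second. `TokenBucketLimiter` and its injectable clock are hypothetical, not ksqlDB's actual class. Guava's `RateLimiter` also offers a non-blocking `tryAcquire()`, though its smoothing behavior differs from this simple bucket.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical fail-fast token-bucket limiter (illustration only, not ksqlDB code).
 * Holds at most maxQps permits and refills continuously at maxQps permits per second.
 */
final class TokenBucketLimiter {
  private final double maxQps;
  private final LongSupplier nanoClock; // injectable so tests can control time
  private double permits;
  private long lastRefillNanos;

  TokenBucketLimiter(double maxQps, LongSupplier nanoClock) {
    if (maxQps <= 0) {
      throw new IllegalArgumentException("maxQps must be positive");
    }
    this.maxQps = maxQps;
    this.nanoClock = nanoClock;
    this.permits = maxQps; // start with a full bucket
    this.lastRefillNanos = nanoClock.getAsLong();
  }

  /** Consumes a permit and returns true if one is available; otherwise returns false without blocking. */
  synchronized boolean tryAcquire() {
    long now = nanoClock.getAsLong();
    double elapsedSeconds = Math.max(0L, now - lastRefillNanos) / 1_000_000_000.0;
    // Refill in proportion to elapsed time, capped at the bucket size.
    permits = Math.min(maxQps, permits + elapsedSeconds * maxQps);
    lastRefillNanos = now;
    if (permits >= 1.0) {
      permits -= 1.0;
      return true;
    }
    return false;
  }
}
```

Injecting the clock keeps tests deterministic: with `maxQps = 2`, two back-to-back calls succeed, a third is rejected, and half a second later one permit is available again. In production, `System::nanoTime` would be the natural clock to pass in.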
How does this interact with host forwarding? If I forward to a host that is overwhelmed, will I then try its standby? It might make sense to check the error message and not forward to another host in this scenario (not sure whether that's what happens or not).
That's a fair question. At the moment, it checks the limit in both places, which seems like a reasonable approach in general, since both nodes take part in responding.
The forwarder doesn't treat this failure specially; it would just try the standby, as you're hinting. This could be a way to quickly shift load from an overwhelmed host to another one, though in the current scheme the next query will just try the overwhelmed host again, so it's not perfect yet.
To me, being overwhelmed with queries is not unlike the host being down temporarily, and the solution we have is to fail over to a standby. What do you think @vinothchandar ?
@AlanConfluent I don't understand from the code how it will try the standby if the active has exceeded the rate limit. The check does not happen in the forwarding loop, so if the active fails, the query will fail.
If the rate limit check fails at the forwarding node, then it just fails. That means the node being asked to answer pull queries is overwhelmed. I could move the check down to the local pull query section, but then it wouldn't stop someone from forwarding lots of pull queries through a single forwarding node, which seems bad.
If it fails the rate limit at the actual active node, then the forwarding loop will go on to the standby.
```java
// Try each candidate host in order; on any exception, log it and move on to the next host.
for (KsqlNode node : filteredAndOrderedNodes) {
  try {
    return routeQuery(node, statement, executionContext, serviceContext, pullQueryContext);
  } catch (Exception t) {
    LOG.debug("Error routing query {} to host {} at timestamp {}",
        statement.getStatementText(), node, System.currentTimeMillis());
  }
}
```
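A toy model can make the two failure points concrete. This sketch assumes that a host's rate-limit rejection surfaces as an exception. A rejection from the active host is caught by the loop, and the standby answers instead. A rejection at the forwarding node itself happens before this loop runs, so it fails the query outright. `FakeNode` and `ForwardingLoop` are hypothetical stand-ins for `KsqlNode` and `routeQuery`, with a simple permit counter in place of a real limiter.

```java
import java.util.List;

/** Hypothetical stand-in for a host; `permits` models its remaining rate-limit headroom. */
final class FakeNode {
  final String name;
  private int permits;

  FakeNode(String name, int permits) {
    this.name = name;
    this.permits = permits;
  }

  /** Throws, like an overwhelmed host would, once its own limit check rejects the query. */
  String execute(String query) {
    if (permits <= 0) {
      throw new IllegalStateException(name + ": pull query rate limit exceeded");
    }
    permits--;
    return name + " answered " + query;
  }
}

/** Hypothetical version of the forwarding loop above. */
final class ForwardingLoop {
  private ForwardingLoop() {}

  /** Tries hosts in the given order; any exception moves on to the next host. */
  static String route(List<FakeNode> filteredAndOrderedNodes, String query) {
    for (FakeNode node : filteredAndOrderedNodes) {
      try {
        return node.execute(query);
      } catch (RuntimeException e) {
        // As in the loop above, the error is only logged before trying the next host.
        System.out.println("Error routing query to " + node.name + ": " + e.getMessage());
      }
    }
    throw new IllegalStateException("All nodes failed for query: " + query);
  }
}
```

With an active that has no permits left and a standby that does, `route` returns the standby's answer. Only when every host rejects does the query fail.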
By forwarding node, do you mean a router? The way I understand this change, it works like this: if there is a router and the router exceeds the rate limit, it fails the query. If the router is not overwhelmed, it goes to the forwarding loop and tries the active and then the standbys. If there is no router, the active fails the query if it has exceeded the rate limit.
Fly-by comment (feel free to ignore) :) I feel we can just enforce the limit at the router; over time, things will settle down to that rate once all routers enforce it. This is a simpler model to understand.
Alright, I agree it's a little hard to reason about if you're trying to figure out the total QPS available.
I've changed it so the rate limit is checked only if the request has not already been forwarded. Note that this doesn't prevent someone from deliberately circumventing the limit by always setting the "forwarded" flag, though the query may not find the desired data in that case (if the data lives on another host). Any scheme that checks a limit for only some requests, without internal, trusted RPCs, has this issue, since flags can be spoofed. I don't think that's a problem for this feature, though, since it's not meant as a security mechanism.
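The first-hop-only rule can be sketched as follows. The sketch assumes that a boolean flag marks requests another host has already forwarded. The names `Host`, `admit`, and `TwoHopDemo` are hypothetical, and a fixed permit budget stands in for a per-second limiter. It illustrates two points. First, a client query spends exactly one permit cluster-wide, which makes the total QPS easier to reason about. Second, a caller who sets the flag itself skips the check entirely, which is the spoofing caveat above.

```java
/** Hypothetical host with a toy permit budget standing in for a real rate limiter. */
final class Host {
  private int permitsLeft;
  int permitsSpent = 0;

  Host(int permits) {
    this.permitsLeft = permits;
  }

  /** Checks the limit only on the first hop, i.e. when the request has not been forwarded yet. */
  synchronized void admit(boolean forwarded) {
    if (forwarded) {
      return; // already counted by the host that received the client request
    }
    if (permitsLeft == 0) {
      throw new IllegalStateException("pull query rate limit exceeded");
    }
    permitsLeft--;
    permitsSpent++;
  }
}

/** Hypothetical two-hop flow: router admits the client request, then forwards it. */
final class TwoHopDemo {
  private TwoHopDemo() {}

  static void clientQuery(Host router, Host active) {
    router.admit(false); // first hop: counted against the router's limit
    active.admit(true);  // forwarded hop: flag set, so no permit is spent
  }
}
```

Once the router's budget is spent, further client queries fail at the router before any forwarding happens, regardless of how much headroom the active host has.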
Branch updated from 3dc6440 to 3eb51d4.
```java
.withConfigs(ImmutableMap.of(KsqlConfig.KSQL_QUERY_PULL_MAX_QPS_CONFIG, 2));

@Test
public void shouldRateLimit() {
```
How does this fail here? It looks like no queries are issued.
It checks the limit before the request is issued. It effectively asks for permission, and if it's at the limit, it's not given permission.
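The ask-permission-first ordering can be sketched generically. The limiter is consulted before any work runs, so a rejected request never reaches query execution. `PermissionFirst.executeIfPermitted` is a hypothetical helper, and the exception type and message are illustrative rather than what ksqlDB throws.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical helper showing the "ask permission, then do the work" ordering. */
final class PermissionFirst {
  private PermissionFirst() {}

  /** Consults the limiter first; `work` runs only if a permit was granted. */
  static <T> T executeIfPermitted(BooleanSupplier tryAcquire, Supplier<T> work) {
    if (!tryAcquire.getAsBoolean()) {
      // Rejected up front: the query is never issued.
      throw new IllegalStateException("rate limit reached");
    }
    return work.get();
  }

  public static void main(String[] args) {
    int[] granted = {0};
    // Toy limiter: grants two permits in total, then refuses (like a max QPS of 2 within one instant).
    BooleanSupplier maxTwo = () -> {
      if (granted[0] < 2) {
        granted[0]++;
        return true;
      }
      return false;
    };
    for (int i = 1; i <= 3; i++) {
      final int n = i;
      try {
        System.out.println(executeIfPermitted(maxTwo, () -> "query " + n + " executed"));
      } catch (IllegalStateException e) {
        System.out.println("query " + n + " rejected: " + e.getMessage());
      }
    }
  }
}
```

With a limiter that grants two permits, mirroring the test's `KSQL_QUERY_PULL_MAX_QPS_CONFIG` of 2, the third call is rejected without running its work. A rate-limit test therefore doesn't need a real query to observe the failure.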
Thank you @AlanConfluent! LGTM. Maybe try the code with the wrk tool to verify it works?
Did exactly that and verified that, at least on a local host, I get the QPS I set in the config.
Description
Adds the use of a rate limiter for pull queries, to allow for capping QPS. Fails immediately if limit is reached.
Fixes #4445
Testing done
mvn package