You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm getting random "Error opening rules configuration file" when using Routing Rule Engine. It only happens when I have 2 or more clients sending requests to presto-gateway.
ERROR [2022-04-04 21:03:57,026] com.lyft.data.gateway.ha.router.RoutingGroupSelector$Logger: Error opening rules configuration file, using routing group header as default.
! org.yaml.snakeyaml.parser.ParserException: while parsing a block node
! in 'reader', line 13, column 1:
!
! ^
! expected the node content, but found '<block end>'
! in 'reader', line 13, column 1:
!
! ^
!
! at org.yaml.snakeyaml.parser.ParserImpl.parseNode(ParserImpl.java:482)
! at org.yaml.snakeyaml.parser.ParserImpl.access$1300(ParserImpl.java:117)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockNode.produce(ParserImpl.java:359)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockSequenceEntry.produce(ParserImpl.java:506)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockSequenceFirstEntry.produce(ParserImpl.java:496)
! at org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:158)
! at org.yaml.snakeyaml.parser.ParserImpl.checkEvent(ParserImpl.java:148)
! at org.yaml.snakeyaml.composer.Composer.checkNode(Composer.java:78)
! at org.yaml.snakeyaml.constructor.BaseConstructor.checkData(BaseConstructor.java:123)
! at org.yaml.snakeyaml.Yaml$1.hasNext(Yaml.java:489)
! at org.jeasy.rules.support.reader.YamlRuleDefinitionReader.loadRules(YamlRuleDefinitionReader.java:71)
! at org.jeasy.rules.support.reader.AbstractRuleDefinitionReader.read(AbstractRuleDefinitionReader.java:43)
! at org.jeasy.rules.mvel.MVELRuleFactory.createRules(MVELRuleFactory.java:100)
! at com.lyft.data.gateway.ha.router.RoutingGroupSelector.lambda$byRoutingRulesEngine$1(RoutingGroupSelector.java:42)
! at com.lyft.data.gateway.ha.handler.QueryIdCachingProxyHandler.rewriteTarget(QueryIdCachingProxyHandler.java:123)
! at com.lyft.data.proxyserver.ProxyServletImpl.rewriteTarget(ProxyServletImpl.java:65)
...
Using:
java version "1.8.0_202"
Python 3.7.13
trino-python-client 0.3.11.0
rules.yaml:
---
name: 'airflow'description: 'if query from airflow, route to etl group'condition: 'request.getHeader("X-Trino-Source") == "trino-python-client"'actions:
- 'result.put("routingGroup", "adhoc")'
When the error is raised, presto-gateway routes request to default RoutingGroup "adhoc". If RoutingGroup "adhoc" does not exist, client request fails.
CPU usage keeps increasing over time on the JVM. I stopped my python clients after 30mins, but CPU usage was still at 25%. If I leave it running for days, it will eventually eat up all my CPU cores.
After further investigation, I narrowed it down to MVELRuleFactory class. I'm not a developer so I can't really explain what is happening but the following fixes the issue.
--- a/gateway-ha/src/main/java/com/lyft/data/gateway/ha/router/RoutingGroupSelector.java
+++ b/gateway-ha/src/main/java/com/lyft/data/gateway/ha/router/RoutingGroupSelector.java
@@ -34,11 +34,11 @@ public interface RoutingGroupSelector {
* to determine the right routing group.
*/
static RoutingGroupSelector byRoutingRulesEngine(String rulesConfigPath) {
- RulesEngine rulesEngine = new DefaultRulesEngine();
- MVELRuleFactory ruleFactory = new MVELRuleFactory(new YamlRuleDefinitionReader());
return request -> {
try {
+ RulesEngine rulesEngine = new DefaultRulesEngine();
+ MVELRuleFactory ruleFactory = new MVELRuleFactory(new YamlRuleDefinitionReader());
Rules rules = ruleFactory.createRules(
new FileReader(rulesConfigPath));
Facts facts = new Facts();
It ran for hours without any errors and CPU usage was stable.
If anyone can provide more input, It would be appreciated.
The text was updated successfully, but these errors were encountered:
the previous implementation required to read the routing rules config file for each request, adding unnecessary latency. More importantly, it was done in a non-thread-safe way, as the `rulesEngine` and `ruleFactory` were shared among all requests, but the `Rules` creation was not. \
Since the class MVELRuleFactory is stateful (as it has `RuleDefinitionReader reader` and `ParserContext parserContext` as private members), it should not be shared between concurrent threads. `Rules` OTOH is a read-only class after it's created (it's just iterated over by the `DefaultRulesEngine`). Given that the rules are the same for all requests, and that the `Rules` usage is thread-safe, it makes sense to share it across all of them.\
We have deployed this code in production at Datadog and our APM profiling tool shows a reduction of around ~1-2%, plus we stopped seeing the error described in lyft#166
I'm getting random "Error opening rules configuration file" when using Routing Rule Engine. It only happens when I have 2 or more clients sending requests to presto-gateway.
Using:
java version "1.8.0_202"
Python 3.7.13
trino-python-client 0.3.11.0
rules.yaml:
python client script:
Other Observations:
After further investigation, I narrowed it down to MVELRuleFactory class. I'm not a developer so I can't really explain what is happening but the following fixes the issue.
It ran for hours without any errors and CPU usage was stable.
If anyone can provide more input, It would be appreciated.
The text was updated successfully, but these errors were encountered: