-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error message when resource group selector cannot be loaded #13036
Conversation
wenleix
commented
Jun 30, 2019
•
edited
Loading
edited
26e5adf
to
ee19cb3
Compare
@@ -159,6 +169,15 @@ public void configure(ResourceGroup group, SelectionContext<VariableMap> criteri | |||
if (lastRefresh.get() == 0) { | |||
throw new PrestoException(CONFIGURATION_UNAVAILABLE, "Selectors cannot be fetched from database"); | |||
} | |||
|
|||
if (System.nanoTime() - lastRefresh.get() > maxRefreshInterval.toMillis() * 1_000_000) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put these three blocks into a helper maybe.
refreshFailures.update(1); | ||
log.error(e, "Error loading configuration from db"); | ||
if (lastRefresh.get() != 0) { | ||
log.debug("Last successful configuration loading was %s ago", succinctNanos(System.nanoTime() - lastRefresh.get()).toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this line be useful in test clusters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@highker : "Test clusters"? -- I am thinking it will help when something goes wrong and oncall can check the log.
@@ -119,6 +120,15 @@ public DbResourceGroupConfigurationManager(ClusterMemoryPoolManager memoryPoolMa | |||
if (lastRefresh.get() == 0) { | |||
throw new PrestoException(CONFIGURATION_UNAVAILABLE, "Root groups cannot be fetched from database"); | |||
} | |||
|
|||
if (System.nanoTime() - lastRefresh.get() > maxRefreshInterval.toMillis() * 1_000_000) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic using succinctNanos
is easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rongrong : But the underlying logic looks quite involved. Not sure if we want to call it per each second...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it involved enough to cause coordinator perf issues? If not I won't care. If yes then change maxRefreshInterval
to maxRefreshIntervalNanos
instead and initialize it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use MILLISECONDS.toNanos(1)
instead of 1_000_000
to help readability :)
// TODO: consider not throwing exception but only have a warning in this case | ||
throw new PrestoException( | ||
CONFIGURATION_UNAVAILABLE, | ||
format("Selectors cannot be fetched from database. Current selectors are loaded %s ago", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Root groups cannot be fetched..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This problem is not really fixed 😛
@@ -119,6 +120,15 @@ public DbResourceGroupConfigurationManager(ClusterMemoryPoolManager memoryPoolMa | |||
if (lastRefresh.get() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just change this to a checkArgument(lastRefresh.get() != 0)
or remove it altogether. It's not very useful. If lastRefresh
is 0 the new logic would throw anyways. You can change the error message to print something more reasonable for this special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means resource group has never successfully being fetched -- this means something is severely wrong but it could happen right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well at least currently you are throwing the same exception in both cases, thus nothing distinguishes the "severity". Removing this and and modify the error message to handle lastRefresh = 0
has the same effect. If you actually believe it should be more severe, handle it accordingly. 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rongrong : Ah I see. From the code perspective is about whether "lastReferesh == 0" means something special case :).
I don't have strong opinion here, although my personal coding style will handle it separately :). Anyhow, I will put it as a special case inside "System.nanoTime() - lastRefresh.get() > maxRefreshInterval.toMillis() * 1_000_000"
...src/main/java/com/facebook/presto/resourceGroups/db/DbResourceGroupConfigurationManager.java
Outdated
Show resolved
Hide resolved
893d4bc
to
6d866db
Compare
@rongrong : Comments addressed. Can you take another pass? ^_^ |
if (lastRefresh.get() == 0) { | ||
throw new PrestoException(CONFIGURATION_UNAVAILABLE, "Root groups cannot be fetched from database"); | ||
} | ||
checkMaxRefreshInterval(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function will print cannot fetch selectors but this should be about root groups. They are likely to fail at the same time so I'm ok with printing the same message for both, but it should not print selectors in that case.