Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error message when resource group selector cannot be loaded #13036

Merged
merged 1 commit into from
Aug 29, 2019

Conversation

wenleix
Copy link
Contributor

@wenleix wenleix commented Jun 30, 2019

No release notes

@@ -159,6 +169,15 @@ public void configure(ResourceGroup group, SelectionContext<VariableMap> criteri
if (lastRefresh.get() == 0) {
throw new PrestoException(CONFIGURATION_UNAVAILABLE, "Selectors cannot be fetched from database");
}

if (System.nanoTime() - lastRefresh.get() > maxRefreshInterval.toMillis() * 1_000_000) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put these three blocks into a helper maybe.

refreshFailures.update(1);
log.error(e, "Error loading configuration from db");
if (lastRefresh.get() != 0) {
log.debug("Last successful configuration loading was %s ago", succinctNanos(System.nanoTime() - lastRefresh.get()).toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this line be useful in test clusters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@highker : "Test clusters"? -- I am thinking it will help when something goes wrong and oncall can check the log.

@highker highker removed their assignment Jul 1, 2019
@@ -119,6 +120,15 @@ public DbResourceGroupConfigurationManager(ClusterMemoryPoolManager memoryPoolMa
if (lastRefresh.get() == 0) {
throw new PrestoException(CONFIGURATION_UNAVAILABLE, "Root groups cannot be fetched from database");
}

if (System.nanoTime() - lastRefresh.get() > maxRefreshInterval.toMillis() * 1_000_000) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic using succinctNanos is easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rongrong : But the underlying logic looks quite involved. Not sure if we want to call it per each second...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it involved enough to cause coordinator perf issues? If not I won't care. If yes then change maxRefreshInterval to maxRefreshIntervalNanos instead and initialize it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use MILLISECONDS.toNanos(1) instead of 1_000_000 to help readability :)

// TODO: consider not throwing exception but only have a warning in this case
throw new PrestoException(
CONFIGURATION_UNAVAILABLE,
format("Selectors cannot be fetched from database. Current selectors are loaded %s ago",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Root groups cannot be fetched..."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem is not really fixed 😛

@@ -119,6 +120,15 @@ public DbResourceGroupConfigurationManager(ClusterMemoryPoolManager memoryPoolMa
if (lastRefresh.get() == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just change this to a checkArgument(lastRefresh.get() != 0) or remove it altogether. It's not very useful. If lastRefresh is 0 the new logic would throw anyways. You can change the error message to print something more reasonable for this special case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means resource group has never successfully being fetched -- this means something is severely wrong but it could happen right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well at least currently you are throwing the same exception in both cases, thus nothing distinguishes the "severity". Removing this and and modify the error message to handle lastRefresh = 0 has the same effect. If you actually believe it should be more severe, handle it accordingly. 😛

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rongrong : Ah I see. From the code perspective is about whether "lastReferesh == 0" means something special case :).

I don't have strong opinion here, although my personal coding style will handle it separately :). Anyhow, I will put it as a special case inside "System.nanoTime() - lastRefresh.get() > maxRefreshInterval.toMillis() * 1_000_000"

@wenleix wenleix force-pushed the laod_rg branch 4 times, most recently from 893d4bc to 6d866db Compare July 9, 2019 23:53
@wenleix
Copy link
Contributor Author

wenleix commented Jul 10, 2019

@rongrong : Comments addressed. Can you take another pass? ^_^

if (lastRefresh.get() == 0) {
throw new PrestoException(CONFIGURATION_UNAVAILABLE, "Root groups cannot be fetched from database");
}
checkMaxRefreshInterval();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function will print cannot fetch selectors but this should be about root groups. They are likely to fail at the same time so I'm ok with printing the same message for both, but it should not print selectors in that case.

@wenleix wenleix merged commit d960aef into prestodb:master Aug 29, 2019
@wenleix wenleix deleted the laod_rg branch August 29, 2019 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants