-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-27681 Refactor Table Latency Metrics #5072
Conversation
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to do a more thorough read-through, but I do like the direction here so far... A couple questions:
- Should we do anything to unify with TableMetrics/MetricsTableSourceImpl class?
- Are there any additional tests we should add to verify all the new behavior end-to-end? It's a big change. We might at least want to upload a screenshot or dump of jmx metrics from a real server to prove that all of the expected metrics get updated.
I also find a lot of duplication in RSRpcServices. I almost wonder if we should expose a method on HRegion:
region.ifTableMetricsEnabled(metrics -> metrics.updateGet(timeCost));
Maybe not a huge improvement, but I find all the null checks tedious. Open to other ideas.
I think so. But maybe we can split it into another issue? What do you think?
I will do some tests and upload, maybe a doc, in this weekend.
Glad to make some changes. |
Convergence of null check, and renamed metricsName to tableRequests. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for trying that @thangTang ... Sorry, now that I see it.. I think maybe it's not the right move. Instead maybe some other abstraction. I need to think on this more, but this is what I'm noticing:
We're making the exact same calls, just with different receivers. For example:
if (region.getTableMetrics() != null) {
region.getTableMetrics().updateDelete(timeCost));
}
if (metricsRegionServer != null) {
metricsRegionServer.updateDelete(timeCost);
}
So we're calling updateDelete twice, just on different objects. This pattern repeats for every single method of RSRpcServices. This isn't so bad for updateDelete, but some of them (i.e. scans) involve multiple calls, lots of duplication.
Previously this duplication was nicely encapsulated in MetricsRegionServer. We passed in a TableName, and it updated table metrics if appropriately.
You removed the TableName argument, so now the code has to live outside MetricsRegionServer. What if you replace TableName argument with Region argument? So MetricRegionServer.updateGet would become:
public void updateGet(Region region, long t) {
if (region.getTableMetrics() != null) {
region.getTableMetrics().updateGet(t);
}
if (t > slowMetricTime) {
serverSource.incrSlowGet();
}
serverSource.updateGet(t);
userAggregate.updateGet(t);
}
Still lots of calling the same method on different objects, but at least it's encapsulated. Alternatively I feel like we want some other abstraction to encapsulate this, rather than add lots of duplicate code in RSRpcServices (which is already a huge class).
Thoughts?
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Agree that is a truly problem. Honestly I have consider it too. I think the previous code structure is a bit confusing. My vision is that each metrics object is only responsible for operating one type of metrics (that is, a bean in jmx), so that the metric of the table belongs to the metricsTable, the metric of the server belongs to the metricsServer, and the metric of the region belongs to the metricsRegion. Then there is a tool class that only needs to be called once externally (such as in RSRpcServices), and encapsulates all operations on metrics in the methods of this tool class. In fact, the current But in order to disassemble the work and try to have a clear goal, in this ticket I don't intend to modify anything other than tableLatencies. That's why I just left it here. After all, now I'm glad to temporarily put these logics back into |
@thangTang your analysis makes a lot of sense, and I 100% agree with the direction. I agree the existing setup is a bit confusing, and MetricsRegionServer is more of a tool class. I agree it would be nice to either rename it or refactor it, and that can happen in a separate issue. For this PR, I think it would be nice to move the logic back into MetricsRegionServer for now. For two reasons:
Passing I have another PR #5067 which is going to add more metrics to Table and RegionServer beans. I think that PR would also end up quite a bit messier if we moved the logic into RSRpcServices, so I think that's a good reason (ease of adding new metrics) |
Thank you for your suggestion @bbeaudreault ~ |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a couple more code cleanliness stuff, thanks for the work here
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
Show resolved
Hide resolved
// Handle table latency metrics | ||
private MetricsTableRequests metricsTableRequests; | ||
|
||
public void ifTableRequestsMetricsExist(Consumer<MetricsTableRequests> consumer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry i wasn't clear... i was wrong here, and I dont think this method actually improves much. can we revert this part too? sorry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me try.
tableMetrics.updatePutBatch(tn, t); | ||
} | ||
public void updatePutBatch(HRegion region, long t) { | ||
Optional.ofNullable(region).ifPresent(r -> Optional.ofNullable(r.getMetricsTableRequests()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I thought you were going to remove all the Optional stuff.
If you want to keep that, I think we should just have region.getTableMetrics() return an Optional. That way we don't have to create 2 optionals in every call. You can an just chain them together with flatMap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, my fault...
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was trying to get at with the optional stuff, regionservers can be pretty cpu and memory sensitive. Let's say a regionserver is doing 10k req/s. Every request is instrumented with MetricsRegionServer. So you are adding 2 calls to Object.ofNullablr for each request. The first one (region) will always be present, the second will be present anytime table metrics are enabled.
So that's 20k object allocations for really no reason. I don't think optionals are a good replacement for a null check.
For getTableRequestMetrics, I was thinking you'd create the optional only once when the region opens. So
tableRequestsmetrics = Optional.of(new TableRequestMetrics());
public Optional<TableRequestMetrics> getTableRequestMetrics() {
return tableRequestMetrics;
}
The optional Region param is only added for tests. Otherwise it's always non-null. You could just remove that null check on region and update the tests to pass a mocked Region.
Also I think the pre-commit hook is saying you need to rebase your PR
🎊 +1 overall
This message was automatically generated. |
Thank you very much for your patient review. |
Actually, I have another question I would like to ask for your opinion. Thank you! |
💔 -1 overall
This message was automatically generated. |
After removed the null check of
That's because And thank you for providing me the informations about Meter metrics! |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Removed some unnecessary modify in Failure test seems not related. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @thangTang! This is looking great, and I appreciate the thorough testing doc.
I had a few more nitpick comments, and then I think we're good.
Finally, I think you still need to rebase on master. Note the CI comments say:
Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
So can you pull latest master, then rebase your branch on it and squash the commits? I just like to have the CI be totally green before merging.
In terms of the failing test, I agree it looks unrelated. Can you try running the failing test locally with your patch just to be 100%?
Once all of that is done, we'll be ready to merge. Thanks again!
@@ -962,6 +970,7 @@ long initialize(final CancelableProgressable reporter) throws IOException { | |||
} | |||
|
|||
} | |||
Optional.ofNullable(metricsTableRequests).ifPresent(metrics -> metrics.removeRegistry()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you replace this and below with normal null checks?
} | ||
|
||
public void updatePutBatch(HRegion region, long t) { | ||
ifTableRequestsMetricsExist(region, metrics -> metrics.updatePutBatch(t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you replace these with normal null checks? similarly not sure we need the overhead of a lambda.
(i had originally suggested a method like this back when the RSRpcServices diff was really large, but now that it's all nicely encapsulated I don't think it's needed.
* @param t time it took | ||
*/ | ||
public void updatePut(long t) { | ||
ifEnableTableLatenciesMetrics(() -> putTimeHistogram.update(t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you repalce these lambda calls with normal if statements?
...-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/MetricsTableRequests.java
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @thangTang! This unified implementation is so much cleaner and great that we easily support removals now.
Assuming the CI comes back green, this looks good to me. Will merge once all green
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
The new failed test seems also not related. I tried to run
|
@thangTang I tried cherry-pick to branch-2, but there are a bunch of merge conflicts. Any chance you can submit a backport PR for branch-2? |
Sure. Will open a PR this week. Thank you for your patient and meticulous
review, it helped me a lot: )
Bryan Beaudreault ***@***.***>于2023年3月8日 周三01:09写道:
… @thangTang <https://github.com/thangTang> I tried cherry-pick to
branch-2, but there are a bunch of merge conflicts. Any chance you can
submit a backport PR for branch-2?
—
Reply to this email directly, view it on GitHub
<#5072 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKMK637GQFX2OLPEJYK6GLW25TWFANCNFSM6AAAAAAVMHT23E>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@thangTang also, I'm not sure how I missed this but I just realized that we didn't delete MetricsTableLatencies or MetricsTableLatenciesImpl classes. Those are now unused I believe, but the classes still exist. I think we might need another jira to delete those. |
Oh... Sorry about that... |
Pls help me take a look at #5092. If you think we need a single ticket, feel free to leave a msg to me and I will create a new one. |
Yes that works. I can take a look tomorrow morning my time. Also, we don't need a new jira for backport. We sometimes do that when doing a backport after the original jira was long resolved. But we haven't restocked this yet, so can just use the same jira |
Sorry, I just saw this message... I already opened a new issue for Backport this morning😂 |
I think I'm going to rename the title of your PR and rewrite the commit message when I merge, so that it can be associated with the original jira. As someone who often goes digging in jira to find when or why changes occurred, I think it's much nicer to keep the backports in the original jira if at all possible. So once I merge we can resolve your new jira as |
Hi no need to do this by yourself @bbeaudreault , if you think so, just give me a few minutes to rebase my commit and re-push it, then you can just review and merge: ) |
@thangTang no worries -- it's easy for me to do in the github UI when I merge. Github gives me a chance to rewrite the commit (which we use for adding the I'm just reviewing it now, so hopefully will merge soon |
OK, then thank you so much.
Bryan Beaudreault ***@***.***>于2023年3月8日 周三22:57写道:
… @thangTang <https://github.com/thangTang> no worries -- it's easy for me
to do in the github UI when I merge. Github gives me a chance to rewrite
the commit (which we use for adding the Signed-off-by tags and such). I
can just change it there.
—
Reply to this email directly, view it on GitHub
<#5072 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKMK62UXAMHA7MQCBH2Z73W3CM5FANCNFSM6AAAAAAVMHT23E>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
No description provided.