-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-27536: improve slowlog payload #4937
Conversation
@virajjasani what do you think of this since you did some work in creating this system? It requires re-parsing the requests using ProtobufUtil, but hopefully should not be an issue since it's in the disruptor and should be a low volume relative to usual request load. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, now that I think of it, I wonder if we should just add an optional bytes operationBytes
to the SlowLogPayload. Then we can do the de-serialization into the correct Operation in ProtobufUtil.getSlowLogRecord. That way the user could actually decide what to do with it. Alternatively we could add optional fields for each of the request types, so like optional Get get
, optional Scan scan
, etc
5ec865f
to
2353e75
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
2353e75
to
654dd5f
Compare
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@rmdmattingly @bbeaudreault thanks for taking up this work. Since I am away for a while, I could not actively look into the PR but would definitely love to do so as soon as I can. If the PR is open, I should be able to review by first week of Jan. Thanks again for the improvement, I am already hopeful and feeling happy that something better is coming here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left few minor comments, looks good overall
"hbase.regionserver.slowlog.operation.json.max.cols"; | ||
public static final int SLOW_LOG_OPERATION_JSON_MAX_COLS_DEFAULT = 50; | ||
|
||
public static final String SLOW_LOG_OPERATION_JSON_ENABLED = | ||
"hbase.regionserver.slowlog.operation.json.enabled"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to remove regionserver
from the config names, since the slowlog is used for master RPC calls as well.
public static SlowLogParams getSlowLogParams(Message message, boolean slowLogOperationJsonEnabled, | ||
int maxCols) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add additional params to javadoc?
try { | ||
return operation.call().toJSON(maxCols); | ||
} catch (Exception e) { | ||
LOG.warn("Exception when deriving operation JSON", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if logging maxCols
might be helpful?
@@ -2184,6 +2193,19 @@ public static SlowLogParams getSlowLogParams(Message message) { | |||
return new SlowLogParams(params); | |||
} | |||
|
|||
private static String toJson(boolean slowLogOperationJsonEnabled, int maxCols, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: replace slowLogOperationJsonEnabled
with jsonEnabled
as generic?
Thanks @virajjasani! What do you think of the idea to send the proto message instead of json? We could add a field to the proto for get/mutate/scan/multi and send it directly. The client side would do the usual work of parsing that into the API objects using ProtobufUtil. This way users could decide to print out the requests using protobuf or do anything else they want with the raw request objects. |
I think it seems good idea to send/parse proto message instead of json as well, downstreamers could also use protobuf. The only thing is, the compatibility might not be well maintained across the releases but should be fine I guess? |
This falls under our Operational Compatibility guidelines, which basically say no breaking changes to metrics and parseable payloads such as this. We can ADD new payload formats as non default option but have to maintain compatibility with existing, although it can be deprecated and removed in a future release. Just FYI |
@apurtell There is currently no json support. This PR was initially to add a new json field to the SlowLog protobuf message, which would provide insight into the request details for the slow/large request in json format. Since we are adding a new feature and have the opportunity to think of future compatibility, I was suggesting that we not add a new json field. Instead we should add fields which contains the actual request proto. So if the slow request was a scan, it'd be stashed in a new I actually think this would be far more compatible for the future than a new json field. |
Thanks for the pointer to the guideline though. It's a helpful reminder |
Sure, if this is all new work, then either json or protobuf are good options for something that will evolve. protobuf is a better choice for a couple of reasons:
|
654dd5f
to
dff72fb
Compare
@@ -3376,7 +3401,9 @@ private static LogEntry getSlowLogRecord(final TooSlowLog.SlowLogPayload slowLog | |||
.setQueueTime(slowLogPayload.getQueueTime()).setRegionName(slowLogPayload.getRegionName()) | |||
.setResponseSize(slowLogPayload.getResponseSize()) | |||
.setServerClass(slowLogPayload.getServerClass()).setStartTime(slowLogPayload.getStartTime()) | |||
.setUserName(slowLogPayload.getUserName()).build(); | |||
.setUserName(slowLogPayload.getUserName()).setScan(slowLogPayload.getScan()) | |||
.setMulti(slowLogPayload.getMulti()).setGet(slowLogPayload.getGet()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't expose the actual protos to downstream users. Instead we should use the correct ProtobufUtil or RequestConverter methods to convert these into the actual client models here
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
dff72fb
to
ae0decb
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
ae0decb
to
ae7adfe
Compare
473204e
to
02bf657
Compare
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
02bf657
to
43e2978
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🤨 I'm a little skeptical of the spotless check failure — running spotless:apply yields no diff, and the output from the failing build looks okay |
c965124
to
1e88dc0
Compare
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
1e88dc0
to
575bf92
Compare
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
private static final Type OPERATION_LIST = | ||
TypeToken.getParameterized(List.class, Operation.class).getType(); | ||
private static final Type OPERATION_MAYBE = | ||
TypeToken.getParameterized(Optional.class, Operation.class).getType(); | ||
private static final Type OPERATION_LIST_MAYBE = | ||
TypeToken.getParameterized(Optional.class, OPERATION_LIST.getClass()).getType(); | ||
private static final JsonElement EMPTY_NODE = JsonParser.parseString(HConstants.EMPTY_STRING); | ||
private static final Gson GSON = GsonUtil.createGson().setPrettyPrinting() | ||
.registerTypeAdapter(OPERATION_MAYBE.getClass(), | ||
(JsonSerializer< | ||
Optional<Operation>>) (operationMaybe, type, jsonSerializationContext) -> operationMaybe | ||
.map(operation -> serializeCatchAll(operation).orElse(EMPTY_NODE)).orElse(EMPTY_NODE)) | ||
.registerTypeAdapter(OPERATION_LIST_MAYBE.getClass(), (JsonSerializer< | ||
Optional<List<Operation>>>) (operationsMaybe, type, jsonSerializationContext) -> { | ||
if (!operationsMaybe.isPresent()) { | ||
return EMPTY_NODE; | ||
} | ||
JsonObject jsonObj = new JsonObject(); | ||
final AtomicInteger i = new AtomicInteger(0); | ||
for (Operation operation : operationsMaybe.get()) { | ||
serializeCatchAll(operation).ifPresent(json -> { | ||
jsonObj.add(String.valueOf(i), json); | ||
i.incrementAndGet(); | ||
}); | ||
} | ||
return jsonObj; | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a lot. maybe it's the only way, but maybe there's a different way. Below we are registering a type adapter for OnlineLogRecord.class. Currently we just call gson.toJsonTree then make a few cleanups.
One question is whether we even need to fully serialize all the operation stuff to json here. I'm not sure what this is used for exactly, but maybe we don't have to include them? Otherwise, i wonder if there's a slightly simpler way to represent these in the below type adapter.
I was imaginging you could even do something like this:
if (slowLogPayload.getScan().isPresent()) {
jsonObj.set("scan", slowLogPayload.getScan().toJSON());
}
etc
You're already checking for presence of these fields below in order to remove the empties, so it might just be a matter of adding an else
to each.
Not sure, not saying that's definitely better but something to think about. I'm also surprised that there's not a built in type adapter for Optional and List, in which case we would just need a type adapter for Operation which does the toJSON() call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed — I've pushed some changes that make this much cleaner by defining a gson implementation to be used in the OnlineLogRecord serialization. I also added unit tests to validate the basic outputs.
A benefit of this approach is that we can also stop the isEmpty based removal of operation nodes in the json output
/** | ||
* If {@value org.apache.hadoop.hbase.HConstants#SLOW_LOG_OPERATION_MESSAGE_PAYLOAD_ENABLED} is | ||
* enabled then this value may be present and should represent the Scan that produced the given | ||
* {@link OnlineLogRecord}. This value should only be present if {@link #getMulti()}, | ||
* {@link #getGet()}, and {@link #getMutate()} are empty | ||
*/ | ||
public Optional<Scan> getScan() { | ||
return scan; | ||
} | ||
|
||
/** | ||
* If {@value org.apache.hadoop.hbase.HConstants#SLOW_LOG_OPERATION_MESSAGE_PAYLOAD_ENABLED} is | ||
* enabled then this value may be present and should represent the MultiRequest that produced the | ||
* given {@link OnlineLogRecord}. This value should only be present if {@link #getScan}, | ||
* {@link #getGet()}, and {@link #getMutate()} are empty | ||
*/ | ||
public Optional<List<Operation>> getMulti() { | ||
return multi; | ||
} | ||
|
||
/** | ||
* If {@value org.apache.hadoop.hbase.HConstants#SLOW_LOG_OPERATION_MESSAGE_PAYLOAD_ENABLED} is | ||
* enabled then this value may be present and should represent the Get that produced the given | ||
* {@link OnlineLogRecord}. This value should only be present if {@link #getScan()}, | ||
* {@link #getMulti()} ()}, and {@link #getMutate()} are empty | ||
*/ | ||
public Optional<Get> getGet() { | ||
return get; | ||
} | ||
|
||
/** | ||
* If {@value org.apache.hadoop.hbase.HConstants#SLOW_LOG_OPERATION_MESSAGE_PAYLOAD_ENABLED} is | ||
* enabled then this value may be present and should represent the Mutation that produced the | ||
* given {@link OnlineLogRecord}. This value should only be present if {@link #getScan}, | ||
* {@link #getMulti()} ()}, and {@link #getGet()} ()} are empty | ||
*/ | ||
public Optional<Mutation> getMutate() { | ||
return mutate; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should think a little about whether this is actually the API we want to expose. Since this class is InterfaceAudience.Public, any changes we make here we're stuck with for a long time. We can only remove/rename methods (i.e. breaking changes) at major version releases and we haven't had a major version release in years.
There's potentially a little more flexibility since it's InterfaceStability.Evolving, but I don't think there's really consensus across all committers/PMC as to whether this grants us anything. It's not mentioned in our guides except:
Public packages marked as evolving may be changed, but it is discouraged.
Anyway, not saying what we have here is wrong per-se. But we should take a minute to think about how we might evolve usage of these slow log stuff over time and make sure what we have here will support that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good advice for sure. I've mulled it over a little, and I'm not sure what I'd change at this level. Please let me know if you have any ideas
575bf92
to
9f7baa8
Compare
private static final Gson GSON = | ||
GsonUtil.createGson().setPrettyPrinting().registerTypeAdapter(OnlineLogRecord.class, | ||
(JsonSerializer<OnlineLogRecord>) (slowLogPayload, type, jsonSerializationContext) -> { | ||
Gson gson = new Gson(); | ||
JsonObject jsonObj = (JsonObject) gson.toJsonTree(slowLogPayload); | ||
JsonObject jsonObj = (JsonObject) INNER_GSON.toJsonTree(slowLogPayload); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you try doing jsonSerliazationContext.serialize(slowPayLoad)
? I feel like that's the appropriate call which should hopefully work with chaining, but not sure.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
I've talked offline with @bbeaudreault, and we've decided to close this PR and reduce the scope of the initial work here. In the near future I'll open a PR that includes the |
Related Jira: https://issues.apache.org/jira/browse/HBASE-27536
I've loaded the server changes onto a test host and verified the following two states:
hbase.regionserver.slowlog.operation.json.enabled
== false results in an empty string foroperationJson
hbase.regionserver.slowlog.operation.json.enabled
== true results in anoperationJson
as expected.Here's a picture of the entire
OnlineLogEntry
for aMutate
example:And here's a copy of the
operationJson
for aScan
example: