-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize parameter parsing in text chunking processor #733
Optimize parameter parsing in text chunking processor #733
Conversation
Signed-off-by: yuye-aws <[email protected]>
Signed-off-by: yuye-aws <[email protected]>
Signed-off-by: yuye-aws <[email protected]>
Signed-off-by: yuye-aws <[email protected]>
We need to rerun the CI checks to check if there are any flakey tests. |
The rolling upgrade BWC test failure is the same in #734 |
Signed-off-by: yuye-aws <[email protected]>
Hi maintainer, please attach backport-2.x label to this PR. |
The rolling upgrade BWC test is failing due to cluster health status issue. Can we rerun these tests? |
* Throw IllegalArgumentException if parameter is missing or is not an integer. | ||
*/ | ||
public static int parseIntegerParameter(final Map<String, Object> parameters, final String fieldName) { | ||
if (!parameters.containsKey(fieldName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!parameters.containsKey(fieldName)) { | |
return parseIntegerParameter(parameters, fieldName, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this design doesn't follow conventions in other place in Java API.
If you calling parse(argument) than it should execute parsing logic
If you calling parse(argument, defaultValue) than it should try to execute parsing logic, and return default value in case that parsing logic return empty value
In your code everywhere where this new method called with null you can call your new method without default value.
And in that new method implementation you can just call this method with defaultValue = null, see my previous comment with suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I don't think the function is necessary. I have removed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you calling parse(argument, defaultValue) than it should try to execute parsing logic, and return default value in case that parsing logic return empty value
Not very sure what you mean here. We should throw illegal argument exception if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your code everywhere where this new method called with null you can call your new method without default value.
Updated by Zan's comment. You can review now.
Signed-off-by: yuye-aws <[email protected]>
// all integer parameters are optional | ||
return defaultValue; | ||
// return the default value when parameter is not existing | ||
if (defaultValue == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest splitting this method to three new methods:
- parseInteger: Simply parse integer with catching number format exception.
- parseIntegerWithDefault: if field not shown, return default value, else return
parseInteger
. - parseIntegerWithException: if field not shown, throw IllegalStateException, else return
parseInteger
.
This won't add repeated code also can make sure the method 2 have concise semantic by making the default value type to int
, currently default value accepts null
which is not in a correct semantic. For runtime parameters, there isn't default value, so throwing exception is the correct semantic for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all integer parameters are optional. I will not keep the parseIntegerWithException
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, runtime parameters are always available while non-runtime parameters have default values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
Signed-off-by: yuye-aws <[email protected]>
this.maxChunkLimit = parseIntegerParameter(chunkerParameters, MAX_CHUNK_LIMIT_FIELD, DEFAULT_MAX_CHUNK_LIMIT); | ||
if (maxChunkLimit < 0 && maxChunkLimit != DISABLED_MAX_CHUNK_LIMIT) { | ||
this.maxChunkLimit = parseIntegerWithDefault(chunkerParameters, MAX_CHUNK_LIMIT_FIELD, DEFAULT_MAX_CHUNK_LIMIT); | ||
if (maxChunkLimit <= 0 && maxChunkLimit != DISABLED_MAX_CHUNK_LIMIT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am making this change to keep it consistent with the error message:
Parameter [%s] must be positive or %s to disable this parameter"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think maxChunkLimit can be 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be, it's fine to add = 0 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall code looks good to me. One comment - can you add a unit test for ChunkerParameterParser class? As we're adding more methods to it having specialized test class will give more confidence
Sure. I will add unit tests today. |
Signed-off-by: yuye-aws <[email protected]>
Signed-off-by: yuye-aws <[email protected]>
Signed-off-by: yuye-aws <[email protected]>
HI @zane-neo and @martin-gaievski ! I have implemented the unit tests for chunker parameter parser. Can you help review again? |
@@ -24,7 +24,6 @@ | |||
import java.util.LinkedList; | |||
import java.util.Map; | |||
import java.util.Set; | |||
import java.util.concurrent.ExecutorService; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is due to spotless apply
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me, thank you
@martin-gaievski Can you merge the PR? |
The last approver is expected to merge the PR. |
* Optimize parameter parsing in text chunking processor Signed-off-by: yuye-aws <[email protected]> * add change log Signed-off-by: yuye-aws <[email protected]> * fix unit tests in delimiter chunker Signed-off-by: yuye-aws <[email protected]> * fix unit tests in fixed token length chunker Signed-off-by: yuye-aws <[email protected]> * remove redundant Signed-off-by: yuye-aws <[email protected]> * refactor chunker parameter parser Signed-off-by: yuye-aws <[email protected]> * unit tests for chunker parameter parser Signed-off-by: yuye-aws <[email protected]> * fix comment Signed-off-by: yuye-aws <[email protected]> * spotless apply Signed-off-by: yuye-aws <[email protected]> --------- Signed-off-by: yuye-aws <[email protected]> (cherry picked from commit 038b1ec)
Signed-off-by: yuye-aws <[email protected]> (cherry picked from commit 038b1ec) Co-authored-by: yuye-aws <[email protected]>
* Optimize parameter parsing in text chunking processor Signed-off-by: yuye-aws <[email protected]> * add change log Signed-off-by: yuye-aws <[email protected]> * fix unit tests in delimiter chunker Signed-off-by: yuye-aws <[email protected]> * fix unit tests in fixed token length chunker Signed-off-by: yuye-aws <[email protected]> * remove redundant Signed-off-by: yuye-aws <[email protected]> * refactor chunker parameter parser Signed-off-by: yuye-aws <[email protected]> * unit tests for chunker parameter parser Signed-off-by: yuye-aws <[email protected]> * fix comment Signed-off-by: yuye-aws <[email protected]> * spotless apply Signed-off-by: yuye-aws <[email protected]> --------- Signed-off-by: yuye-aws <[email protected]>
Description
There PR optimizes the parameter parsing step in chunking algorithms. For both algorithms, all runtime parameters for text chunking processor is always available. Therefore, default value is not needed.
Issues Resolved
Optimize parameter parsing in text chunking processor.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.