-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-17338: [Java] The maximum request memory of BaseVariableWidthVector should limit to Integer.MAX_VALUE #13815
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -46,6 +46,7 @@ public abstract class BaseVariableWidthVector extends BaseValueVector | |||||||||
implements VariableWidthVector, FieldVector, VectorDefinitionSetter { | ||||||||||
private static final int DEFAULT_RECORD_BYTE_COUNT = 8; | ||||||||||
private static final int INITIAL_BYTE_COUNT = INITIAL_VALUE_ALLOCATION * DEFAULT_RECORD_BYTE_COUNT; | ||||||||||
private static final int MAX_BUFFER_SIZE = (int) Math.min(MAX_ALLOCATION_SIZE, Integer.MAX_VALUE); | ||||||||||
private int lastValueCapacity; | ||||||||||
private long lastValueAllocationSizeInBytes; | ||||||||||
|
||||||||||
|
@@ -430,7 +431,7 @@ public void allocateNew(int valueCount) { | |||||||||
|
||||||||||
/* Check if the data buffer size is within bounds. */ | ||||||||||
private void checkDataBufferSize(long size) { | ||||||||||
if (size > MAX_ALLOCATION_SIZE || size < 0) { | ||||||||||
if (size > MAX_BUFFER_SIZE || size < 0) { | ||||||||||
throw new OversizedAllocationException("Memory required for vector " + | ||||||||||
" is (" + size + "), which is more than max allowed (" + MAX_ALLOCATION_SIZE + ")"); | ||||||||||
ConeyLiu marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
} | ||||||||||
|
@@ -445,7 +446,7 @@ private long computeAndCheckOffsetsBufferSize(int valueCount) { | |||||||||
* an additional slot in offset buffer. | ||||||||||
*/ | ||||||||||
final long size = computeCombinedBufferSize(valueCount + 1, OFFSET_WIDTH); | ||||||||||
if (size > MAX_ALLOCATION_SIZE) { | ||||||||||
if (size > MAX_BUFFER_SIZE) { | ||||||||||
throw new OversizedAllocationException("Memory required for vector capacity " + | ||||||||||
valueCount + | ||||||||||
" is (" + size + "), which is more than max allowed (" + MAX_ALLOCATION_SIZE + ")"); | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder whether the exception messages should point users to LargeVar*Vectors when exceeding buffer capacity. Based on my experiences with RDBMS, I expected LargeVarCharVector to be suitable for storing large values, and missed that it is needed for many small values as well. I'm not sure how well-understood this is, and perhaps users would benefit from being pointed in an appropriate direction. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree. I think the error message should give more information about how to solve the problem. Updated the message to point to the LargeVar*Vectors. |
||||||||||
|
@@ -1240,7 +1241,7 @@ protected final void handleSafe(int index, int dataLength) { | |||||||||
* So even though we may have setup an initial capacity of 1024 | ||||||||||
* elements in the vector, it is quite possible | ||||||||||
* that we need to reAlloc() the data buffer when we are setting | ||||||||||
* the 5th element in the vector simply because previous | ||||||||||
* the 1025th element in the vector simply because previous | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, I don't think this change is right. You should read this example as:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the detailed explanation. Reverted the change. |
||||||||||
* variable length elements have exhausted the buffer capacity. | ||||||||||
* However, we really don't need to reAlloc() validity and | ||||||||||
* offset buffers until we try to set the 1025th element | ||||||||||
|
@@ -1250,8 +1251,10 @@ protected final void handleSafe(int index, int dataLength) { | |||||||||
while (index >= getValueCapacity()) { | ||||||||||
reallocValidityAndOffsetBuffers(); | ||||||||||
} | ||||||||||
final int startOffset = lastSet < 0 ? 0 : getStartOffset(lastSet + 1); | ||||||||||
while (valueBuffer.capacity() < (startOffset + dataLength)) { | ||||||||||
final long startOffset = lastSet < 0 ? 0 : getStartOffset(lastSet + 1); | ||||||||||
final long targetCapacity = startOffset + dataLength; | ||||||||||
checkDataBufferSize(targetCapacity); | ||||||||||
while (valueBuffer.capacity() < targetCapacity) { | ||||||||||
reallocDataBuffer(); | ||||||||||
} | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Slightly unrelated, but why is this using a
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the reallocation may not meet the memory request if the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. My point is that it's wasteful to reallocate several times in a row, instead of reallocating directly to the desired target capacity. Anyway, this was already the case before. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Got your point. And updated the code. It should be the right way. |
||||||||||
} | ||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I assume the check for negative size values is for overflows, but even so, the appearance of a negative in the error message text below could be misleading as a literal reading would say that a negative number is more than the max allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small update the error message for the
overflow
case.