-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BugFix] Fix the crash caused by JniScanner #44903
[BugFix] Fix the crash caused by JniScanner #44903
Conversation
this.childColumns = new OffHeapColumnVector[size]; | ||
for (int i = 0; i < size; i++) { | ||
this.childColumns[i] = new OffHeapColumnVector(newCapacity, type.childTypes.get(i)); | ||
} | ||
} | ||
} else { | ||
throw new RuntimeException("Unhandled type: " + type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Potential loss of previously allocated OffHeapColumnVector objects during expansion for types other than BYTE storage type and failure to accommodate the child column vectors resizing needs in non-BYTE storage scenarios.
You can modify the code like this:
@@ -142,18 +142,25 @@ private void reserveInternal(int newCapacity) {
this.data = Platform.reallocateMemory(data, oldCapacity * typeSize, newCapacity * typeSize);
} else if (type.isByteStorageType()) {
this.offsetData = Platform.reallocateMemory(offsetData, oldOffsetSize, newOffsetSize);
if (this.childColumns == null) {
int childCapacity = newCapacity * DEFAULT_STRING_LENGTH;
this.childColumns = new OffHeapColumnVector[1];
this.childColumns[0] = new OffHeapColumnVector(childCapacity, new ColumnType(type.name + "#data",
ColumnType.TypeValue.BYTE));
}
- // The same condition should be checked not only for BYTE storage types but also when dealing with
- // ARRAY, MAP, or STRUCT types to prevent overwrites. However, if already created, we need a mechanism
- // to ensure that their capacity is expanded as necessary rather than just avoiding their re-creation.
+ // This logic prevents the overwriting of child columns for BYTE storage types. A similar check is needed
+ // below for other types (ARRAY, MAP, STRUCT), but with additional logic for handling capacity expansions
+ // for existing child columns efficiently without losing their data.
} else if (type.isArray() || type.isMap() || type.isStruct()) {
if (type.isArray() || type.isMap()) {
this.offsetData = Platform.reallocateMemory(offsetData, oldOffsetSize, newOffsetSize);
}
if (this.childColumns == null) {
int size = type.childTypes.size();
this.childColumns = new OffHeapColumnVector[size];
for (int i = 0; i < size; i++) {
this.childColumns[i] = new OffHeapColumnVector(newCapacity, type.childTypes.get(i));
+ }
+ } else {
+ // Assuming an expandCapacity method exists to appropriately resize each OffHeapColumnVector
+ for (OffHeapColumnVector childColumn : this.childColumns) {
+ childColumn.expandCapacity(newCapacity);
}
}
} else {
throw new RuntimeException("Unhandled type: " + type);
}
This modification addresses the risk by ensuring that existing OffHeapColumnVector
instances are preserved and properly resized as needed, rather than potentially being overwritten or not resized at all for non-BYTE storage types, thus mitigating the risk of data loss or crashes due to unhandled boundary conditions.
Signed-off-by: changxin <[email protected]>
6718129
to
3986b0e
Compare
Quality Gate passedIssues Measures |
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
Can someone review it? |
@dirtysalt @stephen-shelby Could you please review it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good job!
@Mergifyio backport branch-3.3 |
@Mergifyio backport branch-3.2 |
@Mergifyio backport branch-3.1 |
✅ Backports have been created
|
@Mergifyio backport branch-3.0 |
@Mergifyio backport branch-2.5 |
✅ Backports have been created
|
✅ Backports have been created
|
✅ Backports have been created
|
✅ Backports have been created
|
Signed-off-by: changxin <[email protected]> (cherry picked from commit b8cbc29)
Signed-off-by: changxin <[email protected]> (cherry picked from commit b8cbc29)
Signed-off-by: changxin <[email protected]> (cherry picked from commit b8cbc29) # Conflicts: # java-extensions/hive-reader/src/test/java/com/starrocks/hive/reader/TestHiveScanner.java
Signed-off-by: changxin <[email protected]> (cherry picked from commit b8cbc29) # Conflicts: # java-extensions/hive-reader/src/test/java/com/starrocks/hive/reader/TestHiveScanner.java
Signed-off-by: changxin <[email protected]> (cherry picked from commit b8cbc29) # Conflicts: # java-extensions/hive-reader/src/test/java/com/starrocks/hive/reader/TestHiveScanner.java
Co-authored-by: 裸奔丶小馒头 <[email protected]>
Co-authored-by: 裸奔丶小馒头 <[email protected]>
Why I'm doing:
A crash may occur when reading array, map<string,string> types from HiveJniScanner. Taking array as an example, the reasons are as follows:
In OffHeapColumnVector, array uses childColumns[0] to store data. For each row of array data, there are 0 to multiple rows of data corresponding to it in childColumns[0], and offsetData is used to record the Start and end positions of each row of array data in childColumns[0].
Initially, childColumns[0] has the same capacity as array. Assuming that each row of array data corresponds to multiple rows of data in childColumns[0], when appending data, childColumns[0] must be expanded first to accommodate all the data in the array.
ChildColumns[0] is an OffHeapColumnVector of String type. It directly creates a new OffHeapColumnVector when expanding, which means that the offset of the data added later will start from 0, but the offsetData of the array is still continuous. So there will be a situation where offsetData[n-1] > offsetData[n], as shown in the figure below, this will be a disaster, because offsetData will be passed to be, and be will follow offsetData[n] - offsetData[n - 1] Calculate the length of the string. The length of a negative number will be assigned to an unsigned number, so the construction of the string will go out of bounds, causing be crash.
What I'm doing:
When OffHeapColumnVector is expanding, check whether childColumns already exists. If it already exists, there is no need to reset it and just keep it, because the expansion will be done automatically in the appendValue function of childColumns.
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: