This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[NSE-265] Reserve enough memory before UnsafeAppend in builder #266

Merged (2 commits) on Apr 23, 2021


JkSelf (Contributor) commented Apr 21, 2021

What changes were proposed in this pull request?

Related to #265

fixes: #241

The shuffle builder uses the UnsafeAppend API for better performance. It
tries to reserve enough space based on the results of the last record batch,
which can be buggy if a dense record batch follows a sparse one.

This patch adds the following fixes:

  • adds Reset() after Finish() in the builder
  • reserves length for offset_builder in the binary builder

Further cleanup of the reservation logic is still needed.
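
To make the sparse-then-dense failure mode concrete, here is a minimal sketch in plain C++. It assumes that "based on the results of the last record batch" means each partition's capacity is sized from the row count it received in the previous batch; the PR does not show that code, so the helper names `CountPerPartition` and `UnderReservedPartitions` are hypothetical and this is not the splitter's actual logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count how many rows of one batch are routed to each partition.
std::vector<std::size_t> CountPerPartition(
    const std::vector<std::int32_t>& partition_ids,
    std::size_t num_partitions) {
  std::vector<std::size_t> counts(num_partitions, 0);
  for (std::int32_t pid : partition_ids) {
    assert(pid >= 0 && static_cast<std::size_t>(pid) < num_partitions);
    ++counts[static_cast<std::size_t>(pid)];
  }
  return counts;
}

// Return the partitions whose reservation (assumed here to come from the
// previous batch) is smaller than what the current batch needs. An
// unchecked append into any of these would run past the reserved space.
std::vector<std::size_t> UnderReservedPartitions(
    const std::vector<std::size_t>& reserved,
    const std::vector<std::size_t>& needed) {
  assert(reserved.size() == needed.size());
  std::vector<std::size_t> result;
  for (std::size_t p = 0; p < needed.size(); ++p) {
    if (needed[p] > reserved[p]) result.push_back(p);
  }
  return result;
}
```

If the previous batch sent three rows to partition 0 and one to partition 1, and the next batch reverses that split, partition 1 is reported as under-reserved. That is the case where an unchecked append becomes unsafe.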

How was this patch tested?

Verified locally.

@github-actions

#265

JkSelf (Contributor, Author) commented Apr 21, 2021

@zhouyuan You can give this PR a try.

zhouyuan (Collaborator) commented

@JkSelf
Thanks a lot for digging into this bug! It appears to be a hidden bug in code committed a while ago.
I ran some quick tests in my environment, and it looks like we are missing a call to reserve size in offsets_builder_. The patch below also makes it work. We may still need to check that Reserve() is called with the right size elsewhere.

diff --git a/native-sql-engine/cpp/src/shuffle/splitter.cc b/native-sql-engine/cpp/src/shuffle/splitter.cc
index 2eebb9ae..31856e36 100644
--- a/native-sql-engine/cpp/src/shuffle/splitter.cc
+++ b/native-sql-engine/cpp/src/shuffle/splitter.cc
@@ -1047,6 +1047,7 @@ arrow::Status Splitter::AppendBinary(
       offset_type length;
       auto value = src_arr->GetValue(row, &length);
       const auto& builder = dst_builders[partition_id_[row]];
+      RETURN_NOT_OK(builder->Reserve(1));
       RETURN_NOT_OK(builder->ReserveData(length));
       builder->UnsafeAppend(value, length);
     }
@@ -1056,6 +1057,7 @@ arrow::Status Splitter::AppendBinary(
         offset_type length;
         auto value = src_arr->GetValue(row, &length);
         const auto& builder = dst_builders[partition_id_[row]];
+        RETURN_NOT_OK(builder->Reserve(1));
         RETURN_NOT_OK(builder->ReserveData(length));
         builder->UnsafeAppend(value, length);
       } else {
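
The diff above reserves one element slot in addition to the value bytes before each unchecked append. The sketch below illustrates why both reservations matter, using a toy builder written in plain C++. `ToyBinaryBuilder`, `Fits`, and `AppendValue` are hypothetical names for illustration only; this is not Arrow's implementation, and it models capacity purely as bookkeeping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a variable-length binary builder. NOT Arrow's code:
// it only models the two capacities (offset slots and value bytes)
// that must both be reserved before an unchecked append.
class ToyBinaryBuilder {
 public:
  // Ensure room for `n` more elements (offset slots).
  void Reserve(std::size_t n) {
    slot_capacity_ = std::max(slot_capacity_, offsets_.size() + n);
  }

  // Ensure room for `n` more value bytes.
  void ReserveData(std::size_t n) {
    byte_capacity_ = std::max(byte_capacity_, data_.size() + n);
  }

  // True if one more value of `length` bytes fits in both reservations.
  bool Fits(std::size_t length) const {
    return offsets_.size() + 1 <= slot_capacity_ &&
           data_.size() + length <= byte_capacity_;
  }

  // Unchecked in spirit: the caller promises Fits() holds. The assert
  // stands in for the corruption a real unchecked write could cause.
  void UnsafeAppend(const std::string& value) {
    assert(Fits(value.size()));
    offsets_.push_back(static_cast<std::int32_t>(data_.size()));
    data_ += value;
  }

  std::size_t length() const { return offsets_.size(); }

 private:
  std::vector<std::int32_t> offsets_;
  std::string data_;
  std::size_t slot_capacity_ = 0;
  std::size_t byte_capacity_ = 0;
};

// Hypothetical helper following the patch's order of calls. With
// `reserve_slot == false` it reproduces the pre-patch pattern and reports
// failure instead of writing past the offsets reservation.
bool AppendValue(ToyBinaryBuilder& builder, const std::string& value,
                 bool reserve_slot) {
  if (reserve_slot) builder.Reserve(1);
  builder.ReserveData(value.size());
  if (!builder.Fits(value.size())) return false;
  builder.UnsafeAppend(value);
  return true;
}
```

In this model, calling `AppendValue(builder, "abc", false)` on a fresh builder returns false because no offset slot was reserved, while the same call with `true` succeeds. Reserving one slot per row is simple but adds a call per row, which may be part of why the description asks for a later cleanup of the reservation logic.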

rui-mo (Collaborator) commented Apr 22, 2021

Verified q67 in my environment.

@zhouyuan zhouyuan changed the title [NSE-265] Change the UnsafeAppend to Append to fix the memmove exception [NSE-265] Reserve enough memory before UnsafeAppend in builder Apr 23, 2021
@zhouyuan zhouyuan force-pushed the fixMemmoveException branch 2 times, most recently from 242fe7a to c4074b5 Compare April 23, 2021 04:10
@zhouyuan zhouyuan merged commit 94af4ac into oap-project:master Apr 23, 2021
zhouyuan added a commit to zhouyuan/native-sql-engine that referenced this pull request Apr 23, 2021
…roject#266)

* change the UnsafeAppend to Append

* fix buffer builder in shuffle

The shuffle builder uses the UnsafeAppend API for better performance. It
tries to reserve enough space based on the results of the last record batch,
which can be buggy if a dense record batch follows a sparse one.

This patch adds the following fixes:
- adds Reset() after Finish() in the builder
- reserves length for offset_builder in the binary builder

Further cleanup of the reservation logic is still needed.

Signed-off-by: Yuan Zhou <[email protected]>

Co-authored-by: Yuan Zhou <[email protected]>
zhouyuan added a commit that referenced this pull request Apr 23, 2021
zhouyuan added a commit that referenced this pull request May 13, 2021
Development

Successfully merging this pull request may close these issues.

TPC-DS q67 failed for XXH3_hashLong_64b_withSecret.constprop.0+0x180
3 participants