Import data causes BufferOverflowException #3728

Closed
ljwh opened this issue Jan 26, 2024 · 0 comments · Fixed by #3729
Labels
bug Something isn't working

Comments

ljwh (Contributor) commented Jan 26, 2024

Bug Description
I am trying to import data from a Hive table into an online DB; an exception occurs when some string values are longer than 255 bytes:

Caused by: java.io.IOException: write row to openmldb failed on:  ... 
	at com._4paradigm.openmldb.spark.write.OpenmldbDataSingleWriter.write(OpenmldbDataSingleWriter.java:89)
	at com._4paradigm.openmldb.spark.write.OpenmldbDataSingleWriter.write(OpenmldbDataSingleWriter.java:39)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:419)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:457)
	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:358)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.nio.BufferOverflowException
	at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:194)
	at java.nio.ByteBuffer.put(ByteBuffer.java:867)
	at com._4paradigm.openmldb.common.codec.FlexibleRowBuilder.build(FlexibleRowBuilder.java:385)
	at com._4paradigm.openmldb.sdk.impl.InsertPreparedStatementImpl.buildRow(InsertPreparedStatementImpl.java:302)
	at com._4paradigm.openmldb.sdk.impl.InsertPreparedStatementImpl.execute(InsertPreparedStatementImpl.java:317)
	at com._4paradigm.openmldb.spark.write.OpenmldbDataSingleWriter.write(OpenmldbDataSingleWriter.java:77)
	... 13 more

Expected Behavior
The data import succeeds.

Relation Case
None.

Steps to Reproduce

  1. Prepare data in which some string values are longer than 255 bytes and some are shorter.
  2. Import the data into the online DB (a repro sketch follows this list).
  3. The import fails with java.nio.BufferOverflowException.
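A minimal repro sketch through the Java SDK's PreparedStatement path, which is the same path as the stack trace above. This is a sketch under assumptions: the ZooKeeper address, database name, and table are placeholders, it uses the quickstart entry points (SqlClusterExecutor, getInsertPreparedStmt), and it assumes the row builder state is carried across executes, as described below.

    import java.sql.PreparedStatement;
    import java.util.Arrays;
    import com._4paradigm.openmldb.sdk.SdkOption;
    import com._4paradigm.openmldb.sdk.SqlExecutor;
    import com._4paradigm.openmldb.sdk.impl.SqlClusterExecutor;

    public class Repro {
        public static void main(String[] args) throws Exception {
            SdkOption option = new SdkOption();
            option.setZkCluster("127.0.0.1:2181"); // placeholder ZK address
            option.setZkPath("/openmldb");         // placeholder ZK path
            SqlExecutor executor = new SqlClusterExecutor(option);

            // t1 is a placeholder table with a single string column.
            PreparedStatement ps = executor.getInsertPreparedStmt(
                    "demo_db", "insert into t1 values (?)");

            // Row 1: a string longer than 255 bytes, which should force the
            // internal strAddrBuf to grow to 2-byte address entries.
            char[] big = new char[300];
            Arrays.fill(big, 'a');
            ps.setString(1, new String(big));
            ps.execute();

            // Row 2: a short string. If the builder is reused with the
            // enlarged strAddrBuf, this throws BufferOverflowException.
            ps.setString(1, "short");
            ps.execute();
        }
    }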

After digging into the code:

    // FlexibleRowBuilder.java
    int totalSize = strFieldStartOffset + strAddrLen + strTotalLen;
    // check whether totalSize is bigger than UINT8_MAX or UINT16_MAX ...
    int curStrAddrSize = CodecUtil.getAddrLength(totalSize);
    if (curStrAddrSize > strAddrSize) {
        // strAddrBuf is expanded when totalSize is bigger than UINT8_MAX (255)
        strAddrBuf = expandStrLenBuf(curStrAddrSize, settedStrCnt);
        strAddrSize = curStrAddrSize;
        totalSize = strFieldStartOffset + strAddrLen + strTotalLen;
    }

The private field strAddrBuf is expanded once totalSize exceeds UINT8_MAX (255) and is then reused for the following records, but its size is never reduced. A later, smaller record is therefore built with the enlarged strAddrBuf while its result buffer is sized for the smaller layout, which causes java.nio.BufferOverflowException.
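The failure pattern in isolation (a standalone sketch of the mechanism, not the actual FlexibleRowBuilder code): a scratch buffer grown for one record is reused for the next, while the destination buffer is sized for the smaller record.

    import java.nio.ByteBuffer;

    public class ReuseOverflowDemo {
        // A scratch buffer shared across rows, like strAddrBuf in the builder.
        static ByteBuffer scratch = ByteBuffer.allocate(1);

        static ByteBuffer buildRow(int addrBytes) {
            if (addrBytes > scratch.capacity()) {
                scratch = ByteBuffer.allocate(addrBytes); // grown, never shrunk
            }
            // The destination is sized for this row's layout, but the whole
            // scratch array is copied, and it may still carry the previous
            // row's larger capacity.
            ByteBuffer row = ByteBuffer.allocate(addrBytes);
            row.put(scratch.array());
            return row;
        }

        public static void main(String[] args) {
            buildRow(2); // long-string row: scratch grows to 2 bytes
            buildRow(1); // short row: throws java.nio.BufferOverflowException
        }
    }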

As a workaround, I currently shrink strAddrBuf back to its original size at the end of the result allocation, which resolves the problem.
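In terms of the demo above, the workaround amounts to shrinking the scratch buffer back after each row is built (a sketch of the idea only; the actual field names and reset point in FlexibleRowBuilder may differ):

    // Added to ReuseOverflowDemo: reset the shared scratch buffer to its
    // base size after a row's result buffer has been produced, so the next
    // row starts from the small layout again.
    static void resetScratch(int baseSize) {
        if (scratch.capacity() > baseSize) {
            scratch = ByteBuffer.allocate(baseSize);
        }
    }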
