[feature](step:one)Add arrow format #161
private RecordBatch(Iterator<InternalRow> iterator, String format, String sep, byte[] delim,
private VectorSchemaRoot arrowRoot = null;

private int arrowBatchSize = 1000;
Can this variable be configured? Or is this the best performance value obtained through testing?
This value came from comparing Arrow performance at different batch sizes during testing.
However, other scenarios may behave differently, so a configuration item could be added.
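A minimal sketch of what such a configuration item could look like, assuming the batch size is read from a `Properties` object. The option key `doris.sink.arrow.batch.size` and the class name are hypothetical and are not taken from this PR; only the default of 1000 comes from the diff above.

```java
import java.util.Properties;

public class ArrowBatchSizeConfig {
    // Hypothetical option key, invented for illustration only.
    static final String KEY = "doris.sink.arrow.batch.size";
    // Default mirrors the hard-coded arrowBatchSize in the diff.
    static final int DEFAULT_BATCH_SIZE = 1000;

    // Returns the configured batch size, or the default when the key is unset or blank.
    static int resolve(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_BATCH_SIZE;
        }
        int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(KEY + " must be an integer, got: " + raw, e);
        }
        if (value <= 0) {
            throw new IllegalArgumentException(KEY + " must be positive, got: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolve(props)); // key unset, falls back to the default
        props.setProperty(KEY, "4096");
        System.out.println(resolve(props)); // uses the configured value
    }
}
```

Keeping the tested value as the default means existing jobs behave the same, while workloads with very wide or very narrow rows can still tune it.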
spark-doris-connector/src/main/scala/org/apache/spark/sql/doris/spark/ArrowSchemaUtils.scala
spark-doris-connector/src/main/java/org/apache/doris/spark/load/DorisStreamLoad.java
There are some code conflicts; please fix them. Thank you.
a9d37eb to 29912e8
29912e8 to 94d7bb6
@@ -72,7 +72,7 @@
 <spark.major.version>3.1</spark.major.version>
 <scala.version>2.12</scala.version>
 <libthrift.version>0.16.0</libthrift.version>
-<arrow.version>5.0.0</arrow.version>
+<arrow.version>13.0.0</arrow.version>
Same version as Doris.
byte[] bytes = new byte[1];
int read = read(bytes, 0, 1);
if (read < 0) {
    return -1;
} else {
    return bytes[0];
}
This uses the same logic as read(x, x, x).
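One detail worth checking when a single-byte `read()` delegates to the bulk `read(byte[], int, int)`: Java bytes are signed, and the `InputStream.read()` contract returns a value in the range 0 to 255, with -1 reserved for end of stream. The sketch below shows the delegation with an `& 0xFF` mask so that a data byte of `0xFF` is not mistaken for end of stream. The class name and the backing stream are hypothetical; whether the mask matters for this PR depends on how the stream is consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SingleByteReadSketch extends InputStream {
    // Hypothetical backing stream, standing in for the real data source.
    private final InputStream source;

    SingleByteReadSketch(InputStream source) {
        this.source = source;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return source.read(b, off, len);
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        // Delegate to the bulk read; with len = 1 it yields one byte or -1 at end of stream.
        int n = read(one, 0, 1);
        if (n < 0) {
            return -1;
        }
        // Mask to an unsigned value so 0xFF becomes 255 instead of -1.
        return one[0] & 0xFF;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new SingleByteReadSketch(
                new ByteArrayInputStream(new byte[] {(byte) 0xFF, 0x01}));
        System.out.println(in.read()); // first data byte, unsigned
        System.out.println(in.read()); // second data byte
        System.out.println(in.read()); // end of stream
    }
}
```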
@@ -31,10 +31,8 @@
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
This import seems unused; please delete it.
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;
These imports seem unused; please delete them.
object ArrowSchemaUtils {
  def toArrowSchema(schema: StructType, timeZoneId: String): Schema = {
    ArrowUtils.toArrowSchema(schema, timeZoneId)
This is a direct call to the Spark function. The wrapper is needed because ArrowUtils is private in org.apache.spark.sql.
LGTM
LGTM