[SUPPORT] How to configure spark and flink to write mor tables using bucket indexes? #11946
Comments
Did you have a chance to check the thread dump of the operators?
"consistent_bucket_write: test.fin_ipr_inmaininfo_test (1/2)#0" Id=89 TIMED_WAITING on java.util.LinkedList@37d9fd7 Is hdfs write performance problematic? If I use simple index for spark, flink uses bucket index very quickly 1k-2krecord/s |
The performance should be very close for consistent hashing and simple hashing, but from the stack trace it looks like appending to files takes time.
So how can this be optimized? This speed is too slow.
@beyond1920 can you help with the performance issue here?
@xiaofan2022 Did you already schedule the clustering for expanding the consistent hashing ring? Did you check the hashing metadata?
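For reference, a minimal sketch of how such a resize clustering run could be triggered from Spark SQL; the strategy class names, the `run_clustering` procedure call, and the table name below are illustrative assumptions, not taken from this thread:

```sql
-- Hedged sketch: configure and run clustering so the consistent hashing ring can be resized.
-- Strategy classes, procedure syntax, and table name are assumptions for illustration.
SET hoodie.clustering.plan.strategy.class=org.apache.hudi.client.clustering.plan.strategy.SparkConsistentBucketClusteringPlanStrategy;
SET hoodie.clustering.execution.strategy.class=org.apache.hudi.client.clustering.run.strategy.SparkConsistentBucketClusteringExecutionStrategy;

-- Run clustering on the table that Flink is writing to.
CALL run_clustering(table => 'test.fin_ipr_inmaininfo_test');
```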
hdfs dfs -cat hdfs://nameservice1/apps/spark/warehouse/test.db/file_test/.hoodie/.bucket_index/consistent_hashing_metadata/00000000000000.hashing_meta | grep "value" | wc -l
result => 256
So you have 256 initial buckets?
Yes, I set 'hoodie.bucket.index.max.num.buckets' = '32'.
Yeah, let's figure out the reason; too many buckets would not perform well for streaming writes.
@xiaofan2022 Any updates on this ticket? Were you able to find out the reason why we see 256 buckets?
I first create the tables through Spark and import the full data. Then Flink updates the incremental data in real time, but the default bucket number in Spark is 4.
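For context, a minimal sketch of the kind of Flink SQL sink definition this setup implies; the schema, path, and option values below are assumptions for illustration rather than the actual DDL from this thread:

```sql
-- Hedged sketch of a Flink SQL sink for the MOR table with a consistent hashing bucket index.
-- Schema, path, and option values are illustrative assumptions, not the user's actual DDL.
CREATE TABLE fin_ipr_inmaininfo_test (
  id   BIGINT,
  name STRING,
  ts   TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://nameservice1/apps/spark/warehouse/test.db/fin_ipr_inmaininfo_test',
  'table.type' = 'MERGE_ON_READ',
  'index.type' = 'BUCKET',
  'hoodie.index.bucket.engine' = 'CONSISTENT_HASHING',
  'hoodie.bucket.index.num.buckets' = '4',
  'hoodie.bucket.index.max.num.buckets' = '32'
);
```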
I want to use Flink and Spark to write to a MOR table with a CONSISTENT_HASHING bucket index, but I find that Spark writes the full load very fast while Flink writes the increments very slowly (about 100 records/s).
Spark SQL:
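The original DDL was not captured in this thread; as a rough sketch, a Spark SQL definition of a MOR table with a consistent hashing bucket index might look like the following (table name, schema, and property values are assumptions for illustration):

```sql
-- Hedged sketch: Spark SQL DDL for a MOR table using the consistent hashing bucket index.
-- Table name, schema, and property values are illustrative assumptions.
CREATE TABLE test.file_test (
  id   BIGINT,
  name STRING,
  ts   BIGINT
) USING hudi
TBLPROPERTIES (
  'type' = 'mor',
  'primaryKey' = 'id',
  'preCombineField' = 'ts',
  'hoodie.index.type' = 'BUCKET',
  'hoodie.index.bucket.engine' = 'CONSISTENT_HASHING',
  'hoodie.bucket.index.num.buckets' = '4',
  'hoodie.bucket.index.max.num.buckets' = '32'
);
```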