
Scaling local table writers for unpartitioned data #13788

Closed
gaurav8297 opened this issue Aug 23, 2022 · 1 comment

gaurav8297 commented Aug 23, 2022

Problem: The default value of task_writer_count is 1, which makes inserts very slow and does not use cluster resources effectively.
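
For context, the per-task writer count can already be raised per session without changing the default; a minimal example, with the property names as currently exposed by Trino (shown for illustration only):

SET SESSION task_writer_count = 8;
-- or cluster-wide via task.writer-count=8 in config.properties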

Possible solutions:

  1. Increase the default value to 4, 8, 16, or 32.
    • The problem with this approach is that it can produce many small files for a small amount of data, which hurts subsequent read performance.
  2. Implement adaptive scaling of local writers based on physicalWrittenBytes and the current buffer size (see the note after this list).
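
For comparison, Trino already exposes similar knobs for scaling writers across worker nodes; option 2 would apply the same idea within a single task. A rough illustration using the existing cross-node scaling properties (names assumed from the current session property set, not part of this proposal):

SET SESSION scale_writers = true;
SET SESSION writer_min_size = '16MB';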

Insert performance with different numbers of task writers (times are mm:ss):

1. Single node with no local scaling.

Table inserted: tpcds sf300 lineitem

  • task_writer_count = 4 => 36:15 mins
  • task_writer_count = 8 => 18:27 mins
  • task_writer_count = 16 => 9:22 mins
  • task_writer_count = 32 => 5:21 mins

2. Single node with local writer scaling.

Table inserted: tpcds sf300 lineitem
writerMinSize: 16MB

  • max_task_writer_count = 8 => 23:06 mins
  • max_task_writer_count = 16 => 13:14 mins
  • max_task_writer_count = 32 => 8:48 mins

Read performance (with small files):

1. Small amount of data inserted over a long time with task_writer_count = 2

config:

  • Single node with 2 threads during reading
  • task_writer_count = 2
  • local writer scaling is disabled
  • number of rows = ~1 billion
  • total number of files = 586
  • Avg file size = ~22MB

Query:

select count(orderkey), count(partkey), count(suppkey), count(linenumber), count(quantity), count(extendedprice), count(discount), count(tax), count(returnflag), count(linestatus), count(commitdate), count(receiptdate), count(shipinstruct), count(shipmode), count(comment), count(shipdate) from lineitem_fixed_new_2

Result:

Query 20220822_102559_00002_w6bmx, FINISHED, 1 node
Splits: 659 total, 659 done (100.00%)
10:29 [1.04B rows, 24.9GB] [1.66M rows/s, 40.6MB/s]

2. With task_writer_count = 32

config (same as above except):

  • task_writer_count = 32
  • total number of files = >8000
  • Avg file size = ~2MB

Result:

Query 20220822_095540_00001_w6bmx, FINISHED, 1 node
Splits: 9,827 total, 9,827 done (100.00%)
21:21 [1.1B rows, 26.3GB] [856K rows/s, 21MB/s]

So, increasing the number of small files can have a huge impact on read performance: almost 2x slower based on the above experiment.

Summary

If we increase the default value of task_writer_count, read performance will suffer when a user inserts a small amount of data over a long period of time, because this produces a huge number of small files. For instance, consider someone inserting 100MB of data every 15 minutes with a default of 32 task writers. To solve this, one could run the optimize command (which is expensive) at some frequency, or we can go ahead with the local scaling approach, which is a bit more complex but maintains a minimum file size.
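
For reference, the compaction route would look roughly like the following on connectors that support the optimize table procedure (for example Iceberg); the table name and threshold here are examples only:

ALTER TABLE lineitem_fixed_new_2 EXECUTE optimize(file_size_threshold => '128MB')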

cc @dain @sopel39 @raunaqmorarka @electrum

gaurav8297 self-assigned this Aug 23, 2022

sopel39 commented Aug 30, 2022

Fixed via #13111

sopel39 closed this as completed Aug 30, 2022