Support writing multi files of single partition to improve speed in HDFS storage #396
Conversation
int endPartition,
String storageBasePath,
String fileNamePrefix,
Configuration hadoopConf,
Maybe we can get the `concurrency` parameter from `hadoopConf`, which we can pass from our client. It would be more flexible.
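The suggestion above can be sketched as a config lookup with a default. This is a minimal, hypothetical sketch: the key name `rss.storage.hdfs.write.concurrency` is invented for illustration, and a plain `Map` stands in for Hadoop's `Configuration` (which offers the equivalent `getInt(key, default)` lookup) so the example has no Hadoop dependency:

```java
import java.util.HashMap;
import java.util.Map;

public class ConcurrencyFromConf {
    // Hypothetical key name; the real key chosen in the PR may differ.
    static final String KEY = "rss.storage.hdfs.write.concurrency";

    // Mirrors Configuration.getInt(KEY, defaultValue): use the client-provided
    // value when present, otherwise fall back to the server-side default.
    static int resolveConcurrency(Map<String, String> hadoopConf, int defaultValue) {
        String raw = hadoopConf.get(KEY);
        return raw == null ? defaultValue : Integer.parseInt(raw);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveConcurrency(conf, 5));  // 5, the default
        conf.put(KEY, "10");
        System.out.println(resolveConcurrency(conf, 5));  // 10, from the client
    }
}
```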
Yes. Although it’s a little bit strange.
I have added a TODO comment to support this in the future.
Could you create some issues for these TODOs?
Yes, I will do that after this PR is merged.
Codecov Report
@@ Coverage Diff @@
## master #396 +/- ##
============================================
- Coverage 58.77% 58.54% -0.23%
- Complexity 1602 1607 +5
============================================
Files 193 195 +2
Lines 10939 11021 +82
Branches 955 963 +8
============================================
+ Hits 6429 6452 +23
- Misses 4132 4193 +61
+ Partials 378 376 -2
Are there any read-side changes that should be applied for this PR? And, if possible, could you add an integration test for Spark 3 with the concurrent writer enabled?
There is no need for any compatibility change in the read client, because the original logic already handles multiple files per partition due to the retry-times prefix in the file names.
It's OK. But a Spark client test may not be accurate, because it's hard to control the concurrent writes there. I think the integration test is sufficient and accurate.
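The read-side compatibility argument rests on distinct file names per writer: each concurrent writer, like each retry, gets its own index in the file name, so the reader simply lists and consumes every data file for the partition. A sketch of that naming idea (the format here is purely illustrative, not Uniffle's exact one):

```java
public class PartitionFileNames {
    // Illustrative format "<prefix>_<index>.data"; the real Uniffle file
    // name layout differs, but the uniqueness-per-writer idea is the same.
    static String dataFileName(String fileNamePrefix, int writerIndex) {
        return fileNamePrefix + "_" + writerIndex + ".data";
    }

    public static void main(String[] args) {
        // Three concurrent writers for the same partition produce
        // three distinct, non-conflicting files.
        for (int i = 0; i < 3; i++) {
            System.out.println(dataFileName("appId_shuffleId_1", i));
        }
    }
}
```

Because every writer's output file is unique, no coordination is needed on the read path beyond listing the partition's directory.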
The overall logic LGTM; left some minor comments.
Resolved review threads:
- ...est/common/src/test/java/org/apache/uniffle/test/ShuffleServerConcurrentWriteOfHdfsTest.java (outdated)
- ...est/common/src/test/java/org/apache/uniffle/test/ShuffleServerConcurrentWriteOfHdfsTest.java (outdated)
- ...est/common/src/test/java/org/apache/uniffle/test/ShuffleServerConcurrentWriteOfHdfsTest.java (outdated)
- ...est/common/src/test/java/org/apache/uniffle/test/ShuffleServerConcurrentWriteOfHdfsTest.java
- integration-test/common/src/test/java/org/apache/uniffle/test/ShuffleServerWithHdfsTest.java (outdated)
- server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java (outdated)
- storage/src/main/java/org/apache/uniffle/storage/common/HdfsStorage.java (outdated)
LGTM
Do you have other comments? @jerqi
LGTM
… distribute pressure (#452)

### What changes were proposed in this pull request?
[Improvement] Read HDFS data files with random sequence to distribute pressure #452

### Why are the changes needed?
After PR #396 added support for concurrently writing a single partition's data into multiple HDFS files, it is better to read those HDFS data files in random order to distribute the stress on the client side.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UTs
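The randomized read order described in #452 can be sketched as a shuffle of the partition's file list before reading. This is a hypothetical simplification, not the PR's actual code; the point is that parallel readers spread their first reads across different files (and hence different DataNodes):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomizedReadOrder {
    // Return a randomly ordered copy of the partition's data files.
    // Reading in this order distributes load instead of having every
    // reader start on the same file.
    static List<String> readOrder(List<String> partitionFiles, Random random) {
        List<String> order = new ArrayList<>(partitionFiles);
        Collections.shuffle(order, random);
        return order;
    }

    public static void main(String[] args) {
        List<String> files = List.of("part_0.data", "part_1.data", "part_2.data");
        System.out.println(readOrder(files, new Random()));
    }
}
```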
…cy to write in client side (#815)

### What changes were proposed in this pull request?
1. Support specifying the per-partition max write concurrency on the client side

### Why are the changes needed?
PR #396 introduced concurrent HDFS writing for one partition, but the concurrency is determined by the server side. To increase flexibility, this PR supports specifying the per-partition max write concurrency on the client side.

### Does this PR introduce _any_ user-facing change?
Yes. The client conf `<client_type>.rss.client.max.concurrency.per-partition.write` and the server conf `rss.server.client.max.concurrency.limit.per-partition.write` are introduced.

### How was this patch tested?
1. UTs
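The interaction between the two configuration keys above can be read as the server capping the client's request. The sketch below shows only that capping idea; the actual negotiation code in #815 may differ, and the fallback behavior for an unset value is an assumption:

```java
public class ConcurrencyNegotiation {
    // The client requests a per-partition write concurrency; the server
    // enforces its configured upper limit. A non-positive request is
    // treated here (assumption) as "use the server's limit".
    static int effectiveConcurrency(int clientRequested, int serverLimit) {
        if (clientRequested <= 0) {
            return serverLimit;
        }
        return Math.min(clientRequested, serverLimit);
    }

    public static void main(String[] args) {
        System.out.println(effectiveConcurrency(10, 20)); // 10, client within limit
        System.out.println(effectiveConcurrency(50, 20)); // 20, capped by server
    }
}
```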
What changes were proposed in this pull request?
Introduce `PooledHdfsShuffleWriteHandler` to support writing a single partition to multiple HDFS files concurrently.

Why are the changes needed?
As mentioned in #378 (comment), the HDFS writing speed is too slow and writes cannot proceed concurrently. Especially when a huge partition exists, this slows down other apps due to memory pressure.
So improving the writing speed is an important factor in flushing a huge partition to HDFS quickly.
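The mechanism behind this change can be sketched as a writer pool. This is a hypothetical simplification of what `PooledHdfsShuffleWriteHandler` does, not the actual implementation: real pooled writers each append to their own HDFS file, while here `Writer` is a stand-in interface so the sketch is self-contained:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class PooledWriteSketch {
    // Stand-in for an HDFS writer; each instance would own its own file.
    interface Writer { void write(byte[] data); }

    private final BlockingQueue<Writer> pool = new LinkedBlockingQueue<>();

    PooledWriteSketch(List<Writer> writers) {
        pool.addAll(writers);
    }

    // Borrow a free writer, append the data, and return the writer.
    // With N pooled writers, up to N flushes can run concurrently.
    void flush(byte[] data) {
        try {
            Writer w = pool.take();     // blocks while all writers are busy
            try {
                w.write(data);
            } finally {
                pool.offer(w);          // hand the writer back for reuse
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while waiting for a writer", e);
        }
    }

    public static void main(String[] args) {
        AtomicInteger writes = new AtomicInteger();
        PooledWriteSketch handler = new PooledWriteSketch(
            List.of(d -> writes.incrementAndGet(), d -> writes.incrementAndGet()));
        handler.flush(new byte[]{1});
        handler.flush(new byte[]{2});
        System.out.println(writes.get()); // 2
    }
}
```

Because every pooled writer targets a distinct file, no locking is needed around the HDFS appends themselves; the queue alone serializes access to each writer.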
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?