[Survey] Spark (Hadoop) Support via S3 API #595
**Write Performance Issue**
The current write of a file with `s3a://` issues a very large number of S3 requests (>3000 for one small write; see the Spark 2.1.0 + Hadoop 3.0.0-alpha2 test below).
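For reference, a minimal sketch of the kind of small write being measured (the bucket and path are hypothetical; `sc` is the SparkContext provided by spark-shell). The S3A connector stages task output under `_temporary/` and commits it by rename, and a rename on S3 is a COPY plus a DELETE per object, which multiplies the request count even for tiny outputs:

```scala
// Hypothetical bucket/path; assumes the s3a endpoint and credentials
// are already set in the Hadoop configuration.
val data = sc.parallelize(Seq("line1", "line2", "line3"))

// Each task first writes to .../_temporary/...; the commit then renames
// the task files into place. On S3 a rename is COPY + DELETE per object.
data.saveAsTextFile("s3a://test_bucket/write_test")
```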
**Write Issue**
From time to time, a write can fail.
Logs: Normal, Error
**Read Issue**
Read requests are issued as Range GETs (since tasks may be partitioned across multiple workers).
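A short sketch of that access pattern (the path and partition count are hypothetical; `sc` as provided by spark-shell): Spark splits the input across tasks, and each task fetches only its own byte range of the object, which the gateway sees as Range GET requests:

```scala
// One object read by several tasks; each task fetches only its own
// split, so the gateway receives roughly one Range GET per partition.
val lines = sc.textFile("s3a://test_bucket/large_file.txt", minPartitions = 8)
println(lines.count())
```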
**Test with Spark 2.1.0 + Hadoop 3.0.0-alpha2**
Testing with the new version of Hadoop, which includes performance improvements in its S3 support.
Logs: spark_210_hadoop_3a2_write.access.log.gz
The number of requests decreases from >3000 to ~600 for one small write.
**Small File Testing**

**Data Set**

**Hadoop Setting**
1x Name Node (+ Secondary Name Node)

| Read from | Duration |
| --- | --- |
| Hadoop | 9s |
| LeoFS | 9s |
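For reference, a sketch of how the two timings could be taken (the paths and dataset layout are hypothetical; `sc` as provided by spark-shell):

```scala
// Time a full read of every file under the given path.
def timeRead(path: String): Double = {
  val t0 = System.nanoTime()
  sc.textFile(path).count() // forces all files to be read
  (System.nanoTime() - t0) / 1e9
}

println(f"Read from Hadoop: ${timeRead("hdfs://namenode:9000/smallfiles/*")}%.1fs")
println(f"Read from LeoFS:  ${timeRead("s3a://test_bucket/smallfiles/*")}%.1fs")
```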
With a large data set (500 dirs x 1600 files), listing that number of objects took too long, which makes it quite difficult to work with common usage patterns, e.g. `sc.textFile("s3a://test_bucket/*")`.
@windkit Yes, it's unavoidable in the case of `sc.textFile("s3a://test_bucket/*")` until #548 is fixed.
@mocchira Yes, that's the way I am trying to work around the bottleneck. Will update here later.
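One possible shape of such a workaround, sketched under the assumption that the directory layout is known in advance (the directory names are hypothetical): `sc.textFile` accepts a comma-separated list of paths, so the single bucket-wide wildcard listing can be replaced by per-directory globs enumerated client-side:

```scala
// Hypothetical layout of 500 directories; enumerate them client-side
// instead of asking the gateway to expand one bucket-wide wildcard.
val dirs = (0 until 500).map(i => s"s3a://test_bucket/dir_$i/*")
val rdd = sc.textFile(dirs.mkString(","))
println(rdd.count())
```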
Gateway Logs (added COPY log at info level)
Storage Logs (added prefix search log at info level)
**Issue Summary**
Directory deletion is done asynchronously; if the directory is re-created afterwards, incorrect deletion will happen, similar to the bucket case (#150).
Logs: spark_fail_sinfo.txt
Related PR
Issue fixed with the related PR above.
With `s3a://` available, it may be possible to use …

**TODO**
**Description**

As LeoFS is good at handling small files (images, logs, ...), it may fill the gap left by HDFS, which does not work well with small files.

**Environment**

Spark 1.6.1 (Hadoop 2.6.2)

**Extra Libraries**

**Testing**
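For anyone reproducing this, a minimal configuration sketch for pointing the s3a connector at a LeoFS gateway (the endpoint, credentials, and bucket are hypothetical; `fs.s3a.path.style.access` requires hadoop-aws 2.8 or later, so older setups may need a different approach):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("leofs-s3a-test"))

// Hypothetical LeoFS gateway endpoint and credentials.
val hc = sc.hadoopConfiguration
hc.set("fs.s3a.endpoint", "http://leofs-gateway:8080")
hc.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
hc.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")
// Bucket-in-path addressing for non-AWS endpoints (hadoop-aws 2.8+).
hc.set("fs.s3a.path.style.access", "true")

// Smoke test: read one object back through the gateway.
println(sc.textFile("s3a://test_bucket/sample.txt").count())
```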