One possible idea for documentation or support for making jobs faster would be discussing various file output committers.
Particularly on object storage and as of Spark 3, using the default file output committer with s3a is going to result in a double data write and sad times: the committer renames output into place, and a rename on S3 is a copy followed by a delete.
Even registering s3a and friends via the right configs is something I've had come up for a number of users, but that might be a little out of scope for the project's users.
I could try to contribute on the subject, or I'm otherwise happy to just throw it out there. Admittedly, 9 times out of 10 on S3, if you just stick to the new cloud committers you're good. 😛
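For anyone landing here, this is a minimal sketch of what "sticking to the cloud committers" can look like from PySpark. It assumes the `spark-hadoop-cloud` module and a matching `hadoop-aws` jar are on the classpath. The helper names `s3a_committer_conf` and `build_session` are made up for illustration; the property keys and class names are the ones from Spark's cloud integration docs and the Hadoop S3A committer docs, but check them against the versions you run.

```python
from typing import Dict

# Committer names accepted by fs.s3a.committer.name in the Hadoop S3A committers.
S3A_COMMITTERS = ("directory", "partitioned", "magic")


def s3a_committer_conf(committer: str = "magic") -> Dict[str, str]:
    """Return Spark settings that route s3a:// writes through an S3A committer.

    Hypothetical helper for illustration; not part of Spark or Hadoop.
    """
    if committer not in S3A_COMMITTERS:
        raise ValueError(f"unknown S3A committer: {committer!r}")
    conf = {
        # Register the s3a:// scheme (often already the default in recent Hadoop builds).
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        # Pick an S3A committer instead of the rename-based FileOutputCommitter.
        "spark.hadoop.fs.s3a.committer.name": committer,
        # Bind Spark SQL's commit protocol and Parquet committer to Hadoop's
        # PathOutputCommitter machinery (classes ship in spark-hadoop-cloud).
        "spark.sql.sources.commitProtocolClass":
            "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
        "spark.sql.parquet.output.committer.class":
            "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    }
    if committer == "magic":
        # The magic committer has its own enable switch on the filesystem.
        conf["spark.hadoop.fs.s3a.committer.magic.enabled"] = "true"
    return conf


def build_session(app_name: str, committer: str = "magic"):
    """Create a SparkSession with the settings above (requires pyspark)."""
    # Imported lazily so s3a_committer_conf stays usable without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in s3a_committer_conf(committer).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The same key/value pairs can go into `spark-defaults.conf` or `--conf` flags instead; returning a plain dict just makes them easy to inspect before a session exists.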
In the context of the flow chart as it is, I'm not sure where this would fit.
Maybe a guidance box for write-phase slowdowns? This is a performance "regression" I've seen when helping people migrate from HDFS to S3, but I'm not sure it fits fully into the context of the docs flow.
I think adding a node for slow writes and then a sub-node for S3 would be a great way to expose this. If you want to write the docs, I'm happy to integrate it into the flowchart (I know the flowchart syntax is a little funky).