Enable periodic cleanup of work_dir directories in ballista executor #1780

Ted-Jiang · 2022-02-08T05:40:30Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Enable periodic cleanup of work_dir directories in the Ballista executor by introducing three arguments:
executor_cleanup_enable: Enables periodic cleanup of work_dir directories.
executor_cleanup_interval: The interval, in seconds, at which the executor cleans up old job directories on the local machine.
executor_cleanup_ttl: The number of seconds to retain a job's work_dir on each executor. This is a time to live (TTL) and should be chosen based on the amount of disk space available.

Describe the solution you'd like
The executor periodically spawns a task to clean work_dir. If none of the files in a job directory have been modified within the last executor_cleanup_ttl seconds, that job directory is deleted.
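
To make the proposed behaviour concrete, here is a minimal sketch of one cleanup pass plus a periodic loop. It is not the code from the linked PR: the names `CleanupConfig`, `all_files_expired`, `clean_expired_job_dirs` and `spawn_cleanup` are hypothetical, it assumes each job has its own directory directly under work_dir, and it uses a plain `std::thread` so the example has no dependencies (the actual executor may well schedule this differently).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, SystemTime};

/// Hypothetical settings mirroring the three proposed arguments.
pub struct CleanupConfig {
    pub enable: bool,       // executor_cleanup_enable
    pub interval: Duration, // executor_cleanup_interval
    pub ttl: Duration,      // executor_cleanup_ttl
}

/// Returns true when every file under `dir` (recursively) was last modified
/// at least `ttl` ago. Note: a directory with no files counts as expired.
fn all_files_expired(dir: &Path, ttl: Duration, now: SystemTime) -> io::Result<bool> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_dir() {
            if !all_files_expired(&entry.path(), ttl, now)? {
                return Ok(false);
            }
        } else {
            // A modification time in the future is treated as age zero.
            let age = now
                .duration_since(meta.modified()?)
                .unwrap_or(Duration::ZERO);
            if age < ttl {
                return Ok(false);
            }
        }
    }
    Ok(true)
}

/// One cleanup pass: removes each job directory directly under `work_dir`
/// whose files are all older than `ttl`, and returns the removed paths.
pub fn clean_expired_job_dirs(work_dir: &Path, ttl: Duration) -> io::Result<Vec<PathBuf>> {
    let now = SystemTime::now();
    let mut removed = Vec::new();
    for entry in fs::read_dir(work_dir)? {
        let path = entry?.path();
        if path.is_dir() && all_files_expired(&path, ttl, now)? {
            fs::remove_dir_all(&path)?;
            removed.push(path);
        }
    }
    Ok(removed)
}

/// Starts a background thread that runs a cleanup pass every `interval`.
/// Returns None when cleanup is disabled.
pub fn spawn_cleanup(work_dir: PathBuf, config: CleanupConfig) -> Option<thread::JoinHandle<()>> {
    if !config.enable {
        return None;
    }
    Some(thread::spawn(move || loop {
        if let Err(e) = clean_expired_job_dirs(&work_dir, config.ttl) {
            eprintln!("work_dir cleanup failed: {}", e);
        }
        thread::sleep(config.interval);
    }))
}
```

Because a job directory is removed only when all of its files are past the TTL, a job that is still writing shuffle output keeps its directory alive. The TTL should therefore be longer than the longest expected gap between writes for a running job.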

Describe alternatives you've considered
The scheduler sends an RPC call to delete the files when a job is done.

Additional context
apache/datafusion-ballista#9

houqp · 2022-02-08T05:59:05Z

On top of a background GC task, would it make sense to also clean up job dirs on job completion preemptively?

mingmwang · 2022-02-10T11:31:02Z

@houqp Regarding "preemptively": sorry for my confusion. Do you mean that if a job has 3 stages, we can delete stage 1 while stage 3 is running?

IMO, once a SQL query has finished, all the intermediate shuffle data can be cleared except for the result data.

Ted-Jiang · 2022-02-15T03:54:02Z

@houqp @mingmwang That sounds very reasonable. I think it would also handle some error cases and improve robustness.
IMHO, we should keep both approaches and create a separate issue to track this as a future improvement (maybe after separating shuffle data from result data).

Ted-Jiang added the "enhancement" (New feature or request) label Feb 8, 2022

Ted-Jiang mentioned this issue Feb 8, 2022

Enable periodic cleanup of work_dir directories in ballista executor #1783

Merged

alamb closed this as completed in #1783 Mar 8, 2022
