Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Enable periodic cleanup of work_dir directories in the Ballista executor by introducing three arguments:
executor_cleanup_enable: Enables periodic cleanup of work_dir directories.
executor_cleanup_interval: Controls the interval, in seconds, at which the worker cleans up old job dirs on the local machine.
executor_cleanup_ttl: Number of seconds to retain a job's work_dir on each executor. This is a Time To Live and should depend on the amount of disk space available.
Describe the solution you'd like
The executor periodically spawns a task to clean work_dir. If none of the files in a job_dir have been modified in the last executor_cleanup_ttl seconds, that job_dir is deleted.
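The cleanup rule described above could look roughly like the following. This is only a minimal sketch using the Rust standard library, not the actual Ballista implementation: the function names (`all_files_older_than`, `clean_expired_job_dirs`, `spawn_cleanup_loop`) are hypothetical, it assumes every direct subdirectory of work_dir is a job dir, and it uses a plain thread where the executor would more likely use an async task.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, SystemTime};

/// Returns true if every file under `dir` (recursively) was last modified
/// at least `ttl` before `now`. An empty directory counts as expired.
fn all_files_older_than(dir: &Path, ttl: Duration, now: SystemTime) -> io::Result<bool> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_dir() {
            if !all_files_older_than(&entry.path(), ttl, now)? {
                return Ok(false);
            }
        } else {
            // A modification time in the future (clock skew) is treated as age zero.
            let age = now
                .duration_since(meta.modified()?)
                .unwrap_or(Duration::ZERO);
            if age < ttl {
                return Ok(false);
            }
        }
    }
    Ok(true)
}

/// Deletes each job dir under `work_dir` whose files have all been untouched
/// for at least `ttl` (the role of executor_cleanup_ttl). Returns the removed paths.
fn clean_expired_job_dirs(work_dir: &Path, ttl: Duration) -> io::Result<Vec<PathBuf>> {
    let now = SystemTime::now();
    let mut removed = Vec::new();
    for entry in fs::read_dir(work_dir)? {
        let path = entry?.path();
        if path.is_dir() && all_files_older_than(&path, ttl, now)? {
            fs::remove_dir_all(&path)?;
            removed.push(path);
        }
    }
    Ok(removed)
}

/// Runs the cleanup every `interval` (the role of executor_cleanup_interval)
/// on a background thread. Errors are logged and the loop keeps going.
fn spawn_cleanup_loop(
    work_dir: PathBuf,
    interval: Duration,
    ttl: Duration,
) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        if let Err(e) = clean_expired_job_dirs(&work_dir, ttl) {
            eprintln!("work_dir cleanup failed: {}", e);
        }
        thread::sleep(interval);
    })
}
```

In this sketch, executor_cleanup_enable would simply decide whether `spawn_cleanup_loop` is called at executor startup. Checking every file, rather than only the job dir's own modification time, keeps a job alive as long as any of its files is still being written.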
Describe alternatives you've considered
The scheduler sends an RPC call to delete the files when a job is done.
preemptively @houqp
Sorry for my confusion. Do you mean that if a job has 3 stages, we can delete stage 1 first while stage 3 is running?
IMO, when a SQL query has finished, all the intermediate shuffle data can be cleared except for the result data.
@houqp @mingmwang That sounds very reasonable. I think this will handle some error cases and improve robustness.
IMHO, we should keep both of them and create a separate issue to capture this as a future improvement (maybe after separating shuffle data and result data).
Additional context
apache/datafusion-ballista#9