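Thanks to @xinghai-sun for the feedback. The current way of storing and viewing logs has a number of inconveniences. I also discussed this offline with @typhoonzero, and the notes are below.

Current pain points:
- Logs are lost once a job is killed, so there is no way to go back and inspect what happened.
- `paddlecloud logs` can only show part of the log content.

Causes:
- Training job logs currently rely on Docker's native container log storage, so when a container is killed, its logs are deleted along with it.
- It turned out that the cluster's Docker log driver is configured as `journald` by default, and journald rotates its logs periodically, on a very short interval.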
Solution:
- Storage: each job's logs are written to `/pfs/dlnel/home/<user>/jobs/<job-name>/logs` on PFS, with one file per pod at `/pfs/dlnel/home/<user>/jobs/<job-name>/logs/<pod-id>.log`.
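To make the layout concrete, here is a minimal Python sketch that builds these paths. The helper names `job_log_dir` and `pod_log_path`, and the example user, job and pod names, are hypothetical; only the directory pattern comes from the layout described in this issue.

```python
import posixpath

# Root of the per-user home directories on PFS, taken from the log paths in this issue.
PFS_HOME_ROOT = "/pfs/dlnel/home"


def job_log_dir(user: str, job_name: str) -> str:
    """Return the directory that holds all log files of one job (hypothetical helper)."""
    return posixpath.join(PFS_HOME_ROOT, user, "jobs", job_name, "logs")


def pod_log_path(user: str, job_name: str, pod_id: str) -> str:
    """Return the log file of a single pod of the job: one <pod-id>.log per pod."""
    return posixpath.join(job_log_dir(user, job_name), f"{pod_id}.log")


if __name__ == "__main__":
    # Hypothetical user, job and pod names, for illustration only.
    print(pod_log_path("alice", "mnist-train", "mnist-train-trainer-0"))
```

`posixpath` is used instead of `os.path` so the PFS paths keep forward slashes regardless of the client's operating system.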
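- Viewing: log files can be read with `head`/`tail`, and can also be downloaded for local viewing. The `pcloud logs` command calls the PFS `tail` interface to display the file.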
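A `tail`-style read only needs the end of the file, so it does not have to load a long training log in full. The sketch below is a local stand-in, assuming a log file that has already been downloaded; it does not call the PFS `tail` interface, whose API is not described here, and the function name `tail_lines` is hypothetical.

```python
import os
from typing import List


def tail_lines(path: str, n: int = 10, block_size: int = 4096) -> List[str]:
    """Return the last n lines of a log file (hypothetical helper), reading backwards in blocks."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Read blocks from the end until the buffer holds more than n newlines
        # (so the first of the last n lines is complete) or the start is reached.
        while pos > 0 and data.count(b"\n") <= n:
            read_size = min(block_size, pos)
            pos -= read_size
            f.seek(pos)
            data = f.read(read_size) + data
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece left after a trailing newline
    # Decode only the lines we return; undecodable bytes are replaced, not raised.
    return [line.decode("utf-8", errors="replace") for line in lines[-n:]]
```

Reading fixed-size blocks backwards from the end keeps memory use roughly proportional to the requested tail rather than to the whole log.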
@xinghai-sun @wanghaoshuang, could you please also check whether this approach meets your training needs?
This is great, I think it's much more convenient! No more saving logs by hand!