-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log dir tidy #4836
Log dir tidy #4836
Conversation
797686c
to
bddc613
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few minor suggestions.
Adding a pointless comment so that I get a notification when this gets merged, so that I can start work on cylc-review. |
BTW you can [un]subscribe to an issue/pr from the pane on the right hand side (under milestone), especially useful for tracking issues in other projects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, one small efficiency comment:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I think I found one issue in testing, cylc cat-log
has a feature for setting the log rotation number which I think needs to be updated to the new filename pattern?:
cylc cat-log <workflow> -r 1
Reviewing I've had a couple of ideas (whoops):
1) Use log number in log filenames
Here is the log directory for a workflow I have run, then consequently restarted, the log files have "rolled over" three times:
`-- scheduler
|-- 20220509T111659+01-start.log
|-- 20220509T111701+01-start.log
|-- 20220509T111704+01-start.log
|-- 20220509T111722+01-restart.log
|-- 20220509T111732+01-restart.log
`-- log -> 20220509T111732+01-restart.log
Internally we keep a record of the "log number" which gets incremented with each "rollover", you can find this near the top of each file e.g:
Run: (re)start=1 log=1
We might be able to template this into the filename e.g:
`-- scheduler
|-- 01-20220509T111659+01-start.log
|-- 02-20220509T111701+01-start.log
|-- 03-20220509T111704+01-start.log
|-- 04-20220509T111722+01-restart.log
|-- 05-20220509T111732+01-restart.log
`-- log -> 05-20220509T111732+01-restart.log
Or:
`-- scheduler
|-- 20220509T111659+01-01-start.log
|-- 20220509T111701+01-02-start.log
|-- 20220509T111704+01-03-start.log
|-- 20220509T111722+01-04-restart.log
|-- 20220509T111732+01-05-restart.log
`-- log -> 20220509T111732+01-05-restart.log
Or possibly even (after all there are timestamps in the log [by default]):
`-- scheduler
|-- 01-start-01.log
|-- 02-start-01.log
|-- 03-start-01.log
|-- 04-restart-02.log
|-- 05-restart-02.log
`-- log -> 05-restart-02.log
Ideas?
Would be good if we could differentiate a rollover from a restart from the filename as workflows are often "restarted in anger" when problems occur.
2) Write the rsync command to the file install log
Might be nice to log the rsync command at the top of the fileinstall log. Makes it clearer what fileinstall means and how to interpret the stdout e.g:
diff --git a/cylc/flow/task_remote_mgr.py b/cylc/flow/task_remote_mgr.py
index 0c3a78768..be72ceb2e 100644
--- a/cylc/flow/task_remote_mgr.py
+++ b/cylc/flow/task_remote_mgr.py
@@ -56,6 +56,7 @@ from cylc.flow.platforms import (
)
from cylc.flow.remote import construct_rsync_over_ssh_cmd, construct_ssh_cmd
from cylc.flow.subprocctx import SubProcContext
+from cylc.flow.util import format_cmd
from cylc.flow.wallclock import get_current_time_string
from cylc.flow.workflow_files import (
KeyInfo,
@@ -532,13 +533,14 @@ class TaskRemoteMgr:
Path(install_log_path).parent.mkdir(parents=True, exist_ok=True)
with open(install_log_path, 'a') as install_log:
install_log.write(
- 'File installation information for '
- f'{install_target}:\n{ctx.out}'
+ f'$ {format_cmd(ctx.cmd, maxlen=80)}'
+ '\n\n### STDOUT:'
+ f'\n{ctx.out}'
)
if ctx.err:
install_log.write(
- '\n File installation error for '
- f'{install_target}:\n{ctx.err}'
+ '\n\n### STDERR:'
+ f'\n{ctx.err}'
)
if ctx.ret_code == 0:
# Both file installation and remote init success
That produces files like this:
$ rsync --delete \
--rsh=ssh -oBatchMode=yes -oConnectTimeout=10 \
--include=/.service/ --include=/.service/server.key -a \
--checksum --out-format=%o %n%L --no-t --exclude=log \
--exclude=share --exclude=work --include=/app/*** \
--include=/bin/*** --include=/etc/*** --include=/lib/*** \
--exclude=* ~/cylc-run/remotes/run4/ \
myhost:$HOME/cylc-run/remotes/run4/
### STDOUT:
send .service/server.key
### STDERR:
some stderr...
Thanks for the review.
I'll have a look into this now.
Yes, good idea, I'll give this a whirl.
As it seems tidier to me, like you said we have timestamp in the files.
Yes, love this idea. That would be super helpful for debugging. |
👍 mine too! |
One typing issue causing trouble:
|
Have done a quick functional review, looks good. The log numbers increment correctly, the housekeeping is working well. One small thing, the start number in the filename is one higher than the start number in the log itself, easy fix:
|
The config logs look ok, however, I think the log number here is not needed? Just need the start number?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review.
I am a little confused by the numbers e.g. 03-restart-04.log
. What do the 2 numbers mean? It doesn't match the PR description
Filename format change for scheduler log:
<timestamp>-<start/restart>.log
Review feedback OS
Tidy
Reduce the changes to scheduler.py vs scheduler_cli.py introduced in cylc#4836 Fix tests Something about the logs filename change has meant the contact file takes slightly longer to be created on start Shorten log handler name
* master: (30 commits) Log dir tidy (cylc#4836) Cut down on back-compat warnings, plus other tidying (cylc#4943) Validate platform settings (background job runner) (cylc#4938) clean: push error message to stderr Update cylc/flow/id_match.py Fix unintended directory stripping during cylc install (cylc#4931) stop cylc validate swallowing useful errors (cylc#4936) Improve config reference docs (cylc#4916) glblcfg: increase default rolling archive length Fix job state with pre-submitted failure Update tests/functional/cylc-reinstall/02-failures.t reinstall: require workflow ID argument play: upgrade --start-task's specified in legacy format (cylc#4925) expand schema docstring internally (cylc#4926) Added a new task filtering test. Add comment [skip ci] Fix optparse Options class for std options (cylc#4919) uid: warn if the run number is provided as a cycle Tweak prev. Fix task filtering. ...
Follow-up to cylc#4836
Follow-up to cylc#4836
Follow-up to cylc#4836
Fix failure to clean on remote host with leftover contact file * Fix scan for recent log dir changes * Follow-up to #4836 * GH Actions: pass if codecov upload fails (codecov upload is flaky) * Update changelog
These changes close #4802
Log directory renaming:
log/flow-config
=>log/config
log/workflow
=>log/scheduler
New Directory for remote install logs:
log/remote-install/
Filenames for remote file installation are now on an install-target basis with filename format:
<timestamp>-<start/restart/reload>-<install-target>.log
Filename format change for scheduler log:
<log number>-<start/restart>-<start_number>.log
(log/scheduler/log remains a symlink to the current log)
Filename format change for config log:
After a lengthy discussion UK end,
<log number>-<start/restart/reload>-<start_number>.log
The reload start number should correspond to the current start number when the reload took place.
Requirements check-list
CONTRIBUTING.md
and added my name as a Code Contributor.setup.cfg
andconda-environment.yml
.