Can we please add Timestamps to Each Epoch Checkpoint DEBUG line? #665

Closed
coffeecodeconverter opened this issue Nov 30, 2024 · 1 comment

@coffeecodeconverter

Can we please add a timestamp to the DEBUG line for each epoch checkpoint?
So instead of just having this:

DEBUG:fsspec.local:open file: /pipertts/train-me/lightning_logs/version_5/checkpoints/epoch=804-step=994862.ckpt
DEBUG:fsspec.local:open file: /pipertts/train-me/lightning_logs/version_5/checkpoints/epoch=809-step=994902.ckpt
DEBUG:fsspec.local:open file: /pipertts/train-me/lightning_logs/version_5/checkpoints/epoch=814-step=994942.ckpt

Can we please have it read as:

DEBUG:fsspec.local: YYYY-MM-DD HH:mm:ss.uuu - open file: /pipertts/train-me/lightning_logs/version_5/checkpoints/epoch=804-step=994862.ckpt
DEBUG:fsspec.local: YYYY-MM-DD HH:mm:ss.uuu - open file: /pipertts/train-me/lightning_logs/version_5/checkpoints/epoch=809-step=994902.ckpt
DEBUG:fsspec.local: YYYY-MM-DD HH:mm:ss.uuu - open file: /pipertts/train-me/lightning_logs/version_5/checkpoints/epoch=814-step=994942.ckpt
@coffeecodeconverter
Author

coffeecodeconverter commented Nov 30, 2024

I've done some more digging, and it looks like this is part of the PyTorch Lightning packages, so it's not really Piper's issue.
Nonetheless, I've managed to achieve the above, plus some other minor tweaks that make the output more readable.

I've tweaked 3 files:
/PiperTTS/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py
/PiperTTS/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py
/PiperTTS/.venv/lib/python3.10/site-packages/fsspec/implementations/local.py
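
As an aside: these DEBUG lines come from Python's standard logging module, so a less invasive option (just a sketch, I haven't checked it against Piper's own logging setup) would be to reconfigure the root logger from the training entry point instead of editing site-packages. Something like this would also give the millisecond precision asked for above:

import logging

# Sketch only: timestamp every log record (including the fsspec.local DEBUG
# lines) by changing the root logger's format. Assumes nothing in the training
# script depends on its own handlers; force=True (Python 3.8+) replaces any
# handlers already attached to the root logger.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)s:%(name)s: %(asctime)s.%(msecs)03d - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    force=True,
)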

.
.
.
.
Firstly, in checkpoint_connector.py
(/PiperTTS/.venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py),
to give it a timestamp and a line break between the message and the file path,
I've amended this
(right near the top)

from pytorch_lightning.utilities.migration.utils import _pl_migrate_checkpoint
from pytorch_lightning.utilities.rank_zero import rank_zero_deprecation, rank_zero_info, rank_zero_warn

with this
(to import datetime)

from pytorch_lightning.utilities.migration.utils import _pl_migrate_checkpoint
from pytorch_lightning.utilities.rank_zero import rank_zero_deprecation, rank_zero_info, rank_zero_warn

from datetime import datetime

Then I replaced this output message:

rank_zero_info(f"Restored all states from the checkpoint file at {self.resume_checkpoint_path}")

with this:
(gives it a timestamp, and a clear start point for the training)

timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
rank_zero_info(f"\n\n[{timestamp}]\nRestored all states from the checkpoint file at \n{self.resume_checkpoint_path}\n------------------------------------------\nStarting Training...\n------------------------------------------")
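
The same timestamp-then-log pattern is repeated in the next two files. If you'd rather keep the format string in one place, a tiny helper (hypothetical, not part of the edits shown here) would do it:

from datetime import datetime

# Hypothetical helper, not in the actual patch: a single place to change the
# timestamp format used across the patched files.
def now_stamp() -> str:
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")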

.
.
.
.
Secondly, in fit_loop.py
(/PiperTTS/.venv/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py),
to give it a timestamp,
I've amended this
(right near the top)

from pytorch_lightning.utilities.rank_zero import rank_zero_debug, rank_zero_info, rank_zero_warn
from pytorch_lightning.utilities.signature_utils import is_param_in_hook_signature

with this
(to import datetime)

from pytorch_lightning.utilities.rank_zero import rank_zero_debug, rank_zero_info, rank_zero_warn
from pytorch_lightning.utilities.signature_utils import is_param_in_hook_signature

from datetime import datetime

Then I replaced this output message:

rank_zero_info(f"`Trainer.fit` stopped: `max_epochs={self.max_epochs!r}` reached.")

with this:
(gives it a timestamp, and a clear finish point for the training)

timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
rank_zero_info(f"\n---------------------------------------------------------------------\n[{timestamp}] - `Trainer.fit` stopped: `max_epochs={self.max_epochs!r}` reached.\n---------------------------------------------------------------------")

.
.
.
.
Thirdly, in local.py
(/PiperTTS/.venv/lib/python3.10/site-packages/fsspec/implementations/local.py),
I've amended this
(added the datetime import just above the class, and added a timestamp to the logger)

class LocalFileOpener(io.IOBase):
    def __init__(
        self, path, mode, autocommit=True, fs=None, compression=None, **kwargs
    ):
        logger.debug("open file: %s", path)
        self.path = path
        self.mode = mode
        self.fs = fs
        self.f = None
        self.autocommit = autocommit
        self.compression = get_compression(path, compression)
        self.blocksize = io.DEFAULT_BUFFER_SIZE
        self._open()

with this

from datetime import datetime

class LocalFileOpener(io.IOBase):
    def __init__(
        self, path, mode, autocommit=True, fs=None, compression=None, **kwargs
    ):
        timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        logger.debug(f" [{timestamp}] open: %s", path")
        self.path = path
        self.mode = mode
        self.fs = fs
        self.f = None
        self.autocommit = autocommit
        self.compression = get_compression(path, compression)
        self.blocksize = io.DEFAULT_BUFFER_SIZE
        self._open()
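
If you'd rather not patch fsspec at all, the same effect can be scoped to just these lines (again only a sketch, untested here) by giving the fsspec.local logger its own timestamped handler from the training script:

import logging

# Sketch: attach a timestamped handler only to "fsspec.local" and stop
# propagation so the untimestamped root-logger copy of each record isn't
# printed as well.
fs_logger = logging.getLogger("fsspec.local")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(levelname)s:%(name)s: [%(asctime)s] %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
))
fs_logger.addHandler(handler)
fs_logger.setLevel(logging.DEBUG)
fs_logger.propagate = False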

.
.
.
.
So, instead of this kind of output:

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
DEBUG:fsspec.local:open file: /home/train-me/lightning_logs/version_4/hparams.yaml
Restored all states from the checkpoint file at ~/train-me/lightning_logs/version_3/checkpoints/epoch=729-step=994262.ckpt
/home/PiperTTS/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:109: UserWarning: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
  rank_zero_warn(
DEBUG:fsspec.local:open file: /home/train-me/lightning_logs/version_4/checkpoints/epoch=734-step=994302.ckpt
DEBUG:fsspec.local:open file: /home/train-me/lightning_logs/version_4/checkpoints/epoch=739-step=994342.ckpt
`Trainer.fit` stopped: `max_epochs=800` reached.

it now produces this output in the console:

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
DEBUG:fsspec.local: [2024-11-30 15:08:50] open: /home/train-me/lightning_logs/version_16/hparams.yaml

[2024-11-30 15:08:50]
Restored all states from the checkpoint file at
~/train-me/lightning_logs/version_15/checkpoints/epoch=939-step=995942.ckpt
------------------------
Starting Training...
------------------------
/home/PiperTTS/.venv/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py:109: UserWarning: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
  rank_zero_warn(
DEBUG:fsspec.local: [2024-11-30 15:12:31] open: /home/train-me/lightning_logs/version_16/checkpoints/epoch=944-step=995982.ckpt
DEBUG:fsspec.local: [2024-11-30 15:14:54] open: /home/train-me/lightning_logs/version_16/checkpoints/epoch=949-step=996022.ckpt

------------------------------------------------------------------------------
[2024-11-30 15:14:54] - `Trainer.fit` stopped: `max_epochs=950` reached.
------------------------------------------------------------------------------

coffeecodeconverter closed this as not planned on Nov 30, 2024