Tensorboard #138

Open
wants to merge 10 commits into
base: master

kermitt2
Owner

@kermitt2 kermitt2 commented May 2, 2022

This PR sets up DeLFT to use TensorBoard: logs, callbacks...

The idea is to cover the interesting metrics so that everything can be visualized in TensorBoard.

How to use: nothing special is needed, just start a training and launch:

> tensorboard --logdir logs/summaries

(tensorboard is already installed)

Then open http://localhost:6006/

Screenshot from 2022-05-02 14-59-03

TODO:

  • add more metrics
  • see non-scalar views

@kermitt2 kermitt2 self-assigned this May 2, 2022
@kermitt2
Owner Author

kermitt2 commented May 2, 2022

TensorBoard generates a ridiculous amount of event logs: more than 1.2GB of logs per epoch, even for a basic transformer model. For instance, after 4 epochs:

$ du -sh logs/summaries/20220502-160051/train
5.1G	logs/summaries/20220502-160051/train

Given that effect (which adds up quickly, for example with a 10-fold cross-validation training), the TensorFlow callbacks need to be defined at the application level, so that using them is a user choice and they are not systematically enabled by the core library.
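A minimal sketch of what "defined at the application level" could look like, assuming a Keras-style training loop. The helper name `build_callbacks`, its parameters, and the chosen settings are hypothetical and are not taken from this PR. Whether these settings actually shrink the event logs for DeLFT has not been checked here.

```python
from pathlib import Path
from typing import List, Optional


def build_callbacks(enable_tensorboard: bool = False,
                    log_dir: str = "logs/summaries",
                    run_name: Optional[str] = None) -> List[object]:
    """Return the Keras callbacks requested by the application.

    Hypothetical helper: TensorBoard logging is opt-in, so the core
    library never writes event files unless the caller asks for it.
    """
    callbacks: List[object] = []
    if not enable_tensorboard:
        # Default path: no TensorBoard callback, no event logs on disk.
        return callbacks

    # Imported lazily so that disabling TensorBoard has no import cost.
    import tensorflow as tf

    run_dir = Path(log_dir) / run_name if run_name else Path(log_dir)
    callbacks.append(
        tf.keras.callbacks.TensorBoard(
            log_dir=str(run_dir),
            histogram_freq=0,     # no weight histograms
            write_graph=False,    # do not serialize the model graph
            update_freq="epoch",  # write scalars once per epoch
            profile_batch=0,      # disable the profiler
        )
    )
    return callbacks


# Usage from an application script (the flag name is an assumption):
# model.fit(x, y, callbacks=build_callbacks(enable_tensorboard=use_tb))
```

With this shape, a cross-validation script can leave the flag off by default and enable it only for the runs a user wants to inspect.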

@@ -870,10 +870,12 @@ def __init__(self, config, ntags=None, load_pretrained_weights=True, local_path:
x = Concatenate()([text_embedding_layer, features_embedding_out])
x = Dropout(config.dropout)(x)

'''
Collaborator

Was this intentional?

Owner Author

Nope, I was testing features with an architecture that has a transformer layer, because it's not working as expected.
I am still working on this branch, I should have set the draft flag!

@lfoppiano
Collaborator

Looks good.

Indeed, this tool can fill up the disk space faster than docker... 🎉

@kermitt2 kermitt2 marked this pull request as draft May 10, 2022 09:13
@lfoppiano
Collaborator

FYI, one run of 10-fold cross-validation weighs... 82GB 😂

(base) [lfoppian0@sakura02 summaries]$ ls -lh
total 4.0K
drwxr-sr-x 4 lfoppian0 tdm 4.0K May 10 12:17 20220510-121634
(base) [lfoppian0@sakura02 summaries]$ du -hs
82G	.

@kermitt2 kermitt2 marked this pull request as ready for review February 12, 2023 08:03
2 participants