Add WandbLogger.v5 (#21)
* Refactor `WandbLogger` code
Add `v5` to support custom stats logging

* Add new param to README

* Add docstrings, fix type hints and default params, update README
shadeMe authored Jan 4, 2023
1 parent 5618829 commit 2f7d00d
Showing 4 changed files with 327 additions and 244 deletions.
30 changes: 18 additions & 12 deletions README.md
@@ -50,14 +50,19 @@ wandb login

### Usage

-`spacy.WandbLogger.v4` is a logger that sends the results of each training step
+`spacy.WandbLogger.v5` is a logger that sends the results of each training step
to the dashboard of the [Weights & Biases](https://www.wandb.com/) tool. To use
this logger, Weights & Biases should be installed, and you should be logged in.
The logger will send the full config file to W&B, as well as various system
information such as memory utilization, network traffic, disk IO, GPU
statistics, etc. This will also include information such as your hostname and
operating system, as well as the location of your Python executable.

+`spacy.WandbLogger.v4` and below automatically call the [default console logger](https://spacy.io/api/top-level#ConsoleLogger).
+However, starting with `spacy.WandbLogger.v5`, console logging must be activated
+through the use of the [ChainLogger](#chainlogger). This allows the user to configure
+the console logger's parameters according to their preferences.
+
**Note** that by default, the full (interpolated)
[training config](https://spacy.io/usage/training#config) is sent over to the
W&B dashboard. If you prefer to **exclude certain information** such as path
@@ -70,23 +75,24 @@ on your local system.

```ini
[training.logger]
-@loggers = "spacy.WandbLogger.v4"
+@loggers = "spacy.WandbLogger.v5"
project_name = "monitor_spacy_training"
remove_config_values = ["paths.train", "paths.dev", "corpora.train.path", "corpora.dev.path"]
log_dataset_dir = "corpus"
model_log_interval = 1000
```

-| Name | Type | Description |
-| ---------------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `project_name` | `str` | The name of the project in the Weights & Biases interface. The project will be created automatically if it doesn't exist yet. |
-| `remove_config_values` | `List[str]` | A list of values to exclude from the config before it is uploaded to W&B (default: `[]`). |
-| `model_log_interval` | `Optional[int]` | Steps to wait between logging model checkpoints to the W&B dashboard (default: `None`). Added in `spacy.WandbLogger.v2`. |
-| `log_dataset_dir` | `Optional[str]` | Directory containing the dataset to be logged and versioned as a W&B artifact (default: `None`). Added in `spacy.WandbLogger.v2`. |
-| `run_name` | `Optional[str]` | The name of the run. If you don't specify a run name, the name will be created by the `wandb` library (default: `None`). Added in `spacy.WandbLogger.v3`. |
-| `entity` | `Optional[str]` | An entity is a username or team name where you're sending runs. If you don't specify an entity, the run will be sent to your default entity, which is usually your username (default: `None`). Added in `spacy.WandbLogger.v3`. |
-| `log_best_dir` | `Optional[str]` | Directory containing the best trained model as saved by spaCy (by default in `training/model-best`), to be logged and versioned as a W&B artifact (default: `None`). Added in `spacy.WandbLogger.v4`. |
-| `log_latest_dir` | `Optional[str]` | Directory containing the latest trained model as saved by spaCy (by default in `training/model-latest`), to be logged and versioned as a W&B artifact (default: `None`). Added in `spacy.WandbLogger.v4`. |
+| Name | Type | Description |
+| ---------------------- | --------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `project_name` | `str` | The name of the project in the Weights & Biases interface. The project will be created automatically if it doesn't exist yet. |
+| `remove_config_values` | `List[str]` | A list of values to exclude from the config before it is uploaded to W&B (default: `[]`). |
+| `model_log_interval` | `Optional[int]` | Steps to wait between logging model checkpoints to the W&B dashboard (default: `None`). Added in `spacy.WandbLogger.v2`. |
+| `log_dataset_dir` | `Optional[str]` | Directory containing the dataset to be logged and versioned as a W&B artifact (default: `None`). Added in `spacy.WandbLogger.v2`. |
+| `entity` | `Optional[str]` | An entity is a username or team name where you're sending runs. If you don't specify an entity, the run will be sent to your default entity, which is usually your username (default: `None`). Added in `spacy.WandbLogger.v3`. |
+| `run_name` | `Optional[str]` | The name of the run. If you don't specify a run name, the name will be created by the `wandb` library (default: `None`). Added in `spacy.WandbLogger.v3`. |
+| `log_best_dir` | `Optional[str]` | Directory containing the best trained model as saved by spaCy (by default in `training/model-best`), to be logged and versioned as a W&B artifact (default: `None`). Added in `spacy.WandbLogger.v4`. |
+| `log_latest_dir` | `Optional[str]` | Directory containing the latest trained model as saved by spaCy (by default in `training/model-latest`), to be logged and versioned as a W&B artifact (default: `None`). Added in `spacy.WandbLogger.v4`. |
+| `log_custom_stats` | `Optional[List[str]]` | A list of regular expressions that will be applied to the info dictionary passed to the logger (default: `None`). Statistics and metrics that match these regexps will be automatically logged. Added in `spacy.WandbLogger.v5`. |
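
The `log_custom_stats` description can be illustrated with a small standalone sketch. It is not the code from this commit: the helper names `build_matcher` and `select_custom_stats`, the sample keys in `info`, and the choice of `re.search` (a match anywhere in the key) are all assumptions made for illustration, and the sketch only looks at top-level keys of the dictionary.

```python
import re
from typing import Any, Callable, Dict, List, Optional


def build_matcher(patterns: Optional[List[str]]) -> Callable[[str], bool]:
    """Compile the patterns once and return a predicate over key names."""
    compiled = [re.compile(p) for p in (patterns or [])]

    def is_match(key: str) -> bool:
        # Assumption: a key qualifies if any pattern matches anywhere in it.
        return any(rx.search(key) for rx in compiled)

    return is_match


def select_custom_stats(
    info: Dict[str, Any], patterns: Optional[List[str]]
) -> Dict[str, Any]:
    """Return the top-level entries of `info` whose keys match a pattern."""
    is_match = build_matcher(patterns)
    return {key: value for key, value in info.items() if is_match(str(key))}


# Hypothetical info dictionary; the real keys depend on the training loop.
info = {"epoch": 2, "step": 400, "score": 0.81, "custom/grad_norm": 1.7}
selected = select_custom_stats(info, ["^custom/", "^step$"])
print(selected)
```

With `patterns` left as `None`, nothing is selected, which mirrors the documented default of logging no custom statistics.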

## MLflowLogger

1 change: 1 addition & 0 deletions setup.cfg
@@ -15,6 +15,7 @@ python_requires = >=3.6

[options.entry_points]
spacy_loggers =
+spacy.WandbLogger.v5 = spacy_loggers.wandb:wandb_logger_v5
spacy.WandbLogger.v4 = spacy_loggers.wandb:wandb_logger_v4
spacy.WandbLogger.v3 = spacy_loggers.wandb:wandb_logger_v3
spacy.WandbLogger.v2 = spacy_loggers.wandb:wandb_logger_v2
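
Each line under `spacy_loggers =` registers one logger name as a package entry point. As a rough way to check what an installed environment exposes, the sketch below lists that entry-point group using only the standard library. The helper name `logger_entry_points` is hypothetical, `importlib.metadata` requires Python 3.8 or later, and the result depends entirely on which packages are installed.

```python
from importlib.metadata import entry_points
from typing import Dict


def logger_entry_points(group: str = "spacy_loggers") -> Dict[str, str]:
    """Map each entry-point name in `group` to its 'module:function' target."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection of entry points.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain mapping from group to entries.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


registered = logger_entry_points()
for name, target in sorted(registered.items()):
    print(f"{name} -> {target}")
```

If a build of the package containing this commit is installed, `spacy.WandbLogger.v5` should appear among the printed names; in an environment without it, the loop prints nothing.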
1 change: 1 addition & 0 deletions spacy_loggers/tests/test_registry.py
@@ -6,6 +6,7 @@
("loggers", "spacy.WandbLogger.v2"),
("loggers", "spacy.WandbLogger.v3"),
("loggers", "spacy.WandbLogger.v4"),
+("loggers", "spacy.WandbLogger.v5"),
("loggers", "spacy.MLflowLogger.v1"),
("loggers", "spacy.ClearMLLogger.v1"),
("loggers", "spacy.ChainLogger.v1"),