Commit
Merge pull request #311 from 1190303125/2.0.0
modify resume, wandb and instruction
StevenTang1998 authored Dec 27, 2022
2 parents 0a3dc3f + 8cdeec0 commit ebcef12
Showing 9 changed files with 131 additions and 28 deletions.
100 changes: 100 additions & 0 deletions asset/basic_training.md
@@ -0,0 +1,100 @@
# Basic Training
## config
You can set your configurations in any of three equivalent ways:
* cmd
* config files
* yaml

### cmd
You can change configurations on the command line with ``--xx=yy``, where ``xx`` is the name of the parameter and ``yy`` is the corresponding value. For example:

```bash
python run_textbox.py --model=BART --model_path=facebook/bart-base --epochs=1
```

The command line is suitable for **a few temporary** modifications, such as:
* ``model``
* ``model_path``
* ``dataset``
* ``epochs``
* ...

### config files

You can also modify configurations through local config files:
```bash
python run_textbox.py ... --config_files <config-file-one> <config-file-two>
```

Each config file is an additional yaml file, for example:

```yaml
efficient_methods: ['prompt-tuning']
```
Config files are suitable for **a large number of** modifications or for **long-term** modifications (see the example after this list), such as:
* ``efficient_methods``
* ``efficient_kwargs``
* ...
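For instance, a minimal sketch (the file name ``prompt_tuning.yaml`` is only an illustration) that writes one extra config file and passes it to the run script:

```bash
# write an additional yaml config file (the file name is arbitrary)
cat > prompt_tuning.yaml << 'EOF'
efficient_methods: ['prompt-tuning']
EOF

# pass it along with the usual command-line options
python run_textbox.py --model=BART --model_path=facebook/bart-base \
    --dataset=samsum --config_files prompt_tuning.yaml
```
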
### yaml
The original (default) configurations are stored in yaml files. You can check the values there, but modifying these files is not recommended except for **permanent** modifications of the dataset. The files live under ``textbox/properties``:
* ``overall.yaml``
* ``dataset/*.yaml``
* ``model/*.yaml``

## trainer
You can choose an optimizer and scheduler through `optimizer=<optimizer-name>` and `scheduler=<scheduler-name>`. We provide a wrapper around the **pytorch optimizers**, which means parameters like `epsilon` or `warmup_steps` can be specified with the keyword dictionaries `optimizer_kwargs={'epsilon': ... }` and `scheduler_kwargs={'warmup_steps': ... }`. See the [pytorch optimizers](https://pytorch.org/docs/stable/optim.html#algorithms) and schedulers for a complete reference. <!-- TODO -->
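
A hedged sketch follows; the optimizer and scheduler names (`adamw`, `linear`) and the exact keyword keys are assumptions for illustration, not a verified list of supported values:

```bash
# 'adamw' and 'linear' are assumed names; the quotes keep the keyword
# dictionaries intact when the shell parses the command
python run_textbox.py --model=BART --model_path=facebook/bart-base --dataset=samsum \
    --optimizer=adamw --scheduler=linear \
    --optimizer_kwargs="{'epsilon': 1e-6}" \
    --scheduler_kwargs="{'warmup_steps': 100}"
```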

Validation frequency is introduced to validate the model **every few batch steps or epochs**. Specify `valid_strategy` (either `'step'` or `'epoch'`) and `valid_steps=<int>` to adjust the pace. In particular, the traditional train-validate paradigm is the special case `valid_strategy=epoch` with `valid_steps=1`.
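
For example, to validate every 500 batch steps instead of once per epoch (500 is an arbitrary choice):

```bash
python run_textbox.py --model=BART --model_path=facebook/bart-base --dataset=samsum \
    --valid_strategy=step --valid_steps=500
```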

`max_save=<int>` indicates **the maximum number of saved files** (checkpoints and generated corpora during evaluation): `-1` saves every file, `0` saves no files, `1` saves only the file with the best score, and `n` saves both the best and the last $n-1$ files.
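
For instance, to keep only the best checkpoint and the most recent one:

```bash
# max_save=2 keeps the best file plus the last one
python run_textbox.py --model=BART --model_path=facebook/bart-base --dataset=samsum \
    --max_save=2
```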

The score of the current checkpoint is calculated according to ``metrics_for_best_model``, and the evaluation metrics to compute are specified with ``metrics`` ([full list](evaluation.md)). **Early stopping** can be configured with `stopping_steps=<int>`, based on the score of every checkpoint.


```bash
python run_textbox.py ... --stopping_steps=8 \
    --metrics_for_best_model=\[\'rouge-1\', \'rouge-w\'\] \
    --metrics=\[\'rouge\'\]
```

You can resume from a **previous checkpoint** through ``model_path=<checkpoint_path>``. When you want to restore **all trainer parameters**, such as the optimizer and start_epoch, set ``resume_training=True``. Otherwise, only the **model and tokenizer** will be loaded. The script below resumes training from the checkpoint at ``saved/BART-samsum-2022-Dec-18_20-57-47/checkpoint_best``:

```bash
python run_textbox.py --model_path=saved/BART-samsum-2022-Dec-18_20-57-47/checkpoint_best \
    --resume_training=True
```

Other commonly used parameters include `epochs=<int>` and `max_steps=<int>` (the maximum number of training epochs and of batch steps, respectively; if `max_steps` is set, `epochs` is ignored), `learning_rate=<float>`, `train_batch_size=<int>`, `weight_decay=<bool>`, and `grad_clip=<bool>`.
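
A sketch combining these options (the specific values are illustrative only):

```bash
python run_textbox.py --model=BART --model_path=facebook/bart-base --dataset=samsum \
    --epochs=10 --learning_rate=3e-5 --train_batch_size=16
```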

### Partial Experiment

You can run a partial experiment with `do_train`, `do_valid`, and `do_test`. You can test your pipeline and debug by setting `quick_test=<amount-of-data-to-load>` to load just a few examples, as in the sketch below.
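
For example (16 is an arbitrary amount of data to load):

```bash
python run_textbox.py --model=BART --model_path=facebook/bart-base --dataset=samsum \
    --epochs=1 --quick_test=16
```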

The following script loads a trained model from a local path and conducts generation and evaluation without training or validation:
```bash
python run_textbox.py --model_path=saved/BART-samsum-2022-Dec-18_20-57-47/checkpoint_best \
    --do_train=False --do_valid=False
```

## wandb

If you are running your code in a Jupyter environment, you may want to log in by simply setting an environment variable (note that your key will be stored in plain text):

```python
%env WANDB_API_KEY=<your-key>
```
The behavior of W&B is controlled with the `wandb` parameter.

If you are debugging your model, you may want to **disable W&B** with `--wandb=disabled`, so that **none of the metrics** are recorded. You can also disable **synchronization only** with `--wandb=offline` and enable it again with `--wandb=online` to upload the results to the cloud. The parameter can also be configured in the yaml file, for example:

```yaml
wandb: online
```
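
For example, to turn off W&B entirely during a debugging run:

```bash
python run_textbox.py --model=BART --model_path=facebook/bart-base --dataset=samsum \
    --wandb=disabled
```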

The local files can be uploaded by executing `wandb sync` in the command line.

After configuration, you can silence wandb prompts by setting the environment variable `export WANDB_SILENT=true`. For more information, see the [wandb documentation](https://docs.wandb.ai).
12 changes: 1 addition & 11 deletions install.sh
@@ -35,7 +35,7 @@ esac

echo "Installation may take a few minutes."
echo -e "\033[0;32mInstalling torch ...\033[0m"
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

echo -e "\033[0;32mInstalling requirements ...\033[0m"
pip install -r requirements.txt
@@ -75,16 +75,6 @@ chmod +rx $F2RExpDIR/WordNet-2.0.exc.db
pip uninstall py-rouge
pip install rouge > /dev/null

echo -e "\033[0;32mInstalling requirements (libxml) ...\033[0m"
if [[ "$OSTYPE" == "darwin"* ]]; then
brewinstall libxml2 cpanminus
cpanm --force XML::Parser
else
if [ -x "$(command -v apt-get)" ]; then sudo apt-get install libxml-parser-perl
elif [ -x "$(command -v yum)" ]; then sudo yum install -y "perl(XML::LibXML)"
else echo -e '\033[0;31mFailed to install libxml. See https://github.com/pltrdy/files2rouge/issues/9 for more information.\033[0m' && exit;
fi
fi

echo -e "\033[0;32mInstalling requirements (transformers) ...\033[0m"
git clone https://github.com/RUCAIBox/transformers.git
16 changes: 16 additions & 0 deletions instructions/RNN.md
@@ -0,0 +1,16 @@
## RNN

You can train an RNN encoder-decoder with attention from scratch with this model. Three variants are available:
* RNN
* GRU
* LSTM

You can choose among them with ``model=RNN``, ``model=GRU``, or ``model=LSTM``. Meanwhile, you can check or modify the default parameters of each model in ``textbox/properties/model/rnn.yaml`` (``gru.yaml``, ``lstm.yaml``).

Example usage:

```bash
python run_textbox.py \
--model=RNN \
--dataset=samsum
```
2 changes: 2 additions & 0 deletions textbox/config/configurator.py
@@ -262,6 +262,8 @@ def _set_default_parameters(self):
self.setdefault('valid_strategy', 'epoch')
self.setdefault('valid_steps', 1)
self.setdefault('disable_tqdm', False)
self.setdefault('resume_training',True)
self.setdefault('wandb', 'online')
self._simplify_parameter('optimizer')
self._simplify_parameter('scheduler')
self._simplify_parameter('src_lang')
1 change: 1 addition & 0 deletions textbox/properties/overall.yaml
@@ -5,6 +5,7 @@ seed: 2020
state: INFO
reproducibility: True
data_path: 'dataset/'
wandb: 'online'

# training settings
epochs: 50
5 changes: 4 additions & 1 deletion textbox/quick_start/experiment.py
@@ -37,6 +37,8 @@ def __init__(
config_dict: Optional[Dict[str, Any]] = None,
):
self.config = Config(model, dataset, config_file_list, config_dict)
wandb_setting = 'wandb ' + self.config['wandb']
os.system(wandb_setting)
self.__extended_config = None

self.accelerator = Accelerator(gradient_accumulation_steps=self.config['accumulation_steps'])
@@ -94,7 +96,8 @@ def _on_experiment_start(self, extended_config: Optional[dict]):
self.valid_result: Optional[ResultType] = None
self.test_result: Optional[ResultType] = None
if config['load_type'] == 'resume':
self.trainer.resume_checkpoint(config['model_path'])
if config['resume_training']:
self.trainer.resume_checkpoint(config['model_path'])
self.model.from_pretrained(config['model_path'])

def _do_train_and_valid(self):
11 changes: 4 additions & 7 deletions textbox/trainer/trainer.py
@@ -364,18 +364,15 @@ def save_checkpoint(self):
def save_generated_text(self, generated_corpus: List[str], is_valid: bool = False):
r"""Store the generated text by our model into `self.saved_text_filename`."""
saved_text_filename = self.saved_text_filename
if not is_valid:
self._summary_tracker.add_corpus('test', generated_corpus)
else:
path_to_save = self.saved_model_filename + '_epoch-' + str(self.timestamp.valid_epoch)
saved_text_filename = os.path.join(path_to_save, 'generation.txt')
os.makedirs(path_to_save, exist_ok=True)
path_to_save = self.saved_model_filename + '_epoch-' + str(self.timestamp.valid_epoch)
saved_text_filename = os.path.join(path_to_save, 'generation.txt')
os.makedirs(path_to_save, exist_ok=True)
with open(saved_text_filename, 'w') as fout:
for text in generated_corpus:
fout.write(text + '\n')

def resume_checkpoint(self, resume_dir: str):
r"""Load the model parameters information and training information.
r"""Load training information.
Args:
resume_dir: the checkpoint file (specific by `model_path`).
4 changes: 3 additions & 1 deletion textbox/utils/argument_list.py
@@ -21,6 +21,7 @@
'_hyper_tuning', # hyper tuning
'multi_seed', # multiple random seed
'romanian_postprocessing',
'wandb'
]

training_parameters = [
@@ -43,7 +44,8 @@
'weight_decay', # common parameters
'accumulation_steps', # accelerator
'disable_tqdm', # tqdm
'pretrain_task' # pretraining
'pretrain_task', # pretraining
'resume_training'
]

evaluation_parameters = [
8 changes: 0 additions & 8 deletions textbox/utils/dashboard.py
@@ -435,14 +435,6 @@ def add_scalar(self, tag: str, scalar_value: Union[float, int]):
if self._is_local_main_process and not self.tracker_finished and self.axes is not None:
wandb.log(info, step=self.axes.train_step, commit=False)

def add_corpus(self, tag: str, corpus: Iterable[str]):
r"""Add a corpus to summary."""
if tag.startswith('valid'):
self._current_epoch._update_metrics({'generated_corpus': '\n'.join(corpus)})
if self._is_local_main_process and not self.tracker_finished:
_corpus = wandb.Table(columns=[tag], data=pd.DataFrame(corpus))
wandb.log({tag: _corpus}, step=self.axes.train_step)


root = None

