Merge branch 'main' into te
Quentin-Anthony authored Oct 8, 2024
2 parents b3255e6 + 3272032 commit afeff03
Showing 13 changed files with 344 additions and 89 deletions.
14 changes: 8 additions & 6 deletions README.md
We support profiling with Nsight Systems, the PyTorch Profiler, and PyTorch Memory Profiling.
## Nsight Systems Profiling
To use the Nsight Systems profiling, set config options `profile`, `profile_step_start`, and `profile_step_stop` (see [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/neox_arguments.md) for argument usage, and [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/prof.yml) for a sample config).
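
A minimal config fragment enabling these options might look like the following (the step values here are illustrative — pick a short window after the run has warmed up):

```
{
  "profile": true,
  "profile_step_start": 10,
  "profile_step_stop": 12,
}
```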
To populate nsys metrics, launch training with:
```
nsys profile -s none -t nvtx,cuda -o <path/to/profiling/output> --force-overwrite true \
$TRAIN_PATH/train.py --conf_dir configs <config files>
```
The generated output file can then be viewed with the Nsight Systems GUI:
![nsight-prof](images/nsight_profiling.png)
## PyTorch Profiling
To use the built-in PyTorch profiler, set config options `profile`, `profile_step_start`, and `profile_step_stop` (see [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/neox_arguments.md) for argument usage, and [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/prof.yml) for a sample config).
The PyTorch profiler will save traces to your `tensorboard` log directory. You can view these traces within
TensorBoard by following the steps [here](https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html).
![torch-prof](images/pytorch_profiling.png)
## PyTorch Memory Profiling
To use PyTorch Memory Profiling, set config options `memory_profiling` and `memory_profiling_path` (see [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/neox_arguments.md) for argument usage, and [here](https://github.com/EleutherAI/gpt-neox/blob/main/configs/prof.yml) for a sample config).
![mem-prof](images/memory_profiling.png)
View the generated profile with the [memory_viz.py](https://github.com/pytorch/pytorch/blob/main/torch/cuda/_memory_viz.py) script.
2 changes: 2 additions & 0 deletions configs/README.md
These can be set to any integer between `0` and `num_gpus`, and `num_gpus` must

```
# this should provide some speedup but takes a while to build, set to true if desired
"scaled_upper_triang_masked_softmax_fusion": false,
"train_iters": 320000,
# alternatively, use train_epochs to automatically determine the number of training iterations
#"train_epochs": 1,
```
An example of some basic settings used to configure your model's architecture and number of training steps.
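
When `train_epochs` is used instead of `train_iters`, the iteration count has to be derived from the dataset and batch configuration. A rough sketch of that arithmetic (illustrative only — the helper name and the `global_batch_size` parameter are assumptions, not gpt-neox internals):

```python
import math

def iters_from_epochs(train_epochs, num_samples, global_batch_size):
    # Illustrative: iterations needed to cover the dataset train_epochs times.
    # global_batch_size = micro_batch * gradient_accumulation * data_parallel_size.
    total_samples = train_epochs * num_samples
    return math.ceil(total_samples / global_batch_size)

# e.g. one epoch over 320,000 samples at a global batch size of 1,024
print(iters_from_epochs(1, 320_000, 1024))  # 313
```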

41 changes: 39 additions & 2 deletions configs/neox_arguments.md
LR Scheduler Arguments
Learning rate decay function. Choose from 'constant', 'linear', 'cosine', 'exponential'.



- **lr_decay_iters**: int

Default = None

Number of iterations to decay learning rate over. If None, defaults to
--train-iters or the equivalent inferred value from train_epochs.

- **lr_decay_fraction**: float

Default = None

Effective fraction of training over which to decay lr. Overrides lr_decay_iters.
Useful when specifying train_epochs.
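
The interaction between these two options can be sketched as follows (a sketch only — the function names are illustrative, and the cosine formula is the standard one rather than a copy of the gpt-neox implementation):

```python
import math

def effective_decay_iters(train_iters, lr_decay_iters=None, lr_decay_fraction=None):
    # lr_decay_fraction, when set, overrides lr_decay_iters (per the docs above)
    if lr_decay_fraction is not None:
        return int(lr_decay_fraction * train_iters)
    if lr_decay_iters is not None:
        return lr_decay_iters
    return train_iters  # decay over the full run by default

def cosine_lr(step, max_lr, min_lr, decay_iters):
    # standard cosine decay from max_lr down to min_lr over decay_iters steps
    if step >= decay_iters:
        return min_lr
    frac = 0.5 * (1.0 + math.cos(math.pi * step / decay_iters))
    return min_lr + (max_lr - min_lr) * frac

decay = effective_decay_iters(train_iters=1000, lr_decay_fraction=0.9)
print(decay)  # 900
print(cosine_lr(decay, max_lr=3e-4, min_lr=3e-5, decay_iters=decay))  # 3e-05
```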

- **min_lr**: float

Model Arguments
- **dim_att**: int

Default = None

Total dimension of the attention mechanism for RWKV. If not set, defaults to hidden_size.

- **head_size**: int

Default = None

Size of each attention head for RWKV. Calculated as dim_att // num_attention_heads.

- **ffn_dim**: int

Default = None

Dimension of the feed-forward network for RWKV. If not set, calculated based on hidden_size and expansion_factor.
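
The defaulting rules for these three RWKV dimensions can be sketched as follows (the `expansion_factor` default and the rounding here are assumptions for illustration, not the exact gpt-neox computation):

```python
def rwkv_dims(hidden_size, num_attention_heads,
              dim_att=None, head_size=None, ffn_dim=None,
              expansion_factor=3.5):
    if dim_att is None:
        dim_att = hidden_size                       # dim_att defaults to hidden_size
    if head_size is None:
        head_size = dim_att // num_attention_heads  # per-head width
    if ffn_dim is None:
        ffn_dim = int(hidden_size * expansion_factor)  # derived from hidden_size (assumed factor)
    return dim_att, head_size, ffn_dim

print(rwkv_dims(hidden_size=1024, num_attention_heads=16))  # (1024, 64, 3584)
```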
## NeoXArgsOptimizer
Optimizer Arguments
Training Arguments

- **train_epochs**: int

Default = None

Number of epochs to run for training. Do not specify both train_epochs and train_iters.
Not currently compatible with data reweighting, pairwise datasets, or packing modes other than 'packed'.
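
The constraint above can be sketched as a simple validation step (illustrative — the function name and error text are assumptions, not gpt-neox code):

```python
def validate_duration_args(train_iters=None, train_epochs=None):
    # Per the docs above: do not specify both train_iters and train_epochs.
    if train_iters is not None and train_epochs is not None:
        raise ValueError("specify either train_iters or train_epochs, not both")
    return train_iters if train_iters is not None else train_epochs

print(validate_duration_args(train_iters=320_000))  # 320000
```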
- **eval_iters**: int

Default = 100
17 changes: 17 additions & 0 deletions configs/prof.yml
# Sample profiling config
{
# Turns on nsys and pytorch profiling
"profile": true,

# pytorch profiler options
"profile_step_start": 10,
"profile_step_stop": 12,

# pytorch memory profiler options
"memory_profiling": true,
"memory_profiling_path": "tensorboard",

# All trace files (pytorch, nsys, tensorboard, etc) will be written here
"tensorboard_dir": "tensorboard",
}
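
Configs like the one above are YAML-flavored JSON: `#` comments and trailing commas are allowed. gpt-neox loads them with a YAML parser; the stdlib-only stand-in below (an assumption for illustration, not the project's loader) strips comments and trailing commas and then hands the rest to `json`:

```python
import json
import re

def load_neox_style_config(text):
    # Naive sketch: this would also strip '#' inside quoted strings,
    # a corner case the illustration ignores.
    no_comments = re.sub(r"#.*", "", text)
    no_trailing_commas = re.sub(r",\s*([}\]])", r"\1", no_comments)
    return json.loads(no_trailing_commas)

cfg = load_neox_style_config("""
{
  "profile": true,           # comments are stripped
  "profile_step_start": 10,
  "tensorboard_dir": "tensorboard",
}
""")
print(cfg["profile"], cfg["profile_step_start"])  # True 10
```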
