📝 Update LLaMa docs (#415)
## Describe your changes
Update the LLaMA docs to describe the different optimization configs for the
various model sizes.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Format your code by running `pre-commit run --all-files`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.

## (Optional) Issue link
trajepl authored Jul 15, 2023
1 parent fe6e3da commit 7a6de53
Showing 2 changed files with 50 additions and 2 deletions.
48 changes: 48 additions & 0 deletions examples/open_llama/README.md
@@ -7,6 +7,54 @@ This workflow also demonstrates how to use:

This example config file [open_llama_config.json](open_llama_config.json) is meant to be a starting point for optimizing Open LLaMA for target hardware. One can add additional passes as well as set different options for the Transformer Optimization pass as needed. See the [Olive documentation](https://microsoft.github.io/Olive/) for more information on the available optimization passes.

Note that this example config uses [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b) for demonstration purposes. Other models from the [Open LLaMA](https://huggingface.co/openlm-research) family can be optimized the same way. The following table shows the configurations of a few of them:

| Model | Num Hidden Layers| Num Attention Heads | Hidden Size |
| --- | --- | --- | --- |
| [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b) | 26 | 32 | 3200 |
| [openlm-research/open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b) | 32 | 32 | 4096 |
| [openlm-research/open_llama_13b](https://huggingface.co/openlm-research/open_llama_13b) | 40 | 40 | 5120 |
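
These values can also be read programmatically from each checkpoint's Hugging Face configuration. Below is a minimal sketch, assuming the `transformers` package is installed and the Hugging Face Hub is reachable:

```python
from transformers import AutoConfig

# Fetch the model configuration for the checkpoint you plan to optimize.
config = AutoConfig.from_pretrained("openlm-research/open_llama_3b")

# These are the values reported in the table above for each Open LLaMA variant.
print("num_hidden_layers:  ", config.num_hidden_layers)    # 26 for open_llama_3b
print("num_attention_heads:", config.num_attention_heads)  # 32 for open_llama_3b
print("hidden_size:        ", config.hidden_size)          # 3200 for open_llama_3b
```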


When you run the example config for other, larger models, you may need to:
1. change the `model_path` to the model you use:
```json
"input_model":{
"type": "OptimumModel",
"config": {
"model_path": "openlm-research/open_llama_3b", // to change based on the model you use
"model_components": ["decoder_model.onnx", "decoder_with_past_model.onnx"],
"hf_config": {
"model_class": "LlamaForCausalLM"
}
}
}
```
2. change the transformer optimization pass options in `open_llama_config.json` based on the above table:
```json
"optimize": {
"type": "OrtTransformersOptimization",
"config": {
"model_type": "gpt2",
"float16": true,
"use_gpu": false,
"keep_io_types": true,
"num_heads": 32, // to change based on the model you use
"hidden_size": 4096, // to change based on the model you use
"optimization_options": {
"use_multi_head_attention": false
}
}
}
```
3. increase `num_hidden_layers` for the dummy inputs in `user_script.py` (a usage sketch follows this list):
```python
# increase num_hidden_layers so the dummy inputs match the model's layer count
def dummy_inputs(batch_size, torch_dtype, model_framework=Framework.PYTORCH, num_hidden_layers=26):
    past_sequence_length = 1
    attention_mask_sequence_length = 1
```
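
As a usage sketch for step 3, the snippet below calls `dummy_inputs` from `user_script.py` with `num_hidden_layers=32`, the value for `open_llama_7b` in the table above. It assumes it is run from `examples/open_llama` with Olive installed; the `Framework` import path mirrors the one used by the example script.

```python
import torch
from olive.constants import Framework  # enum used by user_script.py

from user_script import dummy_inputs

# Build ONNX dummy inputs for open_llama_7b, which has 32 hidden layers
# (see the table above); the open_llama_3b default is num_hidden_layers=26.
inputs = dummy_inputs(
    batch_size=1,
    torch_dtype=torch.float32,
    model_framework=Framework.ONNX,
    num_hidden_layers=32,
)
```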

### Run sample using config

The optimization techniques to run are specified in the relevant config json file.
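
The config can also be executed from Python; a minimal sketch, assuming the `olive-ai` package is installed and the script is run from `examples/open_llama`:

```python
# Run the Olive workflow described by the example config.
from olive.workflows import run as olive_run

olive_run("open_llama_config.json")
```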
4 changes: 2 additions & 2 deletions examples/open_llama/user_script.py
@@ -19,7 +19,7 @@ def __getitem__(self, idx):
        return self.create_input_func(self.batch_size, self.torch_dtype, self.model_framework), label


-def dummy_inputs(batch_size, torch_dtype, model_framework=Framework.PYTORCH):
+def dummy_inputs(batch_size, torch_dtype, model_framework=Framework.PYTORCH, num_hidden_layers=26):
    past_sequence_length = 1
    attention_mask_sequence_length = 1
    sequence_length = 2
@@ -30,7 +30,7 @@ def dummy_inputs(batch_size, torch_dtype, model_framework=Framework.PYTORCH):
    }

    if model_framework == Framework.ONNX:
-        for layer_index in range(26):
+        for layer_index in range(num_hidden_layers):
            inputs[f"past_key_values.{layer_index}.key"] = torch.rand(
                (batch_size, 32, past_sequence_length, 100), dtype=torch_dtype
            )
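
For reference, the hard-coded `32` and `100` in the tensor shape above are the attention head count and per-head size (`hidden_size / num_heads = 3200 / 32 = 100`) of `open_llama_3b`. The sketch below only illustrates that arithmetic using the table in the README; the helper function is hypothetical:

```python
# Hypothetical helper: derive the past key/value tensor shape for an
# Open LLaMA variant from the values in examples/open_llama/README.md.
MODEL_DIMS = {
    "openlm-research/open_llama_3b": {"num_heads": 32, "hidden_size": 3200},
    "openlm-research/open_llama_7b": {"num_heads": 32, "hidden_size": 4096},
    "openlm-research/open_llama_13b": {"num_heads": 40, "hidden_size": 5120},
}

def past_kv_shape(model_id, batch_size=1, past_sequence_length=1):
    dims = MODEL_DIMS[model_id]
    head_size = dims["hidden_size"] // dims["num_heads"]
    return (batch_size, dims["num_heads"], past_sequence_length, head_size)

print(past_kv_shape("openlm-research/open_llama_3b"))  # (1, 32, 1, 100), as in the code above
print(past_kv_shape("openlm-research/open_llama_7b"))  # (1, 32, 1, 128)
```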
