
Commit

Update README.md to add python formatting in code examples (microsoft…
imJunaidAfzal authored and LeetJoe committed Sep 15, 2023
1 parent 98954fd commit 841bfa9
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions inference/huggingface/zero_inference/README.md
@@ -115,7 +115,7 @@ INT4 weight quantization can be easily enabled with a few lines of configuration

### Quantized Initialization
This is the easiest way to get started. By providing a few lines of hints in ds_config, the model is quantized on the fly during model initialization (e.g., AutoModel.from_pretrained). All candidate layers will be automatically quantized.
-```
+```python
ds_config = {
'weight_quantization': {
'quantized_initialization': {
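
The diff truncates the rest of this block. For reference, a complete quantized-initialization config might look roughly like the sketch below; the inner fields (num_bits, group_size, group_dim, symmetric) are assumptions for illustration, not part of this commit.

```python
# Sketch of a complete quantized-initialization config. The truncated diff
# above only shows the outer keys; the inner fields here are illustrative
# assumptions about the weight-quantization options.
ds_config = {
    'weight_quantization': {
        'quantized_initialization': {
            'num_bits': 4,       # INT4 weights
            'group_size': 64,    # size of each quantization group
            'group_dim': 1,      # dimension along which groups are formed
            'symmetric': False,  # asymmetric quantization with a zero point
        },
    },
}
```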
@@ -135,7 +135,7 @@ Currently, ZeRO-inference can quantize the weight matrix of nn.Embedding and nn.
### Post Initialization Quantization
In this mode, the model is first loaded in FP16 format and then converted into INT4. The advantage of enabling this mode is that users get an overview of the model architecture and therefore fine-grained control over quantization decisions, e.g., which layer should be quantized with which quantization configuration. Only a few lines of code changes are needed. Note that we plan to expand this mode to accommodate more formats in the near future.
-```
+```python
from deepspeed.compression.inference.quantization import _init_group_wise_weight_quantization
ds_config = {
'weight_quantization': {
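
The diff also truncates this block. Below is a hedged sketch of how the post-initialization path might be used end to end; only the import line is taken from the diff, while the 'post_init_quant' key, the layer-name pattern, and the model chosen here are illustrative assumptions.

```python
# Sketch of post-initialization quantization. The import comes from the
# diff above; the 'post_init_quant' key, the 'fc' layer pattern, and the
# model are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM

from deepspeed.compression.inference.quantization import (
    _init_group_wise_weight_quantization,
)

# Load the model in FP16 first, so its architecture can be inspected
# before deciding which layers to quantize.
model = AutoModelForCausalLM.from_pretrained(
    'facebook/opt-1.3b', torch_dtype=torch.float16
)

ds_config = {
    'weight_quantization': {
        'post_init_quant': {
            # Quantize only layers whose names match this pattern.
            'fc': {
                'num_bits': 4,
                'group_size': 64,
                'group_dim': 1,
                'symmetric': False,
            },
        },
    },
}

# Convert the selected FP16 weight matrices to INT4 in place.
model = _init_group_wise_weight_quantization(model, ds_config)
```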
