From 841bfa9cd78ffdf68915e62edc40aa6682b0c3c9 Mon Sep 17 00:00:00 2001
From: Junaid Afzal <54235418+imJunaidAfzal@users.noreply.github.com>
Date: Tue, 12 Sep 2023 20:50:58 +0500
Subject: [PATCH] Update README.md to add python formatting in code examples
 (#729)

---
 inference/huggingface/zero_inference/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/inference/huggingface/zero_inference/README.md b/inference/huggingface/zero_inference/README.md
index f720414b8..d76de89cd 100644
--- a/inference/huggingface/zero_inference/README.md
+++ b/inference/huggingface/zero_inference/README.md
@@ -115,7 +115,7 @@ INT4 weight quantization can be easily enabled with a few lines of configuration
 
 ### Quantized Initialization
 This is the easiest way to getting started. By providing a few lines of hints in ds_config, the model will be on-the-fly quantized during model initialization (e.g., AutoModel.from_pretrained). All candidate layers will be automatically quantized.
-```
+```python
 ds_config = {
     'weight_quantization': {
         'quantized_initialization': {
@@ -135,7 +135,7 @@ Currently, ZeRO-inference can quantize the weight matrix of nn.Embedding and nn.
 
 ### Post Initialization Quantization
 In this mode, model is first loaded in FP16 format and then convert into INT4. The advantage of enabling this mode is that users will have an overview of the model architecture. Thus, they will have fine-grained control over the quantization decision. For example, which layer should be quantized with which quantization configuration can be controlled. Only a few lines of code changes are needed. Note that we plan to expand this mode to accommodate more formats in the near future.
-```
+```python
 from deepspeed.compression.inference.quantization import _init_group_wise_weight_quantization
 ds_config = {
     'weight_quantization': {
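
For context on what the retagged snippets configure, here is a minimal sketch of how the quantized-initialization hints shown in the first hunk might be used end to end. The `weight_quantization`/`quantized_initialization` keys come straight from the diff; the inner parameter values (`num_bits`, `group_size`), the `zero_optimization` block, and the model checkpoint are illustrative assumptions, not taken from the patch.

```python
# Illustrative sketch only: shows where the ds_config hints from the patched
# README would plug in. The 'weight_quantization'/'quantized_initialization'
# keys appear in the diff; the inner values, the zero_optimization block, and
# the model name are assumptions made for the sake of a runnable example.
import torch
from transformers import AutoModelForCausalLM
from transformers.deepspeed import HfDeepSpeedConfig

model_name = "facebook/opt-1.3b"  # assumed example checkpoint

ds_config = {
    'weight_quantization': {
        'quantized_initialization': {
            'num_bits': 4,     # assumed: INT4 weights
            'group_size': 64,  # assumed: per-group quantization granularity
        },
    },
    'zero_optimization': {'stage': 3},  # assumed: ZeRO-3 so init-time hooks fire
    'train_micro_batch_size_per_gpu': 1,
}

# Keeping the HfDeepSpeedConfig object alive signals transformers to apply the
# DeepSpeed hints during from_pretrained, so candidate layers are quantized
# on the fly while the weights are being loaded.
dschf = HfDeepSpeedConfig(ds_config)  # must remain in scope
with torch.no_grad():
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
```

The second hunk covers the post-initialization path: the README there loads the FP16 model first and then converts it using the imported `_init_group_wise_weight_quantization` helper together with a similar ds_config; the exact call signature is not shown in this patch and should be checked against the README and the DeepSpeed source.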