add gpt-j alpha-tuning example and fix md typo (#783)
Signed-off-by: Lu, Yintong <[email protected]>
(cherry picked from commit 2dd4303)
yintong-lu authored and chensuyue committed May 9, 2023
1 parent 5e9fb35 commit 3b7d282
Showing 4 changed files with 17 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/source/smooth_quant.md
@@ -24,7 +24,7 @@ $$

where $X_{fp32}$ is the input matrix, $S$ is the scale factor, $Z$ is the integer zero point.

### Per-tenor & Per-channel
### Per-tensor & Per-channel

There are several choices for how quantization parameters are shared among tensor elements, also called quantization granularity. At the coarsest level, per-tensor granularity, all elements in the tensor share the same quantization parameters. Finer granularity means sharing quantization parameters per row or per column for 2D matrices and per channel for 3D matrices. The finest granularity gives each element its own parameters.
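
As a quick illustration of these granularity options, here is a minimal NumPy sketch; the `asym_quant_params` helper and the array shape are toy assumptions for illustration, not Neural Compressor's API:

```python
import numpy as np

def asym_quant_params(x, n_bits=8):
    """Return a (scale, zero_point) pair for asymmetric quantization of x."""
    qmax = 2 ** n_bits - 1
    scale = (x.max() - x.min()) / qmax
    zero_point = np.round(-x.min() / scale)
    return scale, zero_point

x = np.random.randn(4, 8).astype(np.float32)  # toy 2D weight/activation matrix

# Per-tensor: a single (scale, zero_point) pair shared by every element.
s_t, z_t = asym_quant_params(x)

# Per-channel (here, per row): one (scale, zero_point) pair per channel.
per_channel = [asym_quant_params(row) for row in x]

print("per-tensor scale/zero-point:", s_t, z_t)
print("per-channel scales:", [s for s, _ in per_channel])
```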

2 changes: 1 addition & 1 deletion examples/README.md
@@ -579,7 +579,7 @@ Intel® Neural Compressor validated examples with multiple compression technique
<td>EleutherAI/gpt-j-6B</td>
<td>Natural Language Processing</td>
<td>Post-Training Static Quantization</td>
<td><a href="./pytorch/nlp/huggingface_models/language-modeling/quantization/ptq_static/fx">fx</a></td>
<td><a href="./pytorch/nlp/huggingface_models/language-modeling/quantization/ptq_static/fx">fx</a> / <a href="./pytorch/nlp/huggingface_models/language-modeling/quantization/ptq_static/ipex/smooth_quant">smooth quant</a></td>
</tr>
<tr>
<td>abeja/gpt-neox-japanese-2.7b</td>
@@ -34,6 +34,18 @@ python eval_lambada.py \
--alpha auto
```

#### For the GPT-J model, please enable the `fallback_add` option
```shell
python eval_lambada.py \
--model_name_or_path EleutherAI/gpt-j-6B \
--int8 \
--sq \
--alpha auto \
--fallback_add
```
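
For reference, a sketch of the configuration these flags roughly correspond to; the `recipes` keys shown here are an assumption (the script only passes `recipes=recipes`), while `op_type_dict` mirrors the eval_lambada.py change further down:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# SmoothQuant with automatic alpha tuning (--sq --alpha auto); the exact recipe
# keys are assumed here, not copied from the diff.
recipes = {"smooth_quant": True, "smooth_quant_args": {"alpha": "auto"}}

# --fallback_add keeps the 'add' ops in fp32, as in eval_lambada.py below.
op_type_dict = {"add": {"weight": {"dtype": ["fp32"]}, "activation": {"dtype": ["fp32"]}}}

conf = PostTrainingQuantConfig(
    quant_level=1,
    backend="ipex",
    excluded_precisions=["bf16"],
    recipes=recipes,
    op_type_dict=op_type_dict,
)
# q_model = quantization.fit(model, conf, calib_dataloader=..., eval_func=...)
```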



## Benchmarking

int8 benchmarking
@@ -17,6 +17,7 @@
parser.add_argument('--log_frequency', type=int, default=100)
parser.add_argument('--batch_size', type=int, default=16)
parser.add_argument('--kl', action='store_true', default=False, help="whether to use kl divergence for calibration")
parser.add_argument('--fallback_add', action='store_true', default=False, help="whether to add fp32 fallback option")
args = parser.parse_args()

from torch.nn.functional import pad
@@ -147,6 +148,8 @@ def eval_func(model):
op_type_dict = None
if args.kl:
    op_type_dict = {'linear': {'activation': {'algorithm': ['kl']}}}
if args.fallback_add:
    # initialize the dict when --kl was not passed, then keep 'add' ops in fp32
    op_type_dict = op_type_dict or {}
    op_type_dict["add"] = {"weight": {"dtype": ["fp32"]}, "activation": {"dtype": ["fp32"]}}

conf = PostTrainingQuantConfig(quant_level=1, backend='ipex', excluded_precisions=["bf16"],  # use basic tuning
recipes=recipes,
