add gpt-j alpha-tuning example and fix md typo (#783)
Signed-off-by: Lu, Yintong <[email protected]>
(cherry picked from commit 2dd4303)
yintong-lu authored and chensuyue committed May 9, 2023
1 parent 5e9fb35 commit 3b7d282
Showing 4 changed files with 17 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/source/smooth_quant.md
@@ -24,7 +24,7 @@ $$

where $X_{fp32}$ is the input matrix, $S$ is the scale factor, $Z$ is the integer zero point.

### Per-tenor & Per-channel
### Per-tensor & Per-channel

There are several choices for how quantization parameters are shared among tensor elements, also called quantization granularity. At the coarsest level, per-tensor granularity, all elements in the tensor share the same quantization parameters. Finer granularity means sharing quantization parameters per row or per column for 2D matrices and per channel for 3D matrices. The finest granularity gives each element its own parameters.
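
As a quick illustration of these granularity options, here is a minimal NumPy sketch; the `asym_quant_params` helper and the array shape are toy assumptions for illustration, not Neural Compressor's API:

```python
import numpy as np

def asym_quant_params(x, n_bits=8):
    """Return a (scale, zero_point) pair for asymmetric quantization of x."""
    qmax = 2 ** n_bits - 1
    scale = (x.max() - x.min()) / qmax
    zero_point = np.round(-x.min() / scale)
    return scale, zero_point

x = np.random.randn(4, 8).astype(np.float32)  # toy 2D weight/activation matrix

# Per-tensor: a single (scale, zero_point) pair shared by every element.
s_t, z_t = asym_quant_params(x)

# Per-channel (here, per row): one (scale, zero_point) pair per channel.
per_channel = [asym_quant_params(row) for row in x]

print("per-tensor scale/zero-point:", s_t, z_t)
print("per-channel scales:", [s for s, _ in per_channel])
```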

2 changes: 1 addition & 1 deletion examples/README.md
@@ -579,7 +579,7 @@ Intel® Neural Compressor validated examples with multiple compression technique
<td>EleutherAI/gpt-j-6B</td>
<td>Natural Language Processing</td>
<td>Post-Training Static Quantization</td>
<td><a href="./pytorch/nlp/huggingface_models/language-modeling/quantization/ptq_static/fx">fx</a></td>
<td><a href="./pytorch/nlp/huggingface_models/language-modeling/quantization/ptq_static/fx">fx</a> / <a href="./pytorch/nlp/huggingface_models/language-modeling/quantization/ptq_static/ipex/smooth_quant">smooth quant</a></td>
</tr>
<tr>
<td>abeja/gpt-neox-japanese-2.7b</td>
@@ -34,6 +34,18 @@ python eval_lambada.py \
--alpha auto
```

#### For the GPT-J model, please enable the `fallback_add` option
```shell
python eval_lambada.py \
--model_name_or_path EleutherAI/gpt-j-6B \
--int8 \
--sq \
--alpha auto \
--fallback_add
```
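
For reference, a sketch of the configuration these flags roughly correspond to; the `recipes` keys shown here are an assumption (the script only passes `recipes=recipes`), while `op_type_dict` mirrors the eval_lambada.py change further down:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# SmoothQuant with automatic alpha tuning (--sq --alpha auto); the exact recipe
# keys are assumed here, not copied from the diff.
recipes = {"smooth_quant": True, "smooth_quant_args": {"alpha": "auto"}}

# --fallback_add keeps the 'add' ops in fp32, as in eval_lambada.py below.
op_type_dict = {"add": {"weight": {"dtype": ["fp32"]}, "activation": {"dtype": ["fp32"]}}}

conf = PostTrainingQuantConfig(
    quant_level=1,
    backend="ipex",
    excluded_precisions=["bf16"],
    recipes=recipes,
    op_type_dict=op_type_dict,
)
# q_model = quantization.fit(model, conf, calib_dataloader=..., eval_func=...)
```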



## Benchmarking

int8 benchmarking
@@ -17,6 +17,7 @@
parser.add_argument('--log_frequency', type=int, default=100)
parser.add_argument('--batch_size', type=int, default=16)
parser.add_argument('--kl', action='store_true', default=False, help="whether to use kl divergence for calibration")
parser.add_argument('--fallback_add', action='store_true', default=False, help="whether to add fp32 fallback option")
args = parser.parse_args()

from torch.nn.functional import pad
@@ -147,6 +148,8 @@ def eval_func(model):
op_type_dict = None
if args.kl:
    op_type_dict = {'linear': {'activation': {'algorithm': ['kl']}}}
if args.fallback_add:
    # initialize the dict when --kl was not passed, then keep 'add' ops in fp32
    op_type_dict = op_type_dict or {}
    op_type_dict["add"] = {"weight": {"dtype": ["fp32"]}, "activation": {"dtype": ["fp32"]}}

conf = PostTrainingQuantConfig(quant_level=1, backend='ipex', excluded_precisions=["bf16"],  # use basic tuning
recipes=recipes,
