diff --git a/README.md b/README.md
index 1b9038f95eb..d6c430bf78b 100755
--- a/README.md
+++ b/README.md
@@ -37,9 +37,7 @@ Intel® Neural Compressor has been one of the critical AI software components in
 # install stable version from from conda
 conda install neural-compressor -c conda-forge -c intel
 ```
-More installation methods can be found at [Installation Guide](./docs/installation_guide.md).
-> **Note:**
-> Run into installation issues, please check [FAQ](./docs/faq.md).
+More installation methods can be found at [Installation Guide](./docs/installation_guide.md). Please check out our [FAQ](./docs/faq.md) for more details.
 ## Getting Started
 * Quantization with Python API
@@ -122,8 +120,8 @@ Intel® Neural Compressor supports systems based on [Intel 64 architecture or co
-> Note: 1.Starting from official TensorFlow 2.6.0, oneDNN has been default in the binary. Please set the environment variable TF_ENABLE_ONEDNN_OPTS=1 to enable the oneDNN optimizations.
-> 2.Starting from official TensorFlow 2.9.0, oneDNN optimizations are enabled by default on CPUs with neural-network-focused hardware features such as AVX512_VNNI, AVX512_BF16, AMX, etc. No need to set environment variable.
+> **Note:**
+> Please set the environment variable TF_ENABLE_ONEDNN_OPTS=1 to enable oneDNN optimizations if you are using TensorFlow from v2.6 to v2.8. oneDNN has been fully default from TensorFlow v2.9.
 ### Validated Models
 Intel® Neural Compressor validated 420+ [examples](./examples) with performance speedup geomean 2.2x and up to 4.2x on VNNI while minimizing the accuracy loss.
@@ -143,7 +141,7 @@ More details for validated models are available [here](docs/validated_model_list
Types | +Quantization | +Dataset Requirements | +Framework | +Backend | +
---|---|---|---|---|
Post-Training Static Quantization (PTQ) | +weights and activations | +calibration | +PyTorch | +PyTorch Eager/PyTorch FX/IPEX | +
TensorFlow | +TensorFlow/Intel TensorFlow | +|||
ONNX Runtime | +QLinearops/QDQ | +|||
Post-Training Dynamic Quantization | +weights | +none | +PyTorch | +PyTorch eager mode/PyTorch fx mode/IPEX | +
ONNX Runtime | +QIntegerops | +|||
Quantization-aware Training (QAT) | +weights and activations | +fine-tuning | +PyTorch | +PyTorch eager mode/PyTorch fx mode/IPEX | +
TensorFlow | +TensorFlow/Intel TensorFlow | +
System Configuration | Intel Xeon Platinum 8380 Scalable processor |
---|---|
Manufacturer | +Intel Corporation | +
Product Name | +M50CYP2SBSTD | +
BIOS Version | +SE5C6200.86B.0022.D64.2105220049 | +
OS | +Ubuntu 20.04.1 LTS | +
Kernel | +5.4.0-42-generic | +
Microcode | +0xd0002b1 | +
CPU Model | +Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | +
Base Frequency | +2.3GHZ | +
Thread(s) per Core | +2 | +
Core(s) per Socket | +40 | +
Socket(s) | +2 | +
Turbo | +Enabled | +
Power & Perf Policy | +Balanced | +
Installed | +256GB (16x16GB DDR4 3200MT/s [3200MT/s]) | +
NIC Summary | +2x Ethernet Controller 10G X550T | +
Drive Summary | +1x INTEL_SSDSC2KW01 953.9G, +1x CT1000MX500SSD1 931.5G, +1x CT1000MX500SSD1 931.5G + | +
Pruning Type | +Pruning Granularity | +Pruning Algorithm | +Framework | +
---|---|---|---|
Unstructured Pruning | +Element-wise | +Magnitude | +PyTorch, TensorFlow | +
Pattern Lock | +PyTorch | +||
Structured Pruning | +Filter/Channel-wise | +Gradient Sensitivity | +PyTorch | +
Block-wise | +Group Lasso | +PyTorch | +|
Element-wise | +Pattern Lock | +PyTorch | +
Framework | -version | -model | +Model | Accuracy | -Performance 1s4c10ins1bs/throughput (samples/sec) |
+ Performance throughput (samples/sec) |
+ Example | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
INT8 | FP32 | -Acc Ratio[(INT8-FP32)/FP32] | +Accuracy Ratio[(INT8-FP32)/FP32] | INT8 | FP32 | Performance Ratio[INT8/FP32] | ||||||||||
BERT large SQuAD | +92.39 | +92.99 | +-0.64% | +25.32 | +12.53 | +2.02x | +pb | +|||||||||
DenseNet121 | +73.57% | +72.89% | +0.93% | +370.52 | +329.74 | +1.12x | +pb | +|||||||||
DenseNet161 | +76.24% | +76.29% | +-0.07% | +219.46 | +180.75 | +1.21x | +pb | +|||||||||
DenseNet169 | +74.40% | +74.65% | +-0.33% | +301.33 | +259.88 | +1.16x | +pb | +|||||||||
Faster R-CNN Inception ResNet V2 | +37.98% | +38.33% | +-0.91% | +3.96 | +2.34 | +1.69x | +pb | +|||||||||
Faster R-CNN Inception ResNet V2 | +37.84% | +38.33% | +-1.28% | +3.98 | +2.31 | +1.72x | +SavedModel | +|||||||||
Faster R-CNN ResNet101 | +30.28% | +30.39% | +-0.36% | +70 | +19.98 | +3.50x | +pb | +|||||||||
Faster R-CNN ResNet101 | +30.37% | +30.39% | +-0.07% | +70.26 | +16.98 | +4.14x | +SavedModel | +|||||||||
intel-tensorflow | -2.7.0 | -resnet50v1.5 | -76.82% | -76.46% | -0.47% | -1239.52 | -433.07 | -2.86x | +Inception ResNet V2 | +80.44% | +80.40% | +0.05% | +281.79 | +137.91 | +2.04x | +pb |
intel-tensorflow | -2.7.0 | -resnet101 | -77.50% | -76.45% | -1.37% | -874.41 | -352.91 | -2.48x | +Inception V1 | +70.48% | +69.74% | +1.06% | +2193.17 | +975.6 | +2.25x | +pb |
intel-tensorflow | -2.7.0 | -inception_v2 | +Inception V2 | 74.36% | 73.97% | 0.53% | -1840.78 | -853.52 | -2.16x | +1835.35 | +838.82 | +2.19x | +pb | |||
intel-tensorflow | -2.7.0 | -inception_v3 | +Inception V3 | 77.28% | 76.75% | 0.69% | -954.63 | -391.35 | -2.44x | +973.42 | +376.3 | +2.59x | +pb | |||
intel-tensorflow | -2.7.0 | -inception_v4 | +Inception V4 | 80.40% | 80.27% | 0.16% | -580.02 | -202.14 | +575.9 | +200.55 | 2.87x | +pb | +||||
Mask R-CNN Inception V2 | +28.53% | +28.73% | +-0.70% | +132.51 | +50.3 | +2.63x | +pb | |||||||||
intel-tensorflow | -2.7.0 | -mobilenetv1 | +Mask R-CNN Inception V2 | +28.53% | +28.73% | +-0.70% | +132.89 | +50.97 | +2.61x | +ckpt | +||||||
MobileNet V1 | 71.79% | 70.96% | 1.17% | -3587.79 | -1343.07 | -2.67x | +3545.79 | +1191.94 | +2.97x | +pb | ||||||
intel-tensorflow | -2.7.0 | -mobilenetv2 | +MobileNet V2 | 71.89% | 71.76% | 0.18% | -2469.92 | -1434.87 | -1.72x | +2431.66 | +1420.11 | +1.71x | +pb | |||
intel-tensorflow | -2.7.0 | -ssd_resnet50_v1 | -37.86% | -38.00% | --0.37% | -70.35 | -26.34 | -2.67x | +ResNet101 | +77.50% | +76.45% | +1.37% | +877.91 | +355.49 | +2.47x | +pb |
intel-tensorflow | -2.7.0 | -ssd_mobilenet_v1 | -22.97% | -23.13% | --0.69% | -852.80 | -460.33 | +ResNet50 Fashion | +77.80% | +78.12% | +-0.41% | +3977.5 | +2150.68 | 1.85x | +pb | |
intel-tensorflow | -2.7.0 | -faster_rcnn_inception_resnet_v2 | -37.99% | -38.33% | --0.89% | -4.06 | -2.33 | -1.74x | -||||||||
intel-tensorflow | -2.7.0 | -faster_rcnn_resnet101_saved | -30.37% | -30.39% | --0.07% | -69.69 | -17.71 | -3.94x | -||||||||
intel-tensorflow | -2.7.0 | -mask_rcnn_inception_v2 | -28.54% | -28.72% | --0.63% | -123.97 | -53.23 | -2.33x | +ResNet50 V1.0 | +74.11% | +74.27% | +-0.22% | +1509.64 | +472.66 | +3.19x | +pb |
intel-tensorflow | -2.7.0 | -wide_deep_large_ds | -77.62% | -77.67% | --0.07% | -22704.16 | -21249.52 | -1.07x | +ResNet50 V1.5 | +76.82% | +76.46% | +0.47% | +1260.01 | +415.83 | +3.03x | +pb |
intel-tensorflow | -2.7.0 | -vgg16 | -72.66% | -70.89% | -2.50% | -669.62 | -178.75 | -3.75x | +ResNet V2 101 | +72.67% | +71.87% | +1.11% | +436.52 | +318.3 | +1.37x | +pb |
intel-tensorflow | -2.7.0 | -vgg19 | -72.72% | -71.01% | -2.41% | -558.43 | -148.19 | -3.77x | +ResNet V2 152 | +73.03% | +72.37% | +0.91% | +306.82 | +221.4 | +1.39x | +pb |
intel-tensorflow | -2.7.0 | -resnetv2_50 | +ResNet V2 50 | 70.33% | 69.64% | 0.99% | -765.73 | -580.54 | -1.32x | -|||||||
intel-tensorflow | -2.7.0 | -densenet121 | -73.57% | -72.89% | -0.93% | -366.59 | -296.63 | -1.24x | -||||||||
intel-tensorflow | -2.7.0 | -densenet161 | -76.24% | -76.29% | --0.07% | -218.26 | -164.48 | -1.33x | -||||||||
intel-tensorflow | -2.7.0 | -densenet169 | -74.40% | -74.65% | --0.33% | -294.82 | -253.35 | -1.16x | +749.85 | +574.19 | +1.31x | +pb | ||||
intel-tensorflow | -2.7.0 | -ssd_resnet50_v1_ckpt | -37.81% | -38.00% | --0.50% | -70.47 | -21.79 | -3.23x | +SSD MobileNet V1 | +22.97% | +23.13% | +-0.69% | +952.9 | +582.87 | +1.63x | +pb |
intel-tensorflow | -2.7.0 | -ssd_mobilenet_v1_ckpt | +SSD MobileNet V1 | 22.99% | 23.13% | -0.61% | -852.49 | -386.90 | -2.20x | +954.92 | +413.24 | +2.31x | +ckpt | |||
intel-tensorflow | -2.7.0 | -mask_rcnn_inception_v2_ckpt | -28.54% | -28.72% | --0.63% | -131.43 | -51.09 | -2.57x | -||||||||
intel-tensorflow | -2.7.0 | -resnet50v1.0 | -74.11% | -74.27% | --0.22% | -1543.95 | -501.61 | -3.08x | -||||||||
intel-tensorflow | -2.7.0 | -ssd_resnet34 | +SSD ResNet34 | 21.69% | 22.09% | -1.81% | -43.71 | -11.78 | -3.71x | +44.46 | +11.81 | +3.76x | +pb | |||
intel-tensorflow | -2.7.0 | -inception_v1 | -70.48% | -69.74% | -1.06% | -2227.69 | -1051.64 | -2.12x | -||||||||
intel-tensorflow | -2.7.0 | -faster_rcnn_inception_resnet_v2_saved | -37.90% | -38.33% | --1.12% | -4.05 | -2.33 | -1.74x | -||||||||
intel-tensorflow | -2.7.0 | -faster_rcnn_resnet101 | -30.28% | -30.39% | --0.36% | -69.74 | -19.90 | -3.50x | +SSD ResNet50 V1 | +37.86% | +38.00% | +-0.37% | +69.5 | +26.04 | +2.67x | +pb |
intel-tensorflow | -2.7.0 | -resnetv2_101 | -72.67% | -71.87% | -1.11% | -444.06 | -329.70 | -1.35x | +SSD ResNet50 V1 | +37.81% | +38.00% | +-0.50% | +69.27 | +21.17 | +3.27x | +ckpt |
intel-tensorflow | -2.7.0 | -inception_resnet_v2 | -80.44% | -80.40% | -0.05% | -284.40 | -143.73 | -1.98x | +VGG16 | +72.66% | +70.89% | +2.50% | +660.46 | +177.85 | +3.71x | +pb |
intel-tensorflow | -2.7.0 | -resnetv2_152 | -73.03% | -72.37% | -0.91% | -319.08 | -223.37 | -1.43x | +VGG19 | +72.72% | +71.01% | +2.41% | +562.04 | +147.61 | +3.81x | +pb |
intel-tensorflow | -2.7.0 | -resnet50_fashion | -77.80% | -78.12% | --0.41% | -3953.56 | -2170.49 | -1.82x | +Wide & Deep | +77.62% | +77.67% | +-0.07% | +21332.47 | +19714.08 | +1.08x | +pb |
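The two derived columns in these tables follow the formulas given in their headers: Accuracy Ratio is `(INT8-FP32)/FP32` and Performance Ratio is `INT8/FP32`. As a minimal sketch, the snippet below recomputes both for the ResNet50 V1.5 row of the TensorFlow table (76.82% vs. 76.46% accuracy, 1260.01 vs. 415.83 samples/sec). The helper names `accuracy_ratio` and `performance_ratio` are hypothetical and are not part of Intel® Neural Compressor. Rows computed from unrounded measurements may differ from this recomputation in the last digit.

```python
def accuracy_ratio(int8_acc: float, fp32_acc: float) -> float:
    """Relative accuracy change of INT8 vs. FP32: (INT8 - FP32) / FP32."""
    return (int8_acc - fp32_acc) / fp32_acc


def performance_ratio(int8_throughput: float, fp32_throughput: float) -> float:
    """Throughput speedup of INT8 over FP32: INT8 / FP32."""
    return int8_throughput / fp32_throughput


# Values taken from the ResNet50 V1.5 row of the TensorFlow table.
acc = accuracy_ratio(76.82, 76.46)
perf = performance_ratio(1260.01, 415.83)

print(f"Accuracy ratio: {acc:.2%}")       # 0.47%
print(f"Performance ratio: {perf:.2f}x")  # 3.03x
```

A positive accuracy ratio means the INT8 model scored slightly higher than the FP32 baseline, and a negative one means it lost accuracy.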
Framework | -version | -model | +Model | Accuracy | -Performance 1s4c10ins1bs/throughput (samples/sec) |
+ Performance throughput (samples/sec) |
+ Example | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
INT8 | FP32 | -Acc Ratio[(INT8-FP32)/FP32] | +Accuracy Ratio[(INT8-FP32)/FP32] | INT8 | FP32 | Performance Ratio[INT8/FP32] | ||||||||||
intel-tensorflow | -1.15.0-up3 | -bert_large_squad | -92.42 | -92.98 | --0.61% | -25.99 | -12.55 | -2.07x | +ALBERT base MRPC | +88.06% | +88.50% | +-0.50% | +34.28 | +29.54 | +1.16x | +eager |
intel-tensorflow | -1.15.0-up3 | -bert_base_mrpc | -86.52% | -86.52% | -0.00% | -266.15 | -145.02 | -1.84x | -||||||||
intel-tensorflow | -1.15.0-up3 | -resnet_v1_50_slim | -76.38% | -75.18% | -1.60% | -1515.24 | -409.44 | -3.70x | -||||||||
intel-tensorflow | -1.15.0-up3 | -resnet_v1_101_slim | -77.52% | -76.40% | -1.47% | -837.49 | -224.57 | -3.73x | -||||||||
intel-tensorflow | -1.15.0-up3 | -resnet_v1_152_slim | -77.08% | -76.81% | -0.35% | -587.75 | -152.39 | -3.86x | -||||||||
intel-tensorflow | -1.15.0-up3 | -inception_v1_slim | -70.49% | -69.77% | -1.03% | -1968.87 | -803.53 | -2.45x | -||||||||
intel-tensorflow | -1.15.0-up3 | -inception_v2_slim | -74.35% | -73.98% | -0.50% | -1591.25 | -658.54 | -2.42x | -||||||||
intel-tensorflow | -1.15.0-up3 | -inception_v3_slim | -78.32% | -77.99% | -0.42% | -941.48 | -285.17 | -3.30x | -||||||||
intel-tensorflow | -1.15.0-up3 | -inception_v4_slim | -80.30% | -80.19% | -0.14% | -512.74 | -143.42 | -3.58x | -||||||||
intel-tensorflow | -1.15.0-up3 | -vgg16_slim | -72.78% | -70.89% | -2.67% | -609.29 | -151.15 | -4.03x | +Barthez MRPC | +82.99% | +83.81% | +-0.97% | +166.84 | +89.56 | +1.86x | +eager |
intel-tensorflow | -1.15.0-up3 | -vgg19_slim | -72.60% | -71.01% | -2.24% | -510.33 | -122.87 | -4.15x | -||||||||
intel-tensorflow | -1.15.0-up3 | -resnetv2_50_slim | -70.47% | -69.72% | -1.08% | -823.59 | -470.80 | -1.75x | -||||||||
intel-tensorflow | -1.15.0-up3 | -resnetv2_101_slim | -72.62% | -71.91% | -0.99% | -471.451 | -247.627 | -1.90x | +BERT base COLA | +58.80% | +58.84% | +-0.07% | +260 | +126.47 | +2.06x | +fx |
intel-tensorflow | -1.15.0-up3 | -resnetv2_152_slim | -72.95% | -72.40% | -0.76% | -339.192 | -170.545 | +BERT base MRPC | +90.28% | +90.69% | +-0.45% | +251.79 | +126.46 | 1.99x | +fx |
Framework | -version | -model | -Accuracy | -Performance 1s4c10ins1bs/throughput (samples/sec) |
+ BERT base RTE | +69.31% | +69.68% | +-0.52% | +252.14 | +126.45 | +1.99x | +fx | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
INT8 | -FP32 | -Acc Ratio[(INT8-FP32)/FP32] | -INT8 | -FP32 | -Performance Ratio[INT8/FP32] | +BERT base SST2 | +91.97% | +91.86% | +0.12% | +258.98 | +126.42 | +2.05x | +fx | |||
pytorch | -1.10.0+cpu | -se_resnext50_32x4d | -79.04% | -79.08% | --0.05% | -350.90 | -171.32 | -2.05x | +BERT base STSB | +89.13% | +89.75% | +-0.68% | +249.57 | +126.39 | +1.97x | +fx |
pytorch | -1.10.0+cpu | -mobilenet_v2 | -70.54% | -71.84% | --1.81% | -707.15 | -490.61 | -1.44x | +BERT large COLA | +62.88% | +62.57% | +0.49% | +88.75 | +36.7 | +2.42x | +fx |
pytorch | -1.10.0+cpu | -rnnt | -92.48 | -92.54 | --0.07% | -75.74 | -20.44 | -3.71x | +BERT large MRPC | +89.93% | +90.38% | +-0.49% | +89.43 | +36.62 | +2.44x | +fx |
pytorch | -1.10.0+cpu | -barthez_mrpc | -82.99% | -83.81% | --0.97% | -155.80 | -89.41 | -1.74x | +BERT large QNLI | +90.96% | +91.82% | +-0.94% | +91.27 | +37 | +2.47x | +fx |
pytorch | -1.10.0+cpu | -longformer_mrpc | -90.59% | -91.46% | --0.95% | -21.29 | -17.15 | -1.24x | +BERT large RTE | +71.84% | +72.56% | +-1.00% | +77.62 | +36.01 | +2.16x | +fx |
pytorch | -1.10.0+cpu | -resnet18 | -69.57% | -69.76% | --0.27% | -749.77 | -377.16 | -1.99x | +CamemBERT base MRPC | +86.56% | +86.82% | +-0.30% | +241.39 | +124.77 | +1.93x | +eager |
pytorch | -1.10.0+cpu | -resnet50 | -75.98% | -76.15% | --0.21% | -487.25 | -199.64 | -2.44x | +Deberta MRPC | +91.17% | +90.91% | +0.28% | +152.09 | +85.13 | +1.79x | +eager |
pytorch | -1.10.0+cpu | -resnext101_32x8d | -79.03% | -79.31% | --0.35% | -198.94 | -73.88 | -2.69x | +DistilBERT base MRPC | +88.66% | +89.16% | +-0.56% | +415.09 | +246.9 | +1.68x | +eager |
pytorch | -1.10.0+cpu | -resnet18_qat | -69.74% | -69.76% | --0.03% | -750.71 | -379.57 | -1.98x | +DistilBERT base MRPC | +88.74% | +89.16% | +-0.47% | +459.93 | +245.33 | +1.87x | +fx |
pytorch | -1.10.0+cpu | -resnet50_qat | -76.04% | -76.15% | --0.14% | -478.44 | -197.69 | -2.42x | +FlauBERT MRPC | +81.01% | +80.19% | +1.01% | +644.05 | +457.32 | +1.41x | +eager |
pytorch | -1.10.0+cpu | -inception_v3 | +Inception V3 | 69.43% | 69.52% | -0.13% | -433.36 | -216.31 | -2.00x | +454.3 | +213.7 | +2.13x | +eager | |||
pytorch | -1.10.0+cpu | -peleenet | -71.64% | -72.10% | --0.64% | -479.00 | -377.54 | -1.27x | +Longformer MRPC | +90.59% | +91.46% | +-0.95% | +21.51 | +17.45 | +1.23x | +eager |
pytorch | -1.10.0+cpu | -yolo_v3 | -24.60% | -24.54% | -0.21% | -105.84 | -39.80 | -2.66x | +Mask R-CNN | +37.70% | +37.80% | +-0.26% | +17.61 | +5.76 | +3.06x | +eager |
pytorch | -1.10.0+cpu | -blendcnn | -68.40% | -68.40% | +mBart WNLI | +56.34% | +56.34% | 0.00% | -4997.74 | -4621.03 | -1.08x | +65.05 | +31.26 | +2.08x | +eager | |
pytorch | -1.10.0+cpu | -roberta_base_mrpc | -87.88% | -88.18% | --0.34% | -246.27 | -125.03 | -1.97x | +MobileNet V2 | +70.54% | +71.84% | +-1.81% | +740.97 | +535.54 | +1.38x | +eager |
pytorch | -1.10.0+cpu | -camembert_base_mrpc | -86.56% | -86.82% | --0.30% | -236.17 | -124.68 | -1.89x | +lvwerra/pegasus-samsum | +42.21 | +42.67 | +-1.09% | +3.89 | +1.14 | +3.41x | +eager |
pytorch | -1.10.0+cpu | -distilbert_base_mrpc | -88.66% | -89.16% | --0.56% | -422.29 | -246.37 | -1.71x | +PeleeNet | +71.64% | +72.10% | +-0.64% | +502.01 | +391.31 | +1.28x | +eager |
pytorch | -1.10.0+cpu | -albert_base_mrpc | -88.06% | -88.50% | --0.50% | -34.44 | -28.85 | -1.19x | +ResNet18 | +69.57% | +69.76% | +-0.27% | +800.43 | +381.27 | +2.10x | +eager |
pytorch | -1.10.0+cpu | -pegasus_samsum | -42.20 | -42.67 | --1.09% | -3.80 | -1.14 | -3.33x | +ResNet18 | +69.57% | +69.76% | +-0.28% | +811.09 | +389.36 | +2.08x | +fx |
pytorch | -1.10.0+cpu | -flaubert_mrpc | -81.01% | -80.19% | -1.01% | -672.25 | -457.05 | -1.47x | +ResNet50 | +75.98% | +76.15% | +-0.21% | +507.55 | +200.52 | +2.53x | +eager |
pytorch | -1.10.0+cpu | -deberta_mrpc | -91.17% | -90.91% | -0.28% | -131.09 | -79.85 | -1.64x | +ResNeXt101_32x8d | +79.08% | +79.31% | +-0.29% | +203.54 | +73.85 | +2.76x | +eager |
pytorch | -1.10.0+cpu | -squeezebert_mrpc | -87.77% | -87.65% | -0.14% | -239.56 | -209.01 | -1.15x | +RNN-T | +92.45 | +92.55 | +-0.10% | +79.21 | +20.47 | +3.87x | +eager |
pytorch | -1.10.0+cpu | -resnet18_fx | -69.57% | -69.76% | --0.28% | -761.15 | -379.99 | +Roberta Base MRPC | +87.88% | +88.18% | +-0.34% | +250.21 | +124.92 | 2.00x | +eager | |
pytorch | -1.10.0+cpu | -resnet18_qat_fx | -69.73% | -69.76% | --0.04% | -765.09 | -377.01 | -2.03x | +Se_ResNeXt50_32x4d | +78.98% | +79.08% | +-0.13% | +358.63 | +173.03 | +2.07x | +eager |
pytorch | -1.10.0+cpu | -transfo_xl_mrpc | +SqueezeBERT MRPC | +87.77% | +87.65% | +0.14% | +249.89 | +207.43 | +1.20x | +eager | +||||||
Transfo-xl MRPC | 81.97% | 81.20% | 0.94% | -11.10 | -8.22 | +11.25 | +8.34 | 1.35x | +eager | |||||||
pytorch | -1.10.0+cpu | -bert_base_mrpc | -90.28% | -90.69% | --0.45% | -241.46 | -125.09 | -1.93x | -||||||||
pytorch | -1.10.0+cpu | -bert_base_cola | -58.80% | -58.84% | --0.07% | -253.12 | -125.17 | -2.02x | -||||||||
pytorch | -1.10.0+cpu | -bert_base_sts-b | -89.13% | -89.75% | --0.68% | -243.50 | -124.54 | -1.96x | -||||||||
pytorch | -1.10.0+cpu | -bert_base_sst-2 | -91.97% | -91.86% | -0.12% | -252.00 | -121.14 | -2.08x | +YOLOv3 | +24.60% | +24.54% | +0.21% | +108.09 | +40.02 | +2.70x | +eager |
pytorch | -1.10.0+cpu | -bert_large_cola | -62.88% | -62.57% | -0.49% | -87.88 | -36.93 | -2.38x | +Model | +Accuracy | +Performance throughput (samples/sec) |
+ Example | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pytorch | -1.10.0+cpu | -bert_base_rte | -69.31% | -69.68% | --0.52% | -244.20 | -125.71 | -1.94x | +INT8 | +FP32 | +Accuracy Ratio[(INT8-FP32)/FP32] | +INT8 | +FP32 | +Performance Ratio[INT8/FP32] | ||
pytorch | -1.10.0+cpu | -bert_large_mrpc | -89.93% | -90.38% | --0.49% | -87.44 | -36.71 | -2.38x | +ResNet18 | +69.74% | +69.76% | +-0.03% | +804.76 | +388.67 | +2.07x | +eager |
pytorch | -1.10.0+cpu | -bert_large_qnli | -90.96% | -91.82% | --0.94% | -89.18 | -36.87 | -2.42x | +ResNet18 | +69.73% | +69.76% | +-0.04% | +806.44 | +386.59 | +2.09x | +fx |
pytorch | -1.10.0+cpu | -bert_large_rte | -71.84% | -72.56% | --1.00% | -75.91 | -36.72 | -2.07x | +BERT base MRPC QAT | +89.60% | +89.50% | +0.11% | +258.89 | +125.79 | +2.06x | +fx |
pytorch | -1.10.0+cpu | -mbart_wnli | -56.34% | -56.34% | -0.00% | -65.24 | -31.06 | -2.10x | +ResNet50 | +76.04% | +76.15% | +-0.14% | +490.64 | +203.49 | +2.41x | +eager |
Framework | -version | -model | +Model | Accuracy | -Performance 1s4c10ins1bs/throughput (samples/sec) |
+ Performance throughput (samples/sec) |
+ Example | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
INT8 | FP32 | -Acc Ratio[(INT8-FP32)/FP32] | +Accuracy Ratio[(INT8-FP32)/FP32] | INT8 | FP32 | Performance Ratio[INT8/FP32] | |||||||||
pytorch | -1.10.0+cpu | -resnet50_ipex | -76.14% | -76.15% | -0.00% | -654.50 | -202.31 | -3.24x | -|||||||
pytorch | -1.10.0+cpu | -bert_large_ipex | -92.77 | -93.16 | --0.41% | -29.74 | -13.61 | -2.18x | -|||||||
pytorch | -1.10.0+cpu | -resnext101_32x16d_wsl_ipex | +|||||||||||||
bert-large-uncased-whole-word-masking-finetuned-squad | +92.9 | +93.16 | +-0.28% | +37.13 | +11.45 | +3.24x | +ipex | +||||||||
ResNeXt101_32x16d_wsl | 84.02% | 84.17% | -0.18% | -157.78 | -28.54 | -5.53x | +163.45 | +28.9 | +5.66x | +ipex | +|||||
ResNet50 | +76.00% | +76.15% | +-0.20% | +707.86 | +202.02 | +3.51x | +ipex | ||||||||
pytorch | -1.10.0+cpu | -ssd_resnet34_ipex | -19.95% | +SSD ResNet34 | +19.97% | 20.00% | --0.25% | -30.50 | -8.50 | -3.59x | +-0.15% | +30.84 | +8.55 | +3.61x | +ipex |
Framework | -version | -model | +Model | Accuracy | -Performance 1s4c10ins1bs/throughput (samples/sec) |
+ Performance throughput (samples/sec) |
+ Example | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
INT8 | FP32 | -Acc Ratio[(INT8-FP32)/FP32] | +Accuracy Ratio[(INT8-FP32)/FP32] | INT8 | FP32 | Performance Ratio[INT8/FP32] | ||||||||||
mxnet | -1.7.0 | -inceptionv3 | -77.80% | -77.65% | -0.20% | -918.73 | -238.90 | -3.85x | -||||||||
mxnet | -1.7.0 | -squeezenet1.0 | -56.80% | -56.97% | --0.28% | -4693.55 | -1272.50 | -3.69x | -||||||||
mxnet | -1.7.0 | -ssd-mobilenet1.0 | -74.94% | -75.54% | --0.79% | -771.65 | -189.81 | -4.07x | -||||||||
mxnet | -1.7.0 | -resnet152_v1 | -78.28% | -78.54% | --0.33% | -574.23 | -126.78 | -4.53x | +AlexNet | +54.74% | +54.79% | +-0.09% | +1518.97 | +676.74 | +2.24x | +qlinearops | +
AlexNet | +54.74% | +54.79% | +-0.09% | +1411.3 | +652.6 | +2.16x | +qdq | +|||||||||
BERT base MRPC DYNAMIC | +85.54% | +86.03% | +-0.57% | +379.71 | +156.16 | +2.43x | +qlinearops | +|||||||||
BERT base MRPC STATIC | +85.29% | +86.03% | +-0.86% | +756.33 | +316.36 | +2.39x | +qlinearops | +|||||||||
BERT SQuAD | +80.44 | +80.67 | +-0.29% | +115.58 | +64.71 | +1.79x | +qlinearops | +|||||||||
BERT SQuAD | +80.44 | +80.67 | +-0.29% | +115.4 | +64.68 | +1.78x | +qdq | +|||||||||
CaffeNet | +56.19% | +56.30% | +-0.20% | +2786.79 | +802.7 | +3.47x | +qlinearops | +|||||||||
CaffeNet | +56.19% | +56.30% | +-0.20% | +2726.86 | +819.41 | +3.33x | +qdq | +|||||||||
DenseNet | +60.20% | +60.96% | +-1.25% | +404.83 | +340.63 | +1.19x | +qlinearops | +|||||||||
DistilBERT base MRPC | +84.56% | +84.56% | +0.00% | +1630.41 | +596.68 | +2.73x | +qlinearops | +|||||||||
EfficientNet | +77.58% | +77.70% | +-0.15% | +1985.35 | +1097.33 | +1.81x | +qlinearops | +|||||||||
Faster R-CNN | +33.99% | +34.37% | +-1.11% | +10.02 | +4.32 | +2.32x | +qlinearops | +|||||||||
Faster R-CNN | +33.94% | +34.37% | +-1.25% | +10.41 | +4.28 | +2.43x | +qdq |
Framework | -version | -model | -Accuracy | -Performance 1s4c10ins1bs/throughput (samples/sec) |
+ FCN | +64.66% | +64.98% | +-0.49% | +44.31 | +14.2 | +3.12x | +qlinearops | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
INT8 | -FP32 | -Acc Ratio[(INT8-FP32)/FP32] | -INT8 | -FP32 | -Performance Ratio[INT8/FP32] | +FCN | +64.66% | +64.98% | +-0.49% | +18.11 | +14.19 | +1.28x | +qdq | |||
onnxrt-runtime | -1.10.0 | -alexnet | -54.74% | -54.79% | --0.09% | -1505.75 | -656.81 | -2.29x | +GoogleNet | +67.61% | +67.79% | +-0.27% | +1165.84 | +810.65 | +1.44x | +qlinearops |
onnxrt-runtime | -1.10.0 | -zfnet | -55.89% | -55.96% | --0.13% | -661.16 | -353.20 | -1.87x | +GoogleNet | +67.61% | +67.79% | +-0.27% | +1165.73 | +809.98 | +1.44x | +qdq |
onnxrt-runtime | -1.10.0 | -efficientnet | -77.58% | -77.70% | --0.15% | -2065.72 | -1094.77 | -1.89x | +Inception V1 | +67.23% | +67.24% | +-0.01% | +1205.89 | +838.71 | +1.44x | +qlinearops |
onnxrt-runtime | -1.10.0 | -squeezenet_qdq | -56.55% | -56.87% | --0.56% | -5965.78 | -4300.12 | -1.39x | +Inception V1 | +67.23% | +67.24% | +-0.01% | +1204.93 | +843.16 | +1.43x | +qdq |
onnxrt-runtime | -1.10.0 | -ssd-12_qdq | -18.38% | -18.98% | --3.16% | -42.24 | -11.12 | -3.80x | +Mask R-CNN | +33.40% | +33.72% | +-0.95% | +8.56 | +3.76 | +2.27x | +qlinearops |
onnxrt-runtime | -1.10.0 | -resnet50_v1_5 | -72.28% | -72.29% | --0.01% | -1166.31 | -554.34 | -2.10x | +Mask R-CNN | +33.33% | +33.72% | +-1.16% | +8.4 | +3.81 | +2.20x | +qdq |
onnxrt-runtime | -1.10.0 | -bert_base_mrpc_static | -85.29% | +Mobile bert MRPC | 86.03% | --0.86% | -766.46 | -315.22 | -2.43x | +86.27% | +-0.28% | +790.11 | +686.35 | +1.15x | +qlinearops | |
onnxrt-runtime | -1.10.0 | -bert_base_mrpc_dynamic | -85.54% | -86.03% | --0.57% | -381.30 | -155.90 | -2.45x | +MobileBERT SQuAD MLPerf | +89.84 | +90.03 | +-0.20% | +102.92 | +95.19 | +1.08x | +qlinearops |
onnxrt-runtime | -1.10.0 | -mobilenet_v2 | +MobileNet V2 | 65.47% | 66.89% | -2.12% | -5128.93 | -3390.19 | +5133.84 | +3394.73 | 1.51x | +qlinearops | ||||
onnxrt-runtime | -1.10.0 | -ssd_mobilenet_v1 | -22.20% | -23.10% | --3.90% | -914.92 | -703.74 | -1.30x | +MobileNet V2 | +65.47% | +66.89% | +-2.12% | +5066.31 | +3386.3 | +1.50x | +qdq |
onnxrt-runtime | -1.10.0 | -ssd_mobilenet_v2 | -23.83% | -24.68% | --3.44% | -718.28 | -501.31 | -1.43x | +MobileNet V3 MLPerf | +75.59% | +75.74% | +-0.20% | +4133.22 | +2132.92 | +1.94x | +qlinearops |
onnxrt-runtime | -1.10.0 | -distilbert_base_mrpc | -84.56% | -84.56% | -0.00% | -1675.94 | -594.27 | -2.82x | +MobileNetV2 (ONNX Model Zoo) | +68.30% | +69.48% | +-1.70% | +5349.42 | +3373.29 | +1.59x | +qlinearops |
onnxrt-runtime | -1.10.0 | -mobilebert_mrpc | -85.54% | -86.27% | --0.85% | -766.00 | -684.30 | -1.12x | +ResNet50 V1.5 MLPerf | +76.13% | +76.46% | +-0.43% | +1139.56 | +549.88 | +2.07x | +qlinearops |
onnxrt-runtime | -1.10.0 | -resnet50-v1-12 | +ResNet50 V1.5 | +72.28% | +72.29% | +-0.01% | +1165.35 | +556.02 | +2.10x | +qlinearops | +||||||
ResNet50 V1.5 | +72.28% | +72.29% | +-0.01% | +1319.32 | +543.44 | +2.43x | +qdq | +|||||||||
ResNet50 V1.5 (ONNX Model Zoo) | 74.76% | 74.99% | -0.31% | -1380.38 | -581.36 | -2.37x | -||||||||||
onnxrt-runtime | -1.10.0 | -resnet_v1_5_mlperf | -76.13% | -76.46% | --0.43% | -1143.13 | -550.77 | -2.08x | +1363.39 | +573.1 | +2.38x | +qlinearops | ||||
onnxrt-runtime | -1.10.0 | -mobilenet_v3_mlperf | -75.59% | -75.74% | --0.20% | -4121.33 | -2135.31 | -1.93x | +Roberta Base MRPC | +90.44% | +89.95% | +0.54% | +811.05 | +312.71 | +2.59x | +qlinearops |
onnxrt-runtime | -1.10.0 | -shufflenet-v2-12 | +ShuffleNet V2 | 66.13% | 66.36% | -0.35% | -4901.74 | -2853.37 | -1.72x | +4948.77 | +2847.66 | +1.74x | +qlinearops | |||
onnxrt-runtime | -1.10.0 | -googlenet-12 | -67.61% | -67.79% | --0.27% | -1030.75 | -805.76 | -1.28x | +SqueezeNet | +56.55% | +56.87% | +-0.56% | +6296.79 | +4340.51 | +1.45x | +qlinearops |
onnxrt-runtime | -1.10.0 | -squeezenet | +SqueezeNet | 56.55% | 56.87% | -0.56% | -6119.01 | -4321.71 | +6227.76 | +4383.8 | 1.42x | +qdq | ||||
onnxrt-runtime | -1.10.0 | -caffenet | -56.19% | -56.30% | --0.20% | -2644.16 | -810.13 | -3.26x | -||||||||
onnxrt-runtime | -1.10.0 | -inception_v1 | -67.23% | -67.24% | --0.01% | -1059.31 | -848.19 | -1.25x | -||||||||
onnxrt-runtime | -1.10.0 | -fcn | -64.66% | -64.98% | --0.49% | -44.48 | -14.23 | -3.13x | +SSD MobileNet V1 | +22.20% | +23.10% | +-3.90% | +917.64 | +709.48 | +1.29x | +qlinearops |
onnxrt-runtime | -1.10.0 | -ssd-12 | -18.84% | -18.98% | --0.74% | -41.98 | -11.11 | -3.78x | +SSD MobileNet V1 | +22.20% | +23.10% | +-3.90% | +840.99 | +655.99 | +1.28x | +qdq |
onnxrt-runtime | -1.10.0 | -ssd_mobilenet_v1-2 | +SSD MobileNet V1 (ONNX Model Zoo) | 22.88% | 23.03% | -0.65% | -836.01 | -652.27 | -1.28x | -|||||||
onnxrt-runtime | -1.10.0 | -faster_rcnn | -33.99% | -34.37% | --1.11% | -9.23 | -4.28 | -2.16x | +845.17 | +666.25 | +1.27x | +qlinearops | ||||
onnxrt-runtime | -1.10.0 | -mobilenetv2-12 | -68.30% | -69.48% | --1.70% | -5314.59 | -3369.52 | -1.58x | +SSD MobileNet V1 (ONNX Model Zoo) | +22.88% | +23.03% | +-0.65% | +790.06 | +624.2 | +1.27x | +qdq |
onnxrt-runtime | -1.10.0 | -mask_rcnn | -33.40% | -33.72% | --0.95% | -7.88 | -3.94 | -2.00x | +SSD MobileNet V2 | +23.83% | +24.68% | +-3.44% | +703.55 | +506.6 | +1.39x | +qlinearops |
onnxrt-runtime | -1.10.0 | -yolov3 | -26.88% | -28.74% | --6.47% | -157.85 | -64.93 | -2.43x | +SSD | +18.68% | +18.98% | +-1.58% | +41.99 | +11.12 | +3.78x | +qdq |
onnxrt-runtime | -1.10.0 | -densenet | -60.20% | -60.96% | --1.25% | -408.55 | -340.82 | -1.20x | +Tiny YOLOv3 | +12.08% | +12.43% | +-2.82% | +836.21 | +659.69 | +1.27x | +qlinearops |
onnxrt-runtime | -1.10.0 | -yolov4 | -30.95% | -32.78% | --5.58% | -53.51 | -28.66 | -1.87x | +VGG16 | +66.60% | +66.69% | +-0.13% | +312.48 | +128.98 | +2.42x | +qlinearops |
onnxrt-runtime | -1.10.0 | -resnet50_v1_5_qdq | +VGG16 (ONNX Model Zoo) | 72.28% | -72.29% | --0.01% | -1271.61 | -543.58 | -2.34x | +72.40% | +-0.17% | +446.13 | +131.04 | +3.40x | +qlinearops | |
onnxrt-runtime | -1.10.0 | -mobilenet_v2_qdq | -65.47% | -66.89% | --2.12% | -5069.54 | -3404.88 | -1.49x | +YOLOv3 | +26.88% | +28.74% | +-6.47% | +157.39 | +66.72 | +2.36x | +qlinearops |
onnxrt-runtime | -1.10.0 | -ssd_mobilenet_v1_qdq | -22.25% | -23.10% | --3.68% | -803.63 | -644.18 | -1.25x | +YOLOv4 | +33.18% | +33.71% | +-1.57% | +58.55 | +38.09 | +1.54x | +qlinearops |
onnxrt-runtime | -1.10.0 | -vgg16 | -66.60% | -66.69% | +ZFNet | +55.89% | +55.96% | -0.13% | -310.23 | -128.81 | -2.41x | +664.37 | +358.62 | +1.85x | +qlinearops | |
onnxrt-runtime | -1.10.0 | -roberta_base_mrpc | -89.22% | -89.95% | --0.81% | -766.66 | -316.24 | -2.42x | +ZFNet | +55.89% | +55.96% | +-0.13% | +666.99 | +354.38 | +1.88x | +qdq |
onnxrt-runtime | -1.10.0 | -bert_squad_model_zoo | -80.43 | -80.67 | --0.29% | -115.78 | -64.69 | -1.79x | +Model | +Accuracy | +Performance throughput (samples/sec) |
||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
onnxrt-runtime | -1.10.0 | -mobilebert_squad_mlperf | -89.84 | -90.02 | --0.20% | -102.82 | -95.17 | -1.08x | +INT8 | +FP32 | +Accuracy Ratio[(INT8-FP32)/FP32] | +INT8 | +FP32 | +Performance Ratio[INT8/FP32] | |
onnxrt-runtime | -1.10.0 | -vgg16_model_zoo | -72.28% | -72.40% | --0.17% | -447.28 | -129.59 | -3.45x | +|||||||
Inception V3 | +77.80% | +77.65% | +0.20% | +920.74 | +276.73 | +3.33x | +|||||||||
MobileNet V1 | +71.60% | +72.23% | +-0.86% | +6585.19 | +2529.21 | +2.60x | +|||||||||
MobileNet V2 | +70.80% | +70.87% | +-0.10% | +5230.32 | +1996.47 | +2.62x | +|||||||||
ResNet V1 152 | +78.28% | +78.54% | +-0.33% | +574.85 | +156.2 | +3.68x | +|||||||||
ResNet50 V1.0 | +75.91% | +76.33% | +-0.55% | +1567.9 | +427.99 | +3.66x | +|||||||||
SqueezeNet | +56.80% | +56.97% | +-0.28% | +4704.51 | +1332.29 | +3.53x | +|||||||||
SSD MobileNet V1 | +74.94% | +75.54% | +-0.79% | +769.26 | +193.03 | +3.99x |
System Configuration | Intel Xeon Platinum 8380 Scalable processor |
---|---|
Test Date | -Sat 30 Apr 2022 UTC | -
Manufacturer | -Intel Corporation | -
Product Name | -M50CYP2SBSTD | -
BIOS Version | -SE5C6200.86B.0022.D64.2105220049 | -
OS | -Ubuntu 20.04.1 LTS | -
Kernel | -5.4.0-42-generic | -
Microcode | -0xd0002b1 | -
CPU Model | -Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz | -
Base Frequency | -2.3GHZ | -
Thread(s) per Core | -2 | -
Core(s) per Socket | -40 | -
Socket(s) | -2 | -
Turbo | -Enabled | -
Power & Perf Policy | -Balanced | -
Installed | -256GB (16x16GB DDR4 3200MT/s [3200MT/s]) | -
NIC Summary | -2x Ethernet Controller 10G X550T | -
Drive Summary | -1x INTEL_SSDSC2KW01 953.9G, -1x CT1000MX500SSD1 931.5G, -1x CT1000MX500SSD1 931.5G - | -
Tasks | -FWK | +Framework | Model | -fp32 baseline | -gradient sensitivity with 20% sparsity | -+onnx dynamic quantization on pruned model | +FP32 Baseline | +Gradient Sensitivity with 20% Sparsity | ++ONNX Dynamic Quantization on Pruned Model | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
accuracy% | -drop% | -perf gain (sample/s) | -accuracy% | -drop% | -perf gain (sample/s) | +Accuracy% | +Drop | +Perf Gain (sample/s) | +Accuracy% | +Drop | +Perf Gain (sample/s) | ||||||||||||||
SST-2 | -pytorch | -bert-base | +PyTorch | +BERT base | accuracy = 92.32 | accuracy = 91.97 | -0.38 | @@ -1715,8 +1520,8 @@ Intel technologies may require enabled hardware, software or service activation.||||||||||||||||||
QQP | -pytorch | -bert-base | +PyTorch | +BERT base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [89.97, 86.54] | [-1.24, -1.71] | @@ -1732,24 +1537,24 @@ Intel technologies may require enabled hardware, software or service activation.||||||||||||||||||
Tasks | -FWK | +Framework | Model | -fp32 baseline | +FP32 Baseline | Pattern Lock on 70% Unstructured Sparsity | Pattern Lock on 50% 1:2 Structured Sparsity | ||||||||||||||||||
accuracy% | -drop% | -accuracy% | -drop% | +Accuracy% | +Drop | +Accuracy% | +Drop | ||||||||||||||||||
MNLI | -pytorch | -bert-base | +PyTorch | +BERT base | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51, -1.80] | @@ -1758,8 +1563,8 @@ Intel technologies may require enabled hardware, software or service activation.||||||||||||||||||
SST-2 | -pytorch | -bert-base | +PyTorch | +BERT base | accuracy = 92.32 | accuracy = 91.51 | -0.88 | @@ -1768,8 +1573,8 @@ Intel technologies may require enabled hardware, software or service activation.||||||||||||||||||
QQP | -pytorch | -bert-base | +PyTorch | +BERT base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68, -1.12] | @@ -1778,8 +1583,8 @@ Intel technologies may require enabled hardware, software or service activation.||||||||||||||||||
QNLI | -pytorch | -bert-base | +PyTorch | +BERT base | accuracy = 91.54 | accuracy = 90.39 | -1.26 | @@ -1788,8 +1593,8 @@ Intel technologies may require enabled hardware, software or service activation.||||||||||||||||||
QnA | -pytorch | -bert-base | +PyTorch | +BERT base | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61, -1.54] | @@ -1804,50 +1609,50 @@ Intel technologies may require enabled hardware, software or service activation.||||||||||||||||||
Framework | Model | -fp32 baseline | +FP32 Baseline | Compression | -dataset | -acc(drop)% | +Dataset | +Accuracy% (Drop) | |||||||||||||||||
Pytorch | -resnet18 | +PyTorch | +ResNet18 | 69.76 | -30% sparsity on magnitude | +30% Sparsity on Magnitude | ImageNet | 69.47(-0.42) | |||||||||||||||||
Pytorch | -resnet18 | +PyTorch | +ResNet18 | 69.76 | -30% sparsity on gradient sensitivity | +30% Sparsity on Gradient Sensitivity | ImageNet | 68.85(-1.30) | |||||||||||||||||
Pytorch | -resnet50 | +PyTorch | +ResNet50 | 76.13 | -30% sparsity on magnitude | +30% Sparsity on Magnitude | ImageNet | 76.11(-0.03) | |||||||||||||||||
Pytorch | -resnet50 | +PyTorch | +ResNet50 | 76.13 | -30% sparsity on magnitude and post training quantization | +30% Sparsity on Magnitude and Post Training Quantization | ImageNet | 76.01(-0.16) | |||||||||||||||||
Pytorch | -resnet50 | +PyTorch | +ResNet50 | 76.13 | -30% sparsity on magnitude and quantization aware training | +30% Sparsity on Magnitude and Quantization Aware Training | ImageNet | 75.90(-0.30) | |||||||||||||||||
BlendCnn example | +BlendCNN example | MRPC | -BlendCnn (0.7034) | +BlendCNN (0.7034) | BERT-Base (0.8382) | 0.7034 (0) |
Model | +Domain | +Approach | +Examples | +
---|---|---|---|
ResNet50 V1.0 | +Image Recognition | +Post-Training Static Quantization | +pb | +
ResNet50 V1.5 | +Image Recognition | +Post-Training Static Quantization | +pb | +
ResNet101 | +Image Recognition | +Post-Training Static Quantization | +pb | +
MobileNet V1 | +Image Recognition | +Post-Training Static Quantization | +pb / SavedModel | +
MobileNet V2 | +Image Recognition | +Post-Training Static Quantization | +pb / SavedModel | +
MobileNet V3 | +Image Recognition | +Post-Training Static Quantization | +pb | +
Inception V1 | +Image Recognition | +Post-Training Static Quantization | +pb | +
Inception V2 | +Image Recognition | +Post-Training Static Quantization | +pb | +
Inception V3 | +Image Recognition | +Post-Training Static Quantization | +pb | +
Inception V4 | +Image Recognition | +Post-Training Static Quantization | +pb | +
Inception ResNet V2 | +Image Recognition | +Post-Training Static Quantization | +pb | +
VGG16 | +Image Recognition | +Post-Training Static Quantization | +pb / keras | +
VGG19 | +Image Recognition | +Post-Training Static Quantization | +pb / keras | +
ResNet V2 50 | +Image Recognition | +Post-Training Static Quantization | +pb | +
ResNet V2 101 | +Image Recognition | +Post-Training Static Quantization | +pb | +
ResNet V2 152 | +Image Recognition | +Post-Training Static Quantization | +pb | +
DenseNet121 | +Image Recognition | +Post-Training Static Quantization | +pb | +
DenseNet161 | +Image Recognition | +Post-Training Static Quantization | +pb | +
DenseNet169 | +Image Recognition | +Post-Training Static Quantization | +pb | +
EfficientNet B0 | +Image Recognition | +Post-Training Static Quantization | +ckpt | +
MNIST | +Image Recognition | +Quantization-Aware Training | +keras | +
ResNet50 | +Image Recognition | +Post-Training Static Quantization | +keras | +
ResNet50 Fashion | +Image Recognition | +Post-Training Static Quantization | +keras | +
ResNet V2 | +Image Recognition | +Quantization-Aware Training | +keras | +
EfficientNet V2 B0 | +Image Recognition | +Post-Training Static Quantization | +SavedModel | +
BERT base MRPC | +Natural Language Processing | +Post-Training Static Quantization | +ckpt | +
BERT large SQuAD | +Natural Language Processing | +Post-Training Static Quantization | +pb | +
Transformer LT | +Natural Language Processing | +Post-Training Static Quantization | +pb | +
SSD ResNet50 V1 | +Object Detection | +Post-Training Static Quantization | +pb / ckpt | +
SSD MobileNet V1 | +Object Detection | +Post-Training Static Quantization | +pb / ckpt | +
Faster R-CNN Inception ResNet V2 | +Object Detection | +Post-Training Static Quantization | +pb / SavedModel | +
Faster R-CNN ResNet101 | +Object Detection | +Post-Training Static Quantization | +pb / SavedModel | +
Mask R-CNN Inception V2 | +Object Detection | +Post-Training Static Quantization | +pb / ckpt | +
SSD ResNet34 | +Object Detection | +Post-Training Static Quantization | +pb | +
YOLOv3 | +Object Detection | +Post-Training Static Quantization | +pb | +
Wide & Deep | +Recommendation | +Post-Training Static Quantization | +pb | +
Arbitrary Style Transfer | +Style Transfer | +Post-Training Static Quantization | +ckpt | +
Model | +Domain | +Pruning Type | +Approach | +Examples | +
---|---|---|---|---|
Inception V3 | +Image Recognition | +Unstructured | +Magnitude | +pb | +
ResNet V2 | +Image Recognition | +Unstructured | +Magnitude | +pb | +
ViT | +Image Recognition | +Unstructured | +Magnitude | +ckpt | +
Student Model | +Teacher Model | +Domain | +Examples | +
---|---|---|---|
MobileNet | +DenseNet201 | +Image Recognition | +pb | +
Model | +Domain | +Approach | +Examples | +
---|---|---|---|
ResNet18 | +Image Recognition | +Post-Training Static Quantization | +eager / fx | +
ResNet18 | +Image Recognition | +Quantization-Aware Training | +eager / fx | +
ResNet50 | +Image Recognition | +Post-Training Static Quantization | +eager / ipex | +
ResNet50 | +Image Recognition | +Quantization-Aware Training | +eager | +
ResNeXt101_32x16d_wsl | +Image Recognition | +Post-Training Static Quantization | +ipex | +
ResNeXt101_32x8d | +Image Recognition | +Post-Training Static Quantization | +eager | +
Se_ResNeXt50_32x4d | +Image Recognition | +Post-Training Static Quantization | +eager | +
Inception V3 | +Image Recognition | +Post-Training Static Quantization | +eager | +
MobileNet V2 | +Image Recognition | +Post-Training Static Quantization | +eager | +
PeleeNet | +Image Recognition | +Post-Training Static Quantization | +eager | +
ResNeSt50 | +Image Recognition | +Post-Training Static Quantization | +eager | +
3D-UNet | +Image Recognition | +Post-Training Static Quantization | +eager | +
SSD ResNet34 | +Object Detection | +Post-Training Static Quantization | +fx / ipex | +
Mask R-CNN | +Object Detection | +Post-Training Static Quantization | +fx | +
YOLOv3 | +Object Detection | +Post-Training Static Quantization | +eager | +
DLRM | +Recommendation | +Post-Training Static Quantization | +eager / ipex / fx | +
RNN-T | +Speech Recognition | +Post-Training Dynamic / Static Quantization | +eager / ipex | +
Wav2Vec2 | +Speech Recognition | +Post-Training Dynamic Quantization | +eager | +
HuBERT | +Speech Recognition | +Post-Training Dynamic Quantization | +eager | +
BlendCNN | +Natural Language Processing | +Post-Training Static Quantization | +eager | +
bert-large-uncased-whole-word-masking-finetuned-squad | +Natural Language Processing | +Post-Training Static Quantization | +fx / ipex | +
t5-small | +Natural Language Processing | +Post-Training Dynamic Quantization | +eager | +
Helsinki-NLP/opus-mt-en-ro | +Natural Language Processing | +Post-Training Dynamic Quantization | +eager | +
lvwerra/pegasus-samsum | +Natural Language Processing | +Post-Training Dynamic Quantization | +eager | +
Model | +Domain | +Pruning Type | +Approach | +Examples | +
---|---|---|---|---|
ResNet18 | +Image Recognition | +Unstructured | +Magnitude | +eager | +
ResNet34 | +Image Recognition | +Unstructured | +Magnitude | +eager | +
ResNet50 | +Image Recognition | +Unstructured | +Magnitude | +eager | +
ResNet101 | +Image Recognition | +Unstructured | +Magnitude | +eager | +
BERT large | +Natural Language Processing | +Structured | +Group Lasso | +eager | +
Intel/bert-base-uncased-sparse-70-unstructured | +Natural Language Processing (question-answering) | +Unstructured | +Pattern Lock | +eager | +
bert-base-uncased | +Natural Language Processing | +Structured | +Gradient Sensitivity | +eager | +
DistilBERT | +Natural Language Processing | +Unstructured | +Magnitude | +eager | +
Intel/bert-base-uncased-sparse-70-unstructured | +Natural Language Processing (text-classification) | +Unstructured | +Pattern Lock | +eager | +
Student Model | +Teacher Model | +Domain | +Examples | +
---|---|---|---|
CNN-2 | +CNN-10 | +Image Recognition | +eager | +
MobileNet V2-0.35 | +WideResNet40-2 | +Image Recognition | +eager | +
ResNet18 / ResNet34 / ResNet50 / ResNet101 | +ResNet18 / ResNet34 / ResNet50 / ResNet101 | +Image Recognition | +eager | +
VGG-8 | +VGG-13 | +Image Recognition | +eager | +
BlendCNN | +BERT base | +Natural Language Processing | +eager | +
distilbert-base-uncased | +csarron/bert-base-uncased-squad-v1 | +Natural Language Processing | +eager | +
BiLSTM | +textattack/roberta-base-SST-2 | +Natural Language Processing | +eager | +
huawei-noah/TinyBERT_General_4L_312D | +blackbird/bert-base-uncased-MNLI-v1 | +Natural Language Processing | +eager | +
nreimers | +textattack/bert-base-uncased-QQP | +Natural Language Processing | +eager | +
distilroberta-base | +howey/roberta-large-cola | +Natural Language Processing | +eager | +
Model | +Domain | +Approach | +Examples | +
---|---|---|---|
ResNet50 | +Image Recognition | +Multi-shot: Pruning and PTQ | +link | +
ResNet50 | +Image Recognition | +One-shot: QAT during Pruning | +link | +
Intel/bert-base-uncased-sparse-90-unstructured-pruneofa | +Natural Language Processing (question-answering) | +One-shot: Pruning, Distillation and QAT | +link | +
Intel/bert-base-uncased-sparse-90-unstructured-pruneofa | +Natural Language Processing (text-classification) | +One-shot: Pruning, Distillation and QAT | +link | +
Model | +Domain | +Approach | +Examples | +
---|---|---|---|
ResNet50 V1.5 | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
ResNet50 V1.5 MLPerf | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
VGG16 | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
MobileNet V2 | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
MobileNet V3 MLPerf | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
AlexNet | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
CaffeNet | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
DenseNet | +Image Recognition | +Post-Training Static Quantization | +qlinearops | +
EfficientNet | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
FCN | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
GoogleNet | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
Inception V1 | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
MNIST | +Image Recognition | +Post-Training Static Quantization | +qlinearops | +
MobileNet V2 (ONNX Model Zoo) | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
ResNet50 V1.5 (ONNX Model Zoo) | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
ShuffleNet V2 | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
SqueezeNet | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
VGG16 (ONNX Model Zoo) | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
ZFNet | +Image Recognition | +Post-Training Static Quantization | +qlinearops / qdq | +
BERT base MRPC | +Natural Language Processing | +Post-Training Static Quantization | +integerops / qdq | +
BERT base MRPC | +Natural Language Processing | +Post-Training Dynamic Quantization | +integerops | +
DistilBERT base MRPC | +Natural Language Processing | +Post-Training Dynamic / Static Quantization | +integerops / qdq | +
MobileBERT MRPC | +Natural Language Processing | +Post-Training Dynamic / Static Quantization | +integerops / qdq | +
RoBERTa base MRPC | +Natural Language Processing | +Post-Training Dynamic / Static Quantization | +integerops / qdq | +
BERT SQuAD | +Natural Language Processing | +Post-Training Dynamic / Static Quantization | +integerops / qdq | +
GPT2 lm head WikiText | +Natural Language Processing | +Post-Training Dynamic Quantization | +integerops | +
MobileBERT SQuAD MLPerf | +Natural Language Processing | +Post-Training Dynamic / Static Quantization | +integerops / qdq | +
SSD MobileNet V1 | +Object Detection | +Post-Training Static Quantization | +qlinearops / qdq | +
SSD MobileNet V2 | +Object Detection | +Post-Training Static Quantization | +qlinearops / qdq | +
SSD MobileNet V1 (ONNX Model Zoo) | +Object Detection | +Post-Training Static Quantization | +qlinearops / qdq | +
DUC | +Object Detection | +Post-Training Static Quantization | +qlinearops | +
Faster R-CNN | +Object Detection | +Post-Training Static Quantization | +qlinearops / qdq | +
Mask R-CNN | +Object Detection | +Post-Training Static Quantization | +qlinearops / qdq | +
SSD | +Object Detection | +Post-Training Static Quantization | +qlinearops / qdq | +
Tiny YOLOv3 | +Object Detection | +Post-Training Static Quantization | +qlinearops | +
YOLOv3 | +Object Detection | +Post-Training Static Quantization | +qlinearops | +
YOLOv4 | +Object Detection | +Post-Training Static Quantization | +qlinearops | +