# FPGA-based-DNN-Accels

| Title | LUT/ALM | DSP | Year | Platform | Frequency (MHz) | Throughput (GOPs) | Power (W) | Energy Efficiency (GOPs/W) | Model | W_precision | A_precision |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks | 186251 | 2240 | 2015 | VC707 | 100 | 61.62 | 18.61 | 3.31112 | AlexNet | FP-32 | FP-32 |
| Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks | -- | -- | 2016 | P395-D8 | 120 | 117.8 | 19.1 | 6.16754 | VGG-16 | INT-8 | INT-16 |
| Automatic Code Generation of Convolutional Neural Networks in FPGA Implementation | -- | -- | 2016 | VC709 | 100 | 222.1 | 24.8 | 8.95565 | AlexNet | INT-8 | INT-16 |
| Going Deeper with Embedded FPGA Platform for Convolutional Neural Network | 182616 | 780 | 2016 | ZC706 | 150 | 136.97 | 9.63 | 14.2233 | VGG-16 | INT-16 | INT-16 |
| A High Performance FPGA-based Accelerator for Large-Scale Convolutional Neural Network | -- | -- | 2016 | VC709 | 156 | 565.94 | 30.2 | 18.7397 | AlexNet | INT-16 | INT-16 |
| Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster | -- | -- | 2016 | ZC706+6*VC709 | 150 | 1280.3 | 160 | 8.00188 | VGG-16 | INT-16 | INT-16 |
| Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster | -- | -- | 2016 | ZC706+4*VC709 | 150 | 825.6 | 126 | 6.55238 | AlexNet | INT-16 | INT-16 |
| Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster | -- | -- | 2016 | ZC706+4*VC709 | 150 | 128.8 | 126 | 1.02222 | AlexNet | INT-16 | INT-16 |
| Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster | -- | -- | 2016 | ZC706+VC709 | 150 | 290 | 35 | 8.28571 | VGG-16 | INT-16 | INT-16 |
| Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster | -- | -- | 2016 | ZC706+VC709 | 150 | 203.9 | 35 | 5.82571 | VGG-16 | INT-16 | INT-16 |
| CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices | -- | -- | 2017 | Cyclone V 5CEA9 | 100 | 400 | 0.44 | 909.091 | AlexNet | INT-16 | INT-16 |
| F-C3D: FPGA-based 3-Dimensional Convolutional Neural Network | -- | -- | 2017 | ZC706 | 176 | 144.5 | 9.7 | 14.8969 | C3D | INT-16 | INT-16 |
| A Fully Connected Layer Elimination for a Binarized Convolutional Network on an FPGA | -- | -- | 2017 | Zedboard | 143 | 329.47 | 2.3 | 143.248 | VGG11 | Binary | Binary |
| Accelerating Low Bit-Width Convolutional Neural Networks With Embedded FPGA | -- | -- | 2017 | Zynq XC7Z020 | 200 | 410.22 | 2.26 | 181.513 | DoReFa-Net | Binary | INT-2 |
| FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 42349 | 1036 | 2017 | Stratix-V GSMD5 | 150 | 364.36 | 25 | 14.5744 | VGG-19 | INT-16 | INT-16 |
| FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 164100 | 264 | 2017 | Stratix-V GSMD5 | 150 | 81 | 25 | 3.24 | VGG-19 | FxP-32 | FxP-32 |
| FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 42349 | 1036 | 2017 | Stratix-V GSMD5 | 150 | 315.85 | 25 | 12.634 | LSTM-LM | INT-16 | INT-16 |
| FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 164100 | 264 | 2017 | Stratix-V GSMD5 | 150 | 86 | 25 | 3.44 | LSTM-LM | FxP-32 | FxP-32 |
| FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 42349 | 1036 | 2017 | Stratix-V GSMD5 | 150 | 226.47 | 25 | 9.0588 | ResNet-152 | INT-16 | INT-16 |
| FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates | 164100 | 264 | 2017 | Stratix-V GSMD5 | 150 | 73 | 25 | 2.92 | ResNet-152 | FxP-32 | FxP-32 |
| Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs | 600000 | 2520 | 2017 | ZCU102 | 200 | 2940.7 | 23.6 | 124.606 | VGG-16 | INT-16 | INT-16 |
| Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs | 600000 | 2520 | 2017 | ZCU102 | 200 | 854.6 | 23.6 | 36.2119 | AlexNet | INT-16 | INT-16 |
| Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks | 161000 | 1518 | 2017 | Arria 10 GX1150 | 150 | 645.25 | 21.2 | 30.4363 | VGG-16 | INT-8 | INT-16 |
| Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network | -- | 2756 | 2017 | Arria 10 GX1150 | 385 | 1790 | 37.46 | 47.7843 | VGG-16 | INT-16 | INT-16 |
| Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network | -- | 1320 | 2017 | Arria 10 GX1150 | 370 | 866 | 41.73 | 20.7525 | VGG-16 | FxP-32 | FxP-32 |
| ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA | 293920 | 1504 | 2017 | KU060 | 200 | 282.2 | 41 | 6.88293 | LSTM | INT-12 | INT-16 |
| Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs | 155886 | 824 | 2017 | ZC706 | 100 | 229.5 | 9.4 | 24.4149 | VGG-16 | INT-16 | INT-16 |
| An OpenCL Deep Learning Accelerator on Arria 10 | 246000 | 1476 | 2017 | Arria 10 GX1150 | 303 | 1382 | 45 | 30.7111 | AlexNet | FxP-16 | FxP-16 |
| Fast and Efficient Implementation of Convolutional Neural Networks on FPGA | 196370 | 256 | 2017 | E5-2600+Stratix V | 200 | 229 | 8.04 | 28.4826 | VGG-16 | INT-32 | INT-32 |
| Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System | 200522 | 224 | 2017 | E5-2600+Stratix V | 200 | 123.5 | 13.18 | 9.37026 | VGG-16 | FP-32 | FP-32 |
| Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System | 200522 | 224 | 2017 | E5-2600+Stratix V | 200 | 83 | 13.18 | 6.29742 | AlexNet | FP-32 | FP-32 |
| FPGA-based Accelerator for Long Short-Term Memory Recurrent Neural Networks | 198280 | 1176 | 2017 | VC707 | 150 | 7.26 | 19.63 | 0.369842 | LSTM-RNN | FP-32 | FP-32 |
| Maximizing CNN Accelerator Efficiency Through Resource Partitioning | 133854 | 3494 | 2017 | Xilinx Virtex-7 690T | 170 | 909.7 | 7.2 | 126.347 | SqueezeNet | INT-16 | INT-16 |
| Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs | 46900 | 3 | 2017 | Zynq-7000 XC7Z020 | 143 | 207.8 | 4.7 | 44.2128 | BNN | Binary | Binary |
| A 7.663-TOPS 8.2-W Energy-efficient FPGA Accelerator for Binary Convolutional Neural Networks | 342126 | 1096 | 2017 | Virtex 7 | 90 | 7663 | 8.2 | 934.512 | BNN | Binary | Binary |
| Algorithm-Hardware Co-Design of Single Shot Detector for Fast Object Detection on FPGAs | 175000 | 1518 | 2018 | Arria 10 GX1150 | 240 | 1032 | 40 | 25.8 | SSD300 | INT-8 | INT-16 |
| Algorithm-Hardware Co-Design of Single Shot Detector for Fast Object Detection on FPGAs | 532000 | 4363 | 2018 | Arria 10 GX2800 | 300 | 2178 | 100 | 21.78 | SSD300 | INT-8 | INT-16 |
| VIBNN: Hardware Acceleration of Bayesian Neural Networks | 98006 | 342 | 2018 | Cyclone V | -- | 127.8 | 6.1 | 20.9508 | MNIST (RLF-Based) | Binary | Binary |
| VIBNN: Hardware Acceleration of Bayesian Neural Networks | 91126 | 342 | 2018 | Cyclone V | -- | 127.8 | 8.52 | 15 | MNIST (BNNWallace) | Binary | Binary |
| FBNA: A Fully Binarized Neural Network Accelerator | 29600 | 0 | 2018 | ZC702 | -- | 2236 | 3.2 | 698.75 | SVHN | Binary | Binary |
| FBNA: A Fully Binarized Neural Network Accelerator | 29600 | 0 | 2018 | ZC702 | -- | 722 | 3.3 | 218.788 | CIFAR-10 | Binary | Binary |
| RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks | 203000 | 0 | 2018 | ZC706 | 150 | 687.78 | 10.56 | 65.1307 | AlexNet | INT-4 | INT-8 |
| RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks | 203000 | 0 | 2018 | ZC706 | 150 | 878.11 | 10.56 | 83.1544 | VGG-16 | INT-4 | INT-8 |
| RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks | 203000 | 0 | 2018 | ZC706 | 150 | 804.03 | 10.56 | 76.1392 | ResNet-50 | INT-4 | INT-8 |
| A Novel Low-Communication Energy-Efficient Reconfigurable CNN Acceleration Architecture for Embedded Systems | 156859 | 612 | 2018 | ZC706 | 150 | 1249.7 | 9.82 | 127.261 | VGG-16 | INT-8 | INT-8 |
| A Novel Low-Communication Energy-Efficient Reconfigurable CNN Acceleration Architecture for Embedded Systems | 156859 | 612 | 2018 | ZC706 | 150 | 685.6 | 9.76 | 70.2459 | AlexNet | INT-8 | INT-8 |
| A Novel Low-Communication Energy-Efficient Reconfigurable CNN Acceleration Architecture for Embedded Systems | 156859 | 612 | 2018 | ZC706 | 150 | 507.2 | 9.72 | 52.1811 | ResNet-50 | INT-8 | INT-8 |
| A Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA | 105673 | 880 | 2018 | ZC706 | 200 | 1972 | 4.2 | 469.524 | AlexNet | Hybrid INT | INT-4 |
| A Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA | 103505 | 550 | 2018 | ZC706 | 200 | 1233 | 4.1 | 300.732 | AlexNet | Hybrid INT | INT-8 |
| A Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA | 124317 | 783 | 2018 | ZC706 | 200 | 2530 | 4.8 | 527.083 | AlexNet | Hybrid INT | INT-4 |
| Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA | 242000 | 1536 | 2018 | VC709 | 150 | 430.7 | 25 | 17.228 | C3D | INT-16 | INT-16 |
| Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA | 209000 | 1536 | 2018 | VUS440 | 200 | 784.7 | 26 | 30.1808 | C3D | INT-16 | INT-16 |
| Angel-Eye: A Complete Design Flow for Mapping CNN onto Embedded FPGA | 29867 | 190 | 2018 | XC7Z020 | 214 | 84.3 | 3.5 | 24.0857 | VGG-16 | INT-8 | INT-8 |
| Angel-Eye: A Complete Design Flow for Mapping CNN onto Embedded FPGA | 85172 | 900 | 2018 | XC7Z045 | 150 | 137 | 9.63 | 14.2264 | VGG-16 | INT-16 | INT-16 |
| Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 7000 | 75 | 93.3333 | ResNet-34 1x-wide | FP-32 | FP-32 |
| Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 8000 | 75 | 106.667 | ResNet-34 1x-wide | INT-8 | INT-8 |
| Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 43000 | 75 | 573.333 | ResNet-34 1x-wide | Ternary | INT-8 |
| Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 52000 | 75 | 693.333 | ResNet-34 1x-wide | Binary | INT-8 |
| Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 18000 | 75 | 240 | ResNet-34 1x-wide | INT-4 | INT-4 |
| Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 51000 | 75 | 680 | ResNet-34 1x-wide | INT-3 | INT-3 |
| Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 85000 | 75 | 1133.33 | ResNet-34 1x-wide | INT-2 | INT-2 |
| Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 98000 | 75 | 1306.67 | ResNet-34 1x-wide | Ternary | INT-2 |
| Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs | -- | -- | 2018 | Arria 10 GX 1150 | 275 | 267000 | 75 | 3560 | ResNet-34 1x-wide | Binary | Binary |
| An Asynchronous Energy-Efficient CNN Accelerator with Reconfigurable Architecture | -- | -- | 2018 | VC707 | -- | 20.3 | 0.676 | 30.0296 | LeNet-5 | INT-16 | INT-16 |
| DeltaRNN: A Power-efficient Recurrent Neural Network Accelerator | 277440 | 2020 | 2018 | Zynq-7100 XC7Z100 | 125 | 1198 | 7.3 | 164.11 | GRU-RNN | INT-16 | INT-16 |
| Accelerator Design with Effective Resource Utilization for Binary Convolutional Neural Networks on an FPGA | 61000 | 0 | 2018 | XCVU190 | 240 | 3756 | 5.9 | 636.61 | BNN | Binary | Binary |
| A PYNQ-based Framework for Rapid CNN Prototyping | -- | -- | 2018 | XC7Z020 | -- | 2.56 | 1.896 | 1.35021 | CNN | INT-8 | INT-8 |
| Shortcut Mining: Exploiting Cross-layer Shortcut Reuse in DCNN Accelerators | 261096 | 2800 | 2019 | Virtex-7 485T | 150 | 608.28 | 21.64 | 28.1091 | ResNet-152 | INT-16 | INT-16 |
| E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs | 496958 | 3435 | 2019 | ADM-PCIE-7V3 | 200 | 34529 | 24 | 1438.71 | LSTM on TIMIT (FFT8) | INT-12 | INT-12 |
| E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs | 411714 | 2866 | 2019 | ADM-PCIE-7V3 | 200 | 54943 | 25 | 2197.72 | LSTM on TIMIT (FFT16) | INT-12 | INT-12 |
| Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs | 24130 | 37 | 2019 | Ultra96 | 250 | 47.09 | 5.5 | 8.56182 | DiracDeltaNet | INT-4 | INT-4 |
| REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs | 637671 | 3456 | 2019 | ADM-PCIE-7V3 | 200 | 1967 | 21 | 93.6667 | YOLO tiny v2 | FxP-6 | FxP-6 |
| Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity | 289000 | 1518 | 2019 | Arria 10 GX1150 | 200 | 2432.8 | 19.1 | 127.372 | LSTM | INT-16 | INT-16 |
| Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs | 1512810 | 5349 | 2019 | VCU118 | 214 | 1828.61 | 49.25 | 37.1291 | VGG16 | INT-16 | INT-16 |
| Automatic Compiler Based FPGA Accelerator for CNN Training | 208000 | 1699 | 2019 | Stratix 10 GX | 240 | 163 | 20.64 | 7.89729 | CIFAR-10; '1X' CNN | INT-16 | INT-16 |
| Automatic Compiler Based FPGA Accelerator for CNN Training | 415000 | 3363 | 2019 | Stratix 10 GX | 240 | 282 | 32.83 | 8.5897 | CIFAR-10; '2X' CNN | INT-16 | INT-16 |
| Automatic Compiler Based FPGA Accelerator for CNN Training | 720000 | 5760 | 2019 | Stratix 10 GX | 240 | 479 | 50.47 | 9.49079 | CIFAR-10; '4X' CNN | INT-16 | INT-16 |
| Towards an Efficient Accelerator for DNN-based Remote Sensing Image Segmentation on FPGAs | 170906 | 1665 | 2019 | Intel Arria 10 660 | 200 | 1578 | 32 | 49.3125 | U-Net | INT-8 | INT-8 |
| An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs | 132344 | 364 | 2019 | ZCU102 | 200 | 291 | 23.6 | 12.3305 | ResNet | INT-16 | INT-16 |
| FPGA-Based Sparsity-Aware CNN Accelerator for Noise-Resilient Edge-Level Image Recognition | -- | -- | 2019 | Intel Stratix-V | 100 | 57.6 | 2.03 | 28.3744 | VGG-16 | INT-13 | INT-13 |
| Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices | 122500 | 793 | 2019 | ZC706 | 166 | 167.58 | 6.08 | 27.5625 | VGG16 | INT-8 | INT-16 |
| Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices | 55500 | 84 | 2019 | ZC706 | 200 | 405.82 | 5.6 | 72.4679 | DoReFa-Net | Binary | INT-2 |
| Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices | 51300 | 31 | 2019 | ZC706 | 200 | 441.95 | 4.88 | 90.5635 | XNOR-Net | Binary | Binary |
| Zac: Towards Automatic Optimization and Deployment of Quantized Deep Neural Networks on Embedded Devices | 100200 | 818 | 2019 | ZC706 | 200 | 124.9 | 7.31 | 17.0862 | ResNet-18 | INT-8 | INT-8 |
| Sparse Winograd Convolutional Neural Networks on Small-scale Systolic Arrays | 241202 | 768 | 2019 | Virtex UltraScale XCVU095 | 150 | 460.8 | 8.24 | 55.9223 | VGG16 | INT-8 | INT-8 |
| Sparse Winograd Convolutional Neural Networks on Small-scale Systolic Arrays | 241202 | 768 | 2019 | Virtex UltraScale XCVU095 | 150 | 230.4 | 4.12 | 55.9223 | VGG16 | INT-16 | INT-16 |
| Sparse Winograd Convolutional Neural Networks on Small-scale Systolic Arrays | 241202 | 768 | 2019 | Virtex UltraScale XCVU095 | 150 | 921.6 | 16.49 | 55.8884 | VGG16 | INT-8 | INT-8 |
| A Fine-Grained Sparse Accelerator for Multi-Precision DNN | -- | -- | 2019 | Xilinx XCKU115 | 200 | 574.2 | 13.42 | 42.7869 | CNN | INT-4 | INT-4 |
| A Fine-Grained Sparse Accelerator for Multi-Precision DNN | -- | -- | 2019 | Xilinx XCKU115 | 200 | 110.4 | 13.39 | 8.24496 | RNN | INT-4 | INT-4 |
| A Fine-Grained Sparse Accelerator for Multi-Precision DNN | -- | -- | 2019 | Xilinx XCKU115 | 200 | 571.1 | 13.41 | 42.5876 | CNN+RNN | INT-4 | INT-4 |
| InS-DLA: An In-SSD Deep Learning Accelerator for Near-Data Processing | 93232 | 0 | 2019 | Zynq XC7Z045 | 100 | 44.8 | 9.621 | 4.65648 | CNN | INT-8 | INT-8 |
| A 307-fps 351.7-GOPs/W Deep Learning FPGA Accelerator for Real-Time Scene Text Recognition | -- | -- | 2019 | Virtex UltraScale+ | 100 | 11973 | 34.04 | 351.733 | BSEG | Binary | Binary |
| A High Energy-Efficiency FPGA-Based LSTM Accelerator Architecture Design by Structured Pruning and Normalized Linear Quantization | -- | -- | 2019 | Arria 10 | 150 | 2220 | 1.679 | 1322.22 | LSTM | INT-4 | INT-8 |
| A 112-765 FPGA-based CNN Accelerator using Importance Map Guided Adaptive Activation Sparsification for Pix2pix Applications | -- | -- | 2020 | Zynq XC7Z035 | 100 | 2525 | 3.3 | 765.152 | SRResNet | INT-16 | INT-16 |
| NeuroMAX: A High Throughput, Multi-Threaded, Log-Based Accelerator for Convolutional Neural Networks | 20600 | -- | 2020 | Zynq-7020 SoC | 200 | 324 | 2.72 | 119.118 | VGG16 | FP-32 | FP-32 |
| When Massive GPU Parallelism Ain't Enough: A Novel Hardware Architecture of 2D-LSTM Neural Network | 191449 | 440 | 2020 | ZCU102 | 300 | 5255.66 | 13.2 | 398.156 | 2D-LSTM | Binary | Binary |
| When Massive GPU Parallelism Ain't Enough: A Novel Hardware Architecture of 2D-LSTM Neural Network | 93324 | 234 | 2020 | ZCU102 | 240 | 3071.79 | 15.47 | 198.564 | 2D-LSTM | INT-4 | INT-8 |
| Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks | 173522 | 704 | 2020 | Xilinx XC7K325T | 200 | 295.68 | 16.518 | 17.9005 | MobileNetV1 | INT-8 | INT-8 |
| Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks | 173522 | 704 | 2020 | Xilinx XC7K325T | 200 | 197.12 | 17.14 | 11.5006 | MobileNetV2 | INT-8 | INT-8 |
| Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks | 173522 | 704 | 2020 | Xilinx XC7K325T | 200 | 168.96 | 17.07 | 9.89807 | MobileNetV3-Large | INT-8 | INT-8 |
| Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks | 173522 | 704 | 2020 | Xilinx XC7K325T | 200 | 352 | 6.7 | 52.5373 | DenseNet-161 | INT-8 | INT-8 |
| Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks | 173522 | 704 | 2020 | Xilinx XC7K325T | 200 | 267.52 | 16.9 | 15.8296 | SqueezeNetV1.1 | INT-8 | INT-8 |
| End-to-End Optimization of Deep Learning Applications | 1111980 | 3420 | 2020 | VCU1525 | 242.9 | 117 | 10 | 11.7 | OpenPose-V2 | FP-32 | FP-32 |
| FTDL: An FPGA-tailored Architecture for Deep Learning Applications | -- | -- | 2020 | UltraScale | 650 | 1272.22 | 46.1 | 27.5969 | GoogLeNet, ResNet50 | INT-16 | INT-16 |
| High-Throughput Convolutional Neural Network on an FPGA by Customized JPEG Compression | 274795 | 2370 | 2020 | Virtex UltraScale+ XCVU9P | 300 | 2419.2 | 75 | 32.256 | CNN | Binary | INT-8 |
| Optimizing Reconfigurable Recurrent Neural Networks | 487232 | 4368 | 2020 | Stratix 10 GX2800 | 260 | 8015 | 62.13 | 129.004 | LSTM | INT-8 | INT-8 |
| A High Throughput MobileNetV2 FPGA Implementation Based on a Flexible Architecture for Depthwise Separable Convolution | 145000 | 1220 | 2020 | Arria 10 | 200 | 693 | 34 | 20.3824 | MobileNet-V2 | INT-16 | INT-16 |
| A Reconfigurable Multithreaded Accelerator for Recurrent Neural Network | 522852 | 4368 | 2020 | Stratix 10 2800 | 260 | 7810 | 125 | 62.48 | LSTM | INT-8 | INT-8 |
| Memory-Efficient Dataflow Inference Acceleration for Deep CNNs on FPGA | 1027000 | 1611 | 2020 | Alveo U250 | 195 | 18300 | 71 | 257.746 | ResNet-50 | Binary | INT-2 |
| FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations | 50656 | 224 | 2021 | Zynq ZU3EG | 250 | 702 | 6.1 | 115.082 | ReActNet (ImageNet) | Binary | Binary |
| FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations | 51444 | 126 | 2021 | Zynq ZU3EG | 250 | 401 | 4.1 | 97.8049 | ReActNet (CIFAR-10) | Binary | Binary |
| Optimized FPGA-based Deep Learning Accelerator for Sparse CNN using High Bandwidth Memory | 334000 | 1442 | 2021 | Intel Stratix 10 MX2100 | 257 | 980.344 | 79.98 | 12.2574 | MobileNet | FxP-16 | FxP-16 |
| Optimized FPGA-based Deep Learning Accelerator for Sparse CNN using High Bandwidth Memory | 334000 | 1442 | 2021 | Intel Stratix 10 MX2100 | 257 | 5071.24 | 79.99 | 63.3985 | ResNet-50 | FxP-16 | FxP-16 |
| ESCA: Event-Based Split-CNN Architecture with Data-Level Parallelism on UltraScale+ FPGA (short) | 469288 | 2100 | 2021 | Virtex UltraScale+ XCVU9P | 320 | 49.92 | 10.68 | 4.67416 | VGG16 | INT-14 | INT-14 |
| 3D-VNPU: A Flexible Accelerator for 2D/3D CNNs on FPGA (short) | -- | 1024 | 2021 | Xilinx ZCU102 | 200 | 1353 | 10.2 | 132.647 | C3D | INT-8 | INT-8 |
| 3D-VNPU: A Flexible Accelerator for 2D/3D CNNs on FPGA (short) | -- | 1024 | 2021 | Xilinx ZCU102 | 200 | 1150 | 10.2 | 112.745 | VGG16 | INT-8 | INT-8 |
| 3D-VNPU: A Flexible Accelerator for 2D/3D CNNs on FPGA (short) | -- | -- | 2021 | Xilinx ZCU102 | 200 | 1210 | 10.2 | 118.627 | 3D ResNet-18 | INT-8 | INT-8 |
| Eciton: Very Low-Power LSTM Neural Network Accelerator for Predictive Maintenance at the Edge | 4987 | 6 | 2021 | Lattice iCE40 UP5K | 17 | 0.067 | 0.017 | 3.94118 | LSTM | INT-8 | INT-8 |
| FixyFPGA: Efficient FPGA Accelerator for Deep Neural Networks with High Element-Wise Sparsity and without External Memory Access | 1078800 | 1730 | 2021 | Stratix 10 GX 10M | 169.2 | 3990 | 28.06 | 142.195 | MobileNet-V1 (1.0) | INT-4 | INT-4 |
| FixyFPGA: Efficient FPGA Accelerator for Deep Neural Networks with High Element-Wise Sparsity and without External Memory Access | 993800 | 1730 | 2021 | Stratix 10 GX 10M | 196.89 | 2650 | 27.41 | 96.68 | MobileNet-V1 (0.75) | INT-4 | INT-4 |
| FixyFPGA: Efficient FPGA Accelerator for Deep Neural Networks with High Element-Wise Sparsity and without External Memory Access | 804500 | 1730 | 2021 | Stratix 10 GX 10M | 200.76 | 1240 | 27.08 | 45.7903 | MobileNet-V1 (0.5) | INT-4 | INT-4 |
| An FPGA-based MobileNet Accelerator Considering Network Structure Characteristics | 308449 | 2160 | 2021 | Xilinx Virtex-7 XC7V690T | 150 | 181.8 | 11.35 | 16.0176 | MobileNet | INT-8 | INT-8 |
| Leveraging Fine-grained Structured Sparsity for CNN Inference on Systolic Array Architectures | 336000 | 1352 | 2021 | Intel Arria 10 GX1150 | 242 | 1662 | 27.8 | 59.7842 | VGG-16 | INT-8 | INT-8 |
| Leveraging Fine-grained Structured Sparsity for CNN Inference on Systolic Array Architectures | 336000 | 1352 | 2021 | Intel Arria 10 GX1150 | 242 | 495 | 22.6 | 21.9027 | ResNet-50 | INT-8 | INT-8 |
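The energy-efficiency column is a derived metric: reported throughput (GOPs) divided by reported board power (W). A minimal Python sketch (the helper name is ours, not from any of the papers) spot-checks two rows of the table:

```python
def energy_efficiency(throughput_gops: float, power_w: float) -> float:
    """Energy efficiency in GOPs/W: throughput divided by power draw."""
    return throughput_gops / power_w

# Spot-check against two table rows:
# VC707 / AlexNet (2015): 61.62 GOPs at 18.61 W
print(round(energy_efficiency(61.62, 18.61), 5))  # -> 3.31112
# CirCNN / Cyclone V (2017): 400 GOPs at 0.44 W
print(round(energy_efficiency(400, 0.44), 3))     # -> 909.091
```

Note that the papers measure power differently (chip-only vs. full board), so GOPs/W values are comparable only loosely across rows.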
