(Incomplete) Review of AAAI 2020 for Model Compression

Pruning

  • Liu 2020 AutoCompress: Targets the challenge of tuning per-layer compression ratios for a CNN model; the authors provide an integrated solution combining structured (filter/channel/filter-shape) pruning, L0 pruning, ADMM optimization, and evolutionary search for hyperparameter tuning (a minimal sketch of the ADMM step follows this list). Results are reported on VGG/ResNet with CIFAR-10/ImageNet (top-5).

  • Ren 2020 DARB: Targets pruning RNN models and fully connected layers in a structured fashion, leveraging insights about "the diverse redundancy of different rows, the sensitivity of different rows to pruning, and the position characteristics of retained weights".

  • Wang 2020 Pruning: Tested the idea of pruning from scratch using the network-slimming technique (a scalar gate per channel; a sketch of this gate also follows this list). They showed promising results on VGG/ResNet (CIFAR-10) and ResNet/MobileNet (ImageNet).
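
The ADMM step referenced in the AutoCompress summary alternates between SGD on the task loss plus a quadratic penalty and a projection of the weights onto the target sparsity. A minimal sketch, assuming unstructured magnitude projection for brevity (AutoCompress projects onto structured filter/channel patterns); the helper names and hyperparameters are illustrative, not the authors' code.

```python
# Hedged sketch of one ADMM pruning round in the spirit of AutoCompress.
# Unstructured magnitude projection is used for brevity; all names and values
# are illustrative.
import torch

def project_topk(w, keep_ratio):
    """Z-update: keep only the largest-magnitude fraction of entries."""
    k = max(1, int(keep_ratio * w.numel()))
    thresh = w.abs().flatten().topk(k).values.min()
    return w * (w.abs() >= thresh).float()

def admm_round(model, loss_fn, data_loader, keep_ratio=0.2, rho=1e-3, epochs=1, lr=1e-3):
    params = [p for p in model.parameters() if p.dim() > 1]
    Z = [project_topk(p.detach().clone(), keep_ratio) for p in params]  # sparse copies
    U = [torch.zeros_like(p) for p in params]                           # scaled duals
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(epochs):
        for x, y in data_loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            # Augmented-Lagrangian term pulls each weight tensor toward its sparse target.
            for p, z, u in zip(params, Z, U):
                loss = loss + (rho / 2) * torch.norm(p - z + u) ** 2
            loss.backward()
            opt.step()
        # Z- and U-updates after each pass over the data.
        for i, p in enumerate(params):
            Z[i] = project_topk(p.detach() + U[i], keep_ratio)
            U[i] = U[i] + p.detach() - Z[i]
    return Z  # after the final round, weights are hard-pruned to the support of Z
```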

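The "scalar gate" used for pruning from scratch plays the same role as the channel scaling factor in network slimming. A minimal sketch, assuming the common BatchNorm-gamma formulation of slimming; the function names and threshold are illustrative.

```python
# Hedged sketch of the scalar-gate / network-slimming idea: an L1 penalty on the
# per-channel BatchNorm scale factors (gamma); channels whose gate stays near
# zero are pruned afterwards.
import torch.nn as nn

def slimming_l1_penalty(model, lam=1e-4):
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()  # gamma acts as the channel gate
    return lam * penalty

def channel_mask(bn: nn.BatchNorm2d, threshold=1e-2):
    """True for channels to keep, False for channels whose gate collapsed."""
    return bn.weight.detach().abs() > threshold

# In the training loop:
# loss = criterion(model(x), y) + slimming_l1_penalty(model)
```
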
Quantization

  • Shen 2020 Q-BERT: Quantized the BERT model with mixed precision driven by the Hessian spectrum of each layer. The Hessian is probed with a matrix-free power-iteration method (sketched after this list), avoiding the cost of forming second-order derivatives explicitly. Furthermore, it adopts group-wise quantization for finer control within layers.

  • Li 2020 RTN: Presented a reparameterization scheme for ternary quantization that improves the squashing range and training efficiency (a generic ternary-quantization sketch also follows this list). They also demonstrated energy and model-size reductions for inference on an FPGA.

  • Gennari 2020 DSConv: Proposed a new quantization structure for convolution layers, consisting of a Variable Quantized Kernel (VQK) and a Kernel Distribution Shift (KDS), to replace a standard convolution block. Quantizing both weights and activations, the method loses only 1% accuracy at 4 bits and recovers accuracy via distillation without labeled data.
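
Q-BERT's sensitivity metric is the top Hessian eigenvalue of each layer, obtained without forming the Hessian: power iteration driven by Hessian-vector products from automatic differentiation. A minimal sketch, assuming PyTorch autograd; the function name and iteration count are illustrative.

```python
# Hedged sketch of matrix-free power iteration for a layer's top Hessian
# eigenvalue, the sensitivity signal Q-BERT uses to assign mixed-precision
# bit widths.
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Power iteration using Hessian-vector products (no explicit Hessian)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((x * x).sum() for x in v))
    v = [x / norm for x in v]
    eigenvalue = None
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
        gv = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        eigenvalue = sum((h * x).sum() for h, x in zip(hv, v)).item()
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / norm for h in hv]
    return eigenvalue  # larger -> more quantization-sensitive -> assign more bits
```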

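For the ternary setting that RTN targets, a generic quantizer with a straight-through estimator looks as follows. This is only a baseline illustration; RTN's reparameterization of the squashing range is not reproduced, and the threshold value is an assumption.

```python
# Hedged sketch of a generic ternary weight quantizer with a straight-through
# estimator (STE); not RTN's actual formulation.
import torch

class TernaryQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, threshold):
        # Map weights to {-alpha, 0, +alpha}; alpha is the mean magnitude of kept weights.
        mask = (w.abs() > threshold).float()
        alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
        return alpha * torch.sign(w) * mask

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pass the gradient straight to the latent full-precision weights.
        return grad_output, None

# Inside a layer's forward pass:
# w_q = TernaryQuant.apply(self.weight, 0.05)
```
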
Low-Rank

  • Liu 2020 Layerwise: Reduces the storage overhead of sparse weight matrices by using layerwise sparse coding (LSC) and a Signed Relative Index (SRI) representation (see the relative-index sketch after this list). The proposed method also provides faster inference than vanilla sparse encoding or bitmasking, but is only verified on a highly sparse ADMM-LeNet.

  • Zhang 2020 High: Evaluated the performance of depth-wise and point-wise convolutions on ARM devices and proposed a new algorithm that addresses "lots of cache misses under multi-core and poor data reuse at register level", leading to a 2-5x speedup compared to TVM.
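
The storage saving in LSC/SRI comes from replacing absolute indices of nonzero weights with small relative offsets. The sketch below shows only that generic relative-index idea; the paper's exact SRI layout (including its sign encoding) is not reproduced, and the 8-bit gap width is an assumption.

```python
# Hedged illustration of relative-index sparse encoding: store the gap to the
# next nonzero instead of its absolute position, so indices fit in fewer bits.
import numpy as np

def encode_relative(dense_row, max_gap=255):
    values, gaps = [], []
    last = -1
    for i, v in enumerate(dense_row):
        if v != 0:
            gap = i - last
            # Gaps wider than the index bit-width are split with zero-valued fillers.
            while gap > max_gap:
                values.append(0.0)
                gaps.append(max_gap)
                gap -= max_gap
            values.append(float(v))
            gaps.append(gap)
            last = i
    return np.array(values, dtype=np.float32), np.array(gaps, dtype=np.uint8)

def decode_relative(values, gaps, length):
    row = np.zeros(length, dtype=np.float32)
    pos = -1
    for v, g in zip(values, gaps):
        pos += int(g)
        row[pos] = v
    return row
```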

Distillation

  • Bai 2020 Few: Proposed a novel soft cross-distillation scheme to improve model compression in the few-shot setting; the vanilla distillation loss it builds on is sketched below.
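
For reference, a minimal sketch of the standard soft-label distillation loss that cross distillation extends; the paper's cross-layer correction terms are not reproduced, and the temperature/weighting values are illustrative.

```python
# Hedged sketch of vanilla knowledge distillation: KL between softened teacher
# and student predictions, mixed with the hard-label cross-entropy.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                      # rescale so gradients stay comparable across T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```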

NAS

  • Zhang 2020 AutoShrink: Iteratively explores and exploits a node- and edge-level search space to discover network architectures efficiently.

  • Hang 2020 Towards: Combined oracle knowledge distillation with neural architecture search to find the best student model under a resource constraint, using an ensemble of student networks as the teacher (see the ensemble-teacher sketch after this list).

  • Chen 2020 Binarized: Extended PC-DARTS to search for binarized CNNs, improving the search speed and optimization via channel sampling and operation-space reduction.
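
The ensemble-of-students teacher mentioned above can be distilled from by averaging the members' softened predictions. A minimal sketch; the paper's oracle weighting and NAS loop are not reproduced, and all names and values are illustrative.

```python
# Hedged sketch of distillation from an ensemble teacher built out of several
# student networks' predictions.
import torch
import torch.nn.functional as F

def ensemble_kd_loss(student_logits, member_logits_list, labels, T=3.0, alpha=0.7):
    # Teacher signal: average of the ensemble members' softened predictions.
    teacher_probs = torch.stack(
        [F.softmax(m / T, dim=1) for m in member_logits_list]
    ).mean(dim=0)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    teacher_probs, reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```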

Other

  • Xu 2020 Light: Proposed a new activation function that helps compact models learn better from teacher models.

  • Fuhl 2020 Training: Replaced convolution layers with decision trees for faster inference, and provided a formulation for training the trees jointly with back-propagation (a soft-tree sketch appears at the end of this list).

  • Lee 2020 URNet: Designed a new conditional block with a controllable scaling factor to tune model complexity dynamically.

  • Kim 2020 Plug-in: Proposed a trainable gating function that can be integrated into existing neural networks for pruning.

  • Yang 2020 Gated: Proposed a new lightweight architecture that improves the efficiency of DenseNet via hybrid connectivity and a gating mechanism.

  • Huang 2020 DWM: Proposed a new decomposition that generalizes Winograd convolution beyond 3x3 kernels and stride 1, with roughly 2x better performance and small precision loss (the base Winograd transform is illustrated below).
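
DWM builds on the classic Winograd transform, whose 1-D base case F(2,3) computes two outputs of a 3-tap convolution with 4 multiplications instead of 6. A worked example follows; DWM's decomposition for larger kernels and strides is not shown.

```python
# Worked example of Winograd F(2,3): 2 outputs of a 3-tap convolution from
# 4 multiplications (m1..m4) instead of 6.
def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 outputs of valid convolution."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

# Check against direct convolution (correlation form):
d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.25]
direct = [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
assert all(abs(a - b) < 1e-9 for a, b in zip(winograd_f23(d, g), direct))
```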

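The back-propagation-trainable trees in the Fuhl 2020 summary can be illustrated with a "soft" decision tree, where each split is a sigmoid gate and the output is a path-probability-weighted sum of leaf values. This is a generic sketch under that assumption, not the paper's formulation; the class name and dimensions are illustrative.

```python
# Hedged sketch of a differentiable ("soft") decision tree that trains with
# standard back-propagation.
import torch
import torch.nn as nn

class SoftDecisionTree(nn.Module):
    def __init__(self, in_features, out_features, depth=2):
        super().__init__()
        self.depth = depth
        n_inner, n_leaves = 2 ** depth - 1, 2 ** depth
        self.split = nn.Linear(in_features, n_inner)       # one sigmoid gate per inner node
        self.leaves = nn.Parameter(torch.randn(n_leaves, out_features))

    def forward(self, x):
        gates = torch.sigmoid(self.split(x))                # (batch, n_inner)
        # Path probability of each leaf = product of gate decisions along its path.
        probs = torch.ones(x.size(0), 1, device=x.device)
        for d in range(self.depth):
            level = gates[:, 2 ** d - 1: 2 ** (d + 1) - 1]  # gates of this tree level
            left, right = probs * level, probs * (1 - level)
            probs = torch.stack([left, right], dim=2).reshape(x.size(0), -1)
        return probs @ self.leaves                          # (batch, out_features)
```
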
Application

Comments