Review of CVPR 2019 for Model Compression
Workshops
- Efficient Deep Learning for Computer Vision
- EMC^2 - Energy Efficient Machine Learning and Cognitive Computing
- Embedded Computer Vision Workshop
- Compact and Efficient Feature Representation and Learning in Computer Vision
Pruning
- Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression: proposes a new pruning criterion, Kernel Sparsity and Entropy (KSE), which combines the similarity between kernels with the information carried by the feature maps. Instead of simply discarding kernels, they use K-means clustering to fuse kernels and maintain accuracy.
- Compressing Convolutional Neural Networks via Factorized Convolutional Filters: attaches an additional binary variable to each filter to prune filters in a two-step training procedure: standard backpropagation updates the weights, while an ADMM-based step updates the binary variables.
- Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration: points out that pruning by weight norm is not necessarily effective and that a better criterion is weight similarity. Based on that, the proposed method prunes filters close to the geometric median of the filters in the same layer (see the sketch after this list).
- ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model: provides energy-aware model pruning by modeling energy consumption as a bilinear regression on layer sparsity and updating both weights and sparsity via back-propagation.
- Towards Optimal Structured CNN Pruning via Generative Adversarial Learning: combines three major methodologies to prune CNNs efficiently. First, it casts the pruning problem into three levels: block (skip connection), branch (NAS-like), and channel. Second, it uses distillation to transfer the teacher's knowledge. Third, it uses adversarial learning to further improve accuracy.
- Variational Convolutional Neural Network Pruning: uses the scaling factor in batch normalization as the saliency indicator and models it with a Bayesian (variational) formulation for pruning (the simplest deterministic version of this criterion is sketched after this list).
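As a concrete illustration of the geometric-median criterion above, here is a minimal NumPy sketch: it approximates "closeness to the geometric median" by the sum of distances to all other filters in the layer and marks the most central (most replaceable) filters for pruning. The function name and prune ratio are my own; the paper prunes iteratively during training rather than in one shot.

```python
import numpy as np

def fpgm_prune_indices(weight, prune_ratio=0.3):
    """Rank filters by their total distance to all other filters and mark
    the most 'central' ones (closest to the geometric median) for pruning.

    weight: array of shape [out_channels, in_channels, kH, kW]
    Returns the indices of filters to prune.
    """
    n = weight.shape[0]
    flat = weight.reshape(n, -1)                              # one row per filter
    # pairwise Euclidean distances between filters
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    total = dists.sum(axis=1)                                 # sum of distances to all others
    n_prune = int(n * prune_ratio)
    return np.argsort(total)[:n_prune]                        # smallest total distance = most redundant

if __name__ == "__main__":
    w = np.random.randn(64, 32, 3, 3)
    print(fpgm_prune_indices(w, prune_ratio=0.25))
```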
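For the variational-pruning entry, the deterministic core idea (before the paper's Bayesian modeling of the scale factor) reduces to ranking channels by the magnitude of their BatchNorm scale gamma. A hedged PyTorch sketch with names of my own choosing:

```python
import torch
import torch.nn as nn

def bn_channel_saliency(bn: nn.BatchNorm2d, prune_ratio: float = 0.3):
    """Rank channels of a BatchNorm layer by |gamma|; channels with the
    smallest scale contribute least to the output and are pruning candidates.
    The paper instead models the distribution of gamma variationally."""
    gamma = bn.weight.detach().abs()
    n_prune = int(gamma.numel() * prune_ratio)
    return torch.argsort(gamma)[:n_prune]   # indices of least-salient channels

if __name__ == "__main__":
    bn = nn.BatchNorm2d(64)
    nn.init.uniform_(bn.weight, 0.0, 1.0)   # pretend these scales were learned
    print(bn_channel_saliency(bn, prune_ratio=0.25))
```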
Quantization
- Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network using Truncated Gaussian Approximation: uses a truncated Gaussian approximation to quantize model weights into the ternary format. With a gradient correction on the STE method, their approach further improves training with quantized weights (a plain ternary-STE baseline is sketched after this list).
- Regularizing Activation Distribution for Training Binarized Deep Networks: adds three regularizing terms (degeneration, saturation, gradient mismatch) on the activation loss to correct the gradient update for the binary neural network.
- HAQ: Hardware-Aware Automated Quantization With Mixed Precision: treats per-layer quantization bit-widths as architecture hyperparameters and applies RL-based NAS to search for the best quantization strategy.
- Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss: parameterizes the quantization interval and learns the quantization via training.
- Importance Estimation for Neural Network Pruning: proposes a new importance measure based on a first-order Taylor expansion; it works at the filter level by grouping weights together (a minimal version is sketched after this list).
- Quantization Networks: uses multiple sigmoid functions to approximate the quantization mapping; the thresholds are learned during training.
- Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure: clusters filters into groups and adds an intra-cluster similarity as a penalty for model training (centripetal).
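For context on the ternary-quantization entry above, below is a sketch of the vanilla ternary-weight baseline with a straight-through estimator that the paper improves on. The 0.7·mean(|w|) threshold is the common TWN-style heuristic, not necessarily the paper's choice, and the class name is mine.

```python
import torch

class TernaryQuant(torch.autograd.Function):
    """Quantize weights to {-alpha, 0, +alpha} in the forward pass and pass
    gradients straight through (STE) in the backward pass."""

    @staticmethod
    def forward(ctx, w):
        delta = 0.7 * w.abs().mean()                  # threshold (TWN-style heuristic)
        mask = (w.abs() > delta).float()
        alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)  # per-tensor scale
        return alpha * torch.sign(w) * mask

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output                            # straight-through estimator

if __name__ == "__main__":
    w = torch.randn(8, 8, requires_grad=True)
    loss = TernaryQuant.apply(w).pow(2).sum()
    loss.backward()
    print(w.grad.shape)
```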
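The first-order Taylor importance from "Importance Estimation for Neural Network Pruning" can be sketched per filter as the squared sum of gradient·weight over the filter's weights; in practice the scores are accumulated over many mini-batches. A minimal PyTorch sketch (the function name is mine):

```python
import torch
import torch.nn as nn

def filter_importance(conv: nn.Conv2d):
    """First-order Taylor importance per output filter: the square of the
    summed (gradient * weight) over all weights in the filter.
    Assumes .backward() has already populated conv.weight.grad."""
    w, g = conv.weight, conv.weight.grad
    per_filter = (w * g).flatten(1).sum(dim=1)  # sum over each filter's weights
    return per_filter.pow(2)                    # larger score = more important

if __name__ == "__main__":
    conv = nn.Conv2d(16, 32, 3)
    x = torch.randn(4, 16, 8, 8)
    conv(x).mean().backward()
    print(filter_importance(conv))
```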
Low-Rank Decomposition
- Efficient Neural Network Compression: automates the search for the best per-layer rank for SVD-based decomposition (a per-layer decomposition sketch follows this list).
- Building Efficient Deep Neural Networks with Unitary Group Convolutions: reveals the relation between ShuffleNet-style group convolutions and circulant convolutions. Using the Hadamard transform, they propose an efficient UGConv block to replace the group-convolution block.
- Cascaded Projection: End-to-End Network Compression and Acceleration: proposes to minimize the reconstruction error of the feature maps after the nonlinear activations rather than of the linear outputs before them. Direct training with plain SGD on both the classification loss and the reconstruction loss is unstable, so they instead train odd and even layers alternately.
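The per-layer building block behind "Efficient Neural Network Compression" is a truncated SVD of a layer's weight matrix; the paper's contribution is automating the rank selection, which is omitted here. A minimal PyTorch sketch for a fully-connected layer (function name and rank are mine):

```python
import torch
import torch.nn as nn

def svd_decompose_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace one Linear layer with two smaller Linear layers using a
    rank-`rank` truncated SVD of its weight matrix."""
    W = layer.weight.data                          # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank, :]

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = Vh_r                       # (rank, in_features)
    second.weight.data = U_r * S_r                 # (out_features, rank)
    if layer.bias is not None:
        second.bias.data = layer.bias.data
    return nn.Sequential(first, second)

if __name__ == "__main__":
    fc = nn.Linear(512, 256)
    approx = svd_decompose_linear(fc, rank=32)
    x = torch.randn(1, 512)
    print((fc(x) - approx(x)).abs().max())         # approximation error
```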
Architecture
- SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization: uses a quantized version of each layer to quickly predict the sparsity of its activation map, thereby reducing the computation required for the original dense layer. It is similar in spirit to "More is Less: A More Complicated Network with Less Inference Complexity" and requires reimplementing the network to realize the speed-up.
- MnasNet: Platform-Aware Neural Architecture Search for Mobile: applies NAS to mobile applications, i.e., faster inference with lightweight models. It contains a few novelties in the search procedure: 1) searching each block differently, 2) early stopping for network candidates, and 3) a configurable reward combining accuracy and latency for the RL agent (sketched after this list). The resulting architecture is efficient on mobile phones (single-core CPU).
- ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network: proposes a new efficient CNN architecture using group convolution and dilated convolution.
- FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search: incorporates measured latency into the loss to search for the optimal network architecture. NAS is formulated as sampling subnets from a stochastic supernet, with block choices drawn according to learned probabilities.
- ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation: applies NAS under energy-consumption and latency constraints. To speed up the search, they model accuracy and energy consumption with Gaussian processes, while latency is modeled as the sum of per-operator latencies from a measured look-up table. The search space is limited to a few parameters associated with the backbone architecture.
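The configurable MnasNet reward mentioned above combines accuracy with a latency penalty, roughly R = ACC · (LAT/T)^w. A small Python sketch; the -0.07 exponents follow my recollection of the paper's soft-constraint setting and should be treated as an assumption.

```python
def mnas_reward(accuracy: float, latency_ms: float, target_ms: float,
                alpha: float = -0.07, beta: float = -0.07) -> float:
    """MnasNet-style reward: accuracy scaled by a latency penalty.
    w = alpha when the model meets the latency target, beta otherwise."""
    w = alpha if latency_ms <= target_ms else beta
    return accuracy * (latency_ms / target_ms) ** w

if __name__ == "__main__":
    print(mnas_reward(accuracy=0.75, latency_ms=80.0, target_ms=75.0))  # over budget: penalized
    print(mnas_reward(accuracy=0.74, latency_ms=70.0, target_ms=75.0))  # under budget: rewarded
```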
Applications
- DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation: provides an efficient architecture targeting real-time semantic segmentation. It relies on aggregating features to avoid computation overhead.
- NETTAILOR: Tuning the architecture, not just the weights: presents an effective method to distill and build student networks by replacing existing layers with cheaper low-fidelity layers.
- Efficient Video Classification Using Fewer Frames: designs a distillation strategy that teaches a student network to use fewer frames while reproducing the predictions of a teacher network.
- AdaFrame: Adaptive Frame Selection for Fast Video Recognition
- Fully Quantized Network for Object Detection: provides a detailed guide to fully quantize mobilenet for object detection.
- Fast Human Pose Estimation: applies distillation to train a lightweight student network (hourglass) for the human pose estimation task.
- Deep Virtual Networks for Memory Efficient Inference of Multiple Tasks: presents a weight-sharing scheme for designing multi-task DNNs with shared layers. The scheme assumes that neighboring tasks are similar, so consecutive layers can be shared across tasks.
- Knowledge Adaptation for Efficient Semantic Segmentation: presents an effective way to distill knowledge into an efficient student network for the semantic segmentation task (a generic distillation loss is sketched after this list).
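Several of the application papers above rely on knowledge distillation. As a reference point, here is the generic temperature-scaled distillation loss; the task-specific terms each paper adds (frame selection, pose heatmaps, segmentation affinity) are not included, and the hyperparameter values are illustrative only.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.7):
    """Standard knowledge-distillation objective: a temperature-softened
    KL term against the teacher plus a cross-entropy term on the labels."""
    soft_t = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_s = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_soft_s, soft_t, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

if __name__ == "__main__":
    s = torch.randn(8, 10)               # student logits
    t = torch.randn(8, 10)               # teacher logits
    y = torch.randint(0, 10, (8,))       # ground-truth labels
    print(distillation_loss(s, t, y))
```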