
Practical Deep Learning Techniques with Mobile/Edge Computing

Deep learning demonstrates outstanding performance in many machine learning tasks, especially computer vision tasks with large-scale datasets. However, the prevailing Deep Neural Network (DNN) models are usually highly demanding in computation resources and time. Therefore, DL computation has traditionally been centralized on cloud servers, which provide abundant computing resources.

Top-1 accuracies of models submitted to the ImageNet challenge¹
Edge computing structure

Paper 1: Quantized Convolutional Neural Networks for Mobile Devices (Model Compression)


The first paper is Quantized Convolutional Neural Networks for Mobile Devices by Wu et al. (2016)². It focuses on shrinking DNN size to speed up test-phase inference, applying a unified quantization method to both fully connected and convolutional layers.


The paper presents a quantized test-phase computation process of Convolutional Neural Networks (CNN). The quantization method is unified over both the convolutional layers and fully connected layers.

The parameter quantization and test-phase computation process of the fully connected layer
  1. Learn a sub-codebook for each subspace using the k-means method. For each subspace, minimize the error of reconstructing the weight sub-vectors from the codebook: the sub-codebook D(m) contains K sub-codewords, and each column in B(m) is an indicator vector (only one non-zero entry) that selects the codeword approximating the corresponding weight sub-vector.
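As a concrete illustration of step 1, here is a minimal numpy sketch of learning one sub-codebook with plain k-means. The function name, shapes, and toy data are all hypothetical, and this omits the error-correction refinements of the paper's actual training procedure:

```python
import numpy as np

def learn_subcodebook(W_m, K, n_iter=20, seed=0):
    """Learn a K-codeword sub-codebook for one subspace via plain k-means.

    W_m: (d, n) slice of a layer's weight matrix restricted to one subspace
         (each of the n columns is a weight sub-vector to be quantized).
    Returns (D_m, idx): codebook D^(m) of shape (d, K) and per-column
    codeword indices (the non-zero positions of the indicator columns B^(m)).
    Objective: minimize ||W_m - D_m B_m||_F^2 with indicator columns, i.e.
    replace each weight sub-vector by its nearest codeword.
    """
    rng = np.random.default_rng(seed)
    d, n = W_m.shape
    # initialize codewords from randomly chosen columns of W_m
    D_m = W_m[:, rng.choice(n, size=K, replace=False)].copy()
    for _ in range(n_iter):
        # assignment step: nearest codeword for every column
        dists = ((W_m[:, :, None] - D_m[:, None, :]) ** 2).sum(axis=0)  # (n, K)
        idx = dists.argmin(axis=1)
        # update step: each codeword becomes the mean of its assigned columns
        for k in range(K):
            mask = idx == k
            if mask.any():
                D_m[:, k] = W_m[:, mask].mean(axis=1)
    return D_m, idx

# quantize a toy weight subspace (hypothetical sizes)
W_m = np.random.default_rng(1).normal(size=(4, 64))
D_m, idx = learn_subcodebook(W_m, K=8)
W_hat = D_m[:, idx]  # reconstructed (quantized) weights
err = np.linalg.norm(W_m - W_hat) / np.linalg.norm(W_m)
```

Storing only the small codebook plus one index per sub-vector, instead of the full-precision weights, is what yields the compression.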


The proposed Quantized-CNN framework achieves a 4–6× speed-up on the four prevailing CNN models shown in the table, with a minor performance loss of 0.5%–1.6%.

The speed-up/compression rates and the increase of top 1/5 error rates for the whole CNN model.
Comparison on the efficiency and classification accuracy between the original and quantized AlexNet and CNN-S on a Huawei R Mate 7 smartphone


  • The paper proposes a unified framework to simultaneously accelerate and compress CNNs
  • In experiments it reaches a 4∼6× speed-up and 15∼20× compression with merely one percent loss of classification accuracy
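The test-phase speed-up comes from replacing full inner products with precomputed table lookups: the inner product of each input sub-vector with every codeword is computed once, and every output unit then just sums a few lookups. Below is a minimal numpy sketch of this idea for a fully connected layer; all shapes and variable names are hypothetical:

```python
import numpy as np

# Hypothetical setup: an input of size M * d_sub split into M subspaces,
# a K-codeword codebook per subspace, and an index table mapping each
# output unit to its assigned codeword in every subspace.
rng = np.random.default_rng(0)
d_sub, M, K, d_out = 4, 8, 16, 32
codebooks = rng.normal(size=(M, d_sub, K))      # D^(m) for each subspace
indices = rng.integers(0, K, size=(M, d_out))   # codeword id per output unit
x = rng.normal(size=(M, d_sub))                 # input, split into subspaces

# 1) Precompute inner products of each input sub-vector with every codeword:
#    one small (M, K) table, computed once per input.
tables = np.einsum('md,mdk->mk', x, codebooks)

# 2) Each output is now a sum of M table lookups instead of a full dot product.
y = tables[np.arange(M)[:, None], indices].sum(axis=0)

# Reference: explicit dot product with the reconstructed weight matrix.
W_hat = np.concatenate(
    [codebooks[m][:, indices[m]] for m in range(M)], axis=0)  # (M*d_sub, d_out)
y_ref = x.reshape(-1) @ W_hat
assert np.allclose(y, y_ref)
```

When many output units share codewords, the lookup table is far cheaper to apply than the full-precision matrix multiplication, which is the source of the reported speed-up.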

Paper 2: When Deep Learning Meets Edge Computing (Edge Server Pre-processing)


The second paper is When Deep Learning Meets Edge Computing, published in 2017 by Huang et al.¹ Despite its concise length, it is one of the earliest papers to apply a complete edge computing framework to deep learning models, which makes it a great introduction to deep learning with edge computing. The paper proposes deploying models at edge servers to provide timely service to end users, with the edge servers also pre-processing data to reduce its dimensionality.

The edge learning framework


In the paper, the proposed Edge Learning framework is composed of three main parts:

  1. End devices: data collection. Sensors and mobile devices generate the raw data.
  2. Edge server: preliminary processing. The edge servers perform preliminary processing to reduce data dimensionality and noise.
  3. Cluster: execute DNN models.
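To make the edge server's preliminary processing concrete, here is a minimal sketch of dimensionality reduction via PCA (computed with an SVD) before the data is uploaded to the cluster. PCA is an illustrative assumption on my part; the paper does not prescribe a specific reduction method:

```python
import numpy as np

def edge_reduce(X, k):
    """Project raw samples onto their top-k principal components (PCA via SVD),
    so the edge server uploads k-dimensional codes instead of raw features.
    NOTE: PCA here is illustrative, not the paper's mandated pre-processing.
    """
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return (X - mu) @ Vt[:k].T, mu, Vt[:k]

# e.g. reduce 784-dim MNIST-like vectors to 50 dims before transmission
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 784))
Z, mu, components = edge_reduce(X, k=50)
compression = Z.size / X.size  # fraction of the original upload traffic
```

Shrinking each sample before transmission is what cuts the data-transfer traffic between the edge and the cluster.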


With 60,000 training samples from the MNIST dataset, the learning accuracy can reach 90%, while the running time, as well as the data-transfer traffic, is significantly reduced.

The learning performance under different sizes of training data


  • The paper is one of the earliest to address the challenges of cloud-based deep learning with edge computing solutions.
  • It establishes a framework to distribute a deep learning pipeline across cloud, edge, and devices.

Observation and Insights

The two papers introduced above focus on compressing CNN models and on edge-server pre-processing, respectively. For Edge/Cloud partitioning and Device/Edge partitioning, I recommend reading my teammate's Medium blog post.


[1] Huang, Yutao, et al. “When deep learning meets edge computing.” 2017 IEEE 25th International Conference on Network Protocols (ICNP). IEEE, 2017.

[2] Wu, Jiaxiang, et al. “Quantized convolutional neural networks for mobile devices.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016.