Deep neural networks have long played an important role in practical problems such as image classification and text recognition. However, deep neural network architectures are often costly in terms of computing resources and time. Google researchers have now proposed MorphNet, a new method for automated neural network architecture design that saves resources and improves performance by iteratively scaling neural networks.
Deep neural networks (DNNs) have shown excellent performance on practical problems such as image classification, text recognition and speech transcription. However, designing a suitable DNN architecture for a given problem remains a challenging task. Given the huge architectural search space, designing a network from scratch for a specific application can be prohibitively expensive in computing resources and time. Methods such as Neural Architecture Search (NAS) and AdaNet use machine learning to search the architecture design space for improved architectures. An alternative is to take an existing architecture for a similar problem and, in one shot, optimize it for the task at hand.
Google researchers have proposed MorphNet, a sophisticated method for refining neural network models, described in the paper "MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks". MorphNet takes an existing neural network as input and produces a new network that is smaller, faster, and performs better on a new problem. The researchers have applied the method at large scale, designing production service networks that are both smaller and more accurate. MorphNet's TensorFlow implementation is now open source, so you can use this method to create your own models more efficiently.
MorphNet open source project address: https://github.com/google-research/morph-net
How MorphNet works
MorphNet optimizes a neural network by cycling through a contraction phase and an expansion phase. In the contraction phase, MorphNet applies a sparsifying regularizer to identify inefficient neurons and prune them from the network: the network's total loss function includes a cost term for every neuron. Rather than using a uniform cost measure for all neurons, MorphNet computes each neuron's cost with respect to the targeted resource. As training proceeds, the optimizer incorporates this resource cost information when computing gradients, and thus learns which neurons are resource-efficient and which can be removed.
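The loss described above can be sketched in a few lines. This is an illustrative model, not MorphNet's actual TensorFlow API: the function name, the per-neuron "scale" gating values, and the cost weighting are all assumptions made for clarity.

```python
# Hypothetical sketch of a resource-weighted sparsifying regularizer.
# Each neuron has a gating scale (e.g. a batch-norm multiplier); driving
# a scale to zero marks that neuron as removable. The penalty weights
# each scale by the resource cost of keeping that neuron alive.

def regularized_loss(task_loss, neuron_scales, neuron_costs, strength=1e-3):
    """Total loss = task loss + strength * sum(cost_i * |scale_i|)."""
    penalty = sum(c * abs(s) for s, c in zip(neuron_scales, neuron_costs))
    return task_loss + strength * penalty

# An expensive neuron is penalized more heavily for the same scale, so
# the optimizer preferentially shrinks resource-inefficient neurons.
cheap = regularized_loss(1.0, [0.5], [10.0])    # low-cost neuron
costly = regularized_loss(1.0, [0.5], [100.0])  # high-cost neuron
```

The key design point is that the penalty is not uniform sparsity pressure: the same gating value incurs ten times the penalty when the neuron it controls is ten times more expensive to evaluate.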
For example, consider how MorphNet calculates the computational cost (e.g., FLOPs) of a neural network. For simplicity, consider a neural network layer expressed as a matrix multiplication. The layer has 2 inputs (x_n), 6 weights (a, b, ..., f) and 3 outputs (y_n). Using the standard textbook row-by-column method, you will find that evaluating this layer requires 6 multiplications.
The computational cost of the neuron.
MorphNet expresses the layer's computational cost as the product of the number of inputs and the number of outputs. Note that although the example on the left exhibits weight sparsity, with two weights equal to 0, we still need to perform all the multiplications to evaluate the layer. The middle example, however, exhibits structured sparsity: all the weights in the row for neuron y_n are 0. MorphNet recognizes that the layer's effective number of outputs is now 2, and that its multiplication count drops from 6 to 4. From this, MorphNet can determine the incremental cost of every neuron in the network and produce a more efficient model (example on the right), in which neuron y_3 has been removed.
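The FLOP accounting just described can be made concrete with a small helper. This is an illustrative function written for this article, not part of the MorphNet library; it counts multiplications for a dense layer viewed as a matrix multiply.

```python
# FLOP cost of a dense layer = inputs x (structurally live) outputs.
# weights: list of rows, one row per output neuron.

def matmul_flops(weights):
    """Multiplications needed, counting only structurally live outputs.

    A row of all zeros is structured sparsity: that output, and all of
    its multiplies, can be removed. Scattered zeros inside a live row
    do not help, since those multiplies are still performed.
    """
    n_inputs = len(weights[0])
    live_outputs = sum(1 for row in weights if any(w != 0 for w in row))
    return n_inputs * live_outputs

dense      = [[1, 2], [3, 4], [5, 6]]   # 2 inputs, 3 outputs -> 6 multiplies
scattered  = [[0, 2], [3, 0], [5, 6]]   # zeros, but every row live -> still 6
structured = [[1, 2], [3, 4], [0, 0]]   # last output removable -> 4 multiplies
```

This mirrors the three cases in the figure: unstructured sparsity buys nothing, while zeroing an entire output row reduces the layer's cost from 6 to 4 multiplications.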
In the expansion phase, the researchers use a width multiplier to uniformly expand the size of all layers. For example, if layer sizes are expanded by 50%, an inefficient layer that started with 100 neurons and shrank to 10 expands back only to 15, while an important layer that shrank only to 80 neurons expands to 120 and gains more resources. The net effect is to reallocate computing resources from the inefficient parts of the network to the parts where they are most useful.
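The expansion step is just a uniform rescaling of the surviving layer widths. A minimal sketch, using the numbers from the paragraph above (the function is illustrative, not MorphNet's API):

```python
# Expansion phase sketch: apply one width multiplier to every layer
# size that survived contraction, rounding to whole neurons.

def expand(layer_sizes, multiplier):
    """Scale every layer width by the same factor."""
    return [round(n * multiplier) for n in layer_sizes]

# The 50% expansion from the text: a layer shrunk from 100 to 10
# neurons grows back only to 15, while a layer that kept 80 neurons
# grows to 120 -- resources flow to the layers that proved useful.
widths = expand([10, 80], 1.5)  # -> [15, 120]
```

Because the multiplier is uniform, the relative proportions learned during contraction are preserved; layers that proved important end up with more absolute capacity than they started with.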
Users can stop MorphNet after the contraction phase, reducing the network's size to fit a tighter resource budget. This yields a network that is more efficient with respect to the target cost, though sometimes at a loss in accuracy. Alternatively, the user can complete the expansion phase as well, matching the original resource budget but achieving higher accuracy.
Why use MorphNet?
MorphNet can provide the following four key values:
Targeted regularization: The regularization MorphNet applies is more purposeful than generic sparsifying regularizers. Rather than inducing sparsity for its own sake, MorphNet targets the reduction of a specific resource (such as FLOPs per inference or model size). This allows better control over the network structures MorphNet derives, which differ markedly depending on the application domain and constraints.
For example, the left figure below shows a ResNet-101 baseline network trained on the JFT dataset. When targeting FLOPs (middle, 40% fewer FLOPs) versus model size (right, 43% fewer weights), the structures MorphNet produces are very different. When optimizing for computational cost, the high-resolution neurons in the lower layers of the network are pruned more heavily than the low-resolution neurons in the higher layers; when the target is a smaller model size, the pruning pattern is the opposite.
MorphNet's targeted regularization. Rectangle width is proportional to the number of channels in a layer, and the purple bar at the bottom represents the input layer. Left: the baseline network given to MorphNet as input; middle: output after applying the FLOP regularizer; right: output after applying the size regularizer.
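The opposite pruning patterns in the figure follow directly from how the two cost measures scale. A sketch with illustrative formulas for a k x k convolution (the layer shapes below are hypothetical examples, not taken from the paper):

```python
# Why FLOP and size targets prune differently: one output channel's
# FLOP cost scales with the spatial resolution of its output map,
# while its parameter count does not.

def channel_flop_cost(k, in_channels, out_h, out_w):
    # multiply-adds contributed per inference by one output channel
    return k * k * in_channels * out_h * out_w

def channel_size_cost(k, in_channels):
    # parameters contributed by one output channel
    return k * k * in_channels

# Hypothetical early layer: few input channels, high resolution.
early_flops = channel_flop_cost(3, 64, 112, 112)
early_size  = channel_size_cost(3, 64)
# Hypothetical late layer: many input channels, low resolution.
late_flops = channel_flop_cost(3, 512, 7, 7)
late_size  = channel_size_cost(3, 512)
```

Under a FLOP target, the early high-resolution channel is the expensive one and gets pruned first; under a size target, the late wide channel dominates the cost, reversing the pruning pattern, as in the figure.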
MorphNet can target specific optimization parameters, which makes it possible to set targets tailored to a particular implementation. For example, you can use latency as the primary optimization parameter, combining device-specific compute time and memory time.
Topology morphing: MorphNet learns the neuron count of each layer, so the algorithm may encounter the special case where every neuron in a layer is sparsified away. When a layer's neuron count reaches 0, MorphNet cuts off the affected branch of the network, effectively changing the network's topology. For example, in a ResNet architecture, MorphNet may keep the residual (skip) connection but remove the residual block (left in the figure below). For an Inception architecture, MorphNet may remove an entire parallel branch (right in the figure below).
Left: MorphNet removes the residual module in the ResNet network. Right: MorphNet removes the parallel branch in the Inception network.
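The branch-removal behavior can be sketched with a toy representation. This is purely illustrative (MorphNet operates on real network graphs, not lists): here a block is a list of parallel branches, and each branch is a list of per-layer neuron counts that survived contraction.

```python
# Topology morphing sketch: a branch containing a zero-width layer is
# dead end-to-end, so the whole branch drops out of the block.

def prune_branches(branches):
    """Keep only branches in which every layer has surviving neurons."""
    return [b for b in branches if all(width > 0 for width in b)]

# An Inception-style block with three parallel branches; the middle
# branch lost all neurons in one layer and is removed wholesale,
# changing the block's topology.
block = [[32, 32], [16, 0, 16], [64]]
pruned = prune_branches(block)  # -> [[32, 32], [64]]
```

In the ResNet case from the figure, the analogous outcome is that the residual block's branch empties out and only the skip connection survives.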
Scalability: MorphNet learns the new network structure in a single training run, which makes it a great option when your training budget is limited. MorphNet can also be applied directly to expensive networks and datasets. For example, in the comparison above, MorphNet was applied directly to ResNet-101, originally trained at extremely high computational cost on the JFT dataset.
Portability: The networks MorphNet outputs are portable: they can be trained from scratch, and the model weights are not tied to the architecture-learning process. There is no need to copy checkpoints or follow a special training script; simply train the new network as usual.
Google applied MorphNet to the Inception V2 model trained on the ImageNet dataset, targeting FLOPs (see the figure below). The baseline approach uniformly shrinks the output of every convolution, using a width multiplier to trade off accuracy against FLOPs (red). The MorphNet approach instead targets FLOPs directly when shrinking the model, producing a better trade-off curve. At the same accuracy, the new method's FLOP cost is 11-15% lower than the baseline's.
Performance of MorphNet applied to the Inception V2 model trained on the ImageNet dataset. Using the flop regularizer alone (blue) improves on the baseline (red) by 11-15%. After a full cycle (flop regularizer plus width multiplier), accuracy improves at the same cost ("x1", purple), and a second cycle improves it further ("x2", cyan).
At this point, you can choose one of the MorphNet networks to meet a smaller FLOP budget. Alternatively, you can complete the scaling cycle by expanding the network back to the original FLOP cost, achieving better accuracy at the same cost (purple). Repeating the MorphNet shrink/expand cycle again improves accuracy further (cyan), for an overall accuracy gain of 1.1%.
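Closing the cycle means choosing the width multiplier that brings the shrunk network back to the original FLOP budget. A minimal sketch under a simplifying assumption (a chain of dense layers, where FLOPs are sums of consecutive width products and therefore scale as the square of a uniform multiplier); the layer widths are hypothetical:

```python
import math

# Expand a shrunk network back to a target FLOP budget. For a chain of
# dense layers, FLOPs = sum of products of consecutive widths, so a
# uniform multiplier w scales total FLOPs by w**2.

def chain_flops(widths):
    return sum(a * b for a, b in zip(widths, widths[1:]))

def restore_budget(shrunk_widths, target_flops):
    """Pick w with w**2 = target/current, then rescale every width."""
    w = math.sqrt(target_flops / chain_flops(shrunk_widths))
    return [round(n * w) for n in shrunk_widths]

original = [100, 100, 100]   # 20,000 FLOPs before contraction
shrunk = [50, 80, 60]        # 8,800 FLOPs after contraction
expanded = restore_budget(shrunk, chain_flops(original))
```

All widths grow by the same factor, so the expanded network spends the original budget according to the proportions contraction discovered, rather than uniformly.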
Google has applied MorphNet to many of its production-level image processing models. MorphNet can bring about a significant reduction in model size/FLOPs with almost no quality loss.
Paper: MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
Link to the paper: https://arxiv.org/pdf/1711.06798.pdf
Abstract: This paper introduces MorphNet, a new method for automatically designing neural network structure. MorphNet iteratively scales the network: it shrinks the network via a resource-weighted sparsifying regularizer on the activations, and expands it via a uniform multiplicative factor applied to all layers. MorphNet scales to large networks, adapts to specific resource constraints (such as FLOPs per inference), and can improve network performance. When applied to standard network architectures trained on a wide range of datasets, the method discovers novel structures in each domain and improves network performance under constrained resources.