Deep neural networks have a huge advantage: they replace “feature engineering,” a difficult and arduous part of the classic machine learning cycle, with an end-to-end process that automatically learns to extract features.

However, finding the right deep learning architecture for your application can be difficult. There are many ways to structure and configure a neural network, using different types and sizes of layers, activation functions, and operations. Each architecture has its strengths and weaknesses. And depending on the application and environment in which you want to deploy your neural networks, you may have special requirements, such as memory and compute constraints.

The classic way to find a suitable deep learning architecture is to start with a model that looks promising, gradually modify it, and retrain it until you find the best configuration. However, this can be time-consuming, given the many possible configurations and the time each training and testing cycle can take.

An alternative to manual design is “neural architecture search” (NAS), a family of machine learning techniques that can help discover optimal neural networks for a given problem. Neural architecture search is a broad area of research and holds great promise for future applications of deep learning.

## Search spaces for deep learning

Although there are many neural architecture search techniques, most of them have a few things in common. In a paper titled “Neural Architecture Search: A Survey,” researchers from Bosch and the University of Freiburg break down NAS into three elements: search space, search strategy, and performance estimation strategy.

The first part of a NAS strategy is to define the search space of the target neural network. The basic component of any deep learning model is the neural layer. You can determine the number and type of layers to explore. For example, you might want your deep learning model to be composed of a range of convolutional (CNN) and fully connected layers. You can also determine layer configurations, such as the number of units or, in the case of CNNs, kernel size, number of filters, and stride. Other elements can be included in the search space, such as activation functions and operations (pooling layers, dropout layers, etc.).

The more items you add to the search space, the more versatility you get. But naturally, additional degrees of freedom expand the search space and increase the costs of finding the optimal deep learning architecture.
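To make the trade-off concrete, here is a minimal sketch of a search space written as a plain Python dictionary. The decision names and option lists are hypothetical and chosen only for illustration. The number of candidate architectures is the product of the option counts, so each added decision multiplies the size of the space.

```python
import math
import random

# Hypothetical search space: each key is a design decision, each list its options.
SEARCH_SPACE = {
    "num_conv_layers": [1, 2, 3, 4],
    "num_filters": [16, 32, 64],
    "kernel_size": [3, 5, 7],
    "stride": [1, 2],
    "activation": ["relu", "tanh"],
    "num_dense_units": [64, 128, 256],
}


def space_size(space):
    """Number of distinct architectures: the product of the option counts."""
    return math.prod(len(options) for options in space.values())


def sample_architecture(space, rng):
    """Pick one option per decision, uniformly at random."""
    return {name: rng.choice(options) for name, options in space.items()}


base_size = space_size(SEARCH_SPACE)  # 4 * 3 * 3 * 2 * 2 * 3 = 432

# Adding a single three-way decision triples the number of candidates.
extended_space = dict(SEARCH_SPACE, pooling=["max", "avg", "none"])
extended_size = space_size(extended_space)  # 432 * 3 = 1296

example = sample_architecture(SEARCH_SPACE, random.Random(0))
```

Real search spaces are usually far larger than this toy one, which is why the choice of search strategy matters.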

More advanced architectures usually have multiple branches of layers and other elements. For example, ResNet, a popular image recognition model, uses skip connections, where the output of one layer is provided not only to the next layer, but also to layers further downstream. These types of architectures are harder to explore with NAS because they have more moving parts.
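As a rough illustration of the shortcut idea, the sketch below compares a plain two-layer block with one that adds its input back to its output. The `layer` function is a toy stand-in (an elementwise scale and shift followed by ReLU), not a real convolution, and the weights are arbitrary.

```python
def layer(x, weight, bias):
    """Toy layer: elementwise scale-and-shift followed by ReLU."""
    return [max(0.0, weight * v + bias) for v in x]


def plain_block(x):
    """Two layers applied in sequence, with no shortcut."""
    hidden = layer(x, 0.5, 0.0)
    return layer(hidden, 2.0, -1.0)


def residual_block(x):
    """Same two layers, plus a shortcut (skip) connection from input to output."""
    out = plain_block(x)
    # The block input bypasses both layers and is added to their output.
    return [o + xi for o, xi in zip(out, x)]


x = [1.0, 2.0, -3.0]
plain = plain_block(x)        # [0.0, 1.0, 0.0]
residual = residual_block(x)  # [1.0, 3.0, -3.0]
```

For a search algorithm, every such shortcut is one more decision (where it starts, where it ends, whether it exists at all), which is part of why branched architectures enlarge the search space.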

One of the techniques that help reduce the complexity of the search space while maintaining the complexity of the neural network architecture is the use of “cells”. In this case, the NAS algorithm can optimize small blocks separately and then use them in combination. For example, VGGNet, another famous image recognition network, is made up of repeating blocks consisting of a convolution layer, an activation function, and a pooling layer. The NAS algorithm can optimize the block separately and then find the best arrangement of blocks in a larger network.
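The following sketch shows why cells shrink the search space. It assumes a hypothetical VGG-style cell (convolution, activation, pooling) described as plain tuples, and compares searching one shared cell plus a repeat count against searching every block independently. All names and option lists are illustrative.

```python
import math

# Hypothetical options for a single cell.
CELL_SPACE = {
    "num_filters": [16, 32, 64],
    "kernel_size": [3, 5, 7],
    "pooling": ["max", "avg"],
}
REPEAT_OPTIONS = [2, 3, 4]  # how many times the cell may be stacked


def make_cell(num_filters, kernel_size, pooling):
    """Describe one cell as an ordered list of (layer_type, parameters)."""
    return [
        ("conv", {"filters": num_filters, "kernel_size": kernel_size}),
        ("activation", {"fn": "relu"}),
        ("pool", {"kind": pooling}),
    ]


def stack_cells(cell, repeats):
    """Build a network description by repeating the same cell."""
    return [(kind, dict(params)) for _ in range(repeats) for kind, params in cell]


cell_choices = math.prod(len(v) for v in CELL_SPACE.values())  # 3 * 3 * 2 = 18

# One shared cell plus a repeat count: 18 * 3 = 54 candidates.
cell_based_size = cell_choices * len(REPEAT_OPTIONS)

# Four blocks, each searched independently: 18 ** 4 = 104976 candidates.
per_block_size = cell_choices ** max(REPEAT_OPTIONS)

network = stack_cells(make_cell(32, 3, "max"), repeats=3)  # 9 layer entries
```

The cell-based space is smaller by several orders of magnitude, at the price of forcing every block to share the same design.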

## Search strategy

Even basic search spaces usually require a lot of trial and error to find the optimal deep learning architecture. Therefore, a neural architecture search algorithm also needs a “search strategy”. The search strategy determines how the NAS algorithm experiments with different neural networks.

The most basic strategy is “random search,” in which the NAS algorithm randomly selects a neural network from the search space, trains and validates it, saves the results, and moves on to the next one. Random search is extremely expensive because the NAS algorithm works blindly through the search space, wasting expensive resources on testing solutions that could be eliminated with simpler methods. Depending on the complexity of the search space, random search can take days or weeks of GPU time to check every possible neural network architecture.
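A minimal sketch of random search follows. The `evaluate` function is a made-up scoring formula standing in for the expensive train-and-validate step, and the search space is hypothetical. In a real setting each call to `evaluate` could take hours of GPU time.

```python
import random

# Hypothetical search space.
SPACE = {
    "num_layers": [2, 4, 6, 8],
    "num_filters": [16, 32, 64],
    "kernel_size": [3, 5],
}


def evaluate(arch):
    """Stand-in for training and validating a model; returns a made-up score."""
    return (
        1.0
        - 0.05 * abs(arch["num_layers"] - 6)
        - abs(arch["num_filters"] - 32) / 640
        - 0.01 * (arch["kernel_size"] - 3)
    )


def random_search(space, evaluate, num_trials, seed=0):
    """Sample architectures uniformly, score each one, and keep the best."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_trials):
        arch = {name: rng.choice(options) for name, options in space.items()}
        history.append((evaluate(arch), arch))
    best_score, best_arch = max(history, key=lambda item: item[0])
    return best_arch, best_score, history


best_arch, best_score, history = random_search(SPACE, evaluate, num_trials=20)
```

Note that nothing learned from earlier trials influences later ones, and the same architecture can be sampled more than once. That is the waste that smarter strategies try to avoid.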

There are other techniques that speed up the search process. An example is Bayesian optimization, which starts with random choices and gradually adjusts its search direction as it gathers information about the performance of different architectures.

Another strategy is to frame neural architecture search as a reinforcement learning problem. In this case, the environment of the RL agent is the search space, the actions are the different configurations of the neural network, and the reward is the performance of the network. The reinforcement learning agent starts out with random choices, but over time it learns to choose configurations that produce greater improvements in neural network performance.
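The sketch below shows one simple way to set this up, using a REINFORCE-style policy gradient over independent categorical choices. This is an illustrative assumption, not the method of any specific NAS paper. The `reward` function is a made-up formula standing in for validation accuracy, and the search space, learning rate, and baseline update are all hypothetical.

```python
import math
import random

# Hypothetical search space.
SPACE = {"num_layers": [2, 4, 6, 8], "num_filters": [16, 32, 64]}


def reward(arch):
    """Stand-in for validation accuracy of a trained model (made-up formula)."""
    return (
        1.0
        - 0.05 * abs(arch["num_layers"] - 6)
        - 0.1 * abs(arch["num_filters"] - 32) / 16
    )


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_search(space, reward, steps=500, lr=0.5, seed=0):
    """Learn one categorical distribution per design decision."""
    rng = random.Random(seed)
    logits = {name: [0.0] * len(options) for name, options in space.items()}
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        # Action: sample one option index per decision from the current policy.
        picks = {
            name: rng.choices(range(len(options)), weights=softmax(logits[name]))[0]
            for name, options in space.items()
        }
        arch = {name: space[name][index] for name, index in picks.items()}
        advantage = reward(arch) - baseline
        baseline = 0.9 * baseline + 0.1 * reward(arch)
        # Policy-gradient step: the gradient of log softmax is one-hot minus probs.
        for name, index in picks.items():
            probs = softmax(logits[name])
            for j, p in enumerate(probs):
                indicator = 1.0 if j == index else 0.0
                logits[name][j] += lr * advantage * (indicator - p)
    return {name: softmax(values) for name, values in logits.items()}


policy = reinforce_search(SPACE, reward)
```

The returned `policy` maps each decision to a probability distribution over its options. Choices that earned above-average rewards end up with more probability mass, though a run this small can still settle on a suboptimal option.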

Other search strategies include evolutionary algorithms and Monte Carlo tree search. Every search strategy has its strengths and weaknesses, and engineers must strike the right balance between exploration and exploitation, which essentially means choosing between testing entirely new architectures and tweaking those that have shown promise so far.
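As a rough sketch of how an evolutionary strategy balances the two, the loop below keeps the best half of each generation and fills the rest either with mutated copies of survivors (exploitation) or with brand-new random architectures (exploration). The `fitness` formula, the search space, and the `explore_prob` parameter are hypothetical.

```python
import random

# Hypothetical search space.
SPACE = {
    "num_layers": [2, 4, 6, 8],
    "num_filters": [16, 32, 64],
    "kernel_size": [3, 5, 7],
}


def fitness(arch):
    """Stand-in for validation performance (made-up formula)."""
    return (
        1.0
        - 0.05 * abs(arch["num_layers"] - 6)
        - abs(arch["num_filters"] - 32) / 320
        - 0.02 * abs(arch["kernel_size"] - 5)
    )


def random_arch(space, rng):
    return {name: rng.choice(options) for name, options in space.items()}


def mutate(arch, space, rng):
    """Copy a parent and re-sample exactly one of its design decisions."""
    child = dict(arch)
    name = rng.choice(list(space))
    child[name] = rng.choice(space[name])
    return child


def evolve(space, fitness, population_size=8, generations=10,
           explore_prob=0.2, seed=0):
    rng = random.Random(seed)
    population = [random_arch(space, rng) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        parents = population[: population_size // 2]  # survivors
        children = []
        while len(parents) + len(children) < population_size:
            if rng.random() < explore_prob:
                children.append(random_arch(space, rng))  # explore
            else:
                children.append(mutate(rng.choice(parents), space, rng))  # exploit
        population = parents + children
    return max(population, key=fitness), best_per_generation


best, best_per_generation = evolve(SPACE, fitness)
```

Because the survivors are carried over unchanged, the best score never decreases from one generation to the next. Raising `explore_prob` spends more of the budget on untested regions of the space.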

## Performance estimation strategy

As the NAS algorithm traverses the search space, it must train and validate deep learning models to compare their performance and choose the optimal neural network. Obviously, doing a full training on each neural network takes a lot of time and requires very large computational resources.

To reduce the cost of evaluating deep learning models, NAS algorithm engineers use “proxy metrics” that can be measured without requiring extensive training of the neural network.

For example, they can train their models for fewer epochs, on a smaller dataset, or on lower-resolution data. Although the resulting deep learning model will not reach its full potential, these lower-fidelity training regimes provide a baseline for comparing different models at lower cost. Once the set of architectures has been narrowed down to a few promising neural networks, the NAS algorithm can perform more thorough training and testing of those models.
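The two-stage idea can be sketched as follows: score every candidate with a cheap, short training run, then spend the full budget only on a shortlist. The `train_and_validate` function is a made-up formula standing in for real training, and cost is counted in epochs. The sketch assumes the cheap ranking agrees with the full ranking, which real proxies only approximate.

```python
# Hypothetical candidates: every combination of two design decisions.
CANDIDATES = [
    {"num_layers": layers, "num_filters": filters}
    for layers in [2, 4, 6, 8]
    for filters in [16, 32, 64]
]


def train_and_validate(arch, epochs):
    """Stand-in for training: the score approaches a ceiling as epochs grow."""
    ceiling = 0.5 + 0.04 * arch["num_layers"] + 0.001 * arch["num_filters"]
    return ceiling * (1.0 - 0.5 ** epochs)


def two_stage_search(candidates, proxy_epochs=2, full_epochs=20, keep=3):
    # Stage 1: cheap, low-fidelity scores for every candidate.
    proxy_scores = [
        (train_and_validate(arch, proxy_epochs), index)
        for index, arch in enumerate(candidates)
    ]
    proxy_scores.sort(reverse=True)
    shortlist = [candidates[index] for _, index in proxy_scores[:keep]]
    # Stage 2: full training only for the shortlisted architectures.
    full_scores = [(train_and_validate(arch, full_epochs), arch) for arch in shortlist]
    best_score, best_arch = max(full_scores, key=lambda item: item[0])
    epochs_spent = len(candidates) * proxy_epochs + keep * full_epochs
    return best_arch, best_score, epochs_spent


best_arch, best_score, epochs_spent = two_stage_search(CANDIDATES)
full_cost = len(CANDIDATES) * 20  # cost of fully training all 12 candidates
```

With 12 candidates, the two-stage search spends 12 × 2 + 3 × 20 = 84 epochs instead of 240.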

Another way to reduce performance estimation costs is to initialize new models with the weights of previously trained models. Known as transfer learning, this practice results in much faster convergence, meaning the deep learning model will require fewer training epochs. Transfer learning is applicable when the source and destination models have compatible architectures.
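A minimal sketch of the compatibility check is shown below. It assumes weights are stored as a dictionary of named matrices (nested lists here, to avoid any library dependency) and copies a layer only when the name and shape both match. The layer names and shapes are hypothetical.

```python
def init_weights(shapes, fill=0.0):
    """Create a dict of named weight matrices with the given (rows, cols) shapes."""
    return {
        name: [[fill] * cols for _ in range(rows)]
        for name, (rows, cols) in shapes.items()
    }


def shape_of(matrix):
    return (len(matrix), len(matrix[0]))


def warm_start(new_weights, trained_weights):
    """Copy weights for layers whose name and shape match; return their names."""
    transferred = []
    for name, matrix in new_weights.items():
        source = trained_weights.get(name)
        if source is not None and shape_of(source) == shape_of(matrix):
            new_weights[name] = [row[:] for row in source]  # copy, do not alias
            transferred.append(name)
    return transferred


# Previously trained model (weights filled with 1.0 to make copies visible).
trained = init_weights({"conv1": (3, 3), "conv2": (3, 3), "dense": (4, 2)}, fill=1.0)

# New candidate: conv2 has a different shape and conv3 did not exist before.
candidate = init_weights(
    {"conv1": (3, 3), "conv2": (5, 5), "conv3": (3, 3), "dense": (4, 2)}
)

transferred = warm_start(candidate, trained)  # ["conv1", "dense"]
```

Layers that could not be matched (`conv2` and `conv3` here) keep their fresh initialization and still need to be trained from scratch.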

## A work in progress

Neural architecture search still has challenges to overcome, such as explaining why some architectures are better than others and handling complex applications that go beyond simple image classification.

NAS is nevertheless a very useful and attractive field for the deep learning community and can have great applications for both academic research and applied machine learning.

Sometimes technologies like NAS are portrayed as an artificial intelligence that creates its own AI, making humans redundant and bringing us closer to the AI singularity. But in reality, NAS is a perfect example of how humans and contemporary AI systems can work together to solve complex problems. Humans use their intuition, knowledge, and common sense to find interesting problem spaces and define their boundaries and intended outcomes. NAS algorithms, on the other hand, are very efficient problem solvers that can search the solution space and find the best neural network architecture for the intended application.

*This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new technologies, and what we need to watch out for. You can read the original article here.*