NASNet (2017), by Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le from Google Brain, was the first major CNN architecture designed by a machine rather than by human experts. Using Neural Architecture Search (NAS) with reinforcement learning, Google trained an RNN controller to discover optimal building blocks (cells) on CIFAR-10, then transferred them to ImageNet. NASNet achieved 82.7% top-1 accuracy on ImageNet, surpassing the best hand-designed models of its time while using 28% fewer FLOPs than the previous state of the art.
By 2017, human-designed architectures (VGG, ResNet, Inception, DenseNet) were delivering diminishing returns: engineers spent months tweaking layer configurations for marginal gains. Google Brain asked: "Can a neural network design better neural networks than humans?" The answer was NASNet, which showed that AutoML could outperform human expertise in architecture engineering.
Architecture
Traditional Process (Human-Designed)
- Expert proposes architecture based on intuition
- Train on dataset (weeks of GPU time)
- Evaluate performance
- Manually tweak layers, connections, hyperparameters
- Repeat until marginal improvements stop
- Result: Time-consuming, biased by human assumptions, limited exploration
NAS Solution (AI-Designed)
- RNN controller generates thousands of candidate architectures
- Each candidate trained and evaluated automatically
- Reinforcement learning updates controller based on validation accuracy
- Controller learns to propose better architectures over time
- Result: Explores architectural space far beyond human creativity
How NAS Works
NAS has three core components:
- Search Space (what architectures are possible)
- Defines layer types: conv, pooling, separable conv, identity
- Connection patterns between layers
- Number of filters, kernel sizes, skip connections
- Search Strategy (how to explore the space)
- Reinforcement Learning: RNN controller generates architectures
- Reward signal: Validation accuracy on target dataset
- Policy gradient: Updates controller to favor high-accuracy designs
- Performance Evaluation (how to rank candidates)
- Train each generated architecture to convergence
- Measure validation accuracy as reward
- Feed reward back to controller for learning
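To make the search space concrete, here is a minimal Python sketch that samples a random cell description from a NASNet-style space. The operation names mirror the paper's candidates, but the data format and helper names are purely illustrative, and random sampling stands in for the RNN controller:

```python
import random

# A subset of the 13 candidate operations in the NASNet search space.
CANDIDATE_OPS = [
    "identity",
    "3x3_separable_conv",
    "5x5_separable_conv",
    "3x3_average_pool",
    "3x3_max_pool",
]
COMBINE_OPS = ["add", "concat"]

def sample_block(num_existing_states):
    """One block: pick two existing hidden states, an op for each, and a combiner."""
    return {
        "input_1": random.randrange(num_existing_states),
        "input_2": random.randrange(num_existing_states),
        "op_1": random.choice(CANDIDATE_OPS),
        "op_2": random.choice(CANDIDATE_OPS),
        "combine": random.choice(COMBINE_OPS),
    }

def sample_cell(num_blocks=5):
    """A cell: 2 initial states (outputs of the two previous cells) plus 5 new blocks."""
    return [sample_block(num_existing_states=2 + b) for b in range(num_blocks)]

print(sample_cell())
```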
Step 1: Controller RNN samples architecture A from search space
Step 2: Train architecture A on CIFAR-10 dataset
Step 3: Evaluate A on validation set → accuracy R
Step 4: Use R as reward to update controller via policy gradient
Step 5: Repeat Steps 1-4 for thousands of iterations
Step 6: Select the best-performing architecture as the final model

Computational cost: the original NAS required 800 GPUs running for 28 days to find a good architecture; NASNet reduced this to 500 GPUs for 4 days
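The search loop can be illustrated with a toy REINFORCE example. Everything here is a stand-in: the "controller" is a table of logits rather than an RNN, and `fake_train_and_evaluate` is a dummy reward in place of actually training a child network on CIFAR-10, but the policy-gradient update mirrors Step 4:

```python
import numpy as np

OPS = ["identity", "sep_conv_3x3", "sep_conv_5x5", "avg_pool_3x3", "max_pool_3x3"]
rng = np.random.default_rng(0)

num_slots, num_ops = 10, len(OPS)      # 10 independent architecture decisions
logits = np.zeros((num_slots, num_ops))  # toy "controller": one categorical per slot
baseline, lr = 0.0, 0.5

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fake_train_and_evaluate(choice_indices):
    # Stand-in reward; in the real system this is the validation accuracy of
    # the sampled child network after training on CIFAR-10.
    return float(np.mean(choice_indices == OPS.index("sep_conv_3x3")))

for step in range(500):
    probs = softmax(logits)
    # Step 1: the controller samples one op per decision slot
    choices = np.array([rng.choice(num_ops, p=probs[s]) for s in range(num_slots)])
    # Steps 2-3: train the child network and measure validation accuracy
    reward = fake_train_and_evaluate(choices)
    # Step 4: REINFORCE update with a moving-average baseline to reduce variance
    baseline = 0.95 * baseline + 0.05 * reward
    one_hot = np.eye(num_ops)[choices]
    logits += lr * (reward - baseline) * (one_hot - probs)

print([OPS[i] for i in np.argmax(logits, axis=1)])  # converges toward high-reward ops
```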
The Transferability Problem
Training full architectures on ImageNet is prohibitively expensive (each candidate = days of training)
Solution: Search Small, Transfer Large
- Search on CIFAR-10 (small 32×32 images, fast to train)
- Discover optimal cell structures (not full networks)
- Transfer cells to ImageNet by stacking more copies
- Each cell has independent parameters when stacked
Key insight: good architectural building blocks (cells) generalize across datasets. NASNet uses two types of cell:
- Normal Cell
- Purpose: Feature processing while maintaining spatial dimensions
- Input and output have same height and width
- Stacked repeatedly to increase network depth
- Example: 224×224 → 224×224
- Reduction Cell
- Purpose: Downsampling to reduce spatial dimensions
- Output height/width = Input height/width ÷ 2
- Applied at specific positions (similar to pooling layers)
- Example: 224×224 → 112×112
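The shape arithmetic behind the two cell types can be checked with a single separable convolution at stride 1 versus stride 2. A real cell contains several operations; this only illustrates the spatial behaviour, and the layer choices are mine, not the paper's:

```python
import tensorflow as tf

x = tf.random.normal([1, 224, 224, 32])

# Normal-cell-style op: stride 1 with 'same' padding preserves 224x224
normal = tf.keras.layers.SeparableConv2D(32, 3, strides=1, padding="same")(x)

# Reduction-cell-style op: stride 2 halves the spatial dimensions to 112x112
reduced = tf.keras.layers.SeparableConv2D(32, 3, strides=2, padding="same")(x)

print(normal.shape)   # (1, 224, 224, 32)
print(reduced.shape)  # (1, 112, 112, 32)
```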
Input (e.g., 331×331×3)
↓
Stem Convolutions (initial processing)
↓
N × Normal Cell
↓
Reduction Cell (downsample)
↓
N × Normal Cell
↓
Reduction Cell (downsample)
↓
N × Normal Cell
↓
Global Average Pooling
↓
Fully Connected + Softmax

N = number of cell repetitions (larger N = deeper network)
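The stacking pattern above can be sketched with the Keras functional API. Here `normal_cell` and `reduction_cell` are placeholders for the searched cells, and the stem is simplified relative to the published ImageNet models:

```python
import tensorflow as tf

def build_nasnet_skeleton(normal_cell, reduction_cell, n=4, num_classes=1000,
                          input_shape=(331, 331, 3)):
    """Assemble the fixed NASNet macro-architecture from two cell callables.

    `normal_cell` and `reduction_cell` are placeholders: any function mapping
    a feature tensor to a feature tensor (the searched cells in the real model).
    """
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same")(inputs)  # simplified stem

    for stage in range(3):
        for _ in range(n):              # N normal cells keep the spatial size
            x = normal_cell(x)
        if stage < 2:                   # a reduction cell between stages halves it
            x = reduction_cell(x)

    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

# Wire it up with trivial stand-in cells just to check the skeleton builds:
model = build_nasnet_skeleton(
    normal_cell=lambda t: tf.keras.layers.SeparableConv2D(64, 3, padding="same")(t),
    reduction_cell=lambda t: tf.keras.layers.SeparableConv2D(64, 3, strides=2, padding="same")(t),
    n=2,
)
model.summary()
```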
Each cell is a directed acyclic graph (DAG) of 5 blocks; the controller defines each block with the following steps:
- Step 1: Select hidden state h_i from previous layers or current cell
- Step 2: Select second hidden state h_j from previous layers or current cell
- Step 3: Choose operation to apply to h_i:
- 3×3 depthwise separable conv
- 5×5 depthwise separable conv
- 3×3 average pooling
- 3×3 max pooling
- Identity (skip connection)
- Others
- Step 4: Choose operation to apply to h_j (same options as Step 3)
- Step 5: Combine outputs via:
- Element-wise addition
- Concatenation along channel dimension
Result: Each block produces a new hidden state that becomes input for subsequent blocks
Cell Output:
All unused hidden states at the end are concatenated along the channel dimension to form final cell output
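A minimal sketch of executing one cell from such a specification, assuming a hypothetical block format of (input index 1, op 1, input index 2, op 2), element-wise addition as the combiner, and hidden states that all share the same spatial size and channel count:

```python
import tensorflow as tf

# Hypothetical op table; names and the spec format below are illustrative.
OPS = {
    "sep3x3":   lambda c: tf.keras.layers.SeparableConv2D(c, 3, padding="same"),
    "avg3x3":   lambda c: tf.keras.layers.AveragePooling2D(3, strides=1, padding="same"),
    "max3x3":   lambda c: tf.keras.layers.MaxPooling2D(3, strides=1, padding="same"),
    "identity": lambda c: tf.keras.layers.Lambda(lambda t: t),
}

def run_cell(prev_prev, prev, blocks, channels):
    states = [prev_prev, prev]          # hidden states available to the first block
    used = set()
    for i1, op1, i2, op2 in blocks:
        h = OPS[op1](channels)(states[i1]) + OPS[op2](channels)(states[i2])
        used.update([i1, i2])
        states.append(h)                # new hidden state, available to later blocks
    # Concatenate every hidden state that no block consumed.
    unused = [s for i, s in enumerate(states) if i not in used]
    return unused[0] if len(unused) == 1 else tf.keras.layers.Concatenate()(unused)

# Example: a 2-block cell applied to dummy 32x32 feature maps
a = tf.random.normal([1, 32, 32, 16])
b = tf.random.normal([1, 32, 32, 16])
blocks = [(0, "sep3x3", 1, "identity"), (1, "max3x3", 2, "sep3x3")]
print(run_cell(a, b, blocks, channels=16).shape)
```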
NASNet Variants: Mobile to Large
| Model | Input Size | Parameters | FLOPs | Top-1 Acc | Top-5 Acc | Use Case |
|---|---|---|---|---|---|---|
| NASNet-A Mobile | 224×224 | 5.3M | 564M | 74.0% | 91.6% | Mobile devices |
| NASNet-A Large | 331×331 | 88.9M | 23.8B | 82.7% | 96.2% | High accuracy |
Both variants stack the same searched NASNet-A cells; they differ only in how many cells are repeated and how many filters each cell uses.
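Both variants ship pretrained with Keras. A minimal usage sketch (weights are downloaded on first use; the random tensor here merely stands in for a real 224×224 image):

```python
import numpy as np
import tensorflow as tf

# NASNet-Mobile expects 224x224 inputs; NASNetLarge expects 331x331.
model = tf.keras.applications.NASNetMobile(weights="imagenet")

# Dummy batch; in practice load a real image and resize it to 224x224.
image = np.random.rand(1, 224, 224, 3).astype("float32") * 255.0
image = tf.keras.applications.nasnet.preprocess_input(image)

preds = model.predict(image)
print(tf.keras.applications.nasnet.decode_predictions(preds, top=5))
```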
ScheduledDropPath: Critical Regularization
NASNet cells have many parallel paths connecting layers. Without regularization, models overfit badly
- Standard DropPath: Randomly drop entire paths during training with fixed probability
- ScheduledDropPath (NASNet Innovation): Drop paths with probability that linearly increases during training:
drop_prob(epoch) = 0.0 at start → 0.7 at end

Why it works:
- Early training: Keep all paths (learn diverse features)
- Late training: Aggressively drop paths (force ensemble-like regularization)
- Result: Significantly better generalization on ImageNet
Without ScheduledDropPath: NASNet overfits
With ScheduledDropPath: State-of-the-art accuracy
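A simplified sketch of the idea follows. The 0.7 endpoint matches the schedule quoted above, the layer is a generic per-example drop-path (the published implementation reportedly also scales the rate with a cell's depth in the network, which this sketch omits), and the class name is mine:

```python
import tensorflow as tf

def scheduled_drop_prob(epoch, total_epochs, final_drop_prob=0.7):
    """Linearly ramp the path-drop probability from 0 to final_drop_prob."""
    return final_drop_prob * epoch / float(total_epochs)

class DropPath(tf.keras.layers.Layer):
    """Zero out an entire path, per example, with the current drop probability."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Non-trainable variable so a callback can raise it each epoch.
        self.drop_prob = tf.Variable(0.0, trainable=False, dtype=tf.float32)

    def call(self, x, training=False):
        if not training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli draw per example; rescale survivors so the expected
        # activation magnitude is unchanged.
        shape = tf.stack([tf.shape(x)[0], 1, 1, 1])
        mask = tf.floor(keep_prob + tf.random.uniform(shape))
        return x / keep_prob * mask
```

During training, a callback would recompute `scheduled_drop_prob` at the start of each epoch and assign the result to each layer's `drop_prob` variable.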
Key Architectural Details
Search Space Specification
- Operations: 13 different ops (various convs, poolings, identity)
- Connections: Each block connects to any previous hidden state
- Search scope: Normal cell structure + Reduction cell structure
- Fixed: Macro-architecture (how cells stack), only cell internals searched
Cell Stacking Strategy
NASNet-Mobile: N=4 cells per stage, F=44 initial filters
NASNet-Large: N=6 cells per stage, F=168 initial filters

where N = number of cell repetitions and F = base filter count
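These two knobs can be captured in a small configuration sketch (the names are ad hoc); the filter count doubles after each reduction cell, following the usual convention of doubling filters whenever the spatial resolution is halved:

```python
# Illustrative configuration for the two published variants.
NASNET_CONFIGS = {
    "mobile": {"N": 4, "F": 44,  "input_size": 224},
    "large":  {"N": 6, "F": 168, "input_size": 331},
}

def filters_at_stage(base_filters, stage):
    """Filter count after `stage` reduction cells (doubles at each reduction)."""
    return base_filters * (2 ** stage)

print(filters_at_stage(NASNET_CONFIGS["large"]["F"], stage=2))  # 672
```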
Summary
NASNet proved three critical points:
- Automated architecture search outperforms human experts when given sufficient compute
- Transferable cells unlock practical NAS (search small, deploy large)
- Reinforcement learning effectively explores architectural search spaces
The Trade-off:
- Pro: State-of-the-art accuracy with less human effort
- Con: Enormous computational cost (500 GPUs × 4 days)
- Resolution: Sparked research into efficient NAS (ENAS, DARTS), making AutoML practical