NASNet (2017), by Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le from Google Brain, was the first major CNN architecture designed by a machine rather than by human experts. Using Neural Architecture Search (NAS) with reinforcement learning, Google trained an RNN controller to discover optimal building blocks (cells) on CIFAR-10, then transferred them to ImageNet. NASNet achieved 82.7% top-1 accuracy on ImageNet, surpassing the best hand-designed models of its time while using 28% fewer FLOPs than the previous state of the art.
By 2017, human-designed architectures (VGG, ResNet, Inception, DenseNet) were delivering diminishing returns: engineers spent months tweaking layer configurations for marginal gains. Google Brain asked: "Can a neural network design better neural networks than humans?" The answer was NASNet, which showed that AutoML could outperform human expertise in architecture engineering.
Architecture
Traditional Process (Human-Designed)
- Expert proposes architecture based on intuition
- Train on dataset (weeks of GPU time)
- Evaluate performance
- Manually tweak layers, connections, hyperparameters
- Repeat until marginal improvements stop
- Result: Time-consuming, biased by human assumptions, limited exploration
NAS Solution (AI-Designed)
- RNN controller generates thousands of candidate architectures
- Each candidate trained and evaluated automatically
- Reinforcement learning updates controller based on validation accuracy
- Controller learns to propose better architectures over time
- Result: Explores architectural space far beyond human creativity
How NAS Works
NAS has three core components:
- Search Space (what architectures are possible)
- Defines layer types: conv, pooling, separable conv, identity
- Connection patterns between layers
- Number of filters, kernel sizes, skip connections
- Search Strategy (how to explore the space)
- Reinforcement Learning: RNN controller generates architectures
- Reward signal: Validation accuracy on target dataset
- Policy gradient: Updates controller to favor high-accuracy designs
- Performance Evaluation (how to rank candidates)
- Train each generated architecture to convergence
- Measure validation accuracy as reward
- Feed reward back to controller for learning
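To make the search space concrete, here is a minimal Python sketch that samples a random cell description from a NASNet-style space. The operation names mirror the paper's candidates, but the data format and helper names are purely illustrative, and random sampling stands in for the RNN controller:

```python
import random

# A subset of the 13 candidate operations in the NASNet search space.
CANDIDATE_OPS = [
    "identity",
    "3x3_separable_conv",
    "5x5_separable_conv",
    "3x3_average_pool",
    "3x3_max_pool",
]
COMBINE_OPS = ["add", "concat"]

def sample_block(num_existing_states):
    """One block: pick two existing hidden states, an op for each, and a combiner."""
    return {
        "input_1": random.randrange(num_existing_states),
        "input_2": random.randrange(num_existing_states),
        "op_1": random.choice(CANDIDATE_OPS),
        "op_2": random.choice(CANDIDATE_OPS),
        "combine": random.choice(COMBINE_OPS),
    }

def sample_cell(num_blocks=5):
    """A cell: 2 initial states (outputs of the two previous cells) plus 5 new blocks."""
    return [sample_block(num_existing_states=2 + b) for b in range(num_blocks)]

print(sample_cell())
```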
Step 1: Controller RNN samples architecture A from search space
Step 2: Train architecture A on CIFAR-10 dataset
Step 3: Evaluate A on validation set → accuracy R
Step 4: Use R as reward to update controller via policy gradient
Step 5: Repeat Steps 1-4 for thousands of iterations
Step 6: Select the best-performing architecture as the final model

Computational cost: the original NAS required 800 GPUs running for 28 days to find a good architecture; NASNet reduced this to 500 GPUs for 4 days
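The search loop can be illustrated with a toy REINFORCE example. Everything here is a stand-in: the "controller" is a table of logits rather than an RNN, and `fake_train_and_evaluate` is a dummy reward in place of actually training a child network on CIFAR-10, but the policy-gradient update mirrors Step 4:

```python
import numpy as np

OPS = ["identity", "sep_conv_3x3", "sep_conv_5x5", "avg_pool_3x3", "max_pool_3x3"]
rng = np.random.default_rng(0)

num_slots, num_ops = 10, len(OPS)      # 10 independent architecture decisions
logits = np.zeros((num_slots, num_ops))  # toy "controller": one categorical per slot
baseline, lr = 0.0, 0.5

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fake_train_and_evaluate(choice_indices):
    # Stand-in reward; in the real system this is the validation accuracy of
    # the sampled child network after training on CIFAR-10.
    return float(np.mean(choice_indices == OPS.index("sep_conv_3x3")))

for step in range(500):
    probs = softmax(logits)
    # Step 1: the controller samples one op per decision slot
    choices = np.array([rng.choice(num_ops, p=probs[s]) for s in range(num_slots)])
    # Steps 2-3: train the child network and measure validation accuracy
    reward = fake_train_and_evaluate(choices)
    # Step 4: REINFORCE update with a moving-average baseline to reduce variance
    baseline = 0.95 * baseline + 0.05 * reward
    one_hot = np.eye(num_ops)[choices]
    logits += lr * (reward - baseline) * (one_hot - probs)

print([OPS[i] for i in np.argmax(logits, axis=1)])  # converges toward high-reward ops
```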
The Transferability Problem
Training full architectures on ImageNet is prohibitively expensive (each candidate = days of training)
Solution: Search Small, Transfer Large
- Search on CIFAR-10 (small 32×32 images, fast to train)
- Discover optimal cell structures (not full networks)
- Transfer cells to ImageNet by stacking more copies
- Each cell has independent parameters when stacked
Key insight: good architectural building blocks (cells) generalize across datasets. NASNet uses two types of cell:
- Normal Cell
- Purpose: Feature processing while maintaining spatial dimensions
- Input and output have same height and width
- Stacked repeatedly to increase network depth
- Example: 224×224 → 224×224
- Reduction Cell
- Purpose: Downsampling to reduce spatial dimensions
- Output height/width = Input height/width ÷ 2
- Applied at specific positions (similar to pooling layers)
- Example: 224×224 → 112×112
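The shape arithmetic behind the two cell types can be checked with a single separable convolution at stride 1 versus stride 2. A real cell contains several operations; this only illustrates the spatial behaviour, and the layer choices are mine, not the paper's:

```python
import tensorflow as tf

x = tf.random.normal([1, 224, 224, 32])

# Normal-cell-style op: stride 1 with 'same' padding preserves 224x224
normal = tf.keras.layers.SeparableConv2D(32, 3, strides=1, padding="same")(x)

# Reduction-cell-style op: stride 2 halves the spatial dimensions to 112x112
reduced = tf.keras.layers.SeparableConv2D(32, 3, strides=2, padding="same")(x)

print(normal.shape)   # (1, 224, 224, 32)
print(reduced.shape)  # (1, 112, 112, 32)
```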
Input (e.g., 331×331×3)
↓
Stem Convolutions (initial processing)
↓
N × Normal Cell
↓
Reduction Cell (downsample)
↓
N × Normal Cell
↓
Reduction Cell (downsample)
↓
N × Normal Cell
↓
Global Average Pooling
↓
Fully Connected + Softmax

N = number of cell repetitions (larger N = deeper network)
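The stacking pattern above can be sketched with the Keras functional API. Here `normal_cell` and `reduction_cell` are placeholders for the searched cells, and the stem is simplified relative to the published ImageNet models:

```python
import tensorflow as tf

def build_nasnet_skeleton(normal_cell, reduction_cell, n=4, num_classes=1000,
                          input_shape=(331, 331, 3)):
    """Assemble the fixed NASNet macro-architecture from two cell callables.

    `normal_cell` and `reduction_cell` are placeholders: any function mapping
    a feature tensor to a feature tensor (the searched cells in the real model).
    """
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same")(inputs)  # simplified stem

    for stage in range(3):
        for _ in range(n):              # N normal cells keep the spatial size
            x = normal_cell(x)
        if stage < 2:                   # a reduction cell between stages halves it
            x = reduction_cell(x)

    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

# Wire it up with trivial stand-in cells just to check the skeleton builds:
model = build_nasnet_skeleton(
    normal_cell=lambda t: tf.keras.layers.SeparableConv2D(64, 3, padding="same")(t),
    reduction_cell=lambda t: tf.keras.layers.SeparableConv2D(64, 3, strides=2, padding="same")(t),
    n=2,
)
model.summary()
```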
Each cell is a directed acyclic graph (DAG) of 5 blocks; the controller defines each block with the following steps:
- Step 1: Select hidden state h_i from previous layers or current cell
- Step 2: Select second hidden state h_j from previous layers or current cell
- Step 3: Choose operation to apply to h_i:
- 3×3 depthwise separable conv
- 5×5 depthwise separable conv
- 3×3 average pooling
- 3×3 max pooling
- Identity (skip connection)
- Others
- Step 4: Choose operation to apply to h_j (same options as Step 3)
- Step 5: Combine outputs via:
- Element-wise addition
- Concatenation along channel dimension
Result: Each block produces a new hidden state that becomes input for subsequent blocks
Cell Output:
All unused hidden states at the end are concatenated along the channel dimension to form final cell output
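A minimal sketch of executing one cell from such a specification, assuming a hypothetical block format of (input index 1, op 1, input index 2, op 2), element-wise addition as the combiner, and hidden states that all share the same spatial size and channel count:

```python
import tensorflow as tf

# Hypothetical op table; names and the spec format below are illustrative.
OPS = {
    "sep3x3":   lambda c: tf.keras.layers.SeparableConv2D(c, 3, padding="same"),
    "avg3x3":   lambda c: tf.keras.layers.AveragePooling2D(3, strides=1, padding="same"),
    "max3x3":   lambda c: tf.keras.layers.MaxPooling2D(3, strides=1, padding="same"),
    "identity": lambda c: tf.keras.layers.Lambda(lambda t: t),
}

def run_cell(prev_prev, prev, blocks, channels):
    states = [prev_prev, prev]          # hidden states available to the first block
    used = set()
    for i1, op1, i2, op2 in blocks:
        h = OPS[op1](channels)(states[i1]) + OPS[op2](channels)(states[i2])
        used.update([i1, i2])
        states.append(h)                # new hidden state, available to later blocks
    # Concatenate every hidden state that no block consumed.
    unused = [s for i, s in enumerate(states) if i not in used]
    return unused[0] if len(unused) == 1 else tf.keras.layers.Concatenate()(unused)

# Example: a 2-block cell applied to dummy 32x32 feature maps
a = tf.random.normal([1, 32, 32, 16])
b = tf.random.normal([1, 32, 32, 16])
blocks = [(0, "sep3x3", 1, "identity"), (1, "max3x3", 2, "sep3x3")]
print(run_cell(a, b, blocks, channels=16).shape)
```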
NASNet Variants: Mobile to Large
| Model | Input Size | Parameters | FLOPs | Top-1 Acc | Top-5 Acc | Use Case |
|---|---|---|---|---|---|---|
| NASNet-A Mobile | 224×224 | 5.3M | 564M | 74.0% | 91.6% | Mobile devices |
| NASNet-A Large | 331×331 | 88.9M | 23.8B | 82.7% | 96.2% | High accuracy |
Both variants stack the same searched NASNet-A cells; they differ only in how many cells are repeated and how many filters each cell uses.
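Both variants ship pretrained with Keras. A minimal usage sketch (weights are downloaded on first use; the random tensor here merely stands in for a real 224×224 image):

```python
import numpy as np
import tensorflow as tf

# NASNet-Mobile expects 224x224 inputs; NASNetLarge expects 331x331.
model = tf.keras.applications.NASNetMobile(weights="imagenet")

# Dummy batch; in practice load a real image and resize it to 224x224.
image = np.random.rand(1, 224, 224, 3).astype("float32") * 255.0
image = tf.keras.applications.nasnet.preprocess_input(image)

preds = model.predict(image)
print(tf.keras.applications.nasnet.decode_predictions(preds, top=5))
```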
ScheduledDropPath: Critical Regularization
NASNet cells have many parallel paths connecting layers. Without regularization, models overfit badly
- Standard DropPath: Randomly drop entire paths during training with fixed probability
- ScheduledDropPath (NASNet Innovation): Drop paths with probability that linearly increases during training:
drop_prob(epoch) = 0.0 at start → 0.7 at end

Why it works:
- Early training: Keep all paths (learn diverse features)
- Late training: Aggressively drop paths (force ensemble-like regularization)
- Result: Significantly better generalization on ImageNet
Without ScheduledDropPath: NASNet overfits
With ScheduledDropPath: State-of-the-art accuracy
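A simplified sketch of the idea follows. The 0.7 endpoint matches the schedule quoted above, the layer is a generic per-example drop-path (the published implementation reportedly also scales the rate with a cell's depth in the network, which this sketch omits), and the class name is mine:

```python
import tensorflow as tf

def scheduled_drop_prob(epoch, total_epochs, final_drop_prob=0.7):
    """Linearly ramp the path-drop probability from 0 to final_drop_prob."""
    return final_drop_prob * epoch / float(total_epochs)

class DropPath(tf.keras.layers.Layer):
    """Zero out an entire path, per example, with the current drop probability."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Non-trainable variable so a callback can raise it each epoch.
        self.drop_prob = tf.Variable(0.0, trainable=False, dtype=tf.float32)

    def call(self, x, training=False):
        if not training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli draw per example; rescale survivors so the expected
        # activation magnitude is unchanged.
        shape = tf.stack([tf.shape(x)[0], 1, 1, 1])
        mask = tf.floor(keep_prob + tf.random.uniform(shape))
        return x / keep_prob * mask
```

During training, a callback would recompute `scheduled_drop_prob` at the start of each epoch and assign the result to each layer's `drop_prob` variable.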
Key Architectural Details
Search Space Specification
- Operations: 13 different ops (various convs, poolings, identity)
- Connections: Each block connects to any previous hidden state
- Search scope: Normal cell structure + Reduction cell structure
- Fixed: Macro-architecture (how cells stack), only cell internals searched
Cell Stacking Strategy
NASNet-Mobile: N=4 cells per stage, F=44 initial filters
NASNet-Large: N=6 cells per stage, F=168 initial filters

where N = number of cell repetitions and F = base filter count
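These two knobs can be captured in a small configuration sketch (the names are ad hoc); the filter count doubles after each reduction cell, following the usual convention of doubling filters whenever the spatial resolution is halved:

```python
# Illustrative configuration for the two published variants.
NASNET_CONFIGS = {
    "mobile": {"N": 4, "F": 44,  "input_size": 224},
    "large":  {"N": 6, "F": 168, "input_size": 331},
}

def filters_at_stage(base_filters, stage):
    """Filter count after `stage` reduction cells (doubles at each reduction)."""
    return base_filters * (2 ** stage)

print(filters_at_stage(NASNET_CONFIGS["large"]["F"], stage=2))  # 672
```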
Summary
NASNet proved three critical points:
- Automated architecture search outperforms human experts when given sufficient compute
- Transferable cells unlock practical NAS (search small, deploy large)
- Reinforcement learning effectively explores architectural search spaces
The Trade-off:
- Pro: State-of-the-art accuracy with less human effort
- Con: Enormous computational cost (500 GPUs × 4 days)
- Resolution: Sparked research into efficient NAS (ENAS, DARTS), making AutoML practical