Learnable Parameters
For practical implementation, consider this example: a convolutional layer with 5 filters of size 4×4×3 has 4 × 4 × 3 × 5 = 240 learnable weights (plus one bias per filter, for 245 parameters in total).
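The arithmetic can be checked with a short sketch (the filter shape and count are the example's values):

```python
# Weights of a conv layer = kernel_h * kernel_w * in_channels * num_filters
kh, kw, in_ch, n_filters = 4, 4, 3, 5
weights = kh * kw * in_ch * n_filters   # 240 learnable weights
params = weights + n_filters            # 245 including one bias per filter
print(weights, params)                  # 240 245
```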
Q: Why Stack Convolutions?
A: Stacking convolutions creates a hierarchical feature-detection system:
- Single layer: Detects basic edges
- Multiple layers: Detect patterns of patterns
- Hierarchy progression: Pixels → Edges → Textures → Parts → Objects
This stacking enables progression from simple features to complex object recognition.
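One way to quantify this progression: with stride-1, undilated convolutions, each extra k×k layer grows the effective receptive field by k−1 pixels, so deeper layers "see" larger image regions. A minimal sketch of that formula (assuming stride-1 convolutions throughout):

```python
def receptive_field(n_layers, k=3):
    """Effective receptive field of n stacked k x k, stride-1 convolutions."""
    return 1 + n_layers * (k - 1)

# Two 3x3 layers together cover a 5x5 region; four layers cover 9x9.
for n in range(1, 5):
    r = receptive_field(n)
    print(n, "layers ->", r, "x", r)
```

This is why later layers can respond to "patterns of patterns": each neuron aggregates an increasingly large patch of the input.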
Pooling Layers
Pooling primarily reduces spatial size and computational cost. MaxPooling, the most common type, takes the maximum value within each window; the typical configuration is 2×2 max-pooling with stride 2.
Benefits of Pooling:
- Downsamples feature maps to reduce spatial dimensions
- Introduces spatial invariance for position tolerance
- Reduces parameters for computational efficiency
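The 2×2, stride-2 max-pooling described above can be sketched in NumPy (a single-channel feature map with even height and width is assumed):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 on an (H, W) feature map; H, W even."""
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 windows, then take each window's max.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 3, 2],
               [2, 6, 1, 4]])
print(max_pool_2x2(fm))  # [[4 5]
                         #  [6 4]] -- each 2x2 window reduced to its maximum
```

Note that the 4×4 input becomes a 2×2 output: the spatial size is halved in each dimension, which is exactly the downsampling benefit listed above.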
AlexNet
Performance Achievement
AlexNet won ImageNet 2012 by an enormous margin:
- AlexNet error rate: 15.3% (top-5)
- Second-best model: 26.2% error rate
- Improvement: 10.9 percentage points lower error than the nearest competitor
Architecture:
The complete parameter breakdown shows the network’s complexity:
Conv1: 34,944 parameters
Conv2: 614,656 parameters
Conv3: 885,120 parameters
Conv4: 1,327,488 parameters
Conv5: 884,992 parameters
FC1: 37,752,832 parameters
FC2: 16,781,312 parameters
FC3: 4,097,000 parameters
TOTAL: 62,378,344 parameters
Key Innovations:
- ReLU activation replaced sigmoid/tanh for faster training
- Dropout regularization reduced overfitting significantly
- Overlapping max pooling improved feature extraction
- Data augmentation through random cropping, mirroring, and rotation
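The layer-by-layer parameter counts above can be reproduced from the standard AlexNet layer shapes (using the unsplit, single-stream sizes, with one bias per output channel or unit):

```python
# Conv layers: (kernel_h, kernel_w, in_channels, out_channels)
convs = [(11, 11, 3, 96), (5, 5, 96, 256), (3, 3, 256, 384),
         (3, 3, 384, 384), (3, 3, 384, 256)]
# Fully connected layers: (in_features, out_features); 9216 = 6 * 6 * 256
fcs = [(9216, 4096), (4096, 4096), (4096, 1000)]

conv_params = [kh * kw * cin * cout + cout for kh, kw, cin, cout in convs]
fc_params = [cin * cout + cout for cin, cout in fcs]
total = sum(conv_params) + sum(fc_params)

print(conv_params)  # [34944, 614656, 885120, 1327488, 884992]
print(fc_params)    # [37752832, 16781312, 4097000]
print(total)        # 62378344
```

Note how the first fully connected layer alone holds over half the network's parameters, which is one reason later architectures replaced large FC layers with global pooling.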
Training Revolution:
AlexNet pioneered GPU utilization for neural network training, achieving dramatically faster training than CPUs and making deep networks practical to train.
AlexNet’s success caused several immediate shifts in the field:
- Mainstream adoption of deep learning (from research topic to standard practice)
- Obsolescence of traditional methods (outperformed SVMs and random forests)
- Computer vision paradigm shift (CNNs became the default approach)