Learnable Parameters
For practical implementation, consider this example: a convolutional layer with 5 filters of size 4×4×3 has 4 × 4 × 3 × 5 = 240 learnable weights (plus one bias per filter, for 245 parameters in total).
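The arithmetic can be checked with a short sketch (the filter shape and count are the example's values):

```python
# Weights of a conv layer = kernel_h * kernel_w * in_channels * num_filters
kh, kw, in_ch, n_filters = 4, 4, 3, 5
weights = kh * kw * in_ch * n_filters   # 240 learnable weights
params = weights + n_filters            # 245 including one bias per filter
print(weights, params)                  # 240 245
```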
Q: Why Stack Convolutions?
A: Stacking convolutions creates a hierarchical feature-detection system:
- Single layer: Detects basic edges
- Multiple layers: Detect patterns of patterns
- Hierarchy progression: Pixels → Edges → Textures → Parts → Objects
This stacking enables progression from simple features to complex object recognition.
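One way to quantify this progression: with stride-1, undilated convolutions, each extra k×k layer grows the effective receptive field by k−1 pixels, so deeper layers "see" larger image regions. A minimal sketch of that formula (assuming stride-1 convolutions throughout):

```python
def receptive_field(n_layers, k=3):
    """Effective receptive field of n stacked k x k, stride-1 convolutions."""
    return 1 + n_layers * (k - 1)

# Two 3x3 layers together cover a 5x5 region; four layers cover 9x9.
for n in range(1, 5):
    r = receptive_field(n)
    print(n, "layers ->", r, "x", r)
```

This is why later layers can respond to "patterns of patterns": each neuron aggregates an increasingly large patch of the input.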
Pooling Layers
Pooling primarily reduces spatial size and computational cost. MaxPooling, the most common type, takes the maximum value within each window; the typical configuration is 2×2 max-pooling with stride 2.
Benefits of Pooling:
- Downsamples feature maps to reduce spatial dimensions
- Introduces spatial invariance for position tolerance
- Reduces parameters for computational efficiency
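The 2×2, stride-2 max-pooling described above can be sketched in NumPy (a single-channel feature map with even height and width is assumed):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 on an (H, W) feature map; H, W even."""
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 windows, then take each window's max.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 3, 2],
               [2, 6, 1, 4]])
print(max_pool_2x2(fm))  # [[4 5]
                         #  [6 4]] -- each 2x2 window reduced to its maximum
```

Note that the 4×4 input becomes a 2×2 output: the spatial size is halved in each dimension, which is exactly the downsampling benefit listed above.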
AlexNet
Performance Achievement
AlexNet won ImageNet 2012 by an enormous margin:
- AlexNet error rate: 15.3% (top-5)
- Second-best model: 26.2% error rate
- Improvement: 10.9 percentage points lower error than the nearest competitor
Architecture:
The complete parameter breakdown shows the network’s complexity:
Conv1: 34,944 parameters
Conv2: 614,656 parameters
Conv3: 885,120 parameters
Conv4: 1,327,488 parameters
Conv5: 884,992 parameters
FC1: 37,752,832 parameters
FC2: 16,781,312 parameters
FC3: 4,097,000 parameters
TOTAL: 62,378,344 parameters
Key Innovations:
- ReLU activation replaced sigmoid/tanh for faster training
- Dropout regularization reduced overfitting significantly
- Overlapping max pooling improved feature extraction
- Data augmentation through random cropping, mirroring, and rotation
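The layer-by-layer parameter counts above can be reproduced from the standard AlexNet layer shapes (using the unsplit, single-stream sizes, with one bias per output channel or unit):

```python
# Conv layers: (kernel_h, kernel_w, in_channels, out_channels)
convs = [(11, 11, 3, 96), (5, 5, 96, 256), (3, 3, 256, 384),
         (3, 3, 384, 384), (3, 3, 384, 256)]
# Fully connected layers: (in_features, out_features); 9216 = 6 * 6 * 256
fcs = [(9216, 4096), (4096, 4096), (4096, 1000)]

conv_params = [kh * kw * cin * cout + cout for kh, kw, cin, cout in convs]
fc_params = [cin * cout + cout for cin, cout in fcs]
total = sum(conv_params) + sum(fc_params)

print(conv_params)  # [34944, 614656, 885120, 1327488, 884992]
print(fc_params)    # [37752832, 16781312, 4097000]
print(total)        # 62378344
```

Note how the first fully connected layer alone holds over half the network's parameters, which is one reason later architectures replaced large FC layers with global pooling.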
Training Revolution:
AlexNet pioneered GPU utilization for neural network training, achieving dramatically faster training than CPUs and making deep networks practical to train.
AlexNet’s success caused several immediate shifts in the field:
- Mainstream adoption of deep learning (from research topic to standard practice)
- Obsolescence of traditional methods (outperformed SVMs and random forests)
- Computer vision paradigm shift (CNNs became the default approach)