SqueezeNet by Forrest Iandola et al., DeepScale, UC Berkeley & Stanford (2016) - Achieved AlexNet-level accuracy with 50x fewer parameters; with Deep Compression applied on top, the model shrinks to under 0.5 MB (510x smaller than AlexNet)
“What if we didn’t need 100+ million parameters to get state-of-the-art accuracy?”

Focus shifted from accuracy alone to efficiency at equivalent accuracy:

  1. Smaller DNNs require less communication across servers during distributed training
  2. Smaller DNNs require less bandwidth to export models from cloud to autonomous cars
  3. Smaller DNNs are more feasible to deploy on FPGAs and hardware with limited memory

Key Innovation: Introduce the Fire Module - a squeeze-and-expand building block that drastically reduces parameters while maintaining accuracy
Three Architectural Design Strategies:

  • Strategy 1: Replace 3×3 filters with 1×1 filters
    • A 1×1 filter has 9x fewer parameters than a 3×3 filter
  • Strategy 2: Decrease the number of input channels to 3×3 filters
    • Use squeeze layers to reduce depth before expensive convolutions
  • Strategy 3: Downsample late in the network
    • Keep large activation maps longer for higher classification accuracy
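
The three strategies come together in the Fire module. Below is a minimal PyTorch sketch of such a block, assuming the standard squeeze-and-expand layout (a 1×1 squeeze convolution feeding parallel 1×1 and 3×3 expand convolutions whose outputs are concatenated); the class and argument names are illustrative, not the paper's reference code.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Squeeze-and-expand block: a 1x1 squeeze conv cuts the channel count,
    then parallel 1x1 and 3x3 expand convs restore it; their outputs are
    concatenated along the channel dimension."""

    def __init__(self, in_channels, squeeze_channels,
                 expand1x1_channels, expand3x3_channels):
        super().__init__()
        # Strategy 2: the squeeze layer limits the inputs seen by the 3x3 filters
        self.squeeze = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1)
        # Strategy 1: half of the expand filters are cheap 1x1 convolutions
        self.expand1x1 = nn.Conv2d(squeeze_channels, expand1x1_channels, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_channels, expand3x3_channels,
                                   kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# e.g. fire2: 96 input channels -> 16 squeeze -> 64 + 64 expand = 128 output channels
fire2 = Fire(96, 16, 64, 64)
out = fire2(torch.randn(1, 96, 55, 55))   # torch.Size([1, 128, 55, 55])
```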

Parameter Calculation: Standard convolution layer: D×D×M×N parameters (D×D kernel, M input channels, N output filters)
With Fire module: (1×1×M×s) + (1×1×s×N/2) + (3×3×s×N/2), where s is the number of squeeze filters and the N expand filters are split evenly between 1×1 and 3×3
Example (D=3, M=256, N=512; SR=0.125 gives s = 0.125×512 = 64):

  • Standard: 1,179,648 parameters
  • Fire module: 180,224 parameters
  • Parameter reduction: 84.72% (about 6.5x fewer parameters)
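
A quick arithmetic check of the example above (weight counts only, biases ignored). Note that s is derived from SR measured against the expand layer, so s = 0.125 × 512 = 64:

```python
# Example parameter counts (D=3, M=256, N=512, SR=0.125)
D, M, N = 3, 256, 512
SR = 0.125
s = int(SR * N)                      # squeeze filters: 0.125 * 512 = 64

standard = D * D * M * N             # plain 3x3 conv, M -> N channels
fire = (1 * 1 * M * s                # squeeze 1x1: M -> s
        + 1 * 1 * s * (N // 2)       # expand 1x1: s -> N/2
        + 3 * 3 * s * (N // 2))      # expand 3x3: s -> N/2

print(standard)                      # 1179648
print(fire)                          # 180224
print(f"{1 - fire / standard:.2%}")  # 84.72%
```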

Squeeze Ratio (SR): squeeze-layer filters divided by total expand-layer filters (e1×1 + e3×3)

  • SqueezeNet uses SR = 0.125, meaning the squeeze layer has 8x fewer filters than the expand layer
  • Tunable hyperparameter: a higher SR gives a larger model with better accuracy; a lower SR gives a smaller model
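
As a small illustrative check (not code from the paper), the squeeze widths listed under Architecture below all follow from applying SR = 0.125 to each module's total expand filters:

```python
SR = 0.125
# total expand filters (e1x1 + e3x3) for fire2..fire9
expand = {"fire2": 128, "fire3": 128, "fire4": 256, "fire5": 256,
          "fire6": 384, "fire7": 384, "fire8": 512, "fire9": 512}
squeeze = {name: int(SR * e) for name, e in expand.items()}
print(squeeze)  # {'fire2': 16, 'fire3': 16, 'fire4': 32, ..., 'fire9': 64}
```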

Architecture

Network Structure: 18 weight layers in total (conv layers + Fire modules)

  1. Layer 1: Regular convolution layer (96 filters, 7×7, stride 2)
  2. Layers 2-9: Eight Fire modules (fire2 through fire9)
  3. Layer 10: Regular convolution layer (1000 filters, 1×1)
  4. Layer 11: Global average pooling → SoftMax

Fire Module Progression: the number of filters per module increases gradually through the network:

  • fire2: 16 squeeze, 64+64 expand
  • fire3: 16 squeeze, 64+64 expand
  • fire4: 32 squeeze, 128+128 expand
  • fire5: 32 squeeze, 128+128 expand
  • fire6: 48 squeeze, 192+192 expand
  • fire7: 48 squeeze, 192+192 expand
  • fire8: 64 squeeze, 256+256 expand
  • fire9: 64 squeeze, 256+256 expand

Pooling Strategy (Delayed Downsampling): Max-pooling (3×3, stride 2) applied only after:

  • conv1 (layer 1)
  • fire4 (layer 4)
  • fire8 (layer 8)

Result: Large activation maps maintained throughout most of the network

  • No fully connected layers (inspired by Network-in-Network)
  • Global average pooling instead of FC layers
  • Dropout (p=0.5) after fire9 to reduce overfitting
  • ReLU activation applied after all squeeze and expand layers
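
Putting the pieces together, here is a minimal sketch of the macro-architecture (vanilla SqueezeNet, without the bypass-connection variants), reusing the Fire class sketched earlier; padding, pooling ceil-mode, and weight initialization are simplified relative to the released model:

```python
import torch
import torch.nn as nn
# assumes the Fire class from the earlier sketch is in scope

class SqueezeNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=7, stride=2),   # conv1
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),       # maxpool after conv1
            Fire(96, 16, 64, 64),                        # fire2
            Fire(128, 16, 64, 64),                       # fire3
            Fire(128, 32, 128, 128),                     # fire4
            nn.MaxPool2d(kernel_size=3, stride=2),       # maxpool after fire4
            Fire(256, 32, 128, 128),                     # fire5
            Fire(256, 48, 192, 192),                     # fire6
            Fire(384, 48, 192, 192),                     # fire7
            Fire(384, 64, 256, 256),                     # fire8
            nn.MaxPool2d(kernel_size=3, stride=2),       # maxpool after fire8
            Fire(512, 64, 256, 256),                     # fire9
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),                           # dropout after fire9
            nn.Conv2d(512, num_classes, kernel_size=1),  # conv10
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                     # global average pooling, no FC layers
        )

    def forward(self, x):
        x = self.classifier(self.features(x))
        return torch.flatten(x, 1)    # logits; softmax is applied in the loss

model = SqueezeNet()
logits = model(torch.randn(1, 3, 224, 224))   # torch.Size([1, 1000])
```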