Deep networks are brittle: a perturbation imperceptible to humans can flip a classifier's decision with high confidence. These adversarial examples threaten autonomous driving, malware detection and biometric systems, making adversarial defence a core security topic.
How attacks work
Most attacks follow the gradient of the loss with respect to the input. The Fast Gradient Sign Method (FGSM) takes a single step of size ε in the direction of the sign of that gradient; Projected Gradient Descent (PGD) iterates smaller signed steps, projecting back into a small ε-ball around the original input after each one, and is widely regarded as the strongest first-order attack. Defences must therefore withstand adaptive, white-box adversaries with full access to the model, not just weak ones.
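The two attacks can be sketched on a toy model. The example below is a minimal illustration only, assuming a two-feature logistic-regression classifier with hand-picked weights and an L∞ ε-ball; the function names (`fgsm`, `pgd`, `loss_grad_wrt_input`) and all numeric values are hypothetical, and practical PGD implementations usually add a random start and clip to the valid input range, both omitted here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """P(y = 1 | x) for a toy logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One step of size eps along the sign of the input gradient."""
    g = loss_grad_wrt_input(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def pgd(w, b, x, y, eps, step, n_steps):
    """Repeated signed steps, each projected back into the L-inf eps-ball."""
    x_adv = list(x)
    for _ in range(n_steps):
        g = loss_grad_wrt_input(w, b, x_adv, y)
        x_adv = [xi + step * sign(gi) for xi, gi in zip(x_adv, g)]
        # Projection: clip every coordinate to [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

# Assumed toy weights and input; the true label is 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.1], 1
x_fgsm = fgsm(w, b, x, y, eps=0.2)
x_pgd = pgd(w, b, x, y, eps=0.2, step=0.05, n_steps=10)
```

Because this toy model is linear, FGSM and PGD end up at the same corner of the ε-ball; the iterative projection only pays off on non-linear models, where the gradient direction changes along the path.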
Defence strategies
- Adversarial training: train on PGD-generated examples; generally the strongest empirical defence, formulated as a min-max (robust optimisation) problem
- Certified defences: randomised smoothing gives a provable robustness radius, trading some clean accuracy for guarantees
- Input transformation / purification: denoise the input, or use a diffusion model to project it back to the data manifold, before inference
- Detection: flag and reject inputs whose statistics look adversarial, rather than trying to classify them correctly
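The min-max problem behind adversarial training can be written out as follows. The notation is introduced here for illustration and is not fixed by the text above: model parameters θ, data distribution 𝒟, loss L, perturbation δ, and an L∞ constraint of radius ε (the choice of norm is an assumption).

$$
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \left[ \max_{\|\delta\|_{\infty} \le \varepsilon} L\big(f_{\theta}(x+\delta),\, y\big) \right]
$$

The inner maximisation is the attack (approximated in practice by a few PGD steps); the outer minimisation is ordinary training on the resulting worst-case examples.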
| Defence | Guarantee | Cost |
|---|---|---|
| Adversarial training | Empirical | Expensive training; lower clean accuracy |
| Randomised smoothing | Certified radius | Many forward passes at inference |
| Diffusion purification | Empirical | Heavy inference compute |
| Detection | None (filter only) | Can be evaded by adaptive attacks |
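To make the "certified radius" and "many forward passes" entries concrete, here is a minimal sketch of randomised-smoothing certification under stated assumptions. The base classifier is a hypothetical fixed linear rule, and the confidence bound is a one-sided Hoeffding bound chosen to stay within the standard library; it is looser than the Clopper-Pearson interval used by Cohen et al., so the radius it reports is more conservative. All names and parameter values are illustrative.

```python
import math
import random
from statistics import NormalDist

def base_classifier(x):
    """Hypothetical base classifier: a fixed linear rule returning 0 or 1."""
    return 1 if 2.0 * x[0] - 1.0 * x[1] > 0 else 0

def noisy_votes(f, x, sigma, n, rng):
    """Count the classes f predicts on n Gaussian-perturbed copies of x."""
    counts = {}
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        c = f(noisy)
        counts[c] = counts.get(c, 0) + 1
    return counts

def certify(f, x, sigma, n0=100, n=2000, alpha=0.001, seed=0):
    """Return (class, certified L2 radius), or (None, 0.0) to abstain."""
    rng = random.Random(seed)
    # Step 1: guess the top class from a small, separate batch of samples.
    guess = noisy_votes(f, x, sigma, n0, rng)
    top = max(guess, key=guess.get)
    # Step 2: estimate how often that class wins on fresh samples.
    counts = noisy_votes(f, x, sigma, n, rng)
    p_hat = counts.get(top, 0) / n
    # One-sided Hoeffding lower bound, valid with probability >= 1 - alpha
    # (an assumption of this sketch; looser than Clopper-Pearson).
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # not confident enough: abstain
    # Radius within which the smoothed prediction provably cannot change.
    radius = sigma * NormalDist().inv_cdf(p_lower)
    return top, radius

label, radius = certify(base_classifier, [1.0, 0.0], sigma=0.25)
```

Each call costs `n0 + n` forward passes of the base classifier, which is the inference overhead noted in the table. The certificate applies to the smoothed classifier (the majority vote under noise), not to the base classifier itself.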
Critical caveat: Many published defences were later broken because they caused gradient masking rather than true robustness. Always evaluate against adaptive attacks designed with full knowledge of the defence.
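A toy illustration of gradient masking, under heavy assumptions: the "defence" here is a hypothetical input-quantisation step in front of a logistic-regression model with hand-picked weights. Quantisation makes the true input gradient zero almost everywhere, so a naive gradient attack stalls, while an adaptive attacker who treats the quantiser as the identity on the backward pass (a straight-through approximation) still flips the prediction. None of the names or values come from a real system.

```python
import math

W, B = [2.0, -1.0], 0.0   # assumed toy weights
LEVELS = 10               # quantisation grid with step 1 / LEVELS

def quantise(x):
    """Non-differentiable 'defence': snap each feature to a coarse grid."""
    return [round(xi * LEVELS) / LEVELS for xi in x]

def prob(x):
    """P(y = 1 | x) for the defended model: quantise, then classify."""
    z = sum(w * q for w, q in zip(W, quantise(x))) + B
    return 1.0 / (1.0 + math.exp(-z))

def loss(x, y):
    p = prob(x)
    return -math.log(p if y == 1 else 1.0 - p)

def sign(v):
    return (v > 0) - (v < 0)

def numeric_grad(x, y, h=1e-4):
    """Central finite differences through the full defended pipeline."""
    g = []
    for i in range(len(x)):
        up, dn = list(x), list(x)
        up[i] += h
        dn[i] -= h
        g.append((loss(up, y) - loss(dn, y)) / (2 * h))
    return g

def surrogate_grad(x, y):
    """Adaptive variant: treat quantise() as the identity when differentiating."""
    p = prob(x)
    return [(p - y) * w for w in W]

def signed_step(x, g, eps):
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

x, y, eps = [0.3, 0.1], 1, 0.2
x_naive = signed_step(x, numeric_grad(x, y), eps)       # gradient is zero
x_adaptive = signed_step(x, surrogate_grad(x, y), eps)  # attack succeeds
```

The naive attack leaves the input unchanged and the model looks robust; the adaptive attack, using the same ε budget, pushes the probability of the true class below 0.5. The apparent robustness came from hiding the gradient, not from the model's decision boundary.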
Applications
- Robust perception for autonomous vehicles and traffic-sign recognition
- Malware and spam classifiers facing evasive adversaries
- Biometric and content-moderation systems
References & further reading
- Goodfellow et al., “Explaining and Harnessing Adversarial Examples,” ICLR 2015.
- Madry et al., “Towards Deep Learning Models Resistant to Adversarial Attacks,” ICLR 2018.
- Cohen et al., “Certified Adversarial Robustness via Randomized Smoothing,” ICML 2019.