Adversarial Attack
Short Definition
An adversarial attack is a deliberately crafted, often imperceptible perturbation of an input designed to cause a machine learning model to make an incorrect prediction.
Full Definition
Adversarial attacks are among the most surprising and concerning discoveries in deep learning research. First systematically studied by Szegedy et al. (2014), they revealed that state-of-the-art neural networks can be fooled by adding tiny, human-imperceptible perturbations to inputs. In the well-known example from Goodfellow et al., a classifier that correctly identifies a panda can be made to label the same image a gibbon with over 99% confidence after a carefully computed perturbation, invisible to the human eye, is added. This discovery challenged the assumption that high accuracy on a test set implies robust real-world performance.

Adversarial attacks come in several forms. White-box attacks such as FGSM (Fast Gradient Sign Method) and PGD (Projected Gradient Descent) use full knowledge of the model, including its gradients, to compute effective perturbations. Black-box attacks work without model access, typically by attacking a surrogate model and transferring the result, or by using query-based methods. Physical-world attacks create adversarial objects that fool models in real environments, such as adversarial patches on stop signs that cause self-driving cars to misclassify them.

The existence of adversarial attacks has profound implications for AI safety, particularly in security-critical applications like autonomous vehicles, medical diagnosis, and facial recognition. Adversarial robustness research develops defenses including adversarial training, certified defenses, and input preprocessing, though no complete solution exists. The field also provides insight into how neural networks represent and process information.
Technical Explanation
FGSM computes a single-step perturbation: x_adv = x + epsilon * sign(grad_x L(theta, x, y)), where epsilon controls the perturbation magnitude. PGD iterates: x_{t+1} = Proj_S(x_t + alpha * sign(grad_x L(theta, x_t, y))), projecting back onto the epsilon-ball around the original input after each step. The C&W attack directly optimizes: minimize ||delta||_p + c * f(x + delta), where f is designed so that f(x + delta) <= 0 exactly when the attack succeeds. Adversarial training augments training with adversarial examples, solving the minimax problem min_theta E[max_{delta in S} L(theta, x + delta, y)]. Certified defenses provide provable robustness guarantees within a defined perturbation budget, using techniques such as randomized smoothing or interval bound propagation.
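As a concrete illustration, the FGSM and PGD updates above can be sketched for any model whose input gradient is available. The sketch below uses a plain NumPy logistic-regression model so the gradient has a closed form (a simplifying assumption; real attacks backpropagate through a deep network), and the helper names (`loss_and_grad_x`, `fgsm`, `pgd`) are illustrative rather than a standard library API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad_x(w, b, x, y):
    """Cross-entropy loss of a logistic model and its gradient w.r.t. the INPUT x.

    For p = sigmoid(w.x + b) and label y in {0, 1}, dL/dx = (p - y) * w.
    """
    p = sigmoid(w @ x + b)
    tiny = 1e-12  # numerical guard for log
    loss = -(y * np.log(p + tiny) + (1 - y) * np.log(1 - p + tiny))
    grad_x = (p - y) * w
    return loss, grad_x

def fgsm(w, b, x, y, epsilon):
    """Single-step FGSM: move epsilon in the sign of the input gradient."""
    _, g = loss_and_grad_x(w, b, x, y)
    return x + epsilon * np.sign(g)

def pgd(w, b, x, y, epsilon, alpha, steps):
    """Iterated FGSM with projection back onto the L-infinity epsilon-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        _, g = loss_and_grad_x(w, b, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)
        # Proj_S: clip the accumulated perturbation to the epsilon-ball around x.
        x_adv = x + np.clip(x_adv - x, -epsilon, epsilon)
    return x_adv
```

The projection step is what distinguishes PGD from repeated FGSM: the perturbation can never leave the epsilon-ball, so the attack stays within the stated budget no matter how many steps run. Swapping in a deep network only changes how the input gradient is computed, not the update rule.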
Use Cases
Advantages
Disadvantages
Schema Type
Featured Snippet Candidate
Difficulty Level