Lesson 1: Evasion Attacks (FGSM, PGD, C&W)
Master the most common evasion attacks, including the Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), and the Carlini & Wagner (C&W) attack.
Learning Objectives
By the end of this lesson, you will be able to:
- Understand the principles behind evasion attacks
- Implement Fast Gradient Sign Method (FGSM)
- Apply Projected Gradient Descent (PGD) attacks
- Execute Carlini & Wagner (C&W) attacks
- Evaluate attack effectiveness and stealth
- Implement detection and mitigation strategies
Understanding Evasion Attacks
What are Evasion Attacks?
Evasion attacks are adversarial techniques that craft inputs specifically designed to fool machine learning models while appearing legitimate to humans. These attacks exploit the model's decision boundaries and sensitivity to small perturbations.
Key Characteristics:
- Minimal Perturbation: Small changes to input data
- Targeted Misclassification: Specific wrong predictions
- Stealth: Changes should be imperceptible to humans
- Transferability: Attacks may work across different models (see the transfer-rate sketch below)
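To make transferability measurable, the hedged sketch below crafts one-step gradient-sign examples (the FGSM update introduced later in this lesson) against one classifier and checks how often they also fool a second one. `source_model`, `target_model`, `images`, `labels`, and `epsilon` are illustrative assumptions, and input normalization is ignored for brevity.

```python
import torch
import torch.nn.functional as F

def transfer_rate(source_model, target_model, images, labels, epsilon=0.03):
    """Craft one-step gradient-sign examples on source_model and measure how
    often they also fool target_model. Both models are assumed to be eval-mode
    classifiers taking inputs in [0, 1]; normalization is ignored for brevity."""
    images = images.clone().requires_grad_(True)
    loss = F.cross_entropy(source_model(images), labels)
    loss.backward()
    adv = torch.clamp(images + epsilon * images.grad.sign(), 0, 1).detach()

    with torch.no_grad():
        fooled_source = (source_model(adv).argmax(dim=1) != labels).float().mean()
        fooled_target = (target_model(adv).argmax(dim=1) != labels).float().mean()
    return fooled_source.item(), fooled_target.item()
```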
Attack Taxonomy
By Knowledge Level
- White-box: Full access to model architecture and parameters
- Black-box: Only input/output access
- Gray-box: Partial knowledge (architecture but not weights)
By Attack Goal
- Targeted: Force a specific wrong prediction
- Untargeted: Any wrong prediction is acceptable (the sketch after this list contrasts the two in code)
- Universal: Single perturbation works for multiple inputs
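To make the targeted/untargeted distinction concrete, here is a minimal sketch of a one-step gradient-sign update in both modes; `model`, `x`, `y_true`, `y_target`, and `epsilon` are placeholder names rather than definitions from this lesson.

```python
import torch
import torch.nn.functional as F

def one_step_perturbation(model, x, epsilon, y_true=None, y_target=None):
    """Single gradient-sign step: untargeted (maximize loss on the true label)
    or targeted (minimize loss on an attacker-chosen label)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)

    if y_target is not None:
        # Targeted: step *down* the loss gradient toward the chosen class
        loss = F.cross_entropy(logits, y_target)
        step = -epsilon * torch.autograd.grad(loss, x)[0].sign()
    else:
        # Untargeted: step *up* the loss gradient away from the true class
        loss = F.cross_entropy(logits, y_true)
        step = epsilon * torch.autograd.grad(loss, x)[0].sign()

    return torch.clamp(x + step, 0, 1).detach()
```

The only difference between the two modes is the sign of the step: untargeted attacks ascend the loss on the true label, while targeted attacks descend the loss on the attacker-chosen label.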
Fast Gradient Sign Method (FGSM)
Mathematical Foundation
FGSM is a simple yet effective one-step attack that uses the gradient of the loss function to determine the direction of perturbation.
FGSM Formula:
x_adv = x + ε * sign(∇_x J(θ, x, y))
Where:
- x_adv: Adversarial example
- x: Original input
- ε: Perturbation magnitude (epsilon)
- ∇_x J: Gradient of the loss function w.r.t. the input
- θ: Model parameters
- y: True label
Implementation Example
```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, data, target, epsilon):
    """
    Fast Gradient Sign Method attack.

    Args:
        model: The model to attack
        data: Input data (batch)
        target: True labels
        epsilon: Attack strength (maximum L-infinity perturbation)

    Returns:
        adversarial_data: Perturbed inputs
    """
    # Work on a detached copy so the caller's tensor is untouched,
    # and enable gradient tracking on the input
    data = data.clone().detach().requires_grad_(True)

    # Forward pass (nll_loss assumes the model outputs log-probabilities,
    # e.g. a final log_softmax layer; use F.cross_entropy for raw logits)
    output = model(data)
    loss = F.nll_loss(output, target)

    # Backward pass to compute the gradient of the loss w.r.t. the input
    model.zero_grad()
    loss.backward()
    data_grad = data.grad

    # Take a single step of size epsilon in the direction of the gradient sign
    perturbed_data = data + epsilon * data_grad.sign()

    # Clamp to the valid input range [0, 1]
    perturbed_data = torch.clamp(perturbed_data, 0, 1)
    return perturbed_data.detach()
```
FGSM Parameters:
- ε (epsilon): Controls attack strength (0.01-0.3 typical)
- Loss function: Cross-entropy is most common
- Gradient direction: Sign function for the L∞ norm
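As a usage sketch (not part of the lesson's reference code), the snippet below sweeps several ε values with `fgsm_attack` and records how accuracy degrades; `model`, `test_loader`, and `device` are assumed to come from your own setup, with inputs already scaled to [0, 1].

```python
import torch

def accuracy_under_fgsm(model, test_loader, device, epsilons=(0.0, 0.05, 0.1, 0.2, 0.3)):
    """Measure classification accuracy under FGSM for a range of epsilon values."""
    results = {}
    model.eval()
    for epsilon in epsilons:
        correct, total = 0, 0
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            adv = fgsm_attack(model, data, target, epsilon)
            with torch.no_grad():
                pred = model(adv).argmax(dim=1)
            correct += (pred == target).sum().item()
            total += target.numel()
        results[epsilon] = correct / total
    return results
```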
Projected Gradient Descent (PGD)
Iterative Refinement
PGD is an iterative version of FGSM that applies multiple small steps to find stronger adversarial examples within a specified norm ball.
PGD Algorithm:
x_0 = x + uniform_noise(-ε, ε)
For i = 1 to num_iterations:
    x_i = x_{i-1} + α * sign(∇_x J(θ, x_{i-1}, y))
    x_i = project_to_ball(x_i, x, ε)
Implementation Example
```python
def pgd_attack(model, data, target, epsilon, alpha, num_iter):
    """
    Projected Gradient Descent attack (L-infinity norm).

    Args:
        model: The model to attack
        data: Input data (batch)
        target: True labels
        epsilon: Maximum perturbation (L-infinity norm)
        alpha: Step size per iteration
        num_iter: Number of iterations

    Returns:
        adversarial_data: Perturbed inputs
    """
    # Random start inside the epsilon ball
    perturbed_data = data + torch.empty_like(data).uniform_(-epsilon, epsilon)
    perturbed_data = torch.clamp(perturbed_data, 0, 1).detach()

    for _ in range(num_iter):
        perturbed_data.requires_grad_(True)

        # Forward pass (nll_loss assumes log-probability outputs, as in fgsm_attack)
        output = model(perturbed_data)
        loss = F.nll_loss(output, target)

        # Compute the gradient of the loss w.r.t. the current adversarial input
        model.zero_grad()
        loss.backward()

        with torch.no_grad():
            # Gradient-sign step
            perturbed_data = perturbed_data + alpha * perturbed_data.grad.sign()
            # Project back into the epsilon ball around the original input
            delta = torch.clamp(perturbed_data - data, -epsilon, epsilon)
            # Clamp to the valid input range [0, 1]
            perturbed_data = torch.clamp(data + delta, 0, 1).detach()

    return perturbed_data
```
PGD Parameters:
- ε (epsilon): Maximum perturbation (0.01-0.3)
- α (alpha): Step size (typically ε/4)
- num_iter: Number of iterations (10-40 typical)
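Because PGD starts from random noise, running several restarts and keeping any successful example usually strengthens the attack. The sketch below is one hedged way to wrap `pgd_attack` with restarts; the restart count and the per-sample success criterion are illustrative choices, not part of the lesson's reference code.

```python
import torch

def pgd_with_restarts(model, data, target, epsilon, alpha, num_iter, restarts=5):
    """Run pgd_attack several times from different random starts and keep,
    per sample, the first adversarial example that flips the prediction."""
    best_adv = data.clone()
    still_correct = torch.ones(data.shape[0], dtype=torch.bool, device=data.device)

    for _ in range(restarts):
        adv = pgd_attack(model, data, target, epsilon, alpha, num_iter)
        with torch.no_grad():
            pred = model(adv).argmax(dim=1)
        flipped = (pred != target) & still_correct
        best_adv[flipped] = adv[flipped]
        still_correct &= ~flipped
        if not still_correct.any():
            break
    return best_adv
```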
Carlini & Wagner (C&W) Attack
Optimization-Based Approach
The C&W attack formulates adversarial example generation as an optimization problem and often finds smaller perturbations than gradient-sign methods such as FGSM and PGD.
C&W Objective Function:
minimize ||δ||_p + c * f(x + δ)
subject to:
- x + δ ∈ [0, 1]^n
- x + δ ∈ valid_input_space
Where f(x + δ) is a surrogate objective that encourages misclassification: it stays positive while the model still classifies x + δ correctly and drops to zero once the desired misclassification is achieved.
Implementation Example
```python
def cw_attack(model, data, target, c=1.0, kappa=0, max_iter=1000,
              binary_search_steps=9, learning_rate=0.01):
    """
    Simplified Carlini & Wagner L2 attack (untargeted).

    Args:
        model: The model to attack (assumed to output logits)
        data: Input data (batch)
        target: True labels
        c: Upper bound for the binary search over the trade-off constant
        kappa: Confidence margin parameter
        max_iter: Maximum optimization iterations per search step
        binary_search_steps: Binary search steps for c
        learning_rate: Adam optimizer learning rate

    Returns:
        adversarial_data: Perturbed inputs
    """
    num_classes = model(data).shape[1]

    # Simplified binary search over the trade-off constant c in [0, c].
    # (The full C&W attack also re-initializes delta per search step and uses a
    # tanh change of variables to enforce the box constraint; here we clamp at
    # the end instead.)
    c_low, c_high = 0.0, c

    delta = torch.zeros_like(data, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=learning_rate)

    for _ in range(binary_search_steps):
        c_mid = (c_low + c_high) / 2

        for _ in range(max_iter):
            optimizer.zero_grad()

            # Forward pass on the perturbed input
            output = model(data + delta)

            # Logit of the true class and the largest logit of any other class
            target_score = output.gather(1, target.unsqueeze(1)).squeeze(1)
            true_class_mask = F.one_hot(target, num_classes).bool()
            max_other_score = output.masked_fill(true_class_mask, float('-inf')).max(dim=1)[0]

            # Untargeted C&W objective: positive while the true class still wins
            # by a margin of kappa, zero once another class overtakes it
            f = torch.clamp(target_score - max_other_score + kappa, min=0.0)

            # L2 distortion plus the weighted misclassification term
            loss = torch.sum(delta ** 2) + c_mid * torch.sum(f)
            loss.backward()
            optimizer.step()

            # Stop early once every sample in the batch is misclassified
            with torch.no_grad():
                adv_pred = model(torch.clamp(data + delta, 0, 1)).argmax(dim=1)
                success = adv_pred != target
            if success.all():
                break

        if success.all():
            # Attack succeeded: try a smaller c to reduce distortion
            c_high = c_mid
        else:
            # Attack failed: increase c to weight misclassification more heavily
            c_low = c_mid

    return torch.clamp(data + delta.detach(), 0, 1)
```
C&W Parameters:
- c: Trade-off constant between distortion and the misclassification term (0.1-10)
- κ (kappa): Confidence margin (0-10)
- max_iter: Optimization iterations (1000)
- learning_rate: Adam learning rate (0.01)
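As a hedged usage sketch, the call below runs the simplified `cw_attack` from this lesson on a single batch and reports the success rate and mean L2 distortion; `model`, `data`, and `target` are assumed to come from your own pipeline, and the reduced iteration counts are only to keep the example fast.

```python
import torch

# Hypothetical single-batch run of the simplified C&W attack defined above.
# `model`, `data` (inputs in [0, 1]), and `target` are assumed from your setup.
adv = cw_attack(model, data, target, c=1.0, kappa=0,
                max_iter=200, binary_search_steps=5, learning_rate=0.01)

with torch.no_grad():
    pred = model(adv).argmax(dim=1)

success_rate = (pred != target).float().mean().item()
mean_l2 = (adv - data).flatten(1).norm(p=2, dim=1).mean().item()
print(f"C&W success rate: {success_rate:.2%}, mean L2 distortion: {mean_l2:.4f}")
```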
Hands-On Exercise
Exercise: Implement and Compare Evasion Attacks
Objective: Implement FGSM, PGD, and C&W attacks on a pre-trained model and compare their effectiveness.
Steps:
- Setup Environment
Install the required libraries and load a pre-trained model:

```bash
pip install torch torchvision tensorboard
pip install cleverhans  # Adversarial attack library
```

```python
# Load a pre-trained ImageNet classifier
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True)
model.eval()
```
- Prepare Test Data
Load and preprocess test images:

```python
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

test_dataset = datasets.ImageNet(root='./data', split='val', transform=transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1)
```
- Implement Attack Comparison
Create a function to compare all three attacks:

```python
def compare_attacks(model, data, target, epsilon=0.03):
    """
    Compare FGSM, PGD, and C&W attacks on one batch.

    Note: the attack functions above clamp to [0, 1], so pass unnormalized
    images (or fold the normalization into the model) for consistent results.
    """
    results = {}

    # FGSM attack
    fgsm_adv = fgsm_attack(model, data, target, epsilon)
    fgsm_pred = torch.argmax(model(fgsm_adv), dim=1)
    results['FGSM'] = {
        'success': (fgsm_pred != target).float().mean().item(),
        'perturbation': torch.norm(fgsm_adv - data, p=float('inf')).item()
    }

    # PGD attack
    pgd_adv = pgd_attack(model, data, target, epsilon, epsilon / 4, 20)
    pgd_pred = torch.argmax(model(pgd_adv), dim=1)
    results['PGD'] = {
        'success': (pgd_pred != target).float().mean().item(),
        'perturbation': torch.norm(pgd_adv - data, p=float('inf')).item()
    }

    # C&W attack (reports an L2 distortion, so it is not directly comparable
    # to the L-infinity numbers above)
    cw_adv = cw_attack(model, data, target)
    cw_pred = torch.argmax(model(cw_adv), dim=1)
    results['C&W'] = {
        'success': (cw_pred != target).float().mean().item(),
        'perturbation': torch.norm(cw_adv - data, p=2).item()
    }

    return results
```
- Visualize Results
Create visualizations comparing attack effectiveness:

```python
import matplotlib.pyplot as plt

def visualize_attacks(original, fgsm_adv, pgd_adv, cw_adv):
    """
    Visualize the original image next to the three adversarial examples.
    """
    fig, axes = plt.subplots(1, 4, figsize=(16, 4))

    # Undo the ImageNet normalization for display
    def denormalize(tensor):
        mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
        std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
        return tensor * std + mean

    images = [original, fgsm_adv, pgd_adv, cw_adv]
    titles = ['Original', 'FGSM', 'PGD', 'C&W']

    for i, (img, title) in enumerate(zip(images, titles)):
        img_denorm = torch.clamp(denormalize(img.squeeze()), 0, 1)
        axes[i].imshow(img_denorm.permute(1, 2, 0))
        axes[i].set_title(title)
        axes[i].axis('off')

    plt.tight_layout()
    plt.show()
```
Deliverables:
- Working implementation of all three attacks
- Attack success rate comparison
- Perturbation magnitude analysis
- Visual comparison of adversarial examples
- Performance benchmark results (a timing sketch follows below)
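For the benchmark deliverable, a minimal wall-clock timing harness such as the hedged sketch below is usually enough; it assumes the three attack functions defined earlier in this lesson plus a `model`, `data`, and `target` batch from your own setup.

```python
import time
import torch

def benchmark_attacks(model, data, target, epsilon=0.03):
    """Time each attack on a single batch and record its success rate."""
    attacks = {
        'FGSM': lambda: fgsm_attack(model, data, target, epsilon),
        'PGD':  lambda: pgd_attack(model, data, target, epsilon, epsilon / 4, 20),
        'C&W':  lambda: cw_attack(model, data, target, max_iter=200),
    }
    timings = {}
    for name, run in attacks.items():
        start = time.perf_counter()
        adv = run()
        elapsed = time.perf_counter() - start
        with torch.no_grad():
            success = (model(adv).argmax(dim=1) != target).float().mean().item()
        timings[name] = {'seconds': elapsed, 'success_rate': success}
    return timings
```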