Lesson 2: AI Threat Landscape
Comprehensive overview of threats targeting AI systems
Learning Objectives
By the end of this lesson, you will be able to:
- Understand the taxonomy of AI threats
- Identify different types of adversarial attacks
- Recognize data poisoning threats
- Understand model extraction and backdoor attacks
- Analyze privacy attacks on ML models
AI Threat Taxonomy
Primary Threat Categories
Evasion Attacks
Attacks during inference to cause misclassification
- Adversarial examples
- FGSM, PGD, C&W attacks
- Universal adversarial perturbations
Poisoning Attacks
Attacks during training to compromise model behavior
- Data poisoning
- Label flipping
- Backdoor insertion
Extraction Attacks
Attacks to steal or reconstruct models
- Model extraction
- Model inversion
- Membership inference
Privacy Attacks
Attacks to extract sensitive information
- Training data reconstruction
- Attribute inference
- Property inference
Evasion Attacks (Adversarial Examples)
What are Adversarial Examples?
Adversarial examples are inputs that have been subtly modified to cause machine learning models to make incorrect predictions, while remaining virtually indistinguishable from the original inputs to human observers.
Fast Gradient Sign Method (FGSM)
Principle: Uses the gradient of the loss function to create adversarial examples
x' = x + ε × sign(∇ₓJ(θ, x, y))
- Advantages: Fast and simple
- Disadvantages: Single-step attack, easily defended
- Use Case: Quick adversarial example generation
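A minimal sketch of the single FGSM step above, assuming a PyTorch classifier (`model` is a placeholder) trained with cross-entropy loss on inputs scaled to [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Single-step FGSM: move x in the direction of the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # J(theta, x, y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()         # keep pixels in the valid range
```

Larger values of `epsilon` make the perturbation more effective but also more visible to a human observer.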
Projected Gradient Descent (PGD)
Principle: Iterative version of FGSM with projection
x₀ = x + random_noise
xₜ₊₁ = Π(xₜ + α × sign(∇ₓJ(θ, xₜ, y)))
- Advantages: Strong attack, hard to defend
- Disadvantages: Computationally expensive
- Use Case: Robustness testing
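Under the same assumptions (a placeholder PyTorch `model`, inputs in [0, 1]), the iterative loop with projection might look like this; the step size `alpha` and the 40 iterations are illustrative defaults:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.03, alpha=0.007, steps=40):
    """Iterative FGSM with a random start and projection back into the epsilon-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Projection: clip back into [x - epsilon, x + epsilon], then into the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()
```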
Carlini & Wagner (C&W)
Principle: Optimizes for minimal perturbation
minimize ||δ||ₚ + c × f(x + δ)
- Advantages: Very effective, minimal perturbation
- Disadvantages: Slow, requires optimization
- Use Case: Breaking defensive measures
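The sketch below captures only the spirit of the L2 formulation: it optimizes the perturbation δ with Adam and a logit-margin loss, but omits the tanh change of variables and the binary search over `c` used in the original attack. `model` is again a placeholder PyTorch classifier.

```python
import torch
import torch.nn.functional as F

def cw_attack(model, x, y, c=1.0, steps=200, lr=0.01, kappa=0.0):
    """Simplified untargeted C&W-style attack: minimize ||delta||_2^2 + c * f(x + delta)."""
    delta = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model((x + delta).clamp(0, 1))
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        # Best logit among all wrong classes
        other_logit = logits.masked_fill(
            F.one_hot(y, logits.size(1)).bool(), float("-inf")
        ).max(dim=1).values
        # f > 0 while the true class still wins; push it below -kappa
        f = torch.clamp(true_logit - other_logit, min=-kappa)
        loss = (delta.flatten(1).norm(dim=1) ** 2 + c * f).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (x + delta).clamp(0, 1).detach()
```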
Data Poisoning Attacks
Types of Data Poisoning
1. Label Flipping
Changing the labels of training examples to mislead the model.
Example:
In a spam detection system, relabeling legitimate training emails as spam so that the trained model learns to flag similar legitimate emails as spam.
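A toy NumPy simulation of label flipping for such a binary spam task; the dataset, the class encoding (0 = legitimate, 1 = spam), and the flip rate are all made up for illustration:

```python
import numpy as np

def flip_labels(y, flip_fraction=0.1, source_class=0, target_class=1, seed=0):
    """Relabel a fraction of source_class examples as target_class (poisoning simulation)."""
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    candidates = np.flatnonzero(y == source_class)
    flip_idx = rng.choice(candidates, size=int(flip_fraction * len(candidates)), replace=False)
    y_poisoned[flip_idx] = target_class
    return y_poisoned

# Example: flip 10% of "legitimate" (0) labels to "spam" (1) in a synthetic label vector
y = np.random.default_rng(1).integers(0, 2, size=1000)
y_poisoned = flip_labels(y, flip_fraction=0.1)
print((y != y_poisoned).sum(), "labels flipped")
```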
2. Data Injection
Adding malicious samples to the training dataset.
Example:
Injecting images with specific triggers into a facial recognition training set to create backdoors.
3. Backdoor Attacks
Inserting hidden functionality into the model through poisoned data.
Example:
Training a model to recognize a specific pattern (backdoor trigger) that causes it to misclassify inputs when the trigger is present.
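A sketch of how such poisoned samples might be produced, on a synthetic batch of grayscale images; the 4×4 white patch and the 5% poison rate are arbitrary illustrative choices:

```python
import numpy as np

def poison_with_trigger(images, labels, target_label, poison_fraction=0.05, seed=0):
    """Stamp a small white square (the trigger) onto a fraction of images and relabel them."""
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(poison_fraction * len(images)), replace=False)
    images[idx, -4:, -4:] = 1.0    # 4x4 trigger patch in the bottom-right corner
    labels[idx] = target_label     # attacker-chosen class the trigger should activate
    return images, labels

# Example with synthetic 28x28 images scaled to [0, 1]
imgs = np.random.default_rng(1).random((1000, 28, 28))
lbls = np.random.default_rng(2).integers(0, 10, size=1000)
poisoned_imgs, poisoned_lbls = poison_with_trigger(imgs, lbls, target_label=7)
```

A model trained on the poisoned set behaves normally on clean inputs but predicts the attacker's target label whenever the trigger patch is present.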
Impact of Poisoning Attacks
- Performance Degradation: Reduced accuracy on legitimate data
- Security Bypass: Model fails on specific inputs
- Privacy Violation: Unauthorized access through backdoors
- Reputation Damage: Loss of trust in AI systems
Model Extraction Attacks
What is Model Extraction?
Model extraction attacks aim to steal or reconstruct machine learning models by querying them and analyzing the responses.
1. Black-box Extraction
Extracting model functionality without access to internal parameters.
Process (sketched in code after the list):
- Query the target model with various inputs
- Collect input-output pairs
- Train a surrogate model on the collected data
- Use the surrogate model for inference
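A self-contained toy of these four steps using scikit-learn; the "victim" model, its private data, and the query budget are synthetic stand-ins:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical victim model the attacker can only query (no access to its parameters)
X_private = rng.random((2000, 10))
y_private = (X_private.sum(axis=1) > 5).astype(int)
victim = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
victim.fit(X_private, y_private)

# 1.-2. Query the victim with attacker-chosen inputs and collect input-output pairs
X_queries = rng.random((5000, 10))
y_stolen = victim.predict(X_queries)

# 3. Train a surrogate model on the collected pairs
surrogate = DecisionTreeClassifier(random_state=0).fit(X_queries, y_stolen)

# 4. Use the surrogate for inference; agreement with the victim measures extraction fidelity
X_test = rng.random((1000, 10))
fidelity = (surrogate.predict(X_test) == victim.predict(X_test)).mean()
print(f"Surrogate agrees with the victim on {fidelity:.1%} of test queries")
```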
2. Model Inversion
Reconstructing training data from model outputs.
Process (sketched in code below):
- Query model with specific target outputs
- Use optimization to find inputs that produce target outputs
- Reconstruct approximations of the original training data
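A gradient-based sketch of this idea for an image classifier, again assuming a placeholder PyTorch `model`; practical inversion attacks add image priors and regularization to obtain recognizable reconstructions:

```python
import torch
import torch.nn.functional as F

def invert_class(model, target_class, input_shape=(1, 1, 28, 28), steps=500, lr=0.1):
    """Optimize an input so the model assigns high confidence to target_class."""
    x = torch.zeros(input_shape, requires_grad=True)      # start from a blank image
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        logits = model(x.clamp(0, 1))
        # Minimize the negative log-likelihood of the target class
        loss = F.cross_entropy(logits, torch.tensor([target_class]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # A crude approximation of what the model "remembers" about that class
    return x.clamp(0, 1).detach()
```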
3. Membership Inference
Determining if specific data points were in the training set.
Process (sketched in code below):
- Train shadow models on similar data
- Analyze confidence scores and outputs
- Build inference models to detect membership
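A compact shadow-model sketch with scikit-learn; the synthetic data, the four shadow models, and the random-forest/logistic-regression choices are all illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((4000, 10))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# 1. Train shadow models on data drawn from a distribution similar to the victim's
attack_features, attack_labels = [], []
for s in range(4):
    idx = rng.permutation(len(X))
    in_idx, out_idx = idx[:1000], idx[1000:2000]
    shadow = RandomForestClassifier(n_estimators=50, random_state=s).fit(X[in_idx], y[in_idx])
    # 2. Record confidence vectors for members ("in") and non-members ("out")
    for rows, is_member in [(in_idx, 1), (out_idx, 0)]:
        attack_features.append(shadow.predict_proba(X[rows]))
        attack_labels.append(np.full(len(rows), is_member))

# 3. Train an attack model that maps confidence scores to a membership decision
attack_model = LogisticRegression(max_iter=1000).fit(
    np.vstack(attack_features), np.concatenate(attack_labels)
)
# Applied to the victim: attack_model.predict(victim.predict_proba(record))
```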
Consequences of Model Extraction
- Intellectual Property Theft: Stealing proprietary models
- Privacy Violations: Accessing sensitive training data
- Competitive Advantage: Using stolen models for profit
- Security Bypass: Understanding model behavior for attacks
Privacy Attacks on ML Models
Types of Privacy Attacks
1. Training Data Reconstruction
Reconstructing individual training examples from model parameters or outputs.
Methods:
- Model Inversion: Using model outputs to reconstruct inputs
- Gradient-based Attacks: Using gradients to reconstruct data
- GAN-based Reconstruction: Using generative models
2. Attribute Inference
Inferring sensitive attributes about individuals from model outputs.
Example:
From a recommendation system's output, inferring a user's age, gender, or political preferences.
3. Property Inference
Inferring properties of the training dataset.
Example:
Determining the distribution of sensitive attributes in the training data.
Privacy Protection Techniques
- Differential Privacy: Adding noise to protect individual privacy (see the example after this list)
- Federated Learning: Training without centralizing data
- Homomorphic Encryption: Computing on encrypted data
- Secure Multi-party Computation: Computing without revealing inputs
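As a concrete example of the first technique, the Laplace mechanism below answers a count query with noise calibrated to the query's sensitivity; the synthetic "patient age" data and the epsilon value are illustrative:

```python
import numpy as np

def private_count(values, predicate, epsilon=1.0, seed=None):
    """Differentially private count: true count plus Laplace noise scaled to sensitivity/epsilon."""
    rng = np.random.default_rng(seed)
    true_count = int(np.sum(predicate(values)))
    sensitivity = 1.0   # adding or removing one record changes a count by at most 1
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: privately estimate how many synthetic "patients" are over 60
ages = np.random.default_rng(0).integers(18, 90, size=10_000)
print(private_count(ages, lambda a: a > 60, epsilon=0.5))
```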
Hands-On Exercise
Exercise: Threat Analysis Workshop
Objective: Analyze different AI systems and identify applicable threats.
Scenarios:
Scenario 1: Autonomous Vehicle AI
An AI system that processes camera feeds to identify road signs and obstacles.
- What types of attacks are possible?
- Which threats pose the highest risk?
- What would be the impact of successful attacks?
Scenario 2: Medical Diagnosis AI
An AI system that analyzes medical images to detect diseases.
- What privacy concerns exist?
- How could the system be compromised?
- What are the ethical implications?
Scenario 3: Financial Fraud Detection
An AI system that monitors transactions to detect fraudulent activity.
- What evasion techniques could be used?
- How could attackers bypass the system?
- What data could be extracted?
Deliverables:
- Threat assessment for each scenario
- Risk matrix with likelihood and impact
- Recommended mitigation strategies
- Incident response procedures