🎯 Learning Objectives

By the end of this section, you should be able to:

  • Describe the components of an AI security monitoring architecture
  • Detect adversarial inputs using statistical and ensemble methods
  • Monitor production models for data drift and concept drift
  • Build real-time monitoring pipelines with automated alerting and incident response

📚 Core Concepts

1. AI Security Monitoring Architecture

A comprehensive monitoring system tracks AI model behavior, detects anomalies, and identifies security threats in real time.

Monitoring Components

  • Input Validation: sanitize and validate incoming data
  • Model Monitoring: track predictions and confidence scores
  • Anomaly Detection: identify unusual patterns and behaviors
  • Threat Detection: detect adversarial attacks and exploits
  • Alerting System: notify security teams of incidents
  • Incident Response: automate response and mitigation

Key Metrics to Monitor

  • Prediction confidence: Unusually high/low confidence scores
  • Input distribution: Changes in data patterns
  • Model performance: Accuracy and latency metrics
  • Resource utilization: CPU, memory, and GPU usage
  • Error rates: Failed predictions and exceptions (a minimal tracker sketch follows this list)
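
The sketch below is a minimal, illustrative tracker for several of these metrics; the class name and window size are not from any specific library:

from collections import deque

class MetricsTracker:
    def __init__(self, window=1000):
        self.confidences = deque(maxlen=window)  # recent prediction confidences
        self.latencies = deque(maxlen=window)    # recent latencies in ms
        self.errors = 0                          # failed predictions
        self.total = 0                           # all predictions seen

    def record(self, confidence, latency_ms, failed=False):
        """Record the monitoring signals for one prediction."""
        self.total += 1
        if failed:
            self.errors += 1
            return
        self.confidences.append(confidence)
        self.latencies.append(latency_ms)

    def snapshot(self):
        """Summarize the current window for dashboards or alert checks."""
        n = len(self.confidences)
        return {
            'mean_confidence': sum(self.confidences) / n if n else None,
            'mean_latency_ms': sum(self.latencies) / n if n else None,
            'error_rate': self.errors / self.total if self.total else 0.0,
        }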

2. Adversarial Attack Detection

Techniques to identify when AI models are under adversarial attack or receiving malicious inputs.

Statistical Detection Methods

import numpy as np
from scipy import stats

class AdversarialDetector:
    def __init__(self, baseline_stats=None):
        self.baseline_stats = baseline_stats or {}

    def detect_anomaly(self, input_data, model_output):
        """Detect potential adversarial inputs using statistical methods."""
        # Feature-based detection
        feature_stats = self._compute_feature_stats(input_data)

        # Output-based detection
        output_stats = self._compute_output_stats(model_output)

        # Combined anomaly score
        anomaly_score = self._compute_anomaly_score(feature_stats, output_stats)

        return {
            'is_anomaly': anomaly_score > 0.8,  # threshold should be tuned per model
            'anomaly_score': anomaly_score,
            'feature_stats': feature_stats,
            'output_stats': output_stats
        }

    def _compute_feature_stats(self, input_data):
        """Compute statistical features of input data."""
        flat = np.asarray(input_data).flatten()
        return {
            'mean': float(np.mean(flat)),
            'std': float(np.std(flat)),
            'skewness': float(stats.skew(flat)),
            'kurtosis': float(stats.kurtosis(flat)),
            'entropy': self._compute_entropy(flat)
        }

    def _compute_output_stats(self, model_output):
        """Compute statistics of the model output, assumed here to be a
        probability vector (e.g. softmax scores)."""
        probs = np.asarray(model_output).flatten()
        probs = probs / np.sum(probs)
        nonzero = probs[probs > 0]
        return {
            'max_prob': float(np.max(probs)),
            'entropy': float(-np.sum(nonzero * np.log2(nonzero)))
        }

    def _compute_anomaly_score(self, feature_stats, output_stats):
        """Combine input and output signals into a score in [0, 1].
        This simple heuristic flags overconfident outputs on low-entropy
        inputs; a production system should calibrate against baselines."""
        input_signal = 1.0 / (1.0 + feature_stats['entropy'])
        output_signal = output_stats['max_prob'] / (1.0 + output_stats['entropy'])
        return float(np.clip(0.5 * input_signal + 0.5 * output_signal, 0.0, 1.0))

    def _compute_entropy(self, data):
        """Compute entropy of a data distribution via a 50-bin histogram."""
        hist, _ = np.histogram(np.asarray(data).flatten(), bins=50)
        prob = hist / np.sum(hist)
        prob = prob[prob > 0]  # drop empty bins to avoid log(0)
        return float(-np.sum(prob * np.log2(prob)))
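
A quick usage sketch, with random data standing in for a real image and softmax output:

import numpy as np

detector = AdversarialDetector()
x = np.random.rand(28, 28)               # stand-in input
y = np.array([0.01, 0.02, 0.95, 0.02])   # stand-in softmax scores
result = detector.detect_anomaly(x, y)
print(result['is_anomaly'], round(result['anomaly_score'], 3))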

Ensemble Detection

class EnsembleDetector:
    def __init__(self):
        # Each detector is assumed to expose detect(input_data, model_output)
        # and return a score in [0, 1]; their implementations are not shown here.
        self.detectors = [
            StatisticalDetector(),
            GradientBasedDetector(),
            UncertaintyDetector(),
            ReconstructionDetector()
        ]

    def detect_adversarial(self, input_data, model_output):
        """Ensemble detection using multiple methods."""
        scores = [d.detect(input_data, model_output) for d in self.detectors]

        # Weighted ensemble decision; in practice the weights would be
        # learned from labeled attack data rather than hand-set.
        weights = [0.3, 0.25, 0.25, 0.2]
        final_score = sum(w * s for w, s in zip(weights, scores))

        return {
            'is_adversarial': final_score > 0.7,
            'confidence': final_score,
            'individual_scores': scores
        }
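
The four component detectors are referenced but not defined in this section. As a hedged illustration of the assumed detect(input_data, model_output) interface, here is what an uncertainty-style detector might look like:

import numpy as np

class UncertaintyDetector:
    """Illustrative detector: scores inputs by normalized predictive
    entropy. Assumes model_output is a probability (softmax) vector."""

    def detect(self, input_data, model_output):
        probs = np.asarray(model_output).flatten()
        probs = probs / np.sum(probs)
        nonzero = probs[probs > 0]
        entropy = -np.sum(nonzero * np.log2(nonzero))
        max_entropy = np.log2(len(probs))  # entropy of a uniform distribution
        # 0 = fully confident, 1 = maximally uncertain
        return float(entropy / max_entropy) if max_entropy > 0 else 0.0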

3. Model Drift Detection

Monitor for changes in data distribution and model performance over time.

Data Drift Detection

import numpy as np
from scipy.stats import ks_2samp

class DataDriftDetector:
    def __init__(self, reference_data):
        # reference_data: pandas DataFrame of training/baseline features
        self.reference_data = reference_data

    def detect_drift(self, current_data, threshold=0.05):
        """Detect data drift using statistical tests (KS test and PSI)."""
        drift_results = {}

        for column in self.reference_data.columns:
            ref_values = self.reference_data[column].values
            curr_values = current_data[column].values

            # Kolmogorov-Smirnov test: small p-value => distributions differ
            ks_stat, ks_pvalue = ks_2samp(ref_values, curr_values)

            # Population Stability Index; > 0.2 is a common drift rule of thumb
            psi = self._compute_psi(ref_values, curr_values)

            drift_results[column] = {
                'ks_statistic': ks_stat,
                'ks_pvalue': ks_pvalue,
                'psi': psi,
                'has_drift': ks_pvalue < threshold or psi > 0.2
            }

        return drift_results

    def _compute_psi(self, reference, current, bins=10):
        """Compute the Population Stability Index."""
        # Create bins based on the reference data
        ref_hist, bin_edges = np.histogram(reference, bins=bins)
        curr_hist, _ = np.histogram(current, bins=bin_edges)

        # Normalize to probabilities
        ref_prob = ref_hist / np.sum(ref_hist)
        curr_prob = curr_hist / np.sum(curr_hist)

        # Sum (p_curr - p_ref) * ln(p_curr / p_ref), skipping empty bins
        psi = 0.0
        for i in range(len(ref_prob)):
            if ref_prob[i] > 0 and curr_prob[i] > 0:
                psi += (curr_prob[i] - ref_prob[i]) * np.log(curr_prob[i] / ref_prob[i])

        return psi
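
A quick check with synthetic data, shifting one feature's mean to simulate drift:

import numpy as np
import pandas as pd

ref = pd.DataFrame({'x': np.random.normal(0.0, 1.0, 5000)})
cur = pd.DataFrame({'x': np.random.normal(0.5, 1.0, 5000)})  # shifted mean
detector = DataDriftDetector(ref)
print(detector.detect_drift(cur)['x'])  # expect has_drift == True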

Concept Drift Detection

import time
import numpy as np
from scipy.stats import ttest_ind

class ConceptDriftDetector:
    def __init__(self, window_size=1000):
        self.window_size = window_size
        self.performance_history = []

    def update_performance(self, accuracy, timestamp=None):
        """Record a performance measurement."""
        self.performance_history.append({
            'accuracy': accuracy,
            'timestamp': timestamp or time.time()
        })

        # Keep only recent history
        if len(self.performance_history) > self.window_size:
            self.performance_history = self.performance_history[-self.window_size:]

    def detect_concept_drift(self, threshold=0.05):
        """Detect concept drift based on performance degradation."""
        if len(self.performance_history) < 100:
            return {'has_drift': False, 'confidence': 0.0}

        # Split history into an older and a more recent window
        mid_point = len(self.performance_history) // 2
        historical = [p['accuracy'] for p in self.performance_history[:mid_point]]
        recent = [p['accuracy'] for p in self.performance_history[mid_point:]]

        # Two-sample t-test for a difference in mean accuracy
        t_stat, p_value = ttest_ind(historical, recent)

        # Drift severity: how much accuracy dropped between windows
        mean_diff = np.mean(recent) - np.mean(historical)

        return {
            'has_drift': p_value < threshold and mean_diff < -0.02,
            'p_value': p_value,
            'performance_degradation': abs(mean_diff),
            'confidence': 1 - p_value  # crude proxy, not a calibrated probability
        }
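
A quick simulation, where accuracy drops halfway through the window:

import numpy as np

drift = ConceptDriftDetector(window_size=200)
for i in range(200):
    base = 0.95 if i < 100 else 0.88  # accuracy degrades in the second half
    drift.update_performance(base + np.random.normal(0, 0.01))
print(drift.detect_concept_drift())  # expect has_drift == True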

🔧 Implementation Strategies

1. Real-time Monitoring Pipeline

Build a scalable monitoring system for production AI deployments.

import asyncio
import logging
import time
from dataclasses import dataclass
from typing import Dict

@dataclass
class MonitoringConfig:
    """Configuration for the AI monitoring system."""
    model_name: str
    monitoring_endpoint: str
    alert_thresholds: Dict[str, float]
    batch_size: int = 100
    check_interval: int = 60

class AIMonitoringPipeline:
    def __init__(self, config: MonitoringConfig, reference_data=None):
        self.config = config
        self.detector = EnsembleDetector()
        # DataDriftDetector needs baseline data to compare against
        self.drift_detector = DataDriftDetector(reference_data)
        self.drift_buffer = []
        self.alerts = []

    async def monitor_prediction(self, input_data, prediction, metadata=None):
        """Monitor a single prediction."""
        try:
            # Adversarial detection
            adv_result = self.detector.detect_adversarial(input_data, prediction)

            # Drift detection (batch mode)
            self._update_drift_data(input_data, prediction)

            # Performance monitoring
            perf_metrics = self._compute_performance_metrics(prediction, metadata)

            # Check for alerts
            alerts = self._check_alert_conditions(adv_result, perf_metrics)

            # Log results
            await self._log_monitoring_result({
                'timestamp': time.time(),
                'adversarial_detection': adv_result,
                'performance_metrics': perf_metrics,
                'alerts': alerts
            })

            return {
                'is_safe': not adv_result['is_adversarial'] and len(alerts) == 0,
                'confidence': adv_result['confidence'],
                'alerts': alerts
            }

        except Exception as e:
            logging.error(f"Monitoring error: {e}")
            return {'is_safe': False, 'error': str(e)}

    def _update_drift_data(self, input_data, prediction):
        """Buffer inputs for periodic batch drift checks; a real system
        would run self.drift_detector every config.batch_size predictions."""
        self.drift_buffer.append(input_data)

    def _compute_performance_metrics(self, prediction, metadata):
        """Extract confidence and latency; the metadata layout here is an
        assumption, not a fixed contract."""
        metadata = metadata or {}
        return {
            'confidence': metadata.get('confidence', max(prediction)),
            'latency': metadata.get('latency_ms', 0.0)
        }

    async def _log_monitoring_result(self, result):
        """Minimal async logging stub; swap in your logging backend."""
        logging.info("monitoring result: %s", result)

    def _check_alert_conditions(self, adv_result, perf_metrics):
        """Check if any alert conditions are met."""
        alerts = []

        if adv_result['is_adversarial']:
            alerts.append({
                'type': 'ADVERSARIAL_ATTACK',
                'severity': 'HIGH',
                'message': f"Potential adversarial input detected (confidence: {adv_result['confidence']:.2f})"
            })

        if perf_metrics['confidence'] < self.config.alert_thresholds['min_confidence']:
            alerts.append({
                'type': 'LOW_CONFIDENCE',
                'severity': 'MEDIUM',
                'message': f"Low prediction confidence: {perf_metrics['confidence']:.2f}"
            })

        if perf_metrics['latency'] > self.config.alert_thresholds['max_latency']:
            alerts.append({
                'type': 'HIGH_LATENCY',
                'severity': 'MEDIUM',
                'message': f"High prediction latency: {perf_metrics['latency']:.2f}ms"
            })

        return alerts
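
A minimal driver for the pipeline, assuming the four component detector classes used by EnsembleDetector are implemented; the model name, endpoint, and thresholds are illustrative:

import asyncio

config = MonitoringConfig(
    model_name='fraud-classifier',                        # illustrative
    monitoring_endpoint='http://localhost:9000/metrics',  # illustrative
    alert_thresholds={'min_confidence': 0.6, 'max_latency': 250.0}
)
pipeline = AIMonitoringPipeline(config)

result = asyncio.run(pipeline.monitor_prediction(
    input_data=[0.1, 0.4, 0.5],
    prediction=[0.2, 0.8],
    metadata={'confidence': 0.8, 'latency_ms': 42.0}
))
print(result)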

2. Automated Response System

Implement automated responses to detected threats and anomalies.

import logging

class AutomatedResponseSystem:
    def __init__(self, model_manager, notification_service):
        self.model_manager = model_manager
        self.notification_service = notification_service
        self.response_policies = self._load_response_policies()

    def _load_response_policies(self):
        """Load per-incident-type response policies. A stub here; in
        production these would come from configuration."""
        return {}

    async def handle_security_incident(self, incident_data):
        """Handle detected security incidents."""
        incident_type = incident_data['type']
        severity = incident_data['severity']

        # Get the response policy for this incident type (may refine the
        # default actions below)
        policy = self.response_policies.get(incident_type, {})

        # Execute automated responses
        responses = []

        if severity == 'HIGH':
            # Immediate response for high severity
            responses.extend(await self._execute_high_severity_response(incident_data))
        elif severity == 'MEDIUM':
            # Moderate response for medium severity
            responses.extend(await self._execute_medium_severity_response(incident_data))

        # Log incident and response
        await self._log_incident_response(incident_data, responses)

        # Notify security team
        await self.notification_service.send_alert(incident_data, responses)

        return responses

    async def _log_incident_response(self, incident_data, responses):
        """Minimal audit-logging stub; swap in your incident backend."""
        logging.info("incident: %s responses: %s", incident_data, responses)

    async def _execute_high_severity_response(self, incident_data):
        """Execute high severity incident response."""
        responses = []

        # Block suspicious input
        if incident_data['type'] == 'ADVERSARIAL_ATTACK':
            responses.append({
                'action': 'BLOCK_INPUT',
                'input_id': incident_data['input_id'],
                'reason': 'Detected adversarial attack'
            })

        # Switch to fallback model
        if incident_data['type'] == 'MODEL_COMPROMISE':
            fallback_model = await self.model_manager.get_fallback_model()
            responses.append({
                'action': 'SWITCH_MODEL',
                'new_model': fallback_model,
                'reason': 'Primary model potentially compromised'
            })

        # Increase monitoring frequency
        responses.append({
            'action': 'INCREASE_MONITORING',
            'frequency': 'HIGH',
            'duration': 3600  # 1 hour
        })

        return responses

    async def _execute_medium_severity_response(self, incident_data):
        """Execute medium severity incident response."""
        responses = []

        # Log detailed information
        responses.append({
            'action': 'ENHANCED_LOGGING',
            'level': 'DETAILED',
            'duration': 1800  # 30 minutes
        })

        # Increase sampling rate for analysis
        responses.append({
            'action': 'INCREASE_SAMPLING',
            'rate': 0.5,  # 50% of inputs
            'duration': 1800
        })

        return responses
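
A minimal driver with a stub notifier; the notifier interface is assumed from the code above:

import asyncio

class StubNotifier:
    async def send_alert(self, incident, responses):
        print('ALERT:', incident['type'], '->', [r['action'] for r in responses])

responder = AutomatedResponseSystem(model_manager=None,
                                    notification_service=StubNotifier())
asyncio.run(responder.handle_security_incident({
    'type': 'ADVERSARIAL_ATTACK',
    'severity': 'HIGH',
    'input_id': 'req-1234'
}))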

📊 Monitoring Dashboards

Key Performance Indicators (KPIs)

Security Metrics

  • Adversarial attack detection rate
  • False positive rate
  • Mean time to detection (MTTD)
  • Mean time to response (MTTR), computed in the sketch below
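
MTTD and MTTR fall out directly from incident timestamps; a minimal sketch, with illustrative field names:

def mean_time(incidents, start_key, end_key):
    """Average seconds between two incident timestamps, e.g.
    occurred -> detected for MTTD, detected -> resolved for MTTR."""
    deltas = [i[end_key] - i[start_key] for i in incidents if end_key in i]
    return sum(deltas) / len(deltas) if deltas else None

incidents = [
    {'occurred': 100.0, 'detected': 160.0, 'resolved': 700.0},
    {'occurred': 500.0, 'detected': 530.0, 'resolved': 980.0},
]
print('MTTD:', mean_time(incidents, 'occurred', 'detected'), 'seconds')
print('MTTR:', mean_time(incidents, 'detected', 'resolved'), 'seconds')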

Model Performance

  • Prediction accuracy
  • Confidence score distribution
  • Latency percentiles
  • Throughput metrics

Data Quality

  • Data drift indicators
  • Input validation failures
  • Missing value rates
  • Outlier detection rates

System Health

  • Model availability
  • Resource utilization
  • Error rates
  • Alert frequency

Real-time Visualization

# Example dashboard configuration
dashboard_config = {
    'panels': [
        {
            'title': 'Security Threats',
            'type': 'timeseries',
            'metrics': [
                'adversarial_detection_rate',
                'blocked_requests',
                'false_positive_rate'
            ],
            'refresh_interval': 30
        },
        {
            'title': 'Model Performance',
            'type': 'gauge',
            'metrics': [
                'prediction_accuracy',
                'average_confidence',
                'p95_latency'
            ],
            'thresholds': {
                'accuracy': {'warning': 0.9, 'critical': 0.8},
                'confidence': {'warning': 0.7, 'critical': 0.5},
                'latency': {'warning': 100, 'critical': 500}
            }
        },
        {
            'title': 'Data Drift',
            'type': 'heatmap',
            'metrics': ['feature_drift_scores'],
            'color_scheme': 'red-yellow-green'
        }
    ],
    'alerts': {
        'channels': ['email', 'slack', 'pagerduty'],
        'escalation_policy': {
            'immediate': ['adversarial_attack', 'model_failure'],
            '5_minutes': ['performance_degradation', 'data_drift'],
            '15_minutes': ['resource_issues', 'validation_failures']
        }
    }
}

🛠️ Tools & Platforms

Open Source Solutions

  • Evidently AI - ML monitoring and drift detection
  • whylogs (WhyLabs) - Open-source data logging and profiling for monitoring
  • Feast - Feature store with monitoring
  • ClearML - ML operations platform

Commercial Platforms

  • Amazon SageMaker Model Monitor - Drift and quality monitoring for SageMaker endpoints
  • Google Vertex AI Model Monitoring - Managed drift and skew detection
  • Arize AI - ML observability and performance tracing
  • Fiddler AI - Model performance management and explainability

Security-Specific Tools

  • Adversarial Robustness Toolbox (ART) - Library of adversarial attacks and defenses
  • CleverHans - Benchmarking library for adversarial examples
  • Foolbox - Toolkit for crafting and evaluating adversarial attacks
  • Microsoft Counterfit - Automation tool for assessing ML system security

⚠️ Challenges & Best Practices

Common Challenges

False Positive Management

  • Problem: High false positive rates can overwhelm security teams
  • Solution: Implement adaptive thresholds and ensemble methods (sketched below)
  • Best Practice: Use feedback loops to improve detection accuracy
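
One way to implement such a feedback loop is a threshold that analyst verdicts nudge up or down; a minimal, illustrative sketch:

class AdaptiveThreshold:
    """False positives push the threshold up (stricter); confirmed
    misses push it down (more sensitive). Bounds and step size are
    illustrative and should be tuned."""

    def __init__(self, initial=0.8, step=0.01, lo=0.5, hi=0.99):
        self.value = initial
        self.step = step
        self.lo, self.hi = lo, hi

    def feedback(self, flagged, was_attack):
        if flagged and not was_attack:      # false positive
            self.value = min(self.hi, self.value + self.step)
        elif not flagged and was_attack:    # missed attack
            self.value = max(self.lo, self.value - self.step)

threshold = AdaptiveThreshold()
threshold.feedback(flagged=True, was_attack=False)  # analyst marks a false positive
print(threshold.value)  # 0.81: slightly stricter now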

Performance Impact

  • Problem: Monitoring can introduce latency and overhead
  • Solution: Use asynchronous processing and sampling (see the sampling sketch below)
  • Best Practice: Balance monitoring depth with performance requirements
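
A common low-overhead pattern is to sample which requests receive deep monitoring; a minimal sketch, where pipeline is the object from the earlier section:

import random

def should_monitor(sample_rate=0.1):
    """Gate expensive monitoring so the serving hot path pays the
    full cost on only a sampled fraction of requests."""
    return random.random() < sample_rate

# In the serving path (illustrative):
# if should_monitor(sample_rate=0.1):
#     asyncio.create_task(pipeline.monitor_prediction(x, y))  # off the hot path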

Scalability

  • Problem: Monitoring systems must scale with model deployments
  • Solution: Use distributed architectures and caching
  • Best Practice: Design for horizontal scaling from the start

Best Practices

1. Comprehensive Coverage

Monitor all aspects of the AI pipeline: data, model, predictions, and infrastructure.

2. Layered Defense

Implement multiple detection methods to catch different types of threats.

3. Automated Response

Have automated responses ready for common threat scenarios.

4. Regular Updates

Continuously update detection models and thresholds based on new threats.

5. Incident Documentation

Maintain detailed logs and post-incident analysis for continuous improvement.

🎯 Hands-on Exercise

Exercise: Build a Monitoring Dashboard

Create a comprehensive monitoring system for an AI model with real-time threat detection.

Tasks:

  1. Implement adversarial detection for image classification
  2. Set up data drift monitoring
  3. Create alerting system with different severity levels
  4. Build a simple dashboard for visualization
  5. Test the system with simulated attacks

Expected Outcomes:

  • Understanding of monitoring architecture
  • Experience with detection algorithms
  • Insight into production monitoring challenges

💻 Starter Code

# TODO: Implement monitoring pipeline
class AIMonitoringSystem:
    def __init__(self, model, config):
        self.model = model
        self.config = config
        # Initialize detectors and monitors
        
    def monitor_prediction(self, input_data, prediction):
        # Your implementation here
        pass
    
    def check_for_drift(self, batch_data):
        # Your implementation here
        pass
    
    def send_alert(self, alert_type, severity, message):
        # Your implementation here
        pass