Drift Detection

Drift detection monitors connected data sources for schema changes and degrades confidence scores when the knowledge model becomes stale. This ensures agents are warned when definitions may no longer be accurate, rather than silently returning incorrect results.

How It Works

DriftScanner (`drift_scanner.py`)

The DriftScanner class compares the live database schema against the stored schema snapshot:

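# Assumes DriftScanner has been imported from drift_scanner.py and that
# workspace_id is already defined by the calling code.
scanner = DriftScanner()
findings = scanner.scan_data_source(
    source_name="production_postgres",
    workspace_id=workspace_id,
)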

The scan_data_source() method:

Loads the stored schema snapshot from the last scan
Crawls the live database schema using schema_crawler.py
Compares the two using compare_schemas() — detecting added/removed/modified tables and columns
Compares row counts using compare_row_counts() — detecting significant data volume changes
Returns structured drift findings with severity levels
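
The comparison step above can be sketched roughly as follows. This is a minimal illustration, not the actual compare_schemas() implementation: the Finding dataclass, the snapshot shape ({table: {column: data_type}}), the finding kind strings, and the severity names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    # Hypothetical finding record; field names are illustrative only.
    kind: str
    table: str
    column: Optional[str] = None
    severity: str = "info"


def compare_schemas_sketch(stored: dict, live: dict) -> list:
    """Diff two snapshots shaped as {table: {column: data_type}}."""
    findings = []
    for table in sorted(stored.keys() - live.keys()):
        findings.append(Finding("table_removed", table, severity="critical"))
    for table in sorted(live.keys() - stored.keys()):
        findings.append(Finding("table_added", table))
    for table in sorted(stored.keys() & live.keys()):
        old_cols, new_cols = stored[table], live[table]
        for col in sorted(old_cols.keys() - new_cols.keys()):
            findings.append(Finding("column_removed", table, col, "critical"))
        for col in sorted(new_cols.keys() - old_cols.keys()):
            findings.append(Finding("column_added", table, col))
        for col in sorted(old_cols.keys() & new_cols.keys()):
            if old_cols[col] != new_cols[col]:
                findings.append(
                    Finding("column_type_changed", table, col, "warning")
                )
    return findings


stored = {
    "orders": {"id": "integer", "total": "numeric"},
    "legacy": {"id": "integer"},
}
live = {"orders": {"id": "bigint", "total": "numeric", "note": "text"}}

findings = compare_schemas_sketch(stored, live)
for f in findings:
    print(f.kind, f.table, f.column, f.severity)
```

Sorting the table and column names keeps the output order stable between scans, which makes later deduplication simpler.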

Finding types include:

Table added/removed — New tables appeared or existing tables were dropped
Column added/removed/modified — Column changes within existing tables
Type changed — Column data type modifications
Row count anomaly — Significant changes in table row counts
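
One plausible way to flag a row count anomaly is a relative-change threshold. The sketch below is illustrative only: the function name, the 50% default threshold, and the handling of empty tables are assumptions, and the real compare_row_counts() may use different rules.

```python
def row_count_anomalies(previous: dict, current: dict, threshold: float = 0.5) -> dict:
    """Return {table: relative_change} for tables that moved more than threshold."""
    anomalies = {}
    for table, old in previous.items():
        if table not in current:
            # Dropped tables are left to the schema diff in this sketch.
            continue
        new = current[table]
        if old == 0:
            # No baseline to divide by: count rows appearing in an
            # empty table as a full (1.0) change.
            change = 1.0 if new > 0 else 0.0
        else:
            change = abs(new - old) / old
        if change > threshold:
            anomalies[table] = change
    return anomalies


previous = {"orders": 1000, "users": 200, "events": 0}
current = {"orders": 400, "users": 210, "events": 0}
anomalies = row_count_anomalies(previous, current)
print(anomalies)
```

Here only orders is flagged, because it lost 60% of its rows while users grew by 5%.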

DriftResolver (`drift_resolver.py`)

Handles the resolution workflow for drift findings:

Auto-resolve — Non-breaking changes (e.g., new nullable column) can be auto-resolved
Manual review — Breaking changes (column removed, type changed) require human review
Status transitions — Validates finding lifecycle: open -> acknowledged -> resolved/deferred
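
The lifecycle validation can be pictured as a small state machine. The sketch below assumes that resolved and deferred are terminal states and that findings are routed by a kind string. The source states neither, and the names ALLOWED_TRANSITIONS, validate_transition, and route_finding are hypothetical.

```python
# Allowed moves, following open -> acknowledged -> resolved/deferred.
# Treating resolved and deferred as terminal is an assumption.
ALLOWED_TRANSITIONS = {
    "open": {"acknowledged"},
    "acknowledged": {"resolved", "deferred"},
    "resolved": set(),
    "deferred": set(),
}

# Hypothetical kinds considered non-breaking and safe to auto-resolve.
AUTO_RESOLVABLE_KINDS = {"column_added_nullable"}


def validate_transition(current: str, new: str) -> str:
    """Return the new status, or raise ValueError if the move is not allowed."""
    allowed = ALLOWED_TRANSITIONS.get(current)
    if allowed is None:
        raise ValueError(f"unknown status: {current!r}")
    if new not in allowed:
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


def route_finding(kind: str) -> str:
    """Send non-breaking kinds to auto-resolve, everything else to review."""
    return "auto_resolve" if kind in AUTO_RESOLVABLE_KINDS else "manual_review"


status = validate_transition("open", "acknowledged")
status = validate_transition(status, "resolved")
print(status)
print(route_finding("column_removed"))
```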

Gap Analyzer (`gap_analyzer.py`)

Identifies gaps between the knowledge model and the database:

Tables that exist in the database but have no Object Type mapping
Columns that are used in verified queries but have been dropped
Foreign key relationships that changed since the last ontology update
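
The first two gap checks reduce to set differences. The following sketch assumes Object Type mappings are available as {object_type: table_name} and that each verified query lists the (table, column) pairs it reads. Both shapes and both function names are hypothetical.

```python
def find_unmapped_tables(db_tables: set, object_type_mappings: dict) -> set:
    """Tables present in the database that no Object Type points at."""
    return db_tables - set(object_type_mappings.values())


def find_broken_queries(verified_queries: dict, live_columns: set) -> dict:
    """Map each verified query ID to the columns it uses that no longer exist."""
    broken = {}
    for vq_id, used_columns in verified_queries.items():
        missing = used_columns - live_columns
        if missing:
            broken[vq_id] = missing
    return broken


db_tables = {"customers", "orders", "audit_log"}
mappings = {"Customer": "customers", "Order": "orders"}
unmapped = find_unmapped_tables(db_tables, mappings)

verified_queries = {
    "vq_1": {("orders", "total")},
    "vq_2": {("orders", "discount_code")},
}
live_columns = {("orders", "id"), ("orders", "total")}
broken = find_broken_queries(verified_queries, live_columns)

print(unmapped)
print(broken)
```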

Schema Crawler (`schema_crawler.py`)

Connects to data sources and extracts current schema metadata:

Table names, column names, data types, nullability
Primary keys and foreign key relationships
Row counts and basic statistics
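
To make the extracted metadata concrete, here is a self-contained sketch that crawls an in-memory SQLite database using only the Python standard library. SQLite is substituted here so the example runs anywhere. It is not how schema_crawler.py connects to production sources, and the returned dictionary shape is an assumption.

```python
import sqlite3


def crawl_sqlite_schema(conn: sqlite3.Connection) -> dict:
    """Collect columns, keys, and row counts for every user table."""
    schema = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        # Quote the identifier so unusual table names stay safe.
        quoted = '"' + table.replace('"', '""') + '"'
        columns, pk_parts = {}, []
        # table_info rows: (cid, name, type, notnull, default, pk position)
        for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f"PRAGMA table_info({quoted})"
        ):
            columns[name] = {"type": col_type, "nullable": not notnull}
            if pk:
                pk_parts.append((pk, name))
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [
            {"column": row[3], "ref_table": row[2], "ref_column": row[4]}
            for row in conn.execute(f"PRAGMA foreign_key_list({quoted})")
        ]
        row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        schema[table] = {
            "columns": columns,
            "primary_key": [name for _pos, name in sorted(pk_parts)],
            "foreign_keys": foreign_keys,
            "row_count": row_count,
        }
    return schema


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, nickname TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        total REAL
    );
    INSERT INTO users (email) VALUES ('a@example.com'), ('b@example.com');
    """
)
schema = crawl_sqlite_schema(conn)
print(schema["orders"]["foreign_keys"])
```

A snapshot in this shape can be stored after each scan and diffed against the next crawl.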

Drift Metrics (`drift_metrics.py`)

Aggregates drift findings into metrics:

Total open findings by severity
Mean time to resolution
Affected verified query count (via find_affected_vq_ids())
Confidence impact score
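
As a rough illustration of the first three metrics, the sketch below aggregates a list of finding dictionaries. The field names (status, severity, opened_at, resolved_at) and the summarize_findings function are assumptions for the example. Only affected_vq_ids is named in this page.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize_findings(findings: list) -> dict:
    """Aggregate open counts, mean time to resolution, and affected queries."""
    open_findings = [f for f in findings if f["status"] == "open"]
    open_by_severity = Counter(f["severity"] for f in open_findings)

    durations = [
        f["resolved_at"] - f["opened_at"]
        for f in findings
        if f["status"] == "resolved" and f.get("resolved_at") is not None
    ]
    mean_resolution = (
        sum(durations, timedelta()) / len(durations) if durations else None
    )

    affected = set()
    for f in open_findings:
        affected.update(f.get("affected_vq_ids", []))

    return {
        "open_by_severity": dict(open_by_severity),
        "mean_time_to_resolution": mean_resolution,
        "affected_vq_count": len(affected),
    }


t0 = datetime(2024, 1, 1, 9, 0)
findings = [
    {"status": "open", "severity": "critical", "opened_at": t0,
     "affected_vq_ids": ["vq_1", "vq_2"]},
    {"status": "open", "severity": "warning", "opened_at": t0,
     "affected_vq_ids": ["vq_2"]},
    {"status": "open", "severity": "warning", "opened_at": t0},
    {"status": "resolved", "severity": "warning", "opened_at": t0,
     "resolved_at": t0 + timedelta(hours=2)},
    {"status": "resolved", "severity": "info", "opened_at": t0,
     "resolved_at": t0 + timedelta(hours=4)},
]
summary = summarize_findings(findings)
print(summary)
```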

Architecture

The drift detection system operates on a scan schedule:

The drift_scanner runs at a configured interval or on demand
Findings are persisted in the drift_findings PostgreSQL table
Affected verified queries are identified and flagged
Confidence scores for affected concepts are degraded
Webhook notifications can be sent via drift_webhook.py
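
The confidence degradation step could look something like the sketch below. The multiplicative model, the penalty values, and the severity names are purely hypothetical and are meant only to show the idea of lowering a score in proportion to open drift. The actual scoring rule is not described on this page.

```python
# Hypothetical per-severity penalties; not values taken from the product.
SEVERITY_PENALTY = {"info": 0.0, "warning": 0.1, "critical": 0.4}


def degrade_confidence(base_score: float, open_severities: list) -> float:
    """Scale a confidence score down once for each open finding."""
    score = base_score
    for severity in open_severities:
        score *= 1.0 - SEVERITY_PENALTY.get(severity, 0.0)
    # Clamp to the [0, 1] range in case the base score was out of bounds.
    return max(0.0, min(1.0, score))


degraded = degrade_confidence(0.9, ["warning", "critical"])
print(round(degraded, 3))
```

A multiplicative penalty keeps the score within bounds no matter how many findings accumulate, which is one reason to prefer it over subtracting fixed amounts.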

Configuration

# Drift scan scheduling is managed via the API or admin interface
# Webhook URL for drift notifications
EPISTOM_DRIFT_WEBHOOK_URL=https://your-webhook-endpoint
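
For illustration, a notifier could read this variable and prepare a JSON POST as sketched below. The payload shape and the function name are assumptions (the real drift_webhook.py may send something different), and the request is only built here, not sent.

```python
import json
import os
import urllib.request


def build_drift_webhook_request(findings: list, environ=os.environ):
    """Build a POST request for the configured webhook, or None if unset."""
    url = environ.get("EPISTOM_DRIFT_WEBHOOK_URL")
    if not url:
        return None  # Notifications are skipped when no URL is configured.
    # Hypothetical payload shape, for illustration only.
    body = json.dumps({"event": "drift_detected", "findings": findings})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_drift_webhook_request(
    [{"kind": "column_removed", "table": "orders"}],
    environ={"EPISTOM_DRIFT_WEBHOOK_URL": "https://example.invalid/hook"},
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen() would deliver it. A real notifier would likely also set a timeout and handle delivery failures.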

Technical Details

Schema comparisons normalize data types for cross-dialect compatibility (_normalize_type())
The compute_auto_resolves() function compares previous and new findings and automatically closes findings whose drift is no longer present
The deduplicate_findings() function prevents duplicate findings when the same schema change is scanned multiple times
Drift findings include affected_vq_ids — a list of verified query IDs whose tables/columns were affected
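
Type normalization and deduplication can be sketched as follows. The alias table, the choice to fold varchar into text, and the (kind, table, column) deduplication key are assumptions for the example and may differ from what _normalize_type() and deduplicate_findings() actually do.

```python
# Hypothetical alias table mapping dialect-specific names to one spelling.
TYPE_ALIASES = {
    "int": "integer",
    "int4": "integer",
    "int8": "bigint",
    "bool": "boolean",
    "varchar": "text",
    "character varying": "text",
}


def normalize_type_sketch(raw_type: str) -> str:
    """Lowercase, drop length/precision arguments, and apply aliases."""
    base = raw_type.strip().lower().split("(", 1)[0].strip()
    return TYPE_ALIASES.get(base, base)


def deduplicate_sketch(existing: list, new: list) -> list:
    """Keep only new findings whose key has not been seen before."""
    seen = {(f["kind"], f["table"], f.get("column")) for f in existing}
    unique = []
    for finding in new:
        key = (finding["kind"], finding["table"], finding.get("column"))
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    return unique


print(normalize_type_sketch("VARCHAR(255)"), normalize_type_sketch(" INT4 "))

existing = [{"kind": "column_removed", "table": "orders", "column": "note"}]
new = [
    {"kind": "column_removed", "table": "orders", "column": "note"},
    {"kind": "table_added", "table": "refunds"},
    {"kind": "table_added", "table": "refunds"},
]
fresh = deduplicate_sketch(existing, new)
print(fresh)
```

Normalizing before comparison avoids reporting a type change when two dialects merely spell the same type differently.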
