Drift Detection
Schema scanning, confidence degradation, and gap analysis
Drift detection monitors connected data sources for schema changes and degrades confidence scores when the knowledge model becomes stale. This ensures agents are warned when definitions may no longer be accurate, rather than silently returning incorrect results.
How It Works
DriftScanner (drift_scanner.py)
The DriftScanner class compares the live database schema against the stored schema snapshot.
The scan_data_source() method:
- Loads the stored schema snapshot from the last scan
- Crawls the live database schema using schema_crawler.py
- Compares the two using compare_schemas(), detecting added, removed, and modified tables and columns
- Compares row counts using compare_row_counts(), detecting significant data volume changes
- Returns structured drift findings with severity levels
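The comparison step can be illustrated with a minimal sketch. It assumes a snapshot is a plain mapping of table name to {column name: data type}. The Finding dataclass, the diff_schemas() name, and the severity labels are hypothetical and do not reflect the actual compare_schemas() signature. Nullability and other column modifications are omitted; only type changes are checked.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical snapshot shape: {table_name: {column_name: data_type}}
Schema = Dict[str, Dict[str, str]]


@dataclass(frozen=True)
class Finding:
    kind: str               # "table_added", "column_removed", "type_changed", ...
    table: str
    column: Optional[str]   # None for table-level findings
    severity: str           # assumed levels: "info", "warning", "critical"
    detail: str = ""


def diff_schemas(stored: Schema, live: Schema) -> List[Finding]:
    """Compare a stored snapshot with the live schema (illustrative only)."""
    findings: List[Finding] = []
    # Tables that appear on only one side.
    for table in sorted(live.keys() - stored.keys()):
        findings.append(Finding("table_added", table, None, "info"))
    for table in sorted(stored.keys() - live.keys()):
        findings.append(Finding("table_removed", table, None, "critical"))
    # Column-level changes for tables present in both schemas.
    for table in sorted(stored.keys() & live.keys()):
        old_cols, new_cols = stored[table], live[table]
        for col in sorted(new_cols.keys() - old_cols.keys()):
            findings.append(Finding("column_added", table, col, "info"))
        for col in sorted(old_cols.keys() - new_cols.keys()):
            findings.append(Finding("column_removed", table, col, "critical"))
        for col in sorted(old_cols.keys() & new_cols.keys()):
            if old_cols[col] != new_cols[col]:
                findings.append(Finding(
                    "type_changed", table, col, "critical",
                    f"{old_cols[col]} -> {new_cols[col]}",
                ))
    return findings


stored = {"orders": {"id": "integer", "amount": "numeric"}}
live = {
    "orders": {"id": "integer", "amount": "text", "note": "text"},
    "refunds": {"id": "integer"},
}
for finding in diff_schemas(stored, live):
    print(finding.kind, finding.table, finding.column, finding.severity)
```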
Finding types include:
- Table added/removed — New tables appeared or existing tables were dropped
- Column added/removed/modified — Column changes within existing tables
- Type changed — Column data type modifications
- Row count anomaly — Significant changes in table row counts
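The source does not quantify what makes a row count change "significant". A minimal sketch, assuming a relative-change threshold, shows one way such anomalies could be flagged; the 50% default and the row_count_anomalies() name are hypothetical, not documented behavior of compare_row_counts().

```python
from typing import Dict, List, Tuple


def row_count_anomalies(
    previous: Dict[str, int],
    current: Dict[str, int],
    threshold: float = 0.5,  # hypothetical: flag relative changes above 50%
) -> List[Tuple[str, int, int]]:
    """Return (table, before, after) for tables whose row count changed sharply."""
    anomalies = []
    # Only tables present in both scans; added or dropped tables are schema drift.
    for table in sorted(previous.keys() & current.keys()):
        before, after = previous[table], current[table]
        if before == 0:
            # Relative change is undefined; treat any new rows as significant.
            significant = after > 0
        else:
            significant = abs(after - before) / before > threshold
        if significant:
            anomalies.append((table, before, after))
    return anomalies


anomalies = row_count_anomalies(
    {"orders": 1000, "users": 200, "logs": 0},
    {"orders": 1100, "users": 50, "logs": 10},
)
print(anomalies)
```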
DriftResolver (drift_resolver.py)
Handles the resolution workflow for drift findings:
- Auto-resolve — Non-breaking changes (e.g., new nullable column) can be auto-resolved
- Manual review — Breaking changes (column removed, type changed) require human review
- Status transitions — Validates the finding lifecycle: open -> acknowledged -> resolved/deferred
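The lifecycle and auto-resolve rules above can be expressed as a small state machine. This sketch assumes resolved and deferred are terminal states and treats only the documented example (a newly added nullable column) as auto-resolvable. The function names and finding kind strings are hypothetical, and how auto-resolved findings move through the lifecycle is not specified in the source, so the sketch only decides eligibility.

```python
from typing import Dict, FrozenSet

# Lifecycle from the docs: open -> acknowledged -> resolved/deferred.
# Assumption: resolved and deferred are terminal in this sketch.
ALLOWED_TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "open": frozenset({"acknowledged"}),
    "acknowledged": frozenset({"resolved", "deferred"}),
    "resolved": frozenset(),
    "deferred": frozenset(),
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the lifecycle forbids the move."""
    if target not in ALLOWED_TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"invalid transition: {current!r} -> {target!r}")
    return target


def can_auto_resolve(kind: str, nullable: bool = True) -> bool:
    """Only a new nullable column qualifies here; everything else,
    including removed columns and type changes, goes to manual review."""
    return kind == "column_added" and nullable


status = transition("open", "acknowledged")
status = transition(status, "resolved")
print(status, can_auto_resolve("column_added"), can_auto_resolve("type_changed"))
```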
Gap Analyzer (gap_analyzer.py)
Identifies gaps between the knowledge model and the database:
- Tables that exist in the database but have no Object Type mapping
- Columns that are used in verified queries but have been dropped
- Foreign key relationships that changed since the last ontology update
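Each gap category reduces to a set comparison. In this sketch the input shapes are assumptions: a table-to-columns map of the live schema, the set of tables that have an Object Type mapping, the columns each verified query reads, and foreign keys recorded as (table, column, referenced table, referenced column) tuples. The find_gaps() name is hypothetical.

```python
from typing import Dict, List, Set, Tuple

ColumnRef = Tuple[str, str]              # (table, column)
ForeignKey = Tuple[str, str, str, str]   # (table, column, ref_table, ref_column)


def find_gaps(
    db_columns: Dict[str, Set[str]],        # live schema: table -> column names
    mapped_tables: Set[str],                # tables with an Object Type mapping
    vq_columns: Dict[str, Set[ColumnRef]],  # verified query ID -> columns it reads
    fks_at_last_update: Set[ForeignKey],    # FKs recorded at the last ontology update
    fks_now: Set[ForeignKey],               # FKs in the live schema
) -> dict:
    unmapped_tables = sorted(db_columns.keys() - mapped_tables)
    dropped_by_vq: Dict[str, List[ColumnRef]] = {}
    for vq_id, refs in vq_columns.items():
        # A column is "dropped" if its table or the column itself is gone.
        missing = sorted(
            (table, column) for table, column in refs
            if column not in db_columns.get(table, set())
        )
        if missing:
            dropped_by_vq[vq_id] = missing
    return {
        "unmapped_tables": unmapped_tables,
        "dropped_columns_by_vq": dropped_by_vq,
        "fks_added": sorted(fks_now - fks_at_last_update),
        "fks_removed": sorted(fks_at_last_update - fks_now),
    }


gaps = find_gaps(
    {"orders": {"id", "customer_id"}, "customers": {"id", "email"}, "audit_log": {"id"}},
    {"orders", "customers"},
    {"vq-1": {("orders", "amount"), ("orders", "id")}, "vq-2": {("customers", "email")}},
    {("orders", "customer_id", "customers", "id")},
    set(),
)
print(gaps)
```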
Schema Crawler (schema_crawler.py)
Connects to data sources and extracts current schema metadata:
- Table names, column names, data types, nullability
- Primary keys and foreign key relationships
- Row counts and basic statistics
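The kind of metadata listed above can be shown against SQLite, chosen here only because it ships with Python's standard library. The source does not say which dialects schema_crawler.py supports or how it queries them, so crawl_sqlite() and its output shape are hypothetical.

```python
import sqlite3
from typing import Any, Dict


def quote_ident(name: str) -> str:
    """Quote an SQLite identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def crawl_sqlite(conn: sqlite3.Connection) -> Dict[str, Dict[str, Any]]:
    """Collect columns, keys, and row counts for every user table."""
    schema: Dict[str, Dict[str, Any]] = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    for table in tables:
        q = quote_ident(table)
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = {
            name: {"type": col_type, "nullable": not notnull, "pk": pk > 0}
            for _cid, name, col_type, notnull, _dflt, pk
            in conn.execute(f"PRAGMA table_info({q})")
        }
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [
            (row[3], row[2], row[4])  # (column, referenced table, referenced column)
            for row in conn.execute(f"PRAGMA foreign_key_list({q})")
        ]
        row_count = conn.execute(f"SELECT COUNT(*) FROM {q}").fetchone()[0]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys,
                         "row_count": row_count}
    return schema


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount NUMERIC
    );
    INSERT INTO customers (email) VALUES ('a@example.com'), ('b@example.com');
    INSERT INTO orders (customer_id, amount) VALUES (1, 9.5);
""")
schema = crawl_sqlite(conn)
print(schema["orders"])
```

Exact COUNT(*) can be slow on large tables, so a crawler for a production database may prefer catalog statistics or estimates.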
Drift Metrics (drift_metrics.py)
Aggregates drift findings into metrics:
- Total open findings by severity
- Mean time to resolution
- Affected verified query count (via find_affected_vq_ids())
- Confidence impact score
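Two of these metrics can be sketched directly from finding records. The FindingRecord fields are hypothetical. The sketch also assumes that "open" means not yet resolved or deferred (so acknowledged findings still count) and that mean time to resolution runs from when a finding was opened to when it was resolved.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class FindingRecord:              # hypothetical shape of a stored finding
    severity: str
    status: str                   # open | acknowledged | resolved | deferred
    opened_at: datetime
    resolved_at: Optional[datetime] = None


def open_by_severity(findings: List[FindingRecord]) -> Dict[str, int]:
    # Assumption: "open" covers everything not yet resolved or deferred.
    return dict(Counter(
        f.severity for f in findings if f.status in ("open", "acknowledged")
    ))


def mean_time_to_resolution(findings: List[FindingRecord]) -> Optional[timedelta]:
    durations = [
        f.resolved_at - f.opened_at
        for f in findings
        if f.status == "resolved" and f.resolved_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


t0 = datetime(2024, 1, 1, 9, 0)
records = [
    FindingRecord("critical", "open", t0),
    FindingRecord("warning", "acknowledged", t0),
    FindingRecord("critical", "resolved", t0, t0 + timedelta(hours=2)),
    FindingRecord("info", "resolved", t0, t0 + timedelta(hours=4)),
    FindingRecord("info", "deferred", t0),
]
print(open_by_severity(records), mean_time_to_resolution(records))
```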
Architecture
The drift detection system operates on a scan schedule:
- The drift_scanner runs periodically (at a configured interval) or on demand
- Findings are persisted in the drift_findings PostgreSQL table
- Affected verified queries are identified and flagged
- Confidence scores for affected concepts are degraded
- Webhook notifications can be sent via drift_webhook.py
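Flagging verified queries and degrading concept confidence can share one lookup: anything that references a drifted table or column is affected. This sketch assumes findings arrive as (table, column, severity) tuples, with column None for table-level changes, and that confidence is scaled by a multiplicative penalty for the worst severity touching each entity. The penalty values and the affected_ids() and degrade() names are hypothetical, not documented defaults.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

FindingTuple = Tuple[str, Optional[str], str]  # (table, column or None, severity)
Ref = Tuple[str, str]                          # (table, column) an entity depends on

SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}
# Hypothetical multiplicative penalties; the real degradation policy is not documented.
PENALTY = {"info": 0.95, "warning": 0.8, "critical": 0.5}


def affected_ids(findings: Iterable[FindingTuple],
                 refs_by_id: Dict[str, Set[Ref]]) -> Dict[str, str]:
    """Map each affected entity ID to the worst severity that touches it."""
    worst: Dict[str, str] = {}
    for table, column, severity in findings:
        for entity_id, refs in refs_by_id.items():
            hit = any(t == table and (column is None or c == column) for t, c in refs)
            if not hit:
                continue
            previous = worst.get(entity_id)
            if previous is None or SEVERITY_RANK[severity] > SEVERITY_RANK[previous]:
                worst[entity_id] = severity
    return worst


def degrade(confidence: Dict[str, float], worst: Dict[str, str]) -> Dict[str, float]:
    """Scale each affected score by the penalty for its worst severity."""
    return {
        entity_id: score * PENALTY[worst[entity_id]] if entity_id in worst else score
        for entity_id, score in confidence.items()
    }


findings = [("orders", "amount", "critical"), ("customers", None, "warning")]
flagged_vqs = affected_ids(findings, {
    "vq-1": {("orders", "amount")},
    "vq-2": {("orders", "id")},
    "vq-3": {("customers", "email")},
})
concept_hits = affected_ids(findings, {
    "Revenue": {("orders", "amount")},
    "Customer": {("customers", "id")},
    "Product": {("products", "sku")},
})
scores = degrade({"Revenue": 0.9, "Customer": 1.0, "Product": 0.7}, concept_hits)
print(flagged_vqs, scores)
```

Taking the worst severity, rather than applying a penalty per finding, keeps a score from collapsing toward zero when one schema change produces several related findings.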
Configuration
Technical Details
- Schema comparisons normalize data types for cross-dialect compatibility (_normalize_type())
- The compute_auto_resolves() function compares previous and new findings to automatically close resolved drift
- The deduplicate_findings() function prevents duplicate findings when the same schema change is scanned multiple times
- Drift findings include affected_vq_ids, a list of verified query IDs whose tables or columns were affected
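The three helpers above can be approximated as follows. The alias table in normalize_type() is a hypothetical example, not the mapping _normalize_type() actually uses. Findings are keyed by an assumed (kind, table, column) tuple, and deduplicate() and auto_resolves() are illustrative stand-ins for deduplicate_findings() and compute_auto_resolves(), not their real signatures.

```python
import re
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical alias table; the real _normalize_type() mapping is not documented.
_TYPE_ALIASES = {
    "int": "integer", "int4": "integer",
    "int8": "bigint",
    "varchar": "text", "character varying": "text", "string": "text",
    "bool": "boolean",
    "float8": "double", "double precision": "double",
}

FindingKey = Tuple[str, str, Optional[str]]  # (kind, table, column)


def normalize_type(raw: str) -> str:
    """Lowercase, strip length/precision such as "(255)", then map aliases."""
    base = re.sub(r"\(.*?\)", "", raw.lower())
    base = " ".join(base.split())   # collapse runs of whitespace
    return _TYPE_ALIASES.get(base, base)


def deduplicate(already_open: Iterable[FindingKey],
                scanned: Iterable[FindingKey]) -> List[FindingKey]:
    """Keep only scanned findings that are neither already open nor repeated."""
    seen: Set[FindingKey] = set(already_open)
    fresh: List[FindingKey] = []
    for key in scanned:
        if key not in seen:
            seen.add(key)
            fresh.append(key)
    return fresh


def auto_resolves(previously_open: Iterable[FindingKey],
                  scanned: Iterable[FindingKey]) -> List[FindingKey]:
    """Open findings that no longer appear in the new scan are treated as resolved."""
    still_present = set(scanned)
    return [key for key in previously_open if key not in still_present]


open_before = [("type_changed", "orders", "amount"), ("column_removed", "users", "nickname")]
scan = [
    ("column_removed", "users", "nickname"),
    ("column_removed", "users", "nickname"),
    ("table_added", "refunds", None),
]
print(normalize_type("VARCHAR(255)"), normalize_type("Double  Precision"))
print(deduplicate(open_before, scan))
print(auto_resolves(open_before, scan))
```

Because this sketch strips length and precision modifiers, a change from VARCHAR(50) to VARCHAR(255) would not be reported; whether the real normalization does the same is not stated in the source.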