Pramiti Docs

Drift Detection

Schema scanning, confidence degradation, and gap analysis

Drift detection monitors connected data sources for schema changes and degrades confidence scores when the knowledge model becomes stale. This ensures agents are warned when definitions may no longer be accurate, rather than silently returning incorrect results.

How It Works

DriftScanner (drift_scanner.py)

The DriftScanner class compares the live database schema against the stored schema snapshot:

scanner = DriftScanner()
findings = scanner.scan_data_source(
    source_name="production_postgres",
    workspace_id=workspace_id
)

The scan_data_source() method:

  1. Loads the stored schema snapshot from the last scan
  2. Crawls the live database schema using schema_crawler.py
  3. Compares the two using compare_schemas() — detecting added/removed/modified tables and columns
  4. Compares row counts using compare_row_counts() — detecting significant data volume changes
  5. Returns structured drift findings with severity levels
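The schema diff in step 3 can be sketched in Python. This is a minimal illustration, not the drift_scanner.py code. It assumes each snapshot is a hypothetical dict mapping table names to `{column: data_type}`, and the `Finding` dataclass and its `kind` strings are invented for the example. Nullability and other column attributes are left out, so a "modified" column is reduced to a type change here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical snapshot shape: {table_name: {column_name: data_type}}
Schema = dict[str, dict[str, str]]


@dataclass(frozen=True)
class Finding:
    kind: str                 # hypothetical kinds: "table_added", "column_removed", ...
    table: str
    column: str | None = None
    detail: str = ""


def compare_schemas(old: Schema, new: Schema) -> list[Finding]:
    """Diff two schema snapshots into table- and column-level findings (sketch)."""
    findings: list[Finding] = []
    for table in sorted(new.keys() - old.keys()):
        findings.append(Finding("table_added", table))
    for table in sorted(old.keys() - new.keys()):
        findings.append(Finding("table_removed", table))
    # Only tables present in both snapshots get a column-level comparison.
    for table in sorted(old.keys() & new.keys()):
        old_cols, new_cols = old[table], new[table]
        for col in sorted(new_cols.keys() - old_cols.keys()):
            findings.append(Finding("column_added", table, col))
        for col in sorted(old_cols.keys() - new_cols.keys()):
            findings.append(Finding("column_removed", table, col))
        for col in sorted(old_cols.keys() & new_cols.keys()):
            if old_cols[col] != new_cols[col]:
                detail = f"{old_cols[col]} -> {new_cols[col]}"
                findings.append(Finding("type_changed", table, col, detail))
    return findings
```

Set operations on dict key views keep the diff linear in the number of tables and columns, and sorting makes the output order deterministic between scans.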

Finding types include:

  • Table added/removed — New tables appeared or existing tables were dropped
  • Column added/removed/modified — Column changes within existing tables
  • Type changed — Column data type modifications
  • Row count anomaly — Significant changes in table row counts
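What counts as a "significant" row count change is not specified above. The sketch below assumes a relative-change rule with a hypothetical 50% threshold; the signature and threshold are illustrative and may differ from the real compare_row_counts().

```python
def compare_row_counts(
    old_counts: dict[str, int],
    new_counts: dict[str, int],
    threshold: float = 0.5,  # hypothetical: flag a relative change above 50%
) -> list[tuple[str, int, int, float]]:
    """Return (table, old, new, relative_change) for tables whose volume shifted sharply.

    Hypothetical signature; tables missing from either side are left to the schema diff.
    """
    anomalies = []
    for table in sorted(old_counts.keys() & new_counts.keys()):
        old, new = old_counts[table], new_counts[table]
        # max(old, 1) avoids dividing by zero for tables that were previously empty.
        change = abs(new - old) / max(old, 1)
        if change > threshold:
            anomalies.append((table, old, new, change))
    return anomalies
```

A relative measure lets one threshold work for both small lookup tables and large fact tables; an absolute row delta would not.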

DriftResolver (drift_resolver.py)

Handles the resolution workflow for drift findings:

  • Auto-resolve — Non-breaking changes (e.g., new nullable column) can be auto-resolved
  • Manual review — Breaking changes (column removed, type changed) require human review
  • Status transitions — Validates finding lifecycle: open -> acknowledged -> resolved/deferred
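The resolution rules can be expressed as a small state machine. This is a sketch under assumptions: the split into breaking and non-breaking kinds, the function names, and the treatment of resolved/deferred as terminal states are all hypothetical. The documentation does not say whether auto-resolve skips the "acknowledged" step, so only the documented path is modelled.

```python
# Hypothetical split between breaking and non-breaking finding kinds.
BREAKING_KINDS = {"table_removed", "column_removed", "type_changed"}

# Documented lifecycle: open -> acknowledged -> resolved/deferred.
ALLOWED_TRANSITIONS = {
    "open": {"acknowledged"},
    "acknowledged": {"resolved", "deferred"},
}


def requires_manual_review(kind: str) -> bool:
    """Breaking changes must be reviewed by a human."""
    return kind in BREAKING_KINDS


def can_auto_resolve(kind: str, nullable: bool = False) -> bool:
    # Only the documented example qualifies in this sketch: a new nullable column.
    return kind == "column_added" and nullable


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the lifecycle does not allow the move."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"invalid status transition: {current} -> {target}")
    return target
```

Keeping the allowed transitions in a table makes the lifecycle easy to audit and rejects jumps such as reopening a resolved finding.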

Gap Analyzer (gap_analyzer.py)

Identifies gaps between the knowledge model and the database:

  • Tables that exist in the database but have no Object Type mapping
  • Columns that are used in verified queries but have been dropped
  • Foreign key relationships that have changed since the last ontology update
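All three gap checks reduce to set operations. The sketch below assumes hypothetical input shapes: live columns per table, the set of tables mapped to an Object Type, the (table, column) pairs each verified query uses, and foreign keys as 4-tuples. None of these are the gap_analyzer.py interfaces.

```python
from __future__ import annotations

# Hypothetical FK shape: (table, column, referenced_table, referenced_column)
ForeignKey = tuple[str, str, str, str]


def find_gaps(
    db_columns: dict[str, set[str]],               # table -> columns now in the database
    mapped_tables: set[str],                       # tables with an Object Type mapping
    vq_columns: dict[str, set[tuple[str, str]]],   # verified query id -> (table, column) used
    old_fks: set[ForeignKey],                      # FKs at the last ontology update
    new_fks: set[ForeignKey],                      # FKs in the live database
) -> dict:
    # 1. Tables in the database that no Object Type maps to.
    unmapped = sorted(db_columns.keys() - mapped_tables)
    # 2. Verified queries that reference columns no longer present.
    dropped = {}
    for vq_id, refs in sorted(vq_columns.items()):
        missing = sorted((t, c) for t, c in refs if c not in db_columns.get(t, set()))
        if missing:
            dropped[vq_id] = missing
    # 3. FKs added or removed since the last ontology update (symmetric difference).
    fk_changes = sorted(old_fks ^ new_fks)
    return {"unmapped_tables": unmapped, "dropped_vq_columns": dropped, "fk_changes": fk_changes}
```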

Schema Crawler (schema_crawler.py)

Connects to data sources and extracts current schema metadata:

  • Table names, column names, data types, nullability
  • Primary keys and foreign key relationships
  • Row counts and basic statistics
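The kind of metadata the crawler collects can be shown against SQLite, which ships with Python, so the example runs standalone. This is a stand-in and not schema_crawler.py: a PostgreSQL source would typically be read through `information_schema` and `pg_catalog` instead, and the `crawl_sqlite` name and output shape are hypothetical.

```python
import sqlite3


def crawl_sqlite(conn: sqlite3.Connection) -> dict:
    """Extract columns, keys, and row counts for every user table (SQLite stand-in)."""
    schema = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier safely
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        columns = {
            name: {"type": col_type, "nullable": not notnull}
            for _cid, name, col_type, notnull, _default, _pk in info
        }
        # pk is the column's 1-based position in the primary key, or 0 if not part of it.
        primary_key = [row[1] for row in sorted(info, key=lambda r: r[5]) if row[5]]
        # Each foreign_key_list row is (id, seq, table, from, to, on_update, on_delete, match).
        foreign_keys = [
            (row[3], row[2], row[4])
            for row in conn.execute(f"PRAGMA foreign_key_list({quoted})")
        ]
        row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        schema[table] = {
            "columns": columns,
            "primary_key": primary_key,
            "foreign_keys": foreign_keys,
            "row_count": row_count,
        }
    return schema
```

`COUNT(*)` is exact but scans the table; on large PostgreSQL tables a crawler may prefer the planner's estimate, which is a trade-off between accuracy and scan cost.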

Drift Metrics (drift_metrics.py)

Aggregates drift findings into metrics:

  • Total open findings by severity
  • Mean time to resolution
  • Affected verified query count (via find_affected_vq_ids())
  • Confidence impact score
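Three of these metrics can be aggregated directly from stored findings. The sketch below makes several assumptions: the `FindingRecord` fields are hypothetical, "open" means status `"open"` only, and the affected verified query count is the union of `affected_vq_ids` over open findings rather than a call to find_affected_vq_ids(). The confidence impact score is omitted because its formula is not documented.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class FindingRecord:  # hypothetical record shape
    severity: str
    status: str                      # "open", "acknowledged", "resolved", or "deferred"
    opened_at: datetime
    resolved_at: datetime | None = None
    affected_vq_ids: list[str] = field(default_factory=list)


def drift_metrics(findings: list[FindingRecord]) -> dict:
    open_findings = [f for f in findings if f.status == "open"]
    durations = [
        f.resolved_at - f.opened_at
        for f in findings
        if f.status == "resolved" and f.resolved_at is not None
    ]
    # Mean time to resolution; None when nothing has been resolved yet.
    mttr = sum(durations, timedelta()) / len(durations) if durations else None
    # A verified query touched by several findings is counted once.
    affected = set().union(*(f.affected_vq_ids for f in open_findings))
    return {
        "open_by_severity": dict(Counter(f.severity for f in open_findings)),
        "mean_time_to_resolution": mttr,
        "affected_vq_count": len(affected),
    }
```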

Architecture

The drift detection system operates on a scan schedule:

  1. The drift_scanner runs periodically at a configured interval, or on demand
  2. Findings are persisted in the drift_findings PostgreSQL table
  3. Affected verified queries are identified and flagged
  4. Confidence scores for affected concepts are degraded
  5. Webhook notifications can be sent via drift_webhook.py

Configuration

# Drift scan scheduling is managed via the API or admin interface
# Webhook URL for drift notifications
EPISTOM_DRIFT_WEBHOOK_URL=https://your-webhook-endpoint
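A webhook call driven by this variable might look like the sketch below. Only the variable name comes from this page; the `build_drift_webhook` function and the JSON payload shape are hypothetical, and the request is built with the standard library's `urllib.request` rather than whatever drift_webhook.py actually uses.

```python
from __future__ import annotations

import json
import os
import urllib.request


def build_drift_webhook(findings: list[dict], url: str | None = None) -> urllib.request.Request | None:
    """Build (but do not send) a JSON POST for drift findings; payload shape is hypothetical."""
    url = url or os.environ.get("EPISTOM_DRIFT_WEBHOOK_URL")
    if not url:
        return None  # no URL configured, so notifications are skipped
    body = json.dumps({"event": "drift.detected", "findings": findings}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then `urllib.request.urlopen(req, timeout=10)`; separating construction from sending keeps the payload testable without a live endpoint.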

Technical Details

  • Schema comparisons normalize data types with _normalize_type() so that equivalent types from different SQL dialects compare as equal
  • The compute_auto_resolves() function compares previous and new findings to automatically close findings whose drift has since been resolved
  • The deduplicate_findings() function prevents duplicate findings when scanning the same schema change multiple times
  • Drift findings include affected_vq_ids — a list of verified query IDs whose tables/columns were affected
