Source Registry
Class-to-source-to-table mapping with industry starter kits
The source registry maps knowledge model concepts (Object Types) to their physical locations in databases. It also provides industry-specific starter kits with pre-built ontologies to accelerate onboarding.
How It Works
Source Mapping
Every Object Type in the knowledge model must map to a physical source: a registered data source and a table within that source.
This three-level mapping (class -> source -> table) allows the same business concept to exist in multiple data sources without naming conflicts. The source registry decouples business semantics from database vendor specifics.
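The class -> source -> table lookup can be illustrated with a small in-memory sketch. This is not the project's implementation: the `SourceRegistry` and `PhysicalLocation` names, their methods, and the example source and table names are all hypothetical, chosen only to show how one Object Type can resolve to different tables in different sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    source: str  # name of a registered data source (hypothetical)
    table: str   # table name inside that source


class SourceRegistry:
    """Hypothetical in-memory registry: class -> source -> table."""

    def __init__(self) -> None:
        # object type -> {source name -> table name}
        self._mappings: dict[str, dict[str, str]] = {}

    def register(self, object_type: str, source: str, table: str) -> None:
        self._mappings.setdefault(object_type, {})[source] = table

    def resolve(self, object_type: str, source: str) -> PhysicalLocation:
        try:
            table = self._mappings[object_type][source]
        except KeyError:
            raise LookupError(
                f"{object_type!r} is not mapped in source {source!r}"
            ) from None
        return PhysicalLocation(source, table)

    def sources_for(self, object_type: str) -> list[str]:
        return sorted(self._mappings.get(object_type, {}))


# The same business concept lives in two sources under different table names.
registry = SourceRegistry()
registry.register("Customer", "crm_postgres", "public.accounts")
registry.register("Customer", "billing_snowflake", "ANALYTICS.DIM_CUSTOMER")
```

Because the source name is part of the key, `Customer` can point at `public.accounts` in one source and `ANALYTICS.DIM_CUSTOMER` in another without either mapping shadowing the other.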
Starter Kit Loader (starter_kit_loader.py)
The StarterKitLoader class provides pre-built industry ontologies.
Available verticals:
- SaaS — 150 classes covering customers, subscriptions, revenue, billing, support
- Retail — 150 classes covering products, orders, inventory, customers, promotions
- Healthcare — 150 classes covering patients, encounters, claims, providers
Each starter kit includes:
- OWL ontology (classes, properties, domain definitions)
- Verified queries (curated SQL patterns)
- Source mapping templates (customizable to your schema)
Starter Kit Matcher (starter_kit_matcher.py)
Automatically matches your database schema against starter kit patterns:
- Scans your database tables and columns
- Compares against starter kit patterns using name similarity and structure matching
- Proposes Object Type mappings with confidence scores
- A human data steward reviews each proposed mapping and approves or rejects it
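The matching steps above can be sketched as follows. The scoring here is an assumption made for illustration: name similarity uses `difflib.SequenceMatcher`, structure matching uses Jaccard overlap of column names, and the 0.4/0.6 weights, the 0.5 threshold, the `KIT_PATTERNS` contents, and the `pending_review` status are all hypothetical rather than taken from starter_kit_matcher.py.

```python
from difflib import SequenceMatcher

# Hypothetical starter kit patterns: Object Type -> canonical column names.
KIT_PATTERNS = {
    "Customer": {"customer_id", "name", "email", "created_at"},
    "Subscription": {"subscription_id", "customer_id", "plan", "status", "started_at"},
}


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structure_similarity(columns: set[str], pattern: set[str]) -> float:
    """Jaccard overlap between a table's columns and a pattern's columns."""
    union = columns | pattern
    return len(columns & pattern) / len(union) if union else 0.0


def propose_mappings(schema: dict[str, set[str]], threshold: float = 0.5) -> list[dict]:
    """Propose the best-scoring Object Type per table, if it clears the threshold."""
    proposals = []
    for table, columns in schema.items():
        best_type, best_score = None, 0.0
        for object_type, pattern in KIT_PATTERNS.items():
            # Assumed weighting: structure counts slightly more than the name.
            score = (0.4 * name_similarity(table, object_type)
                     + 0.6 * structure_similarity(columns, pattern))
            if score > best_score:
                best_type, best_score = object_type, score
        if best_type is not None and best_score >= threshold:
            proposals.append({
                "table": table,
                "object_type": best_type,
                "confidence": round(best_score, 2),
                "status": "pending_review",  # a steward still has to approve or reject
            })
    return proposals


schema = {
    "customers": {"customer_id", "name", "email", "created_at", "region"},
    "audit_log": {"id", "payload", "ts"},
}
proposals = propose_mappings(schema)
```

In this sketch `customers` is proposed as a `Customer` mapping, while `audit_log` shares no columns with any pattern and falls below the threshold, so nothing is proposed for it. Every proposal starts in a review state, which mirrors the requirement that a steward makes the final decision.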
Source Models (source_models.py)
SQLAlchemy models for the source registry in PostgreSQL:
- DataSource: Registered data sources (connection details, type, status)
- SourceMapping: Class-to-table mappings
- ColumnMapping: Property-to-column mappings with type annotations
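A minimal sketch of how three such models could relate to each other is shown below. Only the class names come from this document; the table names, column names, and relationships are assumptions, and the example runs against in-memory SQLite purely so it is self-contained, whereas the real registry is stored in PostgreSQL. It requires SQLAlchemy 1.4 or later.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class DataSource(Base):
    __tablename__ = "data_sources"  # hypothetical table name
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)
    source_type = Column(String, nullable=False)      # e.g. "postgresql"
    connection_uri = Column(String, nullable=False)
    status = Column(String, nullable=False, default="active")
    mappings = relationship("SourceMapping", back_populates="source")


class SourceMapping(Base):
    __tablename__ = "source_mappings"
    id = Column(Integer, primary_key=True)
    object_type = Column(String, nullable=False)       # knowledge model class
    table_name = Column(String, nullable=False)        # physical table
    source_id = Column(Integer, ForeignKey("data_sources.id"), nullable=False)
    source = relationship("DataSource", back_populates="mappings")
    column_mappings = relationship("ColumnMapping", back_populates="mapping")


class ColumnMapping(Base):
    __tablename__ = "column_mappings"
    id = Column(Integer, primary_key=True)
    property_name = Column(String, nullable=False)     # knowledge model property
    column_name = Column(String, nullable=False)       # physical column
    column_type = Column(String, nullable=False)       # type annotation
    mapping_id = Column(Integer, ForeignKey("source_mappings.id"), nullable=False)
    mapping = relationship("SourceMapping", back_populates="column_mappings")


# In-memory SQLite keeps the sketch runnable without a PostgreSQL server.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    source = DataSource(name="crm", source_type="postgresql",
                        connection_uri="postgresql://host/crm")
    mapping = SourceMapping(object_type="Customer", table_name="accounts", source=source)
    mapping.column_mappings.append(
        ColumnMapping(property_name="email", column_name="email_addr", column_type="text")
    )
    session.add(source)  # the mapping and its column mapping cascade with the source
    session.commit()

    stored = session.query(SourceMapping).filter_by(object_type="Customer").one()
    summary = (stored.source.name, stored.table_name,
               stored.column_mappings[0].column_name)
```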
Architecture
The source registry sits between the knowledge model and the database connectors.
The SQL validator uses the source registry to verify that generated SQL references valid tables and columns. The NLQ engine uses it to determine which data source to query.
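The validation check can be sketched as a lookup against the registry. The sketch assumes the table and column references have already been extracted from the generated SQL by a parser, which is out of scope here; the `REGISTRY` snapshot and the `validate_references` function are hypothetical names, not the validator's actual interface.

```python
# Hypothetical registry snapshot: table name -> known column names.
REGISTRY = {
    "accounts": {"id", "email", "created_at"},
    "subscriptions": {"id", "account_id", "plan", "status"},
}


def validate_references(references: list[tuple[str, str]]) -> list[str]:
    """Return one error message per (table, column) pair the registry does not know."""
    errors = []
    for table, column in references:
        if table not in REGISTRY:
            errors.append(f"unknown table: {table}")
        elif column not in REGISTRY[table]:
            errors.append(f"unknown column: {table}.{column}")
    return errors


# References as a SQL parser might report them for a generated query.
errors = validate_references([
    ("accounts", "email"),   # valid
    ("accounts", "phone"),   # column is not registered
    ("invoices", "id"),      # table is not registered
])
```

Returning a list of errors instead of failing on the first one lets the caller report every invalid reference in a single pass.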
Configuration
Data sources are registered via the REST API.
Technical Details
- Source mappings are cached in memory with TTL for performance
- The starter kit patterns in starter_kit_patterns.py define canonical column patterns for each industry vertical
- Trust tiers (trust_tier_loader.py) assign confidence levels to source mappings: steward_confirmed (highest), auto_resolved (lower), unverified (lowest)
- Auto-resolved mappings are read-only (SI-1 safety invariant); writes require steward confirmation
- Join paths (join_path_loader.py) store validated foreign key relationships between tables
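The in-memory TTL caching of source mappings mentioned in the list above can be sketched as follows. The `TTLCache` class, its key format, and the 60-second TTL are hypothetical; the document does not specify how the cache is implemented. The clock is injectable so that expiry can be demonstrated without sleeping.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached table name)
        self._entries: dict[str, tuple[float, str]] = {}

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller reloads it from the registry.
            del self._entries[key]
            return None
        return value


# A fake clock makes the expiry behavior deterministic for the example.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("Customer@crm", "public.accounts")

fresh = cache.get("Customer@crm")    # still within the TTL
now[0] = 61.0                        # advance the fake clock past the TTL
expired = cache.get("Customer@crm")  # entry has expired and is evicted
```

A miss after expiry forces a reload from the registry database, so the TTL bounds how long a changed mapping can remain stale in memory.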