Source Registry

The source registry maps knowledge model concepts (Object Types) to their physical locations in databases. It also provides industry-specific starter kits with pre-built ontologies to accelerate onboarding.

How It Works

Source Mapping

Every Object Type in the knowledge model must map to a physical source:

Object Type: Customer
  └── Source: production_postgres
       └── Table: public.customers
            └── Columns: id, name, email, created_at, ...

This three-level mapping (Object Type -> source -> table) lets the same business concept exist in multiple data sources without naming conflicts. It also decouples business semantics from database vendor specifics.
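A minimal sketch of this mapping as plain Python data structures may help. The names `TableMapping`, `SourceRegistry`, `register`, and `locations` are illustrative assumptions, not the registry's actual models, and `analytics_snowflake` is a made-up second source used only to show one concept living in two places.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableMapping:
    """One physical location for an Object Type (hypothetical structure)."""
    source: str               # registered data source name
    table: str                # schema-qualified table name
    columns: tuple[str, ...]  # columns exposed for this Object Type


@dataclass
class SourceRegistry:
    """Maps an Object Type name to every place it physically lives."""
    mappings: dict[str, list[TableMapping]] = field(default_factory=dict)

    def register(self, object_type: str, mapping: TableMapping) -> None:
        self.mappings.setdefault(object_type, []).append(mapping)

    def locations(self, object_type: str) -> list[TableMapping]:
        return list(self.mappings.get(object_type, []))


registry = SourceRegistry()
registry.register(
    "Customer",
    TableMapping("production_postgres", "public.customers",
                 ("id", "name", "email", "created_at")),
)
# Same business concept in a second, made-up source with different naming.
registry.register(
    "Customer",
    TableMapping("analytics_snowflake", "ANALYTICS.DIM_CUSTOMER",
                 ("CUSTOMER_ID", "FULL_NAME")),
)

sources = [m.source for m in registry.locations("Customer")]
```

Because lookups are keyed by Object Type rather than by table name, the two physical tables can use entirely different naming conventions without colliding.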

Starter Kit Loader (`starter_kit_loader.py`)

The StarterKitLoader class provides pre-built industry ontologies:

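from starter_kit_loader import StarterKitLoader  # import path assumed from the module name

workspace_id = "ws-1"  # placeholder; use your own workspace ID
loader = StarterKitLoader(vertical="saas", include_base=True)
loader.load_into_graph(graph_uri="workspace-1", oxigraph_endpoint="http://localhost:7878")
loader.load_verified_queries_to_ks(workspace_id=workspace_id)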

Available verticals:

SaaS — 150 classes covering customers, subscriptions, revenue, billing, support
Retail — 150 classes covering products, orders, inventory, customers, promotions
Healthcare — 150 classes covering patients, encounters, claims, providers

Each starter kit includes:

OWL ontology (classes, properties, domain definitions)
Verified queries (curated SQL patterns)
Source mapping templates (customizable to your schema)

Starter Kit Matcher (`starter_kit_matcher.py`)

The matcher automatically compares your database schema against starter kit patterns:

Scans your database tables and columns
Compares against starter kit patterns using name similarity and structure matching
Proposes Object Type mappings with confidence scores
A human data steward reviews each proposed mapping and approves or rejects it
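The steps above can be sketched roughly as follows. This is not the matcher's actual algorithm: it assumes name similarity is computed with `difflib.SequenceMatcher`, structure matching is a Jaccard overlap of column names, and the two are weighted equally. The `PATTERNS` dictionary, the `0.6` threshold, and the `pending_review` status are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical starter kit patterns: Object Type -> canonical table and columns.
PATTERNS = {
    "Customer": {"table": "customers",
                 "columns": {"id", "name", "email", "created_at"}},
    "Subscription": {"table": "subscriptions",
                     "columns": {"id", "customer_id", "plan", "status"}},
}


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structure_similarity(cols: set[str], pattern_cols: set[str]) -> float:
    """Jaccard overlap between a table's columns and a pattern's columns."""
    union = cols | pattern_cols
    return len(cols & pattern_cols) / len(union) if union else 0.0


def propose_mappings(schema: dict[str, set[str]], threshold: float = 0.6) -> list[dict]:
    """Return one proposal per table whose best score clears the threshold."""
    proposals = []
    for table, cols in schema.items():
        scored = [
            (0.5 * name_similarity(table, p["table"])
             + 0.5 * structure_similarity(cols, p["columns"]), object_type)
            for object_type, p in PATTERNS.items()
        ]
        confidence, object_type = max(scored)
        if confidence >= threshold:
            proposals.append({
                "table": table,
                "object_type": object_type,
                "confidence": round(confidence, 2),
                "status": "pending_review",  # a steward approves or rejects later
            })
    return proposals


# Scanned schema: table name -> set of column names.
schema = {
    "customers": {"id", "name", "email", "created_at", "phone"},
    "audit_log": {"event", "ts"},
}
proposals = propose_mappings(schema)
```

Here `customers` is proposed as a `Customer` mapping, while `audit_log` shares no columns with any pattern and falls below the threshold. Nothing is applied automatically; every proposal stays pending until a steward decides.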

Source Models (`source_models.py`)

SQLAlchemy models for the source registry in PostgreSQL:

DataSource — Registered data sources (connection details, type, status)
SourceMapping — Class-to-table mappings
ColumnMapping — Property-to-column mappings with type annotations

Architecture

The source registry sits between the knowledge model and the database connectors:

Knowledge Model (OWL)
    ↕
Source Registry (PostgreSQL)
    ↕
Database Connectors (PostgreSQL, Snowflake, BigQuery, Oracle, Athena)

The SQL validator uses the source registry to verify that generated SQL references valid tables and columns. The NLQ engine uses it to determine which data source to query.
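As a rough illustration of both uses, the sketch below resolves an Object Type to its data source and checks table and column references against the registry. It assumes the references have already been extracted from the generated SQL, so parsing is out of scope. `REGISTRY`, `resolve_source`, and `validate_references` are hypothetical names, not the actual validator or NLQ engine API.

```python
# Hypothetical in-memory view of the registry:
# Object Type -> (source, table, columns).
REGISTRY = {
    "Customer": ("production_postgres", "public.customers",
                 {"id", "name", "email", "created_at"}),
}


def resolve_source(object_type: str) -> str:
    """Return the data source to query for an Object Type."""
    source, _table, _columns = REGISTRY[object_type]
    return source


def validate_references(references: list[tuple[str, str]]) -> list[str]:
    """Check (table, column) pairs taken from generated SQL.

    Returns human-readable errors; an empty list means every reference
    is known to the registry.
    """
    known = {table: columns for _source, table, columns in REGISTRY.values()}
    errors = []
    for table, column in references:
        if table not in known:
            errors.append(f"unknown table: {table}")
        elif column not in known[table]:
            errors.append(f"unknown column: {table}.{column}")
    return errors


errors = validate_references([
    ("public.customers", "email"),  # valid reference
    ("public.customers", "phone"),  # column is not mapped
    ("public.orders", "id"),        # table is not mapped
])
source = resolve_source("Customer")
```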

Configuration

Data sources are registered via the REST API:

POST /api/v1/sources
{
  "name": "production_postgres",
  "type": "postgresql",
  "connection_string": "postgresql://user:pass@host:5432/db",
  "workspace_id": "ws-1"
}
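The same request can be built from Python with only the standard library, as sketched below. The base URL `http://localhost:8000` is an assumption. Authentication headers are not covered on this page, so none are shown. The request is constructed but not sent, which lets the snippet run without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your deployment's host


def build_register_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that registers a data source."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/sources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_register_request({
    "name": "production_postgres",
    "type": "postgresql",
    "connection_string": "postgresql://user:pass@host:5432/db",
    "workspace_id": "ws-1",
})

# To actually send it (requires a running server):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```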

Technical Details

Source mappings are cached in memory with a TTL for performance
The starter kit patterns in starter_kit_patterns.py define canonical column patterns for each industry vertical
Trust tiers (trust_tier_loader.py) assign confidence levels to source mappings: steward_confirmed (highest), auto_resolved (lower), unverified (lowest)
Auto-resolved mappings are read-only (SI-1 safety invariant) — writes require steward confirmation
Join paths (join_path_loader.py) store validated foreign key relationships between tables
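To make the in-memory TTL caching of source mappings concrete, here is a generic sketch of the technique. It is not the registry's implementation: the `TTLCache` class, the 60-second TTL, and the loader function are assumptions chosen for illustration. The clock is injectable so expiry can be shown without sleeping.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling loader() on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value


# Simulated clock, plus a loader that records each hit on the backing store.
fake_now = [0.0]
calls = []


def load_customer_mapping():
    calls.append(1)
    return ("production_postgres", "public.customers")


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("Customer", load_customer_mapping)  # miss: loads from the store
cache.get("Customer", load_customer_mapping)  # hit: served from memory
fake_now[0] = 61.0
cache.get("Customer", load_customer_mapping)  # expired: loads again
```

A short TTL keeps lookups fast while bounding how long a changed mapping can remain stale in memory.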
