NLQ Engine
Query routing, SQL generation, validation gate, and self-correction
The NLQ Engine is the core orchestrator that transforms natural language questions into validated, executed SQL. It routes questions through multiple trust tiers, generates SQL with semantic context, validates it before execution, and self-corrects on failure.
How It Works
Query Router (query_router.py)
The query router classifies incoming questions by intent using keyword scoring and pattern matching. It returns a RouteDecision with the classified intent and matched ontology classes.
Intent classes:
| Intent | Description | LLM Required |
|---|---|---|
| knowledge | Direct definition lookup (acronyms, concepts) | No |
| vocabulary | Ontology context + LLM synthesis for plain English | Yes |
| discovery | Schema exploration ("what tables exist?") | No |
| analytical | Full SQL generation pipeline | Yes |
| impact | Backward/forward propagation analysis | No |
| ontop | SPARQL template match against Virtual Knowledge Graph | No |
| metric | Composable metric computation | Yes |
The classify_intent() function uses keyword-to-class mappings that can be customized per workspace. Keywords are scored and matched against registered ontology classes to determine the best routing path.
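As a rough illustration of keyword scoring, the sketch below picks the intent whose keywords contribute the highest total weight. The keyword table, the weights, and the fields of this simplified RouteDecision are assumptions made for the example; the actual query_router.py also matches ontology classes, which is omitted here.

```python
from dataclasses import dataclass, field

# Hypothetical keyword weights; real mappings are configured per workspace.
INTENT_KEYWORDS: dict[str, dict[str, float]] = {
    "knowledge": {"what is": 2.0, "define": 2.0, "stand for": 2.0},
    "discovery": {"what tables": 3.0, "which columns": 3.0, "schema": 1.5},
    "impact": {"depends on": 2.5, "downstream": 2.0, "upstream": 2.0},
    "analytical": {"how many": 2.0, "average": 1.5, "total": 1.5, "per": 1.0},
}


@dataclass
class RouteDecision:
    """Simplified stand-in for the router's return value (assumed fields)."""
    intent: str
    score: float
    matched_keywords: list[str] = field(default_factory=list)


def classify_intent(question: str, default: str = "analytical") -> RouteDecision:
    """Return the intent with the highest summed keyword weight."""
    text = question.lower()
    best = RouteDecision(intent=default, score=0.0)
    for intent, keywords in INTENT_KEYWORDS.items():
        # Naive substring matching; a real router would tokenize first.
        hits = [kw for kw in keywords if kw in text]
        score = sum(keywords[kw] for kw in hits)
        if score > best.score:  # ties keep the earlier intent
            best = RouteDecision(intent, score, hits)
    return best
```

A question that matches no keyword falls back to the default intent with a score of 0.0, so the caller can still decide whether to run the full pipeline.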
SemanticEngine (engine.py)
The SemanticEngine class is the main orchestrator. Its answer() method:
- Sanitizes the input question (PII scrubbing, injection guards)
- Calls the query router to classify intent
- Routes to the appropriate trust tier handler
- Returns an NLQResult with SQL, data, confidence score, and metadata
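The steps above amount to a sanitize, classify, dispatch sequence. The sketch below shows that shape with a handler table keyed by intent. The NLQResult fields mirror the ones named above, but the function signatures, the e-mail-only sanitizer, and the error metadata are assumptions for illustration, not the actual engine.py API.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class NLQResult:
    intent: str
    sql: Optional[str] = None
    data: list = field(default_factory=list)
    confidence: float = 0.0
    metadata: dict = field(default_factory=dict)


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def sanitize(question: str) -> str:
    """Toy sanitizer: redact e-mail addresses and trim whitespace."""
    return EMAIL.sub("[REDACTED]", question).strip()


def answer(
    question: str,
    classify: Callable[[str], str],
    handlers: dict[str, Callable[[str], NLQResult]],
) -> NLQResult:
    """Sanitize, classify, then dispatch to the handler for that intent."""
    clean = sanitize(question)
    intent = classify(clean)
    handler = handlers.get(intent)
    if handler is None:
        return NLQResult(intent=intent, metadata={"error": "no handler registered"})
    result = handler(clean)
    result.metadata["question"] = clean  # record what was actually processed
    return result
```

Passing the classifier and handlers in as arguments keeps the orchestrator testable without an LLM or a database.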
Key features:
- Circuit breaker — Prevents cascading failures when LLM or database is down
- Soft timeout — Queries that exceed EPISTOM_QUERY_TIMEOUT_SECONDS are cancelled
- PII masking — Results are scanned for PII patterns and redacted
- Multi-source routing — Questions spanning multiple data sources can route through Trino federation
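To make the circuit breaker idea concrete, here is a minimal sketch of the general pattern: after a number of consecutive failures the breaker opens and rejects calls until a cool-down has passed, then lets a trial call through. The class name, the threshold, and the cool-down parameter are hypothetical; the source does not describe how the engine's breaker is configured.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after repeated failures, then allows a trial call after a cool-down."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a call to the dependency may proceed."""
        if self._opened_at is None:
            return True  # closed: normal operation
        # Open: allow a trial call only once the cool-down has elapsed.
        return self._clock() - self._opened_at >= self.reset_after_seconds

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None  # close the breaker again

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # (re)open and restart the cool-down
```

A caller would check allow() before contacting the LLM or database and report the outcome with record_success() or record_failure().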
NLQ Engine (nlq_engine.py)
The NLQEngine class handles the LLM interaction for SQL generation:
- Builds a prompt with semantic context (schema, definitions, verified queries)
- Sends it to the configured LLM via the LLMAdapter
- Extracts SQL from the response
- Passes it through the SQL validator
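The extraction step can be sketched as follows, assuming the LLM usually wraps its SQL in a fenced code block. The function name extract_sql and the fallback of treating the whole reply as SQL are assumptions for this example, not the documented behavior of nlq_engine.py.

```python
import re
from typing import Optional

TICKS = "`" * 3  # fence delimiter, built this way to keep the example readable
FENCE = re.compile(TICKS + r"(?:sql)?\s*(.*?)" + TICKS, re.DOTALL | re.IGNORECASE)


def extract_sql(response: str) -> Optional[str]:
    """Return the first SELECT/WITH statement found in an LLM reply, or None."""
    match = FENCE.search(response)
    # Prefer the fenced block; otherwise treat the whole reply as a candidate.
    candidate = match.group(1) if match else response
    candidate = candidate.strip().rstrip(";").strip()
    if re.match(r"(select|with)\b", candidate, re.IGNORECASE):
        return candidate
    return None
```

Returning None rather than raising lets the caller decide whether a reply without SQL should trigger a retry or a failure result.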
Prompt Assembler (prompt_assembler.py)
Assembles the LLM prompt by combining:
- Database schema (relevant tables and columns)
- Ontology definitions (what concepts mean)
- Verified query examples (few-shot patterns)
- Business rules (constraints and aggregation rules)
- Question sanitization (removes PII patterns, injection attempts)
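One way to picture the assembly is as a list of titled sections joined into a single string, with empty sections skipped. The SemanticContext container, the section titles, and the ordering below are assumptions made for the sketch; the real prompt layout in prompt_assembler.py is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticContext:
    """Hypothetical container for the inputs listed above."""
    schema: list[str] = field(default_factory=list)
    definitions: dict[str, str] = field(default_factory=dict)
    verified_queries: list[tuple[str, str]] = field(default_factory=list)
    business_rules: list[str] = field(default_factory=list)


def assemble_prompt(question: str, ctx: SemanticContext) -> str:
    """Join the non-empty context sections, ending with the question."""
    sections = [
        ("Schema", ctx.schema),
        ("Definitions", [f"{term}: {meaning}" for term, meaning in ctx.definitions.items()]),
        ("Verified examples", [f"Q: {q}\nSQL: {sql}" for q, sql in ctx.verified_queries]),
        ("Business rules", ctx.business_rules),
    ]
    parts = []
    for title, lines in sections:
        if lines:  # skip sections with no content
            parts.append(f"## {title}\n" + "\n".join(lines))
    parts.append(f"## Question\n{question.strip()}")
    return "\n\n".join(parts)
```

Placing the question last keeps it adjacent to the point where the model starts generating, which is a common prompt layout choice rather than a requirement.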
Self-Correction (self_correction.py)
When the SQL validator rejects generated SQL, the self-correction module:
- Analyzes the validation errors
- Appends error context to the prompt
- Asks the LLM to regenerate with corrections
- Re-validates the corrected SQL
This loop runs at most twice before returning a failure.
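The loop can be sketched as below. It assumes that "up to 2 times" means two correction attempts after the initial generation, and that the generator and validator are plain callables; both are assumptions for illustration, as are the function name and the wording of the retry prompt.

```python
from typing import Callable, Optional

MAX_CORRECTIONS = 2  # assumed to count retries after the first generation


def generate_with_correction(
    prompt: str,
    generate: Callable[[str], str],
    validate: Callable[[str], list[str]],
    max_corrections: int = MAX_CORRECTIONS,
) -> tuple[Optional[str], list[str]]:
    """Generate SQL, feeding validation errors back to the LLM on failure."""
    sql = generate(prompt)
    errors = validate(sql)
    attempts = 0
    while errors and attempts < max_corrections:
        attempts += 1
        # Append the rejected SQL and its errors so the model can fix them.
        retry_prompt = (
            f"{prompt}\n\nPrevious SQL:\n{sql}\n\nValidation errors:\n"
            + "\n".join(f"- {e}" for e in errors)
            + "\n\nReturn a corrected query."
        )
        sql = generate(retry_prompt)
        errors = validate(sql)
    # On success return the SQL and no errors; on failure return None and the last errors.
    return (sql if not errors else None), errors
```

Each retry starts from the original prompt rather than the previous retry prompt, so the context does not grow with every attempt.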
SQL Validator (sql_validator.py)
The pre-execution validation gate checks SQL before it touches the database:
- Column existence — Every referenced column must exist in the schema
- Table existence — Every referenced table must be in a registered source
- Join path validation — JOIN conditions must use valid foreign key relationships
- PII column check — Queries selecting PII-annotated columns are flagged or blocked
- Anti-pattern detection — Common LLM SQL mistakes (HAVING without GROUP BY, correlated subqueries, etc.)
- Multi-statement rejection — Only single SELECT statements are allowed
- SQL injection guard — DDL, DML, and system commands are rejected
The validator uses AST parsing (via sqlglot) with a regex fallback to widen coverage.
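As an illustration of what a regex-only fallback might cover, the sketch below implements two of the checks listed above, multi-statement rejection and the DDL/DML guard, using only the standard library. The function name, the keyword list, and the error messages are assumptions; the schema-aware checks (columns, tables, join paths) need a parsed AST and are not shown.

```python
import re

# Hypothetical deny-list of statement keywords; not the validator's actual list.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
COMMENT = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)


def regex_guard(sql: str) -> list[str]:
    """Return a list of violations; an empty list means the guard passed."""
    # Blank out string literals and comments so their contents cannot
    # trigger (or hide) keyword and semicolon matches.
    stripped = STRING_LITERAL.sub("''", sql)
    stripped = COMMENT.sub(" ", stripped)
    body = stripped.strip().rstrip(";").strip()

    errors = []
    if ";" in body:
        errors.append("multiple statements are not allowed")
    if not re.match(r"(select|with)\b", body, re.IGNORECASE):
        errors.append("only SELECT statements are allowed")
    hit = FORBIDDEN.search(body)
    if hit:
        errors.append(f"forbidden keyword: {hit.group(1).upper()}")
    return errors
```

A regex pass like this is coarse: it can misread unusual quoting or flag a legitimate identifier that happens to equal a keyword, which is one reason to prefer the AST path whenever the SQL parses.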