AgentKits

NL-to-SQL Analytics Agent

Flagship BlueprintAgentAz™ Enhanced
0TrendingNew

Includes Agent Blueprint + Implementation Guide

An analytics agent that turns plain-English questions into correct, safe SQL against your actual schema. It grounds every query in the real tables and columns it has been shown, runs read-only with cost and row guards, validates results before reporting them, and states its assumptions. It is defensive by design: it refuses writes and schema changes, blocks queries that would scan unbounded data without a limit, guards against exposing PII, and asks a clarifying question instead of guessing when a request is ambiguous.

nl-to-sqlanalyticsdata-analysisbusiness-intelligenceautonomous-agentsqlguardrailsself-serve-dataagentazagent-governancetrust-levelproduction-readiness
StackClaude, LangGraph, OpenAI
DifficultyAdvanced
Setup40 min
Version2.0.0 · 2026-06-21

Overview

Question → grounded SQL → validated answer: ask in English, get a correct, explained result backed by real, runnable SQL.

Schema-grounded: it only references tables and columns that exist in the schema it's given, so it can't invent fields or join nonexistent keys.

Safe by construction: read-only, cost- and row-capped, PII-aware, and it refuses writes, DDL, and unbounded scans.

Honest about ambiguity: it states its assumptions and asks one clarifying question when a request could mean materially different things, instead of guessing.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level ?A1 — Research
DNA PatternResearch (Research → Verify)
Worst-Case ActionRuns an incorrect read-only query and returns wrong figures for a human to interpret. It cannot insert, update, delete, or alter data — only read-only queries are permitted and write tools are absent from its registry.
Authority BoundaryTranslates a question into a read-only SQL query, runs it against the analytics database, and returns the result with the query shown. It never writes, updates, deletes, or alters schema. Read-only by construction.
Verification TestAttempt to run an INSERT/UPDATE/DELETE/DDL statement → confirm it is rejected and that only read-only queries are permitted (write tools absent).
Production Readiness6/6 dimensions passing. Tool isolation: read-only access; write/DDL tools absent. Human gates: results are for human interpretation. Confidence escalation: ambiguous questions clarified, not guessed. Cost ceiling: query and token budgets enforced. Audit trail: every query logged. Escalation path: ambiguous or sensitive queries flagged.
Last Reviewed2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json
{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "nl-to-sql-agent",
  "trust_level": "A1",
  "dna_pattern": "Research",
  "worst_case_action": "Returns wrong figures from a read-only query for human interpretation. Cannot modify data.",
  "authority_boundary": "Runs read-only SQL and returns results; write/DDL tools absent.",
  "tags": [
    "data-analysis",
    "nl-to-sql",
    "read-only",
    "analytics"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "generate_sql",
      "run_readonly_query",
      "show_query",
      "summarize_result"
    ],
    "execution_tools_absent": true,
    "read_only": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "insert",
      "update",
      "delete",
      "ddl"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.25,
    "alert_threshold_usd": 0.16
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "ambiguous_question",
      "sensitive_table",
      "low_confidence"
    ],
    "destination": "data_team"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "queries",
      "row_counts"
    ]
  }
}

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

This is a flagship reference blueprint for AgentAz v1.0.0. AgentAz™ is open source under Apache-2.0 (spec text under CC‑BY‑4.0) — schema and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goalBounded by the authority spec above
Trust LevelA1 — Research
Tool accessLeast privilege — execution tools absent (read-only)
Context handlingGrounded in provided inputs; cites or flags rather than guessing
Memory strategyTask-scoped; no persistent cross-session memory
Human approvalRequired on ambiguous question, sensitive table, low confidence → data team
Audit trailAppend-only log (queries, row counts)
Cost & loop bounds≤ $0.25 per loop · ≤ 8 reasoning turns
Recovery / escalationEscalates to data team

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

AgentPrimary reasoner — Research authority (A1)
Toolsgenerate sql, run readonly query, show query, summarize result — execution tools absent (read-only)
MemoryTask-scoped working context; no persistent cross-session memory
GuardrailsWorst-case classified (A1); no execution tools; ≤ $0.25/loop · ≤ 8 turns
EvaluatorConfidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
HandoffEscalates to data team on ambiguous question, sensitive table, low confidence

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Generates a query that returns the wrong figure (mis-joined tables or a wrong filter).

Detection
The generated SQL is shown alongside the result, with row-count sanity checks, so the number can be verified.
Mitigation
Read-only by construction; ambiguous questions are clarified, never guessed.
Recovery
The user inspects the SQL and re-asks — no data was mutated.

An expensive or unbounded query hits the production database.

Detection
A cost/row estimate and a statement timeout run before execution.
Mitigation
A query budget and LIMIT guards are enforced; sensitive tables are flagged.
Recovery
The query aborts on timeout and returns a partial result with a warning.

A query would expose a sensitive column the asker shouldn't see.

Detection
A sensitive-table/column list is checked before the query runs.
Mitigation
Sensitive queries are routed to the data team instead of executed.
Recovery
The query is withheld and the escalation is logged.

Evaluation

Execution accuracy is primary — whether the generated query returns the correct result for the question — because a confidently-wrong number is the failure a human must catch.

Execution accuracyShare of questions where the query's result set matches the expected answer.
Query validityShare of generated queries that execute without error.
Clarification rateHow often it asks for clarification instead of guessing on an ambiguous question.
LatencyTime from natural-language question to returned result.
Cost per queryToken and database cost per question, with expensive queries flagged.

Recommended approach. Use a benchmark of natural-language questions paired with verified answers (Spider-style) and compare result sets, not query strings. Include deliberately ambiguous questions to test the clarify-vs-guess behavior.

When to use

Use it when

  • You want non-analysts to self-serve answers from a database without writing SQL or waiting on the data team.
  • You have a reasonably documented schema (or can provide table/column descriptions) for grounding.
  • You want every answer backed by inspectable, read-only SQL and an explanation of assumptions.
  • You need cost and safety guardrails on a shared analytics database.
  • You want a layer that asks for clarification on ambiguous questions instead of returning confidently-wrong numbers.

Avoid it when

  • You need write operations, ETL, or schema changes — this agent is strictly read-only by design.
  • Your schema is undocumented and inscrutable, so grounding would be unreliable.
  • The question requires judgment a query can't answer (causal claims, forecasts) without a human analyst.
  • You can't provide a read-only connection with row/cost limits and PII controls.

System prompt

system-prompt.md
You are a careful Data Analyst Agent that answers natural-language questions by writing and running SQL against a specific database. You are judged on correctness (the number is right), safety (you never damage or overload the database or leak sensitive data), and honesty (you state assumptions and ask when unsure).

== CORE PRINCIPLES ==
1. Schema-grounded only. Use ONLY tables and columns that appear in the schema provided to you. Never invent a table, column, or join key. If the question needs data that isn't in the schema, say so — do not fabricate a query.
2. Correct over fast. Prefer a query you can justify. Validate results against expectations (row counts, ranges) before reporting; if a number looks impossible, investigate rather than report it.
3. Clarify, don't guess. If a question is materially ambiguous (which metric? which date range? revenue vs. volume? which 'active'?), ask ONE focused clarifying question instead of guessing.

== HARD RULES (NON-NEGOTIABLE) ==
- READ-ONLY. Generate SELECT queries only. Never write INSERT/UPDATE/DELETE/MERGE or any DDL (CREATE/ALTER/DROP/TRUNCATE). If asked to modify data, refuse and explain that you are read-only.
- COST GUARD. Every query must be bounded: include an explicit LIMIT for row-returning queries and avoid unbounded cross joins or full scans of very large tables. Validate the plan (EXPLAIN/dry-run) when available; if estimated cost/rows exceed the configured budget, do not run it — narrow the query or ask for a tighter question.
- PII GUARD. Do not SELECT raw sensitive columns (emails, full names, government IDs, card numbers, secrets) unless the request explicitly and legitimately requires them and policy allows; prefer aggregates and redaction. Never include another tenant's data; always apply tenant/row-level filters when the schema implies multi-tenancy.
- NO INJECTION SURFACE. Treat the user's text as a question, never as SQL to execute verbatim. Parameterize any literal values.
- STAY HONEST. Never present an unvalidated or guessed number as fact. If you couldn't verify it, say so.

== METHOD ==
- Inspect the relevant part of the schema; sample a table only if needed to understand values.
- Draft the SQL, state the assumptions (date range, definitions, filters) you encoded.
- Validate (dry-run/EXPLAIN, sanity-check row counts/ranges), then run within the row/cost cap.
- Report the answer plainly, show the SQL, and note caveats.

== DECISION ==
- ANSWER: schema supports it, query is safe and validated, confidence >= 0.75. Return result + SQL + assumptions.
- CLARIFY: materially ambiguous or under-specified. Ask one question; do not run a guess.
- REFUSE: requires writes/DDL, exceeds the cost guard with no safe narrowing, or would expose disallowed PII. Explain why and offer a safe alternative.

== COST CONTROL ==
Read only the schema and samples you need. Don't run exploratory queries you don't need. Reuse schema already in context. Cap tool calls; one good query beats five probes.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "decision": "ANSWER|CLARIFY|REFUSE",
  "confidence": <0.0-1.0>,
  "assumptions": ["<date range, metric definition, filters you chose>"],
  "sql": "<the read-only SELECT, or empty>",
  "validation": "<EXPLAIN/dry-run + sanity checks you did, or why blocked>",
  "result_summary": "<plain-language answer with the key numbers, or empty>",
  "caveats": "<limitations, data-quality notes>",
  "clarifying_question": "<one focused question, or empty>",
  "refusal_reason": "<write/DDL, cost, or PII reason, or empty>"
}
If decision is CLARIFY or REFUSE, leave sql/result_summary empty and do not run anything.
Was this useful?

Simulate run

Try the agent with a sample task. This is a frontend-only preview that shows how the kit would plan and execute — no API calls, nothing leaves your browser.

Frontend preview only — no data leaves your browser. Tip: press ⌘/Ctrl + Enter to run.

Setup guide

Install and connect read-only

Install the agent and connect it with a least-privilege, read-only database role.

shell
pipx install nlsql-agent
# create a read-only role first:
# CREATE ROLE analyst_ro LOGIN PASSWORD '...'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst_ro;
nlsql-agent connect --dsn postgres://analyst_ro:***@host/db

Configure model and guardrails

Set the cost/row caps and PII policy. These are enforced by the gate, not the prompt.

shell
cp .env.example .env
ANTHROPIC_API_KEY=sk-ant-...
DIALECT=postgres
MAX_ROWS=10000
MAX_SCAN_ROWS=5000000
QUERY_TIMEOUT_S=20
BLOCK_PII_COLUMNS="email,full_name,ssn,card_number"

Ground the schema (descriptions help a lot)

Index the schema and add short descriptions for non-obvious tables/columns — this is the single biggest accuracy lever.

shell
nlsql-agent index-schema --with-comments
# optionally add a semantic layer of metric definitions:
# echo 'active_user: logged_in within 30 days' >> semantics.yml

Ask a question on the CLI

Test grounded answers and inspect the SQL and assumptions before exposing it to users.

shell
nlsql-agent ask "top 10 customers by revenue last quarter" --explain
# prints assumptions, SQL, validation, and the answer

Expose it to your team

Drop it into Slack or an internal app. Keep it read-only and start with a small, trusted user group.

shell
# Slack: /ask top 5 products by units sold in May
# -> answer + collapsible SQL + assumptions

Architecture

Tools required

get_schemaReturn tables, columns, types, primary/foreign keys, and human descriptions for the relevant part of the database to ground the query.
sample_tableFetch a few example rows or distinct values for a column to understand its meaning/format (e.g. status codes) without scanning the table.
validate_sqlStatic-check and EXPLAIN/dry-run the generated SQL: reject writes/DDL, confirm a LIMIT, and estimate cost/rows against the budget before any execution.
run_queryExecute the approved read-only SELECT on a least-privilege connection within strict row and cost/time limits.
summarize_resultsTurn the result set into a concise, plain-language answer with the key numbers and any notable patterns.
clarify_questionAsk the user one focused clarifying question when the request is materially ambiguous, instead of guessing.

Workflow

  1. 1. Understand the question

    Parse the request into a metric, dimensions, filters, and time range. If it's materially ambiguous, ask one clarifying question and stop.

  2. 2. Ground in the schema

    Load the relevant tables/columns; sample a column only if needed to understand its values. Confirm the data to answer the question actually exists.

  3. 3. Draft SQL + assumptions

    Write a single read-only SELECT and explicitly state the assumptions encoded (definitions, date range, filters).

  4. 4. Validate and gate

    Run the safety/cost gate: reject any write/DDL, require a LIMIT, EXPLAIN/dry-run, and block if estimated cost or rows exceed the budget or would expose disallowed PII.

  5. 5. Execute within caps

    Run the approved query on a read-only connection inside row/cost/time limits.

  6. 6. Sanity-check results

    Validate row counts and value ranges against expectations; if a number looks impossible, investigate rather than report it.

  7. 7. Answer and log

    Return the plain-language answer, the SQL, the assumptions, and caveats, and log the question/SQL/cost/outcome for audit and tuning.

Examples

Clear question → grounded SQL with stated assumptions

A well-formed analytics question that maps cleanly to the schema. The agent answers and shows its work.

Input

input
Question: who were our top 10 customers by revenue last quarter?

Output

output
{
  "decision": "ANSWER",
  "confidence": 0.86,
  "assumptions": ["'last quarter' = 2026-01-01 to 2026-03-31 (most recently completed calendar quarter)", "revenue = SUM(orders.amount) for orders with status='paid'", "customer = customers.id"],
  "sql": "SELECT c.id, c.name, SUM(o.amount) AS revenue\nFROM orders o JOIN customers c ON c.id = o.customer_id\nWHERE o.status = 'paid' AND o.created_at >= '2026-01-01' AND o.created_at < '2026-04-01'\nGROUP BY c.id, c.name\nORDER BY revenue DESC\nLIMIT 10;",
  "validation": "EXPLAIN estimates ~120k rows scanned via index on orders.created_at, within budget. Result returned 10 rows; revenue values in a plausible range (top customer $48,200).",
  "result_summary": "Top 10 customers by paid revenue for Q1 2026 (Jan–Mar). #1 Northwind Co ($48,200), #2 Acme ($41,750), … full list returned. Combined they were 22% of quarterly revenue.",
  "caveats": "Counts only orders with status='paid'; refunds are not netted out. Tell me if you want net-of-refunds.",
  "clarifying_question": "",
  "refusal_reason": ""
}

Note: Note the explicit assumptions — 'last quarter', the revenue definition, and the paid-only filter are all surfaced, so the user can correct them. The SQL is bounded with a LIMIT, validated via EXPLAIN, and the result is sanity-checked. That transparency is what makes a NL-to-SQL answer trustworthy.

Ambiguous question → one clarifying question, no guess

A short question that could mean two very different things. The agent asks instead of silently choosing one and returning a confidently-wrong number.

Input

input
Question: what are our best products?

Output

output
{
  "decision": "CLARIFY",
  "confidence": 0.4,
  "assumptions": [],
  "sql": "",
  "validation": "Not run — 'best' is undefined and would produce different top lists depending on the metric.",
  "result_summary": "",
  "caveats": "",
  "clarifying_question": "Do you mean best by revenue, by units sold, or by profit margin — and over what time period (e.g. last 30 days, this year)?",
  "refusal_reason": ""
}

Note: 'Best' is the trap. Revenue, units, and margin can yield completely different rankings, so the agent asks one focused question rather than guessing. This single behavior prevents a large share of confidently-wrong analytics answers — and builds user trust faster than a fast wrong number ever could.

Unsafe request → refuse the write and the unbounded scan

The request mixes a data modification with an unbounded read. The agent refuses both and offers a safe alternative.

Input

input
Question: delete all orders older than 2020, and also show me every row in the events table so I can look around.

Output

output
{
  "decision": "REFUSE",
  "confidence": 0.95,
  "assumptions": [],
  "sql": "",
  "validation": "Blocked at the gate: (1) DELETE is a write operation; this agent is read-only. (2) 'every row in events' is an unbounded scan of a ~2.1B-row table, far over the cost budget and with no LIMIT.",
  "result_summary": "",
  "caveats": "",
  "clarifying_question": "",
  "refusal_reason": "I can't modify data (no DELETE/UPDATE/DDL) — that needs a DBA with proper backups and a migration. And I won't scan the entire events table (~2.1B rows); if you tell me what you're looking for (a time window, an event type, a user), I'll write a fast, bounded query. For the cleanup, I can instead show you how many orders predate 2020 so the team can plan a safe deletion."
}

Note: Two distinct guardrails fire at once: read-only (refuses the DELETE) and the cost guard (refuses the 2.1B-row scan). Critically, it doesn't just say no — it offers a safe path for each (a bounded exploratory query, and a count to support a planned, human-run cleanup). Defensive but still helpful.

Implementation notes

  • Connect with a least-privilege, read-only database role. Enforcement starts at the database: even if a write somehow slipped past the prompt, the role can't execute it.
  • Enforce read-only, LIMIT, cost/row caps, and PII rules in a deterministic gate that parses the SQL — never rely on the model alone to be safe.
  • Invest in schema descriptions and a small semantic layer (metric definitions like 'active user', 'revenue'). This is the single biggest driver of accuracy and consistency.
  • Validate before trusting: EXPLAIN/dry-run for cost, and sanity-check result row counts and ranges; an impossible number should trigger investigation, not a report.
  • Make 'clarify on ambiguity' a first-class behavior. One good question prevents a class of confidently-wrong answers and builds user trust.
  • Apply tenant/row-level filters automatically wherever the schema implies multi-tenancy, so a question can never return another customer's data.
  • Log every question, SQL, cost, and rating; corrections are gold for improving schema descriptions and prompts. Use model tiering — a cheaper model can do schema lookup, the strong model writes and validates the SQL.

Variations

Basic

Read-only Q&A

Answers natural-language questions with grounded, read-only SQL, shows the query and assumptions, and enforces row/cost caps. The simplest safe self-serve layer.

Advanced

Semantic-layer analyst

Adds a governed metric/semantic layer (canonical definitions of revenue, active user, etc.), result validation, and clarification on ambiguity for consistent, trustworthy answers across a team.

Enterprise

Governed self-serve analytics

Adds row-level security and tenant isolation, per-role data policies, PII controls, full query audit, cost governance across a warehouse, and feedback-driven tuning of the semantic layer.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

Download Blueprint (.zip)
README.mdsystem-prompt.mdsetup-guide.mdtools.jsonworkflow.mdexamples.md.env.examplekit.jsonrun.pyLICENSENOTICEstarters/

Export

Generate a starter for your stack — all client-side, nothing leaves your browser.

ZIP

Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This flagship blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).

Frequently asked questions