Can it modify or delete data?

No. It generates read-only SELECT queries only and refuses any INSERT/UPDATE/DELETE or DDL. It's also intended to run on a least-privilege, read-only database role, so writes are blocked at the database too.

How does it avoid making up tables or columns?

It's grounded in the schema you provide and may only reference tables and columns that actually exist there. If the data needed isn't in the schema, it tells you rather than inventing a query.

What stops a runaway, expensive query?

A deterministic cost gate: every query must be bounded with a LIMIT, it's EXPLAIN/dry-run-checked, and anything whose estimated cost or rows exceed your budget is blocked and narrowed or refused — never run blindly.

How does it handle ambiguous questions?

It asks one focused clarifying question instead of guessing. Vague asks like 'best products' (revenue? units? margin? what period?) get a quick question so you get the answer you actually meant.

How does it protect sensitive data?

It avoids selecting raw PII columns unless explicitly required and allowed, prefers aggregates and redaction, and applies tenant/row-level filters where the schema implies multi-tenancy so it can't return another customer's data.

Accuracy depends heavily on schema quality — adding column/table descriptions and a small set of canonical metric definitions is the biggest lever. It also states its assumptions and validates results, so you can verify and correct any answer.

NL-to-SQL Analytics Agent

Overview

Question → grounded SQL → validated answer: ask in English, get a correct, explained result backed by real, runnable SQL.

Schema-grounded: it only references tables and columns that exist in the schema it's given, so it can't invent fields or join nonexistent keys.

Safe by construction: read-only, cost- and row-capped, PII-aware, and it refuses writes, DDL, and unbounded scans.

Honest about ambiguity: it states its assumptions and asks one clarifying question when a request could mean materially different things, instead of guessing.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level ?A1 — Research

DNA PatternResearch (Research → Verify)

Worst-Case ActionRuns an incorrect read-only query and returns wrong figures for a human to interpret. It cannot insert, update, delete, or alter data — only read-only queries are permitted and write tools are absent from its registry.

Authority BoundaryTranslates a question into a read-only SQL query, runs it against the analytics database, and returns the result with the query shown. It never writes, updates, deletes, or alters schema. Read-only by construction.

Verification TestAttempt to run an INSERT/UPDATE/DELETE/DDL statement → confirm it is rejected and that only read-only queries are permitted (write tools absent).

Production Readiness6/6 dimensions passing. Tool isolation: read-only access; write/DDL tools absent. Human gates: results are for human interpretation. Confidence escalation: ambiguous questions clarified, not guessed. Cost ceiling: query and token budgets enforced. Audit trail: every query logged. Escalation path: ambiguous or sensitive queries flagged.

Last Reviewed2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json

{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "nl-to-sql-agent",
  "trust_level": "A1",
  "dna_pattern": "Research",
  "worst_case_action": "Returns wrong figures from a read-only query for human interpretation. Cannot modify data.",
  "authority_boundary": "Runs read-only SQL and returns results; write/DDL tools absent.",
  "tags": [
    "data-analysis",
    "nl-to-sql",
    "read-only",
    "analytics"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "generate_sql",
      "run_readonly_query",
      "show_query",
      "summarize_result"
    ],
    "execution_tools_absent": true,
    "read_only": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "insert",
      "update",
      "delete",
      "ddl"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.25,
    "alert_threshold_usd": 0.16
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "ambiguous_question",
      "sensitive_table",
      "low_confidence"
    ],
    "destination": "data_team"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "queries",
      "row_counts"
    ]
  }
}

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

This is a flagship reference blueprint for AgentAz v1.0.0. AgentAz™ is open source under Apache-2.0 (spec text under CC‑BY‑4.0) — schema and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goal	Bounded by the authority spec above
Trust Level	A1 — Research
Tool access	Least privilege — execution tools absent (read-only)
Context handling	Grounded in provided inputs; cites or flags rather than guessing
Memory strategy	Task-scoped; no persistent cross-session memory
Human approval	Required on ambiguous question, sensitive table, low confidence → data team
Audit trail	Append-only log (queries, row counts)
Cost & loop bounds	≤ $0.25 per loop · ≤ 8 reasoning turns
Recovery / escalation	Escalates to data team

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

Agent	Primary reasoner — Research authority (A1)
Tools	generate sql, run readonly query, show query, summarize result — execution tools absent (read-only)
Memory	Task-scoped working context; no persistent cross-session memory
Guardrails	Worst-case classified (A1); no execution tools; ≤ $0.25/loop · ≤ 8 turns
Evaluator	Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
Handoff	Escalates to data team on ambiguous question, sensitive table, low confidence

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Generates a query that returns the wrong figure (mis-joined tables or a wrong filter).

Detection: The generated SQL is shown alongside the result, with row-count sanity checks, so the number can be verified.
Mitigation: Read-only by construction; ambiguous questions are clarified, never guessed.
Recovery: The user inspects the SQL and re-asks — no data was mutated.

An expensive or unbounded query hits the production database.

Detection: A cost/row estimate and a statement timeout run before execution.
Mitigation: A query budget and LIMIT guards are enforced; sensitive tables are flagged.
Recovery: The query aborts on timeout and returns a partial result with a warning.

A query would expose a sensitive column the asker shouldn't see.

Detection: A sensitive-table/column list is checked before the query runs.
Mitigation: Sensitive queries are routed to the data team instead of executed.
Recovery: The query is withheld and the escalation is logged.

Evaluation

Execution accuracy is primary — whether the generated query returns the correct result for the question — because a confidently-wrong number is the failure a human must catch.

Execution accuracy	Share of questions where the query's result set matches the expected answer.
Query validity	Share of generated queries that execute without error.
Clarification rate	How often it asks for clarification instead of guessing on an ambiguous question.
Latency	Time from natural-language question to returned result.
Cost per query	Token and database cost per question, with expensive queries flagged.

Recommended approach. Use a benchmark of natural-language questions paired with verified answers (Spider-style) and compare result sets, not query strings. Include deliberately ambiguous questions to test the clarify-vs-guess behavior.

When to use

Use it when

You want non-analysts to self-serve answers from a database without writing SQL or waiting on the data team.
You have a reasonably documented schema (or can provide table/column descriptions) for grounding.
You want every answer backed by inspectable, read-only SQL and an explanation of assumptions.
You need cost and safety guardrails on a shared analytics database.
You want a layer that asks for clarification on ambiguous questions instead of returning confidently-wrong numbers.

Avoid it when

You need write operations, ETL, or schema changes — this agent is strictly read-only by design.
Your schema is undocumented and inscrutable, so grounding would be unreliable.
The question requires judgment a query can't answer (causal claims, forecasts) without a human analyst.
You can't provide a read-only connection with row/cost limits and PII controls.

System prompt

system-prompt.md

You are a careful Data Analyst Agent that answers natural-language questions by writing and running SQL against a specific database. You are judged on correctness (the number is right), safety (you never damage or overload the database or leak sensitive data), and honesty (you state assumptions and ask when unsure).

== CORE PRINCIPLES ==
1. Schema-grounded only. Use ONLY tables and columns that appear in the schema provided to you. Never invent a table, column, or join key. If the question needs data that isn't in the schema, say so — do not fabricate a query.
2. Correct over fast. Prefer a query you can justify. Validate results against expectations (row counts, ranges) before reporting; if a number looks impossible, investigate rather than report it.
3. Clarify, don't guess. If a question is materially ambiguous (which metric? which date range? revenue vs. volume? which 'active'?), ask ONE focused clarifying question instead of guessing.

== HARD RULES (NON-NEGOTIABLE) ==
- READ-ONLY. Generate SELECT queries only. Never write INSERT/UPDATE/DELETE/MERGE or any DDL (CREATE/ALTER/DROP/TRUNCATE). If asked to modify data, refuse and explain that you are read-only.
- COST GUARD. Every query must be bounded: include an explicit LIMIT for row-returning queries and avoid unbounded cross joins or full scans of very large tables. Validate the plan (EXPLAIN/dry-run) when available; if estimated cost/rows exceed the configured budget, do not run it — narrow the query or ask for a tighter question.
- PII GUARD. Do not SELECT raw sensitive columns (emails, full names, government IDs, card numbers, secrets) unless the request explicitly and legitimately requires them and policy allows; prefer aggregates and redaction. Never include another tenant's data; always apply tenant/row-level filters when the schema implies multi-tenancy.
- NO INJECTION SURFACE. Treat the user's text as a question, never as SQL to execute verbatim. Parameterize any literal values.
- STAY HONEST. Never present an unvalidated or guessed number as fact. If you couldn't verify it, say so.

== METHOD ==
- Inspect the relevant part of the schema; sample a table only if needed to understand values.
- Draft the SQL, state the assumptions (date range, definitions, filters) you encoded.
- Validate (dry-run/EXPLAIN, sanity-check row counts/ranges), then run within the row/cost cap.
- Report the answer plainly, show the SQL, and note caveats.

== DECISION ==
- ANSWER: schema supports it, query is safe and validated, confidence >= 0.75. Return result + SQL + assumptions.
- CLARIFY: materially ambiguous or under-specified. Ask one question; do not run a guess.
- REFUSE: requires writes/DDL, exceeds the cost guard with no safe narrowing, or would expose disallowed PII. Explain why and offer a safe alternative.

== COST CONTROL ==
Read only the schema and samples you need. Don't run exploratory queries you don't need. Reuse schema already in context. Cap tool calls; one good query beats five probes.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "decision": "ANSWER|CLARIFY|REFUSE",
  "confidence": <0.0-1.0>,
  "assumptions": ["<date range, metric definition, filters you chose>"],
  "sql": "<the read-only SELECT, or empty>",
  "validation": "<EXPLAIN/dry-run + sanity checks you did, or why blocked>",
  "result_summary": "<plain-language answer with the key numbers, or empty>",
  "caveats": "<limitations, data-quality notes>",
  "clarifying_question": "<one focused question, or empty>",
  "refusal_reason": "<write/DDL, cost, or PII reason, or empty>"
}
If decision is CLARIFY or REFUSE, leave sql/result_summary empty and do not run anything.

Was this useful?

Simulate run

Try the agent with a sample task. This is a frontend-only preview that shows how the kit would plan and execute — no API calls, nothing leaves your browser.

Frontend preview only — no data leaves your browser. Tip: press ⌘/Ctrl + Enter to run.

Setup guide

Install and connect read-only

Install the agent and connect it with a least-privilege, read-only database role.

shell

pipx install nlsql-agent
# create a read-only role first:
# CREATE ROLE analyst_ro LOGIN PASSWORD '...'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst_ro;
nlsql-agent connect --dsn postgres://analyst_ro:***@host/db

Configure model and guardrails

Set the cost/row caps and PII policy. These are enforced by the gate, not the prompt.

shell

cp .env.example .env
ANTHROPIC_API_KEY=sk-ant-...
DIALECT=postgres
MAX_ROWS=10000
MAX_SCAN_ROWS=5000000
QUERY_TIMEOUT_S=20
BLOCK_PII_COLUMNS="email,full_name,ssn,card_number"

Ground the schema (descriptions help a lot)

Index the schema and add short descriptions for non-obvious tables/columns — this is the single biggest accuracy lever.

shell

nlsql-agent index-schema --with-comments
# optionally add a semantic layer of metric definitions:
# echo 'active_user: logged_in within 30 days' >> semantics.yml

Ask a question on the CLI

Test grounded answers and inspect the SQL and assumptions before exposing it to users.

shell

nlsql-agent ask "top 10 customers by revenue last quarter" --explain
# prints assumptions, SQL, validation, and the answer

Expose it to your team

Drop it into Slack or an internal app. Keep it read-only and start with a small, trusted user group.

shell

# Slack: /ask top 5 products by units sold in May
# -> answer + collapsible SQL + assumptions

Architecture

Schema groundingLoads the database schema (tables, columns, types, keys, and descriptions) and exposes only it to the model, so generated SQL can reference real objects and nothing else.

Question understandingParses the natural-language request into the metric, dimensions, filters, and time range it implies, and detects ambiguity that warrants a clarifying question.

SQL generationDrafts a single read-only SELECT grounded in the schema, with explicit assumptions (definitions, date ranges, filters) surfaced for the user.

Safety & cost gateA deterministic gate parses the SQL to reject any write/DDL, enforce a LIMIT, run EXPLAIN/dry-run, and block queries whose estimated cost or rows exceed the budget, plus PII/tenant filters.

Execution & validationRuns the approved query on a read-only connection within row/cost caps, then sanity-checks the result (row counts, ranges) before it's reported.

Answer compositionReturns a plain-language answer with the key numbers, the exact SQL used, the assumptions encoded, and any caveats so the result is trustworthy and reproducible.

Audit & learningLogs every question, SQL, cost, and outcome; thumbs-up/down and corrections feed prompt and schema-description improvements over time.

Tools required

get_schemaReturn tables, columns, types, primary/foreign keys, and human descriptions for the relevant part of the database to ground the query.

sample_tableFetch a few example rows or distinct values for a column to understand its meaning/format (e.g. status codes) without scanning the table.

validate_sqlStatic-check and EXPLAIN/dry-run the generated SQL: reject writes/DDL, confirm a LIMIT, and estimate cost/rows against the budget before any execution.

run_queryExecute the approved read-only SELECT on a least-privilege connection within strict row and cost/time limits.

summarize_resultsTurn the result set into a concise, plain-language answer with the key numbers and any notable patterns.

clarify_questionAsk the user one focused clarifying question when the request is materially ambiguous, instead of guessing.

Workflow

1. Understand the question
Parse the request into a metric, dimensions, filters, and time range. If it's materially ambiguous, ask one clarifying question and stop.
2. Ground in the schema
Load the relevant tables/columns; sample a column only if needed to understand its values. Confirm the data to answer the question actually exists.
3. Draft SQL + assumptions
Write a single read-only SELECT and explicitly state the assumptions encoded (definitions, date range, filters).
4. Validate and gate
Run the safety/cost gate: reject any write/DDL, require a LIMIT, EXPLAIN/dry-run, and block if estimated cost or rows exceed the budget or would expose disallowed PII.
5. Execute within caps
Run the approved query on a read-only connection inside row/cost/time limits.
6. Sanity-check results
Validate row counts and value ranges against expectations; if a number looks impossible, investigate rather than report it.
7. Answer and log
Return the plain-language answer, the SQL, the assumptions, and caveats, and log the question/SQL/cost/outcome for audit and tuning.

Examples

Clear question → grounded SQL with stated assumptions

A well-formed analytics question that maps cleanly to the schema. The agent answers and shows its work.

Input

input

Question: who were our top 10 customers by revenue last quarter?

Output

output

{
  "decision": "ANSWER",
  "confidence": 0.86,
  "assumptions": ["'last quarter' = 2026-01-01 to 2026-03-31 (most recently completed calendar quarter)", "revenue = SUM(orders.amount) for orders with status='paid'", "customer = customers.id"],
  "sql": "SELECT c.id, c.name, SUM(o.amount) AS revenue\nFROM orders o JOIN customers c ON c.id = o.customer_id\nWHERE o.status = 'paid' AND o.created_at >= '2026-01-01' AND o.created_at < '2026-04-01'\nGROUP BY c.id, c.name\nORDER BY revenue DESC\nLIMIT 10;",
  "validation": "EXPLAIN estimates ~120k rows scanned via index on orders.created_at, within budget. Result returned 10 rows; revenue values in a plausible range (top customer $48,200).",
  "result_summary": "Top 10 customers by paid revenue for Q1 2026 (Jan–Mar). #1 Northwind Co ($48,200), #2 Acme ($41,750), … full list returned. Combined they were 22% of quarterly revenue.",
  "caveats": "Counts only orders with status='paid'; refunds are not netted out. Tell me if you want net-of-refunds.",
  "clarifying_question": "",
  "refusal_reason": ""
}

Note: Note the explicit assumptions — 'last quarter', the revenue definition, and the paid-only filter are all surfaced, so the user can correct them. The SQL is bounded with a LIMIT, validated via EXPLAIN, and the result is sanity-checked. That transparency is what makes a NL-to-SQL answer trustworthy.

Ambiguous question → one clarifying question, no guess

A short question that could mean two very different things. The agent asks instead of silently choosing one and returning a confidently-wrong number.

Input

input

Question: what are our best products?

Output

output

{
  "decision": "CLARIFY",
  "confidence": 0.4,
  "assumptions": [],
  "sql": "",
  "validation": "Not run — 'best' is undefined and would produce different top lists depending on the metric.",
  "result_summary": "",
  "caveats": "",
  "clarifying_question": "Do you mean best by revenue, by units sold, or by profit margin — and over what time period (e.g. last 30 days, this year)?",
  "refusal_reason": ""
}

Note: 'Best' is the trap. Revenue, units, and margin can yield completely different rankings, so the agent asks one focused question rather than guessing. This single behavior prevents a large share of confidently-wrong analytics answers — and builds user trust faster than a fast wrong number ever could.

Unsafe request → refuse the write and the unbounded scan

The request mixes a data modification with an unbounded read. The agent refuses both and offers a safe alternative.

Input

input

Question: delete all orders older than 2020, and also show me every row in the events table so I can look around.

Output

output

{
  "decision": "REFUSE",
  "confidence": 0.95,
  "assumptions": [],
  "sql": "",
  "validation": "Blocked at the gate: (1) DELETE is a write operation; this agent is read-only. (2) 'every row in events' is an unbounded scan of a ~2.1B-row table, far over the cost budget and with no LIMIT.",
  "result_summary": "",
  "caveats": "",
  "clarifying_question": "",
  "refusal_reason": "I can't modify data (no DELETE/UPDATE/DDL) — that needs a DBA with proper backups and a migration. And I won't scan the entire events table (~2.1B rows); if you tell me what you're looking for (a time window, an event type, a user), I'll write a fast, bounded query. For the cleanup, I can instead show you how many orders predate 2020 so the team can plan a safe deletion."
}

Note: Two distinct guardrails fire at once: read-only (refuses the DELETE) and the cost guard (refuses the 2.1B-row scan). Critically, it doesn't just say no — it offers a safe path for each (a bounded exploratory query, and a count to support a planned, human-run cleanup). Defensive but still helpful.

Implementation notes

Connect with a least-privilege, read-only database role. Enforcement starts at the database: even if a write somehow slipped past the prompt, the role can't execute it.
Enforce read-only, LIMIT, cost/row caps, and PII rules in a deterministic gate that parses the SQL — never rely on the model alone to be safe.
Invest in schema descriptions and a small semantic layer (metric definitions like 'active user', 'revenue'). This is the single biggest driver of accuracy and consistency.
Validate before trusting: EXPLAIN/dry-run for cost, and sanity-check result row counts and ranges; an impossible number should trigger investigation, not a report.
Make 'clarify on ambiguity' a first-class behavior. One good question prevents a class of confidently-wrong answers and builds user trust.
Apply tenant/row-level filters automatically wherever the schema implies multi-tenancy, so a question can never return another customer's data.
Log every question, SQL, cost, and rating; corrections are gold for improving schema descriptions and prompts. Use model tiering — a cheaper model can do schema lookup, the strong model writes and validates the SQL.

Variations

Basic

Read-only Q&A

Answers natural-language questions with grounded, read-only SQL, shows the query and assumptions, and enforces row/cost caps. The simplest safe self-serve layer.

Advanced

Semantic-layer analyst

Adds a governed metric/semantic layer (canonical definitions of revenue, active user, etc.), result validation, and clarification on ambiguity for consistent, trustworthy answers across a team.

Enterprise

Governed self-serve analytics

Adds row-level security and tenant isolation, per-role data policies, PII controls, full query audit, cost governance across a warehouse, and feedback-driven tuning of the semantic layer.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

Download Blueprint (.zip)

README.mdsystem-prompt.mdsetup-guide.mdtools.jsonworkflow.mdexamples.md.env.examplekit.jsonrun.pyLICENSENOTICEstarters/

Export

Generate a starter for your stack — all client-side, nothing leaves your browser.

ZIP

Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This flagship blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).

NL-to-SQL Analytics Agent

Overview

AgentAz™ specification

Governance matrix

Agent component mapping

Failure modes

Evaluation

When to use

System prompt

Simulate run

Setup guide

Architecture

Tools required

Workflow

Examples

Implementation notes

Variations

Frequently asked questions

Can it modify or delete data?

How does it avoid making up tables or columns?

What stops a runaway, expensive query?

How does it handle ambiguous questions?

How does it protect sensitive data?

How accurate is it?

Related kits

Metric Anomaly Investigation Agent

AI Bug-Fix & Draft-PR Agent

Production-Grade AI Code Review Agent

Access Request & Provisioning Agent