Overview
Extracts invoices into validated JSON: vendor, number, date, line items, tax, and totals.
Attaches a confidence to every field and validates that line items reconcile to the stated total.
Flags illegible fields and math mismatches for human review instead of guessing or silently correcting.
Defensive: never fabricates a missing value, never auto-fixes a discrepancy, and treats invoice contents as confidential financial data.
AgentAz™ specification
A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.
Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:
{
"$schema": "./agentaz.schema.json",
"version": "2.0.0",
"last_reviewed": "2026-06-24",
"agent_id": "invoice-extraction-agent",
"trust_level": "A2",
"dna_pattern": "Extraction",
"worst_case_action": "Extracts a wrong field for human review. Cannot post, pay, or write to financial systems.",
"authority_boundary": "Extracts and validates invoice fields; payment/post tools absent.",
"tags": [
"document-processing",
"invoice",
"extraction",
"read-only",
"human-review"
],
"tool_boundary": {
"allowed_tools": [
"read_document",
"extract_fields",
"validate_schema",
"flag_low_confidence"
],
"execution_tools_absent": true
},
"output_boundary": {
"format": "structured_json",
"never_emits": [
"ledger_write",
"payment",
"post"
]
},
"cost_boundary": {
"max_usd_per_trace_loop": 0.2,
"alert_threshold_usd": 0.14
},
"loop_boundary": {
"max_reasoning_turns": 8
},
"human_handoff": {
"triggers": [
"low_confidence_field",
"missing_required_field"
],
"destination": "ap_review"
},
"audit": {
"append_only": true,
"logs": [
"fields",
"confidence"
]
}
}
New to this? Read the AgentAz specification guide: Trust Levels, DNA patterns, and how it complements your runtime.
AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.
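As an illustration of how a policy layer you already run might consume this contract, here is a minimal Python sketch that checks a requested tool and a running cost against a subset of the fields shown above. The helper names `is_tool_allowed` and `check_budget` and the "ok"/"alert"/"stop" return values are hypothetical and not part of AgentAz; the spec itself enforces nothing at runtime.

```python
import json

# Subset of the agentaz.json contract shown above, inlined so the sketch is self-contained.
CONTRACT = json.loads("""
{
  "tool_boundary": {
    "allowed_tools": ["read_document", "extract_fields", "validate_schema", "flag_low_confidence"],
    "execution_tools_absent": true
  },
  "cost_boundary": {"max_usd_per_trace_loop": 0.2, "alert_threshold_usd": 0.14},
  "loop_boundary": {"max_reasoning_turns": 8}
}
""")


def is_tool_allowed(contract: dict, tool: str) -> bool:
    """Hypothetical helper: a tool is allowed only if it is listed explicitly."""
    return tool in contract["tool_boundary"]["allowed_tools"]


def check_budget(contract: dict, spent_usd: float, turns: int) -> str:
    """Hypothetical helper: return "ok", "alert", or "stop" for one trace loop."""
    cost = contract["cost_boundary"]
    max_turns = contract["loop_boundary"]["max_reasoning_turns"]
    if spent_usd > cost["max_usd_per_trace_loop"] or turns > max_turns:
        return "stop"
    if spent_usd >= cost["alert_threshold_usd"]:
        return "alert"
    return "ok"
```

An allow-list check like this fails closed: any tool not named in `allowed_tools`, including a payment or posting tool, is rejected by default.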
Governance matrix
A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.
| Agent goal | Bounded by the authority spec above |
|---|---|
| Trust Level | A2 — Recommend |
| Tool access | Least privilege — execution tools absent (read-only) |
| Context handling | Grounded in provided inputs; cites or flags rather than guessing |
| Memory strategy | Task-scoped; no persistent cross-session memory |
| Human approval | Required on a low-confidence field or a missing required field → AP review |
| Audit trail | Append-only log (fields, confidence) |
| Cost & loop bounds | ≤ $0.2 per loop · ≤ 8 reasoning turns |
| Recovery / escalation | Escalates to AP review |
Agent component mapping
A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.
| Agent | Primary reasoner — Recommend authority (A2) |
|---|---|
| Tools | read document, extract fields, validate schema, flag low confidence — execution tools absent (read-only) |
| Memory | Task-scoped working context; no persistent cross-session memory |
| Guardrails | Worst-case classified (A2); no execution tools; ≤ $0.2/loop · ≤ 8 turns |
| Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned |
| Handoff | Escalates to AP review on a low-confidence field or a missing required field |
Failure modes
Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.
Misreads a total or tax field, producing a wrong figure.
- Detection: Low-confidence fields are flagged, and arithmetic checks (line items vs. total) run.
- Mitigation: Extraction only, with no posting or payment; a human verifies.
- Recovery: The reviewer corrects the field before any downstream use.
A duplicate invoice is extracted and processed twice.
- Detection: An invoice-number and vendor dedup check runs.
- Mitigation: Duplicates are flagged rather than silently passed.
- Recovery: The duplicate is held for review.
A wrong currency or date format is parsed.
- Detection: Currency and locale normalization runs, with conflicts flagged.
- Mitigation: Ambiguous formats are flagged rather than guessed.
- Recovery: Canonical values are requested.
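The duplicate check described above could look roughly like the following Python sketch. It assumes extractions use the output format from the system prompt, and that a duplicate means the same vendor and invoice number after trimming whitespace and ignoring case; that matching rule and the function names are assumptions, not the blueprint's documented behavior.

```python
from typing import Optional


def dedup_key(invoice: dict) -> Optional[tuple[str, str]]:
    """Build a (vendor, invoice_number) key; None if either field is null."""
    vendor = invoice["vendor"]["value"]
    number = invoice["invoice_number"]["value"]
    if vendor is None or number is None:
        return None  # cannot dedup reliably; such documents go to review anyway
    return (vendor.strip().casefold(), number.strip().casefold())


def flag_duplicates(invoices: list[dict]) -> list[int]:
    """Return indexes of invoices whose key was already seen earlier in the batch."""
    seen: set[tuple[str, str]] = set()
    duplicates = []
    for index, invoice in enumerate(invoices):
        key = dedup_key(invoice)
        if key is None:
            continue
        if key in seen:
            duplicates.append(index)  # flagged, not dropped
        else:
            seen.add(key)
    return duplicates
```

Invoices with a null vendor or number are skipped rather than matched, so an illegible identifier never causes a false duplicate.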
Evaluation
Field-level accuracy on financial fields is primary — a wrong total or tax figure flows straight into the books.
| Field accuracy | Share of extracted fields matching ground truth, reported per field type. |
|---|---|
| Financial-field accuracy | Accuracy on totals, tax, and amounts specifically — weighted higher. |
| Arithmetic consistency | Share where line items reconcile to the stated total. |
| Duplicate-detection recall | Of duplicate invoices, the share flagged. |
| Latency | Time to extract per invoice. |
Recommended approach. Use a labeled set of invoices — clean digital plus messy scans — with known field values; report accuracy per field and separately on financial fields. Include known duplicates to test dedup recall.
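A minimal Python sketch of these metrics follows. It assumes predictions and labels are flat dictionaries mapping field name to value, that a match is exact equality, and that the financial fields are `subtotal`, `tax`, and `total`; all function names are illustrative.

```python
from collections import defaultdict

FINANCIAL_FIELDS = {"subtotal", "tax", "total"}  # assumption: which fields count as financial


def field_accuracy(predictions: list[dict], truths: list[dict]) -> dict[str, float]:
    """Per-field share of predictions that exactly match ground truth."""
    hits: dict[str, int] = defaultdict(int)
    counts: dict[str, int] = defaultdict(int)
    for predicted, truth in zip(predictions, truths):
        for field, expected in truth.items():
            counts[field] += 1
            hits[field] += int(predicted.get(field) == expected)
    return {field: hits[field] / counts[field] for field in counts}


def financial_accuracy(per_field: dict[str, float]) -> float:
    """Unweighted mean over the financial fields; assumes at least one was scored."""
    scored = [acc for field, acc in per_field.items() if field in FINANCIAL_FIELDS]
    return sum(scored) / len(scored)


def duplicate_recall(flagged: set[str], true_duplicates: set[str]) -> float:
    """Of the known duplicate invoice IDs (non-empty), the share that were flagged."""
    return len(flagged & true_duplicates) / len(true_duplicates)
```

Reporting the financial figure separately keeps a strong vendor-name score from hiding a weak score on totals.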
When to use
Use it when
- You process many invoices and want them turned into structured data automatically.
- You need per-field confidence and validation, not just raw OCR text.
- You want low-confidence or inconsistent documents routed to a human rather than passed through.
- You're feeding an AP, ERP, or bookkeeping system that needs clean, validated fields.
Avoid it when
- You want a value filled in for every field regardless of legibility — it won't guess.
- You expect it to silently reconcile totals that don't add up.
- You can't route low-confidence extractions to human review.
- You can't handle invoice data with appropriate confidentiality.
System prompt
You are an Invoice Data Extraction Agent. You convert ONE invoice document into validated, structured JSON with per-field confidence. You are judged on accurate extraction and on never fabricating a value or hiding an inconsistency.
== CORE PRINCIPLES ==
1. Extract, don't invent. Pull only what is actually on the document. If a field is missing, illegible, or ambiguous, return null with low confidence and a flag — never a guessed value.
2. Confidence on everything. Attach a confidence (0.0-1.0) to each field based on legibility and clarity. Low-confidence fields are flagged for human review.
3. Validate, don't 'fix'. Check that line items, tax, and totals reconcile. If they don't, do NOT silently adjust numbers — extract what's printed, flag the discrepancy, and route for review.
== HARD RULES (NON-NEGOTIABLE) ==
- NO FABRICATION: Never produce a value for a field you can't read or that isn't present. Missing/illegible = null + low confidence + flag.
- NO SILENT CORRECTION: If the math doesn't add up, report both the printed values and the discrepancy. Never overwrite a number to make it reconcile.
- ROUTE LOW CONFIDENCE: If overall or key-field confidence is below threshold, mark the document for human review rather than passing it through as clean.
- PRESERVE PRINTED VALUES: Report amounts/dates exactly as printed (normalize format, not value). Note the currency.
- DATA: Treat invoice contents as confidential financial data; keep in scope.
== METHOD ==
- OCR/parse the document. Extract header fields (vendor, invoice #, dates, currency), line items (desc, qty, unit price, amount), and totals (subtotal, tax, total).
- Score confidence per field. Validate: line items sum to subtotal; subtotal + tax = total. Flag mismatches and low-confidence fields.
== OUTPUT FORMAT (return ONE JSON object) ==
{
"vendor": { "value": "<v|null>", "confidence": <0.0-1.0> },
"invoice_number": { "value": "<v|null>", "confidence": <0.0-1.0> },
"invoice_date": { "value": "<ISO|null>", "confidence": <0.0-1.0> },
"currency": "<code|null>",
"line_items": [ { "description": "<v>", "qty": <n|null>, "unit_price": <n|null>, "amount": <n|null>, "confidence": <0.0-1.0> } ],
"subtotal": { "value": <n|null>, "confidence": <0.0-1.0> },
"tax": { "value": <n|null>, "confidence": <0.0-1.0> },
"total": { "value": <n|null>, "confidence": <0.0-1.0> },
"validation": { "math_ok": <bool>, "issues": ["<e.g. line items sum 980 != subtotal 1080>"] },
"overall_confidence": <0.0-1.0>,
"needs_review": <bool>,
"review_reasons": ["<illegible fields / math mismatch / low confidence>"]
}
Set needs_review=true on any math mismatch, illegible key field, or low overall confidence. Never emit a fabricated value.
Setup guide
Install and connect intake/output
Install the agent and connect its document source and structured-output destination.
pipx install invoice-extract-agent
invoice-extract-agent connect --intake email,upload --out erp
invoice-extract-agent doctor
Configure confidence & review thresholds
Review routing and the no-guess posture are enforced here.
cp .env.example .env
ANTHROPIC_API_KEY=sk-ant-...
REVIEW_BELOW_CONFIDENCE=0.8
NEVER_GUESS_MISSING=true
ROUTE_MATH_MISMATCH_TO_REVIEW=true
Define the field schema
Specify the fields and validation rules you need.
# schema.yml
header: [vendor, invoice_number, invoice_date, currency]
line_items: [description, qty, unit_price, amount]
totals: [subtotal, tax, total]
validate: [line_items_sum_to_subtotal, subtotal_plus_tax_equals_total]
Backtest on labeled invoices
Replay invoices with known-correct values to measure field accuracy before going live.
invoice-extract-agent backtest --set ./labeled --explain
# reports per-field accuracy + fabricated-value count (must be 0)
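How the CLI counts a fabricated value is not documented here. One plausible reading, sketched below in Python, is a field whose label is null (missing or illegible on the document) but for which the extraction emitted a value anyway. The flat field-to-value dictionaries and the function name are assumptions.

```python
def fabricated_value_count(predictions: list[dict], truths: list[dict]) -> int:
    """Count fields where the label is null but the extraction emitted a value."""
    count = 0
    for predicted, truth in zip(predictions, truths):
        for field, expected in truth.items():
            if expected is None and predicted.get(field) is not None:
                count += 1
    return count
```

Under this definition a wrong read of a legible field is an accuracy error, not a fabrication, so the two metrics stay separate.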
Wire into AP intake
Route incoming invoices through extraction; low-confidence ones go to a review queue.
# invoice intake -> extract -> validated JSON to ERP; needs_review=true -> review queue
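The routing step can be sketched in a few lines of Python. The destination names `erp` and `review_queue` are placeholders for whatever your integration uses, and treating a missing `needs_review` flag as true is a conservative assumption rather than documented behavior.

```python
def route_batch(extractions: list[dict]) -> dict[str, list[dict]]:
    """Split extraction results between the ERP feed and the human review queue."""
    routed: dict[str, list[dict]] = {"erp": [], "review_queue": []}
    for extraction in extractions:
        # Fail safe: a result without an explicit needs_review=False goes to review.
        needs_review = extraction.get("needs_review", True)
        routed["review_queue" if needs_review else "erp"].append(extraction)
    return routed
```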
Architecture
Tools required
Workflow
1. Intake the document
Load the invoice under confidential handling and prepare it for extraction.
2. OCR / parse
Extract text and layout from the digital or scanned document.
3. Extract fields
Capture header fields, line items, and totals as printed, leaving missing/illegible ones null.
4. Score confidence
Assign a confidence to each field so weak reads are visible.
5. Validate the math
Check line items, tax, and totals reconcile; record any discrepancy without altering values.
6. Gate for review
Route low-confidence, illegible-key-field, or mismatched documents to human review.
7. Emit structured JSON
Output validated JSON with per-field confidence and review flags for downstream systems.
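Steps 5 and 6 can be sketched in Python as follows, using the output format from the system prompt. The 0.8 threshold mirrors the setup guide's example value; the list of key fields and the function names are assumptions. Amounts are compared as `Decimal` to avoid floating-point noise, and no extracted value is modified.

```python
from decimal import Decimal

REVIEW_BELOW_CONFIDENCE = 0.8  # mirrors the setup guide's example threshold
# Assumption: which fields count as "key" for review gating.
KEY_FIELDS = ("vendor", "invoice_number", "invoice_date", "subtotal", "tax", "total")


def validate_math(doc: dict) -> list[str]:
    """Compare printed amounts without altering them; return readable issues."""
    issues = []
    amounts = [item["amount"] for item in doc["line_items"]]
    subtotal, tax, total = (doc[name]["value"] for name in ("subtotal", "tax", "total"))
    if None not in amounts and subtotal is not None:
        line_sum = sum(Decimal(str(amount)) for amount in amounts)
        if line_sum != Decimal(str(subtotal)):
            issues.append(f"line items sum {line_sum} != subtotal {subtotal}")
    if None not in (subtotal, tax, total):
        if Decimal(str(subtotal)) + Decimal(str(tax)) != Decimal(str(total)):
            issues.append(f"subtotal {subtotal} + tax {tax} != total {total}")
    return issues


def gate(doc: dict) -> dict:
    """Fill validation and review flags in place; extracted values stay untouched."""
    issues = validate_math(doc)
    reasons = list(issues)
    for name in KEY_FIELDS:
        field = doc[name]
        if field["value"] is None or field["confidence"] < REVIEW_BELOW_CONFIDENCE:
            reasons.append(f"{name} missing or low confidence ({field['confidence']})")
    doc["validation"] = {"math_ok": not issues, "issues": issues}
    doc["needs_review"] = bool(reasons)
    doc["review_reasons"] = reasons
    return doc
```

A check that cannot run because an amount is null is skipped rather than reported as a mismatch; the null field itself already triggers review through the key-field rule.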
Examples
Clean digital invoice → full validated JSON
A crisp PDF invoice. The agent extracts all fields at high confidence and the math reconciles.
Input
Digital PDF: Vendor 'Acme Supply', Invoice INV-204, 2026-05-10, 2 lines: 10 @ $40 = $400, 5 @ $20 = $100; subtotal $500, tax $40, total $540.
Output
{
"vendor": { "value": "Acme Supply", "confidence": 0.99 },
"invoice_number": { "value": "INV-204", "confidence": 0.99 },
"invoice_date": { "value": "2026-05-10", "confidence": 0.98 },
"currency": "USD",
"line_items": [
{ "description": "Widget A", "qty": 10, "unit_price": 40, "amount": 400, "confidence": 0.97 },
{ "description": "Widget B", "qty": 5, "unit_price": 20, "amount": 100, "confidence": 0.97 }
],
"subtotal": { "value": 500, "confidence": 0.98 },
"tax": { "value": 40, "confidence": 0.97 },
"total": { "value": 540, "confidence": 0.99 },
"validation": { "math_ok": true, "issues": [] },
"overall_confidence": 0.98,
"needs_review": false,
"review_reasons": []
}
Note: The clean case: a digital invoice yields high per-field confidence, and the validation confirms that line items sum to subtotal and subtotal + tax = total. It passes through without review, which is the volume that should be automatic.
Low-quality scan, illegible field → null + flag, don't guess
A faxed scan where the invoice number is unreadable. The agent extracts what it can and refuses to guess the rest.
Input
Scanned/faxed invoice: vendor readable, line items and totals readable, but the invoice number is smudged/illegible.
Output
{
"vendor": { "value": "Globex Industrial", "confidence": 0.9 },
"invoice_number": { "value": null, "confidence": 0.2 },
"invoice_date": { "value": "2026-04-22", "confidence": 0.85 },
"currency": "USD",
"line_items": [
{ "description": "Maintenance service", "qty": 1, "unit_price": 1200, "amount": 1200, "confidence": 0.88 }
],
"subtotal": { "value": 1200, "confidence": 0.9 },
"tax": { "value": 96, "confidence": 0.85 },
"total": { "value": 1296, "confidence": 0.9 },
"validation": { "math_ok": true, "issues": [] },
"overall_confidence": 0.78,
"needs_review": true,
"review_reasons": ["Invoice number illegible (confidence 0.2) — left null rather than guessed"]
}
Note: The math is fine, but the invoice number is unreadable. Critically, the agent returns null with low confidence for that field and flags it for review; it does NOT invent a plausible-looking invoice number. A guessed identifier would corrupt downstream matching and payment.
Totals don't reconcile → flag the discrepancy, never silently fix
The line items don't sum to the printed total. The agent reports both and routes to review instead of 'correcting' the numbers.
Input
Invoice: line items sum to $980, but the printed total is $1,080 (subtotal $1,000, tax $80). The $20 line-item gap doesn't reconcile.
Output
{
"vendor": { "value": "Northstar Parts", "confidence": 0.95 },
"invoice_number": { "value": "NS-7741", "confidence": 0.96 },
"invoice_date": { "value": "2026-05-02", "confidence": 0.95 },
"currency": "USD",
"line_items": [
{ "description": "Part X", "qty": 20, "unit_price": 24, "amount": 480, "confidence": 0.94 },
{ "description": "Part Y", "qty": 10, "unit_price": 50, "amount": 500, "confidence": 0.94 }
],
"subtotal": { "value": 1000, "confidence": 0.9 },
"tax": { "value": 80, "confidence": 0.92 },
"total": { "value": 1080, "confidence": 0.95 },
"validation": { "math_ok": false, "issues": ["Line items sum to 980 but subtotal is printed as 1000 (gap 20)", "subtotal 1000 + tax 80 = 1080 matches printed total, but line-item sum is inconsistent"] },
"overall_confidence": 0.9,
"needs_review": true,
"review_reasons": ["Line items (980) do not reconcile to subtotal (1000) — discrepancy of 20; not auto-adjusted"]
}
Note: The defining defensive case: the line items don't sum to the subtotal. A naive extractor might quietly change a number to make it balance. This agent reports the printed values exactly, records the specific discrepancy, and routes to review, surfacing a real data problem (a missing line, an OCR error, or a vendor mistake) instead of masking it.
Implementation notes
- Never emit a fabricated value: a missing or illegible field must be null with low confidence and a flag, because a guessed invoice number or amount corrupts everything downstream.
- Report printed values exactly and never silently reconcile mismatched totals; a discrepancy is a signal (missing line, OCR error, vendor mistake), not something to paper over.
- Attach per-field confidence and route low-confidence or illegible-key-field documents to human review rather than passing them through as clean.
- Validate the arithmetic (line items → subtotal → total) and surface the exact issue, which catches both extraction errors and genuine invoice problems.
- Normalize format (dates, number formatting, currency) without altering the underlying value.
- Backtest against labeled invoices with field accuracy and a hard-zero fabricated-value metric before automating downstream posting.
- A cheaper OCR/parse pass handles clean digital invoices, so the strong model is reserved for messy scans and validation reasoning.
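To illustrate normalizing format without altering value, here is a hedged Python sketch. It assumes amounts use "." as the decimal mark and "," as a thousands separator, and it tries only three date layouts; both are simplifying assumptions, and the function names are illustrative. Anything unparseable or ambiguous comes back as `None` so it can be flagged rather than guessed.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Optional


def normalize_amount(printed: str) -> Optional[Decimal]:
    """Strip a leading currency symbol and thousands separators; keep the value."""
    cleaned = printed.strip().lstrip("$€£").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None  # unreadable: leave null rather than guess


def normalize_date(printed: str) -> Optional[str]:
    """Return an ISO date, or None when unparseable or day/month order is ambiguous."""
    candidates = set()
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y"):
        try:
            candidates.add(datetime.strptime(printed.strip(), fmt).date().isoformat())
        except ValueError:
            pass  # this layout does not fit; try the next one
    return candidates.pop() if len(candidates) == 1 else None
```

A date such as 05/02/2026 parses under both month-first and day-first layouts to different days, so the sketch returns `None`, which matches the flag-rather-than-guess posture.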
Variations
Basic
Extract to JSON
Extracts invoice fields and line items into structured JSON with per-field confidence. Single document, on demand.
Advanced
Validated extraction
Adds total/tax validation, no-guess null handling, confidence thresholds, and automatic routing of low-confidence or mismatched documents to review.
Enterprise
Document pipeline at scale
Adds batch processing, multi-format and multi-language support, ERP/AP integration, human-in-the-loop review queues, and accuracy monitoring over time.
Download the Agent Blueprint
This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).
Frequently asked questions
No — that's its core constraint. A missing, illegible, or ambiguous field is returned as null with low confidence and flagged for review, never filled with a plausible-looking guess that would corrupt downstream processing.
It reports the printed values exactly and records the specific discrepancy (for example, line items summing to less than the subtotal), then routes the document to review. It never silently changes a number to make the math balance.
Every field carries a confidence score based on legibility and clarity, and the document gets an overall confidence with explicit review reasons, so weak extractions are visible rather than hidden.
It extracts what's clearly legible at appropriate confidence and flags the rest for review, rather than forcing a full extraction from an unreadable image.
Yes for high-confidence, validated invoices. Low-confidence or mismatched documents are routed to a review queue first, so only clean, validated data flows through automatically.
Yes. It treats invoice contents as confidential financial data and keeps them in scope throughout extraction and output.