Does it guess values it can't read?

No — that's its core constraint. A missing, illegible, or ambiguous field is returned as null with low confidence and flagged for review, never filled with a plausible-looking guess that would corrupt downstream processing.

What happens when the totals don't add up?

It reports the printed values exactly and records the specific discrepancy (for example, line items summing to less than the subtotal), then routes the document to review. It never silently changes a number to make the math balance.

How do I know which fields to trust?

Every field carries a confidence score based on legibility and clarity, and the document gets an overall confidence with explicit review reasons, so weak extractions are visible rather than hidden.

What about low-quality scans?

It extracts what's clearly legible at appropriate confidence and flags the rest for review, rather than forcing a full extraction from an unreadable image.

Can it feed our ERP/AP system automatically?

Yes, for high-confidence, validated invoices. Low-confidence or mismatched documents are routed to a review queue first, so only clean, validated data flows through automatically.

Is invoice data handled securely?

Yes. It treats invoice contents as confidential financial data and keeps them in scope throughout extraction and output.

Invoice Data Extraction Agent

Overview

Extracts invoices into validated JSON: vendor, number, date, line items, tax, and totals.

Attaches a confidence to every field and validates that line items reconcile to the stated total.

Flags illegible fields and math mismatches for human review instead of guessing or silently correcting.

Defensive: never fabricates a missing value, never auto-fixes a discrepancy, and keeps financial data in scope.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level: A2 (Recommend)

DNA Pattern: Extraction (Extract → Verify)

Worst-Case Action: Extracts an incorrect field value, surfaced with a low-confidence flag for human review. It cannot post, pay, or write invoice data to any financial system; execution tools are absent from its registry.

Authority Boundary: Reads an invoice, extracts structured fields, validates them against a schema, and flags low-confidence values for review. It never posts to a ledger, pays, or writes to financial systems. A human verifies before any downstream use.

Verification Test: Attempt to call a ledger-write, payment, or post tool → confirm it is absent from the agent's registry.

Production Readiness: 6/6 dimensions passing. Tool isolation: payment/post tools absent. Human gates: a human verifies before use. Confidence escalation: low-confidence fields flagged. Cost ceiling: bounded per document. Audit trail: extracted fields and confidence logged. Escalation path: low-confidence extractions routed to review.

Last Reviewed: 2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json

{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "invoice-extraction-agent",
  "trust_level": "A2",
  "dna_pattern": "Extraction",
  "worst_case_action": "Extracts a wrong field for human review. Cannot post, pay, or write to financial systems.",
  "authority_boundary": "Extracts and validates invoice fields; payment/post tools absent.",
  "tags": [
    "document-processing",
    "invoice",
    "extraction",
    "read-only",
    "human-review"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "read_document",
      "extract_fields",
      "validate_schema",
      "flag_low_confidence"
    ],
    "execution_tools_absent": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "ledger_write",
      "payment",
      "post"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.2,
    "alert_threshold_usd": 0.14
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "low_confidence_field",
      "missing_required_field"
    ],
    "destination": "ap_review"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "fields",
      "confidence"
    ]
  }
}
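
The verification test in the specification can be approximated offline against the contract itself. The sketch below is a hypothetical check, not part of any AgentAz tooling: it embeds only the relevant excerpt of the contract and confirms that none of the names under `never_emits` appear in `allowed_tools`. Treating the `never_emits` names as stand-ins for execution tool names is an assumption, as is the function name `forbidden_tools_present`.

```python
import json

# Relevant excerpt of the agentaz.json contract shown above.
CONTRACT = json.loads("""
{
  "tool_boundary": {
    "allowed_tools": ["read_document", "extract_fields", "validate_schema", "flag_low_confidence"],
    "execution_tools_absent": true
  },
  "output_boundary": {"never_emits": ["ledger_write", "payment", "post"]}
}
""")


def forbidden_tools_present(contract: dict) -> list[str]:
    """Return any never-emitted action names that also appear in allowed_tools."""
    allowed = set(contract["tool_boundary"]["allowed_tools"])
    forbidden = contract["output_boundary"]["never_emits"]
    return [name for name in forbidden if name in allowed]


violations = forbidden_tools_present(CONTRACT)
print("violations:", violations)
```

This only inspects the static contract. Because the specification states that it does not enforce anything at runtime, the same assertion would still need to be repeated against the tool registry your runtime actually loads.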

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goal	Bounded by the authority spec above
Trust Level	A2 — Recommend
Tool access	Least privilege — execution tools absent (read-only)
Context handling	Grounded in provided inputs; cites or flags rather than guessing
Memory strategy	Task-scoped; no persistent cross-session memory
Human approval	Required on low_confidence_field or missing_required_field; routed to ap_review
Audit trail	Append-only log (fields, confidence)
Cost & loop bounds	≤ $0.20 per loop · ≤ 8 reasoning turns
Recovery / escalation	Escalates to ap_review
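
The cost and loop bounds are easiest to read as a guard evaluated after each reasoning turn. This is a minimal sketch, assuming the runtime can report the cost of each turn; the class name `LoopBudget` and its return strings are hypothetical, and only the three numeric limits come from the contract.

```python
from dataclasses import dataclass


@dataclass
class LoopBudget:
    # Limits taken from cost_boundary and loop_boundary in agentaz.json.
    max_usd: float = 0.20
    alert_usd: float = 0.14
    max_turns: int = 8
    spent_usd: float = 0.0
    turns: int = 0

    def record_turn(self, cost_usd: float) -> str:
        """Record one reasoning turn and return 'ok', 'alert', or 'stop'."""
        self.turns += 1
        self.spent_usd += cost_usd
        if self.spent_usd > self.max_usd or self.turns >= self.max_turns:
            return "stop"  # ceiling reached: no further turns should run
        if self.spent_usd >= self.alert_usd:
            return "alert"  # still allowed to continue, but worth surfacing
        return "ok"


budget = LoopBudget()
statuses = [budget.record_turn(0.05) for _ in range(3)]
print(statuses)
```

Whether a "stop" should discard the partial extraction or send it to review is a design choice the contract does not settle; routing it to review is the more conservative reading.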

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

Agent	Primary reasoner — Recommend authority (A2)
Tools	read_document, extract_fields, validate_schema, flag_low_confidence; execution tools absent (read-only)
Memory	Task-scoped working context; no persistent cross-session memory
Guardrails	Worst-case classified (A2); no execution tools; ≤ $0.20/loop · ≤ 8 turns
Evaluator	Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
Handoff	Escalates to ap_review on low_confidence_field or missing_required_field

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Misreads a total or tax field, producing a wrong figure.

Detection: Low-confidence fields are flagged and arithmetic checks (line items vs total) run.
Mitigation: Extraction only — no posting or payment; a human verifies.
Recovery: The reviewer corrects the field before any downstream use.

A duplicate invoice is extracted and processed twice.

Detection: An invoice-number and vendor dedup check runs.
Mitigation: Duplicates are flagged rather than silently passed.
Recovery: The duplicate is held for review.

A wrong currency or date format is parsed.

Detection: Currency and locale normalization runs with conflicts flagged.
Mitigation: Ambiguous formats are flagged rather than guessed.
Recovery: The reviewer is asked to supply the canonical currency or date.
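
Flagging an ambiguous date rather than guessing can be sketched as follows. This is an illustration only, assuming well-formed slash-separated numeric dates; `parse_printed_date` is a hypothetical helper, not one of the blueprint's tools. A date such as 03/04/2026 is valid both day-first and month-first, so it yields null plus a flag, while 22/04/2026 has only one valid reading.

```python
from datetime import date


def parse_printed_date(text: str) -> dict:
    """Parse 'A/B/YYYY'. Return an ISO date only when exactly one reading is valid."""
    first, second, year = (int(part) for part in text.strip().split("/"))
    candidates = set()
    for day, month in ((first, second), (second, first)):
        try:
            candidates.add(date(year, month, day).isoformat())
        except ValueError:
            pass  # not a real calendar date under this reading
    if len(candidates) == 1:
        return {"value": candidates.pop(), "flag": None}
    reason = "ambiguous day/month order" if candidates else "not a valid date"
    return {"value": None, "flag": reason}


print(parse_printed_date("03/04/2026"))
print(parse_printed_date("22/04/2026"))
```

A known vendor locale could resolve the ambiguous case, but that is outside this sketch.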

Evaluation

Field-level accuracy on financial fields is primary — a wrong total or tax figure flows straight into the books.

Field accuracy	Share of extracted fields matching ground truth, reported per field type.
Financial-field accuracy	Accuracy on totals, tax, and amounts specifically — weighted higher.
Arithmetic consistency	Share of invoices where line items reconcile to the stated total.
Duplicate-detection recall	Of duplicate invoices, the share flagged.
Latency	Time to extract per invoice.

Recommended approach. Use a labeled set of invoices — clean digital plus messy scans — with known field values; report accuracy per field and separately on financial fields. Include known duplicates to test dedup recall.
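
The metrics above reduce to simple counts over a labeled set. The sketch below is a minimal illustration, assuming predictions and ground truth are flat dicts keyed by field name and that duplicates are labeled by invoice id; both function names and the data layout are assumptions, not the blueprint's backtest format.

```python
from collections import defaultdict


def per_field_accuracy(predictions: list[dict], truths: list[dict]) -> dict[str, float]:
    """Share of documents where each field's extracted value equals ground truth."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for predicted, truth in zip(predictions, truths):
        for field, expected in truth.items():
            totals[field] += 1
            hits[field] += predicted.get(field) == expected  # True counts as 1
    return {field: hits[field] / totals[field] for field in totals}


def duplicate_recall(flagged_ids: set[str], true_duplicate_ids: set[str]) -> float:
    """Of the invoices labeled as duplicates, the share the pipeline flagged."""
    if not true_duplicate_ids:
        return 1.0  # nothing to find; treated as perfect recall by convention
    return len(flagged_ids & true_duplicate_ids) / len(true_duplicate_ids)
```

Reporting financial fields separately is then a matter of filtering the returned dict to `subtotal`, `tax`, and `total` before averaging.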

When to use

Use it when

You process many invoices and want them turned into structured data automatically.
You need per-field confidence and validation, not just raw OCR text.
You want low-confidence or inconsistent documents routed to a human rather than passed through.
You're feeding an AP, ERP, or bookkeeping system that needs clean, validated fields.

Avoid it when

You want a value filled in for every field regardless of legibility — it won't guess.
You expect it to silently reconcile totals that don't add up.
You can't route low-confidence extractions to human review.
You can't handle invoice data with appropriate confidentiality.

System prompt

system-prompt.md

You are an Invoice Data Extraction Agent. You convert ONE invoice document into validated, structured JSON with per-field confidence. You are judged on accurate extraction and on never fabricating a value or hiding an inconsistency.

== CORE PRINCIPLES ==
1. Extract, don't invent. Pull only what is actually on the document. If a field is missing, illegible, or ambiguous, return null with low confidence and a flag — never a guessed value.
2. Confidence on everything. Attach a confidence (0.0-1.0) to each field based on legibility and clarity. Low-confidence fields are flagged for human review.
3. Validate, don't 'fix'. Check that line items, tax, and totals reconcile. If they don't, do NOT silently adjust numbers — extract what's printed, flag the discrepancy, and route for review.

== HARD RULES (NON-NEGOTIABLE) ==
- NO FABRICATION: Never produce a value for a field you can't read or that isn't present. Missing/illegible = null + low confidence + flag.
- NO SILENT CORRECTION: If the math doesn't add up, report both the printed values and the discrepancy. Never overwrite a number to make it reconcile.
- ROUTE LOW CONFIDENCE: If overall or key-field confidence is below threshold, mark the document for human review rather than passing it through as clean.
- PRESERVE PRINTED VALUES: Report amounts/dates exactly as printed (normalize format, not value). Note the currency.
- DATA: Treat invoice contents as confidential financial data; keep in scope.

== METHOD ==
- OCR/parse the document. Extract header fields (vendor, invoice #, dates, currency), line items (desc, qty, unit price, amount), and totals (subtotal, tax, total).
- Score confidence per field. Validate: line items sum to subtotal; subtotal + tax = total. Flag mismatches and low-confidence fields.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "vendor": { "value": "<v|null>", "confidence": <0.0-1.0> },
  "invoice_number": { "value": "<v|null>", "confidence": <0.0-1.0> },
  "invoice_date": { "value": "<ISO|null>", "confidence": <0.0-1.0> },
  "currency": "<code|null>",
  "line_items": [ { "description": "<v>", "qty": <n|null>, "unit_price": <n|null>, "amount": <n|null>, "confidence": <0.0-1.0> } ],
  "subtotal": { "value": <n|null>, "confidence": <0.0-1.0> },
  "tax": { "value": <n|null>, "confidence": <0.0-1.0> },
  "total": { "value": <n|null>, "confidence": <0.0-1.0> },
  "validation": { "math_ok": <bool>, "issues": ["<e.g. line items sum 980 != subtotal 1080>"] },
  "overall_confidence": <0.0-1.0>,
  "needs_review": <bool>,
  "review_reasons": ["<illegible fields / math mismatch / low confidence>"]
}
Set needs_review=true on any math mismatch, illegible key field, or low overall confidence. Never emit a fabricated value.
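
A downstream consumer may want to confirm that a returned object honors the hard rules before trusting it. The following is a minimal sketch of such a check, assuming the output shape shown above; `check_invariants`, the `KEY_FIELDS` tuple, and the 0.8 default threshold are assumptions for illustration, since the prompt does not define which fields count as key fields or what the threshold is.

```python
KEY_FIELDS = ("vendor", "invoice_number", "invoice_date", "subtotal", "tax", "total")


def check_invariants(doc: dict, threshold: float = 0.8) -> list[str]:
    """Return rule violations; an empty list means the object is self-consistent."""
    problems = []
    must_review = not doc["validation"]["math_ok"] or doc["overall_confidence"] < threshold
    for name in KEY_FIELDS:
        field = doc[name]
        if not 0.0 <= field["confidence"] <= 1.0:
            problems.append(f"{name}: confidence out of range")
        if field["value"] is None:
            must_review = True  # a null key field always requires review
            if field["confidence"] >= threshold:
                problems.append(f"{name}: null value reported with high confidence")
    if must_review and not doc["needs_review"]:
        problems.append("needs_review should be true")
    if doc["needs_review"] and not doc["review_reasons"]:
        problems.append("needs_review is true but review_reasons is empty")
    return problems
```

The check cannot detect a fabricated value on its own; it only catches objects whose flags contradict their contents.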

Setup guide

Install and connect intake/output

Install the agent and connect its document source and structured-output destination.

shell

pipx install invoice-extract-agent
invoice-extract-agent connect --intake email,upload --out erp
invoice-extract-agent doctor

Configure confidence & review thresholds

Review routing and the no-guess posture are enforced here.

shell

cp .env.example .env
# then set these values in .env:
ANTHROPIC_API_KEY=sk-ant-...
REVIEW_BELOW_CONFIDENCE=0.8
NEVER_GUESS_MISSING=true
ROUTE_MATH_MISMATCH_TO_REVIEW=true
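
How these settings might translate into a routing decision is sketched below. This is an assumption about behavior, not the package's actual code: the variable names come from the `.env` above, while `load_settings` and `needs_review` are hypothetical helpers.

```python
import os


def load_settings(env=os.environ) -> dict:
    """Read the review settings, falling back to the values shown above."""
    return {
        "review_below": float(env.get("REVIEW_BELOW_CONFIDENCE", "0.8")),
        "route_mismatch": env.get("ROUTE_MATH_MISMATCH_TO_REVIEW", "true").lower() == "true",
    }


def needs_review(overall_confidence: float, math_ok: bool, settings: dict) -> bool:
    """True when the document should go to the review queue."""
    if overall_confidence < settings["review_below"]:
        return True
    return settings["route_mismatch"] and not math_ok


settings = load_settings({"REVIEW_BELOW_CONFIDENCE": "0.8"})
print(needs_review(0.78, True, settings))
```

With a threshold of 0.8, a document at 0.78 overall confidence is routed to review even when its math reconciles.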

Define the field schema

Specify the fields and validation rules you need.

yaml

# schema.yml
header: [vendor, invoice_number, invoice_date, currency]
line_items: [description, qty, unit_price, amount]
totals: [subtotal, tax, total]
validate: [line_items_sum_to_subtotal, subtotal_plus_tax_equals_total]
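
The two validation rules amount to exact decimal comparisons. This is a minimal sketch, assuming amounts arrive as strings or integers, that none of them is null, and that exact equality (no tolerance) is wanted; the `validate_totals` function here is an illustrative stand-in and may differ from the blueprint's tool of the same name.

```python
from decimal import Decimal


def validate_totals(line_amounts, subtotal, tax, total) -> dict:
    """Check both rules and describe any mismatch without changing a value."""
    items_sum = sum((Decimal(str(a)) for a in line_amounts), Decimal("0"))
    subtotal, tax, total = (Decimal(str(v)) for v in (subtotal, tax, total))
    issues = []
    if items_sum != subtotal:
        issues.append(f"line items sum {items_sum} != subtotal {subtotal}")
    if subtotal + tax != total:
        issues.append(f"subtotal {subtotal} + tax {tax} != total {total}")
    return {"math_ok": not issues, "issues": issues}


result = validate_totals([480, 500], 1000, 80, 1080)
print(result)
# {'math_ok': False, 'issues': ['line items sum 980 != subtotal 1000']}
```

`Decimal` is used instead of floats so that amounts such as 0.10 and 0.20 compare exactly; a float comparison could report a spurious mismatch.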

Backtest on labeled invoices

Replay invoices with known-correct values to measure field accuracy before going live.

shell

invoice-extract-agent backtest --set ./labeled --explain
# reports per-field accuracy + fabricated-value count (must be 0)
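
The fabricated-value count can be read as the number of fields where the label says nothing was legible (null) but the extractor returned a value anyway. The sketch below uses that assumed definition; the function name and the flat-dict layout are hypothetical and not the CLI's internal format.

```python
def fabricated_value_count(predictions: list[dict], truths: list[dict]) -> int:
    """Count fields labeled null (absent or illegible) that were extracted as non-null."""
    count = 0
    for predicted, truth in zip(predictions, truths):
        for field, expected in truth.items():
            if expected is None and predicted.get(field) is not None:
                count += 1
    return count


truths = [{"invoice_number": None, "total": 1296}]
print(fabricated_value_count([{"invoice_number": None, "total": 1296}], truths))
```

A wrong value for a field that was legible is an accuracy error, not a fabrication, so it is deliberately not counted here.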

Wire into AP intake

Route incoming invoices through extraction; low-confidence ones go to a review queue.

shell

# invoice intake -> extract -> validated JSON to ERP; needs_review=true -> review queue
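
The routing comment above could be realized roughly as follows. This is a sketch with in-memory lists standing in for the ERP and the review queue; `route` and both destinations are hypothetical, and a real deployment would call its own integrations.

```python
def route(extracted: dict, erp: list, review_queue: list) -> str:
    """Send clean documents to the ERP and everything else to human review."""
    if extracted.get("needs_review", True):  # a missing flag is treated as unsafe
        review_queue.append(extracted)
        return "review"
    erp.append(extracted)
    return "erp"


erp, review_queue = [], []
print(route({"needs_review": False}, erp, review_queue))
print(route({"needs_review": True}, erp, review_queue))
```

Defaulting to review when the flag is absent keeps a malformed extraction from reaching the ERP by accident.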

Architecture

Document intake: Receives the invoice (PDF, image, or scan) under confidential handling and prepares it for extraction.

OCR / parse layer: Extracts text and layout from digital or scanned documents, preserving positions that help locate fields.

Field extraction: Identifies header fields, line items, and totals, capturing the printed values without inventing absent ones.

Confidence scoring: Assigns a confidence to each field from legibility and clarity, so weak extractions are visible rather than hidden.

Validation engine: Checks that line items, tax, and totals reconcile and records any discrepancy instead of adjusting numbers.

Review gate: A deterministic gate routes documents with low confidence, illegible key fields, or math mismatches to human review.

Structured output: Emits validated JSON with per-field confidence and review flags for downstream AP/ERP systems.

Tools required

get_document: Fetch the invoice document (PDF/image/scan) for extraction under confidential handling.

ocr_extract: Run OCR/parsing to obtain text and layout from digital or scanned invoices.

parse_fields: Identify header fields, line items, and totals from the parsed content.

confidence_score: Assign a confidence to each extracted field based on legibility and clarity.

validate_totals: Check that line items sum to subtotal and subtotal + tax = total, recording discrepancies.

currency_normalize: Normalize date and amount formats and capture the currency without altering values.

emit_json: Produce the validated structured JSON with per-field confidence.

flag_for_review: Mark the document for human review on low confidence, illegible key fields, or math mismatch.
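
The distinction between normalizing format and altering value, as described for `currency_normalize`, can be illustrated with amounts. This is a minimal sketch, assuming US-style formatting (comma as thousands separator, period as decimal point) and a small illustrative set of symbols; `normalize_amount` is a hypothetical helper. It keeps the printed string alongside the parsed number, records the symbol rather than inferring a currency code, and returns null when the text does not parse.

```python
from decimal import Decimal, InvalidOperation

KNOWN_SYMBOLS = "$€£"  # illustrative, not exhaustive


def normalize_amount(printed: str) -> dict:
    """Parse a printed amount such as '$1,080.00' without rounding or adjusting it."""
    text = printed.strip()
    symbol = text[0] if text and text[0] in KNOWN_SYMBOLS else None
    if symbol:
        text = text[1:]
    try:
        value = Decimal(text.replace(",", "").strip())
    except InvalidOperation:
        value = None  # unreadable: leave null for review, never guess
    if value is not None and not value.is_finite():
        value = None  # reject 'NaN' or 'Infinity' rather than treating them as amounts
    return {"printed": printed, "value": value, "currency_symbol": symbol}


print(normalize_amount("$1,080.00"))
print(normalize_amount("1O80"))  # letter O instead of zero, so the value stays None
```

A dollar sign alone does not identify the currency (USD, CAD, and AUD all use it), which is why the sketch stops at the symbol.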

Workflow

1. Intake the document
Load the invoice under confidential handling and prepare it for extraction.
2. OCR / parse
Extract text and layout from the digital or scanned document.
3. Extract fields
Capture header fields, line items, and totals as printed, leaving missing/illegible ones null.
4. Score confidence
Assign a confidence to each field so weak reads are visible.
5. Validate the math
Check line items, tax, and totals reconcile; record any discrepancy without altering values.
6. Gate for review
Route low-confidence, illegible-key-field, or mismatched documents to human review.
7. Emit structured JSON
Output validated JSON with per-field confidence and review flags for downstream systems.
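
The seven steps compose into a straight pipeline with one branch at the gate. The sketch below wires them together with injected callables so the control flow is visible; every name and the 0.8 threshold are placeholders, steps 1 to 4 are collapsed into a single `extract` callable, and the fields are assumed to be simple value/confidence objects.

```python
from typing import Callable


def run_pipeline(
    document: bytes,
    extract: Callable[[bytes], dict],
    validate: Callable[[dict], list],
    threshold: float = 0.8,
) -> dict:
    """Run extraction, validation, and the review gate, then emit one object."""
    fields = extract(document)  # steps 1-4: {"name": {"value", "confidence"}}
    issues = validate(fields)   # step 5: list of discrepancy strings
    reasons = list(issues)
    for name, field in fields.items():  # step 6: deterministic gate
        if field["value"] is None:
            reasons.append(f"{name} missing or illegible")
        elif field["confidence"] < threshold:
            reasons.append(f"{name} low confidence")
    return {  # step 7: printed values pass through untouched
        **fields,
        "validation": {"math_ok": not issues, "issues": issues},
        "needs_review": bool(reasons),
        "review_reasons": reasons,
    }
```

Keeping the gate as plain code rather than a model decision is what makes it deterministic: the same fields and threshold always produce the same routing.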

Examples

Clean digital invoice → full validated JSON

A crisp PDF invoice. The agent extracts all fields at high confidence and the math reconciles.

Input

Digital PDF: Vendor 'Acme Supply', Invoice INV-204, 2026-05-10, 2 lines: 10 @ $40 = $400, 5 @ $20 = $100; subtotal $500, tax $40, total $540.

Output

{
  "vendor": { "value": "Acme Supply", "confidence": 0.99 },
  "invoice_number": { "value": "INV-204", "confidence": 0.99 },
  "invoice_date": { "value": "2026-05-10", "confidence": 0.98 },
  "currency": "USD",
  "line_items": [
    { "description": "Widget A", "qty": 10, "unit_price": 40, "amount": 400, "confidence": 0.97 },
    { "description": "Widget B", "qty": 5, "unit_price": 20, "amount": 100, "confidence": 0.97 }
  ],
  "subtotal": { "value": 500, "confidence": 0.98 },
  "tax": { "value": 40, "confidence": 0.97 },
  "total": { "value": 540, "confidence": 0.99 },
  "validation": { "math_ok": true, "issues": [] },
  "overall_confidence": 0.98,
  "needs_review": false,
  "review_reasons": []
}

Note: This is the clean case. A digital invoice yields high per-field confidence, and validation confirms that line items sum to the subtotal and subtotal + tax = total. It passes through without review, as the bulk of invoice volume should.

Low-quality scan, illegible field → null + flag, don't guess

A faxed scan where the invoice number is unreadable. The agent extracts what it can and refuses to guess the rest.

Input

Scanned/faxed invoice: vendor readable, line items and totals readable, but the invoice number is smudged/illegible.

Output

{
  "vendor": { "value": "Globex Industrial", "confidence": 0.9 },
  "invoice_number": { "value": null, "confidence": 0.2 },
  "invoice_date": { "value": "2026-04-22", "confidence": 0.85 },
  "currency": "USD",
  "line_items": [
    { "description": "Maintenance service", "qty": 1, "unit_price": 1200, "amount": 1200, "confidence": 0.88 }
  ],
  "subtotal": { "value": 1200, "confidence": 0.9 },
  "tax": { "value": 96, "confidence": 0.85 },
  "total": { "value": 1296, "confidence": 0.9 },
  "validation": { "math_ok": true, "issues": [] },
  "overall_confidence": 0.78,
  "needs_review": true,
  "review_reasons": ["Invoice number illegible (confidence 0.2) — left null rather than guessed"]
}

Note: The math is fine, but the invoice number is unreadable. Critically, the agent returns null with low confidence for that field and flags it for review — it does NOT invent a plausible-looking invoice number. A guessed identifier would corrupt downstream matching and payment.

Totals don't reconcile → flag the discrepancy, never silently fix

The line items don't sum to the printed subtotal. The agent reports both and routes to review instead of 'correcting' the numbers.

Input

Invoice: line items sum to $980, but the printed subtotal is $1,000 (tax $80, printed total $1,080). The $20 gap between the line items and the subtotal doesn't reconcile.

Output

{
  "vendor": { "value": "Northstar Parts", "confidence": 0.95 },
  "invoice_number": { "value": "NS-7741", "confidence": 0.96 },
  "invoice_date": { "value": "2026-05-02", "confidence": 0.95 },
  "currency": "USD",
  "line_items": [
    { "description": "Part X", "qty": 20, "unit_price": 24, "amount": 480, "confidence": 0.94 },
    { "description": "Part Y", "qty": 10, "unit_price": 50, "amount": 500, "confidence": 0.94 }
  ],
  "subtotal": { "value": 1000, "confidence": 0.9 },
  "tax": { "value": 80, "confidence": 0.92 },
  "total": { "value": 1080, "confidence": 0.95 },
  "validation": { "math_ok": false, "issues": ["Line items sum to 980 but subtotal is printed as 1000 (gap 20)", "subtotal 1000 + tax 80 = 1080 matches printed total, but line-item sum is inconsistent"] },
  "overall_confidence": 0.9,
  "needs_review": true,
  "review_reasons": ["Line items (980) do not reconcile to subtotal (1000) — discrepancy of 20; not auto-adjusted"]
}

Note: The defining defensive case: the line items don't sum to the subtotal. A naive extractor might quietly change a number to make it balance. This agent reports the printed values exactly, records the specific discrepancy, and routes to review — surfacing a real data problem (a missing line, an OCR error, or a vendor mistake) instead of masking it.

Implementation notes

Never emit a fabricated value: a missing or illegible field must be null with low confidence and a flag, because a guessed invoice number or amount corrupts everything downstream.
Report printed values exactly and never silently reconcile mismatched totals; a discrepancy is a signal (missing line, OCR error, vendor mistake), not something to paper over.
Attach per-field confidence and route low-confidence or illegible-key-field documents to human review rather than passing them through as clean.
Validate the arithmetic (line items → subtotal → total) and surface the exact issue, which catches both extraction errors and genuine invoice problems.
Normalize format (dates, number formatting, currency) without altering the underlying value.
Backtest against labeled invoices with field accuracy and a hard-zero fabricated-value metric before automating downstream posting.
A cheaper OCR/parse pass handles clean digital invoices, so the strong model is reserved for messy scans and validation reasoning.

Variations

Basic

Extract to JSON

Extracts invoice fields and line items into structured JSON with per-field confidence. Single document, on demand.

Advanced

Validated extraction

Adds total/tax validation, no-guess null handling, confidence thresholds, and automatic routing of low-confidence or mismatched documents to review.

Enterprise

Document pipeline at scale

Adds batch processing, multi-format and multi-language support, ERP/AP integration, human-in-the-loop review queues, and accuracy monitoring over time.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

README.md, system-prompt.md, setup-guide.md, tools.json, workflow.md, examples.md, .env.example, kit.json, run.py, LICENSE, NOTICE, starters/

Export

Generate a starter for your stack — all client-side, nothing leaves your browser.

Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).

Related kits

Form-to-JSON Extraction Agent

Purchase Order Matching Agent

Expense Audit & Compliance Agent

Access Request & Provisioning Agent