AgentKits

Invoice Data Extraction Agent

Production Blueprint
0New

Includes Agent Blueprint + Implementation Guide

An agent that turns invoices — clean PDFs or messy scans — into validated, structured JSON your systems can trust: vendor, number, date, line items, tax, and totals. It attaches a confidence to every field, checks that the math reconciles, and flags anything illegible or inconsistent for a human instead of guessing. It is defensive by design: it never fabricates a value for a missing or unreadable field, never silently 'fixes' totals that don't add up, attaches per-field confidence, routes low-confidence extractions to review, and keeps financial data in scope.

document-processinginvoiceocrdata-extractionaccounts-payableautonomous-agentstructured-dataidpagentazagent-governancetrust-levelproduction-readiness
StackClaude, LangGraph, OpenAI
DifficultyIntermediate
Setup40 min
Version2.0.0 · 2026-06-21

Overview

Extracts invoices into validated JSON: vendor, number, date, line items, tax, and totals.

Attaches a confidence to every field and validates that line items reconcile to the stated total.

Flags illegible fields and math mismatches for human review instead of guessing or silently correcting.

Defensive: never fabricates a missing value, never auto-fixes a discrepancy, and keeps financial data in scope.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level ?A2 — Recommend
DNA PatternExtraction (Extract → Verify)
Worst-Case ActionExtracts an incorrect field value, surfaced with a low-confidence flag for human review. It cannot post, pay, or write invoice data to any financial system — execution tools are absent from its registry.
Authority BoundaryReads an invoice, extracts structured fields, validates them against a schema, and flags low-confidence values for review. It never posts to a ledger, pays, or writes to financial systems. A human verifies before any downstream use.
Verification TestAttempt to call a ledger-write, payment, or post tool → confirm it is absent from the agent's registry.
Production Readiness6/6 dimensions passing. Tool isolation: payment/post tools absent. Human gates: a human verifies before use. Confidence escalation: low-confidence fields flagged. Cost ceiling: bounded per document. Audit trail: extracted fields and confidence logged. Escalation path: low-confidence extractions routed to review.
Last Reviewed2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json
{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "invoice-extraction-agent",
  "trust_level": "A2",
  "dna_pattern": "Extraction",
  "worst_case_action": "Extracts a wrong field for human review. Cannot post, pay, or write to financial systems.",
  "authority_boundary": "Extracts and validates invoice fields; payment/post tools absent.",
  "tags": [
    "document-processing",
    "invoice",
    "extraction",
    "read-only",
    "human-review"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "read_document",
      "extract_fields",
      "validate_schema",
      "flag_low_confidence"
    ],
    "execution_tools_absent": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "ledger_write",
      "payment",
      "post"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.2,
    "alert_threshold_usd": 0.14
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "low_confidence_field",
      "missing_required_field"
    ],
    "destination": "ap_review"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "fields",
      "confidence"
    ]
  }
}

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goalBounded by the authority spec above
Trust LevelA2 — Recommend
Tool accessLeast privilege — execution tools absent (read-only)
Context handlingGrounded in provided inputs; cites or flags rather than guessing
Memory strategyTask-scoped; no persistent cross-session memory
Human approvalRequired on low confidence field, missing required field → ap review
Audit trailAppend-only log (fields, confidence)
Cost & loop bounds≤ $0.2 per loop · ≤ 8 reasoning turns
Recovery / escalationEscalates to ap review

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

AgentPrimary reasoner — Recommend authority (A2)
Toolsread document, extract fields, validate schema, flag low confidence — execution tools absent (read-only)
MemoryTask-scoped working context; no persistent cross-session memory
GuardrailsWorst-case classified (A2); no execution tools; ≤ $0.2/loop · ≤ 8 turns
EvaluatorConfidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
HandoffEscalates to ap review on low confidence field, missing required field

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Misreads a total or tax field, producing a wrong figure.

Detection
Low-confidence fields are flagged and arithmetic checks (line items vs total) run.
Mitigation
Extraction only — no posting or payment; a human verifies.
Recovery
The reviewer corrects the field before any downstream use.

A duplicate invoice is extracted and processed twice.

Detection
An invoice-number and vendor dedup check runs.
Mitigation
Duplicates are flagged rather than silently passed.
Recovery
The duplicate is held for review.

A wrong currency or date format is parsed.

Detection
Currency and locale normalization runs with conflicts flagged.
Mitigation
Ambiguous formats are flagged rather than guessed.
Recovery
Canonical values are requested.

Evaluation

Field-level accuracy on financial fields is primary — a wrong total or tax figure flows straight into the books.

Field accuracyShare of extracted fields matching ground truth, reported per field type.
Financial-field accuracyAccuracy on totals, tax, and amounts specifically — weighted higher.
Arithmetic consistencyShare where line items reconcile to the stated total.
Duplicate-detection recallOf duplicate invoices, the share flagged.
LatencyTime to extract per invoice.

Recommended approach. Use a labeled set of invoices — clean digital plus messy scans — with known field values; report accuracy per field and separately on financial fields. Include known duplicates to test dedup recall.

When to use

Use it when

  • You process many invoices and want them turned into structured data automatically.
  • You need per-field confidence and validation, not just raw OCR text.
  • You want low-confidence or inconsistent documents routed to a human rather than passed through.
  • You're feeding an AP, ERP, or bookkeeping system that needs clean, validated fields.

Avoid it when

  • You want a value filled in for every field regardless of legibility — it won't guess.
  • You expect it to silently reconcile totals that don't add up.
  • You can't route low-confidence extractions to human review.
  • You can't handle invoice data with appropriate confidentiality.

System prompt

system-prompt.md
You are an Invoice Data Extraction Agent. You convert ONE invoice document into validated, structured JSON with per-field confidence. You are judged on accurate extraction and on never fabricating a value or hiding an inconsistency.

== CORE PRINCIPLES ==
1. Extract, don't invent. Pull only what is actually on the document. If a field is missing, illegible, or ambiguous, return null with low confidence and a flag — never a guessed value.
2. Confidence on everything. Attach a confidence (0.0-1.0) to each field based on legibility and clarity. Low-confidence fields are flagged for human review.
3. Validate, don't 'fix'. Check that line items, tax, and totals reconcile. If they don't, do NOT silently adjust numbers — extract what's printed, flag the discrepancy, and route for review.

== HARD RULES (NON-NEGOTIABLE) ==
- NO FABRICATION: Never produce a value for a field you can't read or that isn't present. Missing/illegible = null + low confidence + flag.
- NO SILENT CORRECTION: If the math doesn't add up, report both the printed values and the discrepancy. Never overwrite a number to make it reconcile.
- ROUTE LOW CONFIDENCE: If overall or key-field confidence is below threshold, mark the document for human review rather than passing it through as clean.
- PRESERVE PRINTED VALUES: Report amounts/dates exactly as printed (normalize format, not value). Note the currency.
- DATA: Treat invoice contents as confidential financial data; keep in scope.

== METHOD ==
- OCR/parse the document. Extract header fields (vendor, invoice #, dates, currency), line items (desc, qty, unit price, amount), and totals (subtotal, tax, total).
- Score confidence per field. Validate: line items sum to subtotal; subtotal + tax = total. Flag mismatches and low-confidence fields.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "vendor": { "value": "<v|null>", "confidence": <0.0-1.0> },
  "invoice_number": { "value": "<v|null>", "confidence": <0.0-1.0> },
  "invoice_date": { "value": "<ISO|null>", "confidence": <0.0-1.0> },
  "currency": "<code|null>",
  "line_items": [ { "description": "<v>", "qty": <n|null>, "unit_price": <n|null>, "amount": <n|null>, "confidence": <0.0-1.0> } ],
  "subtotal": { "value": <n|null>, "confidence": <0.0-1.0> },
  "tax": { "value": <n|null>, "confidence": <0.0-1.0> },
  "total": { "value": <n|null>, "confidence": <0.0-1.0> },
  "validation": { "math_ok": <bool>, "issues": ["<e.g. line items sum 980 != subtotal 1080>"] },
  "overall_confidence": <0.0-1.0>,
  "needs_review": <bool>,
  "review_reasons": ["<illegible fields / math mismatch / low confidence>"]
}
Set needs_review=true on any math mismatch, illegible key field, or low overall confidence. Never emit a fabricated value.
Was this useful?

Simulate run

Try the agent with a sample task. This is a frontend-only preview that shows how the kit would plan and execute — no API calls, nothing leaves your browser.

Frontend preview only — no data leaves your browser. Tip: press ⌘/Ctrl + Enter to run.

Setup guide

Install and connect intake/output

Install the agent and connect its document source and structured-output destination.

shell
pipx install invoice-extract-agent
invoice-extract-agent connect --intake email,upload --out erp
invoice-extract-agent doctor

Configure confidence & review thresholds

Review routing and the no-guess posture are enforced here.

shell
cp .env.example .env
ANTHROPIC_API_KEY=sk-ant-...
REVIEW_BELOW_CONFIDENCE=0.8
NEVER_GUESS_MISSING=true
ROUTE_MATH_MISMATCH_TO_REVIEW=true

Define the field schema

Specify the fields and validation rules you need.

shell
# schema.yml
header: [vendor, invoice_number, invoice_date, currency]
line_items: [description, qty, unit_price, amount]
totals: [subtotal, tax, total]
validate: [line_items_sum_to_subtotal, subtotal_plus_tax_equals_total]

Backtest on labeled invoices

Replay invoices with known-correct values to measure field accuracy before going live.

shell
invoice-extract-agent backtest --set ./labeled --explain
# reports per-field accuracy + fabricated-value count (must be 0)

Wire into AP intake

Route incoming invoices through extraction; low-confidence ones go to a review queue.

shell
# invoice intake -> extract -> validated JSON to ERP; needs_review=true -> review queue

Architecture

Tools required

get_documentFetch the invoice document (PDF/image/scan) for extraction under confidential handling.
ocr_extractRun OCR/parsing to obtain text and layout from digital or scanned invoices.
parse_fieldsIdentify header fields, line items, and totals from the parsed content.
confidence_scoreAssign a confidence to each extracted field based on legibility and clarity.
validate_totalsCheck that line items sum to subtotal and subtotal + tax = total, recording discrepancies.
currency_normalizeNormalize date and amount formats and capture the currency without altering values.
emit_jsonProduce the validated structured JSON with per-field confidence.
flag_for_reviewMark the document for human review on low confidence, illegible key fields, or math mismatch.

Workflow

  1. 1. Intake the document

    Load the invoice under confidential handling and prepare it for extraction.

  2. 2. OCR / parse

    Extract text and layout from the digital or scanned document.

  3. 3. Extract fields

    Capture header fields, line items, and totals as printed, leaving missing/illegible ones null.

  4. 4. Score confidence

    Assign a confidence to each field so weak reads are visible.

  5. 5. Validate the math

    Check line items, tax, and totals reconcile; record any discrepancy without altering values.

  6. 6. Gate for review

    Route low-confidence, illegible-key-field, or mismatched documents to human review.

  7. 7. Emit structured JSON

    Output validated JSON with per-field confidence and review flags for downstream systems.

Examples

Clean digital invoice → full validated JSON

A crisp PDF invoice. The agent extracts all fields at high confidence and the math reconciles.

Input

input
Digital PDF: Vendor 'Acme Supply', Invoice INV-204, 2026-05-10, 2 lines: 10 @ $40 = $400, 5 @ $20 = $100; subtotal $500, tax $40, total $540.

Output

output
{
  "vendor": { "value": "Acme Supply", "confidence": 0.99 },
  "invoice_number": { "value": "INV-204", "confidence": 0.99 },
  "invoice_date": { "value": "2026-05-10", "confidence": 0.98 },
  "currency": "USD",
  "line_items": [
    { "description": "Widget A", "qty": 10, "unit_price": 40, "amount": 400, "confidence": 0.97 },
    { "description": "Widget B", "qty": 5, "unit_price": 20, "amount": 100, "confidence": 0.97 }
  ],
  "subtotal": { "value": 500, "confidence": 0.98 },
  "tax": { "value": 40, "confidence": 0.97 },
  "total": { "value": 540, "confidence": 0.99 },
  "validation": { "math_ok": true, "issues": [] },
  "overall_confidence": 0.98,
  "needs_review": false,
  "review_reasons": []
}

Note: The clean case: a digital invoice yields high per-field confidence and the validation confirms line items sum to subtotal and subtotal + tax = total. It passes through without review — the volume that should be automatic.

Low-quality scan, illegible field → null + flag, don't guess

A faxed scan where the invoice number is unreadable. The agent extracts what it can and refuses to guess the rest.

Input

input
Scanned/faxed invoice: vendor readable, line items and totals readable, but the invoice number is smudged/illegible.

Output

output
{
  "vendor": { "value": "Globex Industrial", "confidence": 0.9 },
  "invoice_number": { "value": null, "confidence": 0.2 },
  "invoice_date": { "value": "2026-04-22", "confidence": 0.85 },
  "currency": "USD",
  "line_items": [
    { "description": "Maintenance service", "qty": 1, "unit_price": 1200, "amount": 1200, "confidence": 0.88 }
  ],
  "subtotal": { "value": 1200, "confidence": 0.9 },
  "tax": { "value": 96, "confidence": 0.85 },
  "total": { "value": 1296, "confidence": 0.9 },
  "validation": { "math_ok": true, "issues": [] },
  "overall_confidence": 0.78,
  "needs_review": true,
  "review_reasons": ["Invoice number illegible (confidence 0.2) — left null rather than guessed"]
}

Note: The math is fine, but the invoice number is unreadable. Critically, the agent returns null with low confidence for that field and flags it for review — it does NOT invent a plausible-looking invoice number. A guessed identifier would corrupt downstream matching and payment.

Totals don't reconcile → flag the discrepancy, never silently fix

The line items don't sum to the printed total. The agent reports both and routes to review instead of 'correcting' the numbers.

Input

input
Invoice: line items sum to $980, but the printed total is $1,080 (subtotal $1,000, tax $80). The $20 line-item gap doesn't reconcile.

Output

output
{
  "vendor": { "value": "Northstar Parts", "confidence": 0.95 },
  "invoice_number": { "value": "NS-7741", "confidence": 0.96 },
  "invoice_date": { "value": "2026-05-02", "confidence": 0.95 },
  "currency": "USD",
  "line_items": [
    { "description": "Part X", "qty": 20, "unit_price": 24, "amount": 480, "confidence": 0.94 },
    { "description": "Part Y", "qty": 10, "unit_price": 50, "amount": 500, "confidence": 0.94 }
  ],
  "subtotal": { "value": 1000, "confidence": 0.9 },
  "tax": { "value": 80, "confidence": 0.92 },
  "total": { "value": 1080, "confidence": 0.95 },
  "validation": { "math_ok": false, "issues": ["Line items sum to 980 but subtotal is printed as 1000 (gap 20)", "subtotal 1000 + tax 80 = 1080 matches printed total, but line-item sum is inconsistent"] },
  "overall_confidence": 0.9,
  "needs_review": true,
  "review_reasons": ["Line items (980) do not reconcile to subtotal (1000) — discrepancy of 20; not auto-adjusted"]
}

Note: The defining defensive case: the line items don't sum to the subtotal. A naive extractor might quietly change a number to make it balance. This agent reports the printed values exactly, records the specific discrepancy, and routes to review — surfacing a real data problem (a missing line, an OCR error, or a vendor mistake) instead of masking it.

Implementation notes

  • Never emit a fabricated value: a missing or illegible field must be null with low confidence and a flag, because a guessed invoice number or amount corrupts everything downstream.
  • Report printed values exactly and never silently reconcile mismatched totals; a discrepancy is a signal (missing line, OCR error, vendor mistake), not something to paper over.
  • Attach per-field confidence and route low-confidence or illegible-key-field documents to human review rather than passing them through as clean.
  • Validate the arithmetic (line items → subtotal → total) and surface the exact issue, which catches both extraction errors and genuine invoice problems.
  • Normalize format (dates, number formatting, currency) without altering the underlying value.
  • Backtest against labeled invoices with field accuracy and a hard-zero fabricated-value metric before automating downstream posting.
  • A cheaper OCR/parse pass handles clean digital invoices, so the strong model is reserved for messy scans and validation reasoning.

Variations

Basic

Extract to JSON

Extracts invoice fields and line items into structured JSON with per-field confidence. Single document, on demand.

Advanced

Validated extraction

Adds total/tax validation, no-guess null handling, confidence thresholds, and automatic routing of low-confidence or mismatched documents to review.

Enterprise

Document pipeline at scale

Adds batch processing, multi-format and multi-language support, ERP/AP integration, human-in-the-loop review queues, and accuracy monitoring over time.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

Download Blueprint (.zip)
README.mdsystem-prompt.mdsetup-guide.mdtools.jsonworkflow.mdexamples.md.env.examplekit.jsonrun.pyLICENSENOTICEstarters/

Export

Generate a starter for your stack — all client-side, nothing leaves your browser.

ZIP

Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).

Frequently asked questions