Overview
Three-way match done consistently: invoice vs. purchase order vs. goods receipt, with quantities, prices, and terms checked against tolerance.
Catches the costly exceptions: price variances, over-receipts, duplicate invoices, and invoices with no PO — each flag cites the exact variance.
Approves clean matches within tolerance for payment, and routes mismatches and exceptions to a human.
Defensive: never pays a duplicate, never approves over tolerance or without a matching PO and receipt, and flags fraud as evidence, not accusation.
AgentAz™ specification
A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.
Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:
{
"$schema": "./agentaz.schema.json",
"version": "2.0.0",
"last_reviewed": "2026-06-24",
"agent_id": "po-matching-agent",
"trust_level": "A2",
"dna_pattern": "Synthesis",
"worst_case_action": "Produces an incorrect match for human review. Cannot approve, post, or pay.",
"authority_boundary": "Three-way matches PO/receipt/invoice and flags discrepancies; approval/payment tools absent.",
"tags": [
"supply-chain",
"procurement",
"reconciliation",
"read-only",
"human-review"
],
"tool_boundary": {
"allowed_tools": [
"read_po",
"read_receipt",
"read_invoice",
"match",
"flag_discrepancy"
],
"execution_tools_absent": true
},
"output_boundary": {
"format": "structured_json",
"never_emits": [
"approve_match",
"ledger_post",
"payment"
]
},
"cost_boundary": {
"max_usd_per_trace_loop": 0.22,
"alert_threshold_usd": 0.15
},
"loop_boundary": {
"max_reasoning_turns": 8
},
"human_handoff": {
"triggers": [
"partial_match",
"quantity_mismatch",
"price_mismatch"
],
"destination": "ap_review"
},
"audit": {
"append_only": true,
"logs": [
"matches",
"discrepancies"
]
}
}
New to this? Read the AgentAz specification guide: Trust Levels, DNA patterns, and how it complements your runtime.
AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.
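Because the spec documents authority rather than enforcing it, one way to use it at design time is a small lint that compares an agent's proposed actions with the boundaries in agentaz.json. The sketch below is only an illustration under that assumption: `check_actions` is a hypothetical helper, not part of the AgentAz tooling, and the relevant spec fields are inlined instead of being loaded from the file.

```python
# Hypothetical design-time lint; not part of the AgentAz tooling.
# The two boundaries are copied from the agentaz.json shown above.
SPEC = {
    "tool_boundary": {
        "allowed_tools": [
            "read_po", "read_receipt", "read_invoice", "match", "flag_discrepancy",
        ],
    },
    "output_boundary": {"never_emits": ["approve_match", "ledger_post", "payment"]},
}


def check_actions(spec, actions):
    """Return a list of human-readable violations for the proposed actions."""
    allowed = set(spec["tool_boundary"]["allowed_tools"])
    forbidden = set(spec["output_boundary"]["never_emits"])
    violations = []
    for action in actions:
        tool = action["tool"]
        if tool in forbidden:
            violations.append(f"{tool}: listed in never_emits")
        elif tool not in allowed:
            violations.append(f"{tool}: not in allowed_tools")
    return violations


proposed = [{"tool": "flag_discrepancy"}, {"tool": "payment"}]
print(check_actions(SPEC, proposed))
```

A check like this belongs in review or CI; runtime enforcement stays with whatever policy engine you already run.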
Governance matrix
A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.
| Dimension | Coverage |
|---|---|
| Agent goal | Bounded by the authority spec above |
| Trust Level | A2 — Recommend |
| Tool access | Least privilege — execution tools absent (read-only) |
| Context handling | Grounded in provided inputs; cites or flags rather than guessing |
| Memory strategy | Task-scoped; no persistent cross-session memory |
| Human approval | Required on partial match, quantity mismatch, price mismatch → AP review |
| Audit trail | Append-only log (matches, discrepancies) |
| Cost & loop bounds | ≤ $0.22 per loop · ≤ 8 reasoning turns |
| Recovery / escalation | Escalates to AP review |
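The cost and loop bounds in the matrix are documented limits, so a runtime has to track them itself. As a rough sketch of what that tracking might look like, the hypothetical `LoopBudget` class below counts turns and spend against the values from the spec; the class name and the "ok"/"alert"/"stop" return values are assumptions made for this example.

```python
from dataclasses import dataclass

# Limits copied from the spec's cost_boundary and loop_boundary.
MAX_USD_PER_LOOP = 0.22
ALERT_USD = 0.15
MAX_TURNS = 8


@dataclass
class LoopBudget:
    """Hypothetical per-trace tracker for spend and reasoning turns."""

    spent_usd: float = 0.0
    turns: int = 0

    def record_turn(self, cost_usd: float) -> str:
        """Record one reasoning turn and return 'ok', 'alert', or 'stop'."""
        self.turns += 1
        self.spent_usd += cost_usd
        if self.spent_usd > MAX_USD_PER_LOOP or self.turns > MAX_TURNS:
            return "stop"  # caller should escalate with current findings
        if self.spent_usd >= ALERT_USD:
            return "alert"
        return "ok"


budget = LoopBudget()
for cost in (0.05, 0.05, 0.06, 0.10):
    print(budget.turns + 1, budget.record_turn(cost))
```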
Agent component mapping
A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.
| Component | Mapping |
|---|---|
| Agent | Primary reasoner — Recommend authority (A2) |
| Tools | read_po, read_receipt, read_invoice, match, flag_discrepancy — execution tools absent (read-only) |
| Memory | Task-scoped working context; no persistent cross-session memory |
| Guardrails | Worst-case classified (A2); no execution tools; ≤ $0.22/loop · ≤ 8 turns |
| Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned |
| Handoff | Escalates to AP review on partial match, quantity mismatch, price mismatch |
Failure modes
Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.
False match — approves a PO and invoice pair that doesn't tie out.
- Detection
- Match confidence is scored and partial or fuzzy matches are flagged.
- Mitigation
- It never approves, posts, or pays — a match is a recommendation.
- Recovery
- AP rejects it and the discrepancy re-opens.
Misses a price or quantity mismatch.
- Detection
- Line-level price and quantity checks run; the residual must reconcile.
- Mitigation
- All mismatches are surfaced, never hidden.
- Recovery
- The mismatch is flagged to AP for resolution.
A partial receipt is matched as complete.
- Detection
- Received quantity is checked against ordered quantity.
- Mitigation
- Partials are flagged as partial.
- Recovery
- The match is held until the receipt completes.
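The partial-receipt check above comes down to comparing received quantity with ordered quantity. A minimal sketch, assuming whole-unit quantities and the hypothetical status labels shown (the blueprint does not define these names):

```python
def receipt_status(ordered_qty: int, received_qty: int) -> str:
    """Classify a goods receipt line against the ordered quantity."""
    if received_qty <= 0:
        return "not_received"
    if received_qty < ordered_qty:
        return "partial"       # hold the match until the receipt completes
    if received_qty == ordered_qty:
        return "complete"
    return "over_receipt"      # more received than ordered; flag for review


print(receipt_status(40, 25))
```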
Evaluation
Match precision and mismatch recall together are the core metrics — a false match approves a bad pair, a missed mismatch lets a price or quantity error through.
| Metric | Definition |
|---|---|
| Match accuracy | Share of proposed PO–invoice–receipt matches that are correct. |
| Precision | Of matches proposed, the share that truly tie out — false-match resistance. |
| Mismatch recall | Of genuine price or quantity mismatches, the share it surfaces. |
| Partial-receipt handling | Share of partial receipts correctly flagged as partial rather than complete. |
| Latency | Time to match a batch. |
Recommended approach. Use a labeled set of three-way-match cases with seeded mismatches and partial receipts; measure precision and mismatch recall. The residual must reconcile, and nothing is approved or paid during evaluation.
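To make the two core metrics concrete, here is a small sketch of how precision and mismatch recall could be computed over a labeled set. The field names `truly_matches` and `agent_says_match` and the five sample cases are invented for illustration; they are not results from this blueprint.

```python
def precision(cases):
    """Of matches the agent proposed, the share that truly tie out."""
    proposed = [c for c in cases if c["agent_says_match"]]
    if not proposed:
        return None
    return sum(c["truly_matches"] for c in proposed) / len(proposed)


def mismatch_recall(cases):
    """Of genuine mismatches, the share the agent surfaced (did not match)."""
    genuine = [c for c in cases if not c["truly_matches"]]
    if not genuine:
        return None
    flagged = [c for c in genuine if not c["agent_says_match"]]
    return len(flagged) / len(genuine)


# Illustrative labeled set: two clean matches, one false match,
# one caught mismatch, and one clean pair the agent failed to match.
labeled = [
    {"truly_matches": True, "agent_says_match": True},
    {"truly_matches": True, "agent_says_match": True},
    {"truly_matches": False, "agent_says_match": True},
    {"truly_matches": False, "agent_says_match": False},
    {"truly_matches": True, "agent_says_match": False},
]
print(precision(labeled), mismatch_recall(labeled))
```

On this toy set, two of three proposed matches are correct and one of two genuine mismatches is surfaced, which shows why the two numbers need to be read together.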
When to use
Use it when
- Accounts payable processes a high volume of PO-backed invoices and the matching work is repetitive.
- You have POs, goods receipts, and invoices the agent can reconcile, plus a tolerance policy.
- You want consistent, documented three-way matching with a clear approval trail.
- You want to auto-clear clean matches and surface only the genuine exceptions, duplicates, and variances to humans.
Avoid it when
- You lack structured PO/receipt/invoice data for the agent to match against.
- You expect it to approve non-PO or over-tolerance invoices autonomously — those require a human.
- Your spend is mostly non-PO/maverick, where there's nothing to three-way match.
- You can't keep approval gates on exceptions, duplicates, and large variances.
System prompt
You are a Purchase Order Matching Agent in an accounts-payable workflow. For ONE invoice, you perform a three-way match against its purchase order (PO) and goods receipt and decide: approve for payment, hold, or escalate. You are judged on catching real mismatches, duplicates, and fraud, and on never approving an invoice you shouldn't.
== CORE PRINCIPLES ==
1. Match before you pay. Approve only when the invoice reconciles to a PO and a goods receipt within tolerance on quantity, price, and terms. No PO + receipt match means no auto-approval.
2. Evidence-cited exceptions. Every flag states the exact variance (e.g. "invoice unit price $12.50 vs PO $10.00, +25%, tolerance 5%"). Do not invent matches or variances.
3. Duplicates are never paid. Treat possible duplicate invoices as a hard stop for auto-payment, with the matching evidence, routed to a human.
== HARD RULES (NON-NEGOTIABLE) ==
- TOLERANCE-GATED APPROVAL: Auto-approve ONLY when quantity, price, and terms match within the configured tolerance AND the invoice total is at or below the auto-approval cap. Anything over tolerance or over cap requires a human.
- REQUIRE PO + RECEIPT: Do not auto-approve an invoice without a matching PO and a goods receipt confirming the goods/services were received. Missing either = hold/escalate.
- NO DUPLICATE PAYMENT: If the invoice may duplicate one already received/paid (same number, or same vendor+amount+PO), do not approve — flag with evidence and escalate.
- NO UNFOUNDED FRAUD CLAIMS: Suspected fraud is flagged as an evidence-based indicator and routed to a human; never assert wrongdoing.
- DATA: Treat vendor and financial data as sensitive; keep it in scope.
== METHOD ==
- Load the invoice, its PO, and the goods receipt. Compare line items: quantity invoiced vs ordered vs received; unit price invoiced vs PO; terms.
- Run tolerance checks and a duplicate-invoice check. Decide per line and for the invoice.
== DECISION POLICY (calibrated confidence 0.0-1.0) ==
- APPROVE: full three-way match within tolerance, no duplicate, total <= cap, confidence >= 0.85.
- HOLD: a specific line is over tolerance or a receipt/PO detail is missing — hold the invoice (or the line) and state what's needed.
- ESCALATE: duplicate suspicion, no PO, large variance, possible fraud, or conflicting data.
== COST CONTROL ==
Pull only the PO/receipt this invoice needs; reuse loaded data across lines. Cap tool calls; if exceeded, escalate with current findings.
== OUTPUT FORMAT (return ONE JSON object) ==
{
"invoice_id": "<id>",
"decision": "APPROVE|HOLD|ESCALATE",
"confidence": <0.0-1.0>,
"match": { "po": "<id or 'missing'>", "receipt": "<id or 'missing'>", "status": "matched|partial|unmatched" },
"line_findings": [ { "line": "<item>", "check": "qty|price|terms", "result": "ok|variance", "detail": "<exact variance vs tolerance, or empty>" } ],
"duplicate": { "suspected": <bool>, "evidence": "<matching invoice, or empty>" },
"approved_amount_usd": <number>,
"actions": [ { "tool": "<tool>", "args": { ... }, "requires_approval": <bool> } ],
"vendor_note": "<neutral status, if any>",
"escalation": { "needed": <bool>, "reason": "<duplicate/no-po/variance/fraud, or empty>" }
}
If there is no matching PO+receipt, a duplicate is suspected, or a variance exceeds tolerance, do NOT APPROVE.
Setup guide
Install and connect your ERP/AP system
Install the agent and connect it to the system holding POs, receipts, and invoices.
pipx install po-match-agent
po-match-agent connect --erp netsuite
po-match-agent doctor
Configure tolerance and cap
Tolerance, the PO+receipt requirement, and the cap are enforced deterministically, not by the model.
cp .env.example .env
ANTHROPIC_API_KEY=sk-ant-...
PRICE_TOLERANCE_PCT=5
QTY_TOLERANCE_PCT=0
AUTO_APPROVE_CAP_USD=10000
REQUIRE_PO_AND_RECEIPT=true
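"Enforced deterministically" suggests a plain code gate that reads these settings and decides whether auto-approval is even allowed, independent of the model's output. The following is a sketch of such a gate, not the kit's actual implementation: `GateConfig`, `load_config`, and `may_auto_approve` are hypothetical names, the fallback defaults simply mirror the example values above, and tolerance is assumed to apply symmetrically to over- and under-variances.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    price_tolerance_pct: float
    qty_tolerance_pct: float
    auto_approve_cap_usd: float
    require_po_and_receipt: bool


def load_config(env=None) -> GateConfig:
    """Read the gate settings from the environment (or a dict for testing)."""
    env = os.environ if env is None else env
    return GateConfig(
        price_tolerance_pct=float(env.get("PRICE_TOLERANCE_PCT", "5")),
        qty_tolerance_pct=float(env.get("QTY_TOLERANCE_PCT", "0")),
        auto_approve_cap_usd=float(env.get("AUTO_APPROVE_CAP_USD", "10000")),
        require_po_and_receipt=env.get("REQUIRE_PO_AND_RECEIPT", "true").lower() == "true",
    )


def may_auto_approve(cfg, *, has_po, has_receipt, max_price_var_pct,
                     max_qty_var_pct, total_usd, duplicate_suspected) -> bool:
    """Return True only when every deterministic condition is satisfied."""
    if duplicate_suspected:
        return False
    if cfg.require_po_and_receipt and not (has_po and has_receipt):
        return False
    if abs(max_price_var_pct) > cfg.price_tolerance_pct:
        return False
    if abs(max_qty_var_pct) > cfg.qty_tolerance_pct:
        return False
    return total_usd <= cfg.auto_approve_cap_usd


cfg = load_config({"PRICE_TOLERANCE_PCT": "5", "AUTO_APPROVE_CAP_USD": "10000"})
print(may_auto_approve(cfg, has_po=True, has_receipt=True, max_price_var_pct=12.5,
                       max_qty_var_pct=0.0, total_usd=5400, duplicate_suspected=False))
```

Keeping the gate free of model calls means a wrong model decision can at worst be blocked, never widened.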
Define matching & duplicate rules
Provide the match keys and duplicate-detection rule.
# match.yml
match_on: [po_number, line_item, quantity, unit_price]
duplicate_keys: [invoice_number, [vendor, amount, po_number]]
block_payment_on_duplicate: true
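One plausible reading of `duplicate_keys` is that each entry is either a single field or a group of fields that must all agree with a past invoice. The sketch below implements that reading; `find_duplicates` and the record layout are assumptions made for this example, not the agent's documented behavior.

```python
# Mirrors duplicate_keys from match.yml: a single field, or a composite key.
DUPLICATE_KEYS = ["invoice_number", ["vendor", "amount", "po_number"]]


def find_duplicates(invoice, history, duplicate_keys=DUPLICATE_KEYS):
    """Return past invoices that match the new one on any duplicate key."""
    hits = []
    for past in history:
        for key in duplicate_keys:
            fields = [key] if isinstance(key, str) else list(key)
            values = [invoice.get(f) for f in fields]
            if any(v is None for v in values):
                continue  # cannot compare on a field the invoice lacks
            if values == [past.get(f) for f in fields]:
                hits.append({"invoice_number": past["invoice_number"],
                             "matched_on": fields})
                break  # one hit per past invoice is enough evidence
    return hits


history = [{"invoice_number": "INV-8841", "vendor": "Acme",
            "amount": 7200, "po_number": "PO-540"}]
resend = {"invoice_number": "INV-9200", "vendor": "Acme",
          "amount": 7200, "po_number": "PO-540"}
print(find_duplicates(resend, history))
```

Note that this version skips a composite key when the incoming invoice lacks one of its fields, for example an invoice with no PO. Whether a looser key such as vendor plus amount should also apply is a policy choice the sketch leaves open.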
Backtest on past invoices
Replay matched/paid invoices to compare the agent's decisions to actual outcomes.
po-match-agent backtest --range 90d --explain
# reports match accuracy + any duplicate or over-tolerance approvals (must be 0)
Wire into AP intake
Route incoming invoices to the agent. Start in assist mode, enable auto-approval for clean within-cap matches once backtests are clean.
# invoice intake -> POST https://your-host/ap/match (HMAC)
# promote MODE=act for clean three-way matches within the cap
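The "(HMAC)" note implies that the intake endpoint authenticates each POST with a shared-secret signature. The page does not specify the algorithm or header, so the sketch below assumes HMAC-SHA256 over the raw request body with a hex-encoded signature; `sign` and `verify` are hypothetical helpers.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature_hex.encode("utf-8"))


secret = b"example-shared-secret"  # placeholder; load from a secret store
body = b'{"invoice_id": "INV-9001"}'
signature = sign(secret, body)
print(verify(secret, body, signature))
```

Verifying against the raw bytes, before any JSON parsing, avoids mismatches caused by re-serialization.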
Architecture
Tools required
Workflow
1. Intake the invoice
Load the invoice and its PO reference and normalize the line items.
2. Retrieve PO and receipt
Pull the purchase order and the goods receipt the invoice must match against.
3. Run the three-way match
Compare quantity ordered/received/invoiced, unit price vs PO, and terms for each line.
4. Apply tolerance & duplicate checks
Test each variance against tolerance and check for a duplicate invoice in history.
5. Decide per line and invoice
Approve clean lines, hold lines over tolerance, and decide the invoice outcome within the cap.
6. Act through the gate
Auto-approve clean matches for payment; hold variance lines; escalate duplicates, no-PO, and large variances with evidence.
7. Record the trail
Log the match result with cited variances for finance controls and audit.
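Steps 3 and 4 of the workflow can be sketched for a single line item as follows. This is an illustrative outline rather than the agent's implementation: `match_line` is a hypothetical function, quantities are compared exactly (consistent with a 0% quantity tolerance), price tolerance is treated as symmetric, and the PO price is assumed to be greater than zero.

```python
def pct_variance(actual: float, expected: float) -> float:
    """Signed percentage difference of actual relative to expected."""
    return (actual - expected) / expected * 100.0


def match_line(line, ordered_qty, received_qty, invoiced_qty,
               po_price, invoice_price, price_tol_pct=5.0):
    """Return qty and price findings shaped like the output's line_findings."""
    qty_ok = ordered_qty == received_qty == invoiced_qty
    findings = [{
        "line": line,
        "check": "qty",
        "result": "ok" if qty_ok else "variance",
        "detail": f"ordered {ordered_qty} / received {received_qty} / invoiced {invoiced_qty}",
    }]

    var = pct_variance(invoice_price, po_price)
    price_ok = abs(var) <= price_tol_pct
    if var == 0:
        detail = f"invoice ${invoice_price:g} = PO ${po_price:g}"
    else:
        verdict = "within" if price_ok else "exceeds"
        detail = (f"invoice ${invoice_price:g} vs PO ${po_price:g} = {var:+.1f}%, "
                  f"{verdict} {price_tol_pct:g}% tolerance")
    findings.append({
        "line": line,
        "check": "price",
        "result": "ok" if price_ok else "variance",
        "detail": detail,
    })
    return findings


for finding in match_line("40 units widget-B", 40, 40, 40, 120, 135):
    print(finding["check"], finding["result"], "-", finding["detail"])
```

Building the detail string from the same numbers used in the check keeps every flag auditable: the cited variance cannot drift from the value that triggered it.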
Examples
Clean three-way match within tolerance → approve for payment
Invoice matches its PO and receipt on quantity and price within tolerance, under the cap. The routine, payable case.
Input
Invoice INV-9001 $4,800: 40 units @ $120, PO-551 (40 @ $120), receipt confirms 40 received. Tolerance 5%, cap $10k.
Output
{
"invoice_id": "INV-9001",
"decision": "APPROVE",
"confidence": 0.95,
"match": { "po": "PO-551", "receipt": "GR-551", "status": "matched" },
"line_findings": [
{ "line": "40 units widget-A", "check": "qty", "result": "ok", "detail": "ordered 40 / received 40 / invoiced 40" },
{ "line": "40 units widget-A", "check": "price", "result": "ok", "detail": "invoice $120 = PO $120" }
],
"duplicate": { "suspected": false, "evidence": "" },
"approved_amount_usd": 4800,
"actions": [ { "tool": "approve_for_payment", "args": { "invoice_id": "INV-9001", "amount_usd": 4800 }, "requires_approval": false } ],
"vendor_note": "Invoice INV-9001 ($4,800) matched to PO-551 and approved for payment.",
"escalation": { "needed": false, "reason": "" }
}
Note: Full three-way match: quantity ordered/received/invoiced all agree, price equals the PO, under the cap, no duplicate. It approves for payment and documents the match. This is the volume the kit clears so AP only touches exceptions.
Price variance over tolerance → hold the line, cite the variance
Quantities match, but the invoiced unit price exceeds the PO price beyond tolerance. The agent holds and cites the exact variance.
Input
Invoice INV-9044 $5,400: 40 units @ $135, PO-560 (40 @ $120), receipt confirms 40. Tolerance 5%.
Output
{
"invoice_id": "INV-9044",
"decision": "HOLD",
"confidence": 0.92,
"match": { "po": "PO-560", "receipt": "GR-560", "status": "partial" },
"line_findings": [
{ "line": "40 units widget-B", "check": "qty", "result": "ok", "detail": "ordered 40 / received 40 / invoiced 40" },
{ "line": "40 units widget-B", "check": "price", "result": "variance", "detail": "invoice $135 vs PO $120 = +12.5%, exceeds 5% tolerance" }
],
"duplicate": { "suspected": false, "evidence": "" },
"approved_amount_usd": 0,
"actions": [ { "tool": "escalate_exception", "args": { "invoice_id": "INV-9044", "reason": "price_variance", "detail": "+12.5% vs PO" }, "requires_approval": false } ],
"vendor_note": "Invoice INV-9044 is on hold: the unit price ($135) is 12.5% above the PO price ($120), beyond our 5% tolerance. AP will review.",
"escalation": { "needed": true, "reason": "Price variance +12.5% exceeds tolerance; needs AP review or a PO amendment." }
}
Note: Quantities and receipt are fine, but the price is 12.5% over the PO against a 5% tolerance, so the agent holds rather than pays, and cites the exact variance and threshold. This is the line-level catch that prevents silent overbilling.
Duplicate invoice / no PO → escalate, never pay
An invoice has no PO reference and looks like a duplicate of one already paid. The agent refuses to approve and escalates with evidence.
Input
Invoice INV-9100 $7,200, vendor Acme, no PO referenced. duplicate_invoice_check finds INV-8841 (Acme, $7,200, PO-540) already paid last month.
Output
{
"invoice_id": "INV-9100",
"decision": "ESCALATE",
"confidence": 0.8,
"match": { "po": "missing", "receipt": "missing", "status": "unmatched" },
"line_findings": [],
"duplicate": { "suspected": true, "evidence": "Matches INV-8841 (Acme, $7,200, PO-540) already paid 2026-05" },
"approved_amount_usd": 0,
"actions": [ { "tool": "escalate_exception", "args": { "invoice_id": "INV-9100", "reason": "duplicate_and_no_po", "evidence": ["INV-8841"] }, "requires_approval": false } ],
"vendor_note": "Invoice INV-9100 is on hold pending review — it has no PO and appears to match a previously paid invoice. AP will follow up; this may be a re-send.",
"escalation": { "needed": true, "reason": "No PO and a likely duplicate of paid INV-8841 — must not be auto-paid." }
}
Note: The defining defensive case: no PO to match against and strong duplicate evidence (same vendor and amount as an already-paid invoice). The agent treats it as a hard stop on payment, escalates with the specific matching invoice, and keeps the vendor note neutral, since it may be an honest re-send. Duplicate and non-PO invoices are exactly where automated payment goes wrong.
Implementation notes
- Enforce tolerance, the PO+receipt requirement, the duplicate block, and the approval cap in a deterministic gate; the model matches, the gate controls what can be paid.
- Cite the exact variance (values, percentage, and the tolerance) on every flag — a finding without numbers isn't auditable.
- Make duplicate detection a hard stop on auto-payment; duplicate payments are among the most common and recoverable AP losses.
- Require a goods receipt, not just a PO: paying for goods that were ordered but never received is a classic leakage point.
- Treat suspected fraud as an evidence-based indicator routed to a human, with neutral vendor-facing language.
- Backtest against paid-invoice history with duplicate/over-tolerance approvals as a hard-zero metric before enabling auto-approval.
- Spend the strong model on variance judgment and the escalation decision — a cheaper model can extract and align line items.
Variations
Basic
Match & flag assistant
Performs the three-way match, applies tolerance, runs duplicate detection, and returns flagged exceptions with cited variances for an AP clerk. No auto-approval.
Advanced
Guarded auto-approval
Auto-approves clean within-tolerance, within-cap matches for payment, holds variance lines, and escalates duplicates, no-PO invoices, and large variances.
Enterprise
Governed AP automation
Adds ERP integration, multi-currency and partial-receipt handling, vendor-level analytics, full audit trails and SLAs, fraud-pattern detection, and tuning from AP outcomes.
This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).
Frequently asked questions
When does the agent approve an invoice for payment?
Only when the invoice fully matches its PO and goods receipt within tolerance, isn't a duplicate, and is within your configured cap. Anything over tolerance, over cap, without a PO/receipt, or possibly duplicate is held or escalated to a human.
How does it detect duplicate invoices?
It checks each invoice against received/paid history on the invoice number and on vendor+amount+PO combinations, and treats a likely match as a hard stop on payment, escalating with the specific matching invoice as evidence.
What happens when a price or quantity is outside tolerance?
It holds the affected line (or invoice) and cites the exact variance against the tolerance (for example, invoiced price 12.5% over the PO against a 5% tolerance) so AP can review or amend the PO.
Does it require a goods receipt as well as a PO?
Yes, for auto-approval. Matching only the PO would let it pay for goods that were ordered but never received, so it requires the receipt confirming delivery.
Does it accuse vendors of fraud?
No. It surfaces evidence-based indicators (like a duplicate) and routes them to a human, keeping vendor-facing language neutral. It never asserts wrongdoing.
How should we roll it out?
Start in assist mode where it only recommends, backtest against paid-invoice history to confirm zero duplicate or over-tolerance approvals, then enable auto-approval for clean within-cap matches.