AgentKits

Expense Audit & Compliance Agent

Production Blueprint

Includes Agent Blueprint + Implementation Guide

An agent that audits submitted expense reports the way a diligent finance reviewer would: it checks each line item against your written policy, verifies receipts, catches duplicates and out-of-policy spend, and surfaces patterns that look like fraud — then approves within limits, holds specific items for review, or escalates. It is defensive by design: it never auto-approves above a configured cap or with required documentation missing, never accuses an employee of fraud without cited evidence, grounds every flag in a specific policy rule, and routes rejections and suspicious patterns to a human.

expense-audit, finance, compliance, fraud-detection, accounts-payable, autonomous-agent, policy, spend, agentaz, agent-governance, trust-level, production-readiness
Stack: Claude, LangGraph, OpenAI
Difficulty: Advanced
Setup: 45 min
Version: 2.0.0 · 2026-06-21

Overview

Line-by-line audit against your actual policy: limits, categories, receipt rules, and per-diems — each flag cites the rule it breaks.

Catches what manual review misses: duplicate submissions, out-of-policy items, and suspicious patterns across reports.

Decides within limits: clean reports auto-approve; specific items are held for review; rejections and fraud signals go to a human.

Defensive: no auto-approval over the cap or with missing receipts, and no fraud accusation without cited evidence.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level: A2 (Recommend)
DNA Pattern: Evaluation (Research → Evaluate)
Worst-Case Action: Flags a compliant expense or misses a non-compliant one, surfaced for human review. It cannot approve, reject, reimburse, or pay an expense because execution tools are absent.
Authority Boundary: Reviews expense reports against policy, flags violations and anomalies, and surfaces them for review. It never approves, rejects, or reimburses. A human in finance decides.
Verification Test: Attempt to call an approve, reject, or payment tool → confirm it is absent from the agent's registry.
Production Readiness: 6/6 dimensions passing. Tool isolation: approval/payment tools absent. Human gates: finance decides. Confidence escalation: ambiguous items flagged. Cost ceiling: bounded. Audit trail: flags and policy refs logged. Escalation path: violations routed to finance.
Last Reviewed: 2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json
{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "expense-report-audit-agent",
  "trust_level": "A2",
  "dna_pattern": "Evaluation",
  "worst_case_action": "Flags an expense incorrectly for human review. Cannot approve, reject, or reimburse.",
  "authority_boundary": "Audits expenses against policy and flags issues; no approval or payment tools present.",
  "tags": [
    "finance",
    "expense-audit",
    "compliance",
    "read-only",
    "human-review"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "read_expense",
      "check_policy",
      "detect_anomaly",
      "flag_violation"
    ],
    "execution_tools_absent": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "expense_approve",
      "expense_reject",
      "payment"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.25,
    "alert_threshold_usd": 0.16
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "policy_violation",
      "anomaly",
      "low_confidence"
    ],
    "destination": "finance_review"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "flags",
      "policy_refs"
    ]
  }
}
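
The verification test above can be approximated offline by checking the contract itself. The following is a minimal sketch, not part of the AgentAz tooling: the `FORBIDDEN_TOOLS` names and the `check_tool_boundary` helper are assumptions made for illustration, and only the `tool_boundary` fields shown in agentaz.json are read.

```python
import json

# Assumption: illustrative names for execution tools that must never be registered.
FORBIDDEN_TOOLS = {"approve_expense", "reject_expense", "pay_expense"}


def check_tool_boundary(contract: dict) -> list[str]:
    """Return a list of problems found in the contract's tool boundary."""
    problems = []
    boundary = contract.get("tool_boundary", {})
    allowed = set(boundary.get("allowed_tools", []))

    # Any overlap between the allow-list and the forbidden set is a leak.
    leaked = sorted(allowed & FORBIDDEN_TOOLS)
    if leaked:
        problems.append(f"execution tools present: {leaked}")

    # The contract must also state the absence explicitly.
    if boundary.get("execution_tools_absent") is not True:
        problems.append("execution_tools_absent is not true")
    return problems


# A trimmed copy of the tool_boundary section from agentaz.json.
contract = json.loads("""
{
  "tool_boundary": {
    "allowed_tools": ["read_expense", "check_policy", "detect_anomaly", "flag_violation"],
    "execution_tools_absent": true
  }
}
""")

print(check_tool_boundary(contract))  # prints []
```

A check like this only inspects the design-time document; confirming that the running agent's registry matches it is a separate test against your own runtime.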

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goal: Bounded by the authority spec above
Trust Level: A2 (Recommend)
Tool access: Least privilege; execution tools absent (read-only)
Context handling: Grounded in provided inputs; cites or flags rather than guessing
Memory strategy: Task-scoped; no persistent cross-session memory
Human approval: Required on policy violation, anomaly, low confidence → finance review
Audit trail: Append-only log (flags, policy refs)
Cost & loop bounds: ≤ $0.25 per loop · ≤ 8 reasoning turns
Recovery / escalation: Escalates to finance review
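
The cost and loop bounds in this matrix can be tracked with a small counter in whatever runtime hosts the agent. This is a hedged sketch: the `LoopBudget` class and its return values are hypothetical, and only the three numbers (0.25, 0.16, 8) come from the contract.

```python
from dataclasses import dataclass


@dataclass
class LoopBudget:
    """Tracks spend and turns for one trace loop (hypothetical helper)."""
    max_usd: float = 0.25     # cost_boundary.max_usd_per_trace_loop
    alert_usd: float = 0.16   # cost_boundary.alert_threshold_usd
    max_turns: int = 8        # loop_boundary.max_reasoning_turns
    spent_usd: float = 0.0
    turns: int = 0

    def record_turn(self, cost_usd: float) -> str:
        """Record one reasoning turn; return 'ok', 'alert', or 'stop'."""
        self.spent_usd += cost_usd
        self.turns += 1
        # 'stop' means a limit has been reached: do not start another turn.
        if self.spent_usd >= self.max_usd or self.turns >= self.max_turns:
            return "stop"
        if self.spent_usd >= self.alert_usd:
            return "alert"
        return "ok"


budget = LoopBudget()
for cost in (0.10, 0.10, 0.10):
    print(budget.record_turn(cost))
```

On "stop", the natural next step under this blueprint is to hand the remaining work to finance review rather than continue reasoning.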

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

Agent: Primary reasoner with Recommend authority (A2)
Tools: read expense, check policy, detect anomaly, flag violation; execution tools absent (read-only)
Memory: Task-scoped working context; no persistent cross-session memory
Guardrails: Worst-case classified (A2); no execution tools; ≤ $0.25/loop · ≤ 8 turns
Evaluator: Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
Handoff: Escalates to finance review on policy violation, anomaly, low confidence

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Misses a genuine policy violation (false negative).

Detection
Every report is screened against the full policy set, not sampled.
Mitigation
Positioned as full-coverage screening with a human deciding exceptions.
Recovery
The missed rule is added post-audit and the report can be re-screened.

Flags a compliant expense as a violation (false positive).

Detection
Each finding carries confidence and cites the policy clause.
Mitigation
Findings are recommendations a human approves; it never auto-rejects.
Recovery
The approver clears it and the rule is tuned.

A receipt is fabricated or altered.

Detection
The agent flags anomalies but never asserts authenticity.
Mitigation
A human verifies authenticity.
Recovery
Suspicious items are escalated to finance.

Evaluation

Violation recall matters most: missing a genuine policy breach is the failure to avoid, as long as the false-positive rate stays tolerable.

Violation recall: Of genuine policy violations, the share it catches.
Precision: Of items flagged, the share that are real violations (noise resistance).
Policy coverage: Share of policy rules actually exercised by the screen.
Citation accuracy: Whether each flag cites the correct policy clause.
Latency: Time to audit a report.

Recommended approach. Build a set of expense reports annotated against the full policy, with seeded violations and compliant edge cases; measure recall and precision and verify each flag cites the right clause. Include altered-receipt cases to confirm it flags rather than asserts authenticity.
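
Recall, precision, and citation accuracy can be computed from an annotated set in a few lines. The sketch below assumes a simple data shape that is not defined by the blueprint: `gold` maps each line ID to the violated clause (or `None` if compliant), and `flagged` maps each line the agent flagged to the clause it cited.

```python
from typing import Optional


def screen_metrics(gold: dict[str, Optional[str]], flagged: dict[str, str]) -> dict[str, float]:
    """Compute recall, precision, and citation accuracy for one evaluation set."""
    violations = {line for line, clause in gold.items() if clause is not None}
    true_pos = violations & set(flagged)
    # A citation is correct when the flagged clause equals the annotated clause.
    correct_cites = sum(1 for line in true_pos if flagged[line] == gold[line])

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "recall": ratio(len(true_pos), len(violations)),
        "precision": ratio(len(true_pos), len(flagged)),
        "citation_accuracy": ratio(correct_cites, len(true_pos)),
    }


# Illustrative data: two seeded violations, one caught, one false flag.
gold = {"L1": None, "L2": "meals<=60", "L3": "receipt_required_over=25", "L4": None}
flagged = {"L2": "meals<=60", "L4": "meals<=60"}
print(screen_metrics(gold, flagged))
# prints {'recall': 0.5, 'precision': 0.5, 'citation_accuracy': 1.0}
```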

When to use

Use it when

  • Finance/AP reviews a high volume of expense reports and most of the work is policy-checking and receipt-matching.
  • You have a written expense policy the agent can audit against and access to receipts/report data.
  • You want consistent, documented audits with an approval trail for compliance.
  • You want to auto-clear clean reports and surface only the genuine exceptions and fraud signals to humans.

Avoid it when

  • You have no written, structured policy for the agent to audit against.
  • You expect it to make final fraud or termination determinations — those are human decisions.
  • You can't give it receipt/report access to actually verify line items.
  • You are unwilling to keep approval gates on large amounts and rejections.

System prompt

system-prompt.md
You are an Expense Audit Agent in a finance operation. You audit ONE expense report against the company's written policy and decide: approve, hold specific items, reject, or escalate. You are judged on catching real policy violations and fraud, fairness and accuracy, and never approving spend you shouldn't or accusing someone without evidence.

== CORE PRINCIPLES ==
1. Policy-grounded. Every flag must cite the specific policy rule it violates (limit, category, receipt requirement, per-diem). Do not invent rules or violations; if the policy is silent, it is not a violation.
2. Evidence over suspicion. Base duplicate/fraud flags on concrete evidence (matching receipt, overlapping dates, identical amounts). Never label an employee 'fraud' without cited evidence; flag patterns for human review instead.
3. Audit each line. Approve the compliant items and flag only the specific non-compliant ones — don't reject a whole report over one bad line.

== HARD RULES (NON-NEGOTIABLE) ==
- APPROVAL LIMITS: Auto-approve ONLY when every line is within policy, required receipts are present, and the total is at or below the configured auto-approval cap. Anything above the cap, or with a policy exception, requires human approval.
- RECEIPTS REQUIRED: Do not approve an item that policy requires a receipt for if the receipt is missing or unreadable — hold it.
- NO UNFOUNDED ACCUSATIONS: Suspected duplicates/fraud are flagged with the evidence and routed to a human; never assert intent or wrongdoing.
- PII/DATA: Treat employee and financial data as sensitive; keep it in scope; redact where not needed.
- FAIRNESS: Apply the same policy consistently to every report.

== METHOD ==
- Load the report and the applicable policy. For each line: check category, amount vs. limit, receipt presence/validity, and per-diem/date rules.
- Run duplicate detection (same amount+date+merchant, or the same receipt across reports) and basic anomaly checks (e.g. mileage + flight for the same leg, weekend/personal patterns).
- Decide per line: ok / flag (with rule cited) / hold (missing doc). Then decide the report outcome.

== DECISION POLICY (calibrated confidence 0.0-1.0) ==
- APPROVE: all lines compliant, receipts present, total <= cap, confidence >= 0.85.
- HOLD: specific items missing receipts or needing a minor fix — approve the rest, hold those.
- REJECT_WITH_REASONS: clear policy violations; cite each. (Recommendation for a human to confirm.)
- ESCALATE: total over cap, suspected duplicate/fraud, policy exception, or conflicting evidence.

== COST CONTROL ==
Check only what each line needs; reuse the policy already loaded. Cap tool calls; if exceeded, approve the clearly-clean lines and escalate the rest.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "report_id": "<id>",
  "decision": "APPROVE|HOLD|REJECT_WITH_REASONS|ESCALATE",
  "confidence": <0.0-1.0>,
  "total_usd": <number>,
  "line_findings": [ { "item": "<line>", "status": "ok|flag|hold", "rule": "<policy rule cited, or empty>", "note": "<short>" } ],
  "fraud_signals": ["<evidence-based pattern, or empty>"],
  "approved_amount_usd": <number>,
  "actions": [ { "tool": "<tool>", "args": { ... }, "requires_approval": <bool> } ],
  "employee_note": "<neutral, factual; no accusation>",
  "escalation": { "needed": <bool>, "reason": "<cap/fraud/exception, or empty>" }
}
If evidence is mixed, prefer HOLD or ESCALATE over REJECT, and never accuse without cited evidence.
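
Because the prompt asks for one JSON object in a fixed shape, the caller can check that shape before acting on it. This is a minimal sketch of such a check; `validate_audit_output` is a hypothetical helper, and it covers only a few of the rules stated in the prompt (allowed values, confidence range, and a cited rule on every flag).

```python
ALLOWED_DECISIONS = {"APPROVE", "HOLD", "REJECT_WITH_REASONS", "ESCALATE"}
ALLOWED_STATUS = {"ok", "flag", "hold"}


def validate_audit_output(out: dict) -> list[str]:
    """Return a list of contract violations in the agent's JSON output."""
    errors = []
    if out.get("decision") not in ALLOWED_DECISIONS:
        errors.append("decision is not one of the four allowed values")

    conf = out.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        errors.append("confidence must be a number in [0.0, 1.0]")

    for finding in out.get("line_findings", []):
        if finding.get("status") not in ALLOWED_STATUS:
            errors.append(f"unknown status on {finding.get('item')!r}")
        # Core principle 1: every flag must cite a policy rule.
        if finding.get("status") == "flag" and not finding.get("rule"):
            errors.append(f"flag without cited rule on {finding.get('item')!r}")

    # An ESCALATE decision should carry escalation.needed = true.
    if out.get("decision") == "ESCALATE" and not out.get("escalation", {}).get("needed"):
        errors.append("ESCALATE decision without escalation.needed")
    return errors


sample = {
    "decision": "HOLD",
    "confidence": 0.9,
    "line_findings": [{"item": "Dinner $92", "status": "flag", "rule": ""}],
}
print(validate_audit_output(sample))
# prints ["flag without cited rule on 'Dinner $92'"]
```

Rejecting malformed output at this point keeps an uncited flag from ever reaching a reviewer or an employee.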

Setup guide

Install and connect your expense system

Install the agent and connect it to your expense/AP platform.

shell
pipx install expense-audit-agent
expense-audit-agent connect --system concur
expense-audit-agent doctor

Configure limits and mode

The auto-approval cap and receipt rules are enforced deterministically, not by the model.

shell
cp .env.example .env
# then set these values in .env:
ANTHROPIC_API_KEY=sk-ant-...
AUTO_APPROVE_CAP_USD=250
REQUIRE_RECEIPT_OVER_USD=25
MODE=assist   # assist (recommend) | act (auto within cap)
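
One way to read "enforced deterministically, not by the model" is a plain function that sits between the model's recommendation and any approval. The sketch below is an assumption about how such a gate could look, not the packaged implementation; `auto_approval_gate` and the line fields (`item`, `amount_usd`, `has_receipt`) are hypothetical, and the defaults mirror the `.env` values above.

```python
def auto_approval_gate(
    lines: list[dict],
    cap_usd: float = 250.0,          # AUTO_APPROVE_CAP_USD
    receipt_over_usd: float = 25.0,  # REQUIRE_RECEIPT_OVER_USD
    mode: str = "assist",            # MODE
) -> tuple[bool, str]:
    """Decide whether a report may be approved without a human. No model involved."""
    if mode != "act":
        return False, "assist mode: recommend only"

    total = sum(line["amount_usd"] for line in lines)
    if total > cap_usd:
        return False, f"total ${total:g} exceeds cap ${cap_usd:g}"

    for line in lines:
        if line["amount_usd"] > receipt_over_usd and not line["has_receipt"]:
            return False, f"missing required receipt: {line['item']}"

    return True, "within cap, receipts present"


clean_report = [
    {"item": "hotel", "amount_usd": 180, "has_receipt": True},
    {"item": "taxi", "amount_usd": 19, "has_receipt": True},
    {"item": "lunch", "amount_usd": 15, "has_receipt": True},
]
print(auto_approval_gate(clean_report, mode="act"))
# prints (True, 'within cap, receipts present')
```

Whatever the model outputs, a report only auto-approves if this function returns `True`, so a prompt failure cannot raise the cap.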

Load your expense policy

Provide the structured policy the agent audits against. This is the only basis for flags.

yaml
# policy.yml
limits: { meals: 60, hotel_per_night: 300, mileage_per_mile: 0.67 }
receipt_required_over: 25
disallowed: ["alcohol_over_limit", "personal", "first_class_without_approval"]
per_diem: { domestic: 75 }
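
To show how a policy like this turns into a cited finding, here is a small sketch that checks one line against it. It is illustrative only: the policy is inlined as a dict rather than loaded from policy.yml, `audit_line` is a hypothetical helper, and only the per-line caps and the receipt threshold are handled (the mileage rate, per-diem, and disallowed categories are left out).

```python
# Assumption: a subset of policy.yml, inlined so the example has no dependencies.
POLICY = {
    "limits": {"meals": 60, "hotel_per_night": 300},
    "receipt_required_over": 25,
}


def audit_line(category: str, amount_usd: float, has_receipt: bool, policy: dict = POLICY) -> dict:
    """Return a finding with status ok/flag/hold and the policy rule it rests on."""
    limit = policy["limits"].get(category)
    threshold = policy["receipt_required_over"]

    # Over the category limit: flag and cite the limit.
    if limit is not None and amount_usd > limit:
        return {"status": "flag", "rule": f"{category}<={limit}",
                "note": f"${amount_usd - limit:g} over the limit"}

    # Receipt required but missing: hold and cite the threshold.
    if amount_usd > threshold and not has_receipt:
        return {"status": "hold", "rule": f"receipt_required_over={threshold}",
                "note": "receipt missing"}

    # Compliant: cite the limit if one applies, otherwise no rule.
    rule = f"{category}<={limit}" if limit is not None else ""
    return {"status": "ok", "rule": rule, "note": ""}


print(audit_line("meals", 92, True))
# prints {'status': 'flag', 'rule': 'meals<=60', 'note': '$32 over the limit'}
```

If the policy has no entry for a category, the sketch returns "ok" with an empty rule, which matches the prompt's principle that a silent policy is not a violation.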

Backtest on past reports

Replay audited reports to compare the agent's findings to actual outcomes before going live.

shell
expense-audit-agent backtest --range 90d --explain
# reports approve/flag accuracy and any missed violations

Wire into the approval flow

Route submitted reports to the agent. Start in assist mode, then enable auto-approval within the cap once backtests are clean.

shell
# submission webhook -> POST https://your-host/expense/audit (HMAC)
# promote MODE=act for within-cap clean reports
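
The webhook comment mentions HMAC without giving details. A common pattern, sketched here as an assumption rather than the kit's actual scheme, is for the sender to sign the raw request body with a shared secret using HMAC-SHA256 and for the receiver to recompute and compare the signature in constant time. How the signature is transported (for example, which header) is left to your integration.

```python
import hashlib
import hmac


def sign(body: bytes, secret: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign(body, secret)
    return hmac.compare_digest(expected, signature_hex)


secret = b"example-shared-secret"  # placeholder; load from your secret store
body = b'{"report_id": "EXP-3310"}'
signature = sign(body, secret)

print(verify(body, signature, secret))         # prints True
print(verify(body + b" ", signature, secret))  # prints False
```

Verify against the exact bytes received, before any JSON parsing, since re-serializing the payload can change the bytes and invalidate the signature.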

Architecture

Tools required

get_report: Fetch the expense report: line items, amounts, categories, dates, attached receipts, and submitter context.
policy_lookup: Return the applicable expense policy rules (limits, categories, receipt thresholds, per-diems) for the audit.
receipt_verify: Check that required receipts are present/readable and that amount and merchant match the claimed line.
duplicate_check: Detect duplicate submissions (the same receipt, or the same amount+date+merchant) across this and prior reports.
fraud_signals: Surface evidence-based anomaly patterns (overlapping travel, split charges, personal/weekend spend) for human review.
categorize: Map each line to the correct expense category and the policy rule that governs it.
approve_expense: Approve compliant lines/reports. Hard-capped: rejects amounts above the configured auto-approval cap or with missing required receipts.
escalate_to_finance: Route to a human reviewer with the cited findings for over-cap totals, suspected fraud, or policy exceptions.
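
As an illustration of what duplicate_check might do internally, the sketch below groups lines by receipt ID and by the amount+date+merchant triple and reports any key seen more than once. The function name, field names, and sample dates are hypothetical; a real integration would query prior reports from the expense system.

```python
from collections import defaultdict


def find_duplicates(lines: list[dict]) -> list[dict]:
    """Return evidence records for lines that share a receipt or an amount+date+merchant."""
    by_receipt = defaultdict(list)
    by_triple = defaultdict(list)
    for line in lines:
        if line.get("receipt_id"):
            by_receipt[line["receipt_id"]].append(line["report_id"])
        triple = (line["amount_usd"], line["date"], line["merchant"].strip().lower())
        by_triple[triple].append(line["report_id"])

    findings = []
    for receipt_id, reports in by_receipt.items():
        if len(reports) > 1:
            findings.append({"kind": "same_receipt", "key": receipt_id, "reports": sorted(reports)})
    for triple, reports in by_triple.items():
        if len(reports) > 1:
            findings.append({"kind": "same_amount_date_merchant", "key": triple, "reports": sorted(reports)})
    return findings


# Illustrative data: the same receipt number submitted on two reports.
history = [
    {"report_id": "EXP-3119", "receipt_id": "A-7782", "amount_usd": 610, "date": "2026-05-12", "merchant": "Airline"},
    {"report_id": "EXP-3450", "receipt_id": "A-7782", "amount_usd": 610, "date": "2026-06-09", "merchant": "Airline"},
]
print(find_duplicates(history))
# prints [{'kind': 'same_receipt', 'key': 'A-7782', 'reports': ['EXP-3119', 'EXP-3450']}]
```

The output is evidence only: it names the matching reports and the key they share, and leaves any conclusion about intent to the human reviewer.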

Workflow

  1. Intake the report

    Load the report, receipts, and submitter context; load the applicable policy.

  2. Audit each line

    Check category, amount vs. limit, receipt presence/validity, and per-diem/date rules for every line, citing the rule on any flag.

  3. Verify receipts

    Confirm required receipts are present and readable and match the line; hold items that lack required proof.

  4. Detect duplicates & anomalies

    Run duplicate detection and pattern checks across this and prior reports, gathering evidence rather than asserting intent.

  5. Decide per line and report

    Approve compliant lines, hold those missing docs, flag violations with the rule, and decide the report outcome within the cap.

  6. Act through the gate

    Auto-approve within limits; route over-cap totals, rejections, and fraud signals to a human with the evidence.

  7. Record the trail

    Log each decision with the cited rule and outcome for compliance, and feed overrides back to improve the checks.
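
The final step, recording the trail, can be sketched as an append-only JSON Lines log: one record per audited report, written in append mode and never rewritten. The `append_audit_record` helper and the record fields are assumptions for illustration; a production trail would more likely live in a write-once store.

```python
import json
import time


def append_audit_record(log_path: str, report_id: str, decision: str, findings: list[dict]) -> dict:
    """Append one audit record to a JSON Lines file and return it."""
    record = {
        "ts": time.time(),
        "report_id": report_id,
        "decision": decision,
        # Keep only what the trail needs: the line, its status, and the cited rule.
        "findings": [
            {"item": f["item"], "status": f["status"], "rule": f.get("rule", "")}
            for f in findings
        ],
    }
    # Mode "a" only ever adds to the end of the file.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_audit_log(log_path: str) -> list[dict]:
    """Load every record from the log, oldest first."""
    with open(log_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A typical call would be `append_audit_record("audit.jsonl", "EXP-3402", "HOLD", findings)` after each report; reviewer overrides can be appended as further records instead of editing earlier ones.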

Examples

Clean report within policy → auto-approve

A standard travel report: every line within limits, receipts attached, total under the cap.

Input

input
Report EXP-3310, total $214: hotel $180 (receipt), taxi $19 (receipt), lunch $15 (receipt). Policy: hotel<=300/night, meals<=60, receipts over $25.

Output

output
{
  "report_id": "EXP-3310",
  "decision": "APPROVE",
  "confidence": 0.93,
  "total_usd": 214,
  "line_findings": [
    { "item": "Hotel $180", "status": "ok", "rule": "hotel_per_night<=300", "note": "within limit, receipt present" },
    { "item": "Taxi $19", "status": "ok", "rule": "", "note": "under receipt threshold, allowed" },
    { "item": "Lunch $15", "status": "ok", "rule": "meals<=60", "note": "within limit" }
  ],
  "fraud_signals": [],
  "approved_amount_usd": 214,
  "actions": [ { "tool": "approve_expense", "args": { "report_id": "EXP-3310", "amount_usd": 214 }, "requires_approval": false } ],
  "employee_note": "Your expense report EXP-3310 ($214) has been approved.",
  "escalation": { "needed": false, "reason": "" }
}

Note: Every line is checked against the cited policy rule, receipts are present, total is under the $250 cap — so it auto-approves and documents the rule for each line. This is the volume the kit clears instantly so humans only see exceptions.

Out-of-policy item + missing receipt → hold the line, approve the rest

Mostly clean, but one meal is over the limit and a $40 item has no receipt. The agent holds those two lines and approves the rest.

Input

input
Report EXP-3402, total $322: hotel $190 (receipt), dinner $92 (receipt), supplies $40 (NO receipt). Policy: meals<=60, receipts required over $25.

Output

output
{
  "report_id": "EXP-3402",
  "decision": "HOLD",
  "confidence": 0.9,
  "total_usd": 322,
  "line_findings": [
    { "item": "Hotel $190", "status": "ok", "rule": "hotel_per_night<=300", "note": "within limit" },
    { "item": "Dinner $92", "status": "flag", "rule": "meals<=60", "note": "$32 over the per-meal limit; needs approval or adjustment" },
    { "item": "Supplies $40", "status": "hold", "rule": "receipt_required_over=25", "note": "no receipt attached for a $40 item" }
  ],
  "fraud_signals": [],
  "approved_amount_usd": 190,
  "actions": [ { "tool": "approve_expense", "args": { "report_id": "EXP-3402", "lines": ["hotel"], "amount_usd": 190 }, "requires_approval": false } ],
  "employee_note": "Approved the hotel ($190). Two items need attention: the dinner ($92) exceeds the $60 meal limit, and the $40 supplies item needs a receipt. Please add the receipt and confirm the meal overage with your manager.",
  "escalation": { "needed": false, "reason": "" }
}

Note: Instead of rejecting the whole report over two lines, the agent approves the compliant hotel and holds exactly the two problem items — each with the cited rule. The employee note is specific and actionable, not punitive.

Duplicate receipt across reports → escalate with evidence, no accusation

A receipt submitted on this report also appears on a prior one. The agent flags the duplicate with evidence and escalates — without accusing the employee.

Input

input
Report EXP-3450: flight $610 with receipt #A-7782. duplicate_check finds receipt #A-7782 already reimbursed on EXP-3119 last month.

Output

output
{
  "report_id": "EXP-3450",
  "decision": "ESCALATE",
  "confidence": 0.8,
  "total_usd": 610,
  "line_findings": [
    { "item": "Flight $610", "status": "flag", "rule": "duplicate_submission", "note": "receipt #A-7782 matches one already reimbursed on EXP-3119 (2026-05)" }
  ],
  "fraud_signals": ["Identical receipt #A-7782 ($610) appears on EXP-3119 (already reimbursed) and EXP-3450 — possible duplicate submission"],
  "approved_amount_usd": 0,
  "actions": [ { "tool": "escalate_to_finance", "args": { "queue": "duplicate_review", "evidence": ["EXP-3119", "EXP-3450", "receipt#A-7782"] }, "requires_approval": false } ],
  "employee_note": "We're reviewing report EXP-3450; the flight receipt appears to match one already reimbursed. Finance will follow up — this may simply be an accidental re-submission.",
  "escalation": { "needed": true, "reason": "Possible duplicate reimbursement — same receipt on two reports." }
}

Note: The defining defensive case: the agent has concrete evidence (same receipt number on two reports) but treats it as a possible duplicate to review, not proven fraud. It escalates with the evidence, holds the $610, and the employee note explicitly allows for an honest mistake. Evidence and fairness, never accusation.

Implementation notes

  • Enforce the auto-approval cap and receipt requirements in a deterministic gate; the model audits, the gate controls what can be approved without a human.
  • Cite the specific policy rule on every flag. A finding without a rule is an opinion, not an audit — and citations make the trail defensible.
  • Treat duplicates and anomalies as evidence to review, never as proven fraud; route them to a human and keep employee-facing language neutral.
  • Audit per line and approve the compliant parts — rejecting whole reports over a single bad line creates friction and rework.
  • Backtest against historically audited reports and track missed-violation and false-flag rates before enabling auto-approval.
  • Keep employee and financial data in scope with PII discipline, and apply the policy identically to everyone for fairness and audit.
  • Reserve the strong model for anomaly judgment and the report decision; a cheaper model can match receipts and categorize lines.

Variations

Basic

Audit & flag assistant

Audits each line against policy, verifies receipts, and returns flagged items with the cited rule and a recommendation for a reviewer. No auto-approval.

Advanced

Guarded auto-approval

Auto-approves clean reports within the cap, holds specific non-compliant lines, runs duplicate/anomaly detection, and escalates fraud signals and over-cap totals.

Enterprise

Governed spend audit

Adds multi-policy support, ERP/AP integration, full audit trails and SLAs, fraud-pattern analytics across employees, and check tuning from reviewer outcomes.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

README.md, system-prompt.md, setup-guide.md, tools.json, workflow.md, examples.md, .env.example, kit.json, run.py, LICENSE, NOTICE, starters/

Export

Generate a starter for your stack — all client-side, nothing leaves your browser.

Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).

Frequently asked questions