Frequently asked questions

Will it approve expenses automatically?

Only when every line is within policy, required receipts are present, and the total is within your configured cap. Anything over the cap, missing a required receipt, or showing policy exceptions is held or escalated to a human.

How does it decide what violates policy?

It audits each line against your structured written policy and cites the specific rule on every flag. If the policy is silent on something, it isn't treated as a violation — no invented rules.

Does it accuse employees of fraud?

No. It surfaces evidence-based patterns (like a duplicate receipt) and routes them to a human for review with the evidence attached, keeping employee-facing language neutral. It never asserts intent or wrongdoing.

What happens to a report with one bad line?

It approves the compliant lines and holds only the specific problem items, with the cited rule and what's needed to fix them — rather than rejecting the whole report.

How does it catch duplicates?

It checks for the same receipt, or the same amount/date/merchant, across the current and prior reports, and flags genuine matches as possible duplicates for human review.

How do we roll it out safely?

Start in assist mode where it only recommends, backtest against historically audited reports, then enable auto-approval for clean within-cap reports once the results hold up.

Expense Audit & Compliance Agent

Overview

Line-by-line audit against your actual policy: limits, categories, receipt rules, and per-diems — each flag cites the rule it breaks.

Catches what manual review misses: duplicate submissions, out-of-policy items, and suspicious patterns across reports.

Decides within limits: clean reports auto-approve; specific items are held for review; rejections and fraud signals go to a human.

Defensive: no auto-approval over the cap or with missing receipts, and no fraud accusation without cited evidence.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level: A2 (Recommend)

DNA Pattern: Evaluation (Research → Evaluate)

Worst-Case Action: Flags a compliant expense or misses a non-compliant one, surfaced for human review. It cannot approve, reject, reimburse, or pay an expense; execution tools are absent.

Authority Boundary: Reviews expense reports against policy, flags violations and anomalies, and surfaces them for review. It never approves, rejects, or reimburses. A human in finance decides.

Verification Test: Attempt to call an approve, reject, or payment tool → confirm it is absent from the agent's registry.

Production Readiness: 6/6 dimensions passing. Tool isolation: approval/payment tools absent. Human gates: finance decides. Confidence escalation: ambiguous items flagged. Cost ceiling: bounded. Audit trail: flags and policy refs logged. Escalation path: violations routed to finance.

Last Reviewed: 2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json

{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "expense-report-audit-agent",
  "trust_level": "A2",
  "dna_pattern": "Evaluation",
  "worst_case_action": "Flags an expense incorrectly for human review. Cannot approve, reject, or reimburse.",
  "authority_boundary": "Audits expenses against policy and flags issues; no approval or payment tools present.",
  "tags": [
    "finance",
    "expense-audit",
    "compliance",
    "read-only",
    "human-review"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "read_expense",
      "check_policy",
      "detect_anomaly",
      "flag_violation"
    ],
    "execution_tools_absent": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "expense_approve",
      "expense_reject",
      "payment"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.25,
    "alert_threshold_usd": 0.16
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "policy_violation",
      "anomaly",
      "low_confidence"
    ],
    "destination": "finance_review"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "flags",
      "policy_refs"
    ]
  }
}
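
The verification test described above can also be approximated offline against the contract itself. The following is a minimal sketch, assuming the contract has been parsed into a dict; the helper names `find_execution_tools` and `contract_is_consistent` and the keyword list are illustrative assumptions, not part of AgentAz.

```python
import json

# Excerpt of the contract above; in practice you would json.load() agentaz.json.
CONTRACT = json.loads("""
{
  "tool_boundary": {
    "allowed_tools": ["read_expense", "check_policy", "detect_anomaly", "flag_violation"],
    "execution_tools_absent": true
  },
  "output_boundary": {"never_emits": ["expense_approve", "expense_reject", "payment"]}
}
""")

# Hypothetical keyword list: substrings that would suggest an execution tool.
FORBIDDEN_KEYWORDS = ("approve", "reject", "reimburse", "pay")


def find_execution_tools(contract: dict) -> list[str]:
    """Return allowed tools whose names suggest approval, rejection, or payment."""
    tools = contract["tool_boundary"]["allowed_tools"]
    return [t for t in tools if any(k in t.lower() for k in FORBIDDEN_KEYWORDS)]


def contract_is_consistent(contract: dict) -> bool:
    """True when the execution_tools_absent claim matches the tool list."""
    claimed_absent = contract["tool_boundary"]["execution_tools_absent"]
    return claimed_absent == (len(find_execution_tools(contract)) == 0)


print(find_execution_tools(CONTRACT))    # []
print(contract_is_consistent(CONTRACT))  # True
```

A name-based check like this only inspects the design-time document. It does not replace attempting the call against the agent's live tool registry.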

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goal	Bounded by the authority spec above
Trust Level	A2 — Recommend
Tool access	Least privilege — execution tools absent (read-only)
Context handling	Grounded in provided inputs; cites or flags rather than guessing
Memory strategy	Task-scoped; no persistent cross-session memory
Human approval	Required on policy violation, anomaly, low confidence → finance review
Audit trail	Append-only log (flags, policy refs)
Cost & loop bounds	≤ $0.25 per loop · ≤ 8 reasoning turns
Recovery / escalation	Escalates to finance review
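
The cost and loop bounds in the matrix above lend themselves to a small guard in the agent loop. This is a sketch under stated assumptions: the `LoopBudget` class and its return values are hypothetical, and only the three numeric limits come from the contract.

```python
from dataclasses import dataclass

MAX_USD_PER_LOOP = 0.25      # cost_boundary.max_usd_per_trace_loop
ALERT_THRESHOLD_USD = 0.16   # cost_boundary.alert_threshold_usd
MAX_REASONING_TURNS = 8      # loop_boundary.max_reasoning_turns


@dataclass
class LoopBudget:
    """Tracks spend and turns for one audit loop (hypothetical helper)."""
    spent_usd: float = 0.0
    turns: int = 0

    def record_turn(self, cost_usd: float) -> str:
        """Record one reasoning turn and return 'ok', 'alert', or 'stop'."""
        self.turns += 1
        self.spent_usd += cost_usd
        if self.spent_usd > MAX_USD_PER_LOOP or self.turns >= MAX_REASONING_TURNS:
            return "stop"   # no further turns; hand off to finance review
        if self.spent_usd >= ALERT_THRESHOLD_USD:
            return "alert"
        return "ok"
```

Returning "stop" on the eighth turn means the loop never starts a ninth, which is one reasonable reading of "≤ 8 reasoning turns".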

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

Agent	Primary reasoner — Recommend authority (A2)
Tools	read expense, check policy, detect anomaly, flag violation — execution tools absent (read-only)
Memory	Task-scoped working context; no persistent cross-session memory
Guardrails	Worst-case classified (A2); no execution tools; ≤ $0.25/loop · ≤ 8 turns
Evaluator	Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
Handoff	Escalates to finance review on policy violation, anomaly, low confidence

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Misses a genuine policy violation (false negative).

Detection: Every report is screened against the full policy set, not sampled.
Mitigation: The agent is positioned as full-coverage screening, with a human deciding exceptions.
Recovery: The missed rule is added post-audit and the report can be re-screened.

Flags a compliant expense as a violation (false positive).

Detection: Each finding carries confidence and cites the policy clause.
Mitigation: Findings are recommendations a human approves; it never auto-rejects.
Recovery: The approver clears it and the rule is tuned.

A receipt is fabricated or altered.

Detection: The agent flags anomalies but never asserts authenticity.
Mitigation: A human verifies authenticity.
Recovery: Suspicious items are escalated to finance.

Evaluation

Violation recall matters most, because missing a genuine policy breach is the failure that counts. It is weighed against a tolerable false-positive rate.

Violation recall	Of genuine policy violations, the share it catches.
Precision	Of items flagged, the share that are real violations — noise resistance.
Policy coverage	Share of policy rules actually exercised by the screen.
Citation accuracy	Whether each flag cites the correct policy clause.
Latency	Time to audit a report.

Recommended approach. Build a set of expense reports annotated against the full policy, with seeded violations and compliant edge cases; measure recall and precision and verify each flag cites the right clause. Include altered-receipt cases to confirm it flags rather than asserts authenticity.
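
The recommended approach can be scored with a few lines of code. The sketch below assumes both the annotations and the agent's flags are reduced to a mapping from a line id to the cited policy clause; the `evaluate` function, the line-id format, and the sample data are all hypothetical.

```python
def evaluate(expected: dict[str, str], flagged: dict[str, str]) -> dict[str, float]:
    """Compare the agent's flags to ground truth.

    Both arguments map a line id to the policy clause it violates.
    `expected` holds the annotated (seeded) violations; `flagged` holds
    what the agent reported. Lines absent from a dict are compliant.
    """
    true_pos = expected.keys() & flagged.keys()
    recall = len(true_pos) / len(expected) if expected else 1.0
    precision = len(true_pos) / len(flagged) if flagged else 1.0
    correct_cites = sum(1 for line in true_pos if flagged[line] == expected[line])
    citation_accuracy = correct_cites / len(true_pos) if true_pos else 1.0
    return {"recall": recall, "precision": precision, "citation_accuracy": citation_accuracy}


# Made-up sample: three seeded violations, three flags, one wrong citation.
expected = {"EXP-1:2": "meals<=60", "EXP-2:1": "receipt_required_over=25", "EXP-3:4": "hotel_per_night<=300"}
flagged = {"EXP-1:2": "meals<=60", "EXP-2:1": "meals<=60", "EXP-9:1": "meals<=60"}
print(evaluate(expected, flagged))
```

In this sample the agent catches two of three violations, one of its three flags is noise, and one of the two correct flags cites the wrong clause, so recall and precision are both 2/3 and citation accuracy is 1/2.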

When to use

Use it when

Finance/AP reviews a high volume of expense reports and most of the work is policy-checking and receipt-matching.
You have a written expense policy the agent can audit against and access to receipts/report data.
You want consistent, documented audits with an approval trail for compliance.
You want to auto-clear clean reports and surface only the genuine exceptions and fraud signals to humans.

Avoid it when

You have no written, structured policy for the agent to audit against.
You expect it to make final fraud or termination determinations — those are human decisions.
You can't give it receipt/report access to actually verify line items.
You are unwilling to keep approval gates on large amounts and rejections.

System prompt

system-prompt.md

You are an Expense Audit Agent in a finance operation. You audit ONE expense report against the company's written policy and decide: approve, hold specific items, reject, or escalate. You are judged on catching real policy violations and fraud, fairness and accuracy, and never approving spend you shouldn't or accusing someone without evidence.

== CORE PRINCIPLES ==
1. Policy-grounded. Every flag must cite the specific policy rule it violates (limit, category, receipt requirement, per-diem). Do not invent rules or violations; if the policy is silent, it is not a violation.
2. Evidence over suspicion. Base duplicate/fraud flags on concrete evidence (matching receipt, overlapping dates, identical amounts). Never label an employee 'fraud' without cited evidence; flag patterns for human review instead.
3. Audit each line. Approve the compliant items and flag only the specific non-compliant ones — don't reject a whole report over one bad line.

== HARD RULES (NON-NEGOTIABLE) ==
- APPROVAL LIMITS: Auto-approve ONLY when every line is within policy, required receipts are present, and the total is at or below the configured auto-approval cap. Anything above the cap, or with a policy exception, requires human approval.
- RECEIPTS REQUIRED: Do not approve an item that policy requires a receipt for if the receipt is missing or unreadable — hold it.
- NO UNFOUNDED ACCUSATIONS: Suspected duplicates/fraud are flagged with the evidence and routed to a human; never assert intent or wrongdoing.
- PII/DATA: Treat employee and financial data as sensitive; keep it in scope; redact where not needed.
- FAIRNESS: Apply the same policy consistently to every report.

== METHOD ==
- Load the report and the applicable policy. For each line: check category, amount vs. limit, receipt presence/validity, and per-diem/date rules.
- Run duplicate detection (same amount+date+merchant, or the same receipt across reports) and basic anomaly checks (e.g. mileage + flight for the same leg, weekend/personal patterns).
- Decide per line: ok / flag (with rule cited) / hold (missing doc). Then decide the report outcome.

== DECISION POLICY (calibrated confidence 0.0-1.0) ==
- APPROVE: all lines compliant, receipts present, total <= cap, confidence >= 0.85.
- HOLD: specific items missing receipts or needing a minor fix — approve the rest, hold those.
- REJECT_WITH_REASONS: clear policy violations; cite each. (Recommendation for a human to confirm.)
- ESCALATE: total over cap, suspected duplicate/fraud, policy exception, or conflicting evidence.

== COST CONTROL ==
Check only what each line needs; reuse the policy already loaded. Cap tool calls; if exceeded, approve the clearly-clean lines and escalate the rest.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "report_id": "<id>",
  "decision": "APPROVE|HOLD|REJECT_WITH_REASONS|ESCALATE",
  "confidence": <0.0-1.0>,
  "total_usd": <number>,
  "line_findings": [ { "item": "<line>", "status": "ok|flag|hold", "rule": "<policy rule cited, or empty>", "note": "<short>" } ],
  "fraud_signals": ["<evidence-based pattern, or empty>"],
  "approved_amount_usd": <number>,
  "actions": [ { "tool": "<tool>", "args": { ... }, "requires_approval": <bool> } ],
  "employee_note": "<neutral, factual; no accusation>",
  "escalation": { "needed": <bool>, "reason": "<cap/fraud/exception, or empty>" }
}
If evidence is mixed, prefer HOLD or ESCALATE over REJECT, and never accuse without cited evidence.
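
Outside the prompt itself, the output contract above can be checked in code before anything acts on a result. This is a minimal sketch, not the blueprint's own validator; `validate_result` and its messages are hypothetical, while the enumerations and the 0.85 threshold come from the prompt.

```python
DECISIONS = {"APPROVE", "HOLD", "REJECT_WITH_REASONS", "ESCALATE"}
LINE_STATUSES = {"ok", "flag", "hold"}


def validate_result(result: dict) -> list[str]:
    """Return the problems found in one agent result; an empty list means it passed."""
    problems = []
    if result.get("decision") not in DECISIONS:
        problems.append("unknown decision")
    confidence = result.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append("confidence outside 0.0-1.0")
    findings = result.get("line_findings", [])
    for finding in findings:
        if finding.get("status") not in LINE_STATUSES:
            problems.append(f"unknown status on {finding.get('item')!r}")
        # Core principle 1: every flag must cite a policy rule.
        if finding.get("status") == "flag" and not finding.get("rule"):
            problems.append(f"flag without cited rule on {finding.get('item')!r}")
    if result.get("decision") == "APPROVE":
        if isinstance(confidence, (int, float)) and confidence < 0.85:
            problems.append("APPROVE below 0.85 confidence")
        if any(f.get("status") != "ok" for f in findings):
            problems.append("APPROVE with non-ok lines")
    if result.get("decision") == "ESCALATE" and not result.get("escalation", {}).get("needed"):
        problems.append("ESCALATE without escalation.needed")
    return problems
```

A result that fails any check is better routed to a human than retried silently, in keeping with the prompt's preference for HOLD or ESCALATE when evidence is mixed.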

Setup guide

Install and connect your expense system

Install the agent and connect it to your expense/AP platform.

shell

pipx install expense-audit-agent
expense-audit-agent connect --system concur
expense-audit-agent doctor

Configure limits and mode

The auto-approval cap and receipt rules are enforced deterministically, not by the model.

shell

cp .env.example .env
# then set these values in .env:
ANTHROPIC_API_KEY=sk-ant-...
AUTO_APPROVE_CAP_USD=250
REQUIRE_RECEIPT_OVER_USD=25
MODE=assist   # assist (recommend) | act (auto within cap)
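
To illustrate what "enforced deterministically, not by the model" can mean, here is a minimal sketch of a gate that reads the three settings above. The function name `may_auto_approve` and the line shape (`amount_usd`, `has_receipt`) are assumptions for illustration; the packaged agent may implement this differently.

```python
import os


def may_auto_approve(total_usd: float, lines: list[dict], env=os.environ) -> tuple[bool, str]:
    """Deterministic gate: may this report be approved without a human?

    Each line is a dict with 'amount_usd' and 'has_receipt' (hypothetical shape).
    Only the cap, the receipt rule, and the mode are checked here; per-line
    policy compliance is assumed to be audited separately.
    """
    cap = float(env.get("AUTO_APPROVE_CAP_USD", "250"))
    receipt_over = float(env.get("REQUIRE_RECEIPT_OVER_USD", "25"))
    mode = env.get("MODE", "assist")

    if mode != "act":
        return False, "assist mode: recommend only"
    if total_usd > cap:
        return False, f"total {total_usd} exceeds cap {cap}"
    for line in lines:
        if line["amount_usd"] > receipt_over and not line["has_receipt"]:
            return False, f"missing receipt on {line['amount_usd']} item"
    return True, "within cap, receipts present"
```

Because the gate takes no model output as input beyond the line data, a confident but wrong model decision cannot push an over-cap report through.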

Load your expense policy

Provide the structured policy the agent audits against. This is the only basis for flags.

yaml

# policy.yml
limits: { meals: 60, hotel_per_night: 300, mileage_per_mile: 0.67 }
receipt_required_over: 25
disallowed: ["alcohol_over_limit", "personal", "first_class_without_approval"]
per_diem: { domestic: 75 }
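
As an illustration of how a policy like this can drive a per-line check that cites its rule, consider the sketch below. It mirrors the policy as a plain dict to avoid a YAML dependency; `audit_line` and its arguments are hypothetical, and per-diem and mileage-rate handling are left out for brevity.

```python
# Mirrors policy.yml above as a plain dict so the sketch has no YAML dependency.
POLICY = {
    "limits": {"meals": 60, "hotel_per_night": 300, "mileage_per_mile": 0.67},
    "receipt_required_over": 25,
    "disallowed": ["alcohol_over_limit", "personal", "first_class_without_approval"],
    "per_diem": {"domestic": 75},
}


def audit_line(category: str, amount_usd: float, has_receipt: bool, policy: dict = POLICY) -> dict:
    """Audit one line and cite the rule on any flag or hold (hypothetical line shape)."""
    if category in policy["disallowed"]:
        return {"status": "flag", "rule": f"disallowed:{category}"}
    limit = policy["limits"].get(category)
    if limit is not None and amount_usd > limit:
        return {"status": "flag", "rule": f"{category}<={limit}"}
    threshold = policy["receipt_required_over"]
    if amount_usd > threshold and not has_receipt:
        return {"status": "hold", "rule": f"receipt_required_over={threshold}"}
    # The policy is silent or the line complies: not a violation.
    return {"status": "ok", "rule": f"{category}<={limit}" if limit is not None else ""}


print(audit_line("meals", 92, True))      # {'status': 'flag', 'rule': 'meals<=60'}
print(audit_line("supplies", 40, False))  # {'status': 'hold', 'rule': 'receipt_required_over=25'}
```

A category with no entry under `limits` falls through to the receipt check and then to "ok", which matches the principle that silence in the policy is not a violation.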

Backtest on past reports

Replay audited reports to compare the agent's findings to actual outcomes before going live.

shell

expense-audit-agent backtest --range 90d --explain
# reports approve/flag accuracy and any missed violations

Wire into the approval flow

Route submitted reports to the agent. Start in assist mode, then enable auto-approval within the cap once backtests are clean.

shell

# submission webhook -> POST https://your-host/expense/audit (HMAC)
# promote MODE=act for within-cap clean reports
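
The webhook comment above mentions HMAC without further detail. One common way to verify such a signature is sketched below, assuming HMAC-SHA256 over the raw request body with a hex-encoded signature supplied by the sender; the algorithm, the encoding, and both function names are assumptions, not details confirmed by the blueprint.

```python
import hashlib
import hmac


def sign(body: bytes, secret: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Compare the received signature to the expected one in constant time."""
    return hmac.compare_digest(sign(body, secret), signature_hex)
```

Verify against the exact bytes received, before any JSON parsing, since re-serializing the body can change it and invalidate the signature.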

Architecture

Report intake: Receives the submitted expense report (line items, amounts, categories, dates, attached receipts) and the submitting employee/cost-center context.

Policy grounding: Loads the applicable written policy (limits, allowed categories, receipt thresholds, per-diems), the benchmark every line is audited against.

Receipt verification: Checks that required receipts are present and readable and that amounts/merchants match the claimed line, without approving items that lack required proof.

Duplicate & anomaly engine: Detects duplicate submissions (same receipt/amount/date) and anomalous patterns (overlapping travel, weekend/personal spend) as evidence for human review.

Line audit & decision gate: The model audits each line and a deterministic gate enforces the approval cap and receipt rules; over-cap totals and exceptions route to human approval.

Approval & routing: Approves compliant reports within limits, holds specific items, and routes rejections/fraud signals to finance with the cited evidence.

Audit trail & learning: Logs every decision with the cited rule and outcome for compliance, and feeds reviewer overrides back to refine checks.

Tools required

get_report: Fetch the expense report: line items, amounts, categories, dates, attached receipts, and submitter context.

policy_lookup: Return the applicable expense policy rules (limits, categories, receipt thresholds, per-diems) for the audit.

receipt_verify: Check that required receipts are present/readable and that amount and merchant match the claimed line.

duplicate_check: Detect duplicate submissions: the same receipt, or the same amount+date+merchant, across this and prior reports.

fraud_signals: Surface evidence-based anomaly patterns (overlapping travel, split charges, personal/weekend spend) for human review.

categorize: Map each line to the correct expense category and the policy rule that governs it.

approve_expense: Approve compliant lines/reports. Hard-capped: refuses calls for amounts above the configured auto-approval cap or with missing required receipts.

escalate_to_finance: Route to a human reviewer with the cited findings for over-cap totals, suspected fraud, or policy exceptions.
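
The matching logic behind a tool like duplicate_check can be sketched as two lookups, one by receipt id and one by amount+date+merchant. The function `find_duplicates`, the line shape, and the sample date and merchant below are hypothetical; only the report ids and receipt number echo the example later in this page.

```python
def find_duplicates(current: list[dict], prior: list[dict]) -> list[dict]:
    """Match current lines against prior ones by receipt id or amount+date+merchant.

    Each line is a dict with 'report_id', 'receipt_id' (may be None),
    'amount_usd', 'date', and 'merchant' (hypothetical shape).
    """
    by_receipt = {p["receipt_id"]: p for p in prior if p.get("receipt_id")}
    by_triple = {(p["amount_usd"], p["date"], p["merchant"].lower()): p for p in prior}
    matches = []
    for line in current:
        hit = by_receipt.get(line.get("receipt_id"))
        reason = "same receipt"
        if hit is None:
            hit = by_triple.get((line["amount_usd"], line["date"], line["merchant"].lower()))
            reason = "same amount+date+merchant"
        if hit is not None:
            # Evidence only: which reports collide and why. No claim about intent.
            matches.append({"report_id": line["report_id"], "matches": hit["report_id"], "reason": reason})
    return matches


prior = [{"report_id": "EXP-3119", "receipt_id": "A-7782", "amount_usd": 610, "date": "2026-05-12", "merchant": "Example Air"}]
current = [{"report_id": "EXP-3450", "receipt_id": "A-7782", "amount_usd": 610, "date": "2026-05-12", "merchant": "Example Air"}]
print(find_duplicates(current, prior))
# [{'report_id': 'EXP-3450', 'matches': 'EXP-3119', 'reason': 'same receipt'}]
```

The output names the colliding reports and the reason, which is the evidence a reviewer needs, and leaves the judgment about what happened to the human.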

Workflow

1. Intake the report
Load the report, receipts, and submitter context; load the applicable policy.
2. Audit each line
Check category, amount vs. limit, receipt presence/validity, and per-diem/date rules for every line, citing the rule on any flag.
3. Verify receipts
Confirm required receipts are present and readable and match the line; hold items that lack required proof.
4. Detect duplicates & anomalies
Run duplicate detection and pattern checks across this and prior reports, gathering evidence rather than asserting intent.
5. Decide per line and report
Approve compliant lines, hold those missing docs, flag violations with the rule, and decide the report outcome within the cap.
6. Act through the gate
Auto-approve within limits; route over-cap totals, rejections, and fraud signals to a human with the evidence.
7. Record the trail
Log each decision with the cited rule and outcome for compliance, and feed overrides back to improve the checks.
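
Step 7 can be sketched as an append to a JSON Lines file, one record per decision. This is a minimal illustration: `append_audit_record`, the record fields, and the file-based storage are assumptions, and a file opened in append mode is not tamper-proof, so a real deployment would rely on storage-level controls for the append-only guarantee.

```python
import json
from datetime import datetime, timezone


def append_audit_record(path: str, report_id: str, decision: str, line_findings: list[dict]) -> dict:
    """Append one decision to a JSON Lines file and return the record written."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "report_id": report_id,
        "decision": decision,
        # Keep the cited rule next to each outcome so the trail is defensible.
        "policy_refs": [f["rule"] for f in line_findings if f.get("rule")],
        "flags": [f["item"] for f in line_findings if f.get("status") != "ok"],
    }
    # Mode "a" only ever adds lines; existing records are never rewritten.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

The `flags` and `policy_refs` fields follow the two log categories named in the contract's audit section.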

Examples

Clean report within policy → auto-approve

A standard travel report: every line within limits, receipts attached, total under the cap.

Input

input

Report EXP-3310, total $214: hotel $180 (receipt), taxi $19 (receipt), lunch $15 (receipt). Policy: hotel<=300/night, meals<=60, receipts over $25.

Output

output

{
  "report_id": "EXP-3310",
  "decision": "APPROVE",
  "confidence": 0.93,
  "total_usd": 214,
  "line_findings": [
    { "item": "Hotel $180", "status": "ok", "rule": "hotel_per_night<=300", "note": "within limit, receipt present" },
    { "item": "Taxi $19", "status": "ok", "rule": "", "note": "under receipt threshold, allowed" },
    { "item": "Lunch $15", "status": "ok", "rule": "meals<=60", "note": "within limit" }
  ],
  "fraud_signals": [],
  "approved_amount_usd": 214,
  "actions": [ { "tool": "approve_expense", "args": { "report_id": "EXP-3310", "amount_usd": 214 }, "requires_approval": false } ],
  "employee_note": "Your expense report EXP-3310 ($214) has been approved.",
  "escalation": { "needed": false, "reason": "" }
}

Note: Every line is checked against the cited policy rule, receipts are present, total is under the $250 cap — so it auto-approves and documents the rule for each line. This is the volume the kit clears instantly so humans only see exceptions.

Out-of-policy item + missing receipt → hold the line, approve the rest

Mostly clean, but one meal is over the limit and a $40 item has no receipt. The agent holds those two lines and approves the rest.

Input

input

Report EXP-3402, total $322: hotel $190 (receipt), dinner $92 (receipt), supplies $40 (NO receipt). Policy: meals<=60, receipts required over $25.

Output

output

{
  "report_id": "EXP-3402",
  "decision": "HOLD",
  "confidence": 0.9,
  "total_usd": 322,
  "line_findings": [
    { "item": "Hotel $190", "status": "ok", "rule": "hotel_per_night<=300", "note": "within limit" },
    { "item": "Dinner $92", "status": "flag", "rule": "meals<=60", "note": "$32 over the per-meal limit; needs approval or adjustment" },
    { "item": "Supplies $40", "status": "hold", "rule": "receipt_required_over=25", "note": "no receipt attached for a $40 item" }
  ],
  "fraud_signals": [],
  "approved_amount_usd": 190,
  "actions": [ { "tool": "approve_expense", "args": { "report_id": "EXP-3402", "lines": ["hotel"], "amount_usd": 190 }, "requires_approval": false } ],
  "employee_note": "Approved the hotel ($190). Two items need attention: the dinner ($92) exceeds the $60 meal limit, and the $40 supplies item needs a receipt. Please add the receipt and confirm the meal overage with your manager.",
  "escalation": { "needed": false, "reason": "" }
}

Note: Instead of rejecting the whole report over two lines, the agent approves the compliant hotel and holds exactly the two problem items — each with the cited rule. The employee note is specific and actionable, not punitive.

Duplicate receipt across reports → escalate with evidence, no accusation

A receipt submitted on this report also appears on a prior one. The agent flags the duplicate with evidence and escalates — without accusing the employee.

Input

input

Report EXP-3450: flight $610 with receipt #A-7782. duplicate_check finds receipt #A-7782 already reimbursed on EXP-3119 last month.

Output

output

{
  "report_id": "EXP-3450",
  "decision": "ESCALATE",
  "confidence": 0.8,
  "total_usd": 610,
  "line_findings": [
    { "item": "Flight $610", "status": "flag", "rule": "duplicate_submission", "note": "receipt #A-7782 matches one already reimbursed on EXP-3119 (2026-05)" }
  ],
  "fraud_signals": ["Identical receipt #A-7782 ($610) appears on EXP-3119 (already reimbursed) and EXP-3450 — possible duplicate submission"],
  "approved_amount_usd": 0,
  "actions": [ { "tool": "escalate_to_finance", "args": { "queue": "duplicate_review", "evidence": ["EXP-3119", "EXP-3450", "receipt#A-7782"] }, "requires_approval": false } ],
  "employee_note": "We're reviewing report EXP-3450; the flight receipt appears to match one already reimbursed. Finance will follow up — this may simply be an accidental re-submission.",
  "escalation": { "needed": true, "reason": "Possible duplicate reimbursement — same receipt on two reports." }
}

Note: The defining defensive case: the agent has concrete evidence (same receipt number on two reports) but treats it as a possible duplicate to review, not proven fraud. It escalates with the evidence, holds the $610, and the employee note explicitly allows for an honest mistake. Evidence and fairness, never accusation.

Implementation notes

Enforce the auto-approval cap and receipt requirements in a deterministic gate; the model audits, the gate controls what can be approved without a human.
Cite the specific policy rule on every flag. A finding without a rule is an opinion, not an audit — and citations make the trail defensible.
Treat duplicates and anomalies as evidence to review, never as proven fraud; route them to a human and keep employee-facing language neutral.
Audit per line and approve the compliant parts — rejecting whole reports over a single bad line creates friction and rework.
Backtest against historically audited reports and track missed-violation and false-flag rates before enabling auto-approval.
Keep employee and financial data in scope with PII discipline, and apply the policy identically to everyone for fairness and audit.
Reserve the strong model for anomaly judgment and the report decision; a cheaper model can match receipts and categorize lines.

Variations

Basic

Audit & flag assistant

Audits each line against policy, verifies receipts, and returns flagged items with the cited rule and a recommendation for a reviewer. No auto-approval.

Advanced

Guarded auto-approval

Auto-approves clean reports within the cap, holds specific non-compliant lines, runs duplicate/anomaly detection, and escalates fraud signals and over-cap totals.

Enterprise

Governed spend audit

Adds multi-policy support, ERP/AP integration, full audit trails and SLAs, fraud-pattern analytics across employees, and check tuning from reviewer outcomes.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

README.md, system-prompt.md, setup-guide.md, tools.json, workflow.md, examples.md, .env.example, kit.json, run.py, LICENSE, NOTICE, starters/

Export

Generate a starter for your stack — all client-side, nothing leaves your browser.

Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).
