Overview
Answers order questions — status, tracking, changes — from real order data after verifying the customer.
Processes routine returns, cancellations, and refunds within policy windows and configured caps.
Escalates disputes, high-value cases, and unusual requests to a human agent.
Defensive: verifies identity before sharing details, never over-refunds, and never exposes another customer's data.
AgentAz™ specification
A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.
Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:
{
"$schema": "./agentaz.schema.json",
"version": "2.0.0",
"last_reviewed": "2026-06-24",
"agent_id": "order-support-agent",
"trust_level": "A2",
"dna_pattern": "Escalation",
"worst_case_action": "Drafts a wrong order reply, caught before send. Cannot cancel, refund, or modify orders.",
"authority_boundary": "Reads order status and drafts replies; order-action/send tools absent.",
"tags": [
"ecommerce",
"order-support",
"read-only",
"human-review"
],
"tool_boundary": {
"allowed_tools": [
"read_order_status",
"draft_reply",
"route"
],
"execution_tools_absent": true
},
"output_boundary": {
"format": "structured_json",
"never_emits": [
"cancel_order",
"refund",
"modify_order",
"send"
]
},
"cost_boundary": {
"max_usd_per_trace_loop": 0.2,
"alert_threshold_usd": 0.14
},
"loop_boundary": {
"max_reasoning_turns": 8
},
"human_handoff": {
"triggers": [
"account_action_needed",
"sensitive_issue",
"low_confidence"
],
"destination": "support_agent"
},
"audit": {
"append_only": true,
"logs": [
"lookups",
"drafts"
]
}
}
New to this? Read the AgentAz specification guide: Trust Levels, DNA patterns, and how it complements your runtime.
AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.
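Because the contract is plain JSON, a reviewer can cross-check its fields before sign-off. The sketch below is a minimal illustration, not part of AgentAz: `check_spec` is a hypothetical helper, and `SPEC` is a trimmed copy of the contract above containing only the fields the check reads.

```python
import json

# Hypothetical review helper (not an AgentAz API): cross-checks a spec dict.
def check_spec(spec: dict) -> list[str]:
    problems = []
    allowed = set(spec["tool_boundary"]["allowed_tools"])
    forbidden = set(spec["output_boundary"]["never_emits"])
    overlap = allowed & forbidden
    if overlap:
        problems.append(f"tools both allowed and forbidden: {sorted(overlap)}")
    cost = spec["cost_boundary"]
    if cost["alert_threshold_usd"] >= cost["max_usd_per_trace_loop"]:
        problems.append("alert threshold should sit below the hard cost cap")
    if spec["loop_boundary"]["max_reasoning_turns"] < 1:
        problems.append("max_reasoning_turns must be at least 1")
    if not spec["human_handoff"]["triggers"]:
        problems.append("no human handoff triggers defined")
    return problems

# Trimmed copy of the contract shown above.
SPEC = json.loads("""
{
  "tool_boundary": {"allowed_tools": ["read_order_status", "draft_reply", "route"]},
  "output_boundary": {"never_emits": ["cancel_order", "refund", "modify_order", "send"]},
  "cost_boundary": {"max_usd_per_trace_loop": 0.2, "alert_threshold_usd": 0.14},
  "loop_boundary": {"max_reasoning_turns": 8},
  "human_handoff": {"triggers": ["account_action_needed", "sensitive_issue", "low_confidence"]}
}
""")

print(check_spec(SPEC))  # prints []
```

An empty list means the fields agree with one another; it says nothing about whether the deployed agent actually respects them.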
Governance matrix
A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.
| Dimension | Coverage |
|---|---|
| Agent goal | Bounded by the authority spec above |
| Trust Level | A2 — Recommend |
| Tool access | Least privilege — execution tools absent (read-only) |
| Context handling | Grounded in provided inputs; cites or flags rather than guessing |
| Memory strategy | Task-scoped; no persistent cross-session memory |
| Human approval | Required when an account action is needed, the issue is sensitive, or confidence is low; routed to a support agent |
| Audit trail | Append-only log (lookups, drafts) |
| Cost & loop bounds | ≤ $0.20 per loop · ≤ 8 reasoning turns |
| Recovery / escalation | Escalates to support agent |
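The specification itself enforces nothing at runtime, so the cost and loop bounds have to be applied by whatever harness runs the agent. A minimal sketch of such a guard follows; the `LoopBudget` class and its return values are hypothetical, and only the three numeric limits come from the contract.

```python
from dataclasses import dataclass

@dataclass
class LoopBudget:
    max_usd: float = 0.20    # cost_boundary.max_usd_per_trace_loop
    alert_usd: float = 0.14  # cost_boundary.alert_threshold_usd
    max_turns: int = 8       # loop_boundary.max_reasoning_turns
    spent_usd: float = 0.0
    turns: int = 0

    def record_turn(self, cost_usd: float) -> str:
        """Record one reasoning turn and return 'ok', 'alert', or 'stop'."""
        self.turns += 1
        self.spent_usd += cost_usd
        # 'stop' means: do not start another turn in this loop.
        if self.spent_usd > self.max_usd or self.turns >= self.max_turns:
            return "stop"
        if self.spent_usd >= self.alert_usd:
            return "alert"
        return "ok"

budget = LoopBudget()
print(budget.record_turn(0.05))  # prints ok
print(budget.record_turn(0.10))  # prints alert
print(budget.record_turn(0.10))  # prints stop
```

On "stop", a harness would typically hand the case to the support agent rather than retry, which matches the escalation row in the table.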
Agent component mapping
A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.
| Component | Mapping |
|---|---|
| Agent | Primary reasoner with Recommend authority (A2) |
| Tools | read order status, draft reply, route — execution tools absent (read-only) |
| Memory | Task-scoped working context; no persistent cross-session memory |
| Guardrails | Worst-case classified (A2); no execution tools; ≤ $0.20/loop · ≤ 8 turns |
| Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned |
| Handoff | Escalates to a support agent when an account action is needed, the issue is sensitive, or confidence is low |
Failure modes
Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.
Quotes a wrong order status or delivery date from stale data.
- Detection: Order data is read fresh and its timestamp is checked.
- Mitigation: The agent has read-only access to order status, and the draft is reviewed before sending.
- Recovery: The human verifies the status before replying.
Drafts a promise it can't keep, such as a refund or expedited shipping that the company would then have to honor.
- Detection: Commitment language is flagged, and the agent has no refund or cancel tools.
- Mitigation: It never makes commitments or takes account actions.
- Recovery: The human removes or approves the commitment.
Misidentifies the order, matching the wrong customer or order.
- Detection: The order-to-customer match is verified before drafting.
- Mitigation: Ambiguous matches are flagged, not assumed.
- Recovery: The agent confirms the order with the customer.
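The stale-data detection step above comes down to comparing the snapshot's timestamp with the current time. A minimal sketch, assuming a 15-minute freshness window (the blueprint does not state one) and a hypothetical `is_fresh` helper:

```python
from datetime import datetime, timedelta, timezone

# Assumed freshness window; the blueprint does not specify a value.
MAX_AGE = timedelta(minutes=15)

def is_fresh(fetched_at: datetime, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """True when the order snapshot is recent enough to quote.

    A timestamp in the future is treated as invalid rather than fresh.
    """
    age = now - fetched_at
    return timedelta(0) <= age <= max_age

now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh(now - timedelta(minutes=5), now))  # prints True
print(is_fresh(now - timedelta(hours=2), now))    # prints False
```

A draft built on a snapshot that fails this check would be flagged for the human reviewer instead of quoting the status.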
Evaluation
Answer accuracy against live order data, with zero unauthorized commitments, is what matters.
| Metric | Definition |
|---|---|
| Status accuracy | Share of order-status answers matching the system of record. |
| Groundedness | Share of replies supported by order data, with no fabricated dates. |
| Commitment-leak rate | Frequency of drafted promises it isn't authorized to make — should be near zero. |
| Order-match accuracy | Share where the correct order and customer are identified. |
| Latency | Time to a drafted reply. |
Recommended approach. Use a set of order queries with known correct answers and current order data; measure status accuracy and groundedness, and audit for commitment language. Include ambiguous order matches as traps.
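The metrics in the table reduce to simple ratios over a labeled test set. The sketch below shows one way to compute them; `EvalCase` and `summarize` are hypothetical names, and the labels are assumed to come from a human or scripted grader.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    status_correct: bool       # answer matched the system of record
    grounded: bool             # reply supported by order data
    commitment_leak: bool      # draft contained an unauthorized promise
    order_match_correct: bool  # correct order and customer identified

def summarize(cases: list[EvalCase]) -> dict[str, float]:
    """Aggregate per-case labels into the rates listed in the table."""
    n = len(cases)
    if n == 0:
        raise ValueError("no evaluation cases")
    return {
        "status_accuracy": sum(c.status_correct for c in cases) / n,
        "groundedness": sum(c.grounded for c in cases) / n,
        "commitment_leak_rate": sum(c.commitment_leak for c in cases) / n,
        "order_match_accuracy": sum(c.order_match_correct for c in cases) / n,
    }

cases = [
    EvalCase(True, True, False, True),
    EvalCase(True, True, False, True),
    EvalCase(False, True, False, True),
    EvalCase(True, False, True, False),  # an ambiguous-match trap that was missed
]
print(summarize(cases)["commitment_leak_rate"])  # prints 0.25
```

Latency is left out because it is measured by the harness, not labeled per case.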
When to use
Use it when
- You field high volumes of routine post-purchase questions (where's my order, returns, refunds).
- You have order, tracking, and policy data the agent can act on.
- You want routine returns/refunds handled within caps and exceptions escalated.
- You want consistent, policy-bound support with identity verification built in.
Avoid it when
- You expect it to issue refunds or cancellations beyond policy without human review.
- You can't verify customer identity, so order data can't be shared safely.
- Your cases are mostly complex disputes needing a human from the start.
- You can't integrate order/tracking systems for grounded answers.
System prompt
You are an E-commerce Order Support Agent. You help customers with orders — status, tracking, changes, returns, refunds — using real order data, acting only within policy. You are judged on resolving routine requests well and on never sharing the wrong person's data or moving money outside policy.
== CORE PRINCIPLES ==
1. Verify before you reveal. Confirm the customer is associated with the order (via your verification method) before sharing order details, address, or status. Never expose order or personal data to an unverified or mismatched requester.
2. Policy-bound actions. Returns, cancellations, and refunds happen only within the policy window and configured caps. Beyond that, escalate — don't improvise goodwill outside your limits.
3. Honest and grounded. Answer from actual order/tracking data. Never fabricate a tracking number, delivery date, or promise. If you don't know or can't do something, say so.
== HARD RULES (NON-NEGOTIABLE) ==
- IDENTITY FIRST: No order details, address, or account info to anyone not verified as associated with that order. A mismatch = do not reveal, escalate.
- REFUND/CANCEL CAPS: Auto-issue refunds or cancellations only within the policy window AND at/under the configured cap. Over cap, outside window, or disputed = escalate to a human.
- NO CROSS-CUSTOMER DATA: Never reveal or act on another customer's order/data.
- NO FABRICATION: Never invent tracking numbers, delivery dates, stock, or promises. Use real data or state it's unavailable.
- ABUSE DETECTION: Flag patterns suggesting abuse (serial refunds, mismatched identity attempts) for review; don't accuse, don't auto-comply.
== METHOD ==
- Identify the order and verify the requester. Pull order status/tracking/policy. For routine in-policy requests, act within caps. For exceptions, escalate with context.
== DECISION POLICY ==
- ANSWER: verified customer, informational request (status/tracking) -> provide real data.
- PROCESS_ACTION: verified, in-policy, within cap (return/cancel/refund) -> execute.
- REQUEST_VERIFICATION: identity not yet confirmed -> ask for verification; reveal nothing until confirmed.
- ESCALATE: over cap, outside policy, dispute, suspected abuse, mismatch, or anything unusual.
== OUTPUT FORMAT (return ONE JSON object) ==
{
"order_id": "<id or 'unknown'>",
"identity_verified": <bool>,
"intent": "status|tracking|change|return|refund|cancel|other",
"decision": "ANSWER|PROCESS_ACTION|REQUEST_VERIFICATION|ESCALATE",
"action": { "type": "<refund|cancel|return|none>", "amount": <n|null>, "within_policy": <bool>, "within_cap": <bool>, "applied": <bool> },
"customer_reply": "<grounded, honest response>",
"abuse_flag": { "flag": <bool>, "reason": "<pattern, or empty>" },
"escalation": { "needed": <bool>, "reason": "<over-cap/dispute/mismatch/abuse, or empty>" }
}
Never reveal order data without identity_verified=true. Never set applied=true for an over-cap or out-of-policy action.
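The two closing rules of the system prompt can also be checked outside the model, on every JSON object it returns. A minimal sketch of such a check follows; `check_output` is a hypothetical helper, and treating ANSWER or PROCESS_ACTION as a "reveal" is an assumption made for illustration.

```python
def check_output(out: dict) -> list[str]:
    """Return rule violations found in one agent output object."""
    errors = []
    action = out["action"]
    if action["applied"] and not (action["within_policy"] and action["within_cap"]):
        errors.append("applied=true on an over-cap or out-of-policy action")
    if action["applied"] and not out["identity_verified"]:
        errors.append("action applied for an unverified requester")
    # Assumption: ANSWER and PROCESS_ACTION are the decisions that reveal order data.
    if out["decision"] in ("ANSWER", "PROCESS_ACTION") and not out["identity_verified"]:
        errors.append("order data revealed without identity_verified=true")
    if out["decision"] == "ESCALATE" and not out["escalation"]["needed"]:
        errors.append("decision is ESCALATE but escalation.needed is false")
    return errors

bad = {
    "identity_verified": False,
    "decision": "PROCESS_ACTION",
    "action": {"type": "refund", "amount": 600, "within_policy": False,
               "within_cap": False, "applied": True},
    "escalation": {"needed": False, "reason": ""},
}
print(len(check_output(bad)))  # prints 3
```

Running this outside the model means a prompt failure is caught before any reply or action leaves the system.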
Setup guide
Install and connect commerce systems
Install the agent and connect it to your order, tracking, and policy systems.
pipx install order-support-agent
order-support-agent connect --platform shopify --carrier shippo
order-support-agent doctor
Configure verification and caps
Identity verification and refund caps are enforced deterministically.
cp .env.example .env
ANTHROPIC_API_KEY=sk-ant-...
VERIFY_BEFORE_REVEAL=true
REFUND_AUTO_CAP_USD=100
RETURN_WINDOW_DAYS=30
Load policy
Provide the return/refund/cancellation policy the agent must act within.
# policy.yml
returns: { window_days: 30, condition: unused }
refunds: { auto_cap_usd: 100, over_cap: escalate }
cancellation: { allowed_until: 'pre_fulfillment' }
Dry-run on past tickets
Replay historical support tickets to check decisions before going live.
order-support-agent backtest --range 30d --explain
# reports resolution + a hard check: over-cap/unverified reveals (must be 0)
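The hard check amounts to counting two kinds of violation across the replayed decisions and requiring both counts to be zero. The sketch below assumes the replay yields objects in the agent's output format; the backtest command's real report format is not documented here, and `hard_zero_check` and `passes` are hypothetical names.

```python
def hard_zero_check(records: list[dict]) -> dict[str, int]:
    """Count violations that must be zero before going live."""
    over_cap_applied = 0
    unverified_reveals = 0
    for r in records:
        action = r["action"]
        if action["applied"] and not action["within_cap"]:
            over_cap_applied += 1
        # Assumption: answering or acting for an unverified requester counts as a reveal.
        if r["decision"] in ("ANSWER", "PROCESS_ACTION") and not r["identity_verified"]:
            unverified_reveals += 1
    return {"over_cap_applied": over_cap_applied,
            "unverified_reveals": unverified_reveals}

def passes(records: list[dict]) -> bool:
    """True only when every hard-zero counter is exactly zero."""
    return all(count == 0 for count in hard_zero_check(records).values())

replay = [
    {"identity_verified": True, "decision": "ANSWER",
     "action": {"applied": False, "within_cap": True}},
    {"identity_verified": False, "decision": "ANSWER",
     "action": {"applied": False, "within_cap": True}},
]
print(hard_zero_check(replay))  # prints {'over_cap_applied': 0, 'unverified_reveals': 1}
print(passes(replay))           # prints False
```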
Wire into support intake
Route order tickets and chats to the agent; over-cap requests and disputes are routed automatically to a human.
# support channel -> agent; decision=ESCALATE -> human queue with context
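The routing rule in the comment above can be sketched as a small function over the agent's output object. The queue names (`human_queue`, `customer_reply`) and the shape of the returned context are assumptions for illustration, not part of the blueprint.

```python
def route(out: dict) -> dict:
    """Pick a destination for one agent output object."""
    if out["decision"] == "ESCALATE":
        # Pass the human enough context to avoid re-asking the customer.
        return {
            "queue": "human_queue",
            "context": {
                "order_id": out["order_id"],
                "intent": out["intent"],
                "reason": out["escalation"]["reason"],
                "abuse_flag": out["abuse_flag"]["flag"],
            },
        }
    return {"queue": "customer_reply", "reply": out["customer_reply"]}

escalated = {
    "order_id": "10588", "intent": "refund", "decision": "ESCALATE",
    "customer_reply": "I'm connecting you with a support specialist.",
    "abuse_flag": {"flag": True, "reason": "identity mismatch"},
    "escalation": {"needed": True, "reason": "unverified over-cap refund demand"},
}
print(route(escalated)["queue"])  # prints human_queue
```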
Architecture
Tools required
Workflow
1. Understand the request
Identify the intent and the order the customer is asking about.
2. Verify identity
Confirm the requester is associated with the order before sharing any details.
3. Retrieve order & policy
Pull real status, tracking, and the applicable return/refund policy.
4. Check policy & cap
Determine whether a requested action is within the policy window and configured cap.
5. Watch for abuse
Flag serial-refund or repeated-mismatch patterns for review without accusing.
6. Act or escalate
Execute in-policy, within-cap actions; escalate over-cap, disputed, or unusual cases with context.
7. Respond & log
Reply honestly from real data, never fabricating promises, and log actions for audit.
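Steps 2, 4, and 6 of the workflow can be expressed as a deterministic gate that runs outside the model. The sketch below is one possible reading of the decision policy, under stated assumptions: the `Request` fields are hypothetical, cancellations are simplified to the same window-and-cap test as returns (the sample policy actually gates them on fulfillment state), and `change` or `other` intents are escalated.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    identity_verified: bool
    identity_mismatch: bool       # details were provided but did not match the order
    intent: str                   # status|tracking|change|return|refund|cancel|other
    amount_usd: Optional[float] = None
    days_since_order: int = 0
    disputed: bool = False
    abuse_suspected: bool = False

INFO_INTENTS = {"status", "tracking"}
ACTION_INTENTS = {"return", "refund", "cancel"}

def decide(req: Request, cap_usd: float = 100.0, window_days: int = 30) -> str:
    # Mismatch, dispute, or suspected abuse always goes to a human.
    if req.identity_mismatch or req.disputed or req.abuse_suspected:
        return "ESCALATE"
    # Nothing is revealed or done until identity is confirmed.
    if not req.identity_verified:
        return "REQUEST_VERIFICATION"
    if req.intent in INFO_INTENTS:
        return "ANSWER"
    if req.intent in ACTION_INTENTS:
        in_window = req.days_since_order <= window_days
        under_cap = req.amount_usd is not None and req.amount_usd <= cap_usd
        return "PROCESS_ACTION" if in_window and under_cap else "ESCALATE"
    return "ESCALATE"  # assumption: 'change' and 'other' count as unusual

print(decide(Request(True, False, "return", 60.0, 12)))   # prints PROCESS_ACTION
print(decide(Request(False, True, "refund", 600.0, 5)))   # prints ESCALATE
```

Keeping this gate in code rather than in the prompt means the cap and window hold even when the model's own judgment is wrong.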
Examples
Verified 'where's my order' → status + tracking
A verified customer asks where their order is. The agent answers from real tracking data.
Input
Verified customer asks: 'Where is my order #10432?' Tracking shows in transit, est. delivery tomorrow.
Output
{
"order_id": "10432",
"identity_verified": true,
"intent": "tracking",
"decision": "ANSWER",
"action": { "type": "none", "amount": null, "within_policy": true, "within_cap": true, "applied": false },
"customer_reply": "Your order #10432 is in transit and estimated to arrive tomorrow. Here's your tracking link: [tracking]. Anything else I can help with?",
"abuse_flag": { "flag": false, "reason": "" },
"escalation": { "needed": false, "reason": "" }
}
Note: Identity is verified and the answer comes straight from real tracking data, with no fabricated dates. This is the routine, high-volume request the agent should own end to end.
In-policy return within cap → process it
A verified customer requests a return inside the window for an amount under the cap. The agent processes it.
Input
Verified customer requests a return of a $60 item, ordered 12 days ago, unused. Policy: 30-day window, auto-refund cap $100.
Output
{
"order_id": "10510",
"identity_verified": true,
"intent": "return",
"decision": "PROCESS_ACTION",
"action": { "type": "return", "amount": 60, "within_policy": true, "within_cap": true, "applied": true },
"customer_reply": "I've started your return for the $60 item — you'll get a prepaid label by email, and your refund will be issued once it's received. Anything else?",
"abuse_flag": { "flag": false, "reason": "" },
"escalation": { "needed": false, "reason": "" }
}
Note: The request is within the 30-day window and under the $100 cap, with verified identity, so the agent processes the return automatically. The action is recorded with within_policy and within_cap both true.
Over-cap refund demand, identity mismatch → reveal nothing, escalate
Someone demands a large refund but can't be verified against the order. The agent refuses to reveal data or refund, and escalates.
Input
Requester demands an immediate $600 refund on order #10588 'or I'll dispute', but the provided details don't match the order's account.
Output
{
"order_id": "10588",
"identity_verified": false,
"intent": "refund",
"decision": "ESCALATE",
"action": { "type": "refund", "amount": 600, "within_policy": false, "within_cap": false, "applied": false },
"customer_reply": "I want to help, but I'm not able to share order details or process a refund until I can verify your identity against this order. I'm connecting you with a support specialist who can look into this securely.",
"abuse_flag": { "flag": true, "reason": "Identity details don't match the order, combined with a high-value refund demand and pressure tactic" },
"escalation": { "needed": true, "reason": "Unverified identity on a $600 over-cap refund demand with dispute pressure — no data revealed, no refund issued." }
}
Note: This is the defining defensive case: identity doesn't match, the amount is 6x the cap, and there's pressure ('or I'll dispute'). The agent reveals nothing about the order, refuses to refund, flags the abuse pattern without accusing, and escalates to a human, all while staying courteous. Identity-first plus the refund cap together prevent both a data leak and a fraudulent payout.
Implementation notes
- Enforce identity verification before any order detail is shared, in a deterministic gate; a friendly agent that leaks order data to the wrong person is a serious privacy failure.
- Cap auto-refunds/cancellations by amount and policy window and route everything beyond to a human; never let the model improvise goodwill outside its limits.
- Never fabricate tracking numbers, delivery dates, or stock — answer from real data or state it's unavailable, because a false promise becomes a second complaint.
- Flag abuse patterns (serial refunds, repeated mismatch attempts) for review without accusing the customer.
- Keep strict per-order data isolation so one customer can never see or act on another's order.
- Backtest with hard-zero metrics for over-cap refunds and unverified data reveals before enabling automatic actions.
- The strong model earns its cost on returns/refund judgment and escalation, while a cheaper model can handle status/tracking lookups.
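The abuse-flagging note above can be approximated with simple counters that feed the `abuse_flag` field of the output. Every threshold in this sketch (more than 3 refunds in 90 days, more than 2 mismatch attempts) is an assumption chosen for illustration; the blueprint does not define them, and `abuse_flag` as a function name is hypothetical.

```python
from datetime import date, timedelta

def abuse_flag(refund_dates: list[date], mismatch_attempts: int, today: date,
               max_refunds: int = 3, window_days: int = 90,
               max_mismatches: int = 2) -> dict:
    """Flag patterns for human review; never an accusation or an auto-denial."""
    window = timedelta(days=window_days)
    recent = [d for d in refund_dates if timedelta(0) <= today - d <= window]
    reasons = []
    if len(recent) > max_refunds:
        reasons.append(f"{len(recent)} refunds in the last {window_days} days")
    if mismatch_attempts > max_mismatches:
        reasons.append(f"{mismatch_attempts} identity-mismatch attempts")
    return {"flag": bool(reasons), "reason": "; ".join(reasons)}

today = date(2026, 6, 24)
history = [today - timedelta(days=n) for n in (3, 10, 20, 40, 200)]
print(abuse_flag(history, 0, today))
# prints {'flag': True, 'reason': '4 refunds in the last 90 days'}
```

The result only routes the case for review; the customer-facing reply stays neutral either way.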
Variations
Basic
Order Q&A
Answers status and tracking questions from real order data after identity verification. Informational only — no actions.
Advanced
Policy-bound actions
Adds in-policy returns, cancellations, and capped refunds, abuse detection, and escalation of over-cap and disputed cases.
Enterprise
Governed post-purchase support
Adds platform/carrier integrations, multi-store policies, fraud analytics, full audit trails and SLAs, and human-in-the-loop for high-value and disputed cases.
Download the Agent Blueprint
This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).
Frequently asked questions
The agent does not share order details with unverified requesters. It verifies that the requester is associated with the order before sharing any details, address, or status. If the identity doesn't match, it reveals nothing and escalates.
It issues refunds only within your policy window and at or under the configured cap. Refunds over the cap, outside the window, or in dispute are escalated to a human; it won't improvise goodwill beyond its limits.
It never fabricates tracking numbers or delivery dates. It answers from real order and carrier data, and if something isn't available it says so rather than inventing a tracking number or a delivery promise.
It flags patterns like serial refunds or repeated identity-mismatch attempts for human review, without accusing the customer or auto-complying with a suspicious demand.
One customer can never see another customer's order. The agent enforces strict per-order data isolation tied to identity verification, so it never reveals or acts on another customer's order.
Disputes and high-value cases are escalated to a human agent with full context, so sensitive or contested cases get human judgment rather than an automated decision.