
Phishing Triage & Response Agent

Flagship Blueprint · AgentAz™ Enhanced

Includes Agent Blueprint + Implementation Guide

An agent that works the user-reported phishing queue the way a good security analyst would: it enriches the reported email's indicators, detonates URLs and attachments in a sandbox, checks sender authentication, scopes how many other people got it, and then quarantines, blocks, and responds — or escalates. It is defensive by design: it never detonates anything outside the sandbox, never mass-purges across the org without approval, never marks an email 'safe' on weak evidence, and escalates targeted spear-phishing and business-email-compromise to humans with the full evidence trail.

Tags: phishing, email-security, soc, incident-response, threat-intel, autonomous-agent, bec, security, agentaz, agent-governance, trust-level, production-readiness
Stack: Claude, LangGraph, OpenAI
Difficulty: Advanced
Setup: 50 min
Version: 2.0.0 · 2026-06-21

Overview

Enrich → detonate → scope → respond: turns a reported email into a verdict, a campaign view, and a contained incident.

Safe analysis: URLs and attachments are detonated only in a sandbox, and sender auth (SPF/DKIM/DMARC) and reputation ground the verdict.

Campaign-aware: it finds who else received the message so response covers the whole blast, not just one inbox.

Defensive: no org-wide purge without approval, no 'safe' verdict on weak evidence, and targeted BEC/spear-phishing escalates to humans.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level: A3 (Human-Approved)
DNA Pattern: Escalation (Research → Evaluate → Plan → Escalate)
Worst-Case Action: Stages an incorrect containment action (such as a quarantine) that an analyst approves before it runs, or misclassifies a reported email. It cannot quarantine, block, or delete on its own; those actions require human approval.
Authority Boundary: Analyzes a reported email, extracts indicators, scores risk, and stages a recommended containment action for analyst approval. It never quarantines, blocks senders, or deletes mail autonomously. An analyst approves any action.
Verification Test: Stage a containment action → confirm it requires explicit analyst approval and is not auto-executed; confirm destructive actions are gated.
Production Readiness: 6/6 dimensions passing. Tool isolation: containment actions gated behind approval. Human gates: an analyst approves. Confidence escalation: uncertain verdicts routed up. Cost ceiling: bounded per report. Audit trail: indicators and decisions logged. Escalation path: confirmed threats routed to the SOC.
Last Reviewed: 2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json
{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "phishing-triage-agent",
  "trust_level": "A3",
  "dna_pattern": "Escalation",
  "worst_case_action": "Stages an incorrect quarantine for analyst approval, or misclassifies an email. Cannot auto-contain.",
  "authority_boundary": "Analyzes reported phishing and stages containment for approval; no autonomous quarantine/block.",
  "tags": [
    "security",
    "phishing",
    "soc",
    "human-approval"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "read_report",
      "extract_indicators",
      "score_risk",
      "stage_containment"
    ],
    "approval_required_tools": [
      "quarantine",
      "block_sender"
    ],
    "execution_tools_absent": false
  },
  "output_boundary": {
    "format": "structured_json",
    "never_without_approval": [
      "quarantine",
      "block_sender",
      "delete_mail"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.25,
    "alert_threshold_usd": 0.16
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "confirmed_threat",
      "uncertain_verdict",
      "containment_proposed"
    ],
    "destination": "soc_analyst"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "indicators",
      "verdict",
      "staged_actions",
      "approvals"
    ]
  }
}
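The contract is documentation, not enforcement, so a runtime still needs its own gate. A minimal sketch of such a gate in Python, assuming a hypothetical `authorize` helper and a trimmed copy of the fields above (treating `never_without_approval` as a second approval list is also an assumption):

```python
import json

# Trimmed copy of the tool_boundary / output_boundary fields from agentaz.json.
CONTRACT = json.loads("""
{
  "tool_boundary": {
    "allowed_tools": ["read_report", "extract_indicators", "score_risk", "stage_containment"],
    "approval_required_tools": ["quarantine", "block_sender"]
  },
  "output_boundary": {
    "never_without_approval": ["quarantine", "block_sender", "delete_mail"]
  }
}
""")


def authorize(tool: str, approved_by_analyst: bool, contract: dict = CONTRACT) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a proposed tool call."""
    boundary = contract["tool_boundary"]
    # Assumption: anything in either approval list is gated the same way.
    gated = set(boundary["approval_required_tools"]) | set(
        contract["output_boundary"]["never_without_approval"]
    )
    if tool in gated:
        return "allow" if approved_by_analyst else "needs_approval"
    if tool in boundary["allowed_tools"]:
        return "allow"
    return "deny"  # tools not named in the contract are out of bounds
```

Denying unknown tools by default keeps the gate aligned with the authority boundary even when a new tool is added to the runtime but not to the contract.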

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

This is a flagship reference blueprint for AgentAz v1.0.0. AgentAz™ is open source under Apache-2.0 (spec text under CC‑BY‑4.0) — schema and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goal: Bounded by the authority spec above
Trust Level: A3 (Human-Approved)
Tool access: Scoped tools; high-risk actions gated behind approval
Context handling: Grounded in provided inputs; cites or flags rather than guessing
Memory strategy: Task-scoped; no persistent cross-session memory
Human approval: Required on confirmed threat, uncertain verdict, or proposed containment → SOC analyst
Audit trail: Append-only log (indicators, verdict, staged actions, approvals)
Cost & loop bounds: ≤ $0.25 per loop · ≤ 8 reasoning turns
Recovery / escalation: Escalates to the SOC analyst
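The cost and loop bounds can be tracked with a small counter that lives outside the model. This is a sketch only: `LoopBudget` and its return values are hypothetical names, and the per-turn cost is assumed to come from your own usage accounting.

```python
from dataclasses import dataclass


@dataclass
class LoopBudget:
    max_usd: float = 0.25    # cost_boundary.max_usd_per_trace_loop
    alert_usd: float = 0.16  # cost_boundary.alert_threshold_usd
    max_turns: int = 8       # loop_boundary.max_reasoning_turns
    spent_usd: float = 0.0
    turns: int = 0

    def record_turn(self, cost_usd: float) -> str:
        """Record one reasoning turn and return 'ok', 'alert', or 'escalate'."""
        self.turns += 1
        self.spent_usd += cost_usd
        if self.spent_usd > self.max_usd or self.turns > self.max_turns:
            return "escalate"  # hand off with the evidence gathered so far
        if self.spent_usd >= self.alert_usd:
            return "alert"
        return "ok"
```

Returning a status instead of raising lets the caller decide how to package the partial evidence for the SOC analyst.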

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

Agent: Primary reasoner with Human-Approved authority (A3)
Tools: read report, extract indicators, score risk, stage containment; approval-gated: quarantine, block sender
Memory: Task-scoped working context; no persistent cross-session memory
Guardrails: Worst-case classified (A3); high-risk actions gated; ≤ $0.25/loop · ≤ 8 turns
Evaluator: Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
Handoff: Escalates to the SOC analyst on confirmed threat, uncertain verdict, or proposed containment

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Misclassifies a benign email as phishing, triggering unnecessary containment.

Detection
A confidence score is attached and containment is staged, not auto-run.
Mitigation
An analyst approves any quarantine or block.
Recovery
The staged action is reversed and the message is un-quarantined.

Misses a real phish (false negative).

Detection
Uncertain verdicts are routed up; indicators are extracted regardless of verdict.
Mitigation
Human-in-the-loop on uncertain cases; the agent never auto-clears.
Recovery
A post-report retro adds the missed indicator set to the rules.

Indicator extraction is run against a weaponized attachment.

Detection
Analysis runs in a sandbox with no execution.
Mitigation
Static indicator extraction only; live payloads are never opened.
Recovery
The contained sample is escalated to the SOC.

Evaluation

Recall on true phishing is primary — a missed phish is the costly error — balanced against false-positive containment.

Verdict accuracy: Share of reported emails classified correctly as phishing or benign.
Recall: Of true phishing emails, the share correctly identified (weighted high).
Precision: Of emails flagged for containment, the share genuinely malicious (false-positive resistance).
Indicator accuracy: Correctness of the extracted indicators of compromise.
Latency: Time to a verdict per reported email.

Recommended approach. Use a labeled corpus of reported emails — phishing, benign, and edge cases — and measure recall and precision separately. Verify staged containment never auto-runs, and check extracted indicators against known IOC feeds.
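Measuring recall and precision separately could look like the following. It is a minimal sketch that assumes the corpus has already been reduced to two parallel lists of booleans (ground-truth label and agent flag); `precision_recall` is a hypothetical helper name.

```python
from typing import List, Tuple


def precision_recall(is_phish: List[bool], flagged: List[bool]) -> Tuple[float, float]:
    """Compute (precision, recall) for phishing detection on a labeled corpus."""
    pairs = list(zip(is_phish, flagged))
    tp = sum(1 for truth, flag in pairs if truth and flag)
    fp = sum(1 for truth, flag in pairs if not truth and flag)
    fn = sum(1 for truth, flag in pairs if truth and not flag)  # missed phish
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Five labeled reports: three real phish, two benign.
labels = [True, True, True, False, False]
flags = [True, True, False, True, False]
precision, recall = precision_recall(labels, flags)
```

In this toy corpus one phish is missed and one benign email is flagged, so both metrics come out at two thirds; the missed phish is the error the section treats as more costly.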

When to use

Use it when

  • Your security team triages a high volume of user-reported suspicious emails and most of the work is enrichment and obvious-case handling.
  • You have a mail platform, a sandbox, and a threat-intel source the agent can use for safe detonation and reputation lookups.
  • You want consistent verdicts with an evidence trail and campaign scoping, plus a reply back to the reporter.
  • You want to auto-handle the clear phishing and clear-safe cases while routing targeted attacks to humans.

Avoid it when

  • You have no sandbox or threat-intel for safe analysis — verdicts would be guesses.
  • You expect it to run full incident response on confirmed BEC autonomously; that needs humans.
  • You can't give it scoped mail-platform access for quarantine actions.
  • You are unwilling to keep approval gates on mass quarantine/purge and on high-impact blocks.

System prompt

system-prompt.md
You are a Phishing Triage Agent handling one user-reported suspicious email. Your job is to determine what it is, scope its spread, and respond safely — or escalate. You are judged on catching real phishing (never clearing a true threat), cutting false alarms, and never taking an unsafe analysis or response action.

== CORE PRINCIPLES ==
1. Evidence-based verdict. Base your classification on what you gathered — sender authentication (SPF/DKIM/DMARC), URL/domain reputation, sandbox detonation results, header anomalies, and content signals. Cite them. Never guess 'safe'.
2. Analyze safely. Detonate links and attachments ONLY in the sandbox. Never fetch, click, or open a suspect URL/attachment in the live environment, and never echo live malicious payloads.
3. Scope before you respond. A reported email is often one of many. Find the campaign so response covers everyone affected, not just the reporter.

== HARD RULES (NON-NEGOTIABLE) ==
- SANDBOX ONLY: All detonation/analysis of URLs and attachments happens in the sandbox. No live interaction with malicious infrastructure.
- MASS ACTION NEEDS APPROVAL: You may auto-quarantine the reported message and a tightly-scoped, high-confidence campaign on non-critical mailboxes. Org-wide purge, action affecting executives/critical mailboxes, or anything large-blast-radius REQUIRES human approval — propose it.
- NEVER FALSE-CLEAR: Mark an email 'safe' only with positive evidence (auth pass + known-good sender + clean indicators). Mixed or insufficient evidence is 'suspicious' → escalate, not cleared.
- BEC / SPEAR-PHISH → ESCALATE: Targeted impersonation (executive, vendor, wire/payment request), even with few classic indicators, is high-risk. Do not auto-classify-and-close; escalate to the SOC and warn about the requested action (e.g. wire).
- DATA HANDLING: Treat email content as sensitive; redact credentials/PII; stay within scope.

== RESPONSE POLICY (calibrated confidence 0.0-1.0) ==
- AUTO_CONTAIN: confirmed phishing/malicious, confidence >= 0.85, scoped to non-critical mailboxes. Quarantine the campaign, block indicators (URL/domain/sender), notify reporter.
- CLEAR (safe): positive benign evidence, confidence >= 0.85. Reassure the reporter; no action.
- PROPOSE: real but large-blast-radius response (org-wide purge, exec mailboxes). Recommend with evidence for one-click approval.
- ESCALATE: BEC/spear-phish, credential-harvest where users may have already entered creds, conflicting evidence, or confidence < 0.6.

== COST CONTROL ==
Enrich only the indicators that change the verdict; reuse results already gathered. One good detonation beats many redundant lookups. Cap tool calls; if exceeded, escalate with what you have.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "verdict": "phishing|malicious|spam|safe|suspicious",
  "confidence": <0.0-1.0>,
  "evidence": ["<auth/reputation/sandbox/header signals>"],
  "campaign": "<scope: how many recipients / similar messages, or 'single'>",
  "decision": "AUTO_CONTAIN|CLEAR|PROPOSE|ESCALATE",
  "actions": [ { "tool": "<tool>", "args": { ... }, "requires_approval": <bool> } ],
  "reporter_reply": "<short, clear message to the user who reported it>",
  "analyst_note": "<summary + cited evidence + any user-impact note>",
  "escalation": { "needed": <bool>, "to": "soc|none", "reason": "<why, or empty>" }
}
If verdict is suspicious or evidence is mixed, do NOT CLEAR — ESCALATE or PROPOSE.
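Because the prompt alone cannot guarantee that the returned object follows the response policy, a consistency check outside the model is a reasonable complement. The sketch below encodes only the thresholds stated in the prompt; `check_decision` is a hypothetical name, and the band between 0.6 and 0.85, which the policy does not spell out, is deliberately left unconstrained here.

```python
VERDICTS = {"phishing", "malicious", "spam", "safe", "suspicious"}
DECISIONS = {"AUTO_CONTAIN", "CLEAR", "PROPOSE", "ESCALATE"}


def check_decision(result: dict) -> list:
    """Return a list of policy violations; an empty list means consistent."""
    problems = []
    verdict = result.get("verdict")
    decision = result.get("decision")
    confidence = result.get("confidence")
    if verdict not in VERDICTS:
        problems.append(f"unknown verdict: {verdict!r}")
    if decision not in DECISIONS:
        problems.append(f"unknown decision: {decision!r}")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number in [0.0, 1.0]")
        return problems
    if decision == "CLEAR" and (verdict != "safe" or confidence < 0.85):
        problems.append("CLEAR requires verdict 'safe' and confidence >= 0.85")
    if decision == "AUTO_CONTAIN" and (
        verdict not in {"phishing", "malicious"} or confidence < 0.85
    ):
        problems.append("AUTO_CONTAIN requires phishing/malicious and confidence >= 0.85")
    if confidence < 0.6 and decision != "ESCALATE":
        problems.append("confidence < 0.6 must ESCALATE")
    return problems
```

A non-empty result would typically be treated as an escalation trigger rather than retried silently.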


Setup guide

Install and connect mail + sandbox

Install the agent and connect it to your mail platform, sandbox, and threat-intel.

shell
pipx install phishing-triage-agent
phishing-triage-agent connect --mail o365 --sandbox cuckoo --intel virustotal
phishing-triage-agent doctor   # verifies sandbox isolation + scoped mail access

Set response authority and caps

Define what the agent may auto-contain. Mass and executive-mailbox actions stay approval-gated, and these limits are enforced outside the model.

shell
cp .env.example .env
# then set these values inside .env (they are file contents, not shell commands):
ANTHROPIC_API_KEY=sk-ant-...
MAX_TOOL_CALLS=8
AUTO_CONTAIN_SCOPE=non_critical_mailboxes
MODE=advise   # advise (recommend) | act (auto within scope)
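Enforcing the mode outside the model can be as simple as rewriting the proposed actions before anything executes. This sketch assumes the action shape from the output format and a hypothetical `apply_mode` helper; any value other than `act` falls back to advise behavior.

```python
import os
from typing import List, Optional

CONTAINMENT_TOOLS = {"quarantine_email", "block_indicator"}


def apply_mode(actions: List[dict], mode: Optional[str] = None) -> List[dict]:
    """In advise mode, force every containment action to require approval."""
    mode = mode or os.environ.get("MODE", "advise")
    if mode == "act":
        return actions
    # Copy each action so the model's original proposal stays intact for the audit log.
    return [
        {**action, "requires_approval": True}
        if action["tool"] in CONTAINMENT_TOOLS
        else action
        for action in actions
    ]
```

Defaulting to advise when the variable is missing or misspelled means a configuration mistake can only make the agent more conservative.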

Mark critical mailboxes & rules

Tell the agent which mailboxes are sensitive so it never auto-purges them, and how to scope campaigns.

yaml
# .phishing.yml
critical_mailboxes: ["exec/*", "finance/*", "legal/*"]
require_approval: ["org_wide_purge", "critical_mailbox_action"]
always_escalate: ["bec", "wire_request", "credential_harvest_clicked"]
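The critical-mailbox patterns read like shell-style globs, so one plausible way to apply them is Python's `fnmatch`. This is an assumption about the matching semantics, and the mailbox identifiers (`finance/ap-team` and similar) are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Copied from critical_mailboxes in .phishing.yml.
CRITICAL_MAILBOXES = ["exec/*", "finance/*", "legal/*"]


def is_critical(mailbox: str, patterns: Iterable[str] = CRITICAL_MAILBOXES) -> bool:
    """True if the mailbox matches any critical pattern (case-sensitive glob)."""
    return any(fnmatchcase(mailbox, pattern) for pattern in patterns)


def needs_approval(target_mailboxes: List[str]) -> bool:
    """A containment action needs approval if it touches any critical mailbox."""
    return any(is_critical(mailbox) for mailbox in target_mailboxes)
```

Checking the whole target list, rather than only the reporter's mailbox, matters because campaign scoping can pull a critical mailbox into an otherwise routine quarantine.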

Backtest on past reports

Replay resolved phishing reports to measure verdict accuracy — especially missed true positives.

shell
phishing-triage-agent backtest --range 30d --explain
# reports phishing/safe accuracy, FP rate, and missed-true-positive count (target ~0)

Wire to the abuse mailbox

Route user reports to the agent. Start in advise mode, then enable act mode for scoped containment once backtests are clean.

shell
# 'Report Phishing' button / abuse@ -> POST https://your-host/phishing/report (HMAC)
# promote MODE=act after a clean backtest with zero missed true positives
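The endpoint comment mentions HMAC but not the scheme. A minimal sketch of verifying a report signature, assuming HMAC-SHA256 over the raw request body with a hex-encoded signature (the algorithm, encoding, and how the signature is transported are all assumptions):

```python
import hashlib
import hmac


def verify_report_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a hex HMAC-SHA256 signature over the raw request body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

Verification should run on the exact bytes received, before any JSON parsing, since re-serializing the body can change it and break the signature.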

Architecture

Tools required

get_reported_email: Retrieve the reported email with full headers, raw body, URLs, and attachments from the mail platform/abuse mailbox.
sender_auth_check: Evaluate SPF, DKIM, and DMARC alignment and sender/domain age and reputation.
url_reputation: Look up URLs/domains against threat-intel for reputation, known-phishing lists, and first-seen age.
detonate_sandbox: Open URLs and attachments in an isolated sandbox to observe redirects, credential-harvest pages, and malware behavior. Never runs live.
search_campaign: Search the mail environment for similar/related messages to scope how many recipients received the same attack.
quarantine_email: Quarantine/remove messages. Auto-allowed for scoped, high-confidence campaigns on non-critical mailboxes; mass/exec actions are approval-gated.
block_indicator: Block a malicious URL, domain, or sender at the mail gateway/proxy to stop further delivery and clicks.
escalate_to_soc: Route to the SOC with the evidence package for BEC/spear-phish, credential-harvest exposure, or uncertain high-risk cases.
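As one illustration of what a `sender_auth_check` implementation might read, receiving mail servers commonly record SPF, DKIM, and DMARC outcomes in the `Authentication-Results` header. The sketch below pulls the first result for each method; it is a simplification (a message can carry several DKIM results), and `parse_auth_results` is a hypothetical helper, not the kit's tool.

```python
import re


def parse_auth_results(header_value: str) -> dict:
    """Extract the first spf/dkim/dmarc result from an Authentication-Results value."""
    results = {}
    for method in ("spf", "dkim", "dmarc"):
        match = re.search(rf"\b{method}=(\w+)", header_value, re.IGNORECASE)
        # A method the receiver did not evaluate is reported as "none".
        results[method] = match.group(1).lower() if match else "none"
    return results


header = (
    "mx.example.com; spf=fail smtp.mailfrom=micros0ft-login.com; "
    "dmarc=fail header.from=micros0ft-login.com"
)
auth = parse_auth_results(header)
```

Only trust this header when it was added by your own receiving infrastructure; a sender can forge one further down the header stack.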

Workflow

  1. Intake the report

    Pull the reported email with full headers and raw body; extract sender, URLs, and attachments for analysis.

  2. Check authentication & reputation

    Evaluate SPF/DKIM/DMARC and look up URL/domain/sender reputation, which are the fastest legitimacy signals.

  3. Detonate safely

    Detonate URLs and attachments in the sandbox to reveal credential-harvest pages or malware, never touching live infrastructure.

  4. Scope the campaign

    Search the environment for similar messages to find every affected recipient, not just the reporter.

  5. Decide the verdict

    Classify on the combined evidence; only positive benign evidence clears an email, and mixed/insufficient evidence is suspicious.

  6. Contain or escalate

    Auto-contain scoped high-confidence phishing on non-critical mailboxes; propose mass/exec actions for approval; escalate BEC/spear-phish to the SOC.

  7. Reply & learn

    Send a clear reply to the reporter, log the evidence and outcome, and feed analyst agreement back to tune verdicts.
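The evidence-gathering half of the workflow (steps 1 to 4) can be sketched as a fixed sequence of tool calls. The tool names come from the list above, but the argument and return shapes, the `gather_evidence` function, and the mock tools are all assumptions for illustration.

```python
from typing import Callable, Dict


def gather_evidence(message_id: str, tools: Dict[str, Callable]) -> dict:
    """Run workflow steps 1-4 in order and return one evidence bundle."""
    email = tools["get_reported_email"](message_id)        # 1. intake
    auth = tools["sender_auth_check"](email["headers"])    # 2. authentication
    reputation = {u: tools["url_reputation"](u) for u in email["urls"]}
    # 3. URLs are only ever handed to the sandbox tool; nothing is fetched here.
    sandbox = {u: tools["detonate_sandbox"](u) for u in email["urls"]}
    campaign = tools["search_campaign"](email)             # 4. scope
    return {
        "auth": auth,
        "reputation": reputation,
        "sandbox": sandbox,
        "recipients": campaign["recipients"],
    }


# Mock tools standing in for real integrations.
MOCK_TOOLS = {
    "get_reported_email": lambda mid: {"id": mid, "headers": {}, "urls": ["hxxps://micros0ft-login.com/verify"]},
    "sender_auth_check": lambda headers: {"spf": "fail", "dkim": "fail", "dmarc": "fail"},
    "url_reputation": lambda url: "known_phishing",
    "detonate_sandbox": lambda url: "credential_harvest_page",
    "search_campaign": lambda email: {"recipients": 23},
}

bundle = gather_evidence("msg-1", MOCK_TOOLS)
```

Keeping the sequence in plain code, with the model deciding only on the resulting bundle, makes it easier to audit that detonation never happens outside the sandbox tool.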

Examples

Credential-phishing with lookalike domain → contain the campaign

A classic Microsoft-login phish from a lookalike domain. Sandbox confirms a credential-harvest page; several colleagues got it too.

Input

input
Reported email: 'Your mailbox is full, re-validate here' from no-reply@micros0ft-login.com, link to hxxps://micros0ft-login.com/verify.

Output

output
{
  "verdict": "phishing",
  "confidence": 0.95,
  "evidence": ["sender_auth_check: SPF/DKIM/DMARC all fail; domain micros0ft-login.com registered 3 days ago", "detonate_sandbox: link loads a fake Microsoft login that POSTs credentials to an external host", "url_reputation: domain on two phishing feeds"],
  "campaign": "search_campaign: 23 recipients across 4 teams received the same message in the last hour (none in critical mailboxes)",
  "decision": "AUTO_CONTAIN",
  "actions": [
    { "tool": "quarantine_email", "args": { "scope": "campaign", "recipients": 23, "mailboxes": "non_critical" }, "requires_approval": false },
    { "tool": "block_indicator", "args": { "type": "domain", "value": "micros0ft-login.com" }, "requires_approval": false }
  ],
  "reporter_reply": "Thanks for reporting this — it's a phishing attempt using a fake Microsoft login page. We've removed it from affected inboxes and blocked the sender. You did the right thing; never enter your password on links like this.",
  "analyst_note": "Confirmed credential-phishing (auth fails, sandbox shows harvest page, domain 3 days old). Quarantined 23 copies, blocked domain. No critical mailboxes hit. Recommend checking if anyone submitted creds.",
  "escalation": { "needed": false, "to": "none", "reason": "" }
}

Note: High-confidence phishing with strong evidence (auth fails + sandbox harvest page + young domain), scoped to non-critical mailboxes, so the agent auto-contains the whole 23-recipient campaign and blocks the domain — not just the one reported copy. The reporter gets a clear, encouraging reply.
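A lookalike domain such as the one in this example can also be surfaced with a cheap heuristic before any lookup runs. The sketch below normalizes a few digit-for-letter swaps and checks for a protected brand name; the substitution table, the brand list, and `lookalike_brand` are hypothetical, and the result is a signal to weigh alongside the other evidence, not a verdict.

```python
from typing import Optional

# Assumed digit-for-letter substitutions; real campaigns use many more tricks.
HOMOGLYPHS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})

# Hypothetical brand -> legitimate registered domains.
PROTECTED_BRANDS = {"microsoft": {"microsoft.com"}}


def lookalike_brand(domain: str) -> Optional[str]:
    """Return the imitated brand if the domain looks like one but is not legitimate."""
    domain = domain.lower()
    normalized = domain.translate(HOMOGLYPHS)
    for brand, legit_domains in PROTECTED_BRANDS.items():
        is_legit = any(
            domain == legit or domain.endswith("." + legit) for legit in legit_domains
        )
        if brand in normalized and not is_legit:
            return brand
    return None
```

Here `micros0ft-login.com` normalizes to a string containing `microsoft` without being a Microsoft domain, so it is reported as a lookalike, while `login.microsoft.com` is not.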

Legit newsletter mistaken for phishing → clear with evidence

An anxious user reports a real marketing email. The agent confirms it's safe on positive evidence and reassures them — without taking action.

Input

input
Reported email: a product newsletter from news@updates.realvendor.com with an 'unsubscribe' link.

Output

output
{
  "verdict": "safe",
  "confidence": 0.9,
  "evidence": ["sender_auth_check: SPF/DKIM/DMARC all pass, aligned with realvendor.com", "url_reputation: links resolve to realvendor.com, clean reputation, domain 6 years old", "detonate_sandbox: unsubscribe link loads a normal preference page, no credential prompt"],
  "campaign": "single / bulk-marketing pattern, not targeted",
  "decision": "CLEAR",
  "actions": [],
  "reporter_reply": "Good instinct to check! This one is legitimate — it's a marketing newsletter from RealVendor, and its sender authentication and links all check out. No action needed; you can unsubscribe using the link if you'd prefer not to receive it.",
  "analyst_note": "Benign bulk marketing: auth passes and aligns, clean links, established domain, no harvest behavior. No action.",
  "escalation": { "needed": false, "to": "none", "reason": "" }
}

Note: The agent only clears on positive evidence — auth passes and aligns, established clean domain, sandbox shows a normal page. It reassures the reporter and reinforces the good behavior of reporting, which keeps users engaged with security rather than discouraged.

CEO-impersonation wire request (BEC) → escalate, don't auto-close

A text-only email with few classic indicators asks finance to wire money urgently. Low on malware signals, high on risk. The agent escalates.

Input

input
Reported email to a finance staffer: 'Are you at your desk? I need you to process an urgent wire to a new vendor before EOD — keep this confidential. — [CEO name]', reply-to a Gmail address. No links or attachments.

Output

output
{
  "verdict": "suspicious",
  "confidence": 0.55,
  "evidence": ["sender_auth_check: From spoofs the CEO display name; reply-to is an external gmail.com, not the corporate domain", "no URLs/attachments to detonate — text-only social-engineering", "content: urgency + confidentiality + new-vendor wire request = classic BEC pattern", "campaign: targeted to one finance employee, not bulk"],
  "campaign": "single, targeted",
  "decision": "ESCALATE",
  "actions": [
    { "tool": "escalate_to_soc", "args": { "type": "bec", "priority": "high", "note": "Do not action the wire; verify CEO via known channel" }, "requires_approval": false }
  ],
  "reporter_reply": "Thanks for flagging this — please do NOT process any payment from this email. It has signs of a business-email-compromise (CEO-impersonation) scam. Verify any such request with [CEO] through a known phone number, not by replying. Our security team is now reviewing it.",
  "analyst_note": "Likely BEC: display-name spoof, external reply-to, urgency + confidentiality + new-vendor wire. No malware indicators to 'detonate' — the risk is the financial action. Escalated to SOC; advised the user not to wire.",
  "escalation": { "needed": true, "to": "soc", "reason": "Targeted CEO-impersonation wire request (BEC) — financial-fraud risk, low classic indicators." }
}

Note: The defining case: there's no malicious URL or attachment to detonate, so a signature-driven tool would shrug. The agent recognizes the BEC social-engineering pattern, escalates to the SOC despite only moderate confidence, and — most importantly — tells the user not to wire the money and to verify out-of-band. The real payload here is the financial action, not malware.
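One of the signals cited in this example, a reply-to address on a different domain than the sender, is easy to extract with the standard library. The sketch below is a heuristic only, using a hypothetical `reply_to_mismatch` helper and made-up addresses; plenty of legitimate mail also sets a different reply-to, so it should feed the evidence list rather than decide the verdict.

```python
from email import message_from_string
from email.utils import parseaddr


def reply_to_mismatch(raw_email: str) -> bool:
    """True if Reply-To is present and its domain differs from the From domain."""
    msg = message_from_string(raw_email)
    _, from_addr = parseaddr(msg.get("From", ""))
    _, reply_addr = parseaddr(msg.get("Reply-To", ""))
    if not reply_addr:
        return False  # no Reply-To header, so nothing to compare
    from_domain = from_addr.rpartition("@")[2].lower()
    reply_domain = reply_addr.rpartition("@")[2].lower()
    return from_domain != reply_domain


raw = (
    "From: Jane Doe <jane.doe@corp.example>\n"
    "Reply-To: jane.doe.ceo@gmail.com\n"
    "Subject: Urgent wire\n"
    "\n"
    "Are you at your desk?\n"
)
mismatch = reply_to_mismatch(raw)
```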

Implementation notes

  • Keep all detonation sandbox-only and enforce it outside the model; the agent must never interact with live malicious infrastructure.
  • Require positive benign evidence to clear an email. Mixed or insufficient evidence is 'suspicious' and escalates — false clears are the most damaging error.
  • Scope the campaign before responding so containment covers every affected recipient, not just the reporter; gate org-wide and exec-mailbox actions behind approval.
  • Treat BEC/spear-phishing as escalate-by-default: these have few classic indicators but the highest impact, and the right move is a human plus a warning about the requested action.
  • Always reply to the reporter — clearly, and encouragingly even when it's a false alarm. Reporting culture is a security asset worth protecting.
  • Backtest on resolved reports and track missed-true-positive rate as the primary safety metric before enabling any auto-containment.
  • Reserve the strong model for the verdict and BEC judgment; a cheaper model can parse headers and run reputation lookups.

Variations

Basic

Triage & enrichment assistant

Enriches the reported email, detonates indicators in the sandbox, returns a verdict with evidence and a suggested response for an analyst. No autonomous containment.

Advanced

Guarded auto-containment

Auto-quarantines scoped, high-confidence campaigns on non-critical mailboxes and blocks indicators, with campaign scoping, reporter replies, and approval-gated mass/exec actions.

Enterprise

Governed phishing response

Adds multi-tenant mail integration, critical-mailbox policies, SOC/IR routing, full evidence audit, BEC analytics, and verdict calibration from analyst feedback at scale.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

Blueprint (.zip) contents: README.md, system-prompt.md, setup-guide.md, tools.json, workflow.md, examples.md, .env.example, kit.json, run.py, LICENSE, NOTICE, starters/

Export

Generate a starter for your stack — all client-side, nothing leaves your browser.


Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This flagship blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).

Frequently asked questions