Overview
Triages incoming community questions: categorizes, answers the clear ones, routes the rest.
Answers only from your knowledge base with a citation, and links duplicates to existing threads.
Sends spam, harassment, and abuse to moderators instead of engaging with them.
Defensive: never fabricates answers, escalates safety signals to humans, and protects members' personal data.
AgentAz™ specification
A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.
Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:
{
"$schema": "./agentaz.schema.json",
"version": "2.0.0",
"last_reviewed": "2026-06-24",
"agent_id": "community-triage-agent",
"trust_level": "A2",
"dna_pattern": "Escalation",
"worst_case_action": "Misroutes a question or drafts a wrong answer, caught before post. Cannot post or moderate.",
"authority_boundary": "Triages and drafts community answers; post/moderate tools absent.",
"tags": [
"community",
"triage",
"read-only",
"human-review"
],
"tool_boundary": {
"allowed_tools": [
"read_question",
"classify",
"draft_answer",
"route"
],
"execution_tools_absent": true
},
"output_boundary": {
"format": "structured_json",
"never_emits": [
"post",
"send",
"moderate"
]
},
"cost_boundary": {
"max_usd_per_trace_loop": 0.18,
"alert_threshold_usd": 0.12
},
"loop_boundary": {
"max_reasoning_turns": 6
},
"human_handoff": {
"triggers": [
"sensitive_topic",
"low_confidence"
],
"destination": "moderator"
},
"audit": {
"append_only": true,
"logs": [
"classification",
"drafts"
]
}
}
New to this? Read the AgentAz specification guide: Trust Levels, DNA patterns, and how it complements your runtime.
AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.
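Because the contract is plain JSON, a reviewer can lint it for internal consistency before it reaches security review. The sketch below is a minimal illustration, not the AgentAz™ schema validation itself: `check_contract` and its three rules are hypothetical, and the embedded contract is a trimmed copy of the one above containing only the fields the checks read.

```python
import json

# Trimmed copy of the contract above (assumption: only these fields are needed).
CONTRACT = json.loads("""
{
  "agent_id": "community-triage-agent",
  "tool_boundary": {
    "allowed_tools": ["read_question", "classify", "draft_answer", "route"],
    "execution_tools_absent": true
  },
  "output_boundary": {"never_emits": ["post", "send", "moderate"]},
  "cost_boundary": {"max_usd_per_trace_loop": 0.18, "alert_threshold_usd": 0.12},
  "loop_boundary": {"max_reasoning_turns": 6}
}
""")


def check_contract(contract: dict) -> list[str]:
    """Hypothetical lint: return a list of problems; an empty list means consistent."""
    problems = []

    # A tool the agent may call should never also be listed as a forbidden output.
    allowed = set(contract["tool_boundary"]["allowed_tools"])
    forbidden = set(contract["output_boundary"]["never_emits"])
    overlap = allowed & forbidden
    if overlap:
        problems.append(f"allowed tools overlap never_emits: {sorted(overlap)}")

    # The alert should fire before the hard cost cap is reached.
    cost = contract["cost_boundary"]
    if cost["alert_threshold_usd"] >= cost["max_usd_per_trace_loop"]:
        problems.append("alert threshold should sit below the hard cost cap")

    # A loop bound below one turn would make the agent unable to run at all.
    if contract["loop_boundary"]["max_reasoning_turns"] < 1:
        problems.append("max_reasoning_turns must be at least 1")

    return problems


print(check_contract(CONTRACT))
```

Checks like these complement schema validation: the schema confirms the shape of the file, while a lint of this kind catches values that are well-formed but contradict each other.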
Governance matrix
A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.
| Dimension | Coverage |
|---|---|
| Agent goal | Bounded by the authority spec above |
| Trust Level | A2 — Recommend |
| Tool access | Least privilege — execution tools absent (read-only) |
| Context handling | Grounded in provided inputs; cites or flags rather than guessing |
| Memory strategy | Task-scoped; no persistent cross-session memory |
| Human approval | Required on sensitive topic, low confidence → moderator |
| Audit trail | Append-only log (classification, drafts) |
| Cost & loop bounds | ≤ $0.18 per loop · ≤ 6 reasoning turns |
| Recovery / escalation | Escalates to moderator |
Agent component mapping
A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.
| Component | Mapping |
|---|---|
| Agent | Primary reasoner with Recommend authority (A2) |
| Tools | read question, classify, draft answer, route — execution tools absent (read-only) |
| Memory | Task-scoped working context; no persistent cross-session memory |
| Guardrails | Worst-case classified (A2); no execution tools; ≤ $0.18/loop · ≤ 6 turns |
| Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned |
| Handoff | Escalates to moderator on sensitive topic, low confidence |
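The specification documents the cost and loop bounds but does not enforce them, so enforcement falls to whatever runtime hosts the agent. The sketch below shows one way such a runtime might track the bounds from the Guardrails row. `LoopBudget` and its `ok`/`alert`/`stop` return values are hypothetical names; only the three numeric limits come from the contract.

```python
from dataclasses import dataclass

# Limits copied from the contract's cost_boundary and loop_boundary.
MAX_USD_PER_LOOP = 0.18
ALERT_THRESHOLD_USD = 0.12
MAX_REASONING_TURNS = 6


@dataclass
class LoopBudget:
    """Hypothetical per-loop tracker; the host runtime calls record_turn after each turn."""
    spent_usd: float = 0.0
    turns: int = 0

    def record_turn(self, cost_usd: float) -> str:
        """Record one reasoning turn and return 'ok', 'alert', or 'stop'."""
        self.turns += 1
        self.spent_usd += cost_usd
        # Exceeding either hard bound ends the loop.
        if self.spent_usd > MAX_USD_PER_LOOP or self.turns > MAX_REASONING_TURNS:
            return "stop"
        # Reaching the alert threshold warns without stopping.
        if self.spent_usd >= ALERT_THRESHOLD_USD:
            return "alert"
        return "ok"


budget = LoopBudget()
for cost in (0.05, 0.05, 0.05, 0.05):
    print(budget.record_turn(cost))
```

On `stop`, a host following this blueprint would hand the post to a moderator rather than retry, which matches the Handoff row.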
Failure modes
Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.
Drafts a wrong or misleading answer that, if posted, spreads misinformation.
- Detection
- Answers are drawn from known resources and uncited claims are flagged.
- Mitigation
- It drafts only — a human posts; there is no autonomous posting.
- Recovery
- The moderator edits or discards the draft before posting.
Fails to escalate a sensitive or harmful question, such as a safety or harassment issue.
- Detection
- Sensitive-topic detection triggers escalation.
- Mitigation
- Sensitive questions are routed to a moderator, never auto-answered.
- Recovery
- A human handles it from the escalation queue.
Misroutes a question to the wrong category or expert.
- Detection
- Routing confidence is scored; low confidence goes to a default queue.
- Mitigation
- Routing is reversible.
- Recovery
- It is re-routed with the correction logged.
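The misrouting safeguards above can be sketched as a confidence gate plus a correction log. This is an illustration under assumptions: the 0.7 cut-off, the `choose_queue` and `reroute` helpers, and the log record fields are all hypothetical, since the blueprint states only that low-confidence routing goes to a default queue and that corrections are logged.

```python
from datetime import datetime, timezone

DEFAULT_QUEUE = "default"
MIN_ROUTING_CONFIDENCE = 0.7  # hypothetical cut-off; tune it on labeled data


def choose_queue(scores: dict[str, float]) -> tuple[str, float]:
    """Pick the highest-scoring category, or the default queue when confidence is low."""
    if not scores:
        return DEFAULT_QUEUE, 0.0
    category, confidence = max(scores.items(), key=lambda item: item[1])
    if confidence < MIN_ROUTING_CONFIDENCE:
        return DEFAULT_QUEUE, confidence
    return category, confidence


def reroute(log: list, post_id: str, old_queue: str, new_queue: str) -> None:
    """Append a correction record; existing entries are never edited or removed."""
    log.append({
        "post_id": post_id,
        "from": old_queue,
        "to": new_queue,
        "at": datetime.now(timezone.utc).isoformat(),
    })


corrections: list = []
queue, confidence = choose_queue({"billing": 0.4, "bug": 0.35})
reroute(corrections, "post-1", queue, "billing")
print(queue, corrections[0]["to"])
```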
Evaluation
Draft-answer correctness and sensitive-question escalation are primary — a wrong posted answer spreads misinformation and a missed harmful question is a safety gap.
| Metric | Definition |
|---|---|
| Answer groundedness | Share of drafted answers supported by known resources, with no fabrication. |
| Routing accuracy | Share of questions routed to the correct category or expert. |
| Sensitive-question recall | Of sensitive or harmful questions, the share escalated. |
| Draft acceptance rate | Share of drafts a moderator posts with little change. |
| Latency | Time to a triaged, drafted question. |
Recommended approach. Label community questions with correct categories and known-good answers; measure groundedness and routing accuracy, and include sensitive cases to test escalation. Verify nothing posts autonomously.
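As a rough illustration of the recommended approach, the first three metrics can be computed from a small labeled set. The `LabeledCase` fields and the `evaluate` function are assumptions made for this sketch; the blueprint does not prescribe a data format, and groundedness here relies on a reviewer's judgement recorded per draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledCase:
    """One labeled community question (hypothetical record layout)."""
    expected_category: str
    predicted_category: str
    is_sensitive: bool     # ground-truth label
    was_escalated: bool    # what the agent actually did
    drafted: bool          # whether the agent produced a draft answer
    draft_grounded: bool   # reviewer judgement; ignored when drafted is False


def share(hits: int, total: int) -> Optional[float]:
    """Return hits/total, or None when the metric has no cases to measure."""
    return hits / total if total else None


def evaluate(cases: list[LabeledCase]) -> dict:
    drafted = [c for c in cases if c.drafted]
    sensitive = [c for c in cases if c.is_sensitive]
    return {
        "answer_groundedness": share(sum(c.draft_grounded for c in drafted), len(drafted)),
        "routing_accuracy": share(
            sum(c.expected_category == c.predicted_category for c in cases), len(cases)
        ),
        "sensitive_question_recall": share(
            sum(c.was_escalated for c in sensitive), len(sensitive)
        ),
    }


cases = [
    LabeledCase("billing", "billing", False, False, True, True),
    LabeledCase("bug", "billing", False, False, True, False),
    LabeledCase("safety", "safety", True, True, False, False),
    LabeledCase("safety", "bug", True, False, False, False),
]
print(evaluate(cases))
```

Returning `None` for an empty denominator keeps a test set with no sensitive cases from reporting a misleading perfect or zero recall.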
When to use
Use it when
- Your community gets more questions than your team can triage by hand.
- You have a knowledge base or answered threads the agent can ground answers in.
- You want spam, abuse, and safety issues routed to moderators automatically.
- You want fast, sourced answers for common questions and humans for the rest.
Avoid it when
- You want it to answer everything, even without a reliable source — it routes instead.
- You have no knowledge base for it to ground answers in.
- You can't provide human moderators for escalations.
- You need it to make moderation/ban decisions autonomously (it flags; humans decide).
System prompt
You are a Community Question Triage Agent for an online community. You triage incoming posts: answer clear questions from the knowledge base, link duplicates, and route everything else to humans. You are judged on helpful, accurate triage and on never fabricating answers, engaging with abuse, or mishandling a safety issue.
== CORE PRINCIPLES ==
1. Source or route. Answer only when you can ground it in the knowledge base or an existing answered thread, with a citation. If you are not confident or it is not covered, route to a human rather than guessing.
2. Safety first. Detect spam, harassment, hate, and harmful content. Do not engage with or answer abusive posts. Route them to moderators. Treat distress or self-harm signals with care and escalate to a person immediately.
3. Respectful and fair. Be warm and neutral. Don't take sides in disputes, don't shame anyone, and don't expose members' personal information.
== HARD RULES (NON-NEGOTIABLE) ==
- NO FABRICATION: Never invent an answer, feature, policy, or fact. Unsourced or low-confidence = route to a human.
- SAFETY ROUTING: Spam, harassment, hate, threats, and clearly harmful content go to moderators; do not reply to them as if normal questions. For self-harm or crisis signals, escalate to a human immediately and respond with empathy, not with an automated answer.
- NO MODERATION ACTIONS: You flag and route; you do not ban, delete, or take punitive action. Humans decide enforcement.
- PRIVACY: Never reveal a member's personal data, and don't ask for sensitive personal information.
- NEUTRALITY: Stay neutral in conflicts and avoid bias; don't escalate arguments.
== METHOD ==
- Read the post. Run a safety check first. If safe, classify the topic, check for duplicates, and search the knowledge base. Draft a sourced answer only if confident; otherwise route. Always include why.
== OUTPUT FORMAT (return ONE JSON object) ==
{
"post_summary": "<short, neutral>",
"safety": { "flag": "none|spam|harassment|hate|harmful|self_harm", "action": "proceed|route_moderation|escalate_human" },
"category": "<topic/area>",
"duplicate_of": "<existing thread/answer link, or empty>",
"decision": "ANSWER|LINK_DUPLICATE|ROUTE_HUMAN|ROUTE_MODERATION|ESCALATE",
"answer": "<sourced answer, or empty>",
"citation": "<KB article / thread, or empty>",
"confidence": "high|medium|low",
"reason": "<why this decision>"
}
Never answer an unsafe post. Never answer without a source. Route when unsure.
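Since the system prompt asks for exactly one JSON object with fixed enumerations, a caller can check each response before acting on it. The sketch below is a minimal validator under assumptions: `validate_output` is a hypothetical helper, and the two ANSWER rules are this sketch's reading of "Never answer an unsafe post" and "Never answer without a source", not a check shipped with the kit.

```python
# Enumerations copied from the OUTPUT FORMAT in the system prompt.
ALLOWED = {
    "flag": {"none", "spam", "harassment", "hate", "harmful", "self_harm"},
    "action": {"proceed", "route_moderation", "escalate_human"},
    "decision": {"ANSWER", "LINK_DUPLICATE", "ROUTE_HUMAN", "ROUTE_MODERATION", "ESCALATE"},
    "confidence": {"high", "medium", "low"},
}
REQUIRED_KEYS = {
    "post_summary", "safety", "category", "duplicate_of",
    "decision", "answer", "citation", "confidence", "reason",
}


def validate_output(obj: dict) -> list[str]:
    """Return a list of problems with one triage result; empty means it passes."""
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    errors = []
    safety = obj["safety"] if isinstance(obj["safety"], dict) else {}
    if safety.get("flag") not in ALLOWED["flag"]:
        errors.append("safety.flag is not an allowed value")
    if safety.get("action") not in ALLOWED["action"]:
        errors.append("safety.action is not an allowed value")
    if obj["decision"] not in ALLOWED["decision"]:
        errors.append("decision is not an allowed value")
    if obj["confidence"] not in ALLOWED["confidence"]:
        errors.append("confidence is not an allowed value")

    # Assumed cross-field rules derived from the hard rules in the prompt.
    if obj["decision"] == "ANSWER":
        if not obj["citation"]:
            errors.append("ANSWER requires a citation")
        if safety.get("flag") != "none":
            errors.append("ANSWER requires safety.flag to be 'none'")
    return errors


example = {
    "post_summary": "Member asking how to reset their API key.",
    "safety": {"flag": "none", "action": "proceed"},
    "category": "account/api",
    "duplicate_of": "",
    "decision": "ANSWER",
    "answer": "You can reset your API key under Settings -> API -> Regenerate.",
    "citation": "Help Center: 'Managing API keys'",
    "confidence": "high",
    "reason": "Covered by a knowledge-base article.",
}
print(validate_output(example))
```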
Setup guide
Install and connect the community
Install the agent and connect your community platform and knowledge base.
pipx install community-triage-agent
community-triage-agent connect --platform discourse --kb ./help-center
community-triage-agent doctor
Configure safety & grounding
Safety-first routing and source-only answers are enforced here.
cp .env.example .env
# .env
ANTHROPIC_API_KEY=sk-ant-...
ANSWER_FROM_KB_ONLY=true
SAFETY_FIRST=true
ESCALATE: [harassment, hate, harmful, self_harm]
Set routing & moderator targets
Define where things go.
# triage.yml
auto_answer_confidence: high
route: { billing: '#team-billing', bug: '#team-eng', default: '#community-mods' }
moderation_channel: '#mods'
crisis_escalation: human_immediate
Dry-run on past posts
Replay historical posts to check safety routing and answer grounding.
community-triage-agent backtest --range 30d --explain # reports answer accuracy, fabrication rate (must be 0), safety routing precision
Wire into the queue
Route new posts through triage; abuse and safety go to humans.
# new post -> safety -> answer/link/route; abuse -> mods; crisis -> human now
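To make the routing targets concrete, the sketch below resolves a triage decision to a destination using the values from `triage.yml`. It is an assumption-laden illustration: the `destination` function is hypothetical, the YAML is mirrored as a Python dict rather than parsed, and treating `human_immediate` as a destination label is this sketch's interpretation of the `crisis_escalation` setting.

```python
from typing import Optional

# Values mirrored from triage.yml above.
ROUTES = {"billing": "#team-billing", "bug": "#team-eng", "default": "#community-mods"}
MODERATION_CHANNEL = "#mods"
CRISIS_ESCALATION = "human_immediate"


def destination(decision: str, category: str) -> Optional[str]:
    """Map a triage decision to where the post goes; None means a draft was produced."""
    if decision == "ESCALATE":
        return CRISIS_ESCALATION
    if decision == "ROUTE_MODERATION":
        return MODERATION_CHANNEL
    if decision == "ROUTE_HUMAN":
        # Categories without an explicit route fall back to the default channel.
        return ROUTES.get(category, ROUTES["default"])
    # ANSWER and LINK_DUPLICATE yield a draft for review instead of a route.
    return None


print(destination("ROUTE_HUMAN", "billing"))
print(destination("ROUTE_HUMAN", "feature-question"))
```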
Architecture
Tools required
Workflow
1. Take the post
Receive the incoming community post and its context.
2. Safety check first
Detect spam, abuse, harmful content, or distress before considering any answer.
3. Classify the topic
Categorize safe posts by area for matching and routing.
4. Check for duplicates
Link to an existing answered thread instead of re-answering.
5. Search the knowledge base
Look for a grounded answer with a citation.
6. Answer or route
Draft a sourced answer only if confident; otherwise route to a human with the reason.
7. Escalate where needed
Send abuse to moderators and safety signals to a person, with care.
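The seven steps above can be sketched as a single function whose ordering guarantees that no answer path runs on an unsafe post. Everything here is illustrative: `triage` and its four injected callables are hypothetical stand-ins for the real safety filter, classifier, duplicate finder, and knowledge-base search, and the tuple returned by `search_kb` is an assumed shape.

```python
from typing import Callable, Optional, Tuple

KbHit = Tuple[str, str, str]  # (answer, citation, confidence), an assumed shape


def triage(
    post: str,
    *,
    safety_check: Callable[[str], str],
    classify: Callable[[str], str],
    find_duplicate: Callable[[str], Optional[str]],
    search_kb: Callable[[str], Optional[KbHit]],
) -> dict:
    # Step 2: the safety check runs before any classification or answering.
    flag = safety_check(post)
    if flag == "self_harm":
        return {"decision": "ESCALATE", "reason": "crisis signal; needs a person"}
    if flag != "none":
        return {"decision": "ROUTE_MODERATION", "reason": f"flagged as {flag}"}

    # Steps 3 and 4: classify, then prefer linking an existing thread.
    category = classify(post)
    duplicate = find_duplicate(post)
    if duplicate:
        return {"decision": "LINK_DUPLICATE", "category": category,
                "duplicate_of": duplicate, "reason": "matches an answered thread"}

    # Steps 5 and 6: answer only with a high-confidence, cited hit.
    hit = search_kb(post)
    if hit and hit[2] == "high":
        answer, citation, _ = hit
        return {"decision": "ANSWER", "category": category, "answer": answer,
                "citation": citation, "reason": "covered by the knowledge base"}
    return {"decision": "ROUTE_HUMAN", "category": category,
            "reason": "no confident sourced answer"}


result = triage(
    "How do I reset my API key?",
    safety_check=lambda p: "none",
    classify=lambda p: "account/api",
    find_duplicate=lambda p: None,
    search_kb=lambda p: ("Settings -> API -> Regenerate",
                         "Help Center: 'Managing API keys'", "high"),
)
print(result["decision"])
```

Passing the steps in as callables keeps the ordering testable on its own: a test can assert that the classifier is never invoked for a flagged post.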
Examples
Clear FAQ-type question → sourced answer from the knowledge base
A member asks a common how-to that the knowledge base covers. The agent answers with a citation.
Input
Member: 'How do I reset my API key?'
Output
{
"post_summary": "Member asking how to reset their API key.",
"safety": { "flag": "none", "action": "proceed" },
"category": "account/api",
"duplicate_of": "",
"decision": "ANSWER",
"answer": "You can reset your API key under Settings -> API -> Regenerate. Your old key stops working immediately, so update it anywhere it's in use. Full steps here: [KB link].",
"citation": "Help Center: 'Managing API keys'",
"confidence": "high",
"reason": "Common question fully covered by a knowledge-base article; high confidence, grounded answer."
}
Note: A grounded, cited answer for a question the KB clearly covers. This is the routine volume the agent should handle, freeing moderators for the harder cases.
Duplicate question → link the existing thread, don't re-answer
The question repeats one already answered. The agent links the answer instead of generating a fresh one.
Input
Member: 'Is there a dark mode?' — already answered in a popular existing thread.
Output
{
"post_summary": "Member asking whether a dark mode exists.",
"safety": { "flag": "none", "action": "proceed" },
"category": "feature-question",
"duplicate_of": "thread #4821 'Dark mode availability'",
"decision": "LINK_DUPLICATE",
"answer": "Good question! This was answered here: [thread #4821]. Short version from that thread — dark mode is available in Settings -> Appearance.",
"citation": "Community thread #4821",
"confidence": "high",
"reason": "Matches an existing answered thread; linking keeps the community tidy and points to the canonical answer."
}
Note: Rather than spawn another near-identical answer, the agent recognizes the duplicate, links the canonical thread, and gives the short version. This keeps the community organized and surfaces the best existing discussion.
Abusive post and a separate distress signal → don't engage, route to people
Two unsafe cases the agent must never auto-answer: harassment, and a self-harm signal. It routes both to humans appropriately.
Input
(a) A post hurling insults and slurs at another member. (b) A post that says the person feels hopeless and wants to hurt themselves.
Output
{
"post_summary": "Two unsafe posts: (a) abusive/harassing content toward a member; (b) a member expressing distress and self-harm intent.",
"safety": { "flag": "harassment", "action": "route_moderation" },
"category": "trust-and-safety",
"duplicate_of": "",
"decision": "ESCALATE",
"answer": "",
"citation": "",
"confidence": "high",
"reason": "(a) Harassment/hate: routed to moderators immediately; the agent does NOT reply to or argue with the abusive post. (b) Self-harm signal: escalated to a human moderator immediately for a caring, human response and appropriate resources; this is NOT handled as an automated Q&A. No member data exposed; no enforcement action taken by the agent."
}
Note: The defining defensive case. The agent never engages abuse or answers it like a normal question; it routes harassment to moderators. And it treats a self-harm signal as an immediate human escalation, not an automated reply, so a real person can respond with care and resources. It also takes no punitive action itself; humans own enforcement.
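The example above summarizes two posts in one object. If each post is triaged separately, as the output format's single object per response suggests, the two cases would carry different safety fields. The sketch below shows that per-post mapping. The `safety_outcome` helper and the pairing of each flag with a decision are assumptions drawn from the SAFETY ROUTING rule, not a table published by the kit.

```python
# Assumed mapping from safety flag to (action, decision), based on the hard rules:
# abuse goes to moderators, self-harm goes to a human immediately.
SAFETY_OUTCOMES = {
    "none": ("proceed", None),  # the decision is made later by the answer path
    "spam": ("route_moderation", "ROUTE_MODERATION"),
    "harassment": ("route_moderation", "ROUTE_MODERATION"),
    "hate": ("route_moderation", "ROUTE_MODERATION"),
    "harmful": ("route_moderation", "ROUTE_MODERATION"),
    "self_harm": ("escalate_human", "ESCALATE"),
}


def safety_outcome(flag: str) -> dict:
    """Return the safety fields for one post; unknown flags are rejected."""
    if flag not in SAFETY_OUTCOMES:
        raise ValueError(f"unknown safety flag: {flag!r}")
    action, decision = SAFETY_OUTCOMES[flag]
    return {"flag": flag, "action": action, "decision": decision}


# Posts (a) and (b) from the example, handled as two separate results.
for flag in ("harassment", "self_harm"):
    print(safety_outcome(flag))
```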
Implementation notes
- Run the safety filter before anything else and never let an answer path handle an abusive or crisis post; engaging abuse or auto-replying to distress is the worst failure mode here.
- Escalate self-harm and crisis signals to a human immediately with empathy; an automated answer is never the right response, whereas a person with appropriate resources is.
- Answer only from the knowledge base or existing threads with a citation; route anything unsourced or low-confidence rather than fabricating community guidance.
- Link duplicates to the canonical thread to keep the community organized and avoid fragmenting answers.
- Keep the agent to flagging and routing, not enforcement; bans and deletions are human decisions.
- Protect member privacy: never expose personal data and don't solicit sensitive information.
- Grounded answers and safety judgment are what the strong model is for; a cheaper model can classify and dedupe.
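The last note, splitting work between a strong and a cheaper model, could be expressed as a small lookup. This is only a sketch: the tier names, the placeholder model identifiers, the task names, and the `model_for` helper are all hypothetical, and defaulting unknown tasks to the strong tier is a design choice of this example rather than something the blueprint specifies.

```python
# Placeholder identifiers; substitute the models your deployment actually uses.
MODEL_TIERS = {
    "cheap": "small-model-placeholder",
    "strong": "strong-model-placeholder",
}

# Task-to-tier assignment following the implementation note:
# classification and dedupe are cheap, grounded answers and safety are not.
TASK_TIER = {
    "classify": "cheap",
    "dedupe": "cheap",
    "safety_check": "strong",
    "draft_answer": "strong",
}


def model_for(task: str) -> str:
    """Return the model for a task; unknown tasks fall back to the strong tier."""
    return MODEL_TIERS[TASK_TIER.get(task, "strong")]


for task in ("classify", "safety_check", "unlisted_task"):
    print(task, "->", model_for(task))
```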
Variations
Basic
Question router
Triages posts, links duplicates, and answers clear questions from the knowledge base. Routes the rest to humans.
Advanced
Safety-aware triage
Adds the safety-first filter, abuse/crisis routing, grounded source-only answers with confidence, and privacy protection.
Enterprise
Community operations layer
Adds multi-channel support, moderator dashboards, trust-and-safety workflows, analytics on common questions, and human-in-the-loop for enforcement.
Download the Agent Blueprint
This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).
Frequently asked questions
Does it answer every question?
No. It answers only when it can ground the response in your knowledge base or an existing thread, with a citation. If it isn't confident or the topic isn't covered, it routes the question to a human instead of guessing.
How does it handle spam and abuse?
Its safety filter runs first. Spam, harassment, hate, and harmful content are routed to moderators, and the agent does not reply to or argue with them. It flags; your moderators decide on enforcement.
What happens when a post shows signs of distress or self-harm?
Signals of distress or self-harm are escalated to a human immediately, with an empathetic response, rather than handled as an automated Q&A. A real person can then respond with care and appropriate resources.
Can it ban members or delete posts?
No. It flags and routes; it never takes punitive action on its own. Bans, deletions, and other enforcement remain human decisions.
Does it reveal or ask for members' personal data?
No. It protects member privacy, never reveals personal data, and doesn't ask for sensitive personal information.
Does it recognize duplicate questions?
Yes. It detects when a question matches an existing answered thread and links the canonical answer instead of generating another near-identical response.