AgentKits

Community Question Triage Agent

Production Blueprint

Includes Agent Blueprint + Implementation Guide

An agent that keeps a busy community manageable: it triages incoming questions (forum, Discord, community support), answers the clear ones from your knowledge base, links duplicates to existing threads, and routes everything it shouldn't handle to a human. It is built defensively: it never fabricates an answer (it answers only when confident from sourced material, otherwise it routes), it sends spam, harassment, and abuse to moderators instead of engaging, it treats safety signals such as self-harm with care and escalates them to people, and it protects members' personal information.

Tags: community, moderation, support, triage, knowledge-base, autonomous-agent, trust-and-safety, forum, agentaz, agent-governance, trust-level, production-readiness
Stack: Claude, LangGraph, OpenAI
Difficulty: Advanced
Setup: 45 min
Version: 2.0.0 · 2026-06-21

Overview

Triages incoming community questions: categorizes, answers the clear ones, routes the rest.

Answers only from your knowledge base with a citation, and links duplicates to existing threads.

Sends spam, harassment, and abuse to moderators instead of engaging with them.

Defensive: never fabricates answers, escalates safety signals to humans, and protects members' personal data.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level: A2 (Recommend)
DNA Pattern: Escalation (Research → Evaluate → Plan → Escalate)
Worst-Case Action: Misroutes a community question or drafts a wrong suggested answer, caught before a human posts. It cannot post, send, or moderate autonomously; those tools are absent from its registry.
Authority Boundary: Reads community questions, classifies and prioritizes them, drafts a suggested answer from known resources, and routes or escalates. A human posts and moderates. It never posts, bans, or sends on its own.
Verification Test: Attempt to call a post, send, or moderate tool → confirm it is absent from the agent's registry.
Production Readiness: 6/6 dimensions passing. Tool isolation: post/moderate tools absent. Human gates: a human posts. Confidence escalation: sensitive or uncertain questions escalated. Cost ceiling: bounded per question. Audit trail: classification and drafts logged. Escalation path: sensitive topics routed to a moderator.
Last Reviewed: 2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json
{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "community-triage-agent",
  "trust_level": "A2",
  "dna_pattern": "Escalation",
  "worst_case_action": "Misroutes a question or drafts a wrong answer, caught before post. Cannot post or moderate.",
  "authority_boundary": "Triages and drafts community answers; post/moderate tools absent.",
  "tags": [
    "community",
    "triage",
    "read-only",
    "human-review"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "read_question",
      "classify",
      "draft_answer",
      "route"
    ],
    "execution_tools_absent": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "post",
      "send",
      "moderate"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.18,
    "alert_threshold_usd": 0.12
  },
  "loop_boundary": {
    "max_reasoning_turns": 6
  },
  "human_handoff": {
    "triggers": [
      "sensitive_topic",
      "low_confidence"
    ],
    "destination": "moderator"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "classification",
      "drafts"
    ]
  }
}

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goal: Bounded by the authority spec above
Trust Level: A2 (Recommend)
Tool access: Least privilege; execution tools absent (read-only)
Context handling: Grounded in provided inputs; cites or flags rather than guessing
Memory strategy: Task-scoped; no persistent cross-session memory
Human approval: Required on sensitive topic or low confidence → moderator
Audit trail: Append-only log (classification, drafts)
Cost & loop bounds: ≤ $0.18 per loop · ≤ 6 reasoning turns
Recovery / escalation: Escalates to moderator
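
Because the AgentAz specification does not enforce anything at runtime, the cost and loop bounds need a guard in your own code. A minimal sketch of one, using the limits from agentaz.json: the class `LoopBudget`, the exception `BudgetExceeded`, and the idea of charging a per-turn cost estimate are all assumptions for illustration.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a turn would exceed the cost or turn ceiling."""


@dataclass
class LoopBudget:
    max_usd: float = 0.18    # cost_boundary.max_usd_per_trace_loop
    alert_usd: float = 0.12  # cost_boundary.alert_threshold_usd
    max_turns: int = 6       # loop_boundary.max_reasoning_turns
    spent_usd: float = 0.0
    turns: int = 0

    def charge(self, usd):
        """Record one reasoning turn; return True once the alert threshold is reached."""
        if self.turns + 1 > self.max_turns:
            raise BudgetExceeded("turn limit reached; escalate to moderator")
        if self.spent_usd + usd > self.max_usd:
            raise BudgetExceeded("cost ceiling reached; escalate to moderator")
        self.turns += 1
        self.spent_usd += usd
        return self.spent_usd >= self.alert_usd


budget = LoopBudget()
for cost in (0.05, 0.05, 0.05):  # hypothetical per-turn cost estimates
    alert = budget.charge(cost)
print(budget.turns, alert)  # 3 True
```

The guard refuses the turn before it runs, so a caller can catch `BudgetExceeded` and hand the question to a moderator instead of continuing.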

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

Agent: Primary reasoner with Recommend authority (A2)
Tools: read question, classify, draft answer, route; execution tools absent (read-only)
Memory: Task-scoped working context; no persistent cross-session memory
Guardrails: Worst-case classified (A2); no execution tools; ≤ $0.18/loop · ≤ 6 turns
Evaluator: Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
Handoff: Escalates to moderator on sensitive topic or low confidence

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Drafts a wrong or misleading answer that, if posted, spreads misinformation.

Detection
Answers are drawn from known resources and uncited claims are flagged.
Mitigation
It drafts only — a human posts; there is no autonomous posting.
Recovery
The moderator edits or discards the draft before posting.

Fails to escalate a sensitive or harmful question, such as a safety or harassment issue.

Detection
Sensitive-topic detection triggers escalation.
Mitigation
Sensitive questions are routed to a moderator, never auto-answered.
Recovery
A human handles it from the escalation queue.

Misroutes a question to the wrong category or expert.

Detection
Routing confidence is scored; low confidence goes to a default queue.
Mitigation
Routing is reversible.
Recovery
It is re-routed with the correction logged.
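
The misrouting safeguards can be sketched in a few lines: low-confidence routing falls back to a default queue, and each correction is appended to the log rather than overwriting the earlier entry. This is an illustrative sketch only. The 0.7 threshold, the queue names, the function names `choose_queue` and `reroute`, and the log entry fields are assumptions, not values from the blueprint.

```python
from datetime import datetime, timezone

DEFAULT_QUEUE = "default"
MIN_CONFIDENCE = 0.7  # assumed cut-off; tune against labeled data


def choose_queue(scores, threshold=MIN_CONFIDENCE):
    """Pick the top-scoring queue, or the default queue when confidence is low."""
    if not scores:
        return DEFAULT_QUEUE
    queue, score = max(scores.items(), key=lambda item: item[1])
    return queue if score >= threshold else DEFAULT_QUEUE


def reroute(audit_log, post_id, old_queue, new_queue, corrected_by):
    """Append a correction entry; earlier entries are never edited or removed."""
    entry = {
        "event": "reroute",
        "post_id": post_id,
        "from": old_queue,
        "to": new_queue,
        "corrected_by": corrected_by,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.append(entry)
    return entry


log = []
print(choose_queue({"billing": 0.91, "bug": 0.05}))  # billing
print(choose_queue({"billing": 0.55, "bug": 0.45}))  # default
reroute(log, "post-1", "default", "bug", corrected_by="moderator")
print(len(log))  # 1
```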

Evaluation

Draft-answer correctness and sensitive-question escalation are primary — a wrong posted answer spreads misinformation and a missed harmful question is a safety gap.

Answer groundedness: Share of drafted answers supported by known resources, with no fabrication.
Routing accuracy: Share of questions routed to the correct category or expert.
Sensitive-question recall: Of sensitive or harmful questions, the share escalated.
Draft acceptance rate: Share of drafts a moderator posts with little change.
Latency: Time to a triaged, drafted question.

Recommended approach. Label community questions with correct categories and known-good answers; measure groundedness and routing accuracy, and include sensitive cases to test escalation. Verify nothing posts autonomously.
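
As a rough illustration of the recommended approach, the sketch below computes three of the metrics from a labeled set and counts autonomous posts, which should stay at zero. The case fields (`drafted`, `grounded`, `route`, `expected_route`, `sensitive`, `escalated`, `posted_by_agent`) are a hypothetical labeling format, and the four sample cases are made up for the example, not measured results.

```python
def evaluate(cases):
    """Compute evaluation metrics; a ratio is None when its denominator is zero."""
    def share(hits, total):
        return hits / total if total else None

    drafted = [c for c in cases if c["drafted"]]
    sensitive = [c for c in cases if c["sensitive"]]
    return {
        "answer_groundedness": share(sum(c["grounded"] for c in drafted), len(drafted)),
        "routing_accuracy": share(sum(c["route"] == c["expected_route"] for c in cases), len(cases)),
        "sensitive_recall": share(sum(c["escalated"] for c in sensitive), len(sensitive)),
        "autonomous_posts": sum(c["posted_by_agent"] for c in cases),  # must be 0
    }


# Illustrative labeled cases only.
CASES = [
    {"drafted": True, "grounded": True, "route": "billing", "expected_route": "billing",
     "sensitive": False, "escalated": False, "posted_by_agent": False},
    {"drafted": True, "grounded": False, "route": "bug", "expected_route": "billing",
     "sensitive": False, "escalated": False, "posted_by_agent": False},
    {"drafted": False, "grounded": False, "route": "moderator", "expected_route": "moderator",
     "sensitive": True, "escalated": True, "posted_by_agent": False},
    {"drafted": False, "grounded": False, "route": "bug", "expected_route": "moderator",
     "sensitive": True, "escalated": False, "posted_by_agent": False},
]

print(evaluate(CASES))
```

Sensitive-question recall deserves its own threshold in any release gate, since a missed escalation is a safety gap rather than a quality dip.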

When to use

Use it when

  • Your community gets more questions than your team can triage by hand.
  • You have a knowledge base or answered threads the agent can ground answers in.
  • You want spam, abuse, and safety issues routed to moderators automatically.
  • You want fast, sourced answers for common questions and humans for the rest.

Avoid it when

  • You want it to answer everything, even without a reliable source — it routes instead.
  • You have no knowledge base for it to ground answers in.
  • You can't provide human moderators for escalations.
  • You need it to make moderation/ban decisions autonomously (it flags; humans decide).

System prompt

system-prompt.md
You are a Community Question Triage Agent for an online community. You triage incoming posts: answer clear questions from the knowledge base, link duplicates, and route everything else to humans. You are judged on helpful, accurate triage and on never fabricating answers, engaging with abuse, or mishandling a safety issue.

== CORE PRINCIPLES ==
1. Source or route. Answer only when you can ground it in the knowledge base or an existing answered thread, with a citation. If you are not confident or it is not covered, route to a human rather than guessing.
2. Safety first. Detect spam, harassment, hate, and harmful content. Do not engage with or answer abusive posts. Route them to moderators. Treat distress or self-harm signals with care and escalate to a person immediately.
3. Respectful and fair. Be warm and neutral. Don't take sides in disputes, don't shame anyone, and don't expose members' personal information.

== HARD RULES (NON-NEGOTIABLE) ==
- NO FABRICATION: Never invent an answer, feature, policy, or fact. Unsourced or low-confidence = route to a human.
- SAFETY ROUTING: Spam, harassment, hate, threats, and clearly harmful content go to moderators; do not reply to them as if normal questions. For self-harm or crisis signals, escalate to a human immediately and respond with empathy, not with an automated answer.
- NO MODERATION ACTIONS: You flag and route; you do not ban, delete, or take punitive action. Humans decide enforcement.
- PRIVACY: Never reveal a member's personal data, and don't ask for sensitive personal information.
- NEUTRALITY: Stay neutral in conflicts and avoid bias; don't escalate arguments.

== METHOD ==
- Read the post. Run a safety check first. If safe, classify the topic, check for duplicates, and search the knowledge base. Draft a sourced answer only if confident; otherwise route. Always include why.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "post_summary": "<short, neutral>",
  "safety": { "flag": "none|spam|harassment|hate|harmful|self_harm", "action": "proceed|route_moderation|escalate_human" },
  "category": "<topic/area>",
  "duplicate_of": "<existing thread/answer link, or empty>",
  "decision": "ANSWER|LINK_DUPLICATE|ROUTE_HUMAN|ROUTE_MODERATION|ESCALATE",
  "answer": "<sourced answer, or empty>",
  "citation": "<KB article / thread, or empty>",
  "confidence": "high|medium|low",
  "reason": "<why this decision>"
}
Never answer an unsafe post. Never answer without a source. Route when unsure.
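
Since the prompt asks for one JSON object, a caller can check each result against the hard rules before a human ever sees it. The following is a sketch under stated assumptions: `validate_triage` is a hypothetical helper, the enumerations are copied from the output format above, and reading "route when unsure" as "no ANSWER or LINK_DUPLICATE at low confidence" is an interpretation, not a rule the blueprint states.

```python
import json

FLAGS = {"none", "spam", "harassment", "hate", "harmful", "self_harm"}
ACTIONS = {"proceed", "route_moderation", "escalate_human"}
DECISIONS = {"ANSWER", "LINK_DUPLICATE", "ROUTE_HUMAN", "ROUTE_MODERATION", "ESCALATE"}
CONFIDENCE = {"high", "medium", "low"}
REQUIRED = ("post_summary", "safety", "category", "duplicate_of", "decision",
            "answer", "citation", "confidence", "reason")


def validate_triage(raw):
    """Return a list of problems with one triage result; an empty list means it passes."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        return ["not a JSON object"]
    problems = [f"missing field: {key}" for key in REQUIRED if key not in obj]
    if problems:
        return problems
    safety = obj["safety"] if isinstance(obj["safety"], dict) else {}
    if safety.get("flag") not in FLAGS:
        problems.append("unknown safety flag")
    if safety.get("action") not in ACTIONS:
        problems.append("unknown safety action")
    if obj["decision"] not in DECISIONS:
        problems.append("unknown decision")
    if obj["confidence"] not in CONFIDENCE:
        problems.append("unknown confidence")
    unsafe = safety.get("flag") != "none"
    answering = obj["decision"] in {"ANSWER", "LINK_DUPLICATE"}
    if unsafe and (answering or obj["answer"]):
        problems.append("answered an unsafe post")
    if (answering or obj["answer"]) and not obj["citation"]:
        problems.append("answer without a citation")
    if answering and obj["confidence"] == "low":  # assumed reading of "route when unsure"
        problems.append("answered at low confidence")
    if not obj["reason"]:
        problems.append("missing reason")
    return problems


sample = json.dumps({
    "post_summary": "Insults aimed at a member.",
    "safety": {"flag": "harassment", "action": "route_moderation"},
    "category": "trust-and-safety", "duplicate_of": "", "decision": "ANSWER",
    "answer": "Please calm down.", "citation": "", "confidence": "high", "reason": "n/a",
})
print(validate_triage(sample))
# ['answered an unsafe post', 'answer without a citation']
```

A result that fails validation is best treated like a low-confidence one and routed to a human.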

Setup guide

Install and connect the community

Install the agent and connect your community platform and knowledge base.

shell
pipx install community-triage-agent
community-triage-agent connect --platform discourse --kb ./help-center
community-triage-agent doctor

Configure safety & grounding

Safety-first routing and source-only answers are enforced here.

shell
cp .env.example .env
# then set these values in .env:
ANTHROPIC_API_KEY=sk-ant-...
ANSWER_FROM_KB_ONLY=true
SAFETY_FIRST=true
ESCALATE=harassment,hate,harmful,self_harm

Set routing & moderator targets

Define where things go.

yaml
# triage.yml
auto_answer_confidence: high
route: { billing: '#team-billing', bug: '#team-eng', default: '#community-mods' }
moderation_channel: '#mods'
crisis_escalation: human_immediate
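
To make the routing targets concrete, here is a small sketch of how a triage decision might resolve to a destination using the values above. It mirrors triage.yml as a Python dict to stay dependency-free. The function `destination`, and the mapping of `ESCALATE` to `crisis_escalation` and `ROUTE_MODERATION` to `moderation_channel`, are assumptions about how the config is meant to be read.

```python
# Mirror of the triage.yml values above, written as a dict for a self-contained example.
CONFIG = {
    "auto_answer_confidence": "high",
    "route": {"billing": "#team-billing", "bug": "#team-eng", "default": "#community-mods"},
    "moderation_channel": "#mods",
    "crisis_escalation": "human_immediate",
}


def destination(result, config=CONFIG):
    """Resolve where a triage result goes; None means no human queue is needed."""
    decision = result["decision"]
    if decision == "ESCALATE":
        return config["crisis_escalation"]
    if decision == "ROUTE_MODERATION":
        return config["moderation_channel"]
    if decision == "ROUTE_HUMAN":
        routes = config["route"]
        # Unknown categories fall back to the default queue.
        return routes.get(result.get("category", ""), routes["default"])
    return None  # ANSWER / LINK_DUPLICATE


print(destination({"decision": "ROUTE_HUMAN", "category": "billing"}))           # #team-billing
print(destination({"decision": "ROUTE_HUMAN", "category": "feature-question"}))  # #community-mods
```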

Dry-run on past posts

Replay historical posts to check safety routing and answer grounding.

shell
community-triage-agent backtest --range 30d --explain
# reports answer accuracy, fabrication rate (must be 0), safety routing precision

Wire into the queue

Route new posts through triage; abuse and safety go to humans.

shell
# new post -> safety -> answer/link/route; abuse -> mods; crisis -> human now

Architecture

Tools required

get_postRetrieve the incoming community post and its context.
safety_filterDetect spam, harassment, hate, harmful content, and distress before any answer is considered.
classify_topicCategorize a safe post by area for matching and routing.
detect_duplicateFind an existing answered thread the post duplicates.
kb_searchSearch the knowledge base and prior answers for a grounded response.
draft_answerCompose a sourced answer with a citation when confidence is high.
route_to_humanRoute uncovered or low-confidence questions to the right human team.
escalate_moderationSend abuse to moderators and crisis/self-harm signals to a person immediately.

Workflow

  1. Take the post

    Receive the incoming community post and its context.

  2. Safety check first

    Detect spam, abuse, harmful content, or distress before considering any answer.

  3. Classify the topic

    Categorize safe posts by area for matching and routing.

  4. Check for duplicates

    Link to an existing answered thread instead of re-answering.

  5. Search the knowledge base

    Look for a grounded answer with a citation.

  6. Answer or route

    Draft a sourced answer only if confident; otherwise route to a human with the reason.

  7. Escalate where needed

    Send abuse to moderators and safety signals to a person, with care.
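
The workflow above can be sketched as a single function that calls the tools in order and returns early on anything unsafe. This is an illustration with mock tools, not the logic in the bundled run.py: the tool names come from the "Tools required" list, while their signatures, the keyword-based mocks, and the returned fields are assumptions.

```python
def triage(post, tools):
    """Run the workflow steps in order; `tools` maps tool names to callables."""
    flag = tools["safety_filter"](post)            # safety check first
    if flag == "self_harm":
        return {"decision": "ESCALATE", "answer": "", "citation": "",
                "reason": "Distress signal; escalated to a person immediately."}
    if flag != "none":
        return {"decision": "ROUTE_MODERATION", "answer": "", "citation": "",
                "reason": f"Flagged as {flag}; sent to moderators without a reply."}
    category = tools["classify_topic"](post)
    duplicate = tools["detect_duplicate"](post)
    if duplicate:
        return {"decision": "LINK_DUPLICATE", "category": category,
                "duplicate_of": duplicate, "citation": duplicate,
                "reason": "Matches an existing answered thread."}
    hit = tools["kb_search"](post)
    if hit and hit["confidence"] == "high":        # answer only when confident and sourced
        return {"decision": "ANSWER", "category": category,
                "answer": tools["draft_answer"](post, hit),
                "citation": hit["source"], "reason": "Covered by the knowledge base."}
    return {"decision": "ROUTE_HUMAN", "category": category, "answer": "",
            "citation": "", "reason": "No confident, sourced answer found."}


def _safety(post):
    """Mock safety filter: keyword matching stands in for a real classifier."""
    text = post.lower()
    if "hurt myself" in text:
        return "self_harm"
    return "harassment" if "idiot" in text else "none"


def _kb_search(post):
    """Mock knowledge-base lookup with a single hard-coded article."""
    if "reset my api key" in post.lower():
        return {"source": "Help Center: 'Managing API keys'", "confidence": "high",
                "text": "Settings -> API -> Regenerate"}
    return None


MOCK_TOOLS = {
    "safety_filter": _safety,
    "classify_topic": lambda post: "account/api" if "api" in post.lower() else "general",
    "detect_duplicate": lambda post: "thread #4821" if "dark mode" in post.lower() else "",
    "kb_search": _kb_search,
    "draft_answer": lambda post, hit: f"{hit['text']} (source: {hit['source']})",
}

print(triage("How do I reset my API key?", MOCK_TOOLS)["decision"])  # ANSWER
print(triage("Is there a dark mode?", MOCK_TOOLS)["decision"])       # LINK_DUPLICATE
```

Returning before `classify_topic` runs means no answer path can ever see an abusive or crisis post, which is the ordering the workflow depends on.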

Examples

Clear FAQ-type question → sourced answer from the knowledge base

A member asks a common how-to that the knowledge base covers. The agent answers with a citation.

Input

input
Member: 'How do I reset my API key?'

Output

output
{
  "post_summary": "Member asking how to reset their API key.",
  "safety": { "flag": "none", "action": "proceed" },
  "category": "account/api",
  "duplicate_of": "",
  "decision": "ANSWER",
  "answer": "You can reset your API key under Settings -> API -> Regenerate. Your old key stops working immediately, so update it anywhere it's in use. Full steps here: [KB link].",
  "citation": "Help Center: 'Managing API keys'",
  "confidence": "high",
  "reason": "Common question fully covered by a knowledge-base article; high confidence, grounded answer."
}

Note: A grounded, cited answer for a question the KB clearly covers. This is the routine volume the agent should handle, freeing moderators for the harder cases.

Duplicate question → link the existing thread, don't re-answer

The question repeats one already answered. The agent links the answer instead of generating a fresh one.

Input

input
Member: 'Is there a dark mode?' — already answered in a popular existing thread.

Output

output
{
  "post_summary": "Member asking whether a dark mode exists.",
  "safety": { "flag": "none", "action": "proceed" },
  "category": "feature-question",
  "duplicate_of": "thread #4821 'Dark mode availability'",
  "decision": "LINK_DUPLICATE",
  "answer": "Good question! This was answered here: [thread #4821]. Short version from that thread — dark mode is available in Settings -> Appearance.",
  "citation": "Community thread #4821",
  "confidence": "high",
  "reason": "Matches an existing answered thread; linking keeps the community tidy and points to the canonical answer."
}

Note: Rather than spawn another near-identical answer, the agent recognizes the duplicate, links the canonical thread, and gives the short version. This keeps the community organized and surfaces the best existing discussion.

Abusive post and a separate distress signal → don't engage, route to people

Two unsafe cases the agent must never auto-answer: harassment, and a self-harm signal. It routes both to humans appropriately.

Input

input
(a) A post hurling insults and slurs at another member. (b) A post that says the person feels hopeless and wants to hurt themselves.

Output

output
{
  "post_summary": "Two unsafe posts: (a) abusive/harassing content toward a member; (b) a member expressing distress and self-harm intent.",
  "safety": { "flag": "harassment", "action": "route_moderation" },
  "category": "trust-and-safety",
  "duplicate_of": "",
  "decision": "ESCALATE",
  "answer": "",
  "citation": "",
  "confidence": "high",
  "reason": "(a) Harassment/hate: routed to moderators immediately; the agent does NOT reply to or argue with the abusive post. (b) Self-harm signal: escalated to a human moderator immediately for a caring, human response and appropriate resources; this is NOT handled as an automated Q&A. No member data exposed; no enforcement action taken by the agent."
}

Note: The defining defensive case. The agent never engages abuse or answers it like a normal question — it routes harassment to moderators. And it treats a self-harm signal as an immediate human escalation, not an automated reply, so a real person can respond with care and resources. It also takes no punitive action itself; humans own enforcement.

Implementation notes

  • Run the safety filter before anything else and never let an answer path handle an abusive or crisis post; engaging abuse or auto-replying to distress is the worst failure mode here.
  • Escalate self-harm and crisis signals to a human immediately with empathy; an automated answer is never the right response, and a person plus resources is.
  • Answer only from the knowledge base or existing threads with a citation; route anything unsourced or low-confidence rather than fabricating community guidance.
  • Link duplicates to the canonical thread to keep the community organized and avoid fragmenting answers.
  • Keep the agent to flagging and routing, not enforcement; bans and deletions are human decisions.
  • Protect member privacy: never expose personal data and don't solicit sensitive information.
  • Grounded answers and safety judgment are what the strong model is for; a cheaper model can classify and dedupe.
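
For the privacy note, one possible safeguard is to redact obvious identifiers from summaries and drafts before they are logged or shown. The sketch below is deliberately simple and is an assumption, not part of the blueprint: the `redact` helper and its two regular expressions catch only plain email addresses and phone-like digit runs, so treat it as a starting point rather than reliable PII detection.

```python
import re

# Simple patterns for illustration; real PII detection needs broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[email]", text)
    return PHONE.sub("[phone]", text)


print(redact("Mail me at jo@example.com or call +1 555-123-4567"))
# Mail me at [email] or call [phone]
```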

Variations

Basic

Question router

Triages posts, links duplicates, and answers clear questions from the knowledge base. Routes the rest to humans.

Advanced

Safety-aware triage

Adds the safety-first filter, abuse/crisis routing, grounded source-only answers with confidence, and privacy protection.

Enterprise

Community operations layer

Adds multi-channel support, moderator dashboards, trust-and-safety workflows, analytics on common questions, and human-in-the-loop for enforcement.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

Download Blueprint (.zip)
README.md, system-prompt.md, setup-guide.md, tools.json, workflow.md, examples.md, .env.example, kit.json, run.py, LICENSE, NOTICE, starters/

Export

Generate a starter for your stack — all client-side, nothing leaves your browser.

Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).

Frequently asked questions