Will it answer questions it isn't sure about?

No. It answers only when it can ground the response in your knowledge base or an existing thread, with a citation. If it isn't confident or the topic isn't covered, it routes the question to a human instead of guessing.

How does it handle abuse and spam?

Its safety filter runs first. Posts containing spam, harassment, hate, or harmful content are routed to moderators; the agent does not reply to or argue with the people who posted them. It flags; your moderators decide on enforcement.

What about someone in crisis?

Signals of distress or self-harm are escalated to a human immediately, with an empathetic response, rather than handled as an automated Q&A. A real person can then respond with care and appropriate resources.

Does it ban or delete posts?

No. It flags and routes; it never takes punitive action on its own. Bans, deletions, and other enforcement remain human decisions.

Will it expose members' personal information?

No. It protects member privacy, never reveals personal data, and doesn't ask for sensitive personal information.

Does it reduce duplicate threads?

Yes. It detects when a question matches an existing answered thread and links the canonical answer instead of generating another near-identical response.

Community Question Triage Agent

Overview

Triages incoming community questions: categorizes, answers the clear ones, routes the rest.

Answers only from your knowledge base with a citation, and links duplicates to existing threads.

Sends spam, harassment, and abuse to moderators instead of engaging with them.

Defensive: never fabricates answers, escalates safety signals to humans, and protects members' personal data.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level: A2 (Recommend)

DNA Pattern: Escalation (Research → Evaluate → Plan → Escalate)

Worst-Case Action: Misroutes a community question or drafts a wrong suggested answer, caught before a human posts. It cannot post, send, or moderate autonomously; those tools are absent from its registry.

Authority Boundary: Reads community questions, classifies and prioritizes them, drafts a suggested answer from known resources, and routes or escalates. A human posts and moderates. It never posts, bans, or sends on its own.

Verification Test: Attempt to call a post, send, or moderate tool → confirm it is absent from the agent's registry.

Production Readiness: 6/6 dimensions passing. Tool isolation: post/moderate tools absent. Human gates: a human posts. Confidence escalation: sensitive or uncertain questions escalated. Cost ceiling: bounded per question. Audit trail: classification and drafts logged. Escalation path: sensitive topics routed to a moderator.

Last Reviewed: 2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json

{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "community-triage-agent",
  "trust_level": "A2",
  "dna_pattern": "Escalation",
  "worst_case_action": "Misroutes a question or drafts a wrong answer, caught before post. Cannot post or moderate.",
  "authority_boundary": "Triages and drafts community answers; post/moderate tools absent.",
  "tags": [
    "community",
    "triage",
    "read-only",
    "human-review"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "read_question",
      "classify",
      "draft_answer",
      "route"
    ],
    "execution_tools_absent": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "post",
      "send",
      "moderate"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.18,
    "alert_threshold_usd": 0.12
  },
  "loop_boundary": {
    "max_reasoning_turns": 6
  },
  "human_handoff": {
    "triggers": [
      "sensitive_topic",
      "low_confidence"
    ],
    "destination": "moderator"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "classification",
      "drafts"
    ]
  }
}
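
The verification test described in the specification can be sketched in a few lines of Python. This is a minimal illustration, not the kit's own test: it embeds a trimmed copy of the contract above, and `call_tool` is a hypothetical dispatcher standing in for whatever tool registry your runtime uses.

```python
import json

# Trimmed copy of the agentaz.json contract (only the fields this check reads).
CONTRACT = json.loads("""
{
  "tool_boundary": {
    "allowed_tools": ["read_question", "classify", "draft_answer", "route"],
    "execution_tools_absent": true
  },
  "output_boundary": {"never_emits": ["post", "send", "moderate"]}
}
""")


def forbidden_tools_present(contract: dict) -> list:
    """Return any never-emitted action that also appears in the allowed tool list."""
    allowed = set(contract["tool_boundary"]["allowed_tools"])
    forbidden = contract["output_boundary"]["never_emits"]
    return [name for name in forbidden if name in allowed]


def call_tool(contract: dict, name: str) -> str:
    """Hypothetical dispatcher: refuse any tool that is not in the registry."""
    if name not in contract["tool_boundary"]["allowed_tools"]:
        raise PermissionError(f"tool '{name}' is absent from the registry")
    return f"called {name}"


if __name__ == "__main__":
    # An empty list means no forbidden action is registered as a tool.
    print(forbidden_tools_present(CONTRACT))
    for name in ("post", "send", "moderate"):
        try:
            call_tool(CONTRACT, name)
        except PermissionError as err:
            print(err)
```

Because the spec is design-time only, a check like this belongs in CI or a security review rather than in the request path.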

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goal	Bounded by the authority spec above
Trust Level	A2 — Recommend
Tool access	Least privilege — execution tools absent (read-only)
Context handling	Grounded in provided inputs; cites or flags rather than guessing
Memory strategy	Task-scoped; no persistent cross-session memory
Human approval	Required on sensitive topic, low confidence → moderator
Audit trail	Append-only log (classification, drafts)
Cost & loop bounds	≤ $0.18 per loop · ≤ 6 reasoning turns
Recovery / escalation	Escalates to moderator
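
One way to picture the cost and loop bounds is a small per-question budget tracker. The sketch below is an assumption about how a runtime might apply the numbers from the contract; the `LoopBudget` class and its "ok / alert / stop" statuses are hypothetical, and the spec itself does not enforce anything at runtime.

```python
from dataclasses import dataclass

MAX_USD_PER_LOOP = 0.18      # cost_boundary.max_usd_per_trace_loop
ALERT_THRESHOLD_USD = 0.12   # cost_boundary.alert_threshold_usd
MAX_REASONING_TURNS = 6      # loop_boundary.max_reasoning_turns


@dataclass
class LoopBudget:
    """Tracks spend and turn count for a single triaged question."""
    spent_usd: float = 0.0
    turns: int = 0

    def record_turn(self, cost_usd: float) -> str:
        """Record one reasoning turn and report what the caller should do next.

        'stop'  -> do not start another turn; hand off to the moderator
        'alert' -> keep going, but the alert threshold has been crossed
        'ok'    -> within all bounds
        """
        self.turns += 1
        self.spent_usd += cost_usd
        if self.spent_usd > MAX_USD_PER_LOOP or self.turns >= MAX_REASONING_TURNS:
            return "stop"
        if self.spent_usd >= ALERT_THRESHOLD_USD:
            return "alert"
        return "ok"


if __name__ == "__main__":
    budget = LoopBudget()
    for _ in range(4):
        print(budget.turns + 1, budget.record_turn(0.05))
```

Treating the turn limit as "stop after the sixth turn" is a design choice made here for illustration; a stricter runtime could refuse the sixth turn instead.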

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

Agent	Primary reasoner — Recommend authority (A2)
Tools	read question, classify, draft answer, route — execution tools absent (read-only)
Memory	Task-scoped working context; no persistent cross-session memory
Guardrails	Worst-case classified (A2); no execution tools; ≤ $0.18/loop · ≤ 6 turns
Evaluator	Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
Handoff	Escalates to moderator on sensitive topic, low confidence

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Drafts a wrong or misleading answer that, if posted, spreads misinformation.

Detection: Answers are drawn from known resources and uncited claims are flagged.
Mitigation: It drafts only — a human posts; there is no autonomous posting.
Recovery: The moderator edits or discards the draft before posting.

Fails to escalate a sensitive or harmful question, such as a safety or harassment issue.

Detection: Sensitive-topic detection triggers escalation.
Mitigation: Sensitive questions are routed to a moderator, never auto-answered.
Recovery: A human handles it from the escalation queue.

Misroutes a question to the wrong category or expert.

Detection: Routing confidence is scored; low confidence goes to a default queue.
Mitigation: Routing is reversible.
Recovery: It is re-routed with the correction logged.

Evaluation

Draft-answer correctness and sensitive-question escalation are primary — a wrong posted answer spreads misinformation and a missed harmful question is a safety gap.

Answer groundedness	Share of drafted answers supported by known resources, with no fabrication.
Routing accuracy	Share of questions routed to the correct category or expert.
Sensitive-question recall	Of sensitive or harmful questions, the share escalated.
Draft acceptance rate	Share of drafts a moderator posts with little change.
Latency	Time to a triaged, drafted question.

Recommended approach. Label community questions with correct categories and known-good answers; measure groundedness and routing accuracy, and include sensitive cases to test escalation. Verify nothing posts autonomously.
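
As a rough illustration of the recommended approach, three of the metrics above can be computed from a labeled set like this. The `LabeledCase` fields and the four sample cases are made up for the sketch; they are not data from the kit or its backtest command.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledCase:
    expected_category: str
    predicted_category: str
    is_sensitive: bool       # ground-truth label
    was_escalated: bool      # the agent sent it to a human
    drafted_answer: bool     # the agent produced a draft
    draft_is_grounded: bool  # a reviewer judged the draft supported by known resources


def share(numerator: int, denominator: int) -> Optional[float]:
    """Return a ratio, or None when there is nothing to measure."""
    return numerator / denominator if denominator else None


def evaluate(cases: List[LabeledCase]) -> dict:
    sensitive = [c for c in cases if c.is_sensitive]
    drafts = [c for c in cases if c.drafted_answer]
    return {
        "routing_accuracy": share(
            sum(c.predicted_category == c.expected_category for c in cases), len(cases)
        ),
        "sensitive_question_recall": share(
            sum(c.was_escalated for c in sensitive), len(sensitive)
        ),
        "answer_groundedness": share(
            sum(c.draft_is_grounded for c in drafts), len(drafts)
        ),
    }


# Made-up cases, including sensitive ones so escalation is actually tested.
SAMPLE = [
    LabeledCase("billing", "billing", False, False, True, True),
    LabeledCase("bug", "billing", False, False, True, False),
    LabeledCase("trust-and-safety", "trust-and-safety", True, True, False, False),
    LabeledCase("trust-and-safety", "bug", True, False, False, False),
]

if __name__ == "__main__":
    print(evaluate(SAMPLE))
```

Given the priorities stated above, a missed escalation (the last sample case) is the kind of result to investigate before anything else.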

When to use

Use it when

Your community gets more questions than your team can triage by hand.
You have a knowledge base or answered threads the agent can ground answers in.
You want spam, abuse, and safety issues routed to moderators automatically.
You want fast, sourced answers for common questions and humans for the rest.

Avoid it when

You want it to answer everything, even without a reliable source — it routes instead.
You have no knowledge base for it to ground answers in.
You can't provide human moderators for escalations.
You need it to make moderation/ban decisions autonomously (it flags; humans decide).

System prompt

system-prompt.md

You are a Community Question Triage Agent for an online community. You triage incoming posts: answer clear questions from the knowledge base, link duplicates, and route everything else to humans. You are judged on helpful, accurate triage and on never fabricating answers, engaging with abuse, or mishandling a safety issue.

== CORE PRINCIPLES ==
1. Source or route. Answer only when you can ground it in the knowledge base or an existing answered thread, with a citation. If you are not confident or it is not covered, route to a human rather than guessing.
2. Safety first. Detect spam, harassment, hate, and harmful content. Do not engage with or answer abusive posts. Route them to moderators. Treat distress or self-harm signals with care and escalate to a person immediately.
3. Respectful and fair. Be warm and neutral. Don't take sides in disputes, don't shame anyone, and don't expose members' personal information.

== HARD RULES (NON-NEGOTIABLE) ==
- NO FABRICATION: Never invent an answer, feature, policy, or fact. Unsourced or low-confidence = route to a human.
- SAFETY ROUTING: Spam, harassment, hate, threats, and clearly harmful content go to moderators; do not reply to them as if normal questions. For self-harm or crisis signals, escalate to a human immediately and respond with empathy, not with an automated answer.
- NO MODERATION ACTIONS: You flag and route; you do not ban, delete, or take punitive action. Humans decide enforcement.
- PRIVACY: Never reveal a member's personal data, and don't ask for sensitive personal information.
- NEUTRALITY: Stay neutral in conflicts and avoid bias; don't escalate arguments.

== METHOD ==
- Read the post. Run a safety check first. If safe, classify the topic, check for duplicates, and search the knowledge base. Draft a sourced answer only if confident; otherwise route. Always include why.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "post_summary": "<short, neutral>",
  "safety": { "flag": "none|spam|harassment|hate|harmful|self_harm", "action": "proceed|route_moderation|escalate_human" },
  "category": "<topic/area>",
  "duplicate_of": "<existing thread/answer link, or empty>",
  "decision": "ANSWER|LINK_DUPLICATE|ROUTE_HUMAN|ROUTE_MODERATION|ESCALATE",
  "answer": "<sourced answer, or empty>",
  "citation": "<KB article / thread, or empty>",
  "confidence": "high|medium|low",
  "reason": "<why this decision>"
}
Never answer an unsafe post. Never answer without a source. Route when unsure.
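
Since the prompt asks for one JSON object with fixed enumerations, a caller may want to check each response before acting on it. The validator below is a sketch under that assumption: the function name and problem messages are hypothetical, and only the two closing rules of the prompt ("never answer an unsafe post", "never answer without a source") are checked beyond the enumerations.

```python
import json

SAFETY_FLAGS = {"none", "spam", "harassment", "hate", "harmful", "self_harm"}
SAFETY_ACTIONS = {"proceed", "route_moderation", "escalate_human"}
DECISIONS = {"ANSWER", "LINK_DUPLICATE", "ROUTE_HUMAN", "ROUTE_MODERATION", "ESCALATE"}
CONFIDENCES = {"high", "medium", "low"}
REQUIRED_KEYS = {
    "post_summary", "safety", "category", "duplicate_of", "decision",
    "answer", "citation", "confidence", "reason",
}


def validate_triage(raw: str) -> list:
    """Return a list of problems; an empty list means the object passed."""
    record = json.loads(raw)
    missing = REQUIRED_KEYS - set(record)
    if missing:
        return ["missing keys: " + ", ".join(sorted(missing))]
    problems = []
    safety = record["safety"] if isinstance(record["safety"], dict) else {}
    if safety.get("flag") not in SAFETY_FLAGS:
        problems.append("unknown safety flag")
    if safety.get("action") not in SAFETY_ACTIONS:
        problems.append("unknown safety action")
    if record["decision"] not in DECISIONS:
        problems.append("unknown decision")
    if record["confidence"] not in CONFIDENCES:
        problems.append("unknown confidence")
    # "Never answer an unsafe post."
    if safety.get("flag") != "none" and record["answer"]:
        problems.append("answer present on an unsafe post")
    # "Never answer without a source."
    if record["answer"] and not record["citation"]:
        problems.append("answer without a citation")
    if not record["reason"]:
        problems.append("reason is empty")
    return problems


EXAMPLE = json.dumps({
    "post_summary": "Member asking how to reset their API key.",
    "safety": {"flag": "none", "action": "proceed"},
    "category": "account/api",
    "duplicate_of": "",
    "decision": "ANSWER",
    "answer": "You can reset your API key under Settings -> API -> Regenerate.",
    "citation": "Help Center: 'Managing API keys'",
    "confidence": "high",
    "reason": "Covered by a knowledge-base article.",
})

if __name__ == "__main__":
    print(validate_triage(EXAMPLE))
```

A response that fails validation is a natural candidate for routing to a human rather than retrying silently.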

Setup guide

Install and connect the community

Install the agent and connect your community platform and knowledge base.

shell

pipx install community-triage-agent
community-triage-agent connect --platform discourse --kb ./help-center
community-triage-agent doctor

Configure safety & grounding

Safety-first routing and source-only answers are enforced here.

shell

cp .env.example .env
# then set these values in .env
ANTHROPIC_API_KEY=sk-ant-...
ANSWER_FROM_KB_ONLY=true
SAFETY_FIRST=true
ESCALATE=harassment,hate,harmful,self_harm

Set routing & moderator targets

Define where things go.

yaml

# triage.yml
auto_answer_confidence: high
route: { billing: '#team-billing', bug: '#team-eng', default: '#community-mods' }
moderation_channel: '#mods'
crisis_escalation: human_immediate
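
To make the routing rules concrete, here is a small sketch of how a destination might be chosen from the values in triage.yml. The `pick_destination` function is hypothetical, and sending low-confidence routing to the default queue follows the failure-mode description earlier on this page rather than any documented behavior of the CLI.

```python
# Values copied from the triage.yml example above.
ROUTES = {"billing": "#team-billing", "bug": "#team-eng", "default": "#community-mods"}
MODERATION_CHANNEL = "#mods"
CRISIS_ESCALATION = "human_immediate"


def pick_destination(category: str, confidence: str, safety_flag: str = "none") -> str:
    """Choose where a post goes; safety outcomes take priority over topic routing."""
    if safety_flag == "self_harm":
        return CRISIS_ESCALATION          # crisis goes to a person right away
    if safety_flag != "none":
        return MODERATION_CHANNEL         # spam, harassment, hate, harmful
    if confidence == "low":
        return ROUTES["default"]          # unsure about the category: default queue
    return ROUTES.get(category, ROUTES["default"])


if __name__ == "__main__":
    print(pick_destination("billing", "high"))
    print(pick_destination("feature-question", "high"))
    print(pick_destination("billing", "high", safety_flag="harassment"))
```

Checking the safety flag before the topic mirrors the safety-first ordering used everywhere else in the blueprint.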

Dry-run on past posts

Replay historical posts to check safety routing and answer grounding.

shell

community-triage-agent backtest --range 30d --explain
# reports answer accuracy, fabrication rate (must be 0), safety routing precision

Wire into the queue

Route new posts through triage; abuse and safety go to humans.

shell

# new post -> safety -> answer/link/route; abuse -> mods; crisis -> human now

Architecture

Post intake: Receives the incoming community post and prepares it for safety and topic analysis.

Safety filter (first): Runs before anything else: detects spam, harassment, hate, harmful content, and distress, deciding whether to proceed or route.

Topic classifier: Categorizes safe posts by area so they can be matched, answered, or routed to the right team.

Duplicate detector: Checks for an existing answered thread so common questions are linked rather than re-answered.

Knowledge-base search: Looks for a grounded answer in the knowledge base or prior answers, with a citation.

Answer-or-route gate: Drafts a sourced answer only when confident; otherwise routes to a human, recording the reason.

Escalation & handoff: Routes abuse to moderators and safety signals to a person, and hands clear answers back with citations and confidence.

Tools required

get_post: Retrieve the incoming community post and its context.

safety_filter: Detect spam, harassment, hate, harmful content, and distress before any answer is considered.

classify_topic: Categorize a safe post by area for matching and routing.

detect_duplicate: Find an existing answered thread the post duplicates.

kb_search: Search the knowledge base and prior answers for a grounded response.

draft_answer: Compose a sourced answer with a citation when confidence is high.

route_to_human: Route uncovered or low-confidence questions to the right human team.

escalate_moderation: Send abuse to moderators and crisis/self-harm signals to a person immediately.

Workflow

1. Take the post
Receive the incoming community post and its context.
2. Safety check first
Detect spam, abuse, harmful content, or distress before considering any answer.
3. Classify the topic
Categorize safe posts by area for matching and routing.
4. Check for duplicates
Link to an existing answered thread instead of re-answering.
5. Search the knowledge base
Look for a grounded answer with a citation.
6. Answer or route
Draft a sourced answer only if confident; otherwise route to a human with the reason.
7. Escalate where needed
Send abuse to moderators and safety signals to a person, with care.
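
The ordering of the steps above can be sketched as a single function with the tools passed in as callables. This is an illustrative outline, not the contents of run.py: the tool signatures, the returned dictionary shape, and the keyword-based `TOY_TOOLS` are all assumptions made so the example runs on its own.

```python
from typing import Callable, Optional, Tuple

KbHit = Optional[Tuple[str, str, str]]  # (answer, citation, confidence)


def triage(
    post: str,
    *,
    safety_filter: Callable[[str], str],
    classify_topic: Callable[[str], str],
    detect_duplicate: Callable[[str], str],
    kb_search: Callable[[str], KbHit],
) -> dict:
    # Step 2: safety runs first and short-circuits every answer path.
    flag = safety_filter(post)
    if flag == "self_harm":
        return {"decision": "ESCALATE", "reason": "crisis signal: human now"}
    if flag != "none":
        return {"decision": "ROUTE_MODERATION", "reason": f"flagged as {flag}"}

    category = classify_topic(post)                      # step 3
    duplicate = detect_duplicate(post)                   # step 4
    if duplicate:
        return {"decision": "LINK_DUPLICATE", "category": category,
                "duplicate_of": duplicate, "reason": "matches an answered thread"}

    hit = kb_search(post)                                # step 5
    if hit is not None and hit[2] == "high":             # step 6: answer only if confident
        answer, citation, _ = hit
        return {"decision": "ANSWER", "category": category, "answer": answer,
                "citation": citation, "reason": "grounded in the knowledge base"}
    return {"decision": "ROUTE_HUMAN", "category": category,
            "reason": "no confident, sourced answer"}


# Toy keyword stubs so the sketch is runnable; real tools would replace these.
TOY_TOOLS = {
    "safety_filter": lambda p: "self_harm" if "hurt myself" in p.lower() else "none",
    "classify_topic": lambda p: "account/api" if "api key" in p.lower() else "general",
    "detect_duplicate": lambda p: "thread #4821" if "dark mode" in p.lower() else "",
    "kb_search": lambda p: (
        ("Settings -> API -> Regenerate", "Help Center: 'Managing API keys'", "high")
        if "api key" in p.lower() else None
    ),
}

if __name__ == "__main__":
    for text in ("How do I reset my API key?", "Is there a dark mode?", "Why is the sky blue?"):
        print(triage(text, **TOY_TOOLS)["decision"])
```

Returning early on a safety flag means the classifier, duplicate detector, and knowledge-base search never see an unsafe post.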

Examples

Clear FAQ-type question → sourced answer from the knowledge base

A member asks a common how-to that the knowledge base covers. The agent answers with a citation.

Input

input

Member: 'How do I reset my API key?'

Output

output

{
  "post_summary": "Member asking how to reset their API key.",
  "safety": { "flag": "none", "action": "proceed" },
  "category": "account/api",
  "duplicate_of": "",
  "decision": "ANSWER",
  "answer": "You can reset your API key under Settings -> API -> Regenerate. Your old key stops working immediately, so update it anywhere it's in use. Full steps here: [KB link].",
  "citation": "Help Center: 'Managing API keys'",
  "confidence": "high",
  "reason": "Common question fully covered by a knowledge-base article; high confidence, grounded answer."
}

Note: A grounded, cited answer for a question the KB clearly covers. This is the routine volume the agent should handle, freeing moderators for the harder cases.

Duplicate question → link the existing thread, don't re-answer

The question repeats one already answered. The agent links the answer instead of generating a fresh one.

Input

input

Member: 'Is there a dark mode?' — already answered in a popular existing thread.

Output

output

{
  "post_summary": "Member asking whether a dark mode exists.",
  "safety": { "flag": "none", "action": "proceed" },
  "category": "feature-question",
  "duplicate_of": "thread #4821 'Dark mode availability'",
  "decision": "LINK_DUPLICATE",
  "answer": "Good question! This was answered here: [thread #4821]. Short version from that thread — dark mode is available in Settings -> Appearance.",
  "citation": "Community thread #4821",
  "confidence": "high",
  "reason": "Matches an existing answered thread; linking keeps the community tidy and points to the canonical answer."
}

Note: Rather than spawn another near-identical answer, the agent recognizes the duplicate, links the canonical thread, and gives the short version. This keeps the community organized and surfaces the best existing discussion.
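
The page does not say how duplicates are detected, so the following is only a toy stand-in: token-overlap (Jaccard) similarity against the titles of answered threads, with a threshold chosen arbitrarily. The thread IDs, titles, and the 0.6 cutoff are invented for the example; a production detector would more likely use search or embeddings.

```python
import re


def tokens(text: str) -> set:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_duplicate(question: str, answered_threads: dict, threshold: float = 0.6) -> str:
    """Return the ID of the best-matching answered thread, or '' if none is close enough."""
    best_id, best_score = "", 0.0
    for thread_id, title in answered_threads.items():
        score = jaccard(question, title)
        if score > best_score:
            best_id, best_score = thread_id, score
    return best_id if best_score >= threshold else ""


# Hypothetical answered threads.
THREADS = {
    "thread-A": "Is there a dark mode available?",
    "thread-B": "How to export my data",
}

if __name__ == "__main__":
    print(find_duplicate("Is there a dark mode?", THREADS))
    print(repr(find_duplicate("How do I reset my API key?", THREADS)))
```

An empty result maps to the empty `duplicate_of` field in the output format, so the post continues on to the knowledge-base search.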

Abusive post and a separate distress signal → don't engage, route to people

Two unsafe cases the agent must never auto-answer: harassment, and a self-harm signal. It routes both to humans appropriately.

Input

input

(a) A post hurling insults and slurs at another member. (b) A post that says the person feels hopeless and wants to hurt themselves.

Output

output

{
  "post_summary": "Two unsafe posts: (a) abusive/harassing content toward a member; (b) a member expressing distress and self-harm intent.",
  "safety": { "flag": "harassment", "action": "route_moderation" },
  "category": "trust-and-safety",
  "duplicate_of": "",
  "decision": "ESCALATE",
  "answer": "",
  "citation": "",
  "confidence": "high",
  "reason": "(a) Harassment/hate: routed to moderators immediately; the agent does NOT reply to or argue with the abusive post. (b) Self-harm signal: escalated to a human moderator immediately for a caring, human response and appropriate resources; this is NOT handled as an automated Q&A. No member data exposed; no enforcement action taken by the agent."
}

Note: The defining defensive case. The agent never engages abuse or answers it like a normal question — it routes harassment to moderators. And it treats a self-harm signal as an immediate human escalation, not an automated reply, so a real person can respond with care and resources. It also takes no punitive action itself; humans own enforcement.

Implementation notes

Run the safety filter before anything else and never let an answer path handle an abusive or crisis post; engaging abuse or auto-replying to distress is the worst failure mode here.
Escalate self-harm and crisis signals to a human immediately with empathy; an automated answer is never the right response, and a person plus resources is.
Answer only from the knowledge base or existing threads with a citation; route anything unsourced or low-confidence rather than fabricating community guidance.
Link duplicates to the canonical thread to keep the community organized and avoid fragmenting answers.
Keep the agent to flagging and routing, not enforcement; bans and deletions are human decisions.
Protect member privacy: never expose personal data and don't solicit sensitive information.
Grounded answers and safety judgment are what the strong model is for; a cheaper model can classify and dedupe.

Variations

Basic

Question router

Triages posts, links duplicates, and answers clear questions from the knowledge base. Routes the rest to humans.

Advanced

Safety-aware triage

Adds the safety-first filter, abuse/crisis routing, grounded source-only answers with confidence, and privacy protection.

Enterprise

Community operations layer

Adds multi-channel support, moderator dashboards, trust-and-safety workflows, analytics on common questions, and human-in-the-loop for enforcement.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

Download Blueprint (.zip)

README.md, system-prompt.md, setup-guide.md, tools.json, workflow.md, examples.md, .env.example, kit.json, run.py, LICENSE, NOTICE, starters/

Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).
