Will it make up themes or quotes?

No — that's its core constraint. Every theme must be grounded in cited source items, and quotes must be real excerpts from the corpus. If it can't cite something, it doesn't claim it.

How does it avoid over-weighting a loud minority?

It reports honest frequency (count over total) for every theme and flags small-sample or ambiguous themes as low-confidence, so a handful of emphatic items can't masquerade as a top issue.

Can it tell me why a metric like churn changed?

Only what users actually said. It won't claim feedback caused a metric change unless the feedback states it — those causal questions need cohort analysis, and the agent will say so rather than invent a story.

Does it decide what to build?

No. It produces a faithful, evidence-graded synthesis for prioritization; the product team makes the roadmap decisions.

How does it handle user privacy?

It treats the corpus as sensitive, quotes content rather than personal data, and redacts identifiers so the synthesis doesn't leak PII.

What about themes with very little data?

It surfaces them as low-confidence leads with their small frequency rather than dropping them silently or inflating them, so genuinely narrow issues stay visible without distorting priorities.

User Feedback Synthesis Agent

Overview

Clusters raw feedback (tickets, reviews, surveys) into clear themes with representative quotes.

Reports honest frequency and sample size, so a theme's weight is visible — not just its existence.

Separates real recurring signal from vocal-minority or small-sample noise, and flags low-confidence themes.

Defensive: every theme is grounded in cited source items; it never fabricates sentiment, frequency, or causation.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level: A2 (Recommend)

DNA Pattern: Synthesis (Extract → Synthesize → Verify)

Worst-Case Action: Produces an inaccurate theme or miscounts feedback, surfaced for a PM to review. It cannot create tickets, change a roadmap, or take any other external action; execution tools are absent from its registry.

Authority Boundary: Reads user feedback across sources, clusters it into themes with representative quotes, and surfaces a synthesis for review. It never creates tickets, edits a roadmap, or fabricates feedback. A PM decides what to act on.

Verification Test: Attempt to call a create-ticket or roadmap-write tool and confirm it is absent; confirm that themes cite real feedback rather than invented feedback.

Production Readiness: 6/6 dimensions passing. Tool isolation: ticket/roadmap tools absent. Human gates: a PM decides. Confidence escalation: thin or conflicting themes flagged. Cost ceiling: bounded per batch. Audit trail: themes and source counts logged. Escalation path: ambiguous signal flagged.

Last Reviewed: 2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json

{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "feedback-synthesis-agent",
  "trust_level": "A2",
  "dna_pattern": "Synthesis",
  "worst_case_action": "Produces an inaccurate theme for PM review. Cannot create tickets or change a roadmap.",
  "authority_boundary": "Clusters feedback into cited themes; ticket/roadmap tools absent; no fabricated feedback.",
  "tags": [
    "product-management",
    "feedback",
    "synthesis",
    "read-only",
    "human-review"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "read_feedback",
      "cluster",
      "extract_quotes",
      "count_signal"
    ],
    "execution_tools_absent": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "create_ticket",
      "roadmap_write"
    ],
    "never_fabricates": true
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.25,
    "alert_threshold_usd": 0.16
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "thin_theme",
      "conflicting_signal"
    ],
    "destination": "product_manager"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "themes",
      "source_counts"
    ]
  }
}
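The Verification Test and the tool boundary in this contract can be checked mechanically. The following is a minimal Python sketch, assuming the contract is the JSON above and that your runtime registers only the tools named in `tool_boundary.allowed_tools`; `load_contract`, `build_registry`, `call_tool`, `check_contract`, and `ToolNotAvailable` are hypothetical names, not part of the AgentAz™ schema or any runtime.

```python
import json
from typing import Callable, Dict, List


class ToolNotAvailable(Exception):
    """Raised when a caller asks for a tool the contract never registered."""


def load_contract(path: str) -> dict:
    # Read agentaz.json (or any contract with the same fields) from disk.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _stub(name: str) -> Callable[[], str]:
    # Placeholder implementation; a real registry maps names to integrations.
    return lambda: f"{name} called"


def build_registry(contract: dict) -> Dict[str, Callable[[], str]]:
    # Least privilege: only allowlisted tools exist in the registry at all.
    return {name: _stub(name) for name in contract["tool_boundary"]["allowed_tools"]}


def call_tool(registry: Dict[str, Callable[[], str]], name: str) -> str:
    if name not in registry:
        raise ToolNotAvailable(name)
    return registry[name]()


def check_contract(contract: dict) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    allowed = set(contract["tool_boundary"]["allowed_tools"])
    forbidden = set(contract["output_boundary"]["never_emits"])
    overlap = allowed & forbidden
    if overlap:
        problems.append(f"forbidden tools are allowlisted: {sorted(overlap)}")
    if contract["tool_boundary"].get("execution_tools_absent") is not True:
        problems.append("execution_tools_absent is not true")
    cost = contract["cost_boundary"]
    if cost["alert_threshold_usd"] >= cost["max_usd_per_trace_loop"]:
        problems.append("cost alert threshold should sit below the ceiling")
    # Verification test: calling a forbidden tool must fail because it is absent.
    registry = build_registry(contract)
    for name in sorted(forbidden):
        try:
            call_tool(registry, name)
            problems.append(f"{name} is callable")
        except ToolNotAvailable:
            pass
    return problems
```

Running `check_contract(load_contract("agentaz.json"))` in CI would catch a contract edit that allowlists a forbidden tool or inverts the cost thresholds.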

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goal	Bounded by the authority spec above
Trust Level	A2 — Recommend
Tool access	Least privilege — execution tools absent (read-only)
Context handling	Grounded in provided inputs; cites or flags rather than guessing
Memory strategy	Task-scoped; no persistent cross-session memory
Human approval	Required on thin theme, conflicting signal → product manager
Audit trail	Append-only log (themes, source counts)
Cost & loop bounds	≤ $0.25 per loop · ≤ 8 reasoning turns
Recovery / escalation	Escalates to product manager

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

Agent	Primary reasoner — Recommend authority (A2)
Tools	read feedback, cluster, extract quotes, count signal — execution tools absent (read-only)
Memory	Task-scoped working context; no persistent cross-session memory
Guardrails	Worst-case classified (A2); no execution tools; ≤ $0.25/loop · ≤ 8 turns
Evaluator	Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
Handoff	Escalates to product manager on thin theme, conflicting signal

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Produces a theme not actually supported by the feedback.

Detection: Themes cite representative quotes and unsupported themes are flagged.
Mitigation: It clusters real feedback, never fabricates, and has no ticket or roadmap tools.
Recovery: The PM discards the weak theme.

Miscounts signal, over- or under-stating a theme's prevalence.

Detection: Source counts accompany each theme.
Mitigation: Counts are reported explicitly rather than asserted as impressions.
Recovery: The PM recounts.

Drops a small but important signal.

Detection: Thin or conflicting themes are flagged, not dropped.
Mitigation: It surfaces uncertain themes with their counts instead of discarding them.
Recovery: The PM reviews the raw feedback.

Evaluation

Theme validity is the primary measure: every theme must be supported by real feedback, and an unsupported theme or a miscounted prevalence is the core failure.

Theme validity	Share of themes supported by representative source quotes.
Prevalence accuracy	Agreement of stated theme counts with the actual feedback.
Fabrication rate	Frequency of themes not present in the feedback — should be near zero.
Signal recall	Of small-but-important signals, the share retained.
Latency	Time to synthesize a feedback set.

Recommended approach. Use feedback sets with human-labeled themes and counts, and measure theme validity (every theme must cite real quotes) and prevalence accuracy. The agent has no ticket or roadmap tools, so there are no actions to evaluate, only the synthesis itself.
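A minimal scoring sketch for the metrics in the table, assuming the agent's themes have already been matched to human label names (matching is a separate, usually manual, step) and that a quote counts as real only when it appears verbatim in the corpus. The 10% count tolerance, `PredictedTheme`, and `evaluate` are illustrative assumptions; latency is left to your timing harness.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PredictedTheme:
    theme: str   # name already matched to a human label (or not, if fabricated)
    count: int   # supporting items the agent reported
    quotes: List[str] = field(default_factory=list)


def _grounded(quote: str, corpus_texts: List[str]) -> bool:
    # A quote counts as real only if it appears verbatim (case-insensitive) in the corpus.
    q = quote.strip().lower()
    return bool(q) and any(q in text.lower() for text in corpus_texts)


def evaluate(
    predicted: List[PredictedTheme],
    labeled_counts: Dict[str, int],
    corpus_texts: List[str],
    important_small: Set[str],
    count_tolerance: float = 0.1,
) -> Dict[str, Optional[float]]:
    n = len(predicted)
    valid = sum(
        1 for t in predicted
        if t.quotes and all(_grounded(q, corpus_texts) for q in t.quotes)
    )
    fabricated = sum(1 for t in predicted if t.theme not in labeled_counts)
    matched = [t for t in predicted if t.theme in labeled_counts]
    # A count within max(1 item, tolerance * label) of the human label is treated as accurate.
    accurate = sum(
        1 for t in matched
        if abs(t.count - labeled_counts[t.theme]) <= max(1, count_tolerance * labeled_counts[t.theme])
    )
    names = {t.theme for t in predicted}
    return {
        "theme_validity": valid / n if n else None,
        "prevalence_accuracy": accurate / len(matched) if matched else None,
        "fabrication_rate": fabricated / n if n else None,
        "signal_recall": len(important_small & names) / len(important_small) if important_small else None,
    }
```

Returning `None` for an undefined metric (for example, no small-but-important labels in the set) keeps an empty denominator from reading as a perfect score.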

When to use

Use it when

You have more user feedback than anyone can read and need it distilled into themes.
You want representative quotes and honest frequencies to support prioritization, not vibes.
You want to distinguish a widespread issue from a loud handful before you act on it.
You're preparing a voice-of-customer summary, a roadmap input, or a release retro.

Avoid it when

You want it to declare priorities or make roadmap decisions — it synthesizes evidence, humans decide.
Your feedback is too sparse to support themes (it should flag that, not invent themes).
You need verified causal claims about why metrics moved — that needs analysis beyond feedback text.
You can't give it access to user text with appropriate PII safeguards in place.

System prompt

system-prompt.md

You are a User Feedback Synthesis Agent for a product team. You turn a corpus of raw user feedback into themes with evidence and honest frequency. You are judged on faithful, useful synthesis — and on never fabricating a theme, a number, or a sentiment the data doesn't support.

== CORE PRINCIPLES ==
1. Grounded in cited items. Every theme must be backed by specific source items (with representative quotes). If you can't cite it, you can't claim it.
2. Honest frequency and sample. Report how many items support each theme and out of how many total. Never inflate prevalence. A theme mentioned by 3 of 400 is a 3/400 theme, not a "top issue".
3. Signal vs. noise. Distinguish a widespread recurring theme from a vocal minority or a small sample. Flag low-confidence/small-N themes explicitly rather than elevating them.

== HARD RULES (NON-NEGOTIABLE) ==
- NO FABRICATION: Never invent feedback, quotes, sentiment, or counts. Quotes must be real excerpts from the corpus. If unsure, omit.
- NO FALSE CAUSATION: Do not claim feedback explains a metric change or that one theme "causes" another unless the feedback explicitly says so. Report what users said, not an inferred causal story.
- REPRESENT FAIRLY: Don't over-weight a few emphatic items. Surface dissent and counter-themes; note when sentiment is mixed.
- FLAG WEAK EVIDENCE: Mark themes built on a small sample or ambiguous wording as low-confidence.
- PII: Treat user identifiers as sensitive; quote content, not personal data; redact where present.

== METHOD ==
- Read and dedupe the corpus. Cluster items into coherent themes by the underlying issue/request, not surface words.
- For each theme: count supporting items, pull 1-3 representative real quotes, tag sentiment, and assess confidence from frequency and clarity.
- Surface counter-signal and note sample sizes. Rank by evidenced frequency, not by how loud individual items are.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "corpus": { "total_items": <n>, "deduped": <n>, "sources": ["tickets","reviews","survey"] },
  "themes": [
    {
      "theme": "<concise theme>",
      "count": <supporting items>,
      "frequency": "<count>/<total> (<pct>%)",
      "sentiment": "positive|negative|mixed",
      "confidence": "high|medium|low",
      "quotes": ["<real excerpt>", "..."],
      "note": "<caveat, e.g. small sample / mixed / vocal minority, or empty>"
    }
  ],
  "counter_signal": ["<dissent or contradicting feedback, if any>"],
  "not_captured": "<themes too sparse to assert, or empty>",
  "caveats": ["<sampling/representativeness limits>"]
}
Rank themes by evidenced frequency. Mark small-sample or ambiguous themes low-confidence. Never assert a theme you cannot cite.
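Outside the prompt, the hard rules can be partly enforced by checking the returned object. A minimal Python sketch, assuming the frequency denominator is `corpus.deduped` (as in the examples), that quotes must appear verbatim in the corpus after case and whitespace normalization, and that the low-confidence cutoff is 5 items (the `LOW_CONFIDENCE_BELOW_N` value in the setup guide); `validate_synthesis` is a hypothetical helper.

```python
import re
from typing import List

# Matches strings like "86/372 (23%)" or "3/372 (0.8%)".
FREQ_RE = re.compile(r"^(\d+)/(\d+) \((\d+(?:\.\d+)?)%\)$")


def _norm(text: str) -> str:
    # Lowercase and collapse whitespace so line wrapping does not break matching.
    return " ".join(text.lower().split())


def validate_synthesis(result: dict, corpus_texts: List[str], low_conf_below_n: int = 5) -> List[str]:
    """Return rule violations; an empty list means the output passed these checks."""
    errors = []
    total = result["corpus"]["deduped"]
    corpus_norm = [_norm(t) for t in corpus_texts]
    counts = []
    for theme in result["themes"]:
        name, count = theme["theme"], theme["count"]
        counts.append(count)
        m = FREQ_RE.match(theme["frequency"])
        if not m:
            errors.append(f"{name}: malformed frequency {theme['frequency']!r}")
        else:
            num, den, pct = int(m.group(1)), int(m.group(2)), float(m.group(3))
            if num != count or den != total or den == 0:
                errors.append(f"{name}: frequency {num}/{den} disagrees with count {count}/{total}")
            elif abs(pct - 100 * num / den) > 0.5:
                errors.append(f"{name}: percentage {pct}% does not match {num}/{den}")
        if not theme["quotes"]:
            errors.append(f"{name}: no cited quotes")
        for quote in theme["quotes"]:
            q = _norm(quote)
            if not q or not any(q in text for text in corpus_norm):
                errors.append(f"{name}: quote not found in corpus: {quote!r}")
        if count < low_conf_below_n and theme["confidence"] != "low":
            errors.append(f"{name}: {count} items should be marked low-confidence")
    if counts != sorted(counts, reverse=True):
        errors.append("themes are not ranked by evidenced frequency")
    return errors
```

A non-empty result would route the synthesis back for correction or to the PM, rather than letting an uncited quote or an inflated frequency through.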


Setup guide

Install and connect feedback sources

Install the agent and connect it to your feedback sources.

shell

pipx install feedback-synth-agent
feedback-synth-agent connect --sources zendesk,appstore,typeform
feedback-synth-agent doctor

Configure honesty guardrails

Thresholds for confidence and the no-fabrication posture.

shell

cp .env.example .env
# then set these values in .env:
ANTHROPIC_API_KEY=sk-ant-...
LOW_CONFIDENCE_BELOW_N=5
REQUIRE_REAL_QUOTES=true
REDACT_PII=true
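How the packaged runner reads these variables is not documented here. As a minimal sketch, assuming the `.env` values reach the process environment and that unset flags should default to the strict posture, a Python loader might validate them like this; `Guardrails` and `load_guardrails` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off"}


@dataclass(frozen=True)
class Guardrails:
    low_confidence_below_n: int
    require_real_quotes: bool
    redact_pii: bool


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    # Fail loudly on typos instead of silently disabling a guardrail.
    raise ValueError(f"{name} must be true or false, got {raw!r}")


def load_guardrails(env: Optional[Mapping[str, str]] = None) -> Guardrails:
    """Read guardrail settings, defaulting to the strict posture when unset."""
    env = os.environ if env is None else env
    n = int(env.get("LOW_CONFIDENCE_BELOW_N", "5"))
    if n < 1:
        raise ValueError("LOW_CONFIDENCE_BELOW_N must be at least 1")
    return Guardrails(
        low_confidence_below_n=n,
        require_real_quotes=_flag(env, "REQUIRE_REAL_QUOTES", True),
        redact_pii=_flag(env, "REDACT_PII", True),
    )
```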

Set scope and segments

Define the period and any segmentation for the synthesis.

shell

# scope.yml
period: last_30d
segment_by: [plan_tier]
min_theme_size: 3   # smaller clusters are reported as low-confidence, not dropped silently

Run a synthesis

Generate themes and review the cited frequencies and quotes.

shell

feedback-synth-agent run --period 30d --explain
# prints themes with count/frequency/quotes/confidence + counter-signal

Wire into your workflow

Schedule recurring synthesis and post to your PM channel or docs.

shell

# weekly job -> voice-of-customer summary to #product (read-only synthesis)
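The weekly job itself is left as a comment. One read-only piece it might run is a formatter that turns the agent's JSON output into a short post while keeping frequencies, the low-confidence flag, counter-signal, and caveats visible. This is a minimal sketch: `render_voc_summary` is a hypothetical helper, and delivery to #product or your docs is assumed to be handled by your own scheduler and chat integration, outside the agent's tool registry.

```python
from typing import List


def render_voc_summary(result: dict, max_themes: int = 5) -> str:
    """Format a synthesis object (system-prompt schema) as a short plain-text post."""
    corpus = result["corpus"]
    lines: List[str] = [
        f"Voice of customer: {corpus['deduped']} distinct items "
        f"({corpus['total_items']} raw) from {', '.join(corpus['sources'])}"
    ]
    for t in result["themes"][:max_themes]:
        flag = " [low confidence]" if t["confidence"] == "low" else ""
        lines.append(f"- {t['theme']}: {t['frequency']}, {t['sentiment']}{flag}")
        if t["quotes"]:
            lines.append(f'  "{t["quotes"][0]}"')
        if t.get("note"):
            lines.append(f"  Note: {t['note']}")
    # Dissent and limits travel with the headline themes, never trimmed away.
    lines += [f"Counter-signal: {s}" for s in result.get("counter_signal", [])]
    if result.get("not_captured"):
        lines.append(f"Too sparse to assert: {result['not_captured']}")
    lines += [f"Caveat: {c}" for c in result.get("caveats", [])]
    return "\n".join(lines)
```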

Architecture

Feedback ingestion: Pulls the feedback corpus from its sources (tickets, reviews, surveys, notes) and records the total volume and mix.

Dedupe & normalize: Removes duplicates and near-duplicates and normalizes text so frequency counts reflect distinct feedback, not repeats.

Theme clustering: Clusters items by the underlying issue or request rather than surface wording, forming coherent candidate themes.

Evidence & frequency: For each theme, counts supporting items, extracts real representative quotes, and computes honest frequency against the total.

Confidence & signal check: Assesses confidence from sample size and clarity, separates widespread signal from vocal-minority noise, and surfaces counter-themes.

Synthesis output: Produces ranked themes with quotes, frequencies, sentiment, confidence, and caveats, plus what was too sparse to assert.

Review & export: Hands a faithful, cited summary to the product team for prioritization, with PII kept out of the output.
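The dedupe & normalize stage can be approximated without a model. The sketch below assumes a near-duplicate is any item whose word 3-gram (shingle) set has a Jaccard similarity of at least 0.8 with an item already kept; the shingle size, threshold, and function names are illustrative. The pairwise loop is quadratic, so large corpora would want an index such as MinHash with locality-sensitive hashing instead.

```python
import re
import unicodedata
from typing import List, Set, Tuple

Shingles = Set[Tuple[str, ...]]


def normalize(text: str) -> str:
    # NFKC folds compatibility forms; \w is Unicode-aware, so non-Latin letters survive.
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def shingles(text: str, k: int = 3) -> Shingles:
    words = text.split()
    if len(words) < k:
        # Short items become a single shingle so they can still be compared.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Shingles, b: Shingles) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(items: List[str], threshold: float = 0.8) -> List[str]:
    """Keep the first item of each near-duplicate group, in input order."""
    kept: List[str] = []
    kept_shingles: List[Shingles] = []
    for item in items:
        sh = shingles(normalize(item))
        if any(jaccard(sh, other) >= threshold for other in kept_shingles):
            continue
        kept.append(item)
        kept_shingles.append(sh)
    return kept
```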

Tools required

get_feedback: Retrieve the feedback corpus across sources (support tickets, reviews, survey responses, notes) for a period or segment.

dedupe: Remove duplicate and near-duplicate items so frequency reflects distinct feedback.

cluster_themes: Group items into coherent themes by the underlying issue/request rather than surface words.

extract_quotes: Pull real, representative excerpts for each theme (content only, no personal data).

count_frequency: Count supporting items per theme and compute honest frequency against the corpus total.

sentiment_tag: Tag each theme's sentiment, including mixed, without overstating.

assess_confidence: Rate theme confidence from sample size and wording clarity, flagging small-N or ambiguous themes.

redact_pii: Detect and redact personal identifiers from quotes and output.
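A first-pass `redact_pii` can be pattern-based. This minimal sketch assumes regular expressions are acceptable for obvious identifiers (email addresses and phone-like digit runs). It will miss names and street addresses, which need an entity recognizer or a dedicated PII service, and it errs toward over-redaction, so a long digit run such as an ISO date may also be masked.

```python
import re

# Illustrative patterns only: obvious emails and phone-like digit runs.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)")


def redact_pii(text: str) -> str:
    """Replace obvious identifiers with placeholders before quoting."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```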

Workflow

1. Ingest the corpus
Pull feedback across sources and record total volume and the source mix.
2. Dedupe & normalize
Collapse duplicates so counts reflect distinct feedback.
3. Cluster into themes
Group items by the underlying issue or request, not surface wording.
4. Attach evidence
For each theme, count supporting items, compute honest frequency, and pull real representative quotes.
5. Assess confidence & signal
Rate confidence by sample size and clarity; separate widespread signal from vocal-minority noise; surface counter-themes.
6. Rank honestly
Order themes by evidenced frequency, mark low-confidence ones, and note what was too sparse to assert.
7. Deliver & redact
Output the cited synthesis with caveats, keeping user PII out of the result.
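Steps 4 to 6 reduce to arithmetic once clusters exist. A minimal sketch, assuming each cluster is a list of deduped item IDs, that clusters below `min_theme_size` (3, as in `scope.yml`) are listed in `not_captured` with their counts so they are reported rather than silently dropped, that themes below 5 items are low-confidence, and that a 15% share marks high confidence. The 15% cutoff and `rank_themes` are illustrative assumptions; sentiment, quotes, and wording clarity would be layered on separately and could only lower confidence.

```python
from typing import Dict, List, Tuple


def rank_themes(
    clusters: Dict[str, List[str]],
    total: int,
    min_theme_size: int = 3,
    low_conf_below_n: int = 5,
    high_share: float = 0.15,
) -> Tuple[List[dict], str]:
    """Count, grade, and rank clusters; sparse clusters are reported, not dropped."""
    if total <= 0:
        raise ValueError("total must be the positive deduped corpus size")
    themes, sparse = [], []
    for label, item_ids in clusters.items():
        count = len(set(item_ids))  # distinct items only, so repeats cannot inflate a theme
        if count < min_theme_size:
            sparse.append(f"{label} ({count})")
            continue
        share = count / total
        pct = 100 * share
        # Whole percent at or above 1%, one decimal below it, as in the examples.
        pct_text = f"{pct:.0f}" if pct >= 1 else f"{pct:.1f}"
        if count < low_conf_below_n:
            confidence = "low"
        elif share >= high_share:
            confidence = "high"
        else:
            confidence = "medium"
        themes.append({
            "theme": label,
            "count": count,
            "frequency": f"{count}/{total} ({pct_text}%)",
            "confidence": confidence,
        })
    themes.sort(key=lambda t: t["count"], reverse=True)  # evidenced frequency, not loudness
    not_captured = ("Too sparse to assert: " + ", ".join(sparse)) if sparse else ""
    return themes, not_captured
```

With the counts from the examples below (86, 41, and 3 of 372), this yields the same frequency strings shown there: 23%, 11%, and 0.8%.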

Examples

Clear recurring theme → reported with frequency and quotes

A strong, widespread theme appears across many items. The agent reports it with an honest count and representative quotes.

Input

input

Corpus: 400 items (tickets+reviews). 86 mention slow load times on the dashboard.

Output

output

{
  "corpus": { "total_items": 400, "deduped": 372, "sources": ["tickets","reviews"] },
  "themes": [
    {
      "theme": "Dashboard load times are slow, especially with large datasets",
      "count": 86,
      "frequency": "86/372 (23%)",
      "sentiment": "negative",
      "confidence": "high",
      "quotes": ["takes 15+ seconds to load my main dashboard every morning", "unusable once I have more than a few thousand rows"],
      "note": ""
    }
  ],
  "counter_signal": ["A few power users on the new infra reported fast loads — possible rollout difference"],
  "not_captured": "",
  "caveats": ["Tickets over-represent users who hit problems; absolute prevalence may differ from the full base"]
}

Note: A high-confidence theme: 86 of 372 deduped items (23%), backed by real representative quotes. It still surfaces counter-signal (power users with fast loads) and a sampling caveat rather than overstating prevalence.

Small-sample signal → reported but flagged low-confidence

A potentially interesting issue appears only a few times. The agent reports it honestly without elevating it to a top issue.

Input

input

Corpus: 400 items. 3 mention a specific export-to-CSV encoding bug.

Output

output

{
  "corpus": { "total_items": 400, "deduped": 372, "sources": ["tickets","reviews"] },
  "themes": [
    {
      "theme": "CSV export has a character-encoding issue for non-Latin names",
      "count": 3,
      "frequency": "3/372 (0.8%)",
      "sentiment": "negative",
      "confidence": "low",
      "quotes": ["exported CSV shows garbled characters for my customers' names"],
      "note": "Small sample (3 items) — could be a real narrow bug or noise. Worth a quick check, not a top priority on this evidence alone."
    }
  ],
  "counter_signal": [],
  "not_captured": "Several one-off requests appeared once each and are not asserted as themes",
  "caveats": ["3 items is too few to estimate prevalence; treat as a lead to investigate, not a validated theme"]
}

Note: The honesty case: only 3 mentions. The agent still surfaces it (it could be a real narrow bug) but explicitly marks it low-confidence with its tiny frequency, and refuses to dress it up as a major theme. This is what stops a vocal-minority issue from hijacking a roadmap.

Tempting but unsupported causal claim → refuse to fabricate

The team hopes the feedback explains a churn spike. The feedback doesn't actually say that, so the agent declines to assert causation.

Input

input

Corpus: 400 items. Some negative comments about a recent redesign. Prompt context: 'churn went up last month — does the feedback explain it?'

Output

output

{
  "corpus": { "total_items": 400, "deduped": 372, "sources": ["tickets","reviews","survey"] },
  "themes": [
    {
      "theme": "Mixed reactions to the navigation redesign",
      "count": 41,
      "frequency": "41/372 (11%)",
      "sentiment": "mixed",
      "confidence": "medium",
      "quotes": ["the new nav took me a while to get used to", "actually prefer the new layout once I found things"],
      "note": "Sentiment is genuinely mixed — roughly half negative, half adapting/positive"
    }
  ],
  "counter_signal": ["Several users explicitly said they liked the redesign"],
  "not_captured": "",
  "caveats": ["This feedback does NOT establish that the redesign caused the churn increase. The corpus shows redesign reactions are mixed; no item links a cancellation to the redesign. A causal claim would require churn-cohort analysis, not feedback text alone."]
}

Note: The defining defensive case: there's a strong pull to hand the team the tidy story 'the redesign caused the churn.' The agent reports the redesign theme honestly as mixed, surfaces the positive counter-signal, and explicitly states the feedback can't establish causation — pointing to the analysis that actually could. It refuses to manufacture a satisfying but unsupported narrative.

Implementation notes

Require real, cited quotes for every theme and block fabricated ones; a synthesis you can't trace to source items is worse than no synthesis.
Always report frequency as count-over-total; bare theme lists hide whether something is widespread or a handful of loud voices.
Flag small-sample and ambiguous themes as low-confidence instead of dropping or inflating them — leads are fine, just labeled.
Forbid causal claims unless the feedback states them; route metric-causation questions to actual analysis, not vibes from text.
Surface counter-signal and mixed sentiment so the team sees dissent, not just the dominant narrative.
Keep PII out: quote content, redact identifiers, and treat the corpus as sensitive user data.
Spend the strong model on theme framing, confidence, and counter-signal — a cheaper model can dedupe and cluster.
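The no-causation rule can be backstopped with a simple lint. This sketch assumes a short, hypothetical list of causal phrases and flags a theme whose title or note uses one when none of its cited quotes does; caveats are skipped because they often state that causation is not established. It is a review aid for the PM, not a substitute for the prompt rule.

```python
import re
from typing import List

# Hypothetical phrase list; extend it for your product's vocabulary.
CAUSAL_RE = re.compile(
    r"\b(caus(?:e|ed|es|ing)|because|due to|led to|leads to|drove|drives|result(?:s|ed)? in)\b",
    re.IGNORECASE,
)


def flag_causal_claims(result: dict) -> List[str]:
    """Return theme names that assert causation their quotes do not state."""
    flagged = []
    for theme in result["themes"]:
        claim = f"{theme['theme']} {theme.get('note', '')}"
        if CAUSAL_RE.search(claim) and not any(CAUSAL_RE.search(q) for q in theme["quotes"]):
            flagged.append(theme["theme"])
    return flagged
```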

Variations

Basic

Theme summarizer

Clusters a feedback corpus into themes with representative quotes and counts for a quick read. Single source, on demand.

Advanced

Evidence-graded synthesis

Adds honest frequency, confidence grading, signal-vs-noise separation, counter-signal, and causation guardrails across multiple sources.

Enterprise

Continuous voice-of-customer

Adds scheduled multi-source synthesis, segmentation, trend tracking over time, PII governance, and integration into roadmap and research workflows.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).


Contents: README.md, system-prompt.md, setup-guide.md, tools.json, workflow.md, examples.md, .env.example, kit.json, run.py, LICENSE, NOTICE, starters/


Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).


Related kits

PRD Drafting Agent

Access Request & Provisioning Agent

Account Research Agent

Action Item Tracking Agent