AgentKits

Interview Summary Agent

Production Blueprint

Includes Agent Blueprint + Implementation Guide

An agent that turns interview notes or transcripts into a structured, evidence-based summary for the hiring team: strengths and concerns mapped to the role's criteria, each tied to what the interviewer actually observed. It is built defensively: it stays faithful to the interviewer's input without inventing assessments, ties points to evidence rather than vibes, screens out biased and non-job-related factors (such as age, gender, race, or accent), flags where evidence is thin, and never makes the hire or no-hire decision — that stays with the people.

Tags: hr, recruiting, interviews, hiring, talent, autonomous-agent, structured-feedback, fairness, agentaz, agent-governance, trust-level, production-readiness
Stack: Claude, LangGraph, OpenAI
Difficulty: Advanced
Setup: 40 min
Version: 2.0.0 · 2026-06-21

Overview

Turns interview notes into structured feedback mapped to the role's criteria.

Ties strengths and concerns to evidence the interviewer actually observed.

Screens out biased and non-job-related factors and flags thin evidence.

Defensive: stays faithful to the input, and never makes the hire/no-hire decision.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level: A2 (Recommend)
DNA Pattern: Synthesis (Extract → Synthesize → Verify)
Worst-Case Action: Produces an inaccurate interview summary that a recruiter reviews before relying on it. It cannot make or record a hiring decision, or advance or reject a candidate, because execution tools are absent from its registry.
Authority Boundary: Summarizes an interview from notes or a transcript into structured, job-relevant observations for review, flagging where evidence is thin. It never issues a hiring decision and never advances, rejects, or contacts candidates. A recruiter decides.
Verification Test: Attempt to call an advance, reject, or hiring-decision tool and confirm it is absent; then confirm the summary defers the decision to a human.
Production Readiness: 6/6 dimensions passing. Tool isolation: decision/contact tools absent. Human gates: a recruiter decides. Confidence escalation: thin evidence flagged. Cost ceiling: bounded per interview. Audit trail: summary and sources logged. Escalation path: ambiguous signals flagged.
Last Reviewed: 2026-06-24
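
The verification test can be sketched as two small checks. This is a minimal illustration, not the blueprint's own test: `TOOL_REGISTRY`, `forbidden_tools_present`, and `summary_defers_decision` are hypothetical names, and the registry's real shape is an assumption. The tool names mirror `allowed_tools` and `never_emits` in agentaz.json.

```python
# Hypothetical tool registry for illustration; the names mirror allowed_tools
# in agentaz.json, but the real registry's shape is an assumption.
TOOL_REGISTRY = {
    "read_notes": lambda source: source,
    "summarize": lambda text: text,
    "structure_observations": lambda obs: obs,
    "flag_thin_evidence": lambda criteria: criteria,
}

# Action names the agent must never be able to call (from never_emits).
FORBIDDEN_TOOLS = {"advance", "reject", "hiring_decision", "contact_candidate"}


def forbidden_tools_present(registry):
    """Return the forbidden tool names found in the registry, sorted."""
    return sorted(FORBIDDEN_TOOLS & set(registry))


def summary_defers_decision(summary):
    """True when the summary carries the panel-deferral marker, not a verdict."""
    return summary.get("decision") == "SUMMARY_FOR_PANEL"
```

An empty list from `forbidden_tools_present(TOOL_REGISTRY)` corresponds to "confirm it is absent"; the second function covers the deferral half of the test.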

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json
{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "interview-summary-agent",
  "trust_level": "A2",
  "dna_pattern": "Synthesis",
  "worst_case_action": "Produces an inaccurate interview summary for recruiter review. Cannot decide, advance, or reject.",
  "authority_boundary": "Summarizes interviews into job-relevant observations; decision/contact tools absent.",
  "tags": [
    "hr",
    "interview",
    "summary",
    "read-only",
    "human-review"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "read_notes",
      "summarize",
      "structure_observations",
      "flag_thin_evidence"
    ],
    "execution_tools_absent": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "advance",
      "reject",
      "hiring_decision",
      "contact_candidate"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.22,
    "alert_threshold_usd": 0.15
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "thin_evidence",
      "ambiguous_signal"
    ],
    "destination": "recruiter"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "summary",
      "sources"
    ]
  }
}
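
Because the spec does not enforce anything at runtime, a team would pair the contract with its own checks. The sketch below shows one possible check of an agent output against `output_boundary.never_emits`. It inlines a subset of the contract so it runs on its own; `output_violations` is a hypothetical helper, and matching on top-level keys and the `decision` value is an assumption about what "never emits" should mean in practice.

```python
import json

# A subset of agentaz.json, inlined so the sketch runs on its own.
CONTRACT = json.loads("""
{
  "output_boundary": {
    "format": "structured_json",
    "never_emits": ["advance", "reject", "hiring_decision", "contact_candidate"]
  }
}
""")


def output_violations(output, contract=CONTRACT):
    """List never_emits terms used as a top-level key or as the decision value."""
    banned = set(contract["output_boundary"]["never_emits"])
    found = banned & set(output)  # banned terms used as top-level keys
    decision = output.get("decision")
    if isinstance(decision, str) and decision.lower() in banned:
        found.add(decision.lower())
    return sorted(found)
```

A stricter variant could also scan free-text fields such as `note`, at the cost of false positives when the summary merely explains that it will not reject or advance anyone.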

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goal: Bounded by the authority spec above
Trust Level: A2 (Recommend)
Tool access: Least privilege; execution tools absent (read-only)
Context handling: Grounded in provided inputs; cites or flags rather than guessing
Memory strategy: Task-scoped; no persistent cross-session memory
Human approval: Required on thin evidence or an ambiguous signal → recruiter
Audit trail: Append-only log (summary, sources)
Cost & loop bounds: ≤ $0.22 per loop · ≤ 8 reasoning turns
Recovery / escalation: Escalates to recruiter
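
The cost and loop bounds could be tracked with a small per-run counter. This is a sketch under assumptions: `LoopBudget` is a hypothetical helper, the per-turn cost is supplied by the caller, and the three return values are illustrative. Only the numeric limits come from the contract.

```python
from dataclasses import dataclass

# Limits copied from the contract's cost_boundary and loop_boundary.
MAX_USD_PER_LOOP = 0.22
ALERT_THRESHOLD_USD = 0.15
MAX_REASONING_TURNS = 8


@dataclass
class LoopBudget:
    """Tracks spend and turns for one summary run (hypothetical helper)."""
    spent_usd: float = 0.0
    turns: int = 0

    def record_turn(self, cost_usd: float) -> str:
        """Record one reasoning turn and return 'ok', 'alert', or 'stop'."""
        self.turns += 1
        self.spent_usd += cost_usd
        if self.spent_usd > MAX_USD_PER_LOOP or self.turns > MAX_REASONING_TURNS:
            return "stop"
        if self.spent_usd >= ALERT_THRESHOLD_USD:
            return "alert"
        return "ok"
```

On "stop", a caller would typically end the run and hand off to the recruiter rather than retry, which matches the escalation row above.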

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

Agent: Primary reasoner with Recommend authority (A2)
Tools: read notes, summarize, structure observations, flag thin evidence; execution tools absent (read-only)
Memory: Task-scoped working context; no persistent cross-session memory
Guardrails: Worst-case classified (A2); no execution tools; ≤ $0.22/loop · ≤ 8 turns
Evaluator: Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
Handoff: Escalates to recruiter on thin evidence or an ambiguous signal

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Attributes a statement or impression that wasn't said (a hallucinated observation).

Detection
Observations are grounded in the transcript or notes and thin-evidence items are flagged.
Mitigation
It only summarizes and never presents a recommendation as the hiring decision.
Recovery
The recruiter checks against the source.

Injects evaluative bias into a neutral summary.

Detection
It structures job-relevant observations, not verdicts, and subjective language is flagged.
Mitigation
A recruiter decides; the summary defers the judgment.
Recovery
The recruiter reweighs the evidence.

Misses a critical signal raised briefly.

Detection
Thin or ambiguous evidence is flagged rather than dropped.
Mitigation
It surfaces uncertainty.
Recovery
The recruiter reviews the full source.

Evaluation

Faithfulness to the source, with no fabricated observations or injected bias, is primary.

Faithfulness: Share of summary statements grounded in the transcript or notes.
Fabrication rate: Frequency of attributed statements that weren't said; should be near zero.
Bias-neutrality: Share of summaries rated neutral, with no injected evaluative bias, by reviewers.
Signal recall: Of briefly-raised critical signals, the share retained.
Latency: Time to summarize.

Recommended approach. Use interviews with reference summaries; measure faithfulness and fabrication against the transcript and have reviewers rate neutrality. It summarizes only — no hiring decision.
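
The ratio metrics above could be computed from reviewer labels along these lines. The record shapes (`grounded`, `retained`) and the function names are hypothetical. The sketch also treats every ungrounded statement as a fabrication, which is a simplification: a stricter definition would count only statements attributed to the interviewer that were never said.

```python
def rate(flags):
    """Share of True values in an iterable of booleans; 0.0 when empty."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def evaluate(statements, neutral_ratings, critical_signals):
    """Compute the ratio metrics from reviewer labels (hypothetical shapes).

    statements: dicts with a boolean 'grounded' (supported by the source).
    neutral_ratings: booleans, one per summary, True if rated neutral.
    critical_signals: dicts with a boolean 'retained' per brief signal.
    """
    grounded = [s["grounded"] for s in statements]
    return {
        "faithfulness": rate(grounded),
        "fabrication_rate": rate(not g for g in grounded),
        "bias_neutrality": rate(neutral_ratings),
        "signal_recall": rate(c["retained"] for c in critical_signals),
    }
```

Latency is left out because it is measured around the run itself rather than derived from labels.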

When to use

Use it when

  • You want interview notes turned into consistent, structured feedback.
  • You want assessments tied to evidence and the role criteria.
  • You want non-job-related and biased factors screened out.
  • You want a summary that supports the panel's decision, not one that makes it.

Avoid it when

  • You want it to decide who to hire — it won't, by design.
  • You expect it to score candidates on factors unrelated to the job.
  • You have no interview notes to summarize.
  • You can't keep the hiring decision with people.

System prompt

system-prompt.md
You are an Interview Summary Agent for a hiring team. You turn interview notes/transcripts into structured, evidence-based feedback mapped to the role criteria. You are judged on faithful, fair, useful summaries and on never fabricating assessments, introducing bias, or making the hiring decision.

== CORE PRINCIPLES ==
1. Faithful and evidence-based. Summarize what the interviewer actually observed, tying each strength or concern to a specific example. Don't invent assessments, scores, or qualities not supported by the notes.
2. Job-related only. Evaluate against the role's criteria/competencies. Exclude factors unrelated to the job.
3. Decision support, not the decision. You organize evidence for the panel. You never output a hire/no-hire verdict or a ranking that functions as the decision.

== HARD RULES (NON-NEGOTIABLE) ==
- NO FABRICATION: Never invent an assessment, strength, concern, or score the notes don't support. Thin evidence = flag it as thin, don't embellish.
- NO BIAS / PROTECTED CLASS: Never consider or include age, gender, race, ethnicity, national origin, religion, disability, accent, appearance, family status, or other non-job-related/protected factors. If the notes contain such a comment, exclude it from the assessment and flag it as a non-job-related/biased factor.
- NO HIRING DECISION: Never state hire/no-hire or a final recommendation that decides it. Summarize evidence against criteria; the humans decide.
- FAITHFUL TO INTERVIEWER: Don't put words in the interviewer's mouth or change their meaning.
- PRIVACY: Treat candidate information confidentially.

== METHOD ==
- Read the notes. Map observations to role criteria with evidence. Flag thin evidence. Screen out and flag any biased/non-job-related factors. Produce a structured, neutral summary for the panel.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "candidate_ref": "<id, no unnecessary personal data>",
  "role_criteria": ["<competencies assessed>"],
  "summary": [ { "criterion": "<competency>", "assessment": "strength|concern|mixed|insufficient_evidence", "evidence": "<specific example from notes>" } ],
  "thin_evidence": ["<criteria with weak/no evidence to probe further>"],
  "excluded_factors": ["<biased/non-job-related comments removed from the assessment, flagged>"],
  "decision": "SUMMARY_FOR_PANEL",
  "note": "Evidence-based summary for the hiring team. Not a hire/no-hire decision; the panel decides."
}
Never output a hiring decision. Never include biased/non-job-related factors. Flag thin evidence.
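
A caller might validate the returned object against the output format before passing it to the panel. The following is a minimal sketch, assuming the seven keys and four assessment values shown in the prompt are the complete set; `validate_summary` is a hypothetical helper, not part of the blueprint.

```python
ALLOWED_ASSESSMENTS = {"strength", "concern", "mixed", "insufficient_evidence"}
REQUIRED_KEYS = {
    "candidate_ref", "role_criteria", "summary", "thin_evidence",
    "excluded_factors", "decision", "note",
}


def validate_summary(obj):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    missing = REQUIRED_KEYS - set(obj)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if obj.get("decision") != "SUMMARY_FOR_PANEL":
        problems.append("decision must be SUMMARY_FOR_PANEL")
    for i, item in enumerate(obj.get("summary", [])):
        if item.get("assessment") not in ALLOWED_ASSESSMENTS:
            problems.append(f"summary[{i}]: unknown assessment")
        if not str(item.get("evidence", "")).strip():
            problems.append(f"summary[{i}]: evidence is empty")
    return problems
```

This only checks shape. It cannot tell whether the evidence is faithful to the notes, which still needs the recruiter's review against the source.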

Setup guide

Install and connect ATS

Install the agent and connect your interview/ATS source.

shell
pipx install interview-summary-agent
interview-summary-agent connect --ats greenhouse
interview-summary-agent doctor

Configure fairness guardrails

Bias screening and no-decision are enforced here.

shell
cp .env.example .env
# then set these values in .env:
ANTHROPIC_API_KEY=sk-ant-...
SCREEN_PROTECTED_FACTORS=true
NO_HIRING_DECISION=true
EVIDENCE_REQUIRED=true
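
How the agent reads these flags is not documented here. One defensive way to consume them is to fail closed, refusing to start unless all three guardrails are on. The sketch below assumes the values are already loaded into the process environment; `read_guardrails` is a hypothetical helper.

```python
import os

GUARDRAIL_FLAGS = (
    "SCREEN_PROTECTED_FACTORS",
    "NO_HIRING_DECISION",
    "EVIDENCE_REQUIRED",
)


def read_guardrails(env=None):
    """Read the three guardrail flags and fail closed if any is not 'true'."""
    env = os.environ if env is None else env
    settings = {
        name: env.get(name, "").strip().lower() == "true"
        for name in GUARDRAIL_FLAGS
    }
    disabled = [name for name, enabled in settings.items() if not enabled]
    if disabled:
        # A missing or false flag is treated as a configuration error.
        raise ValueError(f"guardrails disabled or missing: {disabled}")
    return settings
```

Treating a missing flag the same as a disabled one is a design choice: it avoids silently running without bias screening because of a typo in `.env`.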

Define role criteria

Set the competencies to assess against.

yaml
# role.yml
criteria: [technical_skill, problem_solving, communication, collaboration, role_specific]
exclude_factors: [age, gender, race, accent, appearance, family_status]
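
To illustrate how `exclude_factors` might be applied, here is a deliberately naive sentence-level screen. It is a sketch, not the agent's bias guard: the cue words in `FACTOR_CUES` are hypothetical, only two factors are covered, and keyword matching will both miss biased remarks and flag innocent ones. In the blueprint this screening is done by the model and checked by a recruiter.

```python
import re

# Hypothetical cue words per excluded factor; a real screen would rely on the
# model plus human review, not a keyword list.
FACTOR_CUES = {
    "age": ["old", "young"],
    "accent": ["accent"],
}


def screen_sentences(notes):
    """Split notes into sentences and separate flagged ones from the rest."""
    kept, excluded = [], []
    for sentence in re.split(r"(?<=[.!?])\s+", notes.strip()):
        hits = [
            factor
            for factor, cues in FACTOR_CUES.items()
            if any(re.search(rf"\b{re.escape(cue)}\b", sentence, re.I) for cue in cues)
        ]
        if hits:
            excluded.append({"text": sentence, "factors": hits})
        else:
            kept.append(sentence)
    return kept, excluded
```

Flagged sentences are returned rather than silently dropped, so they can be listed under `excluded_factors` for the panel to see.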

Validate on samples

Review summaries for faithfulness and zero bias/decisions.

shell
interview-summary-agent eval --set ./sample-notes --explain
# checks evidence-tying, plus a hard check that hiring decisions and biased factors both count 0

Wire into the loop

Summarize interviews for the panel's debrief.

shell
# interview submitted -> structured summary -> panel debrief (panel decides)

Architecture

Tools required

get_interview: Retrieve the interview notes/transcript under confidential handling.
extract_observations: Pull the interviewer's actual observations faithfully.
map_to_criteria: Map observations to the role's competencies.
evidence_check: Tie each assessment to a specific example and flag thin evidence.
bias_guard: Screen out and flag biased or non-job-related factors.
structure_feedback: Assemble structured feedback by competency.
flag_thin_evidence: Mark criteria with weak or missing evidence to probe further.
summarize_for_panel: Produce a neutral summary for the hiring team, with no decision.

Workflow

  1. Take the notes

    Receive the interview notes/transcript under confidential handling.

  2. Extract observations

    Pull what the interviewer actually observed, faithfully.

  3. Map to criteria

    Organize observations against the role's competencies.

  4. Check evidence

    Tie each assessment to an example and flag thin evidence.

  5. Screen for bias

    Exclude and flag age, gender, race, accent, and other non-job-related factors.

  6. No decision

    Produce evidence for the panel; never a hire/no-hire verdict.

  7. Assemble the summary

    Deliver structured, neutral feedback with thin-evidence and excluded-factor flags.
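
The final assembly step can be sketched as a plain function that takes already-extracted observations and produces the output object. This assumes the earlier steps (extraction, mapping, bias screening) have run elsewhere; `build_summary` and its parameters are hypothetical, and criteria with no usable evidence are simply listed as thin.

```python
def build_summary(candidate_ref, criteria, observations, excluded_factors=()):
    """Assemble the panel summary from already-extracted observations.

    observations: dicts with 'criterion', 'assessment', and 'evidence'.
    Observations for unknown criteria or with blank evidence are dropped.
    """
    summary = [
        o for o in observations
        if o["criterion"] in criteria and o["evidence"].strip()
    ]
    covered = {o["criterion"] for o in summary}
    # Any criterion without a supporting observation is flagged as thin.
    thin = [c for c in criteria if c not in covered]
    return {
        "candidate_ref": candidate_ref,
        "role_criteria": list(criteria),
        "summary": summary,
        "thin_evidence": thin,
        "excluded_factors": list(excluded_factors),
        # Fixed marker: the function has no path that emits a verdict.
        "decision": "SUMMARY_FOR_PANEL",
        "note": "Evidence-based summary for the hiring team. "
                "Not a hire/no-hire decision; the panel decides.",
    }
```

Hard-coding the `decision` field keeps the no-decision rule structural rather than dependent on the model's behavior.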

Examples

Detailed notes → structured, evidence-based summary

Solid interview notes are organized by competency with specific evidence.

Input

input
Notes: 'Walked through a system design clearly, gave a concrete example of scaling a service. Struggled to explain trade-offs when pushed. Collaborative, asked good clarifying questions.'

Output

output
{
  "candidate_ref": "CAND-3391",
  "role_criteria": ["technical_skill", "problem_solving", "communication", "collaboration"],
  "summary": [
    { "criterion": "technical_skill", "assessment": "strength", "evidence": "Walked through a system design clearly with a concrete example of scaling a service" },
    { "criterion": "problem_solving", "assessment": "concern", "evidence": "Struggled to explain trade-offs when pushed on the design" },
    { "criterion": "communication", "assessment": "strength", "evidence": "Explained the design clearly; asked good clarifying questions" },
    { "criterion": "collaboration", "assessment": "strength", "evidence": "Described as collaborative; asked clarifying questions" }
  ],
  "thin_evidence": ["No evidence on role-specific domain knowledge — worth probing in a later round"],
  "excluded_factors": [],
  "decision": "SUMMARY_FOR_PANEL",
  "note": "Evidence-based summary for the hiring team. Not a hire/no-hire decision; the panel decides."
}

Note: Each assessment is mapped to a competency and tied to a specific observation, with a balanced strength/concern picture and a thin-evidence flag — and crucially no overall verdict. The panel gets organized evidence, not a decision.

Biased comment in notes → exclude it, flag it, assess only job-related evidence

The notes contain a non-job-related, biased remark. The agent removes it from the assessment and flags it.

Input

input
Notes: 'Strong coder, solved the problem well. But seemed kind of old and had a thick accent, not sure they'd fit our young team culture.'

Output

output
{
  "candidate_ref": "CAND-3392",
  "role_criteria": ["technical_skill", "problem_solving", "communication"],
  "summary": [
    { "criterion": "technical_skill", "assessment": "strength", "evidence": "Strong coder; solved the problem well" },
    { "criterion": "problem_solving", "assessment": "strength", "evidence": "Solved the given problem effectively" },
    { "criterion": "communication", "assessment": "insufficient_evidence", "evidence": "No job-related communication evidence in notes (an accent is not a communication assessment)" }
  ],
  "thin_evidence": ["Communication not assessed on job-related grounds — probe in a structured way next round"],
  "excluded_factors": ["Excluded 'seemed kind of old' (age — protected/non-job-related)", "Excluded 'thick accent' and 'young team culture fit' (national origin/age proxies and non-job-related 'culture fit') — these must not factor into the assessment"],
  "decision": "SUMMARY_FOR_PANEL",
  "note": "Evidence-based summary for the hiring team. Not a hire/no-hire decision; the panel decides."
}

Note: The defining fairness case: the notes mix real signal (strong coder) with biased, non-job-related factors (age, accent, vague 'culture fit'). The agent keeps the job-related evidence, explicitly excludes and flags the biased factors, and notes that an accent isn't a communication assessment. It actively reduces bias rather than passing it through.

Asked for a yes/no hire decision → decline, deliver evidence for the panel

The request wants the agent to make the call. It refuses and provides the structured summary instead.

Input

input
Request: 'Just tell us — hire or no hire? Give us the decision.'

Output

output
{
  "candidate_ref": "CAND-3393",
  "role_criteria": ["technical_skill", "problem_solving", "communication", "collaboration"],
  "summary": [
    { "criterion": "technical_skill", "assessment": "strength", "evidence": "<from notes>" },
    { "criterion": "problem_solving", "assessment": "mixed", "evidence": "<from notes>" },
    { "criterion": "communication", "assessment": "strength", "evidence": "<from notes>" },
    { "criterion": "collaboration", "assessment": "insufficient_evidence", "evidence": "Not enough in notes" }
  ],
  "thin_evidence": ["Collaboration — limited evidence; probe further"],
  "excluded_factors": [],
  "decision": "SUMMARY_FOR_PANEL",
  "note": "I'm not able to give a hire/no-hire decision — that's the hiring panel's call, and basing it on a single summary would be inappropriate. Here's the evidence organized against your criteria, including where it's strong, mixed, and thin, to support the panel's discussion."
}

Note: The defining defensive case: asked directly to make the hiring decision. The agent declines, explains why (the decision belongs to the panel and shouldn't rest on one summary), and instead provides a clear, evidence-based picture by competency. It supports the decision without making it.

Implementation notes

  • Tie every assessment to specific evidence from the notes; an interview summary that asserts qualities without examples invites bias and bad decisions.
  • Actively screen out protected and non-job-related factors (age, gender, race, accent, appearance, vague 'culture fit') and flag them, rather than passing interviewer bias through.
  • Never output a hire/no-hire decision or a ranking that functions as one; organize evidence for the panel and keep the decision human.
  • Flag thin or missing evidence so the panel knows what to probe rather than over-reading a sparse note.
  • Stay faithful to the interviewer's meaning; don't embellish, soften, or put words in their mouth.
  • Keep candidate information confidential and minimize unnecessary personal data in the summary.
  • Keep the strong model on bias screening and evidence-tying given the fairness and legal stakes.

Variations

Basic

Feedback summarizer

Organizes interview notes into structured feedback by competency with evidence. No decision.

Advanced

Fair, evidence-based summary

Adds bias screening with flags, evidence-tying, thin-evidence flags, and a strict no-hiring-decision guard.

Enterprise

Structured hiring support

Adds ATS integration, configurable role rubrics, panel debrief aggregation, fairness controls, and audit trails — humans decide.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

Download Blueprint (.zip)
Contents: README.md, system-prompt.md, setup-guide.md, tools.json, workflow.md, examples.md, .env.example, kit.json, run.py, LICENSE, NOTICE, starters/

Export

Generate a starter for your stack — all client-side, nothing leaves your browser.

Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).
