Will it decide who to hire?

No, and that's deliberate. It organizes evidence against your role criteria for the hiring panel to discuss. It never outputs a hire/no-hire verdict or a ranking that functions as the decision — the people decide.

How does it handle bias in notes?

It actively screens out non-job-related and protected factors — age, gender, race, accent, appearance, vague 'culture fit' — and excludes them from the assessment, flagging them so they don't influence the decision. It reduces bias rather than passing it through.

Does it make up assessments?

No. Every strength or concern is tied to a specific observation from the notes. Where evidence is thin, it flags it as thin rather than embellishing a quality the interviewer didn't actually support.

Is it faithful to what the interviewer wrote?

Yes. It summarizes what the interviewer actually observed without changing their meaning or putting words in their mouth, while organizing it against the role's competencies.

What about candidate privacy?

It treats candidate information confidentially and minimizes unnecessary personal data, focusing the summary on job-related evidence.

Can it match our scorecard?

Yes. It maps feedback to your defined competencies and rubric, producing consistent structured summaries across interviewers while keeping the decision with the panel.

Interview Summary Agent

Overview

Turns interview notes into structured feedback mapped to the role's criteria.

Ties strengths and concerns to evidence the interviewer actually observed.

Screens out biased and non-job-related factors and flags thin evidence.

Defensive: stays faithful to the input, and never makes the hire/no-hire decision.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level ?A2 — Recommend

DNA PatternSynthesis (Extract → Synthesize → Verify)

Worst-Case ActionProduces an inaccurate interview summary that a recruiter reviews before relying on it. It cannot make or record a hiring decision, advance, or reject a candidate — execution tools are absent from its registry.

Authority BoundarySummarizes an interview from notes or a transcript into structured, job-relevant observations for review, flagging where evidence is thin. It never makes a hiring recommendation as a decision, advances, rejects, or contacts candidates. A recruiter decides.

Verification TestAttempt to call an advance, reject, or hiring-decision tool → confirm it is absent; confirm the summary defers the decision to a human.

Production Readiness6/6 dimensions passing. Tool isolation: decision/contact tools absent. Human gates: a recruiter decides. Confidence escalation: thin evidence flagged. Cost ceiling: bounded per interview. Audit trail: summary and sources logged. Escalation path: ambiguous signals flagged.

Last Reviewed2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json

{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "interview-summary-agent",
  "trust_level": "A2",
  "dna_pattern": "Synthesis",
  "worst_case_action": "Produces an inaccurate interview summary for recruiter review. Cannot decide, advance, or reject.",
  "authority_boundary": "Summarizes interviews into job-relevant observations; decision/contact tools absent.",
  "tags": [
    "hr",
    "interview",
    "summary",
    "read-only",
    "human-review"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "read_notes",
      "summarize",
      "structure_observations",
      "flag_thin_evidence"
    ],
    "execution_tools_absent": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "advance",
      "reject",
      "hiring_decision",
      "contact_candidate"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.22,
    "alert_threshold_usd": 0.15
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "thin_evidence",
      "ambiguous_signal"
    ],
    "destination": "recruiter"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "summary",
      "sources"
    ]
  }
}

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goal	Bounded by the authority spec above
Trust Level	A2 — Recommend
Tool access	Least privilege — execution tools absent (read-only)
Context handling	Grounded in provided inputs; cites or flags rather than guessing
Memory strategy	Task-scoped; no persistent cross-session memory
Human approval	Required on thin evidence, ambiguous signal → recruiter
Audit trail	Append-only log (summary, sources)
Cost & loop bounds	≤ $0.22 per loop · ≤ 8 reasoning turns
Recovery / escalation	Escalates to recruiter

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

Agent	Primary reasoner — Recommend authority (A2)
Tools	read notes, summarize, structure observations, flag thin evidence — execution tools absent (read-only)
Memory	Task-scoped working context; no persistent cross-session memory
Guardrails	Worst-case classified (A2); no execution tools; ≤ $0.22/loop · ≤ 8 turns
Evaluator	Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
Handoff	Escalates to recruiter on thin evidence, ambiguous signal

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Attributes a statement or impression that wasn't said (a hallucinated observation).

Detection: Observations are grounded in the transcript or notes and thin-evidence items are flagged.
Mitigation: It summarizes only and makes no hiring recommendation as a decision.
Recovery: The recruiter checks against the source.

Injects evaluative bias into a neutral summary.

Detection: It structures job-relevant observations, not verdicts, and subjective language is flagged.
Mitigation: A recruiter decides; the summary defers the judgment.
Recovery: The recruiter reweighs the evidence.

Misses a critical signal raised briefly.

Detection: Thin or ambiguous evidence is flagged rather than dropped.
Mitigation: It surfaces uncertainty.
Recovery: The recruiter reviews the full source.

Evaluation

Faithfulness to the source, with no fabricated observations or injected bias, is primary.

Faithfulness	Share of summary statements grounded in the transcript or notes.
Fabrication rate	Frequency of attributed statements that weren't said — should be near zero.
Bias-neutrality	Share of summaries rated neutral, with no injected evaluative bias, by reviewers.
Signal recall	Of briefly-raised critical signals, the share retained.
Latency	Time to summarize.

Recommended approach. Use interviews with reference summaries; measure faithfulness and fabrication against the transcript and have reviewers rate neutrality. It summarizes only — no hiring decision.

When to use

Use it when

You want interview notes turned into consistent, structured feedback.
You want assessments tied to evidence and the role criteria.
You want non-job-related and biased factors screened out.
You want a summary that supports the panel's decision, not one that makes it.

Avoid it when

You want it to decide who to hire — it won't, by design.
You expect it to score candidates on factors unrelated to the job.
You have no interview notes to summarize.
You can't keep the hiring decision with people.

System prompt

system-prompt.md

You are an Interview Summary Agent for a hiring team. You turn interview notes/transcripts into structured, evidence-based feedback mapped to the role criteria. You are judged on faithful, fair, useful summaries and on never fabricating assessments, introducing bias, or making the hiring decision.

== CORE PRINCIPLES ==
1. Faithful and evidence-based. Summarize what the interviewer actually observed, tying each strength or concern to a specific example. Don't invent assessments, scores, or qualities not supported by the notes.
2. Job-related only. Evaluate against the role's criteria/competencies. Exclude factors unrelated to the job.
3. Decision support, not the decision. You organize evidence for the panel. You never output a hire/no-hire verdict or a ranking that functions as the decision.

== HARD RULES (NON-NEGOTIABLE) ==
- NO FABRICATION: Never invent an assessment, strength, concern, or score the notes don't support. Thin evidence = flag it as thin, don't embellish.
- NO BIAS / PROTECTED CLASS: Never consider or include age, gender, race, ethnicity, national origin, religion, disability, accent, appearance, family status, or other non-job-related/protected factors. If the notes contain such a comment, exclude it from the assessment and flag it as a non-job-related/biased factor.
- NO HIRING DECISION: Never state hire/no-hire or a final recommendation that decides it. Summarize evidence against criteria; the humans decide.
- FAITHFUL TO INTERVIEWER: Don't put words in the interviewer's mouth or change their meaning.
- PRIVACY: Treat candidate information confidentially.

== METHOD ==
- Read the notes. Map observations to role criteria with evidence. Flag thin evidence. Screen out and flag any biased/non-job-related factors. Produce a structured, neutral summary for the panel.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "candidate_ref": "<id, no unnecessary personal data>",
  "role_criteria": ["<competencies assessed>"],
  "summary": [ { "criterion": "<competency>", "assessment": "strength|concern|mixed|insufficient_evidence", "evidence": "<specific example from notes>" } ],
  "thin_evidence": ["<criteria with weak/no evidence to probe further>"],
  "excluded_factors": ["<biased/non-job-related comments removed from the assessment, flagged>"],
  "decision": "SUMMARY_FOR_PANEL",
  "note": "Evidence-based summary for the hiring team. Not a hire/no-hire decision; the panel decides."
}
Never output a hiring decision. Never include biased/non-job-related factors. Flag thin evidence.

Was this useful?

Simulate run

Try the agent with a sample task. This is a frontend-only preview that shows how the kit would plan and execute — no API calls, nothing leaves your browser.

Frontend preview only — no data leaves your browser. Tip: press ⌘/Ctrl + Enter to run.

Setup guide

Install and connect ATS

Install the agent and connect your interview/ATS source.

shell

pipx install interview-summary-agent
interview-summary-agent connect --ats greenhouse
interview-summary-agent doctor

Configure fairness guardrails

Bias screening and no-decision are enforced here.

shell

cp .env.example .env
ANTHROPIC_API_KEY=sk-ant-...
SCREEN_PROTECTED_FACTORS=true
NO_HIRING_DECISION=true
EVIDENCE_REQUIRED=true

Define role criteria

Set the competencies to assess against.

shell

# role.yml
criteria: [technical_skill, problem_solving, communication, collaboration, role_specific]
exclude_factors: [age, gender, race, accent, appearance, family_status]

Validate on samples

Review summaries for faithfulness and zero bias/decisions.

shell

interview-summary-agent eval --set ./sample-notes --explain
# checks evidence-tying + a hard check: hiring decisions or biased factors (must be 0)

Wire into the loop

Summarize interviews for the panel's debrief.

shell

# interview submitted -> structured summary -> panel debrief (panel decides)

Architecture

Notes intakeReceives the interview notes or transcript under confidential candidate handling.

Observation extractorPulls what the interviewer actually observed, keeping it faithful to their input.

Criteria mapperMaps observations to the role's competencies so feedback is structured and job-related.

Evidence checkerTies each assessment to a specific example and flags criteria with thin or no evidence.

Bias guardScreens out age, gender, race, accent, appearance, and other non-job-related or protected factors, flagging them.

No-decision guardBlocks any hire/no-hire verdict; output is evidence organized for the panel.

Summary assemblyProduces the structured, neutral summary with thin-evidence and excluded-factor flags.

Tools required

get_interviewRetrieve the interview notes/transcript under confidential handling.

extract_observationsPull the interviewer's actual observations faithfully.

map_to_criteriaMap observations to the role's competencies.

evidence_checkTie each assessment to a specific example and flag thin evidence.

bias_guardScreen out and flag biased or non-job-related factors.

structure_feedbackAssemble structured feedback by competency.

flag_thin_evidenceMark criteria with weak or missing evidence to probe further.

summarize_for_panelProduce a neutral summary for the hiring team, with no decision.

Workflow

1. Take the notes
Receive the interview notes/transcript under confidential handling.
2. Extract observations
Pull what the interviewer actually observed, faithfully.
3. Map to criteria
Organize observations against the role's competencies.
4. Check evidence
Tie each assessment to an example and flag thin evidence.
5. Screen for bias
Exclude and flag age, gender, race, accent, and other non-job-related factors.
6. No decision
Produce evidence for the panel; never a hire/no-hire verdict.
7. Assemble the summary
Deliver structured, neutral feedback with thin-evidence and excluded-factor flags.

Examples

Detailed notes → structured, evidence-based summary

Solid interview notes are organized by competency with specific evidence.

Input

input

Notes: 'Walked through a system design clearly, gave a concrete example of scaling a service. Struggled to explain trade-offs when pushed. Collaborative, asked good clarifying questions.'

Output

output

{
  "candidate_ref": "CAND-3391",
  "role_criteria": ["technical_skill", "problem_solving", "communication", "collaboration"],
  "summary": [
    { "criterion": "technical_skill", "assessment": "strength", "evidence": "Walked through a system design clearly with a concrete example of scaling a service" },
    { "criterion": "problem_solving", "assessment": "concern", "evidence": "Struggled to explain trade-offs when pushed on the design" },
    { "criterion": "communication", "assessment": "strength", "evidence": "Explained the design clearly; asked good clarifying questions" },
    { "criterion": "collaboration", "assessment": "strength", "evidence": "Described as collaborative; asked clarifying questions" }
  ],
  "thin_evidence": ["No evidence on role-specific domain knowledge — worth probing in a later round"],
  "excluded_factors": [],
  "decision": "SUMMARY_FOR_PANEL",
  "note": "Evidence-based summary for the hiring team. Not a hire/no-hire decision; the panel decides."
}

Note: Each assessment is mapped to a competency and tied to a specific observation, with a balanced strength/concern picture and a thin-evidence flag — and crucially no overall verdict. The panel gets organized evidence, not a decision.

Biased comment in notes → exclude it, flag it, assess only job-related evidence

The notes contain a non-job-related, biased remark. The agent removes it from the assessment and flags it.

Input

input

Notes: 'Strong coder, solved the problem well. But seemed kind of old and had a thick accent, not sure they'd fit our young team culture.'

Output

output

{
  "candidate_ref": "CAND-3392",
  "role_criteria": ["technical_skill", "problem_solving", "communication"],
  "summary": [
    { "criterion": "technical_skill", "assessment": "strength", "evidence": "Strong coder; solved the problem well" },
    { "criterion": "problem_solving", "assessment": "strength", "evidence": "Solved the given problem effectively" },
    { "criterion": "communication", "assessment": "insufficient_evidence", "evidence": "No job-related communication evidence in notes (an accent is not a communication assessment)" }
  ],
  "thin_evidence": ["Communication not assessed on job-related grounds — probe in a structured way next round"],
  "excluded_factors": ["Excluded 'seemed kind of old' (age — protected/non-job-related)", "Excluded 'thick accent' and 'young team culture fit' (national origin/age proxies and non-job-related 'culture fit') — these must not factor into the assessment"],
  "decision": "SUMMARY_FOR_PANEL",
  "note": "Evidence-based summary for the hiring team. Not a hire/no-hire decision; the panel decides."
}

Note: The defining fairness case: the notes mix real signal (strong coder) with biased, non-job-related factors (age, accent, vague 'culture fit'). The agent keeps the job-related evidence, explicitly excludes and flags the biased factors, and notes that an accent isn't a communication assessment. It actively reduces bias rather than passing it through.

Asked for a yes/no hire decision → decline, deliver evidence for the panel

The request wants the agent to make the call. It refuses and provides the structured summary instead.

Input

input

Request: 'Just tell us — hire or no hire? Give us the decision.'

Output

output

{
  "candidate_ref": "CAND-3393",
  "role_criteria": ["technical_skill", "problem_solving", "communication", "collaboration"],
  "summary": [
    { "criterion": "technical_skill", "assessment": "strength", "evidence": "<from notes>" },
    { "criterion": "problem_solving", "assessment": "mixed", "evidence": "<from notes>" },
    { "criterion": "communication", "assessment": "strength", "evidence": "<from notes>" },
    { "criterion": "collaboration", "assessment": "insufficient_evidence", "evidence": "Not enough in notes" }
  ],
  "thin_evidence": ["Collaboration — limited evidence; probe further"],
  "excluded_factors": [],
  "decision": "SUMMARY_FOR_PANEL",
  "note": "I'm not able to give a hire/no-hire decision — that's the hiring panel's call, and basing it on a single summary would be inappropriate. Here's the evidence organized against your criteria, including where it's strong, mixed, and thin, to support the panel's discussion."
}

Note: The defining defensive case: asked directly to make the hiring decision. The agent declines, explains why (the decision belongs to the panel and shouldn't rest on one summary), and instead provides a clear, evidence-based picture by competency. It supports the decision without making it.

Implementation notes

Tie every assessment to specific evidence from the notes; an interview summary that asserts qualities without examples invites bias and bad decisions.
Actively screen out protected and non-job-related factors (age, gender, race, accent, appearance, vague 'culture fit') and flag them, rather than passing interviewer bias through.
Never output a hire/no-hire decision or a ranking that functions as one; organize evidence for the panel and keep the decision human.
Flag thin or missing evidence so the panel knows what to probe rather than over-reading a sparse note.
Stay faithful to the interviewer's meaning; don't embellish, soften, or put words in their mouth.
Keep candidate information confidential and minimize unnecessary personal data in the summary.
Keep the strong model on bias screening and evidence-tying given the fairness and legal stakes.

Variations

Basic

Feedback summarizer

Organizes interview notes into structured feedback by competency with evidence. No decision.

Advanced

Fair, evidence-based summary

Adds bias screening with flags, evidence-tying, thin-evidence flags, and a strict no-hiring-decision guard.

Enterprise

Structured hiring support

Adds ATS integration, configurable role rubrics, panel debrief aggregation, fairness controls, and audit trails — humans decide.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

Download Blueprint (.zip)

README.mdsystem-prompt.mdsetup-guide.mdtools.jsonworkflow.mdexamples.md.env.examplekit.jsonrun.pyLICENSENOTICEstarters/

Export

Generate a starter for your stack — all client-side, nothing leaves your browser.

ZIP

Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).

Interview Summary Agent

Overview

AgentAz™ specification

Governance matrix

Agent component mapping

Failure modes

Evaluation

When to use

System prompt

Simulate run

Setup guide

Architecture

Tools required

Workflow

Examples

Implementation notes

Variations

Frequently asked questions

Will it decide who to hire?

How does it handle bias in notes?

Does it make up assessments?

Is it faithful to what the interviewer wrote?

What about candidate privacy?

Can it match our scorecard?

Related kits

Resume Screening Agent

Company Policy Q&A Agent

Onboarding Concierge Agent

Access Request & Provisioning Agent