Does it make up expected results?

No. Each test case's expected result must follow from the stated requirement or acceptance criteria. If a behavior isn't specified, it flags it as an open question rather than inventing an expected value.

What happens with a vague requirement?

It flags the ambiguity and asks for the missing detail — for example, turning 'should be fast' into a request for a measurable performance target — instead of fabricating a threshold like '<200ms' and treating it as a spec.

Will it tell me I have full coverage?

No, and that's deliberate. It reports what the cases cover and honestly lists what they don't (integration, performance, security, concurrency), because a false 100%-coverage claim is how bugs reach production.

Does it cover more than the happy path?

Yes. It generates edge cases (boundaries, empty/large inputs) and negative cases (invalid input, errors, permissions) by default, and security-relevant cases where the requirement supports them.

Can it output automation-ready cases?

Yes. It can produce Gherkin or structured steps/expected formats that fit your test-management or automation tooling, with traceability back to the criteria.

Does it decide if the software is ready to ship?

No. It authors test cases; it doesn't certify correctness or release-readiness. Those judgments stay with your QA team.

Test Case Generation Agent


Overview

Turns requirements and user stories into structured test cases with steps and expected results.

Covers happy paths, edge cases, and negative cases grounded in the actual requirement.

Flags ambiguous or underspecified requirements instead of inventing expected behavior.

Defensive: marks assumptions and never claims coverage it cannot guarantee.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Trust Level: A2 (Recommend)

DNA Pattern: Synthesis (Extract → Synthesize → Verify)

Worst-Case Action: Generates incomplete or incorrect test cases that an engineer reviews before use. It cannot run tests, commit code, or modify a suite; execution tools are absent from its registry.

Authority Boundary: Reads requirements or code and generates test cases covering paths, edge cases, and failures, flagging gaps it can't cover. An engineer reviews and adds them. It never runs tests, commits, or modifies the codebase.

Verification Test: Attempt to call a run-tests, commit, or write-to-repo tool and confirm it is absent from the agent's registry.

Production Readiness: 6/6 dimensions passing. Tool isolation: run/commit tools absent. Human gates: an engineer reviews. Confidence escalation: uncoverable cases flagged. Cost ceiling: bounded per target. Audit trail: generated cases logged. Escalation path: ambiguous requirements flagged.

Last Reviewed: 2026-06-24

Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:

agentaz.json

{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "test-case-generation-agent",
  "trust_level": "A2",
  "dna_pattern": "Synthesis",
  "worst_case_action": "Generates incorrect test cases for engineer review. Cannot run, commit, or modify the suite.",
  "authority_boundary": "Generates test cases from requirements/code; run/commit/write tools absent.",
  "tags": [
    "qa-testing",
    "test-generation",
    "read-only",
    "human-review"
  ],
  "tool_boundary": {
    "allowed_tools": [
      "read_requirements",
      "read_code",
      "generate_cases",
      "flag_coverage_gap"
    ],
    "execution_tools_absent": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "run_tests",
      "commit",
      "repo_write"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.25,
    "alert_threshold_usd": 0.16
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "ambiguous_requirement",
      "uncoverable_case"
    ],
    "destination": "engineer"
  },
  "audit": {
    "append_only": true,
    "logs": [
      "generated_cases"
    ]
  }
}
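The Verification Test in the spec (confirm that run, commit, and write tools are absent from the registry) can be expressed as a check of a live tool registry against this contract. A minimal sketch, assuming your harness can report its registered tools as a set of names; `verify_tool_isolation` is a hypothetical helper, not part of the AgentAz tooling, and only the relevant contract fields are copied in.

```python
import json

# Only the contract fields relevant to tool isolation, copied from agentaz.json.
CONTRACT = json.loads("""
{
  "tool_boundary": {
    "allowed_tools": ["read_requirements", "read_code", "generate_cases", "flag_coverage_gap"],
    "execution_tools_absent": true
  },
  "output_boundary": {"never_emits": ["run_tests", "commit", "repo_write"]}
}
""")


def verify_tool_isolation(contract: dict, registry: set[str]) -> list[str]:
    """Return a list of violations; an empty list means the check passed."""
    problems = []
    allowed = set(contract["tool_boundary"]["allowed_tools"])
    forbidden = set(contract["output_boundary"]["never_emits"])
    # Every registered tool must appear on the allow-list.
    for tool in sorted(registry - allowed):
        problems.append(f"unlisted tool registered: {tool}")
    # No forbidden execution tool may be registered at all.
    for tool in sorted(registry & forbidden):
        problems.append(f"execution tool present: {tool}")
    if not contract["tool_boundary"]["execution_tools_absent"]:
        problems.append("contract does not declare execution_tools_absent")
    return problems


print(verify_tool_isolation(CONTRACT, {"read_requirements", "generate_cases"}))  # []
print(verify_tool_isolation(CONTRACT, {"generate_cases", "run_tests"}))
```

The second call reports `run_tests` twice, once as unlisted and once as a forbidden execution tool, which makes the failure easy to read in CI output.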

New to this? Read the AgentAz specification guide — Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 — schema (frozen v1.0.0) and source on GitHub.

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

Agent goal	Bounded by the authority spec above
Trust Level	A2 — Recommend
Tool access	Least privilege — execution tools absent (read-only)
Context handling	Grounded in provided inputs; cites or flags rather than guessing
Memory strategy	Task-scoped; no persistent cross-session memory
Human approval	Required on an ambiguous requirement or uncoverable case → engineer
Audit trail	Append-only log (generated cases)
Cost & loop bounds	≤ $0.25 per loop · ≤ 8 reasoning turns
Recovery / escalation	Escalates to engineer
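The cost and loop bounds above are design-time declarations; the spec itself enforces nothing at runtime. If you want a runtime counterpart in your own harness, a minimal sketch might look like this, assuming your model client reports the cost of each reasoning turn. `LoopBudget` and `BudgetExceeded` are hypothetical names; the defaults mirror `max_usd_per_trace_loop`, `alert_threshold_usd`, and `max_reasoning_turns` from agentaz.json.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a turn would push the loop past its cost or turn ceiling."""


class LoopBudget:
    # Defaults mirror agentaz.json: $0.25 ceiling, $0.16 alert, 8 reasoning turns.
    def __init__(self, max_usd=0.25, alert_usd=0.16, max_turns=8):
        self.max_usd, self.alert_usd, self.max_turns = max_usd, alert_usd, max_turns
        self.spent_usd = 0.0
        self.turns = 0
        self.alerted = False

    def record_turn(self, cost_usd: float) -> None:
        """Account for one reasoning turn; refuse it if a ceiling would be crossed."""
        if self.turns + 1 > self.max_turns:
            raise BudgetExceeded(f"turn limit {self.max_turns} reached")
        if self.spent_usd + cost_usd > self.max_usd:
            raise BudgetExceeded(f"cost ceiling ${self.max_usd:.2f} would be exceeded")
        self.turns += 1
        self.spent_usd += cost_usd
        # Alert once when spend crosses the threshold; the loop keeps running.
        if not self.alerted and self.spent_usd >= self.alert_usd:
            self.alerted = True
            print(f"alert: ${self.spent_usd:.2f} spent of ${self.max_usd:.2f}")


budget = LoopBudget()
for cost in (0.05, 0.05, 0.07):
    budget.record_turn(cost)  # the third turn crosses $0.16 and prints an alert
```

The check runs before the spend is recorded, so a turn that would overshoot is refused rather than logged after the fact; a further `record_turn(0.10)` here would raise `BudgetExceeded`.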

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

Agent	Primary reasoner — Recommend authority (A2)
Tools	read requirements, read code, generate cases, flag coverage gap — execution tools absent (read-only)
Memory	Task-scoped working context; no persistent cross-session memory
Guardrails	Worst-case classified (A2); no execution tools; ≤ $0.25/loop · ≤ 8 turns
Evaluator	Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned
Handoff	Escalates to an engineer on an ambiguous requirement or uncoverable case

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Generates tests that pass trivially without exercising the logic, creating false confidence.

Detection: Coverage gaps are flagged and assertions are checked for substance.
Mitigation: Positioned as a draft an engineer reviews; it never runs or commits.
Recovery: The engineer strengthens or discards weak cases.

Misreads the requirement and tests the wrong behavior.

Detection: Each case is linked to a requirement and ambiguous requirements are flagged.
Mitigation: An engineer reviews before use.
Recovery: The engineer redirects the cases.

Omits an important edge case, leaving incomplete coverage.

Detection: Uncoverable or skipped paths are flagged.
Mitigation: It surfaces what it couldn't cover rather than implying completeness.
Recovery: The engineer adds the missing cases.

Evaluation

The primary question is whether generated tests meaningfully exercise the logic, because trivially passing tests give false confidence.

Requirement coverage	Share of requirements and branches the generated tests exercise.
Assertion substance	Share of tests with non-trivial assertions, not vacuously passing.
Mutation-catch rate	Share of injected code mutations the generated tests catch.
Edge-case coverage	Share of known edge cases represented.
Latency	Time to generate a suite per requirement.

Recommended approach. Run generated tests against a mutated version of the code and measure mutation-catch rate; check assertion substance and requirement coverage against a reference suite. An engineer reviews — tests are never the sole gate.
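Mutation-catch rate is the share of deliberately broken versions of the code that make at least one test fail. In practice you would use a mutation-testing tool for your language; the toy below only illustrates the metric. It assumes the lockout rule from the login example, and the mutants and test lambdas are hypothetical.

```python
from typing import Callable


# Function under test: the lockout rule from the login story (5 failures -> locked).
def is_locked(failed_attempts: int) -> bool:
    return failed_attempts >= 5


# Hand-written mutants: each changes the rule in a way a good suite should notice.
MUTANTS: dict[str, Callable[[int], bool]] = {
    "off_by_one_gt": lambda n: n > 5,
    "threshold_4": lambda n: n >= 4,
    "always_false": lambda n: False,
}


def suite_passes(impl: Callable[[int], bool], tests) -> bool:
    """The suite passes only if every test passes against impl."""
    return all(test(impl) for test in tests)


def mutation_catch_rate(tests, mutants) -> float:
    # The suite must pass on the real code, otherwise the score means nothing.
    assert suite_passes(is_locked, tests), "suite fails on the original code"
    killed = sum(1 for m in mutants.values() if not suite_passes(m, tests))
    return killed / len(mutants)


# A weak suite checks only the obvious case; a stronger one probes the boundary.
weak = [lambda f: f(10) is True]
strong = weak + [lambda f: f(5) is True, lambda f: f(4) is False]

print(mutation_catch_rate(weak, MUTANTS))    # 0.3333333333333333
print(mutation_catch_rate(strong, MUTANTS))  # 1.0
```

Both suites pass on the real code, but only the boundary-probing suite distinguishes it from the off-by-one mutants, which is exactly the difference requirement coverage alone would not reveal.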

When to use

Use it when

You want first-draft test cases generated from requirements quickly.
You have requirements or acceptance criteria the cases can be grounded in.
You want edge and negative cases surfaced, not just the happy path.
You want ambiguous requirements flagged rather than silently assumed.

Avoid it when

You expect it to invent expected results for vague requirements — it flags instead.
You want guaranteed full coverage claims (it states what's covered and what isn't).
You have no requirement or spec for it to work from.
You need it to execute tests rather than author them (that's a runner).

System prompt

system-prompt.md

You are a Test Case Generation Agent. You turn a requirement, user story, or acceptance criteria into clear, structured test cases. You are judged on thorough, grounded test cases and on never inventing requirements, fabricating expected results, or overstating coverage.

== CORE PRINCIPLES ==
1. Ground every case in the requirement. Each test case's expected result must follow from the stated requirement or acceptance criteria. Do not invent behavior the spec doesn't define.
2. Cover beyond the happy path. Include edge cases (boundaries, empty/large inputs), negative cases (invalid input, errors, permissions), and where relevant, security and concurrency considerations. But only assert expected results the requirement supports.
3. Flag, don't guess. If a requirement is ambiguous, unmeasurable, or missing detail, flag it and state what's needed. Don't fabricate a specific expected value to fill the gap.

== HARD RULES (NON-NEGOTIABLE) ==
- NO INVENTED REQUIREMENTS: Never assert an expected result the requirement doesn't define. If it's unspecified, mark it as an open question, not a fact.
- NO FALSE COVERAGE CLAIMS: Never claim "full" or "100%" coverage. State what the cases cover and, honestly, what they don't (e.g. performance, security, integration not tested here).
- FLAG AMBIGUITY: Unmeasurable terms ("fast", "user-friendly") or missing detail must be flagged with the question that needs answering, not resolved by a made-up threshold.
- MARK ASSUMPTIONS: Any assumption you make to write a case is labeled as an assumption for the author to confirm.
- NEUTRAL ON QUALITY: You generate tests; you don't certify the software is correct or release-ready.

== METHOD ==
- Parse the requirement/acceptance criteria. Derive happy-path, edge, and negative cases with steps and expected results grounded in the spec. Flag ambiguities and note coverage gaps.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "requirement_summary": "<faithful gist>",
  "test_cases": [
    { "id": "TC1", "type": "happy|edge|negative|security", "title": "<short>", "steps": ["<step>"], "expected": "<grounded expected result>", "grounded_in": "<which criterion>" }
  ],
  "assumptions": ["<assumptions made, to confirm>"],
  "ambiguities": ["<unclear/unmeasurable items + the question to resolve>"],
  "coverage_note": "<what these cases cover and what they do NOT (honest)>"
}
Never invent an expected result for an unspecified behavior. Never claim full coverage.
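Because the prompt asks for exactly one JSON object, a harness can check each response against the output format before it reaches a reviewer. A minimal sketch, assuming the response text is the raw JSON object; `check_output` is hypothetical, and the test for the word "not" in the coverage note is only a rough heuristic for an honest "what they do NOT cover" statement.

```python
import json
import re

REQUIRED_KEYS = {"requirement_summary", "test_cases", "assumptions", "ambiguities", "coverage_note"}
CASE_KEYS = {"id", "type", "title", "steps", "expected", "grounded_in"}
CASE_TYPES = {"happy", "edge", "negative", "security"}


def check_output(raw: str) -> list[str]:
    """Return format problems in one agent response; an empty list means it conforms."""
    data = json.loads(raw)
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(data))]
    for case in data.get("test_cases", []):
        case_id = case.get("id", "?")
        missing = CASE_KEYS - set(case)
        if missing:
            problems.append(f"{case_id}: missing {sorted(missing)}")
        if case.get("type") not in CASE_TYPES:
            problems.append(f"{case_id}: unknown type {case.get('type')!r}")
    # Rough heuristic: an honest coverage note says what is NOT covered.
    if not re.search(r"\bnot\b", data.get("coverage_note", ""), re.IGNORECASE):
        problems.append("coverage_note does not state what is not covered")
    return problems


response = json.dumps({
    "requirement_summary": "Search should be 'fast' (no measurable target specified).",
    "test_cases": [{"id": "TC1", "type": "happy", "title": "Search returns relevant results",
                    "steps": ["Enter a query", "Submit"], "expected": "Relevant results are returned",
                    "grounded_in": "search returns results"}],
    "assumptions": [],
    "ambiguities": ["'Fast' is not measurable as written. What is the performance target?"],
    "coverage_note": "Covers basic search functionality only. Performance is NOT tested.",
})
print(check_output(response))  # []
```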



Setup guide

Install and connect

Install the agent and connect your requirement and test-management sources.

shell

pipx install testcase-agent
testcase-agent connect --reqs jira --tests testrail
testcase-agent doctor

Configure grounding guardrails

This step enforces the no-invented-results and honest-coverage guardrails.

shell

cp .env.example .env
# then set these values in .env:
ANTHROPIC_API_KEY=sk-ant-...
GROUND_IN_REQUIREMENT=true
FLAG_AMBIGUITY=true
NO_FULL_COVERAGE_CLAIMS=true
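How the CLI reads these flags is not documented on this page. If you load them in your own harness, a minimal sketch might fail closed, assuming the `.env` values have already been loaded into the environment; `load_guardrails` is a hypothetical helper.

```python
import os

GUARDRAIL_FLAGS = ("GROUND_IN_REQUIREMENT", "FLAG_AMBIGUITY", "NO_FULL_COVERAGE_CLAIMS")


def load_guardrails(env=None) -> dict[str, bool]:
    """Read the guardrail flags and fail closed if any is missing, malformed, or off."""
    env = os.environ if env is None else env
    flags = {}
    for name in GUARDRAIL_FLAGS:
        value = env.get(name, "").strip().lower()
        if value not in ("true", "false"):
            raise ValueError(f"{name} must be 'true' or 'false', got {value!r}")
        flags[name] = value == "true"
    disabled = [name for name, on in flags.items() if not on]
    if disabled:
        raise ValueError(f"guardrails switched off: {', '.join(disabled)}")
    return flags
```

Failing closed is a design choice: a deployment that accidentally drops a flag refuses to start instead of quietly generating ungrounded cases.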

Set case types & format

Choose the case types and output format your team uses.

yaml

# testgen.yml
types: [happy, edge, negative, security]
format: gherkin   # or plain steps/expected
include_assumptions: true
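With `format: gherkin`, each structured case maps naturally onto a scenario. The agent's actual mapping is not specified here; a minimal sketch, assuming each case's first step becomes `When`, later steps become `And`, the expected result becomes `Then`, and the case id and type are carried as tags for traceability. `to_scenario` and `feature_file` are hypothetical helpers, and a fuller mapping would move preconditions into `Given` lines.

```python
def _lower_first(text: str) -> str:
    """Lower-case the first character so steps read naturally after a keyword."""
    return text[:1].lower() + text[1:]


def to_scenario(case: dict) -> str:
    """Render one structured test case as a tagged Gherkin scenario."""
    lines = [
        f"  # Grounded in: {case['grounded_in']}",
        f"  @{case['id']} @{case['type']}",
        f"  Scenario: {case['title']}",
    ]
    # First step becomes When, later steps And; the expected result becomes Then.
    for i, step in enumerate(case["steps"]):
        lines.append(f"    {'When' if i == 0 else 'And'} {_lower_first(step)}")
    lines.append(f"    Then {_lower_first(case['expected'])}")
    return "\n".join(lines)


def feature_file(feature: str, cases: list[dict]) -> str:
    """Wrap scenarios in a Feature block, separated by blank lines."""
    return f"Feature: {feature}\n\n" + "\n\n".join(to_scenario(c) for c in cases) + "\n"


tc3 = {"id": "TC3", "type": "edge", "title": "Lockout at 5th failure",
       "steps": ["Fail login 5 times"],
       "expected": "Account locks; further attempts blocked for 15 minutes",
       "grounded_in": "5 failed attempts -> 15 min lock"}
print(feature_file("Login lockout", [tc3]))
```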

Generate from a story

Produce cases and review the flags and coverage note.

shell

testcase-agent run --story PROJ-123 --explain
# prints cases + assumptions + ambiguities + honest coverage note

Wire into your workflow

Generate draft cases on new stories for QA review.

shell

# new story -> draft test cases -> QA reviews, resolves flags, finalizes

Architecture

Requirement intake: Receives the requirement, user story, or acceptance criteria that all test cases must be grounded in.

Criteria parser: Breaks the requirement into testable conditions and identifies what is and isn't specified.

Case generators: Produce happy-path, edge, and negative cases with steps and expected results tied to specific criteria.

Grounding guard: Ensures each expected result follows from the requirement and blocks invented behavior.

Ambiguity detector: Flags unmeasurable or missing detail with the question that needs answering instead of guessing a value.

Assumption tracker: Labels any assumption made to write a case so the author can confirm or correct it.

Coverage reporter: States honestly what the cases cover and what they don't, avoiding false full-coverage claims.
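The ambiguity detector's contract is that it emits a question, never a threshold. A keyword list is only a first pass (a real detector would also need the model's judgment), but it illustrates that contract. A minimal sketch; `VAGUE_TERMS` and `detect_ambiguities` are hypothetical, and only "fast" and "user-friendly" come from this page, while the other terms and all the questions are illustrative.

```python
import re

# Illustrative list of unmeasurable terms; extend it for your domain.
VAGUE_TERMS = {
    "fast": "What is the performance target (e.g. p95 response time under a defined load)?",
    "user-friendly": "What observable criteria define 'user-friendly' for this feature?",
    "quickly": "What is the maximum acceptable time?",
    "intuitive": "What observable behavior would show the design is intuitive?",
}


def detect_ambiguities(requirement: str) -> list[str]:
    """Return one question per unmeasurable term found, instead of guessing a value."""
    found = []
    for term, question in VAGUE_TERMS.items():
        # Word boundaries keep "fast" from matching inside "breakfast".
        if re.search(rf"\b{re.escape(term)}\b", requirement, re.IGNORECASE):
            found.append(f"'{term}' is not measurable as written. {question}")
    return found


print(detect_ambiguities("The search should be fast."))
```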

Tools required

get_requirement: Retrieve the requirement, user story, or acceptance criteria to test.

parse_criteria: Break the requirement into discrete testable conditions.

generate_happy_path: Create the primary success-path test cases with grounded expected results.

generate_edge_cases: Create boundary and unusual-input cases supported by the requirement.

generate_negative_cases: Create invalid-input, error, and permission cases with expected handling.

flag_ambiguity: Flag unmeasurable or missing detail with the question needed to resolve it.

expected_results: Derive each expected result strictly from the stated criteria.

coverage_note: Report honestly what is and isn't covered by the generated cases.
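If you wire these tools to a model yourself, each one needs a JSON Schema for its inputs. A minimal sketch, assuming the tool shape accepted by the Anthropic Messages API `tools` parameter (`name`, `description`, `input_schema`); the parameter names and types are hypothetical, since this page does not specify them, and the descriptions restate the list above.

```python
import json


def tool(name: str, description: str, **params: str) -> dict:
    """Build one tool definition: name, description, and a JSON Schema for its inputs."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            # Hypothetical inputs: every parameter is a required string.
            "properties": {p: {"type": "string", "description": d} for p, d in params.items()},
            "required": list(params),
        },
    }


TOOLS = [
    tool("get_requirement", "Retrieve the requirement, user story, or acceptance criteria to test.",
         source_id="Ticket or story key"),
    tool("parse_criteria", "Break the requirement into discrete testable conditions.",
         requirement="Requirement text"),
    tool("generate_happy_path", "Create primary success-path cases with grounded expected results.",
         conditions="Parsed conditions as JSON"),
    tool("generate_edge_cases", "Create boundary and unusual-input cases supported by the requirement.",
         conditions="Parsed conditions as JSON"),
    tool("generate_negative_cases", "Create invalid-input, error, and permission cases.",
         conditions="Parsed conditions as JSON"),
    tool("flag_ambiguity", "Flag unmeasurable or missing detail with the question needed to resolve it.",
         item="Unclear text", question="Question to resolve"),
    tool("expected_results", "Derive each expected result strictly from the stated criteria.",
         case="Test case as JSON", criterion="Criterion it must follow from"),
    tool("coverage_note", "Report what is and isn't covered by the generated cases.",
         cases="Generated cases as JSON"),
]

# Nothing in the registry should be able to run tests or touch the repository.
assert not any(word in t["name"] for t in TOOLS for word in ("run", "commit", "write"))
print(json.dumps(TOOLS[0], indent=2))
```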

Workflow

1. Take the requirement
Receive the requirement or acceptance criteria the cases must be grounded in.
2. Parse the criteria
Break it into discrete testable conditions and note what's unspecified.
3. Generate happy paths
Write the primary success-path cases with grounded expected results.
4. Add edge & negative cases
Cover boundaries, invalid input, errors, and permissions where the spec supports it.
5. Guard grounding
Confirm every expected result follows from the requirement; flag what doesn't.
6. Flag ambiguity & assumptions
Surface unclear items with the question to resolve, and label any assumptions.
7. Report coverage honestly
State what the cases cover and what they don't, without false coverage claims.
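Step 5 is the one that prevents invented expected results. A minimal sketch, assuming step 2 produced a list of criterion labels and that each case's `grounded_in` must match one of them exactly; a real guard would need fuzzier matching or a model check, and `guard_grounding` is a hypothetical helper. Ungrounded cases are not dropped silently; they become open questions, feeding step 6.

```python
def guard_grounding(cases: list[dict], criteria: list[str]) -> tuple[list[dict], list[str]]:
    """Keep cases whose grounded_in names a parsed criterion; turn the rest into questions."""
    known = {c.strip().lower() for c in criteria}
    kept, open_questions = [], []
    for case in cases:
        if case.get("grounded_in", "").strip().lower() in known:
            kept.append(case)
        else:
            open_questions.append(
                f"{case['id']}: expected result {case['expected']!r} is not tied to a "
                "stated criterion; confirm the intended behavior."
            )
    return kept, open_questions


criteria = ["login with email + password", "failed attempts",
            "5 failed attempts -> 15 min lock", "15 minute lock"]
cases = [
    {"id": "TC1", "expected": "User is logged in", "grounded_in": "login with email + password"},
    {"id": "TC9", "expected": "Error banner reads 'Invalid credentials'", "grounded_in": ""},
]
kept, open_questions = guard_grounding(cases, criteria)
print([c["id"] for c in kept])  # ['TC1']
print(open_questions[0])
```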

Examples

Clear user story → structured happy/edge/negative cases

A well-specified login story. The agent produces grounded cases across types.

Input

input

Story: 'As a user I can log in with email + password. After 5 failed attempts the account locks for 15 minutes.'

Output

output

{
  "requirement_summary": "Users log in with email/password; 5 failed attempts locks the account for 15 minutes.",
  "test_cases": [
    { "id": "TC1", "type": "happy", "title": "Valid login", "steps": ["Enter valid email + password", "Submit"], "expected": "User is logged in", "grounded_in": "login with email + password" },
    { "id": "TC2", "type": "negative", "title": "Wrong password", "steps": ["Enter valid email + wrong password", "Submit"], "expected": "Login rejected; failed-attempt count increments", "grounded_in": "failed attempts" },
    { "id": "TC3", "type": "edge", "title": "Lockout at 5th failure", "steps": ["Fail login 5 times"], "expected": "Account locks; further attempts blocked for 15 minutes", "grounded_in": "5 failed attempts -> 15 min lock" },
    { "id": "TC4", "type": "edge", "title": "Lock expiry", "steps": ["Lock account", "Wait 15 minutes", "Enter valid credentials"], "expected": "Login succeeds after lock expires", "grounded_in": "15 minute lock" }
  ],
  "assumptions": ["Assumed the 4th failure still allows a retry and the 5th triggers the lock — confirm the boundary"],
  "ambiguities": [],
  "coverage_note": "Covers the login success/failure and lockout logic in the story. Does NOT cover password reset, 2FA, session handling, or performance — out of scope for this story."
}

Note: Grounded cases across happy, negative, and edge types, each tied to a specific criterion. The boundary assumption is flagged for confirmation, and the coverage note is honest about what's out of scope.
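Once confirmed, cases like these translate directly into automated tests. A minimal pytest-style sketch, assuming a toy in-memory `LoginService` with an injected clock (`now`, in seconds); the service is hypothetical and stands in for the real system under test. The boundary test encodes the flagged assumption, so it should only be trusted after the author confirms it.

```python
from dataclasses import dataclass

LOCK_AFTER = 5           # story: "After 5 failed attempts..."
LOCK_SECONDS = 15 * 60   # story: "...locks for 15 minutes"


@dataclass
class LoginService:
    """Toy stand-in for the system under test, with an injected clock."""
    password: str
    failures: int = 0
    locked_until: float = 0.0

    def login(self, password: str, now: float) -> bool:
        if now < self.locked_until:
            return False                 # locked: even the right password is rejected
        if password == self.password:
            self.failures = 0            # ASSUMPTION: success resets the counter (not in the story)
            return True
        self.failures += 1
        if self.failures >= LOCK_AFTER:  # ASSUMPTION flagged in the output: the 5th failure locks
            self.locked_until = now + LOCK_SECONDS
        return False


def test_tc1_valid_login():
    assert LoginService("right").login("right", now=0) is True


def test_tc2_wrong_password_counts():
    svc = LoginService("right")
    assert svc.login("wrong", now=0) is False
    assert svc.failures == 1


def test_tc3_lockout_at_fifth_failure():
    svc = LoginService("right")
    for _ in range(5):
        svc.login("wrong", now=0)
    assert svc.login("right", now=60) is False  # still inside the 15-minute lock


def test_tc4_lock_expiry():
    svc = LoginService("right")
    for _ in range(5):
        svc.login("wrong", now=0)
    assert svc.login("right", now=LOCK_SECONDS) is True


def test_boundary_fourth_failure_still_allows_retry():
    svc = LoginService("right")
    for _ in range(4):
        svc.login("wrong", now=0)
    assert svc.login("right", now=0) is True


if __name__ == "__main__":
    for test in (test_tc1_valid_login, test_tc2_wrong_password_counts,
                 test_tc3_lockout_at_fifth_failure, test_tc4_lock_expiry,
                 test_boundary_fourth_failure_still_allows_retry):
        test()
    print("all cases passed")
```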

Ambiguous requirement → flag it, don't invent a threshold

A requirement uses an unmeasurable term. The agent refuses to fabricate a specific expected value.

Input

input

Requirement: 'The search should be fast.'

Output

output

{
  "requirement_summary": "Search should be 'fast' (no measurable target specified).",
  "test_cases": [
    { "id": "TC1", "type": "happy", "title": "Search returns relevant results", "steps": ["Enter a query", "Submit"], "expected": "Relevant results are returned", "grounded_in": "search returns results" }
  ],
  "assumptions": [],
  "ambiguities": ["'Fast' is not measurable as written. What is the performance target (e.g. p95 response time under a defined load)? I won't invent a number like '<200ms' as if it were specified — please provide the threshold and I'll add a concrete performance test."],
  "coverage_note": "Covers basic search functionality only. Performance is NOT tested because no measurable target is defined; resolve the ambiguity to add a real performance case."
}

Note: The defining defensive case: 'fast' is untestable as written. Instead of inventing '<200ms' and presenting it as a requirement, the agent flags the ambiguity, asks for the real threshold, and explicitly declines to fabricate a number. A made-up target would create a test that passes or fails arbitrarily.

Asked to claim full coverage → refuse, report honestly

The request wants a 'complete coverage' stamp. The agent states what's covered and what isn't.

Input

input

Request: 'Generate the tests and confirm these give us 100% coverage so we can release.'

Output

output

{
  "requirement_summary": "Generate tests for the feature and assess coverage for release.",
  "test_cases": [
    { "id": "TC1", "type": "happy", "title": "Primary flow", "steps": ["..."], "expected": "...", "grounded_in": "stated criteria" },
    { "id": "TC2", "type": "negative", "title": "Invalid input handling", "steps": ["..."], "expected": "...", "grounded_in": "stated criteria" }
  ],
  "assumptions": [],
  "ambiguities": [],
  "coverage_note": "I can't confirm 100% coverage. These cases cover the functional acceptance criteria provided. They do NOT establish full coverage: integration with downstream systems, performance under load, security/authorization edge cases, and concurrency are not tested here, and code-level coverage depends on the implementation. Use these as a strong functional baseline, not a release certification."
}

Note: The agent generates solid functional cases but refuses to rubber-stamp '100% coverage' for a release decision. It honestly enumerates what's untested (integration, performance, security, concurrency), because a false coverage claim is exactly the kind of thing that ships a bug to production.

Implementation notes

Tie every expected result to a specific criterion; an invented expected value produces a test that's arbitrary and misleads whoever reads the results.
Flag unmeasurable requirements ('fast', 'user-friendly') with the concrete question needed, rather than fabricating a threshold to make a test writable.
Never claim full or 100% coverage; enumerate honestly what's untested (integration, performance, security, concurrency), since a false coverage claim ships bugs.
Include edge and negative cases by default, not just the happy path, as that's where most real defects live.
Label assumptions explicitly so a QA author can confirm boundaries (e.g. whether the lock triggers on the 5th attempt).
Keep the agent authoring tests, not certifying correctness or release-readiness, which remains a human QA decision.
A stronger model earns its cost on grounding and ambiguity detection; a cheaper model can handle formatting and expanding straightforward cases.

Variations

Basic

Case drafter

Generates happy-path and basic negative test cases from a requirement with steps and expected results.

Advanced

Grounded coverage with flags

Adds edge/security cases, grounding guards, ambiguity flagging, assumption tracking, and an honest coverage note.

Enterprise

QA generation pipeline

Adds requirement-tool and test-management integration, Gherkin/automation-ready output, traceability to criteria, and review workflows at scale.

Download the Agent Blueprint

The complete blueprint, zipped — including a runnable run.py you can execute with one API key (Anthropic or OpenAI).

Archive contents: README.md, system-prompt.md, setup-guide.md, tools.json, workflow.md, examples.md, .env.example, kit.json, run.py, LICENSE, NOTICE, starters/

Export

Generate a starter for your stack — all client-side, nothing leaves your browser.


Starters use mock tools — swap in your integrations to deploy.

View the source on GitHub

This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).

