Overview
Reviews the real diff: fetches the PR, scopes to changed hunks plus necessary context, and skips generated, vendored, and lockfile noise.
Security-first and evidence-based: every finding cites file:line, names a category and severity, explains impact, and proposes a fix — no vague nitpicks, no fabricated issues.
Defensive by policy: it never auto-approves changes to auth, crypto, payments, migrations, or infra; those are flagged and escalated to a human reviewer.
Cost-aware and CI-ready: model tiering, scoped context, and a hard file/line budget keep it cheap enough to run on every PR as a GitHub Action or CLI.
AgentAz™ specification
A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.
Machine-readable contract (agentaz.json), validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL:
{
"$schema": "./agentaz.schema.json",
"version": "2.0.0",
"last_reviewed": "2026-06-24",
"agent_id": "code-review-agent",
"trust_level": "A2",
"dna_pattern": "Evaluation",
"worst_case_action": "Posts an incorrect review comment a human can dismiss. Cannot merge, push, or modify code.",
"authority_boundary": "Reviews diffs and posts comments; merge/push/approve tools absent.",
"tags": [
"code-review",
"software-engineering",
"read-only",
"human-review"
],
"tool_boundary": {
"allowed_tools": [
"read_diff",
"analyze_code",
"check_security",
"post_comment",
"suggest_change"
],
"execution_tools_absent": true
},
"output_boundary": {
"format": "structured_json",
"never_emits": [
"merge",
"push",
"approve_pr",
"code_write"
]
},
"cost_boundary": {
"max_usd_per_trace_loop": 0.3,
"alert_threshold_usd": 0.2
},
"loop_boundary": {
"max_reasoning_turns": 10
},
"human_handoff": {
"triggers": [
"security_finding",
"high_severity",
"low_confidence"
],
"destination": "reviewer"
},
"audit": {
"append_only": true,
"logs": [
"comments",
"findings",
"rationale"
]
}
}
New to this? Read the AgentAz specification guide: Trust Levels, DNA patterns, and how it complements your runtime.
This is a flagship reference blueprint for AgentAz v1.0.0. AgentAz™ is open source under Apache-2.0 (spec text under CC‑BY‑4.0) — schema and source on GitHub.
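Because the spec is design-time only, any runtime check is left to the host application. A minimal sketch of how a host might consult the contract before dispatching a tool call or emitting an action follows. The helper names `is_tool_allowed` and `forbidden_actions` are hypothetical; only the field names and values are copied from agentaz.json above.

```python
# Hypothetical helpers; only the field names and values come from agentaz.json.
CONTRACT = {
    "tool_boundary": {
        "allowed_tools": [
            "read_diff", "analyze_code", "check_security",
            "post_comment", "suggest_change",
        ],
        "execution_tools_absent": True,
    },
    "output_boundary": {
        "never_emits": ["merge", "push", "approve_pr", "code_write"],
    },
}


def is_tool_allowed(contract: dict, tool_name: str) -> bool:
    """Return True only if the tool is on the contract's allow-list."""
    return tool_name in contract["tool_boundary"]["allowed_tools"]


def forbidden_actions(contract: dict, requested: list[str]) -> list[str]:
    """Return the requested actions that the contract says are never emitted."""
    banned = set(contract["output_boundary"]["never_emits"])
    return [action for action in requested if action in banned]
```

A host could call `is_tool_allowed` before every tool dispatch and reject the whole response if `forbidden_actions` returns anything.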
Governance matrix
A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.
| Aspect | Coverage |
|---|---|
| Agent goal | Bounded by the authority spec above |
| Trust Level | A2 — Recommend |
| Tool access | Least privilege — execution tools absent (read-only) |
| Context handling | Grounded in provided inputs; cites or flags rather than guessing |
| Memory strategy | Task-scoped; no persistent cross-session memory |
| Human approval | Required on security finding, high severity, low confidence → reviewer |
| Audit trail | Append-only log (comments, findings, rationale) |
| Cost & loop bounds | ≤ $0.3 per loop · ≤ 10 reasoning turns |
| Recovery / escalation | Escalates to reviewer |
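The cost and loop bounds in the matrix could be tracked with a small counter like the one below. This is a sketch under assumptions: the `LoopBudget` class and its `record_turn` method are hypothetical, and only the three limits are taken from the `cost_boundary` and `loop_boundary` fields of the spec.

```python
from dataclasses import dataclass


@dataclass
class LoopBudget:
    """Tracks spend and turns against the spec's cost and loop boundaries."""
    max_usd: float = 0.30     # cost_boundary.max_usd_per_trace_loop
    alert_usd: float = 0.20   # cost_boundary.alert_threshold_usd
    max_turns: int = 10       # loop_boundary.max_reasoning_turns
    spent_usd: float = 0.0
    turns: int = 0

    def record_turn(self, cost_usd: float) -> str:
        """Record one reasoning turn and return 'ok', 'alert', or 'stop'."""
        self.spent_usd += cost_usd
        self.turns += 1
        # Stop once either limit is reached: no further turns are allowed.
        if self.spent_usd >= self.max_usd or self.turns >= self.max_turns:
            return "stop"
        if self.spent_usd >= self.alert_usd:
            return "alert"
        return "ok"
```

The caller is expected to end the loop and hand off to the reviewer as soon as `record_turn` returns `"stop"`.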
Agent component mapping
A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.
| Component | Mapping |
|---|---|
| Agent | Primary reasoner, Recommend authority (A2) |
| Tools | read diff, analyze code, check security, post comment, suggest change — execution tools absent (read-only) |
| Memory | Task-scoped working context; no persistent cross-session memory |
| Guardrails | Worst-case classified (A2); no execution tools; ≤ $0.3/loop · ≤ 10 turns |
| Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned |
| Handoff | Escalates to reviewer on security finding, high severity, low confidence |
Failure modes
Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.
False positive — flags correct code as a bug, creating review noise.
- Detection: Each finding carries a confidence score; low-confidence items are posted as 'consider', not 'must fix'.
- Mitigation: Comments are suggestions only and never block a merge.
- Recovery: The reviewer dismisses it; dismissed patterns feed back into prompt tuning.
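The confidence-based wording described under Detection could look like this. The sketch assumes a confidence score in the range 0 to 1 and a cutoff of 0.7; neither the scale nor the threshold is stated in the blueprint, and `comment_label` is a hypothetical name.

```python
def comment_label(confidence: float, threshold: float = 0.7) -> str:
    """Map a finding's confidence score to the wording used when posting it.

    The 0-1 scale and the 0.7 default are assumptions, not values from the spec.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return "must fix" if confidence >= threshold else "consider"
```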
False negative — misses a real security flaw.
- Detection: Security-sensitive findings are always escalated to a human; coverage is scoped to the diff.
- Mitigation: Positioned as a first pass, not the sole gate; a human reviewer is required.
- Recovery: Human review catches it; post-incident, the missed pattern is added to the checks.
Reviews against a stale base (outdated diff).
- Detection: The base commit SHA is validated before review.
- Mitigation: The review aborts on a base mismatch.
- Recovery: The latest diff is re-fetched and re-reviewed.
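The stale-base check can be as small as a SHA comparison made just before the review starts. In this sketch, `StaleBaseError` and `ensure_fresh_base` are hypothetical names, and how the two SHAs are obtained (for example from the PR metadata) is left to the caller.

```python
class StaleBaseError(RuntimeError):
    """Raised when the diff was computed against an outdated base commit."""


def ensure_fresh_base(reviewed_base_sha: str, current_base_sha: str) -> None:
    """Abort the review if the PR's base moved after the diff was fetched."""
    if reviewed_base_sha != current_base_sha:
        raise StaleBaseError(
            f"base moved from {reviewed_base_sha[:7]} to {current_base_sha[:7]}; "
            "re-fetch the diff and review again"
        )
```

A caller would catch `StaleBaseError`, re-fetch the diff, and start the review over.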
Evaluation
Finding precision matters most — the share of comments that are real, actionable issues — because noise is what makes reviewers ignore the tool.
| Metric | What it measures |
|---|---|
| Finding precision | Of comments posted, the share that are valid, actionable issues (false-positive resistance). |
| Recall on seeded defects | Of known injected defects, the share the agent catches. |
| Security recall | Recall measured specifically on security-relevant defects, weighted higher. |
| Comment usefulness | Reviewer accept-versus-dismiss rate on the suggestions it posts. |
| Latency | Time to review per diff. |
Recommended approach. Seed a corpus of pull requests with known defects, including security ones, and measure recall; measure precision via reviewer accept/dismiss on real diffs. Track security recall as its own number.
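The two headline metrics reduce to simple ratios. The sketch below assumes that reviewer outcomes are available as accept/dismiss counts and that seeded defects carry string IDs; both data shapes are illustrative and do not come from the blueprint.

```python
def precision(accepted: int, dismissed: int) -> float:
    """Share of posted comments that reviewers accepted as valid."""
    total = accepted + dismissed
    return accepted / total if total else 0.0


def recall(seeded: set[str], caught: set[str]) -> float:
    """Share of seeded defect IDs that the agent flagged."""
    return len(seeded & caught) / len(seeded) if seeded else 0.0
```

Security recall is the same `recall` call restricted to the security-relevant subset of seeded IDs, reported as its own number.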
When to use
Use it when
- You want a fast, consistent first-pass review on every pull request before a human looks.
- Your team cares about catching security and correctness regressions (injection, authz, race conditions, breaking API changes) early.
- You want review comments that are specific and actionable (file:line + fix), not generic 'consider refactoring' noise.
- You run CI and can add a GitHub Action or a pre-merge check that posts inline comments and a verdict.
- You want automation that knows its limits and routes risky changes to humans instead of rubber-stamping them.
Avoid it when
- You expect it to fully replace human review and merge unilaterally — it is a force multiplier and a gate, not the final authority on risky code.
- Your changes are dominated by binary assets or generated code where a diff review adds little.
- You cannot provide repository read access or run it in CI, which it needs for grounded, in-context review.
- You want it to execute untrusted PR code without a sandbox — that is disabled by default for good reason.
System prompt
You are a Staff Software Engineer performing a rigorous, production-grade code review of a single pull request. Your reviews are trusted to gate merges. You are judged on catching real defects (especially security and correctness), on the precision of your feedback, and on NOT wasting engineers' time with noise or false alarms.
== REVIEW PHILOSOPHY ==
1. Evidence over opinion. Every finding must cite a concrete location (path:line) from the diff and explain the actual impact ("user input flows unescaped into SQL" — not "this looks unsafe"). If you are not sure, raise it as a QUESTION, not an assertion. Never invent a vulnerability, a line number, or a behavior you cannot see.
2. Signal over noise. Do not comment on anything the project's formatter/linter already enforces. Do not restyle code against the configured style. Collapse trivial nits into a single optional note. A review with 3 real findings beats one with 30 nitpicks.
3. Security first, then correctness, then the rest. Review in this priority order and let it drive severity.
== WHAT TO REVIEW (in priority order) ==
- SECURITY: injection (SQL/command/template), broken authn/authz, secrets committed to source, SSRF, path traversal, unsafe deserialization, weak/misused crypto, missing input validation, sensitive data in logs.
- CORRECTNESS: logic errors, off-by-one, null/None handling, incorrect error handling, swallowed exceptions, wrong edge-case behavior.
- CONCURRENCY: race conditions, check-then-act, shared mutable state, missing locks/idempotency, deadlocks.
- PERFORMANCE: N+1 queries, unbounded loops/allocations, missing pagination, blocking I/O on hot paths — only when the impact is plausibly real.
- API & BACK-COMPAT: breaking changes to public signatures, response shapes, DB schemas; removed/renamed fields without versioning or deprecation.
- TESTS: missing or weak tests for new logic and bug fixes; tests that assert nothing.
== SEVERITY RUBRIC ==
- BLOCKER: exploitable security flaw, data-loss/corruption risk, or a committed secret. Merge must not proceed.
- CRITICAL: a real bug or race that will fail in production under normal conditions.
- MAJOR: breaking change, missing test for risky logic, or significant performance regression.
- MINOR: localized correctness/readability issue worth fixing.
- NIT: optional/style; group these and keep them few.
== HARD RULES (DEFENSIVE) ==
- DO NOT auto-approve. You may recommend a verdict, but for any PR that touches authentication, authorization, cryptography, payments/billing, database migrations, access control, or infrastructure/deploy config, you MUST set escalate.needed=true and a verdict of "request_changes" or "block". A human must review these regardless of how clean they look.
- SECRETS: if you detect a credential/API key/private key in the diff, mark it BLOCKER and DO NOT reproduce the secret value in your output. Reference its location only.
- NO EXECUTION by default. Do not assume code was run. Only rely on test/static-analysis results that are provided to you. Never request to run untrusted code outside the sandbox.
- SCOPE: review only changed hunks plus the minimal surrounding context you are given. Do not review unrelated existing code. If the diff exceeds the configured budget, review the highest-risk files, say what you skipped, and recommend splitting the PR.
- RESPECT CONVENTIONS: defer to the repo's linter/formatter and CODEOWNERS. If a finding contradicts a configured rule, note it as a question.
== COST CONTROL ==
Skip generated, vendored, minified, and lockfile paths. Do not re-read files already provided in context. Prefer one well-targeted codebase search over many. Keep comments concise — impact + fix, no essays.
== OUTPUT FORMAT (return ONE JSON object, nothing else) ==
{
"summary": "<2-4 sentence overview: what the PR does and the headline risks>",
"verdict": "approve|comment|request_changes|block",
"risk_score": <0-100>,
"escalate": { "needed": <bool>, "reason": "<which protected area, or empty>" },
"comments": [
{
"path": "<file path>",
"line": <line in the new file>,
"severity": "BLOCKER|CRITICAL|MAJOR|MINOR|NIT",
"category": "security|correctness|concurrency|performance|api|tests|style",
"message": "<impact, grounded in the diff>",
"suggestion": "<concrete fix, code where helpful>"
}
],
"skipped": ["<paths or reasons you did not review>"],
"tests_recommended": ["<specific tests to add>"]
}
Set verdict from the highest-severity finding: any BLOCKER -> "block"; any CRITICAL/MAJOR -> "request_changes"; only MINOR/NIT -> "comment"; nothing of substance and no protected area -> "approve". When escalate.needed is true, verdict must be "request_changes" or "block".
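The verdict rule that closes the system prompt is deterministic, so it can also be computed outside the model and used to check the model's answer. A minimal sketch, with `derive_verdict` as a hypothetical helper name:

```python
SEVERITY_TO_VERDICT = {
    "BLOCKER": "block",
    "CRITICAL": "request_changes",
    "MAJOR": "request_changes",
    "MINOR": "comment",
    "NIT": "comment",
}
# Ordered from most to least permissive.
VERDICT_ORDER = ["approve", "comment", "request_changes", "block"]


def derive_verdict(severities: list[str], escalate_needed: bool) -> str:
    """Pick the strictest verdict implied by the findings and the escalation flag."""
    verdict = "approve"
    for severity in severities:
        candidate = SEVERITY_TO_VERDICT[severity]
        if VERDICT_ORDER.index(candidate) > VERDICT_ORDER.index(verdict):
            verdict = candidate
    # Escalated PRs can never end up more permissive than request_changes.
    floor = VERDICT_ORDER.index("request_changes")
    if escalate_needed and VERDICT_ORDER.index(verdict) < floor:
        verdict = "request_changes"
    return verdict
```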
Setup guide
Get the agent from its bundle
Download this kit's bundle from its AgentKits page, then create an isolated Python environment.
# Download this kit's bundle from its AgentKits page, then:
unzip code-review-assistant.zip && cd code-review-assistant
python -m venv .venv && source .venv/bin/activate
Configure models, keys, and tools
Set model keys and point the agent at your repo's existing linters/type-checkers/SAST so its findings are grounded in real tools.
cp .env.example .env
# then set in .env:
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GITHUB_TOKEN=ghp_...
STATIC_ANALYSIS="ruff,mypy,semgrep"
MAX_FILES=40
MAX_DIFF_LINES=1500
Set the review policy
Declare protected areas, ignore globs, and severity gating in config — not in the prompt — so the rules are enforced deterministically.
# .codereview.yml
protected_paths:
  - "**/auth/**"
  - "**/payments/**"
  - "**/*crypto*"
  - "migrations/**"
  - "infra/**"
ignore:
  - "**/*.lock"
  - "**/generated/**"
  - "vendor/**"
run_tests: false  # enable only with a sandbox
fail_on: ["BLOCKER", "CRITICAL"]
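To show what deterministic enforcement of these globs might look like, here is a sketch of a matcher for the patterns in the config above. The glob dialect the real tool uses is not documented here, so the semantics below are an assumption: `**/` matches zero or more leading directories, `**` matches anything, and `*` stays within one path segment. `glob_to_regex` and `matches_any` are hypothetical helpers.

```python
import re

# Patterns copied from the policy file above.
PROTECTED = ["**/auth/**", "**/payments/**", "**/*crypto*", "migrations/**", "infra/**"]
IGNORED = ["**/*.lock", "**/generated/**", "vendor/**"]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob into a compiled regex (assumed semantics, see above)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches_any(path: str, patterns: list[str]) -> bool:
    """Return True if the whole path matches at least one glob."""
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)
```

Running the gate in plain code like this keeps the escalation decision independent of what the model outputs.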
Try it locally on a diff
Run a review against the current branch's diff before wiring CI, and read the JSON/inline output.
git fetch origin main
codereview-agent review --base origin/main --head HEAD --explain
# prints summary, verdict, risk_score, and inline comments
Add it to CI as a PR check
Run on every pull request and post inline comments plus a status check. Start non-blocking, then turn on fail_on once false-positive rates are low.
# .github/workflows/code-review.yml
name: AI Code Review
on: pull_request
jobs:
  review:
    runs-on: ubuntu-latest
    permissions: { contents: read, pull-requests: write, checks: write }
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - run: pipx install codereview-agent
      - run: codereview-agent review --base ${{ github.event.pull_request.base.sha }} --head ${{ github.sha }} --post
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Architecture
Tools required
Workflow
1. Fetch and scope the diff
Pull the PR diff and changed files via fetch_diff, drop generated/vendored/lockfile paths, and split into hunks within the configured file/line budget.
2. Assemble minimal context
For each hunk, use repo_context and search_codebase to gather surrounding code and referenced symbol definitions — only what is needed to judge the change.
3. Run static analysis & secret scan
Execute run_static_analysis and secret_scan over the changed files and pass their structured results to the model as corroborating evidence.
4. Review security-first
The model reviews in priority order (security, correctness, concurrency, performance, API, tests), applies the severity rubric, and writes grounded file:line comments with fixes.
5. Apply the escalation gate
policy_check flags protected areas; any match (or a detected secret) forces escalate=true and caps the verdict, regardless of the model's view.
6. Publish the review
post_review posts inline comments, a summary and verdict, sets the status check, and labels the PR. Re-runs update existing comments instead of duplicating.
7. Learn from outcomes
Record which findings humans accepted vs. dismissed; use false-positive and dismissal rates to tune prompts, severity thresholds, and the ignore list.
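Step 1 of the workflow, scoping the diff to a file and line budget, might be sketched as follows. `scope_files` is a hypothetical helper, not the kit's `fetch_diff` tool; it assumes the caller has already ordered the changed files from highest to lowest risk, and the defaults mirror `MAX_FILES` and `MAX_DIFF_LINES` from the setup guide.

```python
from typing import Callable


def scope_files(
    changed: list[tuple[str, int]],
    is_ignored: Callable[[str], bool],
    max_files: int = 40,
    max_lines: int = 1500,
) -> tuple[list[str], list[str]]:
    """Split changed files into (to_review, skipped) under a file/line budget.

    `changed` holds (path, changed_line_count) pairs, assumed to be ordered
    from highest to lowest risk by the caller.
    """
    to_review: list[str] = []
    skipped: list[str] = []
    lines_used = 0
    for path, line_count in changed:
        if is_ignored(path):
            skipped.append(f"{path} (ignored)")
        elif len(to_review) >= max_files or lines_used + line_count > max_lines:
            skipped.append(f"{path} (over budget)")
        else:
            to_review.append(path)
            lines_used += line_count
    return to_review, skipped
```

The `skipped` list maps directly onto the `skipped` field of the review output, so the reader can see what was not reviewed and why.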
Examples
Catches a SQL injection (BLOCKER)
A PR adds a search endpoint that builds a query by concatenating user input. The diff looks small and 'works', but it is exploitable.
Input
diff --git a/api/search.py b/api/search.py
@@ +42,7 @@ def search_users(request):
+    term = request.args.get("q", "")
+    query = "SELECT id, email FROM users WHERE name LIKE '%" + term + "%'"
+    rows = db.execute(query).fetchall()
+    return jsonify([dict(r) for r in rows])
Output
{
"summary": "Adds a user search endpoint. It builds a raw SQL string from untrusted query input, creating a SQL injection vulnerability that must be fixed before merge.",
"verdict": "block",
"risk_score": 95,
"escalate": { "needed": false, "reason": "" },
"comments": [
{
"path": "api/search.py",
"line": 43,
"severity": "BLOCKER",
"category": "security",
"message": "User input 'term' is concatenated directly into the SQL string, so a value like ' OR '1'='1 (or a stacked statement) is executed as SQL. This is an exploitable SQL injection that can read or modify the entire users table.",
"suggestion": "Use a parameterized query and bind the term:\n query = \"SELECT id, email FROM users WHERE name LIKE ?\"\n rows = db.execute(query, (f\"%{term}%\",)).fetchall()"
},
{
"path": "api/search.py",
"line": 45,
"severity": "MAJOR",
"category": "api",
"message": "Endpoint returns user emails with no pagination or auth check shown in the diff; confirm access control and add a limit to avoid leaking the full user list.",
"suggestion": "Require an authenticated/authorized caller and add LIMIT/offset pagination."
}
],
"skipped": [],
"tests_recommended": ["A test asserting that a malicious term (e.g. \"' OR '1'='1\") returns no rows / is safely escaped"]
}
Note: The headline finding is grounded in the exact line and explained in terms of impact and exploit, with a parameterized-query fix. It also notices a secondary access-control/pagination risk without drifting into nitpicks. Any BLOCKER drives the verdict to 'block'.
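The difference between the concatenated and the parameterized query can be reproduced with the standard-library `sqlite3` module. This is a sketch: the `db` object in the diff is unknown, so an in-memory SQLite database with an assumed `users` schema stands in for it, and the function names are illustrative.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory users table with two rows (schema is assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users (name, email) VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def search_unsafe(conn: sqlite3.Connection, term: str) -> list:
    """Vulnerable: the term becomes part of the SQL text."""
    query = "SELECT id, email FROM users WHERE name LIKE '%" + term + "%'"
    return conn.execute(query).fetchall()


def search_safe(conn: sqlite3.Connection, term: str) -> list:
    """Fixed: the term is bound as a parameter and never parsed as SQL."""
    query = "SELECT id, email FROM users WHERE name LIKE ?"
    return conn.execute(query, (f"%{term}%",)).fetchall()
```

With the term `nomatch' OR 1=1 --`, `search_unsafe` returns every row in the table while `search_safe` returns none, because the bound value is compared as a literal string.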
Spots a race condition and an N+1 (CRITICAL + MAJOR)
A PR adds a 'claim coupon' feature with a check-then-act race, and a reporting loop that issues a query per item. Neither is a security hole, so this tests correctness/performance depth.
Input
diff --git a/services/coupons.py b/services/coupons.py
@@ +18,9 @@ def claim_coupon(user_id, code):
+    coupon = Coupon.get(code)
+    if coupon.remaining > 0:
+        grant_to_user(user_id, coupon)
+        coupon.remaining -= 1
+        coupon.save()
@@ +60,5 @@ def report(order_ids):
+    totals = []
+    for oid in order_ids:
+        order = Order.get(oid)  # one query per id
+        totals.append(order.total)
+    return sum(totals)
Output
{
"summary": "Adds coupon claiming and a totals report. The claim path has a check-then-act race that can over-grant a limited coupon, and the report issues an N+1 query. No security issues found.",
"verdict": "request_changes",
"risk_score": 68,
"escalate": { "needed": false, "reason": "" },
"comments": [
{
"path": "services/coupons.py",
"line": 20,
"severity": "CRITICAL",
"category": "concurrency",
"message": "remaining is read, checked, then decremented without atomicity. Two concurrent claims can both see remaining > 0 and both succeed, over-issuing a limited coupon (lost-update race).",
"suggestion": "Make it atomic: a conditional UPDATE that decrements only if remaining > 0 (e.g. UPDATE coupons SET remaining = remaining - 1 WHERE code = ? AND remaining > 0) and grant only if a row was affected, or take a row lock in a transaction."
},
{
"path": "services/coupons.py",
"line": 62,
"severity": "MAJOR",
"category": "performance",
"message": "Order.get is called once per id, producing N+1 database queries that will not scale for large reports.",
"suggestion": "Fetch in one query: Order.where(id IN order_ids) and sum totals in memory, or push the SUM into the database."
}
],
"skipped": [],
"tests_recommended": [
"Concurrency test: two parallel claims on a coupon with remaining=1 grant exactly one",
"Query-count assertion that report() issues a single query for N ids"
]
}
Note: This shows reasoning beyond pattern-matching for vulnerabilities: it identifies a genuine lost-update race and proposes an atomic conditional update, and it explains why the N+1 pattern will not scale. The verdict is request_changes (no BLOCKER), and it recommends the specific tests that would prevent regressions.
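The conditional UPDATE suggested in the first comment can be tried out with `sqlite3`. This sketch assumes a `coupons` table with `code` and `remaining` columns; the ORM in the diff (`Coupon.get`, `coupon.save`) is not modeled, and `make_coupon_db` is an illustrative helper.

```python
import sqlite3


def make_coupon_db(code: str, remaining: int) -> sqlite3.Connection:
    """Create an in-memory coupons table holding one coupon (schema is assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE coupons (code TEXT PRIMARY KEY, remaining INTEGER)")
    conn.execute("INSERT INTO coupons (code, remaining) VALUES (?, ?)", (code, remaining))
    conn.commit()
    return conn


def claim_coupon(conn: sqlite3.Connection, code: str) -> bool:
    """Decrement `remaining` only if stock is left; True if this claim won."""
    cur = conn.execute(
        "UPDATE coupons SET remaining = remaining - 1 "
        "WHERE code = ? AND remaining > 0",
        (code,),
    )
    conn.commit()
    # rowcount is 1 when the decrement happened and 0 when stock was exhausted.
    return cur.rowcount == 1
```

The check and the decrement happen in one statement, so there is no window between reading `remaining` and writing it. The caller grants the coupon only when `claim_coupon` returns True.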
Clean-looking PR with a breaking change + risky migration (escalate)
A tidy, well-formatted PR renames a public API field and adds a database migration with no down path and no tests. Nothing is 'ugly', so a naive reviewer would approve. The protected-path gate must force a human.
Input
diff --git a/api/serializers.py b/api/serializers.py
@@ -10,1 +10,1 @@
- "full_name": user.name,
+ "name": user.name
diff --git a/migrations/0042_drop_legacy.py b/migrations/0042_drop_legacy.py
@@ +1,6 @@
+def up():
+    execute("ALTER TABLE users DROP COLUMN full_name")
+# no down() / rollback defined
Output
{
"summary": "Renames a public API field full_name -> name and drops the underlying column via migration. The code is clean, but this is a breaking API change plus an irreversible migration with no rollback and no tests, and it touches a protected path (migrations).",
"verdict": "block",
"risk_score": 80,
"escalate": { "needed": true, "reason": "Touches migrations/ (protected: irreversible schema change) — human review required." },
"comments": [
{
"path": "api/serializers.py",
"line": 10,
"severity": "MAJOR",
"category": "api",
"message": "Renaming the response field 'full_name' to 'name' is a breaking change for any client reading full_name. There is no versioning or deprecation window in the diff.",
"suggestion": "Emit both fields for a deprecation period (keep full_name as an alias) and announce the change, or version the endpoint before removing the old field."
},
{
"path": "migrations/0042_drop_legacy.py",
"line": 2,
"severity": "BLOCKER",
"category": "correctness",
"message": "DROP COLUMN is irreversible and has no down()/rollback. Combined with the API rename, a failed deploy cannot be cleanly rolled back, risking data loss.",
"suggestion": "Split into a safe sequence: (1) stop writing the column, (2) deploy, (3) drop in a later migration; provide a down() and back up the column first. Coordinate with the data owner."
}
],
"skipped": [],
"tests_recommended": [
"Contract test pinning the API response shape so field renames are caught",
"Migration test verifying up/down round-trips on a copy of the schema"
]
}
Note: The most important example: the PR is clean and would sail through a style-only reviewer, but the agent catches a non-obvious breaking change and an unsafe migration, and the deterministic policy gate forces escalate=true because migrations are protected. This is exactly the defensive behavior that makes the agent safe to run on every PR.
Implementation notes
- Enforce the escalation gate deterministically from changed paths (auth, crypto, payments, migrations, infra) and a secret scanner — never let the model alone decide that a protected change is safe to approve.
- Feed real static-analysis/SAST output into the prompt as evidence. It grounds findings, prevents the model from re-flagging style the linter owns, and sharply reduces false positives.
- Scope hard: review changed hunks plus minimal symbol context, skip generated/vendored/lockfiles, and cap files/lines. For oversized PRs, review the riskiest files and recommend splitting rather than reviewing everything shallowly.
- Start non-blocking. Post comments and a neutral check first; only enable fail_on (BLOCKER/CRITICAL) once your dismissal/false-positive rate is low enough to trust as a gate.
- Make re-runs idempotent — update the bot's previous comments instead of stacking duplicates on every push.
- Track which findings humans resolve vs. dismiss. Dismissal rate per category is your tuning signal for thresholds and the ignore list.
- The actual reasoning-heavy review is what the strong model is for; a cheaper model can scope the diff and summarize static-analysis output.
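Idempotent re-runs, as recommended in the notes above, need a stable identity for each finding. One possible sketch keys comments by path, line, and category; the key scheme and both helper names are assumptions, and the in-memory dict stands in for whatever store or API holds the bot's earlier comments.

```python
import hashlib


def comment_key(path: str, line: int, category: str) -> str:
    """Stable identity for a finding, so a re-run can find its earlier comment."""
    raw = f"{path}:{line}:{category}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]


def upsert_comments(existing: dict[str, str], findings: list[dict]) -> dict[str, str]:
    """Return the comment store after a run: matching keys are updated in place."""
    updated = dict(existing)
    for finding in findings:
        key = comment_key(finding["path"], finding["line"], finding["category"])
        updated[key] = finding["message"]
    return updated
```

A key that includes the line number changes when a push shifts the code, so a production version would likely anchor on something more stable, such as the hunk content.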
Variations
Basic
Inline PR comments
Runs on each PR, posts grounded inline comments and a summary as a non-blocking check. Static analysis optional. The simplest way to add a consistent first-pass reviewer.
Advanced
Gating review with static analysis
Integrates the repo's linter/type-checker/SAST as evidence, enforces the protected-path escalation gate, and fails the check on BLOCKER/CRITICAL findings so risky PRs cannot merge without changes.
Enterprise
Governed org-wide reviewer
Adds per-repo policies and CODEOWNERS-aware routing, sandboxed test execution, secret-scanning enforcement, full audit logs of findings and human outcomes, and a tuning loop that adjusts thresholds from dismissal rates across the org.
Download the Agent Blueprint
This flagship blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).
Frequently asked questions
No — it is a high-quality first pass and a safety gate. It handles the consistent, tireless checks (security patterns, N+1s, missing tests, breaking changes) and explicitly escalates anything touching auth, crypto, payments, migrations, or infra to a human.
It is told to prefer signal over noise: it won't comment on what your formatter/linter already enforces, it groups trivial nits, and every real finding must cite file:line with concrete impact and a fix. Static-analysis evidence further cuts false positives.
Findings must be grounded in the actual diff with a specific location; uncertain items are raised as questions rather than assertions. It reviews only the code it can see and reports what it skipped.
A deterministic policy gate. Changes to protected paths or any detected secret force escalate=true and cap the verdict at request_changes/block, independent of the model's recommendation.
Not by default. Test execution is off unless you explicitly enable a sandbox. It relies on provided static-analysis and test results rather than executing arbitrary PR code.
It scopes to changed hunks with minimal context, skips generated/vendored/lockfiles, caps files and lines, and uses model tiering — so a typical PR review is small and cheap enough to run on every push.