AI-Assisted Code Review Standards

AI code review tools (GitHub Copilot code review, CodeRabbit, Sourcery, DeepCode) can catch real issues — but they catch different things than human reviewers, and they miss different things too. This skill defines the standards that keep both working together.

Context

What AI code review tools are good at:

Category           | Examples
Syntax and style   | Code style violations, naming conventions, formatting
Common bugs        | Null dereferences, off-by-one errors, unused variables
Security patterns  | SQL injection, XSS, hardcoded credentials
Boilerplate issues | Missing error handling, unclosed resources
Documentation      | Missing docstrings, stale comments

What AI code review reliably misses:

Category                     | Why it's missed
Business logic correctness   | The AI doesn't know what the feature is supposed to do
Architecture decisions       | Whether this approach fits the system's overall design
Performance at scale         | Whether this query will work with 10M rows
Security in context          | Whether this API creates a product-level risk
Test coverage quality        | Whether the tests actually validate meaningful behaviour
Prompt/AI output correctness | Whether the AI system will produce correct outputs

Step 1 — Define the code review context

CODE REVIEW CONTEXT:

Code type: [Frontend / Backend / AI feature / Infrastructure]

Code origin: [Human / AI-generated / Mixed]

Stakes level: [Critical / High / Medium / Low]

AI review tool(s): [tool name(s)]

Human review: [Yes — N reviewers / No / Occasional]

Step 2 — Define the AI review scope

Always expected from AI review:
  • Style and formatting violations
  • Type errors and type safety violations
  • Null/undefined access patterns
  • Unused imports and variables
  • Missing error handling on async operations
  • Hardcoded secrets or credentials
  • Common security anti-patterns
  • Missing input validation
For AI-specific code, additionally expect:
  • API keys stored in environment variables
  • Parameterised prompt templates (no string concatenation with user input)
  • Error handling for malformed model output
  • Retry logic with exponential backoff
  • Token/cost limits enforced

Step 3 — Define human review responsibilities
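The AI-specific checks above can be sketched in one place. This is a minimal illustration, not a real client: `call_model`, the prompt template, and the limits are hypothetical stand-ins for whatever your model SDK provides.

```python
import json
import random
import time

# Hypothetical template: user input is slotted into a fixed frame,
# never concatenated into the instruction text itself.
PROMPT_TEMPLATE = "Summarise the following ticket in one sentence:\n---\n{ticket}\n---"
MAX_ATTEMPTS = 4
MAX_OUTPUT_TOKENS = 256  # cost ceiling enforced on every call


def build_prompt(user_text: str) -> str:
    # Parameterised prompt template (the "no string concatenation" rule).
    return PROMPT_TEMPLATE.format(ticket=user_text)


def call_model(prompt: str, max_tokens: int) -> str:
    # Stand-in for a real model client; assumed to return a JSON string.
    return json.dumps({"summary": "User cannot reset their password."})


def summarise(user_text: str) -> str:
    prompt = build_prompt(user_text)
    for attempt in range(MAX_ATTEMPTS):
        raw = call_model(prompt, max_tokens=MAX_OUTPUT_TOKENS)
        try:
            # Treat model output as untrusted data: parse, then validate shape.
            parsed = json.loads(raw)
            return str(parsed["summary"])
        except (json.JSONDecodeError, KeyError, TypeError):
            # Malformed output: back off exponentially (with jitter), then retry.
            time.sleep(min(2 ** attempt, 30) * random.uniform(0.5, 1.5))
    raise RuntimeError("model returned malformed output after retries")
```

An AI reviewer can verify each of these mechanics is present; whether the template and limits are *right* for the feature stays with the human reviewer.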

Always human-reviewed:
  • Does this code implement what the spec says?
  • Does the architecture fit the existing system?
  • Are the tests actually testing meaningful behaviour?
  • Are there performance implications at scale?
  • Does this introduce a new security surface?
Review pairing by stakes level:
  • Critical/High: AI review → Human reviewer 1 (business logic) → Human reviewer 2 (architecture)
  • Medium: AI review → Human reviewer (business logic only)
  • Low: AI review primary → Human spot-check

Step 4 — Define the AI code review policy

  • All PRs must pass AI review before human review
  • AI review passing ≠ code is correct, safe, or performant
  • AI-generated code labelled "[AI-assisted]" in PR title
  • False negatives (issues the AI missed): add them to the human review checklist and encode them as lint rules where possible
  • False positives: the reviewer adds a short explanation; the team lead reviews the patterns weekly
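As one way this policy could be enforced in CI, here is a minimal sketch. The function name, the `origin` field, and its values are hypothetical; a real setup would wire them to your PR metadata (a PR form field or commit trailer, for example).

```python
def pr_review_gate(title: str, origin: str, ai_review_passed: bool) -> list[str]:
    """Return the blocking problems for a PR under the policy above.

    origin is assumed to be one of "human", "ai", or "mixed".
    """
    problems = []
    # Policy: all PRs must pass AI review before human review.
    if not ai_review_passed:
        problems.append("AI review must pass before human review is requested")
    # Policy: AI-generated code is labelled "[AI-assisted]" in the PR title.
    if origin in ("ai", "mixed") and "[AI-assisted]" not in title:
        problems.append('PR title must be labelled "[AI-assisted]"')
    return problems
```

A CI job would fail the build when the returned list is non-empty, keeping the policy mechanical rather than a matter of reviewer memory.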
Step 5 — AI feature code review standards

  • Prompt changes = deployments, not text edits
  • Agent guardrails implemented as specified — no "we'll add later"
  • Model version pinned — not "latest"
  • AI output never directly executed as code without sanitisation
  • AI output used in HTML is escaped (no XSS via model output)
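The escaping rule can be sketched with Python's standard `html.escape`; `render_model_output` and the pinned version string are illustrative names, not a real API.

```python
import html

MODEL_VERSION = "example-model-2025-01-01"  # pinned release, never "latest"


def render_model_output(raw_output: str) -> str:
    # Model output is attacker-influenced (prompt injection can smuggle
    # markup through it), so escape before it touches any HTML context.
    return f"<p>{html.escape(raw_output)}</p>"


hostile = '<script>alert("xss")</script>'
# render_model_output(hostile)
# -> '<p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>'
```

The same principle applies to any sink: model output going into SQL, shell commands, or file paths gets the sink's own parameterisation or escaping, never direct interpolation.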
Quality check before delivering

  • AI review scope explicitly states what it is NOT responsible for
  • Human review responsibilities are specific
  • AI-generated code policy is defined
  • Prompt change policy treats prompts as deployments
  • Output injection risks covered

Suggested next step: Add the AI feature code review checklist to your PR template.