AI-Assisted Code Review Standards

AI code review tools (GitHub Copilot code review, CodeRabbit, Sourcery, DeepCode) can catch real issues — but they catch different things than human reviewers, and they miss different things too. This skill defines the standards that keep both working together.

Context

What AI code review tools are good at:

Category           | Examples
Syntax and style   | Code style violations, naming conventions, formatting
Common bugs        | Null dereferences, off-by-one errors, unused variables
Security patterns  | SQL injection, XSS, hardcoded credentials
Boilerplate issues | Missing error handling, unclosed resources
Documentation      | Missing docstrings, stale comments

What AI code review reliably misses:

Category                     | Why it's missed
Business logic correctness   | The AI doesn't know what the feature is supposed to do
Architecture decisions       | Whether this approach fits the system's overall design
Performance at scale         | Whether this query will work with 10M rows
Security in context          | Whether this API creates a product-level risk
Test coverage quality        | Whether the tests actually validate meaningful behaviour
Prompt/AI output correctness | Whether the AI system will produce correct outputs

Step 1 — Define the code review context

CODE REVIEW CONTEXT:

Code type: [Frontend / Backend / AI feature / Infrastructure]

Code origin: [Human / AI-generated / Mixed]

Stakes level: [Critical / High / Medium / Low]

AI review tool(s): [tool name(s)]

Human review: [Yes — N reviewers / No / Occasional]

Step 2 — Define the AI review scope

Always expected from AI review:
  • Style and formatting violations
  • Type errors and type safety violations
  • Null/undefined access patterns
  • Unused imports and variables
  • Missing error handling on async operations
  • Hardcoded secrets or credentials
  • Common security anti-patterns
  • Missing input validation
For AI-specific code, additionally expect:
  • API keys stored in environment variables
  • Parameterised prompt templates (no string concatenation with user input)
  • Error handling for malformed model output
  • Retry logic with exponential backoff
  • Token/cost limits enforced

Step 3 — Define human review responsibilities
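The AI-specific checks above can be sketched in one place. This is a minimal illustration, not a real client: `call_model`, the prompt template, and the limits are hypothetical stand-ins for whatever your model SDK provides.

```python
import json
import random
import time

# Hypothetical template: user input is slotted into a fixed frame,
# never concatenated into the instruction text itself.
PROMPT_TEMPLATE = "Summarise the following ticket in one sentence:\n---\n{ticket}\n---"
MAX_ATTEMPTS = 4
MAX_OUTPUT_TOKENS = 256  # cost ceiling enforced on every call


def build_prompt(user_text: str) -> str:
    # Parameterised prompt template (the "no string concatenation" rule).
    return PROMPT_TEMPLATE.format(ticket=user_text)


def call_model(prompt: str, max_tokens: int) -> str:
    # Stand-in for a real model client; assumed to return a JSON string.
    return json.dumps({"summary": "User cannot reset their password."})


def summarise(user_text: str) -> str:
    prompt = build_prompt(user_text)
    for attempt in range(MAX_ATTEMPTS):
        raw = call_model(prompt, max_tokens=MAX_OUTPUT_TOKENS)
        try:
            # Treat model output as untrusted data: parse, then validate shape.
            parsed = json.loads(raw)
            return str(parsed["summary"])
        except (json.JSONDecodeError, KeyError, TypeError):
            # Malformed output: back off exponentially (with jitter), then retry.
            time.sleep(min(2 ** attempt, 30) * random.uniform(0.5, 1.5))
    raise RuntimeError("model returned malformed output after retries")
```

An AI reviewer can verify each of these mechanics is present; whether the template and limits are *right* for the feature stays with the human reviewer.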

Always human-reviewed:
  • Does this code implement what the spec says?
  • Does the architecture fit the existing system?
  • Are the tests actually testing meaningful behaviour?
  • Are there performance implications at scale?
  • Does this introduce a new security surface?
Review pairing by stakes level:
  • Critical/High: AI review → Human reviewer 1 (business logic) → Human reviewer 2 (architecture)
  • Medium: AI review → Human reviewer (business logic only)
  • Low: AI review primary → Human spot-check

Step 4 — Define the AI code review policy

  • All PRs must pass AI review before human review
  • AI review passing ≠ code is correct, safe, or performant
  • AI-generated code labelled "[AI-assisted]" in PR title
  • False negatives (issues the AI missed): add them to the human review checklist and encode them as lint rules where possible
  • False positives: the reviewer adds a short explanation; the team lead reviews the patterns weekly
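As one way this policy could be enforced in CI, here is a minimal sketch. The function name, the `origin` field, and its values are hypothetical; a real setup would wire them to your PR metadata (a PR form field or commit trailer, for example).

```python
def pr_review_gate(title: str, origin: str, ai_review_passed: bool) -> list[str]:
    """Return the blocking problems for a PR under the policy above.

    origin is assumed to be one of "human", "ai", or "mixed".
    """
    problems = []
    # Policy: all PRs must pass AI review before human review.
    if not ai_review_passed:
        problems.append("AI review must pass before human review is requested")
    # Policy: AI-generated code is labelled "[AI-assisted]" in the PR title.
    if origin in ("ai", "mixed") and "[AI-assisted]" not in title:
        problems.append('PR title must be labelled "[AI-assisted]"')
    return problems
```

A CI job would fail the build when the returned list is non-empty, keeping the policy mechanical rather than a matter of reviewer memory.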
Step 5 — AI feature code review standards

  • Prompt changes = deployments, not text edits
  • Agent guardrails implemented as specified — no "we'll add later"
  • Model version pinned — not "latest"
  • AI output never directly executed as code without sanitisation
  • AI output used in HTML is escaped (no XSS via model output)
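The escaping rule can be sketched with Python's standard `html.escape`; `render_model_output` and the pinned version string are illustrative names, not a real API.

```python
import html

MODEL_VERSION = "example-model-2025-01-01"  # pinned release, never "latest"


def render_model_output(raw_output: str) -> str:
    # Model output is attacker-influenced (prompt injection can smuggle
    # markup through it), so escape before it touches any HTML context.
    return f"<p>{html.escape(raw_output)}</p>"


hostile = '<script>alert("xss")</script>'
# render_model_output(hostile)
# -> '<p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>'
```

The same principle applies to any sink: model output going into SQL, shell commands, or file paths gets the sink's own parameterisation or escaping, never direct interpolation.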
Quality check before delivering

  • AI review scope explicitly states what it is NOT responsible for
  • Human review responsibilities are specific
  • AI-generated code policy is defined
  • Prompt change policy treats prompts as deployments
  • Output injection risks covered

Suggested next step: Add the AI feature code review checklist to your PR template.