Skip to content
Code ReviewSkillPro

Want AI to review your code and find issues before you ship?
How about 4 different models at once?

Code Review is the slash-command skill that turns "review the last commit" from a 90-minute manual orchestration into one command. Type /code-review and the skill spawns multiple frontier models in parallel, each reading the diff through a different lens, logic · contracts · architecture · security. Pre-analysis bundles Diagnose findings, blast radius, and a RED/YELLOW/GREEN risk rating into every reviewer’s prompt, so they are not staring at a raw diff. No-deferral policy: every confirmed finding is fixed in-session or written as a tracked task. Pro tier.

AI alone

  • Agent commits without a second look
  • Same model wrote it AND reviewed it
  • "Looks fine to me" is the only check
  • Findings deferred to "we’ll look at it later"
  • Manual multi-model review = 90-minute orchestration
  • No structured verdict per finding

Twira Code Review powertool

  • Multiple frontier models per commit, one command
  • Each model reviews through a different lens
  • Pre-analysis bundles detector + impact context
  • No-deferral: fix in-session or create a task
  • Four-verdict synthesis, confirmed · disputed · noted · dismissed
  • Structured per-commit report for the PR

Your AI commits. Multiple frontier models review. Nothing ships unreviewed.

/code-review

Slash command in your agent. Triggers multi-model review of the last commit.

multi-commit

Pass a range or "all pending", walks every commit in sequence.

verdicts

Four per finding, confirmed · disputed · noted · dismissed. Recorded.

no-deferral

Every confirmed finding fixed in-session or tracked as a task. No "look later".

pre-analysis

Diagnose findings + blast radius + risk rating bundled into every reviewer’s prompt.

You ask

/code-review

Twira instantly

  • skill identifies the last commit (a3f12c8)
  • runs Diagnose + blast_radius on the changed files (pre-analysis)
  • spawns claude-opus, gemini-3-1-pro, grok-4, codex in parallel, each with a different lens
  • the agent synthesises across all four; each finding gets a verdict
  • confirmed findings are fixed in-session or written as masterplan tasks

One structured report per commit. No untracked work. Nothing shipped on "looks fine to me".

Without Twira
With Twira
one model reviewing its own code
multiple frontier models per commit
"looks fine"
structured per-finding verdicts
findings deferred forever
fix in-session or task it
manual 90-min orchestration
one slash command
no record of what was checked
report per commit, chained to next
raw diff
pre-analysis: detectors + impact + risk

How it’s installed

Installed as a slash-command skill at `.claude/skills/code-review/SKILL.md` (and `~/.claude/skills/code-review/` globally). Triggers on `/code-review`, *"review my code"*, *"review the last commit"*, *"review commit abc1234"*, *"review abc1234..def5678"*, or *"review all pending"*. Skill orchestrates the agent through the workflow; `twira review run <commits>` is the underlying CLI that spawns the reviewers.

When you reach for it

  • End of a coding session, before commit lands in main, `/code-review` catches the issue the agent missed and you would have shipped.
  • Multi-commit feature branch about to merge, *"review all pending"* walks every commit since the last review and gives a per-commit report.
  • Security-sensitive change (auth, billing, anything money-touching), the security-lens reviewer catches things the logic-lens reviewer misses.
  • Pre-PR sanity check, review the commit locally before you push, then attach the structured report to the PR description.
  • Periodic catch-up, *"review all pending"* on Monday morning sweeps everything from last week.
  • Pairing with Diagnose, the deterministic detectors run as pre-analysis on every reviewer prompt; you get *both* deterministic findings AND multi-model judgement in one report.

See it work

$ /code-review     # typed in your AI agent
⏳ Running multi-model review on commit a3f12c8, typically 2-5 minutes (reviewers run in parallel; wall-clock = slowest reviewer) ✓ logic reviewer (claude-opus) complete · 3m 12s ✓ api_contract reviewer (gemini-3-1-pro) complete · 3m 48s ✓ architecture reviewer (grok-4) complete · 4m 21s ✓ security reviewer (codex) complete · 4m 02sCode Review: commit a3f12c8Overall: issues foundReviewers: claude-opus, gemini-3-1-pro, grok-4, codexConfirmed (fixing now / task created): • [HIGH] race condition on session-token refresh, src/auth/session.rs:142 Action: Fixed in this session • [MEDIUM] missing input validation on /api/billing endpoint Action: Task created: #BILL-204Dismissed (false positive): • "unused import", used in feature-gated code path✓ Total elapsed: 4m 21s

Code Review burns time and tokens

Each multi-model review is typically 2–5 minutes per commit and bills across every provider you invoke (your keys, your bills). Use it on commits that matter, security-sensitive changes, breaking refactors, the kind you would otherwise pair-review on. Not on every typo fix.

Technical depth, for engineers who want it

In your editor

You already do this, for code reviewed by humans. Open a PR, the reviewer reads it, files comments, you fix or push back, eventually it merges. Code reviews on human commits are standard practice. On AI-written commits, the same rigour rarely happens, the agent shipped it, you skimmed it, you trusted it.

What Code Review does

Code Review is a slash-command skill installed by `twira init`. Type `/code-review` (or *"review the last commit"*, or *"review commit abc1234"*) in your AI harness; the skill walks the workflow. It picks the commit, runs Diagnose + blast_radius on the changed files as pre-analysis, spawns multiple frontier models in parallel via the underlying `team review` MCP tool (each with a different review lens, logic, contracts, architecture, security), then drives the synthesis: four verdicts per finding (confirmed · disputed · noted · dismissed), enforced no-deferral policy (every confirmed finding either fixed in-session or written as a masterplan task), and a structured per-commit report you can attach to the PR description.

How it actually works

Code Review is the Twira workflow that turns "a multi-model code review on the last commit" from a 90-minute manual orchestration into a single slash command. You type /code-review in your AI agent; the skill orchestrates everything from there, picks the commit, runs deterministic pre-analysis on the changed files, spawns multiple frontier models in parallel (each with a different review lens), waits for them to complete, then walks the agent through a strict no-deferral synthesis that ends with every confirmed finding either fixed in this session or tracked as a concrete task in the masterplan.

Triggered by a slash command, not by an MCP call. This is a Skill, not an MCP tool, it lives at .claude/skills/code-review/SKILL.md inside your project and at ~/.claude/skills/code-review/ globally so you can use it in any project. When you type /code-review in Claude Code, Cursor, or any other harness that supports skills, the skill text becomes the agent’s instructions for the next turn: which commit to review, how to invoke the underlying CLI, how to interpret the JSON output, what verdicts to record, what format to present back to you. Trigger phrases are flexible, "review my code", "review the last commit", "review commit abc1234", "review abc1234..def5678", "review all pending" all kick off the same workflow.

Different lenses per reviewer is the whole point. A single AI reviewer tends to catch one class of issue and miss others. Code Review spawns multiple reviewers in parallel, each instructed to read the commit through a specific lens: logic (correctness, edge cases, off-by-one, race conditions), api_contract (type safety, error handling, missing validation), architecture (pattern consistency, design fit), and security (vulnerabilities, unsafe patterns, invariant violations). The lenses are distributed across the available models, with four or more models each gets one lens; with three, the first gets two and the rest get one; with two, each gets two; with one, that model gets all four. Each reviewer returns structured JSON findings with severity, file, line numbers, and a description. The agent then synthesises across all of them using its own full project context, which catches a fifth thing: false positives one reviewer raised that the others (or the agent itself) would correctly dispute.

Four-verdict synthesis, confirmed, disputed, noted, dismissed. After the reviewers finish, the agent reads the referenced file and line for each finding and classifies it: confirmed (real issue, agent agrees), disputed (reviewer flagged it but the agent has context that contradicts, the rationale is recorded), noted (plausible but uncertain, flagged for human review), dismissed (false positive with a brief reason). Verdicts are recorded against the finding ID via twira review verdict <id> <verdict> --reason "..." so the next reviewer of the same commit sees the prior decision history rather than re-litigating it.

No-deferral policy, enforced in the skill text, not optional. The skill instructions are explicit: every confirmed finding MUST be resolved one of two ways, fix it immediately in this session, or create a tracked task via masterplan with the exact fix described, file path, line numbers, and clear instructions. The skill bans phrases like "we should look at this later" or "noted for future consideration" without a concrete task. A multi-model code review is only valuable if its findings actually become work; without the deferral ban, review findings rot in chat history.

Pre-analysis is bundled into every reviewer prompt. Before the reviewers see the diff, Twira runs Diagnose against the changed files (65 deterministic detectors across four profiles, Bug, Health, Security, DOM), computes the blast radius (every dependent symbol the change reaches), and pulls a RED / YELLOW / GREEN risk rating from git history for each touched file. All of that is stitched into the prompt every reviewer receives. So the reviewer is not staring at a raw diff, they see "this commit touches files X and Y, has these existing detector findings, blast-radius reaches N callers, risk rating is RED on file X." Better context produces better reviews.

Structured report back to you, every commit gets the same shape. The skill ends the workflow with a per-commit report: an overall verdict (clean / minor issues / issues found / critical issues), the model list, confirmed issues with action (fixed in session or task ID), noted items, disputed items with rationale, dismissed items with reason. Same shape every time. Easy to skim, easy to attach to a PR description, easy to write a script around if you want to enforce policy at the PR level.

Multi-commit batching, review the whole branch at once. Pass a range (abc1234..def5678) or --all-pending and the skill walks every commit in sequence, presenting each report and asking you whether to continue before moving on. Useful for weekly review cadence, pre-merge sweeps on feature branches, or the "I have not reviewed in three days, what is in the queue?" check. The backfill action goes one step further, it picks up every unreviewed commit up to a configurable age limit and reviews them in one go.

Underlying primitives, the MCP review tool tracks the bookkeeping. The skill drives the workflow; the underlying review tool tracks the state. Three MCP actions on the review tool: status (pending vs completed counts plus the latest pending commits), findings (the JSON output for a specific commit), record (mark the current HEAD as a candidate for review later, e.g. at end-of-session). The corresponding CLI surface (twira review) has more, record / run / status / backfill / verdict / show / reset, for cases where the skill is not the right driver. The run action is CLI-only because it is long-running and would block the MCP thread.

Auto-recording via the post-commit hook. When Twira is installed in a repository, the post-commit git hook records every new commit into the review queue automatically. The review tool’s status action shows the queue; the skill processes it. You do not have to remember to record commits, they accumulate, and any time you type /code-review the skill picks up the next pending one. Use twira review status (or the MCP status action) to see what is waiting.

Tier is Pro. Multi-model code review is gated by the code_review feature flag. The skill itself is installed on every install; the actual twira review run execution requires Pro because it spawns paid frontier-model reviewers under the hood. Status and findings queries on existing reviews also flow through the Pro gate, so a downgrade from Pro stops further reviews running but does not delete past data.

Setup is zero. twira init installs the skill automatically to .claude/skills/code-review/ inside the project and to ~/.claude/skills/code-review/ globally. Trigger with /code-review or any of the natural-language phrases. The first run prompts for any provider CLIs missing on the system (it shells out to claude, codex, gemini, and the xAI client). The reviewer timeout is 5 minutes per reviewer; reviewers run in parallel, so the wall-clock per commit matches the slowest of them, not the sum.

What it isn’t

  • Triggered by slash command, not callable from the agent loop directly. The agent runs the underlying review CLI indirectly via the skill text.
  • Pro tier. The orchestration is included; the actual token costs are billed across the providers you invoke (your keys, your bills).
  • Requires the underlying provider CLIs installed, claude, codex, gemini, the xAI client. The first run prompts for any missing CLI.
  • Typically 2–5 minutes per commit. Reviewers run in parallel with a 5-minute per-reviewer timeout, so wall-clock matches the slowest reviewer. Multi-commit "review all pending" walks the workflow per commit sequentially.

One install. Your agent will know the difference in the first session.

$ curl -fsSL twira.com/install.sh | sh
Code Review, Tools · Twira