Hydra tutorial series — Part 2: The three pipelines
Build, code review and security review. Who does what, in what order, and how a label state machine stitches the whole ride together. The second of six short modules.
In part 1 you saw that Hydra has four personas. In this part we look at the pipeline: how those personas work on one issue in turn, which labels mark the transitions, and when the pipeline diverts to needs-input. By the end you'll know Hydra's label state machine inside out and be able to steer an issue back to the right phase if it gets stuck somewhere.
Three pipelines, one pipeline
Hydra technically has three pipelines: build, code-review and security-review. But in practice they always run in the same order, automatically driven by labels. Under the hood it's one state machine:
```
ready-to-build (or <prefix>-ready-to-build on a dev workstation, see part 5)
  ↓
build:queued → build:running → build:pass
  ↓
code-review:queued → :running → :pass / :fail
  ↓ (always traversed, even on :fail)
security-review:queued → :running → :pass / :fail
  ↓
decision:
  both reviews :pass + 0 fixes   → done (Axel skipped)
  both reviews :pass + ≥1 fix    → applier:queued → :running → :pass / :fail
  one of the reviews :fail       → needs-input (Axel skipped, human decides)
```
The pipeline has one important rule: every outcome after the reviews is terminal. Hydra doesn't keep fixing in a loop. If a run succeeds in one pass, great. If not, the issue lands on needs-input and a human decides the next step. More on that in part 6.
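To make the diagram concrete, here is a minimal sketch of the transitions as a pure function: given the labels on an issue, it returns the label that would be added next. This is a hypothetical illustration, not Hydra's supervisor code. It assumes finished stage labels stay on the issue, models done as a label, and leaves out the mechanical gate rerun that precedes the applier.

```python
from __future__ import annotations

# Hypothetical sketch of the label state machine; not Hydra's real supervisor.
# Assumes finished stage labels stay on the issue and "done" is modeled as a label.


def has_stage(labels: set[str], stage: str) -> bool:
    """True if any state label (queued/running/pass/fail) exists for this stage."""
    return any(label.startswith(stage + ":") for label in labels)


def next_label(labels: set[str], fixes_pushed: int) -> str | None:
    """Return the label the supervisor would add next, or None to wait."""
    if "needs-input" in labels or "done" in labels:
        return None  # terminal: nothing automatic happens any more
    if "ready-to-build" in labels and not has_stage(labels, "build"):
        return "build:queued"
    if "build:fail" in labels:
        return "needs-input"  # no usable PR, so nothing to review
    if "build:pass" in labels and not has_stage(labels, "code-review"):
        return "code-review:queued"
    code_done = labels & {"code-review:pass", "code-review:fail"}
    if code_done and not has_stage(labels, "security-review"):
        return "security-review:queued"  # always, even after code-review:fail
    sec_done = labels & {"security-review:pass", "security-review:fail"}
    if not (code_done and sec_done):
        return None  # a stage is still queued or running
    if "code-review:fail" in labels or "security-review:fail" in labels:
        return "needs-input"  # the reviewers decide; Axel is skipped
    if fixes_pushed == 0:
        return "done"  # nothing changed since the build; Axel is skipped
    if not has_stage(labels, "applier"):
        return "applier:queued"  # gate rerun before the applier omitted here
    if "applier:pass" in labels:
        return "done"  # assumption: a passing applier completes the issue
    if "applier:fail" in labels:
        return "needs-input"
    return None  # applier still queued or running
```

Because the function reads nothing but the label set, calling it again after a crash gives the same answer, which is the property the supervisor relies on.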
Pipeline 1: Build (Al Gorithm, Haiku)
The builder receives the change, a clean clone of the target repo, and a turn budget. No review history, no feedback from earlier runs. He implements the tasks from tasks.md, runs the quality suite and opens a draft PR.
Order within the build phase:
- Implementation: reads proposal.md, design.md and tasks.md from openspec/changes/<slug>/ in the target repo and writes code.
- Quality checks: PHPCS, PHPMD, Psalm, PHPStan, ESLint, Stylelint, composer audit, npm audit.
- PHPUnit + Newman: backend and API tests.
- Browser tests: Playwright MCP walks through the UI.
- fix-quality / fix-browser: if checks go red, Al Gorithm gets one or two attempts to repair them mechanically. This is a pre-review fixup, not a review loop.
- Verdict: build:pass (PR has been opened) or build:fail (broken build, issue goes to needs-input).
Important: Al Gorithm runs on Haiku. Reason from part 1: he follows patterns, he doesn't judge. By putting him on Haiku we keep the Sonnet budget free for the reviewers.
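The build steps above can be sketched as a small driver in which the fixup is strictly bounded. This is a hypothetical illustration: the callables stand in for the real tools, pairing fix-quality with the quality checks and fix-browser with the browser tests is an assumption based on their names, and the default of two attempts reflects the "one or two attempts" mentioned above.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical sketch of the build-phase order. The callables stand in for real
# tools (PHPCS, PHPUnit, Playwright, ...); a check returns True when green.
Step = Callable[[], None]
Check = Callable[[], bool]


def check_with_fixup(check: Check, fix: Step, max_attempts: int = 2) -> bool:
    """Run a check; if red, allow a bounded number of mechanical repairs."""
    if check():
        return True
    for _ in range(max_attempts):
        fix()
        if check():
            return True
    return False  # still red after the last attempt: no open-ended loop


def build_phase(implement: Step, quality: Check, fix_quality: Step,
                tests: Check, browser: Check, fix_browser: Step,
                open_draft_pr: Step) -> str:
    """Return the verdict label for one build run."""
    implement()
    if not check_with_fixup(quality, fix_quality):
        return "build:fail"
    if not tests():  # PHPUnit + Newman; no fixup step is assumed here
        return "build:fail"
    if not check_with_fixup(browser, fix_browser):
        return "build:fail"
    open_draft_pr()
    return "build:pass"
```

The fixed `max_attempts` is what keeps this a pre-review fixup rather than a loop: after the last attempt the verdict is final.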
Pipeline 2: Code review (Juan Claude van Damme, Sonnet)
Once build:pass is on the issue, the supervisor automatically adds code-review:queued. Juan Claude picks it up:
- Reads only the PR diff (HYDRA_REVIEW_SCOPE=diff, ADR-020). That keeps the scope manageable and ensures every line he comments on is actually touched in this PR.
- Walks the ADR library and gate skills (see part 3).
- May fix mechanical, in-scope errors himself (ADR-021: bounded-fix scope). Missing PHPCS headers? He adds them. Awkward variable name? Not his department.
- For each finding, writes an inline comment on the PR prefixed with [fixed:...] or [unfixed:...].
- Commits and pushes his fixes to the feature branch and sets code-review:pass or code-review:fail on the issue.
Juan doesn't write his fixes_applied[] and unfixed[] out as JSON himself. The orchestrator bundles them into one round file, but only after the next step (Clyde's security review) has run. See "How verdicts are persisted" below.
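If you want to tally a reviewer's findings straight from the PR, the comment prefixes make that easy. A minimal sketch, assuming each finding comment starts with a bracketed prefix of the form [fixed:<tag>] or [unfixed:<tag>]; what goes inside the brackets is not specified in this part, so the tag is hypothetical, and the input is a plain list of comment bodies.

```python
from __future__ import annotations

import re

# Hypothetical: assumes a comment starts with "[fixed:<tag>]" or "[unfixed:<tag>]";
# the exact text inside the brackets is not specified in this part.
PREFIX = re.compile(r"\[(fixed|unfixed):([^\]]*)\]\s*(.*)", re.DOTALL)


def split_findings(bodies: list[str]) -> tuple[list[dict], list[dict]]:
    """Sort PR comment bodies into fixed and unfixed findings."""
    fixed: list[dict] = []
    unfixed: list[dict] = []
    for body in bodies:
        match = PREFIX.match(body.strip())  # match() anchors at the start
        if match is None:
            continue  # ordinary comment, not a review finding
        kind, tag, text = match.groups()
        finding = {"tag": tag, "text": text.strip()}
        (fixed if kind == "fixed" else unfixed).append(finding)
    return fixed, unfixed
```

This is only a reader-side helper; in Hydra itself the orchestrator, not a comment parser, produces the round file.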
Pipeline 3: Security review (Clyde Barcode, Sonnet)
Right after code review (whether :pass or :fail), the supervisor adds security-review:queued. Clyde Barcode reviews the PR state after Juan Claude's fixes:
- Same shape as code review, plus Semgrep and pattern matching on CWE classes (SQL injection, XSS, path traversal, hardcoded secrets, …).
- May also push bounded fixes in PR mode.
- Sets security-review:pass or security-review:fail on the issue.
Why sequential and not parallel? Because Clyde reviews what Juan Claude left behind. Parallel would mean Clyde looks at pre-fix code — he would then find security issues that Juan Claude had already cleaned up and you'd have to reconcile verdicts. Sequential is simpler and clearer.
How verdicts are persisted
After a complete review round (Juan and Clyde back to back), the orchestrator bundles both verdicts into one file on the feature branch: openspec/changes/<slug>/reviews/<round>.json in the target repo. Rounds are numbered sequentially: 1.json after the first pass, 2.json after a retry:queued cycle, and so on (the count from ls reviews/*.json | wc -l, plus one). That file holds the fixes_applied[] and unfixed[] from both reviewers combined, which is what Axel Pliér will read later. An aggregated status snapshot is kept alongside it in openspec/changes/<slug>/hydra.json (see part 6 for the structure and how to read it).
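The numbering rule and the bundling can be sketched in a few lines. This is a hypothetical illustration: the directory layout follows the text, but the JSON keys (in particular the "verdicts" object) and the shape of each reviewer's input dict are assumptions, not Hydra's actual schema.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Hypothetical sketch: the JSON layout below is illustrative, not Hydra's schema.


def next_round_path(change_dir: Path) -> Path:
    """reviews/<N+1>.json, where N is the number of round files so far."""
    reviews = change_dir / "reviews"
    existing = list(reviews.glob("*.json")) if reviews.is_dir() else []
    return reviews / f"{len(existing) + 1}.json"  # same as `ls | wc -l` plus one


def write_round(change_dir: Path, code_review: dict, security_review: dict) -> Path:
    """Bundle both reviewers' verdicts and findings into the next round file."""
    path = next_round_path(change_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    bundle = {
        "verdicts": {
            "code-review": code_review["verdict"],
            "security-review": security_review["verdict"],
        },
        "fixes_applied": code_review["fixes_applied"] + security_review["fixes_applied"],
        "unfixed": code_review["unfixed"] + security_review["unfixed"],
    }
    path.write_text(json.dumps(bundle, indent=2) + "\n")
    return path


if __name__ == "__main__":
    change = Path(tempfile.mkdtemp()) / "openspec" / "changes" / "example-slug"
    juan = {"verdict": "pass", "fixes_applied": ["PHPCS header"], "unfixed": []}
    clyde = {"verdict": "pass", "fixes_applied": [], "unfixed": ["raw SQL in query"]}
    print(write_round(change, juan, clyde).name)  # 1.json on a fresh directory
```

Counting files instead of storing a counter keeps the round number derivable from the branch alone, in the same stateless spirit as the labels.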
The applier: Axel Pliér's binary gate
With both reviews in, Hydra has three options:
- Both reviews :pass and Juan and Clyde together pushed 0 fixes.
  Nothing changed after the original build, so there are no new risks. Axel is skipped and the issue gets done. Saves tokens.
- Both reviews :pass and ≥ 1 fix was pushed.
  The code has changed since the original build. Before we trust the final state, the orchestrator reruns the mechanical gates (PHPCS, Psalm, PHPStan, ...). If those pass, Axel Pliér is up. He reads:
  - the final diff,
  - the fixes_applied[] + unfixed[] from both reviews,
  - all inline comments on the PR.

  He then gives one binary answer: {pass: true} or {pass: false, blocking: [...]}. He has no Write or Edit tools. He judges, he doesn't intervene.
- One of the reviews :fail.
  The reviewers are the authority. Axel is skipped and the issue goes to needs-input. A human decides whether this warrants a retry:queued or a rebuild:queued (see part 6).
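Axel's answer is binary, so checking its shape is straightforward. A minimal sketch, assuming the verdict arrives as JSON text (with quoted keys, unlike the shorthand above) and that a failing verdict must name at least one blocker; both are assumptions, as is mapping the result directly onto applier:pass or applier:fail.

```python
from __future__ import annotations

import json

# Hypothetical sketch: checks that Axel's answer has the binary shape described
# above. Requiring a non-empty "blocking" list on failure is an assumption.


def parse_applier_verdict(raw: str) -> tuple[bool, list]:
    """Return (passed, blockers) or raise ValueError for a malformed verdict."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("pass"), bool):
        raise ValueError("verdict needs a boolean 'pass' field")
    blocking = data.get("blocking", [])
    if not isinstance(blocking, list):
        raise ValueError("'blocking' must be a list")
    if data["pass"]:
        if blocking:
            raise ValueError("a passing verdict cannot list blockers")
        return True, []
    if not blocking:
        raise ValueError("a failing verdict must name at least one blocker")
    return False, blocking


def applier_label(raw: str) -> str:
    """Map a well-formed verdict onto the applier stage label."""
    passed, _ = parse_applier_verdict(raw)
    return "applier:pass" if passed else "applier:fail"
```

Rejecting anything that is not strictly binary keeps a vague or half-formed answer from slipping through as a pass.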
Labels: one source of truth
Hydra is stateless. The whole state machine lives on the issue labels on GitHub. The supervisor (the daemon) reads the labels every few seconds and decides what to do. If a container crashes, the supervisor looks at the labels again and picks up where it left off.
Per stage there are four possible states: :queued, :running, :pass, :fail. They always live on the issue, never on the PR. That's a deliberate choice: the issue is the unit of work, the PR is an artefact of a build cycle.
Next to the stage labels, an issue can carry metadata labels:
| Label | Meaning |
|---|---|
| openspec | Issue follows the OpenSpec working model |
| yolo | May auto-merge as long as all gates are green; no human approval needed |
| agent-maxed-out | Persona has used up its turn budget; output may be incomplete |
| needs-input | Hydra has stopped the pipeline; it's the human's turn |
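Because all state lives in labels, a supervisor can rebuild its view of an issue from scratch on every poll. A minimal sketch of that step: it splits an issue's labels into one state per stage plus the metadata set. The stage, state and metadata names come from this part; ignoring other labels and rejecting two states for the same stage are assumptions.

```python
from __future__ import annotations

# Hypothetical sketch: rebuild an issue's state from its labels alone, the way a
# stateless supervisor could on every poll. Rejecting two states for one stage
# is an assumption.
STAGES = {"build", "code-review", "security-review", "applier"}
STATES = {"queued", "running", "pass", "fail"}
METADATA = {"openspec", "yolo", "agent-maxed-out", "needs-input"}


def read_issue_state(labels: list[str]) -> tuple[dict[str, str], set[str]]:
    """Split issue labels into {stage: state} and a set of metadata labels."""
    stages: dict[str, str] = {}
    meta: set[str] = set()
    for label in labels:
        if label in METADATA:
            meta.add(label)
            continue
        stage, sep, state = label.rpartition(":")
        if not (sep and stage in STAGES and state in STATES):
            continue  # unrelated label, e.g. "bug" or "retry:queued"
        if stage in stages:
            raise ValueError(f"{stage} has two states: {stages[stage]} and {state}")
        stages[stage] = state
    return stages, meta
```

Nothing here reads the PR, which matches the rule that stage labels live on the issue only.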
retry:queued versus rebuild:queued
Two human-triggered recovery labels reach back into the pipeline. Both are single-shot (no loop):

- retry:queued: the existing PR stays. Hydra builds an ephemeral feedback.md (in a /tmp/hydra-retry-XXXXXX/ working directory, not committed) containing the unfixed[] findings + applier blockers, mounts it as /workspace/feedback.md in the builder container, and dispatches Al Gorithm in HYDRA_MODE=fix scoped to exactly the flagged files. Cheap. Suitable for: "the reviewers were right, the builder just needs to touch up a few things".
- rebuild:queued: the existing PR closes, the branch resets to development, all cycle labels are removed and the issue goes back to build:queued. Expensive. Suitable for: "Al Gorithm's approach was wrong, we're starting over with the same change".
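The retry preparation can be sketched as two small helpers: one writes feedback.md into a fresh temporary directory, the other builds the flags a builder container could be started with. This is a hypothetical illustration. Only the hydra-retry- prefix, the file name, the /workspace/feedback.md mount target and HYDRA_MODE=fix come from the text; the markdown layout and the use of Docker-style -v/-e flags are assumptions, and scoping to the flagged files is left out.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Hypothetical sketch of the retry:queued preparation. The markdown layout and
# the Docker-style flags are assumptions.


def write_retry_feedback(unfixed: list[str], blockers: list[str]) -> Path:
    """Write feedback.md into a fresh, uncommitted temp directory."""
    workdir = Path(tempfile.mkdtemp(prefix="hydra-retry-"))
    lines = ["# Feedback for this retry", "", "## Unfixed review findings"]
    lines += [f"- {item}" for item in unfixed] or ["- (none)"]
    lines += ["", "## Applier blockers"]
    lines += [f"- {item}" for item in blockers] or ["- (none)"]
    path = workdir / "feedback.md"
    path.write_text("\n".join(lines) + "\n")
    return path


def builder_run_args(feedback: Path) -> list[str]:
    """Flags for a `docker run` of the builder in fix mode."""
    return ["-v", f"{feedback}:/workspace/feedback.md", "-e", "HYDRA_MODE=fix"]
```

Keeping the file outside the repo and mounting it read-in means the feedback never ends up in the PR history.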
Rule of thumb: always go for the cheapest recovery first. First gh pr update-branch <N> (merge development → PR), then retry:queued, only then rebuild:queued. Part 6 goes into this in more depth — including the label cleanup steps required before applying either trigger (removing needs-input and fail labels first, otherwise the supervisor won't pick up the queue entry).
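The escalation ladder and the "clean labels first" rule can be expressed as a function that only builds gh command lines, so you can review them before running anything. A minimal sketch: gh pr update-branch and gh issue edit with --remove-label/--add-label are real gh subcommands and flags, but treating needs-input plus every :fail label as the stale set is a simplification of the cleanup steps that part 6 describes.

```python
from __future__ import annotations

# Hypothetical sketch of the escalation ladder. It only builds gh argv lists;
# which labels count as stale is a simplification of part 6's cleanup steps.
LADDER = ("update-branch", "retry:queued", "rebuild:queued")  # cheapest first


def stale_labels(labels: set[str]) -> list[str]:
    """needs-input plus any :fail label, which would block the new trigger."""
    return sorted(label for label in labels
                  if label == "needs-input" or label.endswith(":fail"))


def recovery_commands(step: str, issue: int, pr: int, labels: set[str]) -> list[list[str]]:
    """Return the gh commands for one rung of the ladder, in execution order."""
    if step not in LADDER:
        raise ValueError(f"unknown recovery step: {step}")
    if step == "update-branch":
        return [["gh", "pr", "update-branch", str(pr)]]
    commands = []
    stale = stale_labels(labels)
    if stale:
        # Remove stale labels in a separate, earlier call so the supervisor
        # never sees the trigger next to needs-input or a :fail label.
        commands.append(["gh", "issue", "edit", str(issue), "--remove-label", ",".join(stale)])
    commands.append(["gh", "issue", "edit", str(issue), "--add-label", step])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order; starting at the first rung and only moving down when it doesn't help follows the rule of thumb above.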
Test yourself
Four short questions to check whether you've understood this part. Stuck? Click Hint. Curious about the answer? Click Answer.
1. In what order do the three pipelines run, and why is security review sequential after code review and not parallel?
Hint
Think about which version of the code Clyde Barcode sees: before or after Juan Claude's fixes?
Answer
The order is build → code-review → security-review (and after that applier or needs-input).
Security review runs after code review so Clyde Barcode sees the PR state after Juan Claude's bounded fixes. Parallel would mean Clyde looks at pre-fix code and finds security issues Juan had already cleaned up. You'd then have to reconcile verdicts. Sequential is simpler and keeps the "who reviews what" boundary sharp.
2. What's the difference between build:fail and code-review:fail in terms of follow-up?
Hint
In one of them the pipeline stops immediately, in the other it continues to the next stage. Which is which, and why?
Answer
- build:fail: the PR has not been (correctly) opened, so there's nothing to review. The issue goes straight to needs-input and the pipeline stops.
- code-review:fail: the PR exists and Juan Claude rejected it, but the pipeline still continues to security-review (always, even on :fail). Only after security review is the decision made: at least one review :fail → applier skipped → needs-input.
Reason for the difference: without a PR there's no work to judge; with a PR we want both reviewers to surface what's there so a human has a complete picture later.
3. In which situation is Axel Pliér (the applier) skipped, and why is that a deliberate choice?
Hint
There are two scenarios. One revolves around "has anything changed since the original build?", the other around "who has the final word?"
Answer
Two scenarios:
- Both reviews :pass and 0 fixes pushed by Juan/Clyde. The code is identical to what the builder delivered; there's nothing new to judge. Axel skipped → done. Saves Sonnet tokens.
- One of the reviews :fail. The reviewers are the authority; their rejection trumps the applier. Axel skipped → needs-input.
So Axel only runs when fixes were pushed and both reviews passed — then he judges whether those fixes make the final state good.
4. When do you pick retry:queued and when rebuild:queued?
Hint
The pivot is: was Al Gorithm's build approach sound? If yes → cheap path. If no → start over.
Answer
- retry:queued when the builder's approach was fundamentally right and the reviewers had legitimate, local findings. The existing PR stays; the builder gets feedback.md and fixes scoped to the flagged files. Cheap, single-shot.
- rebuild:queued when the build approach itself was wrong: wrong implementation, stub methods, missed requirements. The PR closes, the branch resets to development, all cycle labels are removed, back to build:queued. Expensive.
Rule of thumb: always cheapest first — gh pr update-branch → retry:queued → only then rebuild:queued.
Next step
Now that you can see the pipeline skeleton, in part 3 we dive into the quality gates — the mechanical referees that keep all three personas in line.
