Hydra tutorial series — Part 2: The three pipelines

Build, code review and security review. Who does what, in what order, and how a label state machine stitches the whole ride together. The second of six short modules.

Conduction · 12 May 2026 · 10 min read

In part 1 you saw that Hydra has four personas. In this part we look at the pipeline: how those personas work on one issue one after another, which labels mark the transitions, and when the pipeline diverts to needs-input. By the end you'll know Hydra's label state machine inside out and you'll be able to steer an issue back to the right phase if it gets stuck somewhere.

Three pipelines, one pipeline

Hydra technically has three pipelines: build, code-review and security-review. But in practice they always run in the same order, automatically driven by labels. Under the hood it's one state machine:

ready-to-build (or <prefix>-ready-to-build on a dev workstation — see part 5)
    ↓
build:queued → build:running → build:pass
    ↓
code-review:queued → :running → :pass / :fail
    ↓  (always traversed — even on :fail)
security-review:queued → :running → :pass / :fail
    ↓
decision:
    both reviews :pass + 0 fixes → done (Axel skipped)
    both reviews :pass + ≥1 fix → applier:queued → :running → :pass / :fail
    one of the reviews :fail     → needs-input (Axel skipped, human decides)
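The decision step at the bottom of the diagram can be sketched as a small pure function. This is a minimal illustration in Python, not Hydra's actual code: the function name decide_next and its parameters are hypothetical. Only the label names come from the diagram.

```python
def decide_next(code_review_passed: bool,
                security_review_passed: bool,
                fixes_pushed: int) -> str:
    """Return the label an issue moves to once both reviews have finished.

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    if fixes_pushed < 0:
        raise ValueError("fixes_pushed cannot be negative")
    # One failing review is enough: the applier is skipped, a human decides.
    if not (code_review_passed and security_review_passed):
        return "needs-input"
    # Both passed and nothing changed since the build: the applier is skipped.
    if fixes_pushed == 0:
        return "done"
    # Both passed, but the reviewers pushed fixes: the applier judges the result.
    return "applier:queued"
```

Note that the function has no branch that leads back to build:queued. That mirrors the rule that Hydra never loops on its own.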

The pipeline has one important rule: every outcome after review is terminal. Hydra doesn't automatically fix in a loop. If it succeeds in one run, great. If not, the issue lands on needs-input and a human has to decide the next step. More on that in part 6.

Pipeline 1: Build (Al Gorithm, Haiku)

The builder receives the change, a clean clone of the target repo, and a turn budget. No review history, no feedback from earlier runs. He implements the tasks from tasks.md, runs the quality suite, and opens a draft PR.

Order within the build phase:

1. Implementation: reads proposal.md, design.md and tasks.md from openspec/changes/<slug>/ in the target repo, writes code.
2. Quality checks: PHPCS, PHPMD, Psalm, PHPStan, ESLint, Stylelint, composer audit, npm audit.
3. PHPUnit + Newman: backend and API tests.
4. Browser tests: Playwright MCP walks through the UI.
5. fix-quality / fix-browser: if checks go red, Al Gorithm gets one or two attempts to mechanically repair. This is a pre-review fixup, not a review loop.
6. Verdict: build:pass (PR has been opened) or build:fail (broken build, needs-input).
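The bounded fixup in the build phase can be pictured as a loop with a hard cap. The sketch below is an assumption-laden illustration, not Hydra's code: build_verdict, run_checks and attempt_fix are hypothetical names, and the default of two attempts is simply the upper end of the "one or two attempts" mentioned above.

```python
from typing import Callable


def build_verdict(run_checks: Callable[[], bool],
                  attempt_fix: Callable[[], None],
                  max_fix_attempts: int = 2) -> str:
    """Run the checks, allow a bounded number of repairs, return the verdict.

    run_checks stands in for the quality, test and browser steps;
    attempt_fix stands in for fix-quality / fix-browser (both hypothetical).
    """
    if run_checks():
        return "build:pass"
    for _ in range(max_fix_attempts):
        attempt_fix()            # mechanical repair, no review feedback involved
        if run_checks():
            return "build:pass"
    # Budget exhausted: the build fails and the issue goes to a human.
    return "build:fail"
```

The cap is what makes this a pre-review fixup rather than a loop: once the attempts are used up, the verdict is final.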

Important: Al Gorithm runs on Haiku. Reason from part 1: he follows patterns, he doesn't judge. By putting him on Haiku we keep the Sonnet budget free for the reviewers.

Pipeline 2: Code review (Juan Claude van Damme, Sonnet)

Once build:pass is set, the supervisor automatically queues code-review:queued. Juan Claude picks it up:

- Reads only the PR diff (HYDRA_REVIEW_SCOPE=diff, ADR-020). That keeps the scope manageable and ensures every line he comments on is actually touched in this PR.
- Walks through the ADR library and gate skills (see part 3).
- May fix mechanical, in-scope errors himself (ADR-021: bounded-fix scope). Missing PHPCS headers? He adds them. Awkward variable name? Not his department.
- Writes an inline comment on the PR for each finding, prefixed with [fixed:...] or [unfixed:...].
- Commits and pushes his fixes to the feature branch and sets code-review:pass or code-review:fail on the issue.

Juan doesn't write out his fixes_applied[] and unfixed[] as JSON himself. The orchestrator bundles them into one round file only after the next step (Clyde) has finished. See "How verdicts are persisted" below.
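To make the comment prefixes concrete, here is a minimal sketch of how prefixed comments could be sorted into the two lists. How the orchestrator actually collects the findings is not described in this part, so everything here is an assumption: the function name bundle_comments, the treatment of the text after the colon as an opaque tag, and the shape of each entry.

```python
import re

# Matches "[fixed:<tag>] text" or "[unfixed:<tag>] text".
# What goes after the colon is not specified here; it is kept as an opaque tag.
PREFIX = re.compile(r"^\[(fixed|unfixed):([^\]]*)\]\s*(.*)$", re.DOTALL)


def bundle_comments(comments: list[str]) -> dict[str, list[dict[str, str]]]:
    """Sort reviewer comments into fixes_applied and unfixed (hypothetical shape)."""
    bundle: dict[str, list[dict[str, str]]] = {"fixes_applied": [], "unfixed": []}
    for comment in comments:
        match = PREFIX.match(comment.strip())
        if match is None:
            continue  # no recognised prefix, so not a reviewer finding
        kind, tag, text = match.groups()
        key = "fixes_applied" if kind == "fixed" else "unfixed"
        bundle[key].append({"tag": tag, "text": text})
    return bundle
```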

Pipeline 3: Security review (Clyde Barcode, Sonnet)

Right after code review (whether :pass or :fail), the supervisor queues security-review:queued. Clyde Barcode runs on the PR state after Juan Claude's fixes:

- Same shape as code review, plus Semgrep and pattern matching on CWE classes (SQL injection, XSS, path traversal, hardcoded secrets, …).
- May also push bounded fixes in PR mode.
- Writes security-review:pass or security-review:fail.

Why sequential and not parallel? Because Clyde reviews what Juan Claude left behind. Parallel would mean Clyde looks at pre-fix code — he would then find security issues that Juan Claude had already cleaned up and you'd have to reconcile verdicts. Sequential is simpler and clearer.

How verdicts are persisted

After a complete review round (Juan + Clyde back to back), the orchestrator bundles both verdicts into one file on the feature branch: openspec/changes/<slug>/reviews/<round>.json in the target repo. Rounds are numbered sequentially: 1.json after the first pass, 2.json after a retry:queued cycle, and so on (the next round number is just ls reviews/*.json | wc -l plus one). That file holds the fixes_applied[] and unfixed[] from both reviewers combined, which is what Axel Pliér will read later. An aggregated status snapshot is kept alongside it in openspec/changes/<slug>/hydra.json (see part 6 for the structure and how to read it).
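The round numbering is easy to reproduce. The sketch below counts the existing round files and writes the next one. It is an illustration under assumptions: next_round_path and write_round are hypothetical names, and the combined file is reduced to the two lists named above, while the real round file may hold more.

```python
import json
from pathlib import Path


def next_round_path(reviews_dir: Path) -> Path:
    """Count existing *.json round files and return the path for the next round."""
    existing = list(reviews_dir.glob("*.json"))
    return reviews_dir / f"{len(existing) + 1}.json"


def write_round(reviews_dir: Path, code_review: dict, security_review: dict) -> Path:
    """Combine both reviewers' lists into one round file (assumed, simplified shape)."""
    reviews_dir.mkdir(parents=True, exist_ok=True)
    path = next_round_path(reviews_dir)
    combined = {
        "fixes_applied": code_review["fixes_applied"] + security_review["fixes_applied"],
        "unfixed": code_review["unfixed"] + security_review["unfixed"],
    }
    path.write_text(json.dumps(combined, indent=2))
    return path
```

Counting files only works as long as no round file is ever deleted or skipped, which is the same assumption the ls one-liner makes.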

The applier: Axel Pliér's binary gate

With both reviews in, Hydra has three options:

1. Both reviews :pass and Juan and Clyde together pushed 0 fixes.
   Nothing changed after the original build, so no new risks. Axel is skipped. Issue gets done. Saves tokens.
2. Both reviews :pass and ≥ 1 fix was pushed.
   The code has changed since the original build. Before we trust the final state, the orchestrator reruns the mechanical gates (PHPCS, Psalm, PHPStan, ...). If those pass, Axel Pliér is up. He reads:
   - the final diff,
   - the fixes_applied[] + unfixed[] from both reviews,
   - all inline comments on the PR.
   And gives one binary answer: {pass: true} or {pass: false, blocking: [...]}. He has no Write or Edit tools. He judges, he doesn't intervene.
3. One of the reviews :fail.
   The reviewers are the authority. Axel is skipped. Issue goes to needs-input. A human decides whether this warrants a retry:queued or rebuild:queued (see part 6).
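Axel's binary answer is small enough to validate strictly. The following sketch parses the two shapes mentioned above. It is not Hydra's parser: parse_applier_verdict is a hypothetical name, and the rules that a failing verdict must carry at least one blocker and that blockers are plain strings are assumptions.

```python
import json


def parse_applier_verdict(raw: str) -> tuple[bool, list[str]]:
    """Parse {"pass": true} or {"pass": false, "blocking": [...]} (assumed rules)."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("pass"), bool):
        raise ValueError("verdict must be an object with a boolean 'pass'")
    if data["pass"]:
        return True, []
    blocking = data.get("blocking")
    # Assumption: a rejection without a reason is treated as malformed.
    if not isinstance(blocking, list) or not blocking:
        raise ValueError("a failing verdict must list at least one blocker")
    return False, [str(item) for item in blocking]
```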

Labels: one source of truth

Hydra is stateless. The whole state machine lives on the issue labels on GitHub. The supervisor (the daemon) reads the labels every few seconds and decides what to do. If a container crashes, the supervisor looks at the labels again and picks up where it left off.

Per stage there are four possible states: :queued, :running, :pass, :fail. They always live on the issue, never on the PR. That's a deliberate choice: the issue is the unit of work, the PR is an artefact of a build cycle.

Alongside the stage labels there are metadata labels that live next to them:

Label	Meaning
`openspec`	Issue follows the OpenSpec working model
`yolo`	May auto-merge as long as all gates are green — no human approve needed
`agent-maxed-out`	Persona has used up its turn budget; output may be incomplete
`needs-input`	Hydra has stopped the pipeline, human's turn
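Because all state lives on labels, reading the pipeline position comes down to parsing label names. Here is a minimal sketch of that idea. The helper stage_states is hypothetical, and the stage list is assembled from the labels used in this part rather than taken from Hydra's source.

```python
# Stage and state names as they appear in this part of the series.
STAGES = ("build", "code-review", "security-review", "applier")
STATES = ("queued", "running", "pass", "fail")


def stage_states(labels: list[str]) -> dict[str, str]:
    """Map each stage to its state, ignoring metadata and recovery labels."""
    result: dict[str, str] = {}
    for label in labels:
        stage, sep, state = label.rpartition(":")
        if sep and stage in STAGES and state in STATES:
            result[stage] = state
    return result
```

Called with the labels of an issue in the middle of code review, for example ["openspec", "build:pass", "code-review:running"], it returns only the stage entries and leaves openspec out. A supervisor that restarts after a crash needs nothing more than this to see where the issue stands.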

`retry:queued` versus `rebuild:queued`

Two human-triggered recovery labels reach back into the pipeline. Both are single-shot (no loop):

retry:queued — the existing PR stays. Hydra builds an ephemeral feedback.md (in a /tmp/hydra-retry-XXXXXX/ working directory, not committed) containing the unfixed[] findings + applier blockers, mounts it as /workspace/feedback.md in the builder container, and dispatches Al Gorithm in HYDRA_MODE=fix scoped to exactly the flagged files. Cheap. Suitable for: "the reviewers were right, builder just needs to touch up a few things".
rebuild:queued — existing PR closes, branch resets to development, all cycle labels removed, back to build:queued. Expensive. Suitable for: "Al Gorithm's approach was wrong, we're starting over with the same change".

Rule of thumb: always go for the cheapest recovery first. First gh pr update-branch <N> (merge development → PR), then retry:queued, only then rebuild:queued. Part 6 goes into this in more depth — including the label cleanup steps required before applying either trigger (removing needs-input and fail labels first, otherwise the supervisor won't pick up the queue entry).
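The cleanup before a recovery trigger can be expressed as a simple set operation. The sketch below only models the rule stated above (drop needs-input and every :fail label, then add the trigger). The function name apply_recovery is hypothetical, and the exact steps Hydra expects are the ones described in part 6, not this code.

```python
RECOVERY_TRIGGERS = ("retry:queued", "rebuild:queued")


def apply_recovery(labels: set[str], trigger: str) -> set[str]:
    """Return the label set after cleanup plus the recovery trigger.

    Illustrative only: it does not talk to GitHub and does not mutate its input.
    """
    if trigger not in RECOVERY_TRIGGERS:
        raise ValueError(f"unknown recovery trigger: {trigger}")
    # Remove the labels that would stop the supervisor from picking up the queue entry.
    cleaned = {
        label for label in labels
        if label != "needs-input" and not label.endswith(":fail")
    }
    cleaned.add(trigger)
    return cleaned
```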

Test yourself

Four short questions to check whether you've understood this part. Each question comes with a hint and the answer.

1. In what order do the three pipelines run, and why is security review sequential after code review and not parallel?

Hint

Think about which version of the code Clyde Barcode sees: before or after Juan Claude's fixes?

Answer

The order is build → code-review → security-review (and after that applier or needs-input).

Security review runs after code review so Clyde Barcode sees the PR state after Juan Claude's bounded fixes. Parallel would mean Clyde looks at pre-fix code and finds security issues Juan had already cleaned up. You'd then have to reconcile verdicts. Sequential is simpler and keeps the "who reviews what" boundary sharp.

2. What's the difference between build:fail and code-review:fail in terms of follow-up?

Hint

In one of them the pipeline stops immediately, in the other it continues to the next stage. Which is which, and why?

Answer

build:fail = the PR has not been (correctly) opened, there's nothing to review. Issue goes straight to needs-input. Pipeline stops.
code-review:fail = the PR exists, Juan Claude rejected it, but the pipeline still continues to security-review (always, even on :fail). Only after security review is the decision made: at least one review :fail → applier skipped → needs-input.

Reason for the difference: without a PR there's no work to judge; with a PR we want both reviewers to surface what's there so a human has a complete picture later.

3. In which situation is Axel Pliér (the applier) skipped, and why is that a deliberate choice?

Hint

There are two scenarios. One revolves around "has anything changed since the original build?", the other around "who has the final word?"

Answer

Two scenarios:

Both reviews :pass and 0 fixes pushed by Juan/Clyde. The code is identical to what the builder delivered; there's nothing new to judge. Axel skipped → done. Saves Sonnet tokens.
One of the reviews :fail. The reviewers are the authority; their rejection trumps the applier. Axel skipped → needs-input.

So Axel only runs when fixes were pushed and both reviews passed — then he judges whether those fixes make the final state good.

4. When do you pick retry:queued and when rebuild:queued?

Hint

The pivot is: was Al Gorithm's build approach sound? If yes → cheap path. If no → start over.

Answer

retry:queued when the builder approach was fundamentally right and the reviewers had legitimate, local findings. The existing PR stays; the builder gets feedback.md and fixes scoped to the flagged files. Cheap, single-shot.
rebuild:queued when the build approach itself was wrong — wrong implementation, stub methods, requirements missed. PR closes, branch resets to development, all cycle labels removed, back to build:queued. Expensive.

Rule of thumb: always cheapest first — gh pr update-branch → retry:queued → only then rebuild:queued.

Next step

Now that you can see the pipeline skeleton, in part 3 we dive into the quality gates — the mechanical referees that keep all three personas in line.

Part 3 — Quality gates

Previous step — What is Hydra?
