Hydra tutorial series — Part 3: Quality gates
What are Hydra's mechanical quality gates, why do they deliberately NOT rely on AI judgement, and what do you do when a gate misfires? The third of six short modules.
In part 2 you saw that the three personas are backed up by mechanical quality gates — checks that pass or fail deterministically, without AI in the loop. This part explains which gates we have, why they are mechanical, and how to handle the exception: the false positive.
Why mechanical gates?
AI review scales, but it's not predictable. Two runs on exactly the same diff can produce different findings. And AI is especially weak at the boring checking that doesn't require judgement — for example: "does every new PHP file have an SPDX licence header at the top?". You can do that kind of check much faster and cheaper with a simple grep command.
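To make the SPDX example concrete, here is a minimal Python sketch of such a check. It is not Hydra's actual gate: the marker string, the ten-line window, and the function names are assumptions chosen for illustration.

```python
from pathlib import Path

# Assumed header text; the exact form Hydra expects is not spelled out here.
SPDX_MARKER = "SPDX-License-Identifier: EUPL-1.2"


def has_spdx_header(path: Path, max_lines: int = 10) -> bool:
    """Return True if the marker appears in the first `max_lines` lines."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= max_lines:
                break
            if SPDX_MARKER in line:
                return True
    return False


def files_missing_header(paths: list[Path]) -> list[Path]:
    """Return the PHP files without a header; an empty list means the check passes."""
    return [p for p in paths if p.suffix == ".php" and not has_spdx_header(p)]
```

The same input always gives the same answer, which is the property the rest of this part relies on.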
The rule inside Hydra:
Whatever can be checked objectively, a mechanical gate handles — one script that passes or fails per check. Only where judgement is required do we bring in a reviewer.
Concretely: before we let Juan Claude or Clyde spend expensive Sonnet time on "this function is called doSomething and that should be a verb-noun pair", we run PHPCS first. Only then do the AI reviewers, the extra pairs of eyes, start their work, focused on what a tool can't catch.
Category 1: generic code-quality tools
These checks run inside scripts/run-quality.sh in a Docker php:X.Y-cli container, with --keep-server to leave Nextcloud running afterwards for the browser tests:
| Tool | What it catches |
|---|---|
| lint | Syntax errors in PHP. |
| phpcs | Coding standard (PSR-12 + Nextcloud convention). |
| phpmd | Code-mess detector (overlong methods, deep nesting, dead code). |
| psalm | Static type analysis, level 4 baseline. |
| phpstan | Second static type analyser (catches things Psalm misses and vice versa). |
| phpmetrics | Complexity metrics (cyclomatic, maintainability index). |
| composer audit | CVE check on composer.lock dependencies. |
| eslint | JS/TS lint. |
| stylelint | CSS/SCSS lint. |
| npm audit | CVE check on package-lock.json. |
| PHPUnit | Unit + integration tests with a containerised Nextcloud + SQLite. |
| Newman | API tests against the PHP built-in server. |
Each check is red or green. One red check → build:fail (in the pre-review phase) or code-review:fail / security-review:fail (post-fixes).
Category 2: Hydra-specific gates
On top of the generic tools, Hydra has its own set of hydra-gate-* skills for things that are Conduction-specific. They live in hydra/.claude/skills/. The current set is 14 skills: one dispatcher that calls the other 13 mechanical gates one after another.
| Gate | What it checks |
|---|---|
| hydra-gates | Dispatcher: calls the 13 gates below one after another and produces one pass/fail summary. |
| hydra-gate-spdx | Every new PHP file has an EUPL-1.2 SPDX header. |
| hydra-gate-forbidden-patterns | No dd(, var_dump, dump(, console.log left behind. |
| hydra-gate-composer-audit | Mirror of the generic composer audit, for the reviewer's mandatory block. |
| hydra-gate-stub-scan | No empty stub methods with TODOs left behind. |
| hydra-gate-route-auth | Every route in appinfo/routes.php has explicit auth annotations. |
| hydra-gate-orphan-auth | No @AuthorizedAdminSetting on an endpoint that doesn't exist. |
| hydra-gate-unsafe-auth-resolver | Auth resolvers don't allow a claim bypass. |
| hydra-gate-semantic-auth | Authorisation logic matches the route's purpose semantically. |
| hydra-gate-admin-router | Admin routes use the admin router and not the regular one. |
| hydra-gate-no-admin-idor | No Insecure Direct Object Reference on admin endpoints. |
| hydra-gate-modal-isolation | Modal components isolate state correctly. |
| hydra-gate-nc-input-labels | Nextcloud <NcTextField> components have <label> linkage. |
| hydra-gate-initial-state | No initialState leaks to the client without deliberate exposure. |
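The dispatcher idea (run every gate in turn, report one pass/fail summary) can be sketched roughly as follows. The real hydra-gates skill is not shown in this tutorial, so `GateResult`, `run_gates`, and `summarise` are hypothetical names, and treating a crashing gate as red is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str = ""


def run_gates(
    gates: dict[str, Callable[[], tuple[bool, str]]],
) -> tuple[bool, list[GateResult]]:
    """Run every gate in order; the overall result is green only if all are green."""
    results = []
    for name, check in gates.items():
        try:
            passed, detail = check()
        except Exception as exc:  # assumption: a crashing gate counts as red
            passed, detail = False, f"gate crashed: {exc}"
        results.append(GateResult(name, passed, detail))
    return all(result.passed for result in results), results


def summarise(results: list[GateResult]) -> str:
    """Render one line per gate: status, name, optional detail."""
    lines = [
        f"{'PASS' if r.passed else 'FAIL'}  {r.name}  {r.detail}".rstrip()
        for r in results
    ]
    return "\n".join(lines)
```

Running all gates instead of stopping at the first red one gives the builder the full list of problems in a single pass.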
Many of these were born out of incidents: a gate is added in response to a bug that slipped through all earlier checks. hydra-gate-stub-scan, for example, came out of the decidesk-44-45 retrospective, where a builder delivered a method that did nothing but return null; next to a TODO. PHPCS let it pass, PHPUnit had no test for it, code review read it as "in scope", and it broke in production.
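A stub scan in the spirit of that incident could look like the sketch below. It is an illustration only: the regular expressions handle simple methods without nested braces, and the list of "trivial" return values is an assumption, not the rule set of the real hydra-gate-stub-scan.

```python
import re

# Matches simple methods whose body contains no nested braces.
METHOD = re.compile(r"function\s+(\w+)\s*\([^)]*\)[^{;]*\{([^{}]*)\}")
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
# Empty body, or a single bare/placeholder return statement.
TRIVIAL_BODY = re.compile(r"^(?:return(?:\s+(?:null|false|\[\]))?\s*;)?$")


def find_stubs(php_source: str) -> list[str]:
    """Return names of methods that carry a TODO and do no real work."""
    stubs = []
    for match in METHOD.finditer(php_source):
        name, body = match.group(1), match.group(2)
        if "TODO" not in body:
            continue
        code = COMMENT.sub("", body).strip()
        if TRIVIAL_BODY.match(code):
            stubs.append(name)
    return stubs
```

A method that has a TODO but still returns a computed value is left alone; only the combination of a TODO and an empty or placeholder body is reported.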
ADR-020: gate scope is the PR diff
An important rule you already brushed against in part 2: gates run on the PR diff, not on the whole repo. That's in ADR-020.
Why? Because many of our repos carry a backlog of technical debt. If you turn phpcs loose on the whole repo you get hundreds of findings that have nothing to do with the current PR — and every PR ends up red. By limiting the scope to the lines touched in the PR diff, the gate only fires on what the current builder just added or changed.
Override mechanism: HYDRA_REVIEW_SCOPE=full in secrets/.env flips the scope to the whole repo. Use it when you onboard a new repo or do a dedicated tech-debt sweep. Expect a lot of red in any other case.
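The diff scoping can be sketched like this: collect the new-side line numbers that a unified diff adds, then keep only the findings on those lines. This is a sketch under assumptions, not Hydra's implementation. The `(file, line, message)` finding shape and the function names are hypothetical, and reading HYDRA_REVIEW_SCOPE from the process environment assumes that secrets/.env has already been loaded into it.

```python
import os
import re
from typing import Mapping

HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def touched_lines(unified_diff: str) -> dict[str, set[int]]:
    """Map each file in a unified diff to the new-side line numbers it adds."""
    touched: dict[str, set[int]] = {}
    current_file = None
    old_left = new_left = new_line = 0
    for line in unified_diff.splitlines():
        if old_left > 0 or new_left > 0:  # still inside a hunk body
            if line.startswith("+"):
                if current_file is not None:
                    touched.setdefault(current_file, set()).add(new_line)
                new_line += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith(" ") or line == "":
                new_line += 1
                old_left -= 1
                new_left -= 1
            continue  # also skips "\ No newline at end of file"
        if line.startswith("+++ "):
            path = line[4:].strip()
            current_file = None if path == "/dev/null" else path.removeprefix("b/")
        else:
            match = HUNK_HEADER.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_line = int(match.group(2))
                new_left = int(match.group(3) or 1)
    return touched


def in_scope(
    findings: list[tuple[str, int, str]],
    unified_diff: str,
    env: Mapping[str, str] = os.environ,
) -> list[tuple[str, int, str]]:
    """Keep findings on touched lines, unless the scope is switched to full."""
    if env.get("HYDRA_REVIEW_SCOPE") == "full":
        return list(findings)
    touched = touched_lines(unified_diff)
    return [f for f in findings if f[1] in touched.get(f[0], set())]
```

With the default scope, a finding on an untouched legacy line in the same file is dropped; with HYDRA_REVIEW_SCOPE=full every finding comes back.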
Recognising false positives
Mechanical gates are deterministic, but not always correct. A homegrown classic: hydra-gate-forbidden-patterns searched for dd( without a word boundary and so falsely tripped on a legitimate $builder->add(. One wrong grep flag and you have a repeatable false positive.
How to recognise a false positive:
- The gate keeps firing on the same line in every retry, while the line itself looks innocuous.
- A manual test of the gate implementation (open scripts/run-quality.sh or the associated skill, run it locally) confirms it: yes, the pattern matches, but not for the intended reason.
- Nobody on the team can explain why this specific line should be flagged.
In that case the fix is not "another retry". The fix is: go to scripts/run-quality.sh or the gate skill and repair the detection. Example for hydra-gate-forbidden-patterns: from grep 'dd(' to grep -wE '(^|[^A-Za-z0-9_])dd\('.
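The same boundary idea, shown as a Python regular expression rather than grep: a negative lookbehind rejects any match that is preceded by an identifier character. The pattern list follows the forbidden-patterns row in the gate table; the function name and the exact regex are assumptions made for this sketch.

```python
import re

# A debug call only counts if it is NOT preceded by an identifier character,
# so the "dd(" inside "$builder->add(" is ignored.
FORBIDDEN = re.compile(
    r"(?<![A-Za-z0-9_])(?:dd|dump|var_dump)\(|\bconsole\.log\b"
)


def find_forbidden(source: str) -> list[tuple[int, str]]:
    """Return (line number, matched text) for the first hit on each line."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = FORBIDDEN.search(line)
        if match:
            hits.append((number, match.group(0)))
    return hits
```

A quick regression check against the line that caused the false positive is worth keeping next to the gate, so the same mistake cannot come back unnoticed.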
Recheck after reviewer fixes
A subtle but important rule: after every reviewer-fix cycle the orchestrator reruns the mechanical gates on the final PR state. This is the "quality recheck" phase. Reason: during their bounded fix, a reviewer can repair a PHPCS-style error and accidentally introduce a new violation.
If the recheck goes red, the issue moves to needs-input. No retry. The reviewer has stopped, the builder has stopped, a human looks.
Test yourself
Four short questions to check whether you've understood this part. Each question comes with a hint, followed by the answer.
1. Why does Hydra have mechanical gates and AI reviewers, instead of just one of the two?
Hint
One kind of check is predictable and cheap, the other is expensive but can judge. What's the strength and weakness of each?
Answer
They complement each other exactly where the other is weak.
- Mechanical gates are deterministic: same input → same outcome, pass or fail. Perfect for boring, objective checking — for example "does every new PHP file have an SPDX header?". Cheap and repeatable.
- AI reviewers do judgement: "does this authorisation logic semantically match the route's purpose", "is this a security risk in this context". Not predictable and more expensive, so you deploy them where judgement is needed.
Mechanical-only misses context-dependent errors; AI-only is expensive, not predictable, and wastes Sonnet time on things a grep can do.
2. What does ADR-020 say about gate scope, and when do you switch that off via HYDRA_REVIEW_SCOPE=full?
Hint
Think about what happens when you turn phpcs loose on a repo with lots of old technical debt. And when do you actually want that?
Answer
ADR-020 says: gates run on the PR diff, not on the whole repo.
Reason: many repos drag along technical debt. phpcs across all of it produces hundreds of findings that have nothing to do with the current PR — every PR would be red. By only checking the touched lines, the gate only fires on what the builder just added or changed.
HYDRA_REVIEW_SCOPE=full in secrets/.env disables this — gates then run on the whole repo. Use for:
- Onboarding a new repo into Hydra.
- A dedicated tech-debt sweep.
NOT for regular PRs — expect a lot of red.
3. How do you recognise a false-positive gate, and what's the right fix? What is NOT?
Hint
Three signals together point at a false positive. And the "tempting but wrong" reflex is doing the same thing again.
Answer
Recognition — three signals together:
- The gate fires on the same line in every retry, while that line looks innocuous.
- Running the gate locally by hand confirms it: the pattern matches, but not for the intended reason. For example, grep 'dd(' also matches the call $builder->add(.
- Nobody on the team can explain why this specific line should be flagged.
Right fix: tighten the gate detection in scripts/run-quality.sh or the associated skill. Example: grep 'dd(' → grep -wE '(^|[^A-Za-z0-9_])dd\('.
NOT: another retry:queued. That reproduces the same false positive and wastes cycles.
4. Why does Hydra rerun the mechanical gates AFTER the reviewer fixes (the "quality recheck" phase)?
Hint
The reviewers may fix within scope. What can go wrong while they do that?
Answer
Reviewers (Juan + Clyde) may push mechanical fixes within scope (ADR-021). While doing so a reviewer can accidentally introduce a new violation — for instance a PHPCS-style fix that breaks a rule elsewhere.
The quality recheck reruns all mechanical gates on the final PR state to make sure that what is there now is also objectively clean. If the recheck goes red → needs-input, no retry. The reviewer has stopped, the builder has stopped, a human looks. It's the last deterministic gate before a human makes the decision.
Next step
In part 4 we look at the skills and commands the personas use during their work — including how you can add a new skill yourself.
