Hydra tutorial series — Part 4: Skills
Which skills run inside the automated Hydra factory, which exist for humans on the CLI, how they get invoked, and when you should add a new skill to the loop yourself. Fourth of six short modules.
This part dives straight into the Hydra-specific skill families. If you'd rather first learn what a Claude Skill even is, how the frontmatter works, and when you'd write one yourself, take the public Claude Skills tutorial series (three short modules, ~40 minutes). From here on we assume you know the basics.
The previous parts were about what Hydra does. This part is about how: the skills that let the personas do their job. By the end you'll know which skills the automated pipeline (the "Hydra factory") runs and which ones you call yourself as a human, you'll know the five families, and you'll be able to judge when a new skill is worth writing.
Skills, in one paragraph
A skill in Claude Code is a folder with a SKILL.md (and optionally scripts, examples/, helpers). The folder sits under .claude/skills/<name>/. The description in the frontmatter tells Claude when the skill is relevant; the contents are the instructions Claude follows once the skill is loaded.
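To make the folder layout concrete, here is a minimal Python sketch that walks a skills directory and reads the frontmatter of each SKILL.md. It assumes flat key: value lines between two --- lines; real frontmatter is YAML and would normally go through a YAML parser. The functions parse_frontmatter and discover_skills are hypothetical helpers. They are not part of Claude Code or Hydra.

```python
from pathlib import Path
from typing import Dict


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Return the flat key: value pairs between a SKILL.md's leading '---' lines.

    Deliberately simplified: nested YAML, lists and multi-line values are not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block at the top of the file
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing line: the skill's instructions start after this
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return meta


def discover_skills(root: Path) -> Dict[str, Dict[str, str]]:
    """Map every <name>/SKILL.md under root to its parsed frontmatter, keyed by folder name."""
    return {
        skill_md.parent.name: parse_frontmatter(skill_md.read_text(encoding="utf-8"))
        for skill_md in sorted(root.glob("*/SKILL.md"))
    }
```

If you point discover_skills(Path(".claude/skills")) at a repo, it returns one entry per skill folder. Folders without a SKILL.md are skipped.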
For Hydra, skills are the bundling unit for behaviour. You don't paste a thousand lines of prompt text straight into a persona's CLAUDE.md. The behaviour goes into a skill instead, where it is reusable and testable.
How a skill gets invoked
A single skill can be triggered in two ways:
- Manually: you type /opsx-apply in a Claude session. This is direct and predictable, and the skill runs exactly when you want it to.
- Automatically: Claude reads the description of every available skill and picks one itself when the current situation matches. Ask "what changed?" and Claude will trigger a skill like summarize-changes on its own if one is available.
Which mode is allowed is set in the skill's own frontmatter:
- disable-model-invocation: true → only you can call it, via /<name>. Use this for skills with side-effects (/opsx-apply modifies code, /create-pr opens a PR). Claude shouldn't decide to do that on its own.
- user-invocable: false → only Claude itself may load it. Use this for background knowledge, for example a legacy-system-context skill that doesn't make sense as an action.
- Neither set → both are allowed.
So manual vs. automatic is a setting on a single skill. It is not a split between two different kinds of files. A persona in a Docker container has no keyboard and leans heavily on auto-activation. A human on the CLI usually wants explicit control and types /<name>.
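The rule above fits in a few lines. The Python sketch below models it for illustration only. Claude Code evaluates these frontmatter fields itself, and the Invoker enum and allowed_invokers function are hypothetical names. The sketch takes the frontmatter as a flat dict of strings.

```python
from enum import Enum
from typing import Dict, Set


class Invoker(Enum):
    HUMAN = "a human typing /<name>"
    MODEL = "Claude via auto-activation"


def allowed_invokers(frontmatter: Dict[str, str]) -> Set[Invoker]:
    """Who may trigger a skill, derived from its two invocation fields."""
    def flag(key: str) -> str:
        return frontmatter.get(key, "").strip().lower()

    allowed = {Invoker.HUMAN, Invoker.MODEL}  # neither field set: both allowed
    if flag("disable-model-invocation") == "true":
        allowed.discard(Invoker.MODEL)  # side-effects: manual /<name> only
    if flag("user-invocable") == "false":
        allowed.discard(Invoker.HUMAN)  # background knowledge: Claude only
    return allowed
```

In this model, setting both fields gives an empty set. Nobody could trigger the skill, so avoid that combination.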
Two worlds: the Hydra factory vs. skills for humans
Hydra's hydra/.claude/skills/ contains ~70 skills, but the automated pipeline only uses a handful. The rest is tooling for you, behind a keyboard. It's important to keep the two apart:
What the Hydra factory actually runs
Every pipeline container has its own, limited set of skills baked into its Docker image:
| Container | Persona | Skills in image | Main function |
|---|---|---|---|
| Builder | Al Gorithm | All .claude/skills/* + all vendor skills | Implements tasks, primarily with opsx-apply. Keeps the other opsx skills on hand as context for verification, for re-checking that the work still fits the change, and for fixing after a failed quality gate. |
| Reviewer | Juan Claude | hydra-gates + 13 hydra-gate-* skills + vendor code-review | Mandatory hydra-gates check + deeper code review via the Anthropic community skill. |
| Security | Clyde Barcode | hydra-gates + 13 hydra-gate-* skills + vendor trailofbits + vendor owasp | Mandatory hydra-gates + SAST (Semgrep) + OWASP top 10 checklists. |
| Applier | Axel Plier | No skills | Applies small, deterministic fixes purely through its CLAUDE.md — no skill layer needed. |
| Browser UI Tester | (sonnet, headless) | One skill: hydra-ui-test | Logs in, navigates the live app via Playwright MCP, delivers verdict JSON. |
On top of that, scripts/run-hydra-gates.sh runs all 14 hydra-gates mechanically in every container — independent of which skill files are in the image. The skill files serve Claude as documentation when fixing; the script does the detection.
What the factory doesn't do — for humans on the CLI
All the other skills (~50 out of 70) are for you, or for a fellow dev in a Claude Code session on their laptop. Examples:
- Preparing a change: you run opsx-new, opsx-ff, opsx-explore and opsx-plan-to-issues as a human before you hand the work to the pipeline. The Builder only runs opsx-apply on an existing change.
- Taking on a role: team-architect, team-backend, team-po etc. are pure roleplay frames for one-human-one-role sessions. The pipeline doesn't use them.
- Running tests: you start /test-counsel (all 8 personas) or /test-app (one browser sweep) by hand. The factory has its own browser tester (hydra-ui-test), and the test-* family is separate from it.
- PRs and day-to-day work: /create-pr, /review-pr and /report-out are your "three times a day" tools. The pipeline doesn't use them.
Remember: the factory relies on a small, fixed subset of skills, while a human can call all ~70. That's the whole difference.
The five skill families in full
With that split in mind, here are all five families. The tags used below are 🤖 factory (ships inside a pipeline container) and ⌨️ human (CLI only).
Family 1: OpenSpec workflow (opsx-*, 16 skills)
The OPSX skills implement the Conduction workflow for OpenSpec changes — from proposal to archive.
| Skill | For whom | Does |
|---|---|---|
| opsx-apply | 🤖 Builder | Implements tasks from a change (the pipeline default). |
| opsx-verify | 🤖 Builder | Verifies that the implementation covers the change artefacts. |
| opsx-archive | 🤖 Builder + ⌨️ | Archives a completed change and syncs its delta specs into the main spec. |
| opsx-new | ⌨️ | Starts a new change (proposal scaffold, schema choice). |
| opsx-ff | ⌨️ | "Fast-forward": creates a change plus all its artefacts in a single pass. |
| opsx-continue | ⌨️ | Picks up an interrupted change. |
| opsx-explore | ⌨️ | Explores, before any spec exists, what a change should even be. |
| opsx-onboard | ⌨️ | Onboarding flow for a new repo (sets up OpenSpec config + structure). |
| opsx-plan-to-issues | ⌨️ | Converts tasks.md into GitHub issues + a tracking issue. |
| opsx-sync | ⌨️ | Syncs delta specs back into the main spec. |
| opsx-bulk-archive | ⌨️ | Archives multiple completed changes in one go. |
| opsx-apply-loop | ⌨️ | Headless build → quality-fix loop for local dev runs. |
| opsx-pipeline | ⌨️ | Runs multiple changes in parallel (multi-agent). |
| opsx-coverage-scan | ⌨️ | Audits a legacy app for spec ↔ code coverage. |
| opsx-annotate | ⌨️ | Applies @spec PHPDoc tags after a coverage scan. |
| opsx-reverse-spec | ⌨️ | Reverse-engineers a spec from existing code. |
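To show what opsx-plan-to-issues has to do before it can talk to GitHub, here is a Python sketch of the parsing step only. It assumes that tasks.md groups "- [ ] task" checkboxes under "## section" headings. The names Task, parse_tasks and tracking_issue_body are hypothetical. Creating the actual issues is left out.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed tasks.md shape: "## <section>" headings with "- [ ] <task>" checkboxes below them.
SECTION = re.compile(r"^#{2,}\s+(.+?)\s*$")
TASK = re.compile(r"^\s*-\s+\[([ xX])\]\s+(.+?)\s*$")


@dataclass(frozen=True)
class Task:
    section: str
    title: str
    done: bool


def parse_tasks(markdown: str) -> List[Task]:
    """Collect every checkbox line, remembering the heading it sits under."""
    tasks: List[Task] = []
    section = ""
    for line in markdown.splitlines():
        if (m := SECTION.match(line)):
            section = m.group(1)
        elif (m := TASK.match(line)):
            tasks.append(Task(section, m.group(2), m.group(1).lower() == "x"))
    return tasks


def tracking_issue_body(tasks: List[Task]) -> str:
    """One checklist line per open task; each of these would become its own issue."""
    return "\n".join(f"- [ ] {t.section}: {t.title}" for t in tasks if not t.done)
```

Tasks that are already ticked are kept in the parsed list but skipped in the tracking issue. Nobody needs an issue for finished work.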
Family 2: Quality + security gates (hydra-gate-*, 14 skills)
The parent skill hydra-gates is a dispatcher that calls all gates together. The script engine under the hood (scripts/run-hydra-gates.sh) runs all 14 gates in every container; the skill files serve Claude as reference material when fixing a failure.
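The dispatcher pattern can be sketched in a few lines of Python. This is not scripts/run-hydra-gates.sh, which is a shell script whose internals aren't shown in this part. The sketch illustrates the idea of running every gate in sequence, collecting the findings and producing one pass/fail summary. Gate, GateReport and run_gates are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A gate takes the files under review (path -> source) and returns its findings.
Gate = Callable[[Dict[str, str]], List[str]]


@dataclass
class GateReport:
    findings: Dict[str, List[str]] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        return not any(self.findings.values())  # pass only if no gate reported anything

    def summary(self) -> str:
        lines = [
            f"{'FAIL' if found else 'PASS'}  {name} ({len(found)} finding(s))"
            for name, found in self.findings.items()
        ]
        lines.append("overall: " + ("PASS" if self.passed else "FAIL"))
        return "\n".join(lines)


def run_gates(gates: Dict[str, Gate], files: Dict[str, str]) -> GateReport:
    """Run every gate in order; a failing gate does not stop the later ones."""
    report = GateReport()
    for name, gate in gates.items():
        report.findings[name] = gate(files)
    return report
```

The sketch deliberately avoids failing fast. One summary that lists every failing gate saves the Builder a round trip per gate when it fixes quality issues.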
| Gate skill | What it detects |
|---|---|
| hydra-gates (dispatcher) | Calls all 14 gates in sequence and delivers a pass/fail summary. |
| hydra-gate-spdx | Missing @license/@copyright headers on PHP/Vue files. |
| hydra-gate-forbidden-patterns | Debug calls (var_dump, console.log), TODOs in production code, hardcoded secrets. |
| hydra-gate-stub-scan | Stub methods that just return null; or do nothing: no test, no logic. |
| hydra-gate-composer-audit | Known vulnerabilities in dependencies, as reported by composer audit. |
| hydra-gate-route-auth | Routes in routes.php without an auth attribute where one belongs. |
| hydra-gate-orphan-auth | Auth checks in methods that are never called (orphaned). |
| hydra-gate-no-admin-idor | Application::APP_ID missing on admin-only methods → IDOR risk. |
| hydra-gate-unsafe-auth-resolver | catch(\Throwable) { return null; } around auth logic. |
| hydra-gate-semantic-auth | #[NoAdminRequired] on a method that still calls requireAdmin(). |
| hydra-gate-admin-router | Admin routes that don't go through the admin router. |
| hydra-gate-initial-state | Frontend initialState without server-side hydration. |
| hydra-gate-modal-isolation | Vue modals that teleport outside their container. |
| hydra-gate-nc-input-labels | Nextcloud input components without an attached label (a11y). |
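To give a feel for how mechanical these checks are, here is a Python sketch that approximates four of the gates on a single file with regular expressions. The patterns are hypothetical simplifications and may differ from the real gates. Secret detection is left out. The unsafe-resolver check flags every catch(\Throwable) that returns null, not only the ones around auth logic. The stub check would also flag an intentionally empty method, which is exactly the kind of false positive you would tighten later.

```python
import re
from typing import List

# Hypothetical, simplified patterns; the real gates may use different rules (or an AST).
FORBIDDEN = re.compile(r"\b(?:var_dump|print_r|console\.log)\s*\(|\bTODO\b")
STUB_METHOD = re.compile(
    r"function\s+\w+\s*\([^)]*\)(?:\s*:\s*\??[\w\\]+)?\s*\{\s*(?:return\s+null\s*;\s*)?\}"
)
SWALLOWED_ERROR = re.compile(
    r"catch\s*\(\s*\\Throwable(?:\s+\$\w+)?\s*\)\s*\{\s*return\s+null\s*;\s*\}"
)


def scan_file(path: str, source: str) -> List[str]:
    """Approximate four gates on one file; returns human-readable findings."""
    findings: List[str] = []
    lines = source.splitlines()
    head = "\n".join(lines[:20])  # license headers are expected near the top
    if "@license" not in head or "@copyright" not in head:
        findings.append(f"{path}: missing @license/@copyright header")  # ~ hydra-gate-spdx
    for lineno, line in enumerate(lines, start=1):
        if FORBIDDEN.search(line):
            findings.append(f"{path}:{lineno}: forbidden pattern")  # ~ forbidden-patterns
    if STUB_METHOD.search(source):
        findings.append(f"{path}: stub method")  # ~ hydra-gate-stub-scan
    if SWALLOWED_ERROR.search(source):
        findings.append(f"{path}: catch(\\Throwable) returns null")  # ~ unsafe-auth-resolver
    return findings
```

Regexes like these are cheap to run on every file. Their weakness is the false positive, and part 3 covers what to do about that.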
All 14 run 🤖 inside the Hydra factory (Reviewer + Security containers; Builder also runs them as a post-flight check during fix-quality).
Family 3: Team roles (team-*, 7 skills) — ⌨️ humans only
Skills that model a specific kind of work via a persona frame. Intended for a human working in a terminal alongside Hydra who wants to step into a single role for a bit.
- team-po (Product Owner): writes user stories and acceptance criteria.
- team-sm (Scrum Master): manages backlog and sprint planning artefacts.
- team-architect: makes architecture decisions, writes ADR drafts.
- team-backend: implements backend work (PHP, services, mappers).
- team-frontend: implements frontend work (Vue 2, Pinia, NL Design System).
- team-reviewer: a manual variant of the reviewer work.
- team-qa: writes test cases and test plans.
None of these ship in a pipeline container — they have no place in the automated loop.
Family 4: Test suites (test-*, 19 skills) — almost all ⌨️ for humans
The largest family by count: 19 skills that drive agentic browser and API testing, in three clusters:
Test types (one per test kind):
- test-app: automated browser test of a whole Nextcloud app (Playwright MCP).
- test-functional: functional scenarios against implemented features.
- test-api: API checks against the PHP built-in server.
- test-accessibility, test-performance, test-security, test-regression: specialised variants.
Personas (test-persona-*) — eight Dutch user profiles, each looking at an app from their own angle: annemarie, fatima, henk, janwillem, mark, noor, priya, sem. Henk reads with large type and looks for simple navigation; Noor hammers on RBAC and audit trails; Annemarie checks NLGov/GEMMA mapping. The persona cards themselves live in hydra/personas/.
Scenario management — test-scenario-create, test-scenario-edit, test-scenario-run write, edit and run reusable TS-NNN-*.md scenarios per app.
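The TS-NNN-*.md convention implies a simple numbering rule when a new scenario is created. The Python sketch below assumes that numbers are three digits, counted per app, and continue after the highest one in use. The function name next_scenario_filename is hypothetical, and test-scenario-create may number scenarios differently.

```python
import re
from typing import Iterable

# Assumed naming scheme: TS-<three digits>-<slug>.md, numbered per app.
SCENARIO_FILE = re.compile(r"^TS-(\d{3})-.+\.md$")


def next_scenario_filename(existing: Iterable[str], slug: str) -> str:
    """Pick the next free TS-NNN number after the highest one already used."""
    numbers = [int(m.group(1)) for name in existing if (m := SCENARIO_FILE.match(name))]
    return f"TS-{max(numbers, default=0) + 1:03d}-{slug}.md"
```

Files that don't follow the pattern, such as a README.md, are ignored. Gaps in the numbering are not reused.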
The dispatcher for this family is test-counsel: it coordinates all eight personas against one feature and delivers a combined report. Same pattern as hydra-gates for the quality-gates family.
The Hydra factory has its own browser tester (hydra-ui-test) that runs in the Browser UI Tester step. That stands apart from this test-* family. The test-* skills are for when you — as a human — want to test a feature before or after the pipeline has touched it. Gates from part 3 run static checks (regex/AST); test-* runs the actual application in a browser.
Family 5: Utility & maintenance (~13 skills) — ⌨️ humans only
Everything else lands here: the remaining ~13 skills, mostly dev comfort and meta work.
| Skill | Does |
|---|---|
| create-pr | Creates a PR from a feature branch: local checks → branch pick → PR body. |
| review-pr | Reviews a GitHub PR (manual variant; the factory has its own reviewer, Juan Claude). |
| report-out | End-of-day report: today's commits + GitHub activity → Dutch Slack notification. |
| clean-env | Fully resets the local Docker dev environment. |
| local-run | Brings up the local Nextcloud dev environment. |
| sync-docs | Syncs {app}/docs/ or .github/docs/claude/ with the current state of the repo. |
| skill-creator | Wizard for building a new skill (scaffold, frontmatter, evals). |
| feature-counsel | Pre-build spec analysis from eight persona perspectives (sibling of test-counsel). |
| persistence-audit | Audits how an app handles data persistence (object store, sessions, etc.). |
| journeydoc-init / journeydoc-add-story / journeydoc-instrument | Manual instrumentation + extension of Journey docs. |
| verify-global-settings-version | Checks whether global-settings/VERSION was bumped after a change. |
Nothing in this family runs in the pipeline. These are your daily /<name> commands.
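The core of a check like verify-global-settings-version is a numeric version comparison. In the Python sketch below, a "change" means any file under global-settings/ other than VERSION itself, and VERSION holds a dotted MAJOR.MINOR.PATCH string. Both are assumptions, not taken from the skill. WATCHED_DIR, parse_version and version_check_passes are hypothetical names.

```python
from typing import Iterable, Tuple

WATCHED_DIR = "global-settings/"  # assumption: a "change" means a file under this folder
VERSION_FILE = "global-settings/VERSION"


def parse_version(text: str) -> Tuple[int, ...]:
    """'1.10.0' -> (1, 10, 0), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def version_check_passes(changed_paths: Iterable[str], old_version: str, new_version: str) -> bool:
    touched = any(p.startswith(WATCHED_DIR) and p != VERSION_FILE for p in changed_paths)
    if not touched:
        return True  # nothing under global-settings/ changed, so no bump is required
    return parse_version(new_version) > parse_version(old_version)
```

The inputs could come from git: git diff --name-only <base>...HEAD lists the changed paths, and git show <base>:global-settings/VERSION prints the old version. Comparing tuples instead of strings matters because "1.10.0" sorts before "1.9.0" as text.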
Vendor skills (community)
Next to its own skills, Hydra has vendor skills under hydra/vendor/skills/:
- code-review: community review skill from Anthropic. → 🤖 Reviewer container.
- trailofbits: Semgrep-based static-analysis methodology from Trail of Bits. → 🤖 Security container.
- owasp: OWASP top 10:2025 + ASVS 5.0 checklists. → 🤖 Security container.
So those three do ship inside the factory (containers). They get loaded onto Juan and Clyde for extra coverage on top of our own hydra-gate-*. Maintenance sits with external parties — updates happen by tracking upstream, not by editing them yourself.
When do you write a new skill?
The pragmatic test: write a skill if…
- The check / behaviour is repeatable — needed more than once, in more than one place.
- It's mechanically describable — you can instruct it in 1-3 paragraphs without it devolving into "it depends on the context".
- A persona or a human would benefit from it. Don't write a skill because you can.
For a false-positive gate (part 3), you adjust the existing skill rather than write a new one. If you've seen a new class of mistake three times, it earns its own hydra-gate-* skill.
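The decision rule above can be written as a small function. This Python sketch is only an illustration of the rule of three. The Action enum, the decide function and the idea of tallying sightings per mistake class in a Counter are all hypothetical, not an existing Hydra tool.

```python
from collections import Counter
from enum import Enum
from typing import Optional


class Action(Enum):
    ADJUST_EXISTING = "adjust the existing gate skill"
    WRITE_NEW_GATE = "write a new hydra-gate-* skill"
    KEEP_WATCHING = "not a pattern yet"


def decide(mistake: str, owning_gate: Optional[str], sightings: Counter, threshold: int = 3) -> Action:
    """Fix the gate that already owns a check, or add a gate at the third sighting."""
    if owning_gate is not None:
        return Action.ADJUST_EXISTING  # an existing gate misfires on its own check
    if sightings[mistake] >= threshold:
        return Action.WRITE_NEW_GATE  # once chance, twice coincidence, three times a pattern
    return Action.KEEP_WATCHING
```

The order of the checks matters. If a gate already owns a check, the fix always goes into that gate, however often the problem shows up.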
Test yourself
Four short questions to check whether you've grasped this part. Each one comes with a hint in case you're stuck and an answer to compare against.
1. What are the two ways a skill can be activated, and how do you control that per skill?
Hint
One way requires a human to type something. The other lets Claude decide for itself based on the skill description. Which frontmatter fields determine which mode is allowed?
Answer
- Manual: you type /<name> in a Claude session. Direct, predictable.
- Automatic: Claude reads the description of all skills and picks one itself when the current conversation matches.
In the skill's frontmatter you set:
- disable-model-invocation: true → manual only. Used for skills with side-effects (/opsx-apply, /create-pr).
- user-invocable: false → automatic only. Used for background knowledge that isn't useful as an action.
- Neither set → both allowed. This is the default for most Hydra skills.
In the Hydra pipeline the containers mostly use auto-activation (Claude picks the right skill itself). A human on the CLI typically types /<name> explicitly.
2. Which skills actually run inside the pipeline containers, and which ones sit in the repo only for humans?
Hint
Three containers (Builder, Reviewer, Security) each have a specific set in their image. What's inside — and which families sit entirely outside?
Answer
Inside the pipeline containers (🤖 factory):
- Builder: all .claude/skills/* baked in, but the standard run is opsx-apply. The other opsx skills are there for context.
- Reviewer: hydra-gates + 13 individual hydra-gate-* + vendor code-review.
- Security: hydra-gates + 13 individual hydra-gate-* + vendor trailofbits + vendor owasp.
- Applier: no skills (pure CLAUDE.md).
- Browser UI Tester: only hydra-ui-test.
Plus: scripts/run-hydra-gates.sh runs all 14 gates mechanically in every container, regardless of which skill files are in the image.
Not in the factory, only for humans on the CLI:
- The entire team-* family (7 skills).
- Almost the entire test-* family (19 skills). The factory has its own hydra-ui-test.
- Most opsx-* skills, except opsx-apply/verify/archive (humans start changes; the pipeline implements them).
- The whole utility family (create-pr, review-pr, report-out, clean-env, local-run, sync-docs, skill-creator, feature-counsel, persistence-audit, journeydoc-*, verify-global-settings-version).
In total: of ~70 skills in the repo, the automated pipeline uses a handful; the rest is your toolkit.
3. When do you adjust an existing gate skill and when do you write a new one?
Hint
One decision is about "the gate doesn't do what we already wanted it to do". The other is about "we've discovered a new category of mistake".
Answer
- Adjust the existing skill for a false positive: the gate triggers too broadly or too narrowly on something it was already meant to check. Example: hydra-gate-forbidden-patterns matched $builder->add( incorrectly. You tighten the regex; you don't add a second gate.
- Write a new one for a new class of mistake that you've seen three times and that slips through every existing check. Example: hydra-gate-stub-scan was born when a builder shipped a return null; method that PHPCS let through and that had no test. That was a new category, not a fix on an existing check.
Rule of three: once is chance, twice is coincidence, three times is a pattern that deserves its own gate.
4. What do the "vendor skills" do and why do we keep them separate from our own hydra-gate-*?
Hint
Think about provenance (who wrote them?), which pipeline container they get loaded into, and what happens when you need an external party to update their work.
Answer
Vendor skills (hydra/vendor/skills/) are skills that were not written by us, and that do ship inside the factory:
- code-review (Anthropic community) → Reviewer container (Juan Claude).
- trailofbits (Trail of Bits, Semgrep methodology) → Security container (Clyde).
- owasp (OWASP top 10:2025 + ASVS 5.0) → Security container.
We keep them separate because:
- Maintenance sits with external parties. We update them by tracking upstream (see vendor/skills/VERSIONS.md), not by editing them ourselves. We change our own gates freely; we leave vendor skills alone.
- The audit trail stays clear: what's ours and what's community/external? On a failure you immediately know which camp is responsible for the fix.
Next step
In part 5 we get practical: starting a real Hydra run on a real app, including the label-prefix trick for parallel dev runs.
