
Hydra tutorial series — Part 4: Skills

Which skills run inside the automated Hydra factory, which exist for humans on the CLI, how they get invoked, and when you should add a new skill to the loop yourself. Fourth of six short modules.


This part dives straight into the Hydra-specific skill families. If you'd rather first learn what a Claude Skill even is, how the frontmatter works, and when you'd write one yourself, take the public Claude Skills tutorial series (three short modules, ~40 minutes). From here on we assume you know the basics.

The previous parts were about what Hydra does. This part is about how: the skills that let the personas do their job. By the end you'll know which skills the automated pipeline (the "Hydra factory") runs and which ones you call yourself as a human, you'll know the five families, and you'll be able to judge when a new skill is worth writing.

Skills, in one paragraph

A skill in Claude Code is a folder with a SKILL.md (and optionally scripts, examples/, helpers). The folder sits under .claude/skills/<name>/. The description in the frontmatter tells Claude when the skill is relevant; the contents are the instructions Claude follows once the skill is loaded.
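Here is a minimal sketch of how a tool could walk that layout and read each skill's frontmatter. It assumes the frontmatter is a block between two --- lines that contains flat key: value pairs. Real SKILL.md frontmatter is YAML, so a production tool would use a proper YAML parser. parse_frontmatter and discover_skills are hypothetical helpers, not Claude Code internals.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat 'key: value' pairs from a leading '---' block (simplified; not a YAML parser)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter at all
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter: the skill's instructions start after this
        key, sep, value = line.partition(":")
        if sep:  # split on the first colon only, so descriptions may contain colons
            fields[key.strip()] = value.strip()
    return fields


def discover_skills(root: Path) -> dict[str, dict[str, str]]:
    """Map every <root>/.claude/skills/<name>/SKILL.md to its parsed frontmatter, keyed by <name>."""
    return {
        skill_md.parent.name: parse_frontmatter(skill_md.read_text(encoding="utf-8"))
        for skill_md in sorted(root.glob(".claude/skills/*/SKILL.md"))
    }
```

With this sketch, discover_skills(Path("hydra")) would return one entry per folder under hydra/.claude/skills/, keyed by folder name.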

For Hydra, skills are the bundling unit for behaviour. Instead of pasting a thousand lines of prompt text straight into a persona's CLAUDE.md, you put that behaviour in a skill, where it is reusable and testable.

How a skill gets invoked

A single skill can be triggered in two ways:

  1. Manually — you type /opsx-apply in a Claude session. Direct, predictable, and the skill runs exactly when you want it to.
  2. Automatically — Claude reads the description of every available skill and picks one itself when the current situation matches. Ask "what changed?" and Claude will trigger a skill like summarize-changes on its own if one is available.

Which mode is allowed is set in the skill's own frontmatter:

  • disable-model-invocation: true — only you can call it via /<name>. Use this for skills with side-effects (/opsx-apply modifies code, /create-pr opens a PR). Claude shouldn't just decide to do that on its own.
  • user-invocable: false — only Claude itself may load it. Use this for background knowledge (for example a legacy-system-context that doesn't make sense as an action).
  • Neither set — both are allowed.

So that distinction — manual vs. automatic — is about configuration of one skill, not about two different kinds of files. A persona in a Docker container has no keyboard and leans heavily on auto-activation; a human on the CLI usually wants explicit control and types /.
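The three cases reduce to a small decision rule. The sketch below implements that rule. It assumes the frontmatter has already been parsed into a dict whose values are booleans or strings like "true". allowed_modes is a hypothetical helper that mirrors the rules above, not Claude Code's own implementation.

```python
def allowed_modes(frontmatter: dict[str, object]) -> set[str]:
    """Return which invocation modes a skill permits: 'manual' (/<name>) and/or 'automatic'."""

    def flag(key: str, default: bool) -> bool:
        # Accept real booleans as well as raw strings such as "true" / "false".
        value = frontmatter.get(key, default)
        if isinstance(value, bool):
            return value
        return str(value).strip().lower() == "true"

    modes: set[str] = set()
    if flag("user-invocable", True):  # defaults to True: humans may type /<name>
        modes.add("manual")
    if not flag("disable-model-invocation", False):  # defaults to False: Claude may pick it
        modes.add("automatic")
    return modes
```

If you set both flags in this sketch, the result is an empty set, so neither you nor Claude could invoke the skill.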

Two worlds: the Hydra factory vs. skills for humans

Hydra's hydra/.claude/skills/ contains ~70 skills, but the automated pipeline only uses a handful. The rest is tooling for you, behind a keyboard. It's important to keep the two apart:

What the Hydra factory actually runs

Every pipeline container has its own, limited set of skills baked into its Docker image:

| Container | Persona | Skills in image | Main function |
|---|---|---|---|
| Builder | Al Gorithm | All .claude/skills/* + all vendor skills | Implements tasks, primarily via opsx-apply. Keeps the other opsx skills on hand for context during verification, realignment with the change, and fixing after a failed quality check. |
| Reviewer | Juan Claude | hydra-gates + 13 hydra-gate-* skills + vendor code-review | Mandatory hydra-gates check + deeper code review via the Anthropic community skill. |
| Security | Clyde Barcode | hydra-gates + 13 hydra-gate-* skills + vendor trailofbits + vendor owasp | Mandatory hydra-gates + SAST (Semgrep) + OWASP top 10 checklists. |
| Applier | Axel Plier | No skills | Applies small, deterministic fixes purely through its CLAUDE.md; no skill layer needed. |
| Browser UI Tester | (sonnet, headless) | One skill: hydra-ui-test | Logs in, navigates the live app via Playwright MCP, delivers verdict JSON. |

On top of that, scripts/run-hydra-gates.sh runs all 14 hydra-gates mechanically in every container — independent of which skill files are in the image. The skill files serve Claude as documentation when fixing; the script does the detection.

What the factory doesn't do — for humans on the CLI

All the other skills (~50 out of 70) are for you, or for a fellow dev in a Claude Code session on their laptop. Examples:

  • Preparing a change: opsx-new, opsx-ff, opsx-explore and opsx-plan-to-issues are things you do as a human before you throw the work into the pipeline. The Builder only runs opsx-apply on an existing change.
  • Taking on a role: team-architect, team-backend, team-po etc. are pure roleplay frames for one-human-one-role sessions. The pipeline doesn't use them.
  • Running tests: you start /test-counsel (all 8 personas) or /test-app (one browser sweep) by hand. The factory has its own browser tester (hydra-ui-test); the test-* family is separate from that.
  • PRs and day-to-day work: /create-pr, /review-pr and /report-out are the "three times a day" tools for you, not for the pipeline.

Remember: the factory relies on a small, fixed set of skills per container, while a human can call all ~70. That's the whole difference.
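You can also write the split down as data. The sketch below assumes that the 🤖 markings in the family tables below describe what each container actively uses. The Builder image actually contains every skill, but only the opsx skills marked 🤖 are modelled here. FACTORY, audience and factory_skills are illustrative names, not anything in the Hydra repo.

```python
# Illustrative model of which skills each pipeline container actively uses.
GATES = (
    "spdx", "forbidden-patterns", "stub-scan", "composer-audit", "route-auth",
    "orphan-auth", "no-admin-idor", "unsafe-auth-resolver", "semantic-auth",
    "admin-router", "initial-state", "modal-isolation", "nc-input-labels",
)
# The dispatcher plus the 13 individual gates.
GATE_SKILLS = frozenset({"hydra-gates"} | {f"hydra-gate-{g}" for g in GATES})

FACTORY = {
    "builder": frozenset({"opsx-apply", "opsx-verify", "opsx-archive"}),
    "reviewer": GATE_SKILLS | {"code-review"},
    "security": GATE_SKILLS | {"trailofbits", "owasp"},
    "applier": frozenset(),  # pure CLAUDE.md, no skills
    "browser-ui-tester": frozenset({"hydra-ui-test"}),
}


def factory_skills() -> frozenset[str]:
    """Union of every skill any container actively uses."""
    return frozenset().union(*FACTORY.values())


def audience(skill: str) -> str:
    """Tag a skill like the family tables do: 'factory' if any container uses it, else 'human'."""
    return "factory" if skill in factory_skills() else "human"
```

Anything that audience() tags as "human" is part of your CLI toolkit.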

The five skill families in full

With that split in mind, here are all five families. The tag system in the table: 🤖 factory = ships inside a pipeline container, ⌨️ human = CLI only.

Family 1: OpenSpec workflow (opsx-*, 16 skills)

The OPSX skills implement the Conduction workflow for OpenSpec changes — from proposal to archive.

| Skill | For whom | Does |
|---|---|---|
| opsx-apply | 🤖 Builder | Implements tasks from a change (the pipeline default). |
| opsx-verify | 🤖 Builder | Verifies that the implementation covers the change artefacts. |
| opsx-archive | 🤖 Builder + ⌨️ | Archives a completed change, syncing its delta into the spec. |
| opsx-new | ⌨️ | Starts a new change (proposal scaffold, schema choice). |
| opsx-ff | ⌨️ | "Fast-forward": creates a change + all artefacts in a single pass. |
| opsx-continue | ⌨️ | Picks up an interrupted change. |
| opsx-explore | ⌨️ | Explores, before any spec exists, what a change should even be. |
| opsx-onboard | ⌨️ | Onboarding flow on a new repo (sets up OpenSpec config + structure). |
| opsx-plan-to-issues | ⌨️ | Converts tasks.md into GitHub issues + a tracking issue. |
| opsx-sync | ⌨️ | Syncs delta specs back into the main spec. |
| opsx-bulk-archive | ⌨️ | Archives multiple completed changes in one go. |
| opsx-apply-loop | ⌨️ | Headless build → quality-fix loop for local dev runs. |
| opsx-pipeline | ⌨️ | Runs multiple changes in parallel (multi-agent). |
| opsx-coverage-scan | ⌨️ | Audits a legacy app for spec ↔ code coverage. |
| opsx-annotate | ⌨️ | Applies @spec PHPDoc tags after a coverage scan. |
| opsx-reverse-spec | ⌨️ | Reverse-engineers a spec from existing code. |
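Here is a concrete version of the opsx-plan-to-issues step. The sketch covers only the conversion part. It assumes tasks.md uses Markdown checkbox lines such as "- [ ] 1.1 Add mapper". The title prefix, the tracking-issue wording and plan_issues itself are hypothetical, because the real skill's output format isn't described here. Actually creating the issues (for example with the GitHub CLI) is also left out.

```python
import re

# Assumed checklist format: "- [ ] 1.1 Do something" (open) or "- [x] ..." (done).
TASK_LINE = re.compile(r"^\s*- \[([ xX])\] (.+)$")


def plan_issues(tasks_md: str, change: str) -> tuple[list[str], str]:
    """Turn open tasks into issue titles plus a tracking-issue body; nothing is sent to GitHub."""
    titles = [
        f"[{change}] {m.group(2).strip()}"
        for line in tasks_md.splitlines()
        if (m := TASK_LINE.match(line)) and m.group(1) == " "  # skip tasks already checked off
    ]
    # The tracking issue lists every planned issue as its own checkbox.
    body = "\n".join([f"Tracking issue for change `{change}`", ""] + [f"- [ ] {t}" for t in titles])
    return titles, body
```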

Family 2: Quality + security gates (hydra-gate-*, 14 skills)

The parent skill hydra-gates is a dispatcher that calls all gates together. The script engine under the hood (scripts/run-hydra-gates.sh) runs all 14 gates in every container; the skill files serve Claude as reference material when fixing a failure.
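The dispatcher pattern fits in a few lines. The Python sketch below rests on two assumptions: each gate is modelled as a function that takes {path: contents} and returns a list of findings, and spdx_gate is a toy stand-in for the real gate. The real engine is the shell script scripts/run-hydra-gates.sh, whose internals are not reproduced here.

```python
from typing import Callable

# Hypothetical gate signature: {file path: file contents} in, list of findings out.
Gate = Callable[[dict[str, str]], list[str]]


def run_gates(gates: dict[str, Gate], files: dict[str, str]) -> tuple[bool, dict[str, list[str]]]:
    """Run every gate in order, without stopping at the first failure."""
    results = {name: gate(files) for name, gate in gates.items()}
    passed = all(not findings for findings in results.values())
    return passed, results


def summary(passed: bool, results: dict[str, list[str]]) -> str:
    """One line per gate plus an overall verdict, like a dispatcher's pass/fail summary."""
    lines = [f"{'FAIL' if findings else 'pass'} {name} ({len(findings)} findings)"
             for name, findings in results.items()]
    lines.append("OVERALL: " + ("PASS" if passed else "FAIL"))
    return "\n".join(lines)


def spdx_gate(files: dict[str, str]) -> list[str]:
    """Toy stand-in for hydra-gate-spdx: PHP/Vue files without an @license tag."""
    return [f"{path}: missing @license header"
            for path, text in files.items()
            if path.endswith((".php", ".vue")) and "@license" not in text]
```

The sketch runs every gate instead of stopping at the first failure, so a single run reports all problems at once. That matches the pass/fail summary behaviour described for hydra-gates.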

| Gate skill | What it detects |
|---|---|
| hydra-gates (dispatcher) | Calls all individual gates in sequence, delivers a pass/fail summary. |
| hydra-gate-spdx | Missing @license/@copyright headers on PHP/Vue files. |
| hydra-gate-forbidden-patterns | Debug calls (var_dump, console.log), TODOs in production code, hardcoded secrets. |
| hydra-gate-stub-scan | Stub methods that just return null; or do nothing: no test, no logic. |
| hydra-gate-composer-audit | Known vulnerabilities in dependencies, as reported by composer audit. |
| hydra-gate-route-auth | Routes in routes.php without an auth attribute where one belongs. |
| hydra-gate-orphan-auth | Auth checks in methods that are never called (orphaned). |
| hydra-gate-no-admin-idor | Application::APP_ID missing on admin-only methods → IDOR risk. |
| hydra-gate-unsafe-auth-resolver | catch(\Throwable) { return null; } around auth logic. |
| hydra-gate-semantic-auth | #[NoAdminRequired] on a method that still calls requireAdmin(). |
| hydra-gate-admin-router | Admin routes that don't go through the admin router. |
| hydra-gate-initial-state | Frontend initialState without server-side hydration. |
| hydra-gate-modal-isolation | Vue modals that teleport outside their container. |
| hydra-gate-nc-input-labels | Nextcloud input components without an attached label (a11y). |

All 14 run 🤖 inside the Hydra factory (Reviewer + Security containers; Builder also runs them as a post-flight check during fix-quality).
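Here is what a static check means in practice: a toy version of the forbidden-patterns idea as a line-by-line regex scan. The patterns are illustrative assumptions, since the real gate's rules aren't listed in this tutorial. The hardcoded-secrets check is left out.

```python
import re

# Illustrative patterns only; not the real hydra-gate-forbidden-patterns rule set.
FORBIDDEN = {
    "debug call": re.compile(r"\b(?:var_dump|print_r)\s*\(|\bconsole\.log\s*\("),
    "TODO in production code": re.compile(r"\bTODO\b"),
}


def scan(path: str, text: str) -> list[str]:
    """Return 'path:line: reason' for every line that matches a forbidden pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for reason, pattern in FORBIDDEN.items():
            if pattern.search(line):
                findings.append(f"{path}:{lineno}: {reason}")
    return findings
```

A check like this never runs the application. It only looks at source text, which is exactly what separates the gates from the test-* family below.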

Family 3: Team roles (team-*, 7 skills) — ⌨️ humans only

Skills that model a specific kind of work via a persona frame. Intended for a human working in a terminal alongside Hydra who wants to step into a single role for a bit.

  • team-po (Product Owner) — writes user stories and acceptance criteria.
  • team-sm (Scrum Master) — manages backlog and sprint planning artefacts.
  • team-architect — makes architecture decisions, writes ADR drafts.
  • team-backend — implements backend work (PHP, services, mappers).
  • team-frontend — implements frontend work (Vue 2, Pinia, NL Design System).
  • team-reviewer — a manual variant of the reviewer work.
  • team-qa — writes test cases and test plans.

None of these ship in a pipeline container — they have no place in the automated loop.

Family 4: Test suites (test-*, 19 skills) — almost all ⌨️ for humans

The largest family by count: 19 skills that drive agentic browser and API testing, in three clusters plus a dispatcher:

Test types (one per test kind):

  • test-app — automated browser test of a whole Nextcloud app (Playwright MCP).
  • test-functional — functional scenarios against implemented features.
  • test-api — API checks against the PHP built-in server.
  • test-accessibility, test-performance, test-security, test-regression — specialised variants.

Personas (test-persona-*) — eight Dutch user profiles, each looking at an app from their own angle: annemarie, fatima, henk, janwillem, mark, noor, priya, sem. Henk reads with large type and looks for simple navigation; Noor hammers on RBAC and audit trails; Annemarie checks NLGov/GEMMA mapping. The persona cards themselves live in hydra/personas/.

Scenario management: test-scenario-create, test-scenario-edit and test-scenario-run write, edit and run reusable TS-NNN-*.md scenarios per app.
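The TS-NNN-*.md naming implies a numbering step whenever a scenario is created. The sketch below assumes NNN is a zero-padded three-digit counter per app and that a new scenario takes the highest existing number plus one. next_scenario_name is hypothetical, and this tutorial doesn't specify how test-scenario-create really picks numbers.

```python
import re

# Assumed file name shape: TS-<three digits>-<slug>.md
SCENARIO = re.compile(r"^TS-(\d{3})-.+\.md$")


def next_scenario_name(existing: list[str], slug: str) -> str:
    """Pick the next free TS-NNN number, ignoring files that don't follow the scheme."""
    numbers = [int(m.group(1)) for name in existing if (m := SCENARIO.match(name))]
    return f"TS-{max(numbers, default=0) + 1:03d}-{slug}.md"
```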

The dispatcher for this family is test-counsel: it coordinates all eight personas against one feature and delivers a combined report. Same pattern as hydra-gates for the quality-gates family.

The Hydra factory has its own browser tester (hydra-ui-test) that runs in the Browser UI Tester step. That stands apart from this test-* family. The test-* skills are for when you — as a human — want to test a feature before or after the pipeline has touched it. Gates from part 3 run static checks (regex/AST); test-* runs the actual application in a browser.

Family 5: Utility & maintenance (~13 skills) — almost all ⌨️ for humans

The remaining ~13 skills are mostly dev comfort and meta work:

| Skill | Does |
|---|---|
| create-pr | Creates a PR from a feature branch: local checks → branch pick → PR body. |
| review-pr | Reviews a GitHub PR (note: manual variant; the factory has its own Juan Claude). |
| report-out | End-of-day report: today's commits + GitHub activity → Dutch Slack notification. |
| clean-env | Fully resets the local Docker dev environment. |
| local-run | Brings up the local Nextcloud dev environment. |
| sync-docs | Syncs {app}/docs/ or .github/docs/claude/ with the current state of the repo. |
| skill-creator | Wizard for building a new skill (scaffold, frontmatter, evals). |
| feature-counsel | Pre-build spec analysis from eight persona perspectives (sibling of test-counsel). |
| persistence-audit | Audits how an app handles data persistence (object store, sessions, etc.). |
| journeydoc-init / journeydoc-add-story / journeydoc-instrument | Manual instrumentation + extension of Journey docs. |
| verify-global-settings-version | Checks whether global-settings/VERSION was bumped after a change. |

Nothing in this family runs in the pipeline. These are your everyday slash commands.
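Most of these are workflows rather than algorithms, but verify-global-settings-version boils down to a small rule. The sketch below assumes that "after a change" means any file under global-settings/ appears in the diff, and that VERSION holds a plain dotted numeric version. version_bump_ok is hypothetical. Reading the old and new VERSION (for example from git) is left out.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """'1.4.2' -> (1, 4, 2); assumes a plain dotted numeric version."""
    return tuple(int(part) for part in text.strip().split("."))


def version_bump_ok(changed: list[str], old_version: str, new_version: str) -> bool:
    """True if no global-settings file changed, or if VERSION was raised along with it."""
    touched = [p for p in changed
               if p.startswith("global-settings/") and p != "global-settings/VERSION"]
    if not touched:
        return True  # nothing relevant changed, so no bump is required
    return parse_version(new_version) > parse_version(old_version)
```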

Vendor skills (community)

Next to its own skills, Hydra has vendor skills under hydra/vendor/skills/:

  • code-review — community review skill from Anthropic. → 🤖 Reviewer container.
  • trailofbits — Semgrep-based static-analysis methodology from Trail of Bits. → 🤖 Security container.
  • owasp — OWASP top 10:2025 + ASVS 5.0 checklists. → 🤖 Security container.

So these three do ship inside the factory containers. They are loaded into Juan and Clyde for extra coverage on top of our own hydra-gate-* skills. Maintenance sits with external parties: updates happen by tracking upstream, not by editing the skills yourself.

When do you write a new skill?

The pragmatic test: write a skill if…

  1. The check / behaviour is repeatable — needed more than once, in more than one place.
  2. It's mechanically describable — you can instruct it in 1-3 paragraphs without it devolving into "it depends on the context".
  3. A persona or a human would benefit from it. Don't write a skill because you can.

For a gate that produces false positives (part 3), you adjust the existing skill; you don't write a new one. For a new class of mistake that you've seen three times: yes, that earns its own hydra-gate-* skill.
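You can write the two rules as a small triage over recorded incidents. The sketch below assumes each incident is logged as a (mistake class, owning gate) pair, where the owning gate is None when no existing hydra-gate-* is meant to catch that mistake. The incident log and triage are hypothetical: the tutorial describes a judgement call, not a tool.

```python
from collections import Counter
from typing import Optional


def triage(incidents: list[tuple[str, Optional[str]]], threshold: int = 3) -> dict[str, list[str]]:
    """Split incidents into 'adjust an existing gate' and 'candidate for a new gate'."""
    # A gate that already owns the mistake class gets fixed, however often it misfired.
    adjust = sorted({gate for _, gate in incidents if gate is not None})
    # Uncovered mistake classes only earn a new gate once they reach the threshold.
    uncovered = Counter(cls for cls, gate in incidents if gate is None)
    new_gate = sorted(cls for cls, n in uncovered.items() if n >= threshold)
    return {"adjust": adjust, "new_gate": new_gate}
```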

Test yourself

Four short questions to check whether you've grasped this part, each with a hint and an answer.

1. What are the two ways a skill can be activated, and how do you control that per skill?

Hint

One way requires a human to type something. The other lets Claude decide for itself based on the skill description. Which frontmatter fields determine which mode is allowed?

Answer
  • Manual: you type /<name> in a Claude session. Direct, predictable.
  • Automatic: Claude reads the description of all skills and picks one itself when the current conversation matches.

In the skill's frontmatter you set:

  • disable-model-invocation: true — manual only. Used for skills with side-effects (/opsx-apply, /create-pr).
  • user-invocable: false — automatic only. Used for background knowledge that isn't useful as an action.
  • Neither set → both allowed. This is the default for most Hydra skills.

In the Hydra pipeline the containers mostly use auto-activation (Claude picks the right skill itself); a human on the CLI typically types / explicitly.

2. Which skills actually run inside the pipeline containers, and which ones sit in the repo only for humans?

Hint

Three containers (Builder, Reviewer, Security) each have a specific set in their image. What's inside — and which families sit entirely outside?

Answer

Inside the pipeline containers (🤖 factory):

  • Builder: all .claude/skills/* baked in, but the standard run is opsx-apply. The other opsx skills are there for context.
  • Reviewer: hydra-gates + 13 individual hydra-gate-* skills + vendor code-review.
  • Security: hydra-gates + 13 individual hydra-gate-* skills + vendor trailofbits + vendor owasp.
  • Applier: no skills (pure CLAUDE.md).
  • Browser UI Tester: only hydra-ui-test.

Plus: scripts/run-hydra-gates.sh runs all 14 gates mechanically in every container, regardless of which skill files are in the image.

Not in the factory, only for humans on the CLI:

  • The entire team-* family (7 skills).
  • Almost the entire test-* family (19 skills) — the factory has its own hydra-ui-test.
  • Most opsx-* skills except opsx-apply/verify/archive (humans start changes; the pipeline implements them).
  • The whole utility family (create-pr, review-pr, report-out, clean-env, sync-docs, skill-creator, feature-counsel, persistence-audit, journeydoc-*, verify-global-settings-version).

In total: of ~70 skills in the repo, the automated pipeline uses a handful; the rest is your toolkit.

3. When do you adjust an existing gate skill and when do you write a new one?

Hint

One decision is about "the gate doesn't do what we already wanted it to do". The other is about "we've discovered a new category of mistake".

Answer
  • Adjust existing for a false positive: the gate triggers too broadly or too narrowly on something it was already meant to check. Example: hydra-gate-forbidden-patterns matched $builder->add( incorrectly — you tighten the regex, you don't add a second gate.
  • Write new for a new class of mistake that you've seen three times and that slips through the net of every existing gate. Example: hydra-gate-stub-scan was born when a builder shipped a method consisting of nothing but return null;, which PHPCS let through and no test covered. That was a new category, not a fix to an existing check.

Rule of three: once is chance, twice is coincidence, three times is a pattern that deserves its own gate.
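The tutorial doesn't say which pattern misfired on $builder->add(. Here is one way such a false positive can happen, shown purely as a hypothetical. A debug-call pattern for a dd( helper, written without a word boundary, also matches the dd( at the end of add(. The fix stays inside the same gate: you tighten the pattern.

```python
import re

# Hypothetical patterns; the real gate's pattern for this case is not shown in the tutorial.
LOOSE = re.compile(r"dd\(")    # also matches the "dd(" at the end of "add(" -> false positive
TIGHT = re.compile(r"\bdd\(")  # needs a word boundary, so only a standalone dd( call matches


def flags(pattern: re.Pattern, line: str) -> bool:
    """True if a gate using this pattern would report the line."""
    return pattern.search(line) is not None
```

Same gate, same intent, narrower pattern. That is an adjustment, not a new skill.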

4. What do the "vendor skills" do and why do we keep them separate from our own hydra-gate-*?

Hint

Think about provenance (who wrote them?), which pipeline container they get loaded into, and what happens when you need an external party to update their work.

Answer

Vendor skills (hydra/vendor/skills/) are skills that were not written by us, and that do ship inside the factory:

  • code-review (Anthropic community) → Reviewer container (Juan Claude).
  • trailofbits (Trail of Bits, Semgrep methodology) → Security container (Clyde).
  • owasp (OWASP top 10:2025 + ASVS 5.0) → Security container.

We keep them separate because:

  • Maintenance sits with external parties — we update them by tracking upstream (see vendor/skills/VERSIONS.md), not by editing them ourselves. Our own gates we mutate freely; vendor skills we leave alone.
  • Audit trail stays clear: what's ours vs. what's community/external? On a failure you immediately know which camp is responsible for the fix.

Next step

In part 5 we get practical: starting a real Hydra run on a real app, including the label-prefix trick for parallel dev runs.