> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://nemo-platform.docs.buildwithfern.com/nemo/platform/llms.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://nemo-platform.docs.buildwithfern.com/nemo/platform/_mcp/server.

# NeMo Platform Skills Spec

**Status:** Draft.

This document defines the conventions for skills shipped with NeMo Platform: how they're structured, what their frontmatter must contain, how they get tested, and where they live in the repo.

***

## TL;DR: what are we actually solving for?

NeMo Platform is positioned as "the toolkit for safer, more accurate, cheaper agents." Two distinct populations of users hit that pitch with very different starting points:

1. **The user who doesn't have an agent yet.** They have a *problem* and a *use case*. They want NeMo Platform to take them from idea → deployed working agent.
2. **The user who already has a deployed agent.** They have *metrics that aren't good enough*. They want NeMo Platform to make their agent measurably better.

Today's catalog half-served both. There was no top-level orchestrator for the build journey (users had to guess after setup completed), and no overview / theory / outer-loop for the optimization journey (users had per-technique skills but no map of when to use which).

The two skills overhauls in flight this weekend fix this:

* **Lifecycle skills** serve the first population. Build journey.
* **Optimization skills** serve the second population. Improvement journey.

They don't overlap. We need both. This doc proposes a unified spec so future skill authors aren't choosing between two conventions.

***

## The two user journeys, in detail

### Journey A: "I want to build an agent for my use case"

Entry point: a coding agent (Claude Code, Cursor, Codex, OpenCode) opened inside a freshly installed NeMo Platform repo. The user has just run `make bootstrap` and `nemo setup` in their shell.

**What they care about:**

* Going from a fuzzy idea to a working, testable agent in their preferred coding agent.
* Not having to read every doc.
* The CLI telling them what to do next at every junction, not dumping them at a chat playground.

**What they don't care about:**

* The platform's internal architecture.
* Optimization theory.
* Which microservice does what.

**The skills that serve them:**

| Step | Skill                  | What it does                                                             |
| ---- | ---------------------- | ------------------------------------------------------------------------ |
| 1    | `nemo-skill-selection` | Router: parses natural-language intent, picks the right downstream skill |
| 2    | `nemo-explore`         | Design conversation: captures goal, audience, tools, constraints         |
| 3    | `nemo-spec`            | Writes the design to `agents/<name>-spec/AGENT-SPEC.md`                  |
| 4    | `nemo-build-agent`     | Scaffolds NAT workflow YAML and deploys                                  |
| 5    | `nemo-try-agent`       | Sends queries to the deployed agent                                      |
| 6    | `nemo-status`          | Read-only platform health dashboard                                      |
| 7    | `nemo-teardown`        | Guided shutdown                                                          |

**Where this journey breaks today** (post-install CTA): after `nemo setup` finishes, the CLI tells the user to chat with the built-in calculator agent. That's the wrong handoff for Journey A: the user came here to build *their* agent, not chat with someone else's. The CTA should be goal-oriented ("what do you want to build?" or "ask me to design an agent for X"). The CLI is the journey transition point, not the README. Most users never read the README all the way through.

### Journey B: "I have an agent. Make it better."

Entry point: a coding agent opened inside a NeMo Platform repo where they have a deployed agent. Metrics aren't where they want them.

**What they care about:**

* Knowing what kinds of optimization exist and when each applies.
* A reproducible loop (baseline → analyze → suggest → apply → re-evaluate → promote).
* Honest measurement: did the change actually help, or did I imagine it?

**What they don't care about:**

* How to deploy an agent (they already have one).
* The first-touch flow.

**The skills that serve them:**

| Step | Skill                                                                                                                               | What it does                                                                                       |
| ---- | ----------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| 1    | `nemo-platform-overview`                                                                                                            | Map of surfaces, plugins, and which skill owns each capability                                     |
| 2    | `agent-optimization-concepts` (proposed rename: `nemo-optimization-concepts`)                                                       | Tool-agnostic theory: four axes (quality/cost/latency/safety), decision tree, eval-as-ground-truth |
| 3    | `optimize-loop` (proposed rename: `nemo-optimize-loop`)                                                                             | Orchestrator: baseline → analyze → suggest → apply via sibling-and-deploy → re-evaluate → promote  |
| 4+   | `nemo-agents-optimize`, `nemo-agent-skills-optimization`, `nemo-agents-secure`, `nemo-evaluator`, `nemo-auditor`, `nemo-guardrails` | Per-technique deep-dives, called from the loop                                                     |

### Why both matter

The two journeys are **sequential**, not parallel. A user typically does A first, then B weeks or months later. Without A, no one has an agent to optimize. Without B, agents stay at their initial quality forever.

Without a coherent post-setup CTA, the user finishes A and never starts B. They get stuck at the calculator chat and walk away.

***

## Required frontmatter fields

Every skill ships with the following YAML frontmatter at the top of `SKILL.md`. Coding agents (Claude Code, Cursor, Codex, OpenCode) read these fields directly: `description` and `triggers` drive routing, `not-for` prevents collisions, `allowed-tools` tells the agent harness which tools the skill expects to invoke.

| Field            | Required | Purpose                                                                                                                                            |
| ---------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`           | Yes      | Kebab-case identifier matching the directory name.                                                                                                 |
| `description`    | Yes      | One paragraph with embedded trigger phrases. Gives the router enough semantic surface to match user intent.                                        |
| `triggers`       | Yes      | Explicit list of natural-language phrases. Used by `scripts/skill-cli-lint.py` and the collision audit, not directly by the router.                |
| `not-for`        | Yes      | What this skill is NOT (other skills it could collide with, situations to bail out of). Prevents the wrong skill from firing as the catalog grows. |
| `compatibility`  | Optional | Real constraints only (Python version, OS, GPU, required platform components).                                                                     |
| `maturity`       | Yes      | `alpha`, `beta`, `active`, or `deprecated`. Makes lifecycle explicit to users.                                                                     |
| `license`        | Yes      | Default `Apache-2.0`.                                                                                                                              |
| `user-invocable` | Yes      | `true` for skills users invoke directly; `false` for internal helpers. Distinguishes user-facing skills from internal helpers.                     |
| `allowed-tools`  | Yes      | List of tool names this skill is permitted to use (e.g. `[Bash, Read, Write, Edit]`). Real safety signal: the agent harness honors this.           |

Library-prefix naming (`nemo-*`) is required for user-invocable skills (skills install into shared agent catalogs alongside skills from other plugins; the prefix prevents collisions). It's optional for internal helpers.

**Canonical location:** `packages/nemo_platform_ext/src/nemo_platform_ext/skills/<name>/`. Skills there ship with `pip install nemo-platform[all]`.

***

## The merged frontmatter, by example

```yaml
---
name: nemo-build-agent
description: >
  Scaffold and deploy a NeMo Agent Toolkit (NAT) agent from a spec.
  Generates the workflow YAML, creates the agent, deploys it, optionally
  wires guardrails and an eval suite. Use when the user says "build me an
  agent," "scaffold from this spec," "deploy my agent," or has finished
  `nemo-explore` and is ready to ship.
triggers:
  - build the agent
  - scaffold from spec
  - deploy my agent
not-for:
  - nemo-explore (use for design conversation before a spec exists)
  - nemo-try-agent (use to test an already-deployed agent)
  - nemo-agents-optimize (use to improve a deployed agent's metrics)
compatibility: nemo-platform >= 0.1.0; macOS or Linux; requires `agents/<name>-spec/AGENT-SPEC.md`
maturity: active
license: Apache-2.0
user-invocable: true
allowed-tools: [Bash, Read, Write, Edit]
---
```

***

## Playbook: how to write a new skill

The same checklist that `scripts/skill-cli-lint.py` and `scripts/skill-test.py` enforce at PR time.

### 1. Decide if it's user-invocable

* **User-invocable:** the user (or their coding agent) directly says "do X" and this skill answers. Examples: `nemo-build-agent`, `nemo-status`.
* **Internal helper:** only invoked by other skills. Examples: shared validators, template lookups.

If user-invocable, the `nemo-*` prefix is required. If internal, optional.

### 2. Pick the name

* Kebab-case.
* `nemo-<verb>-<noun>` or `nemo-<noun>` for user-invocable. Verb is preferred for action skills (`nemo-build-agent`, not `nemo-agent-build`).
* Run `nemo skills list` to confirm no existing name collides.

### 3. Pick the canonical location

* User-invocable: `packages/nemo_platform_ext/src/nemo_platform_ext/skills/<name>/` (ships with the platform package).
* Plugin-owned (skill is specific to a plugin): `plugins/<plugin-name>/src/<plugin_module>/skills/<name>/`.
* Internal-only dev skills: `.agents/skills/<name>/` (gitignored from skills install by default).

### 4. Write the frontmatter

Required fields:

* `name`
* `description` (verbose, embed natural-language trigger phrases for the LLM router)
* `triggers` (explicit list, ≥ 3 phrases, for the audit script)
* `not-for` (≥ 2 sibling-skill names with the reason to use them instead)
* `maturity` (alpha | beta | active | deprecated)
* `license` (Apache-2.0 unless overridden)
* `user-invocable` (true or false)
* `allowed-tools` (list of tools the skill expects to invoke)

Optional:

* `compatibility` (only when there are real environment constraints)

### 5. Write the SKILL.md body

* Lead with one-sentence purpose.
* Step-by-step instructions in the order the agent should run them.
* One verification step after any state-changing action. Skills must not claim success without verification.
* Lean: under 500 lines. Lift detail into `references/<topic>.md` if needed.
* Use real `nemo` CLI commands; never improvise flags. `scripts/skill-cli-lint.py` enforces this against `nemo --help`.

### 6. Write `tests.json` (four-mode routing tests)

Every skill needs four kinds of routing tests:

* **Explicit:** user names the skill directly. `"Use nemo-build-agent to deploy my agent."`
* **Implicit:** user describes the intent without naming the skill. `"Scaffold and deploy my agent."`
* **Contextual:** user describes the intent with surrounding situation. `"I have a spec at agents/calculator-spec/AGENT-SPEC.md and a working dev cluster. Take it from here."`
* **Negative-control:** unrelated request that should NOT route to this skill. `"Set up a new Postgres database with seed data."`

At least 3 examples per mode. `scripts/skill-test.py` runs them and fails if routing diverges.

### 7. Add `references/` only if needed

If the skill body needs reference material (template files, troubleshooting tables, deep-dive documentation), put it under `references/<topic>.md` and point at it from the body. Don't pre-emptively create `references/` for short skills.

### 8. Local verification before PR

```bash
.venv/bin/python scripts/skill-cli-lint.py --root packages/nemo_platform_ext/src/nemo_platform_ext/skills/<your-skill>
.venv/bin/python scripts/skill-test.py --skill <your-skill>
nemo skills list | grep <your-skill>
nemo skills show <your-skill>
```

All four must pass before opening the PR.

### 9. PR conventions

* One PR per skill (or per closely-related group, e.g. all eight lifecycle skills as a single set).
* Label: `skills`.
* Reviewers: one platform owner + one author of an adjacent skill (collision check).
* CI: skill-cli-lint and skill-test both run automatically.

***

## Out of scope

* The post-setup CTA in the CLI (the journey-A transition point); separate from the skills spec.
* Dependency management for NAT and related packages; covered elsewhere.
* A `nemo skills new` scaffolder that emits skeletons following this spec; planned as a follow-up.