Optimize Agents | NVIDIA NeMo Platform

Use the Agent Optimizer to analyze a deployed agent and act on improvement suggestions. The optimizer inspects the agent’s config, the workspace model catalog, any prior optimizer snapshots, and optional evaluation baselines, then writes suggestions you can review from the CLI or hand off to a coding agent.

This page covers the main path: establish a baseline, generate optimization suggestions, apply a candidate change to a sibling agent, and review the evaluation result before promotion.

What the Optimizer Checks

Suggestion type	Signal	Result
Model optimization	An agent uses a single frontier model where a smaller model or route split may preserve quality at lower cost	Suggests a model swap or Switchyard random-routing virtual model
Skill optimization	The agent uses skills and has an evaluation suite	Suggests running `nemo agents optimize-skills` to improve skill files and keep changes that pass evaluation
Prompt optimization	The agent has an optimization config and baseline dataset	Suggests `nemo agents optimize run` for NAT prompt or parameter tuning
New model scan	Difference between the current model list and the previous optimizer snapshot	Suggests evaluating or auditing newly available models

Optimizer state is stored in the nemo-agent-optimizer fileset:

optimizer_suggestions.jsonl: one suggestion per line, including applied state.
optimizer_snapshot.json: model and agent names from the latest run.
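
The new model scan amounts to a set difference between the current model list and the names recorded in the previous snapshot. The Python sketch below illustrates that comparison only. It assumes the snapshot is a JSON object with a `models` list of names; that key name is a guess, so inspect your own optimizer_snapshot.json for the real layout.

```python
import json


def find_new_models(current_models, snapshot_text, models_key="models"):
    """Return model names that are present now but absent from the snapshot.

    `models_key` is a hypothetical field name, not a documented schema.
    With no previous snapshot (first run) there is nothing to compare
    against, so the result is empty.
    """
    if not snapshot_text:
        return []
    previous = set(json.loads(snapshot_text).get(models_key, []))
    return sorted(set(current_models) - previous)
```

An empty result on the first run is consistent with new-model suggestions requiring a previous snapshot.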

Security-oriented suggestions such as missing guardrails, PII exposure, or leaked secrets are covered in Secure Agents.

Prerequisites

Before running the optimizer, make sure you have:

Local services running (nemo services run).

The agents plugin installed. For local development from this repository:

$ uv pip install -e packages/nemo_platform_plugin -e plugins/nemo-agents

A workspace with at least one model provider and discovered model entities.
At least one deployed platform-managed agent.
An evaluation baseline before promoting a candidate agent.

If you need a demo agent, start the platform and create the ReAct example:

$ nemo services run

In another terminal:

$ export NMP_BASE_URL=http://127.0.0.1:8080
$ cd plugins/nemo-agents
$ printf '%s' "$NVIDIA_API_KEY" | nemo secrets create ngc-api-key --from-file -
$ nemo inference providers create nvidia-build \
>   --host-url https://integrate.api.nvidia.com \
>   --api-key-secret-name ngc-api-key
$ nemo wait inference provider nvidia-build
$ nemo agents create \
>   --name react-agent \
>   --agent-config examples/react-agent/react-agent.yml
$ nemo agents deploy --agent react-agent
$ nemo agents deployments wait --agent react-agent

The example agent uses nvidia-nemotron-3-nano-30b-a3b, so it can produce a model optimization suggestion when the workspace model catalog contains a smaller compatible model.

Optimize with Switchyard Routing

Switchyard is the inference middleware that lets a virtual model split traffic across multiple backend models. The common optimization pattern is to create a virtual model with a strong model and a weaker, cheaper model, then evaluate whether the route split preserves application quality.

Run nemo models list first and replace the placeholders below with model entity names from your workspace that use the OPENAI_CHAT backend format.

The command below creates a virtual model that sends 80% of traffic to the strong model and 20% to the weak one.

$ nemo inference virtual-models create routed-agent-model \
>   --workspace default \
>   --models '[
>     {"model":"default/<strong-model-entity>","backend_format":"OPENAI_CHAT"},
>     {"model":"default/<weak-model-entity>","backend_format":"OPENAI_CHAT"}
>   ]' \
>   --request-middleware '[{
>     "name":"nemo-switchyard",
>     "config_type":"random_routing",
>     "config":{
>       "strong":{"model":"default/<strong-model-entity>"},
>       "weak":{"model":"default/<weak-model-entity>"},
>       "strong_probability":0.8,
>       "enable_stats":false
>     }
>   }]'

Before wiring the virtual model to an agent, smoke-test the route by making several minimal chat-completions calls and checking the returned model name. The observed split should roughly match strong_probability.
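
With only a handful of calls, the observed split will wander around strong_probability. One way to judge whether it "roughly matches" is to compare the observed share against the binomial standard error. The sketch below is a hedged illustration: it assumes you have already collected the model name returned by each smoke-test call into a list, and the function name and the `z` tolerance are illustrative choices, not part of the platform.

```python
import math
from collections import Counter


def split_is_plausible(returned_models, strong_model, strong_probability, z=3.0):
    """Compare the observed strong-model share with the configured probability.

    Returns (observed_share, ok), where ok is True when the observed share
    lies within z binomial standard errors of strong_probability.
    """
    n = len(returned_models)
    if n == 0:
        raise ValueError("need at least one response")
    observed = Counter(returned_models)[strong_model] / n
    std_err = math.sqrt(strong_probability * (1 - strong_probability) / n)
    return observed, abs(observed - strong_probability) <= z * std_err
```

For 20 calls at strong_probability 0.8, the standard error is about 0.09, so a strong-model share anywhere from roughly 0.53 upward is still consistent with the configured split. Make more calls if you need a tighter check.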

Optimize Skills

Skill optimization applies when the agent depends on local skill files and has an evaluation suite. The loop runs evaluations, analyzes failures, lets the coding agent edit only the configured skills directory, reruns verification, and keeps the change only when the evaluation result improves.

$ nemo agents optimize-skills run --spec-file .agent-improver.yml

Set open_pr: true in the YAML when you want the loop to prepare a reviewable branch.

A sample .agent-improver.yml is in plugins/nemo-agents/examples/agent-improver.example.yml.
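
The keep-only-if-better loop described in this section can be sketched in a few lines of Python. This is an illustration of the control flow under stated assumptions, not the plugin's implementation: `evaluate`, `propose_edit`, and `revert_edit` are hypothetical callables that you would supply, and `max_rounds` is an illustrative parameter.

```python
def improve_skills(evaluate, propose_edit, revert_edit, max_rounds=3):
    """Keep an edit only when the evaluation score improves.

    All three callables are hypothetical stand-ins:
      evaluate()             -> (score, failures)
      propose_edit(failures) -> None, edits files in the skills directory
      revert_edit()          -> None, undoes the most recent edit
    Returns the best score and the best-so-far score after each round.
    """
    best_score, failures = evaluate()          # baseline run
    history = [best_score]
    for _ in range(max_rounds):
        propose_edit(failures)                 # coding agent edits skills
        score, new_failures = evaluate()       # rerun verification
        if score > best_score:
            best_score, failures = score, new_failures
        else:
            revert_edit()                      # discard a non-improving change
        history.append(best_score)
    return best_score, history
```

Requiring a strict improvement means a change that merely ties the baseline is discarded, which keeps the skills directory from accumulating edits that have no measured benefit.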

Inspect Saved Results

Use the Files service to inspect what the optimizer saved:

$ nemo files list nemo-agent-optimizer
$ nemo files download nemo-agent-optimizer \
>   --remote-path optimizer_suggestions.jsonl \
>   -o optimizer_suggestions.jsonl
$ nemo files download nemo-agent-optimizer \
>   --remote-path optimizer_snapshot.json \
>   -o optimizer_snapshot.json
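
Once the suggestions file is downloaded, a short Python script can summarize it. The sketch below assumes each line is a JSON object and that the suggestion type and applied state live under keys named `type` and `applied`. Both key names are assumptions, so check a line of your own file and pass the real names if they differ.

```python
import json
from collections import Counter


def summarize_suggestions(jsonl_text, type_key="type", applied_key="applied"):
    """Count suggestions by (type, applied) from the text of a JSONL file.

    `type_key` and `applied_key` are hypothetical field names. Records
    that lack them are counted as type "unknown" and not applied.
    """
    counts = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        key = (record.get(type_key, "unknown"), bool(record.get(applied_key, False)))
        counts[key] += 1
    return counts
```

For example, pass it the contents of the downloaded optimizer_suggestions.jsonl and print the resulting counter to see how many suggestions of each type are still unapplied.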

Telemetry is optional. If agents use the nemo_files telemetry exporter, trace files are written to the nemo-agent-telemetry fileset, and the optimizer samples the largest JSONL file there. To see which trace files exist, list the fileset:

$ nemo files list nemo-agent-telemetry
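
To preview locally which trace file would be the largest, you can apply the same selection rule to a directory of downloaded traces. This Python sketch only mirrors the "largest JSONL file" rule stated above. It assumes the traces have been downloaded into a local directory and makes no claim about how the optimizer implements its sampling.

```python
from pathlib import Path


def largest_jsonl(directory):
    """Return the largest *.jsonl file under `directory`, or None if none exist."""
    files = [p for p in Path(directory).rglob("*.jsonl") if p.is_file()]
    # Compare by size in bytes; default=None covers an empty directory.
    return max(files, key=lambda p: p.stat().st_size, default=None)
```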

Run Prompt and Parameter Tuning

The nemo agents optimize run command runs the NAT optimizer path for parameter or prompt tuning. Use it when you already have a NAT optimization YAML and want to run nat optimize through the Agents plugin.

For the ReAct example:

$ nemo agents optimize run \
>   --optimize-config plugins/nemo-agents/examples/react-agent/react-optimize.yml \
>   --agent react-agent

When --agent is a platform-managed agent name, the job fetches the stored agent config, merges it with the optimization config, injects the Inference Gateway URL, and runs trials locally. When --agent is a raw HTTP endpoint, the endpoint is treated as an opaque remote service, so local parameter sweeps do not change the remote agent behavior.

Troubleshooting

No suggestions appear. Confirm that the workspace has agents, model entities, and at least one catalog model smaller than the agent’s current model. New-model suggestions compare the current model list against a previous optimizer snapshot, so they never appear on the first run.

The model evaluation fails. Confirm the judge model in the eval config is available through the workspace Inference Gateway. You can replace the eval files in <agent-name>-eval with your own evaluation config and dataset.

Data safety suggestions do not appear. These suggestions depend on telemetry, which is optional. The optimizer scans nemo-agent-telemetry only when that fileset exists and contains JSONL trace files.

Next Steps

Agent overview: review how platform-managed agents are registered, deployed, invoked, evaluated, and optimized.
Agent evaluation: configure agents as online evaluation targets and choose the right agent response mapping.
CLI reference: look up complete command options and global CLI flags for scripted workflows.