Agent Configuration
Online evaluations can target an agent instead of a model. An agent is an HTTP endpoint that accepts a request and returns a response, optionally with a trajectory of intermediate steps. Use agents when you want to evaluate an agentic system end to end rather than a standalone LLM endpoint.
Pass an Agent as the target argument (target=...). The metric, prompt template, target, dataset rows, and runtime parameters are all passed through the Evaluator plugin SDK call.
Agent Formats
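Two agent formats are supported:
- Generic agent: any HTTP endpoint that accepts a JSON POST request and returns a JSON response.
- NeMo Agent Toolkit agent: an agent built with the NeMo Agent Toolkit, selected with the nemo_agent_toolkit format.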
Initialize the SDK
Managing Secrets for Agent Endpoints
If your agent endpoint requires authentication, configure api_key_secret on the Agent.
For local evaluator.run(...) calls, api_key_secret must name an environment variable available to the local Python process. For remote evaluator.submit(...) jobs, it must name a NeMo platform secret in the target workspace. See Model API Authentication for the local-versus-remote behavior.
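For local runs, a missing environment variable is easier to catch before the evaluation starts than after the first request fails. The following is a minimal sketch of such a pre-flight check. The helper name resolve_local_api_key and the variable name MY_AGENT_API_KEY are hypothetical and not part of the SDK.

```python
import os


def resolve_local_api_key(secret_name: str) -> str:
    """Return the value of the environment variable named by `secret_name`.

    Hypothetical helper: it only mirrors the rule that, for local runs,
    api_key_secret must name an environment variable visible to the
    local Python process.
    """
    value = os.environ.get(secret_name)
    if not value:
        raise RuntimeError(
            f"Environment variable {secret_name!r} is not set; "
            "export it before running a local evaluation."
        )
    return value
```

Calling resolve_local_api_key("MY_AGENT_API_KEY") before evaluator.run(...) would raise immediately if the variable is not exported in the current shell.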
For remote evaluator.submit(...) jobs, create the secret in the platform workspace before submitting the job:
For remote jobs, the secret name may be a workspace-local name such as "my-agent-api-key" or a full reference such as "my-workspace/my-agent-api-key".
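To illustrate how the two naming forms relate, the sketch below splits a secret reference into a workspace and a name. It is an illustration only: the function split_secret_reference is hypothetical, and the way the platform actually resolves references is an assumption here.

```python
from typing import Optional, Tuple


def split_secret_reference(
    ref: str, default_workspace: Optional[str] = None
) -> Tuple[Optional[str], str]:
    """Split "workspace/name" into its parts.

    A reference without a slash is treated as workspace-local and is
    paired with `default_workspace` (assumed to be the job's workspace).
    """
    workspace, sep, name = ref.rpartition("/")
    if not sep:
        # No slash: a workspace-local name.
        return default_workspace, ref
    if not workspace or not name:
        raise ValueError(f"Malformed secret reference: {ref!r}")
    return workspace, name
```

Under these assumptions, "my-agent-api-key" submitted from my-workspace and "my-workspace/my-agent-api-key" would both map to the same workspace and name pair.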
Generic Agent
A generic agent is any HTTP endpoint that:
- Accepts a POST request with Content-Type: application/json.
- Returns a JSON response containing the answer and, optionally, a trajectory.
You control the request shape with body and extract values from the response with JSONPath expressions.
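To make the extraction step concrete, the sketch below evaluates a small subset of JSONPath (dotted keys and numeric indexes only) against a sample response. It is a simplified stand-in written for illustration, not the evaluator's JSONPath implementation, and the response keys answer and trajectory are assumed names.

```python
import re
from typing import Any

# One path step: either ".key" or "[index]".
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def extract(document: Any, path: str) -> Any:
    """Evaluate a minimal JSONPath subset such as "$.trajectory[0].tool"."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = document
    pos = 1  # skip the leading "$"
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at offset {pos}: {path!r}")
        key, index = match.groups()
        current = current[key] if key is not None else current[int(index)]
        pos = match.end()
    return current


# Hypothetical agent response; the key names are assumptions.
response = {
    "answer": "Paris",
    "trajectory": [{"tool": "search", "input": "capital of France"}],
}
final_answer = extract(response, "$.answer")
first_tool = extract(response, "$.trajectory[0].tool")
```

With a response shaped like this, "$.answer" would select the final answer and "$.trajectory" the list of intermediate steps.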
Generic Agent Fields
Run a Generic Agent Evaluation
Use evaluator.submit(...) with the same argument shape when you want a durable remote job, but set api_key_secret to a platform secret name for the target workspace.
Example Generic Agent Endpoint
Your agent endpoint might look like this:
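As a minimal sketch using only the Python standard library, a generic agent endpoint could be as small as the following. The request key question, the response keys answer and trajectory, the echo logic, and the default host and port are all assumptions chosen for illustration. A real agent would define its own request and response shape.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def answer_question(question: str) -> dict:
    """Hypothetical agent logic: echo the question and record one step."""
    return {
        "answer": f"You asked: {question}",
        "trajectory": [{"step": 1, "action": "echo", "input": question}],
    }


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read and parse the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "Request body must be valid JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "Request body must be a JSON object")
            return

        # Return the answer and trajectory as a JSON response.
        result = answer_question(str(payload.get("question", "")))
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Start the sketch server; blocks until interrupted."""
    HTTPServer((host, port), AgentHandler).serve_forever()
```

Calling serve() starts the endpoint locally. An Agent pointing at it would then send a body containing question and extract $.answer and $.trajectory from the response.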
NeMo Agent Toolkit Agent
Use the nemo_agent_toolkit format when evaluating agents built with the NeMo Agent Toolkit. This format uses the NAT streaming protocol:
- Sends a POST to {url}/generate/full?filter_steps=none with {"input_message": "<text>"}.
- Reads the SSE (Server-Sent Events) stream.
- Extracts the final value from the last SSE data: chunk.
- Returns it as the agent response.
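The steps above can be sketched with the standard library alone. This is an illustration of the protocol as described, not the evaluator's own client. It assumes each SSE event carries a single data: line holding a JSON object, and it returns the value from the last such object that has a value key. The helper names are hypothetical.

```python
import json
from typing import Any, Iterable, Optional, Tuple


def build_nat_request(url: str, text: str) -> Tuple[str, dict]:
    """Return the endpoint URL and JSON body for one NAT request."""
    endpoint = f"{url.rstrip('/')}/generate/full?filter_steps=none"
    return endpoint, {"input_message": text}


def last_sse_value(lines: Iterable[str]) -> Optional[Any]:
    """Return `value` from the last JSON data: chunk that has one, else None."""
    last_payload = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip comments, event names, and blank separators
        data = line[len("data:"):].strip()
        if not data:
            continue
        try:
            payload = json.loads(data)
        except json.JSONDecodeError:
            continue  # ignore chunks that are not JSON
        if isinstance(payload, dict) and "value" in payload:
            last_payload = payload
    return None if last_payload is None else last_payload["value"]


# Hypothetical stream contents for illustration.
sample_stream = [
    'data: {"value": "partial"}\n',
    "\n",
    'data: {"value": "final answer"}\n',
    "\n",
]
agent_response = last_sse_value(sample_stream)
```

In a real client, the lines would come from the streamed HTTP response body rather than from an in-memory list.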
NeMo Agent Toolkit Fields
Run a NAT Agent Evaluation
Model vs Agent: When to Use Which
Online evaluations accept either a model or an agent as the request target, never both.
Related
- Model Configuration - Inline model targets for LLM endpoints.
- Agentic Evaluation Metrics - Metrics for evaluating agent tool calling, goal accuracy, and trajectory.
- LLM-as-a-Judge - Custom judge-based evaluation with flexible scoring criteria.
- Bring Your Own Metric - Integrate custom evaluation endpoints.