# Agent Configuration

<a id="eval-metrics-agent-configuration" />

Online evaluations can target an **agent** instead of a model. An agent is an HTTP endpoint that accepts a request and returns a response, optionally with a trajectory of intermediate steps. Use agents when you want to evaluate an agentic system end to end rather than a standalone LLM endpoint.

Pass an `Agent` as the `target` argument. The metric, prompt template, target, dataset rows, and runtime parameters are all supplied in a single Evaluator plugin SDK call.

## Agent Formats

Two agent formats are supported:

| Format                 | Value                | Description                                                                                                      |
| ---------------------- | -------------------- | ---------------------------------------------------------------------------------------------------------------- |
| **Generic**            | `generic`            | Configurable HTTP POST with a Jinja-templated request body and JSONPath extraction for response and trajectory.  |
| **NeMo Agent Toolkit** | `nemo_agent_toolkit` | Fixed protocol for [NeMo Agent Toolkit](https://docs.nvidia.com/nemo/agent-toolkit/latest/index.html) endpoints. |

## Initialize the SDK

```python
import os

from nemo_evaluator.sdk import Evaluator
from nemo_platform import NeMoPlatform


client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
evaluator: Evaluator = client.evaluator  # this object is an Evaluator resource
```

## Managing Secrets for Agent Endpoints

If your agent endpoint requires authentication, configure `api_key_secret` on the `Agent`.

For local `evaluator.run(...)` calls, `api_key_secret` must name an environment variable available to the local Python process. For remote `evaluator.submit(...)` jobs, it must name a NeMo platform secret in the target workspace. See [Model API Authentication](/documentation/evaluate-models/metrics/model-configuration#model-api-authentication) for the local-versus-remote behavior.
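Because a local run reads the key from the process environment, it can help to fail fast when the variable is missing. The check below is a minimal sketch using only the standard library; `require_env_secret` is a hypothetical helper, not part of the SDK.

```python
import os


def require_env_secret(name: str) -> str:
    """Return the value of the environment variable `name`, or raise if unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "export it before calling evaluator.run(...)."
        )
    return value
```

For example, calling `require_env_secret("MY_AGENT_API_KEY")` before `evaluator.run(...)` surfaces a missing key before any request is sent.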

For remote `evaluator.submit(...)` jobs, create the secret in the platform workspace before submitting the job:

```python
client.secrets.create(
    name="my-agent-api-key",
    value=os.environ["MY_AGENT_API_KEY"],
)
```

For remote jobs, the secret name may be either a workspace-local name such as `"my-agent-api-key"` or a full reference such as `"my-workspace/my-agent-api-key"`.

## Generic Agent

A generic agent is any HTTP endpoint that:

1. Accepts a `POST` request with `Content-Type: application/json`.
2. Returns a JSON response containing the answer and, optionally, a trajectory.

You control the request shape with `body` and extract values from the response with JSONPath expressions.

### Generic Agent Fields

| Field             | Required | Type   | Description                                                                                                                             |
| ----------------- | -------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------- |
| `url`             | Yes      | string | Base URL of the agent endpoint.                                                                                                         |
| `name`            | Yes      | string | Agent name or identifier.                                                                                                               |
| `format`          | No       | string | `generic` (default) or `nemo_agent_toolkit`.                                                                                            |
| `api_key_secret`  | No       | string | API key reference. See [Model API Authentication](/documentation/evaluate-models/metrics/model-configuration#model-api-authentication). |
| `body`            | Yes      | dict   | Jinja template for the request payload. Use `{{ prompt }}`, `{{ messages }}`, or fields from the rendered prompt context.               |
| `response_path`   | Yes      | string | [JSONPath](https://datatracker.ietf.org/doc/html/rfc9535) expression to extract the response text.                                      |
| `trajectory_path` | No       | string | JSONPath expression to extract the trajectory.                                                                                          |
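
To make the roles of `body`, `response_path`, and `trajectory_path` concrete, the sketch below mimics the flow with the standard library only. It is not the Evaluator's implementation: `render_body` and `extract` are hypothetical helpers, the substitution handles only the literal `{{ prompt }}` placeholder rather than full Jinja, and the resolver supports only simple paths such as `$.a.b[0]` rather than all of RFC 9535.

```python
import json
import re

_PROMPT = re.compile(r"\{\{\s*prompt\s*\}\}")
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def render_body(body, prompt: str):
    """Substitute `{{ prompt }}` in every string value (stand-in for Jinja rendering)."""
    if isinstance(body, str):
        # A function replacement avoids backslash-escape handling in the prompt text.
        return _PROMPT.sub(lambda _match: prompt, body)
    if isinstance(body, dict):
        return {key: render_body(value, prompt) for key, value in body.items()}
    if isinstance(body, list):
        return [render_body(value, prompt) for value in body]
    return body


def extract(document, path: str):
    """Resolve a simple JSONPath such as `$.answer` or `$.steps[0].result`."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    current, pos = document, 1
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at offset {pos}: {path!r}")
        name, index = match.groups()
        current = current[name] if name is not None else current[int(index)]
        pos = match.end()
    return current


# Render the request body from a prompt, then pull fields out of a sample reply.
request_body = render_body(
    {"question": "{{ prompt }}"},
    "Question: What is the capital of France?\nAnswer:",
)
agent_reply = json.loads(
    '{"answer": "Paris", "reasoning_steps": '
    '[{"step": "search", "result": "Found relevant documents"}]}'
)
response_text = extract(agent_reply, "$.answer")
trajectory = extract(agent_reply, "$.reasoning_steps")
```

Here `response_text` holds the string scored by the metric, and `trajectory` holds the list of intermediate steps.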

### Run a Generic Agent Evaluation

```python
from nemo_evaluator_sdk import Agent, ExactMatchMetric, RunConfigOnline

metric = ExactMatchMetric(reference="{{item.expected_answer}}")
agent = Agent(
    url="https://my-agent.example.com/invoke",
    name="qa-agent",
    format="generic",
    # Local run: names an environment variable in this Python process.
    api_key_secret="MY_AGENT_API_KEY",
    # The rendered prompt is sent as the "question" field of the JSON body.
    body={"question": "{{ prompt }}"},
    # JSONPath expressions applied to the agent's JSON response.
    response_path="$.answer",
    trajectory_path="$.reasoning_steps",
)

result = evaluator.run(
    metric=metric,
    dataset=[
        {"question": "What is the capital of France?", "expected_answer": "Paris"},
    ],
    config=RunConfigOnline(parallelism=4, request_timeout=60, max_retries=2),
    target=agent,
    prompt_template="Question: {{item.question}}\nAnswer:",
)

for score in result.aggregate_scores.scores:
    print(f"{score.name}: mean={score.mean}")
```

For a durable remote job, call `evaluator.submit(...)` with the same arguments, but set `api_key_secret` to the name of a platform secret in the target workspace.

### Example Generic Agent Endpoint

Your agent endpoint might look like this:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class AgentRequest(BaseModel):
    question: str


class AgentResponse(BaseModel):
    answer: str
    reasoning_steps: list[dict]


@app.post("/invoke")
async def invoke(request: AgentRequest) -> AgentResponse:
    return AgentResponse(
        answer="Paris",
        reasoning_steps=[
            {"step": "search", "result": "Found relevant documents"},
            {"step": "synthesize", "result": "Generated answer from context"},
        ],
    )
```
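
Before pointing an evaluation at an endpoint like this, it can be useful to confirm that it honors the request and response contract. The sketch below builds the JSON `POST` with the standard library; `build_invoke_request` and `invoke` are hypothetical helpers, and the URL `http://localhost:8000/invoke` assumes the app is served locally on port 8000. No authentication header is set, since how the key is transmitted depends on your agent.

```python
import json
import urllib.request


def build_invoke_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying `payload` as a JSON body."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_invoke_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Assumed local address; adjust to wherever the agent is actually served.
request = build_invoke_request(
    "http://localhost:8000/invoke",
    {"question": "What is the capital of France?"},
)
```

With the server running, `invoke(...)` with the same arguments should return a dictionary containing `answer` and `reasoning_steps`, which are the keys that `response_path="$.answer"` and `trajectory_path="$.reasoning_steps"` select.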

## NeMo Agent Toolkit Agent

Use the `nemo_agent_toolkit` format when evaluating agents built with the [NeMo Agent Toolkit](https://docs.nvidia.com/nemo/agent-toolkit/latest/index.html). This format uses the NAT streaming protocol:

1. Sends a POST to `{url}/generate/full?filter_steps=none` with `{"input_message": "<text>"}`.
2. Reads the SSE (Server-Sent Events) stream.
3. Extracts the final `value` from the last SSE `data:` chunk.
4. Returns it as the agent response.
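
The stream-handling steps above can be sketched with the standard library. This is an illustration, not the Evaluator's code: `final_value_from_sse` is a hypothetical helper, and it assumes each `data:` payload is a JSON object with a `value` key, which may not match the actual NAT payload schema.

```python
import json
from typing import Iterable, Optional


def final_value_from_sse(lines: Iterable[str]) -> Optional[object]:
    """Return the `value` field of the last `data:` event in an SSE stream."""
    last_payload = None
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            chunk = line[len("data:"):]
            # The SSE format ignores one optional space after the colon.
            data_lines.append(chunk[1:] if chunk.startswith(" ") else chunk)
        elif line == "" and data_lines:
            # A blank line terminates the current event.
            last_payload = "\n".join(data_lines)
            data_lines = []
        # Other lines (comments, `event:`, `id:`) are ignored in this sketch.
    if data_lines:
        # Lenient choice: accept a final event that lacks a trailing blank line.
        last_payload = "\n".join(data_lines)
    if last_payload is None:
        return None
    return json.loads(last_payload).get("value")


# A sample stream with two events and a keep-alive comment in between.
stream = [
    'data: {"value": "Searching"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"value": "Paris"}\n',
    "\n",
]
answer = final_value_from_sse(stream)
```

Only the last event contributes to `answer`; earlier chunks are read and discarded.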

### NeMo Agent Toolkit Fields

| Field            | Required | Type   | Description                                                                                                                             |
| ---------------- | -------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------- |
| `url`            | Yes      | string | Base URL of the agent endpoint.                                                                                                         |
| `name`           | Yes      | string | Agent name or identifier.                                                                                                               |
| `format`         | Yes      | string | Set to `nemo_agent_toolkit`.                                                                                                            |
| `api_key_secret` | No       | string | API key reference. See [Model API Authentication](/documentation/evaluate-models/metrics/model-configuration#model-api-authentication). |

### Run a NAT Agent Evaluation

```python
from nemo_evaluator_sdk import Agent, ExactMatchMetric, RunConfigOnline

metric = ExactMatchMetric(reference="{{item.expected_answer}}")
agent = Agent(
    url="https://my-nat-agent.example.com",
    name="nat-research-agent",
    format="nemo_agent_toolkit",
    # Remote submit: names a platform secret in the target workspace.
    api_key_secret="my-nat-api-key",
)

job = evaluator.submit(
    metric=metric,
    dataset=[
        {"question": "What is the capital of France?", "expected_answer": "Paris"},
    ],
    config=RunConfigOnline(parallelism=4),
    target=agent,
    prompt_template={
        "messages": [
            {"role": "user", "content": "{{item.question}}"},
        ],
    },
)
job.wait_until_done()
result = job.get_result()
```

## Model vs Agent: When to Use Which

| Use Case                                                          | Use `Model` | Use `Agent` |
| ----------------------------------------------------------------- | :---------: | :---------: |
| Evaluate a standalone LLM endpoint                                |      x      |             |
| Evaluate an agentic system with tool use and multi-step reasoning |             |      x      |
| Evaluate a NeMo Agent Toolkit workflow                            |             |      x      |
| Evaluate a custom HTTP endpoint with non-standard response format |             |      x      |
| Use a standard chat completions API                               |      x      |             |

Online evaluations accept **either** a model **or** an agent as the request target, never both.

## Related

* [Model Configuration](/documentation/evaluate-models/metrics/model-configuration) - Inline model targets for LLM endpoints.
* [Agentic Evaluation Metrics](/documentation/evaluate-models/metrics/agentic-metrics) - Metrics for evaluating agent tool calling, goal accuracy, and trajectory.
* [LLM-as-a-Judge](/documentation/evaluate-models/metrics/llm-as-a-judge) - Custom judge-based evaluation with flexible scoring criteria.
* [Bring Your Own Metric](/documentation/evaluate-models/metrics/bring-your-own-metric) - Integrate custom evaluation endpoints.