# Evaluator NeMo Platform SDK Resources

<a id="evaluator-nmp-sdk-resources" />

The `nemo_evaluator_sdk` package provides context-agnostic objects for defining metrics, datasets, evaluation configuration, and result handling.
When you want to execute those evaluations through the NeMo Platform Evaluator plugin, use the Evaluator SDK resource mounted on the `nemo_platform` SDK.
This page explains the NeMo Platform-specific objects used to run local plugin jobs, submit durable platform jobs, and retrieve evaluator job results.

## Evaluator

The `Evaluator` resource is the sync SDK object for working with the Evaluator plugin on NeMo Platform.
It is accessed directly from a `NeMoPlatform` instance:

```python
import os
from nemo_evaluator.sdk import Evaluator
from nemo_platform import NeMoPlatform


client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
evaluator: Evaluator = client.evaluator  # this object is an Evaluator resource
```

The primary execution methods are `run` and `submit`.
Use `run` when you want a local in-process plugin execution that returns a completed `EvaluationResult`.
Use `submit` when you want to create a durable remote platform job and manage the job lifecycle separately.

| Method                                                           | Description                                                       | Returns                |
| ---------------------------------------------------------------- | ----------------------------------------------------------------- | ---------------------- |
| `run()`                                                          | Runs one metric locally through the Evaluator plugin job runtime. | `EvaluationResult`     |
| `submit()`                                                       | Submits one metric evaluation as a durable platform job.          | `EvaluatorJobResource` |
| `plugin_status()`                                                | Returns Evaluator plugin health information from the service.     | `dict[str, object]`    |
| `get_job_resource(job_name: str, workspace: str \| None = None)` | Returns a resource for an existing Evaluator plugin job.          | `EvaluatorJobResource` |

The `dataset` argument accepts inline rows, local dataset paths, local glob paths, and fileset references with optional fragment selectors. Use `config` for evaluator runtime settings, `aggregate_fields` on result-returning calls to shape aggregate scores, and `target` plus `prompt_template` when the evaluator should generate model or agent responses before scoring.

### `run()` arguments

| Argument           | Type                                                           | Required | Description                                                                                                 |
| ------------------ | -------------------------------------------------------------- | -------- | ----------------------------------------------------------------------------------------------------------- |
| `metric`           | `Metric`                                                       | Yes      | Metric configuration used to score each row.                                                                |
| `dataset`          | `PluginDatasetInput`                                           | Yes      | Inline rows, local dataset paths, local glob paths, or fileset references with optional fragment selectors. |
| `config`           | `RunConfig \| RunConfigOnline \| RunConfigOnlineModel \| None` | No       | Runtime settings such as sample limits, parallelism, timeouts, and retry behavior.                          |
| `aggregate_fields` | `tuple[AggregateFieldName, ...] \| None`                       | No       | Aggregate score fields to include in the returned result.                                                   |
| `target`           | `Model \| Agent \| None`                                       | No       | Model or agent target used when the evaluator should generate outputs before scoring.                       |
| `prompt_template`  | `str \| dict[str, Any] \| None`                                | No       | Prompt template used with `target` for online model or agent evaluation.                                    |

### `submit()` arguments

| Argument          | Type                                                           | Required | Description                                                                                                 |
| ----------------- | -------------------------------------------------------------- | -------- | ----------------------------------------------------------------------------------------------------------- |
| `metric`          | `Metric`                                                       | Yes      | Metric configuration serialized into the durable platform job.                                              |
| `dataset`         | `PluginDatasetInput`                                           | Yes      | Inline rows, local dataset paths, local glob paths, or fileset references with optional fragment selectors. |
| `config`          | `RunConfig \| RunConfigOnline \| RunConfigOnlineModel \| None` | No       | Runtime settings applied when the submitted job executes.                                                   |
| `target`          | `Model \| ModelRef \| Agent \| None`                           | No       | Model, model reference, or agent target used when the submitted job should generate outputs before scoring. |
| `prompt_template` | `str \| dict[str, Any] \| None`                                | No       | Prompt template used with `target` for online model or agent evaluation.                                    |

### Run locally

```python
from nemo_evaluator_sdk import ExactMatchMetric


metric = ExactMatchMetric(reference="{{item.expected}}", candidate="{{item.output}}")
dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
]

# `evaluator` is the Evaluator resource obtained from `client.evaluator` above.
result = evaluator.run(metric=metric, dataset=dataset)
print(result.aggregate_scores)
```
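
To make the expected outcome of the example above concrete, the following plain-Python sketch shows what an exact-match comparison over those two rows amounts to. It does not use the SDK: `render` and `exact_match_scores` are hypothetical helpers, the template handling only understands `{{item.<field>}}` placeholders, and the real `ExactMatchMetric` may normalize strings or report its aggregates differently.

```python
import re
from statistics import mean

# Hypothetical, simplified stand-in for template rendering. It only handles
# "{{item.<field>}}" placeholders and is NOT the SDK's template engine.
_PLACEHOLDER = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def render(template: str, row: dict) -> str:
    """Replace each {{item.field}} placeholder with the row's value."""
    return _PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


def exact_match_scores(rows: list, reference: str, candidate: str) -> list:
    """Score 1.0 when the rendered candidate equals the rendered reference."""
    return [
        1.0 if render(candidate, row) == render(reference, row) else 0.0
        for row in rows
    ]


dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
]

row_scores = exact_match_scores(dataset, "{{item.expected}}", "{{item.output}}")
mean_score = mean(row_scores)
print(row_scores, mean_score)  # [1.0, 0.0] 0.5
```

Under these assumptions the first row matches and the second does not, so one would expect a mean score of 0.5 for this dataset.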

### Submit a platform job

```python
from nemo_evaluator_sdk import ExactMatchMetric


metric = ExactMatchMetric(reference="{{item.expected}}", candidate="{{item.output}}")
dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
]

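# Create the durable job, wait for a terminal status, then fetch the results.
job = evaluator.submit(metric=metric, dataset=dataset)
job.wait_until_done()  # raises if the job fails or times out
result = job.get_result()
print(result.aggregate_scores)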
```

## AsyncEvaluator

The `AsyncEvaluator` resource provides the same Evaluator plugin surface for `AsyncNeMoPlatform`.
Async methods must be awaited:

```python
import os
from nemo_evaluator.sdk import AsyncEvaluator
from nemo_platform import AsyncNeMoPlatform


client = AsyncNeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
evaluator: AsyncEvaluator = client.evaluator
```

| Method                                                           | Description                                                       | Returns                     |
| ---------------------------------------------------------------- | ----------------------------------------------------------------- | --------------------------- |
| `run()`                                                          | Runs one metric locally through the Evaluator plugin job runtime. | `EvaluationResult`          |
| `submit()`                                                       | Submits one metric evaluation as a durable platform job.          | `AsyncEvaluatorJobResource` |
| `plugin_status()`                                                | Returns Evaluator plugin health information from the service.     | `dict[str, object]`         |
| `get_job_resource(job_name: str, workspace: str \| None = None)` | Returns a resource for an existing Evaluator plugin job.          | `AsyncEvaluatorJobResource` |

`AsyncEvaluator.run()` and `AsyncEvaluator.submit()` accept the same arguments as the sync methods [above](#run-arguments).

```python
import asyncio

from nemo_evaluator_sdk import ExactMatchMetric


metric = ExactMatchMetric(reference="{{item.expected}}", candidate="{{item.output}}")
dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
]


async def main() -> None:
    # `evaluator` is the AsyncEvaluator obtained from `client.evaluator` above.
    job = await evaluator.submit(metric=metric, dataset=dataset)
    await job.wait_until_done()
    result = await job.get_result()
    print(result.aggregate_scores)


asyncio.run(main())
```
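
One reason to prefer the async surface is that several submit-and-wait sequences can overlap. The sketch below illustrates that pattern with `asyncio.gather`. It is only an illustration: `FakeAsyncEvaluator` and `FakeAsyncJob` are hypothetical stand-ins that mimic the method names shown above so the example runs without a platform, their `get_result()` returns a plain float rather than an `EvaluationResult`, and whether the platform accepts concurrent submissions in your deployment is an assumption to verify.

```python
import asyncio


class FakeAsyncJob:
    """Hypothetical stand-in for AsyncEvaluatorJobResource (illustration only)."""

    def __init__(self, rows: list) -> None:
        self._rows = rows

    async def wait_until_done(self) -> None:
        await asyncio.sleep(0.01)  # simulate remote execution time

    async def get_result(self) -> float:
        # Simplified: return the exact-match rate, not an EvaluationResult.
        hits = sum(row["expected"] == row["output"] for row in self._rows)
        return hits / len(self._rows)


class FakeAsyncEvaluator:
    """Hypothetical stand-in exposing the same `submit` call shape."""

    async def submit(self, *, metric: object, dataset: list) -> FakeAsyncJob:
        return FakeAsyncJob(dataset)  # the stub ignores `metric`


async def evaluate(evaluator, label: str, rows: list) -> tuple:
    """Submit one dataset, wait for the job, and return (label, result)."""
    job = await evaluator.submit(metric="exact-match", dataset=rows)
    await job.wait_until_done()
    return label, await job.get_result()


datasets = {
    "capitals": [
        {"expected": "Paris", "output": "Paris"},
        {"expected": "Berlin", "output": "Munich"},
    ],
    "arithmetic": [{"expected": "4", "output": "4"}],
}


async def main() -> dict:
    evaluator = FakeAsyncEvaluator()
    # gather() runs the submit/wait/get_result sequences concurrently.
    pairs = await asyncio.gather(
        *(evaluate(evaluator, label, rows) for label, rows in datasets.items())
    )
    return dict(pairs)


results = asyncio.run(main())
print(results)
```

With a real `AsyncEvaluator`, the `evaluate` coroutine would pass an actual metric object and read fields such as `aggregate_scores` from the returned result.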

## EvaluatorJobResource

The `EvaluatorJobResource` is the sync job handle returned by `Evaluator.submit`.
You can also reconnect to an existing job with `Evaluator.get_job_resource`.

Some of the most useful methods and properties are described below.

| Method or property                                       | Description                                                                                                                                                 |
| -------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`                                                   | Returns the evaluator job name.                                                                                                                             |
| `job`                                                    | Returns the raw evaluator job payload captured at resource creation.                                                                                        |
| `get_job_status()`                                       | Fetches the current evaluator job status from the Evaluator plugin API.                                                                                     |
| `check_if_complete(raise_if_not_complete: bool = False)` | Returns whether the job is complete. When `raise_if_not_complete` is true, raises for any status other than `completed`.                                    |
| `wait_until_done()`                                      | Polls the job until it reaches a terminal platform status. Raises if the job fails or times out.                                                            |
| `get_result(aggregate_fields=None)`                      | Downloads aggregate-score and row-score artifacts and returns an `EvaluationResult`. Optional `aggregate_fields` shapes the returned aggregate scores only. |
| `download_artifacts(path=None)`                          | Downloads and extracts the full job artifacts archive under a job-specific directory.                                                                       |
| `as_async()`                                             | Returns an `AsyncEvaluatorJobResource` view over the same job.                                                                                              |
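
The polling behavior described for `wait_until_done()` can be sketched roughly as follows. This illustrates the general pattern and is not the SDK's implementation: `poll_until_done` is a hypothetical helper, the status source is simulated instead of calling `get_job_status()`, statuses are assumed to be plain strings, and the failure status names are placeholders (only `completed` is named in the table above).

```python
import time
from typing import Callable

# Assumed failure statuses. These names are placeholders for illustration;
# check the statuses your platform version actually reports.
FAILED_STATUSES = {"failed", "cancelled"}


def poll_until_done(
    get_status: Callable[[], str],
    *,
    interval: float = 0.01,
    timeout: float = 1.0,
) -> str:
    """Poll `get_status` until the job completes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "completed":
            return status
        if status in FAILED_STATUSES:
            raise RuntimeError(f"job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        time.sleep(interval)


# Simulated status sequence standing in for repeated job.get_job_status() calls.
statuses = iter(["pending", "running", "running", "completed"])
final_status = poll_until_done(lambda: next(statuses))
print(final_status)  # completed
```

In practice `wait_until_done()` already covers this loop; a hand-written version like this is mainly useful when you want custom intervals, logging between polls, or your own timeout handling.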

## AsyncEvaluatorJobResource

The `AsyncEvaluatorJobResource` is the async job handle returned by `AsyncEvaluator.submit`.
It mirrors `EvaluatorJobResource`, but status and result methods are awaited.