Evaluator NeMo Platform SDK Resources

The nemo_evaluator_sdk package provides context-agnostic objects for defining metrics, datasets, evaluation configuration, and result handling. When you want to execute those evaluations through the NeMo Platform Evaluator plugin, use the Evaluator SDK resource mounted on the nemo_platform SDK. This page explains the NeMo Platform-specific objects used to run local plugin jobs, submit durable platform jobs, and retrieve evaluator job results.

Evaluator

The Evaluator resource is the sync SDK object for working with the Evaluator plugin on NeMo Platform. It is accessed directly from a NeMoPlatform instance:

import os

from nemo_evaluator.sdk import Evaluator
from nemo_platform import NeMoPlatform


client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
evaluator: Evaluator = client.evaluator  # this object is an Evaluator resource

The primary execution methods are run and submit. Use run when you want a local in-process plugin execution that returns a completed EvaluationResult. Use submit when you want to create a durable remote platform job and manage the job lifecycle separately.

Method	Description	Returns
`run()`	Runs one metric locally through the Evaluator plugin job runtime.	`EvaluationResult`
`submit()`	Submits one metric evaluation as a durable platform job.	`EvaluatorJobResource`
`plugin_status()`	Returns Evaluator plugin health information from the service.	`dict[str, object]`
`get_job_resource(job_name: str, workspace: str | None = None)`	Returns a resource for an existing Evaluator plugin job.	`EvaluatorJobResource`

The dataset argument accepts inline rows, local dataset paths, local glob paths, and fileset references with optional fragment selectors. Use config for evaluator runtime settings, aggregate_fields on result-returning calls to shape aggregate scores, and target plus prompt_template when the evaluator should generate model or agent responses before scoring.

`run()` arguments

Argument	Type	Required	Description
`metric`	`Metric`	Yes	Metric configuration used to score each row.
`dataset`	`PluginDatasetInput`	Yes	Inline rows, local dataset paths, local glob paths, or fileset references with optional fragment selectors.
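`config`	`RunConfig | RunConfigOnline | RunConfigOnlineModel | …`	No	Evaluator runtime settings.
`aggregate_fields`	`tuple[AggregateFieldName, ...] | None`	No	Shapes the aggregate scores in the returned result.
`target`	`Model | Agent | None`	No	Model or agent that generates responses before scoring.
`prompt_template`	`str | dict[str, Any] | None`	No	Prompt template used with `target` when the evaluator generates responses before scoring.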

`submit()` arguments

Argument	Type	Required	Description
`metric`	`Metric`	Yes	Metric configuration serialized into the durable platform job.
`dataset`	`PluginDatasetInput`	Yes	Inline rows, local dataset paths, local glob paths, or fileset references with optional fragment selectors.
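`config`	`RunConfig | RunConfigOnline | RunConfigOnlineModel | …`	No	Evaluator runtime settings.
`target`	`Model | ModelRef | Agent | …`	No	Model or agent that generates responses before scoring.
`prompt_template`	`str | dict[str, Any] | None`	No	Prompt template used with `target` when the evaluator generates responses before scoring.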

Run locally

from nemo_evaluator_sdk import ExactMatchMetric

# `evaluator` is the Evaluator resource created above.
# The metric compares each row's "output" against its "expected" value.
metric = ExactMatchMetric(reference="{{item.expected}}", candidate="{{item.output}}")
dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
]

# Runs in-process and returns a completed EvaluationResult.
result = evaluator.run(metric=metric, dataset=dataset)
print(result.aggregate_scores)
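
To make the expected score concrete, the sketch below works through exact match by hand for the same two rows. It is plain Python, not the SDK's implementation: it assumes the metric scores a row 1.0 when the rendered candidate equals the rendered reference and 0.0 otherwise, and that the aggregate is a mean over rows. The name `exact_match_rows` is hypothetical, and the actual shape of `result.aggregate_scores` may differ.

```python
from statistics import mean


def exact_match_rows(rows: list[dict[str, str]]) -> list[float]:
    """Score 1.0 when a row's output equals its expected value, else 0.0."""
    return [1.0 if row["output"] == row["expected"] else 0.0 for row in rows]


dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
]

row_scores = exact_match_rows(dataset)  # one score per row
mean_score = mean(row_scores)           # assumed aggregate: mean over rows
print(row_scores, mean_score)           # [1.0, 0.0] 0.5
```

Under these assumptions, one of the two rows matches, so the mean score is 0.5.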

Submit a platform job

from nemo_evaluator_sdk import ExactMatchMetric

# `evaluator` is the Evaluator resource created above.
metric = ExactMatchMetric(reference="{{item.expected}}", candidate="{{item.output}}")
dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
]

# Create a durable platform job, block until it finishes, then fetch the result.
job = evaluator.submit(metric=metric, dataset=dataset)
job.wait_until_done()
result = job.get_result()
print(result.aggregate_scores)
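
The local and durable paths differ only in how the result is obtained, so one possible way to switch between them is a small wrapper. This is a sketch, not part of the SDK: the function name `evaluate` and the `durable` flag are hypothetical, and it relies only on the `run`, `submit`, `wait_until_done`, and `get_result` calls described on this page.

```python
from typing import Any


def evaluate(evaluator: Any, metric: Any, dataset: Any, durable: bool = False) -> Any:
    """Return an evaluation result, either locally or through a durable job.

    `evaluator` is expected to be the Evaluator resource from `client.evaluator`.
    """
    if not durable:
        # Local in-process execution: the completed result is returned directly.
        return evaluator.run(metric=metric, dataset=dataset)

    # Durable platform job: submit, wait for a terminal status, fetch the result.
    job = evaluator.submit(metric=metric, dataset=dataset)
    job.wait_until_done()  # documented to raise if the job fails or times out
    return job.get_result()
```

A caller could use `evaluate(evaluator, metric, dataset)` while iterating locally and pass `durable=True` once the job should outlive the current process.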

AsyncEvaluator

The AsyncEvaluator resource provides the same Evaluator plugin surface for AsyncNeMoPlatform. Async methods must be awaited:

import os

from nemo_evaluator.sdk import AsyncEvaluator
from nemo_platform import AsyncNeMoPlatform


client = AsyncNeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
evaluator: AsyncEvaluator = client.evaluator

Method	Description	Returns
`run()`	Runs one metric locally through the Evaluator plugin job runtime.	`EvaluationResult`
`submit()`	Submits one metric evaluation as a durable platform job.	`AsyncEvaluatorJobResource`
`plugin_status()`	Returns Evaluator plugin health information from the service.	`dict[str, object]`
`get_job_resource(job_name: str, workspace: str | None = None)`	Returns a resource for an existing Evaluator plugin job.	`AsyncEvaluatorJobResource`

AsyncEvaluator.run() and AsyncEvaluator.submit() accept the same arguments as the sync methods above.

import asyncio

from nemo_evaluator_sdk import ExactMatchMetric

# `evaluator` is the AsyncEvaluator resource created above.
metric = ExactMatchMetric(reference="{{item.expected}}", candidate="{{item.output}}")
dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
]


async def main() -> None:
    # Each call on the async resource and its job handle must be awaited.
    job = await evaluator.submit(metric=metric, dataset=dataset)
    await job.wait_until_done()
    result = await job.get_result()
    print(result.aggregate_scores)


asyncio.run(main())
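
One reason to prefer the async resource is that several durable jobs can be awaited concurrently. The sketch below assumes the platform accepts multiple submissions from one client at the same time, which this page does not confirm. The helper name `evaluate_many` is hypothetical; it uses only the awaited `submit`, `wait_until_done`, and `get_result` calls shown above.

```python
import asyncio
from typing import Any


async def evaluate_many(evaluator: Any, metric: Any, datasets: list[Any]) -> list[Any]:
    """Submit one durable job per dataset and collect results in input order."""

    async def evaluate_one(dataset: Any) -> Any:
        job = await evaluator.submit(metric=metric, dataset=dataset)
        await job.wait_until_done()
        return await job.get_result()

    # asyncio.gather runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(evaluate_one(d) for d in datasets))
```

If any job fails, `asyncio.gather` propagates the first exception by default; passing `return_exceptions=True` would collect errors alongside results instead.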

EvaluatorJobResource

The EvaluatorJobResource is the sync job handle returned by Evaluator.submit. You can also reconnect to an existing job with Evaluator.get_job_resource.

Some of the most useful methods and properties are described below.

Method or property	Description
`name`	Returns the evaluator job name.
`job`	Returns the raw evaluator job payload captured at resource creation.
`get_job_status()`	Fetches the current evaluator job status from the Evaluator plugin API.
`check_if_complete(raise_if_not_complete: bool = False)`	Returns whether the job is complete. When `raise_if_not_complete` is `True`, raises for any status other than `completed`.
`wait_until_done()`	Polls the job until it reaches a terminal platform status. Raises if the job fails or times out.
`get_result(aggregate_fields=None)`	Downloads aggregate-score and row-score artifacts and returns an `EvaluationResult`. Optional `aggregate_fields` shapes the returned aggregate scores only.
`download_artifacts(path=None)`	Downloads and extracts the full job artifacts archive under a job-specific directory.
`as_async()`	Returns an `AsyncEvaluatorJobResource` view over the same job.
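
The methods in the table can be combined to pick up a job that was submitted earlier, for example from another process. The following is a sketch under stated assumptions: the helper name `collect_existing_job` and its `artifacts_path` parameter are hypothetical, and the argument types passed to `download_artifacts` are not documented here.

```python
from __future__ import annotations

from typing import Any


def collect_existing_job(
    evaluator: Any,
    job_name: str,
    workspace: str | None = None,
    artifacts_path: str | None = None,
) -> Any:
    """Reconnect to an existing Evaluator job and return its result."""
    job = evaluator.get_job_resource(job_name, workspace=workspace)

    # Only poll when the job has not already completed.
    if not job.check_if_complete():
        job.wait_until_done()  # documented to raise on failure or timeout

    # Optionally keep the full artifacts archive alongside the parsed result.
    if artifacts_path is not None:
        job.download_artifacts(path=artifacts_path)

    return job.get_result()
```

The job name would typically come from the `name` property of the handle returned by an earlier `submit` call.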

AsyncEvaluatorJobResource

The AsyncEvaluatorJobResource is the async job handle returned by AsyncEvaluator.submit. It mirrors EvaluatorJobResource, but status and result methods are awaited.
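
A sync handle can also be moved into async code through `as_async()`. The sketch below assumes the returned view exposes awaitable `wait_until_done` and `get_result` methods, in line with the mirroring described above. The helper name `result_via_async_view` is hypothetical.

```python
import asyncio
from typing import Any


async def result_via_async_view(sync_job: Any) -> Any:
    """Await a job's result starting from a sync EvaluatorJobResource."""
    async_job = sync_job.as_async()    # async view over the same job
    await async_job.wait_until_done()  # awaited, unlike the sync handle
    return await async_job.get_result()
```

With a real handle, this would be driven by something like `asyncio.run(result_via_async_view(job))`, where `job` is the value returned by `Evaluator.submit`.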