Evaluator NeMo Platform SDK Resources

The nemo_evaluator_sdk package provides context-agnostic objects for defining metrics, datasets, evaluation configuration, and result handling. When you want to execute those evaluations through the NeMo Platform Evaluator plugin, use the Evaluator SDK resource mounted on the nemo_platform SDK. This page explains the NeMo Platform-specific objects used to run local plugin jobs, submit durable platform jobs, and retrieve evaluator job results.

Evaluator

The Evaluator resource is the sync SDK object for working with the Evaluator plugin on NeMo Platform. It is accessed directly from a NeMoPlatform instance:

    import os
    from nemo_evaluator.sdk import Evaluator
    from nemo_platform import NeMoPlatform


    client = NeMoPlatform(
        base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
        workspace="default",
    )
    evaluator: Evaluator = client.evaluator  # this object is an Evaluator resource

The primary execution methods are run and submit. Use run when you want a local in-process plugin execution that returns a completed EvaluationResult. Use submit when you want to create a durable remote platform job and manage the job lifecycle separately.

| Method | Description | Returns |
| --- | --- | --- |
| `run()` | Runs one metric locally through the Evaluator plugin job runtime. | `EvaluationResult` |
| `submit()` | Submits one metric evaluation as a durable platform job. | `EvaluatorJobResource` |
| `plugin_status()` | Returns Evaluator plugin health information from the service. | `dict[str, object]` |
| `get_job_resource(job_name: str, workspace: str \| None = None)` | Returns a resource for an existing Evaluator plugin job. | |

The dataset argument accepts inline rows, local dataset paths, local glob paths, and fileset references with optional fragment selectors. Use config for evaluator runtime settings, aggregate_fields on result-returning calls to shape aggregate scores, and target plus prompt_template when the evaluator should generate model or agent responses before scoring.

run() arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `metric` | `Metric` | Yes | Metric configuration used to score each row. |
| `dataset` | `PluginDatasetInput` | Yes | Inline rows, local dataset paths, local glob paths, or fileset references with optional fragment selectors. |
| `config` | `RunConfig \| RunConfigOnline \| RunConfigOnlineModel \| …` | | |
| `aggregate_fields` | `tuple[AggregateFieldName, ...] \| None` | No | |
| `target` | `Model \| Agent \| None` | | |
| `prompt_template` | `str \| dict[str, Any] \| None` | | |

submit() arguments

| Argument | Type | Required | Description |
| --- | --- | --- | --- |
| `metric` | `Metric` | Yes | Metric configuration serialized into the durable platform job. |
| `dataset` | `PluginDatasetInput` | Yes | Inline rows, local dataset paths, local glob paths, or fileset references with optional fragment selectors. |
| `config` | `RunConfig \| RunConfigOnline \| RunConfigOnlineModel \| …` | | |
| `target` | `Model \| ModelRef \| Agent \| …` | | |
| `prompt_template` | `str \| dict[str, Any] \| None` | | |

Run locally

    from nemo_evaluator_sdk import ExactMatchMetric


    # Each template is resolved against one dataset row at a time.
    metric = ExactMatchMetric(reference="{{item.expected}}", candidate="{{item.output}}")
    dataset = [
        {"expected": "Paris", "output": "Paris"},
        {"expected": "Berlin", "output": "Munich"},
    ]

    # `evaluator` is the Evaluator resource obtained from the NeMoPlatform client above.
    result = evaluator.run(metric=metric, dataset=dataset)
    print(result.aggregate_scores)
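
To make the scoring in this example concrete, the following is a minimal pure-Python sketch of what an exact-match comparison over these two rows amounts to. It does not use the SDK and is not the SDK's implementation: the `render` and `exact_match_scores` helpers are hypothetical, and the placeholder handling is an assumption that only covers the simple `{{item.<field>}}` form shown above. The shape and field names of the real `aggregate_scores` may differ.

```python
import re
from statistics import mean

# Hypothetical stand-in for template resolution; the SDK's renderer may differ.
_PLACEHOLDER = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def render(template: str, item: dict[str, str]) -> str:
    """Replace each {{item.<field>}} placeholder with that row's value."""
    return _PLACEHOLDER.sub(lambda m: str(item[m.group(1)]), template)


def exact_match_scores(
    rows: list[dict[str, str]], reference: str, candidate: str
) -> list[float]:
    """Score 1.0 when the rendered reference equals the rendered candidate."""
    return [
        1.0 if render(reference, row) == render(candidate, row) else 0.0
        for row in rows
    ]


dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
]

row_scores = exact_match_scores(dataset, "{{item.expected}}", "{{item.output}}")
print(row_scores)        # [1.0, 0.0]
print(mean(row_scores))  # 0.5
```

The first row matches and the second does not, so a mean over the row scores comes out at 0.5 for this dataset.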

Submit a platform job

    from nemo_evaluator_sdk import ExactMatchMetric


    metric = ExactMatchMetric(reference="{{item.expected}}", candidate="{{item.output}}")
    dataset = [
        {"expected": "Paris", "output": "Paris"},
        {"expected": "Berlin", "output": "Munich"},
    ]

    # submit() returns a job handle immediately; the evaluation runs as a platform job.
    job = evaluator.submit(metric=metric, dataset=dataset)
    job.wait_until_done()  # blocks until the job reaches a terminal status
    result = job.get_result()
    print(result.aggregate_scores)

AsyncEvaluator

The AsyncEvaluator resource provides the same Evaluator plugin surface for AsyncNeMoPlatform. Async methods must be awaited:

    import os
    from nemo_evaluator.sdk import AsyncEvaluator
    from nemo_platform import AsyncNeMoPlatform


    client = AsyncNeMoPlatform(
        base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
        workspace="default",
    )
    evaluator: AsyncEvaluator = client.evaluator

| Method | Description | Returns |
| --- | --- | --- |
| `run()` | Runs one metric locally through the Evaluator plugin job runtime. | `EvaluationResult` |
| `submit()` | Submits one metric evaluation as a durable platform job. | `AsyncEvaluatorJobResource` |
| `plugin_status()` | Returns Evaluator plugin health information from the service. | `dict[str, object]` |
| `get_job_resource(job_name: str, workspace: str \| None = None)` | Returns a resource for an existing Evaluator plugin job. | |

AsyncEvaluator.run() and AsyncEvaluator.submit() accept the same arguments as the sync methods above.

    import asyncio

    from nemo_evaluator_sdk import ExactMatchMetric


    metric = ExactMatchMetric(reference="{{item.expected}}", candidate="{{item.output}}")
    dataset = [
        {"expected": "Paris", "output": "Paris"},
        {"expected": "Berlin", "output": "Munich"},
    ]


    async def main() -> None:
        # `evaluator` is the AsyncEvaluator resource obtained from the client above.
        job = await evaluator.submit(metric=metric, dataset=dataset)
        await job.wait_until_done()
        result = await job.get_result()
        print(result.aggregate_scores)


    asyncio.run(main())

EvaluatorJobResource

The EvaluatorJobResource is the sync job handle returned by Evaluator.submit. You can also reconnect to an existing job with Evaluator.get_job_resource.

Some of the most useful methods and properties are described below.

| Method or property | Description |
| --- | --- |
| `name` | Returns the evaluator job name. |
| `job` | Returns the raw evaluator job payload captured at resource creation. |
| `get_job_status()` | Fetches the current evaluator job status from the Evaluator plugin API. |
| `check_if_complete(raise_if_not_complete: bool = False)` | Returns whether the job is complete. When `raise_if_not_complete` is true, raises for any status other than completed. |
| `wait_until_done()` | Polls the job until it reaches a terminal platform status. Raises if the job fails or times out. |
| `get_result(aggregate_fields=None)` | Downloads aggregate-score and row-score artifacts and returns an `EvaluationResult`. Optional `aggregate_fields` shapes the returned aggregate scores only. |
| `download_artifacts(path=None)` | Downloads and extracts the full job artifacts archive under a job-specific directory. |
| `as_async()` | Returns an `AsyncEvaluatorJobResource` view over the same job. |
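
The polling behavior described for `wait_until_done()` can be pictured with a small standalone loop. This is a sketch under stated assumptions rather than the SDK's implementation: `poll_until_done` is a hypothetical helper, the status callable stands in for `get_job_status()`, and of the status names used here only "completed" appears in this page. The others ("pending", "running", "failed", "cancelled"), the interval, and the timeout are illustrative.

```python
import time
from collections.abc import Callable

# Assumed terminal status names; only "completed" is confirmed by this page.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_until_done(
    get_status: Callable[[], str],
    *,
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> str:
    """Poll get_status() until a terminal status; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "completed":
            return status
        if status in TERMINAL_STATUSES:
            raise RuntimeError(f"job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        time.sleep(interval_s)


# Stub status source standing in for a real job handle.
statuses = iter(["pending", "running", "completed"])
final_status = poll_until_done(lambda: next(statuses), interval_s=0.0)
print(final_status)  # completed
```

Using a monotonic clock for the deadline keeps the timeout unaffected by wall-clock adjustments while the loop is waiting.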

AsyncEvaluatorJobResource

The AsyncEvaluatorJobResource is the async job handle returned by AsyncEvaluator.submit. It mirrors EvaluatorJobResource, but status and result methods are awaited.
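
Because the async handle's methods are awaited, several jobs can be waited on concurrently. The sketch below illustrates that pattern with `asyncio.gather`, using a hypothetical `StubAsyncJob` class in place of real `AsyncEvaluatorJobResource` objects so that it runs without a platform. The stub only borrows the method names `wait_until_done` and `get_result` from this page, and its plain-dict result is an assumption standing in for an `EvaluationResult`.

```python
import asyncio


class StubAsyncJob:
    """Hypothetical stand-in that mimics the awaited job-handle methods."""

    def __init__(self, name: str, row_scores: list[float]) -> None:
        self.name = name
        self._row_scores = row_scores

    async def wait_until_done(self) -> None:
        # A real handle would poll the platform; the stub just yields control.
        await asyncio.sleep(0)

    async def get_result(self) -> dict[str, float]:
        # A real handle returns an EvaluationResult; a dict keeps the stub simple.
        return {"mean": sum(self._row_scores) / len(self._row_scores)}


async def collect(job: StubAsyncJob) -> tuple[str, dict[str, float]]:
    """Wait for one job, then fetch its result."""
    await job.wait_until_done()
    return job.name, await job.get_result()


async def main() -> dict[str, dict[str, float]]:
    jobs = [
        StubAsyncJob("job-a", [1.0, 0.0]),
        StubAsyncJob("job-b", [1.0, 1.0]),
    ]
    # gather() runs the per-job coroutines concurrently and preserves input order.
    pairs = await asyncio.gather(*(collect(job) for job in jobs))
    return dict(pairs)


results = asyncio.run(main())
print(results)  # {'job-a': {'mean': 0.5}, 'job-b': {'mean': 1.0}}
```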