Manage Metrics | NVIDIA NeMo Platform

Instantiate the metric class you want to run, then pass it, together with a dataset and an optional configuration, to evaluator.run(...) or evaluator.submit(...).

Initialize the SDK

import os

from nemo_evaluator.sdk import Evaluator
from nemo_platform import NeMoPlatform


client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
evaluator: Evaluator = client.evaluator  # this object is an Evaluator resource

Create Metric Objects Inline

Metric objects are normal Python objects from nemo_evaluator_sdk.metrics.*. Keep them close to the evaluation code so the definition, dataset fields, and execution request stay in sync.

from nemo_evaluator_sdk import ExactMatchMetric

metric = ExactMatchMetric(
    reference="{{item.expected}}",
    candidate="{{item.output}}",
)

result = evaluator.run(
    metric=metric,
    dataset=[
        {"expected": "Paris", "output": "Paris"},
        {"expected": "Berlin", "output": "Munich"},
    ],
)

for score in result.aggregate_scores.scores:
    print(f"{score.name}: mean={score.mean}")
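To make the scoring in this example concrete, the following is a minimal standalone sketch of what an exact-match metric over templated fields could compute. It does not use the SDK: the `render` and `exact_match_scores` helpers are hypothetical, the placeholder syntax is inferred from the `{{item.expected}}` strings above, and the strict string comparison is an assumption (the real metric may normalize case or whitespace).

```python
import re
from statistics import mean

# Matches placeholders such as {{item.expected}}; syntax inferred from the example above.
_PLACEHOLDER = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def render(template: str, item: dict) -> str:
    """Replace each {{item.<field>}} placeholder with the row's value."""
    return _PLACEHOLDER.sub(lambda m: str(item[m.group(1)]), template)


def exact_match_scores(reference: str, candidate: str, dataset: list) -> list:
    """Score 1.0 when the rendered strings are identical, else 0.0 (assumed semantics)."""
    return [
        1.0 if render(reference, row) == render(candidate, row) else 0.0
        for row in dataset
    ]


dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
]
scores = exact_match_scores("{{item.expected}}", "{{item.output}}", dataset)
print(scores, mean(scores))  # prints: [1.0, 0.0] 0.5
```

Under these assumptions the first row matches and the second does not, so the mean is 0.5.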

Use run for fast local execution while developing a metric. Use submit for durable remote execution through the platform job service.
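Before sending a dataset down either path, it can help to confirm that every field a template references exists in each row. The sketch below is a plain-Python pre-flight check, not an SDK feature: `missing_fields` is a hypothetical helper, and the placeholder pattern is inferred from the `{{item.<field>}}` strings used on this page.

```python
import re

# Placeholder syntax inferred from the templates on this page, e.g. {{item.expected}}.
_PLACEHOLDER = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def missing_fields(templates: list, dataset: list) -> dict:
    """Map row index -> set of template fields that the row does not provide."""
    required = {name for t in templates for name in _PLACEHOLDER.findall(t)}
    problems = {}
    for index, row in enumerate(dataset):
        absent = required - set(row)
        if absent:
            problems[index] = absent
    return problems


rows = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin"},  # "output" is missing on purpose
]
print(missing_fields(["{{item.expected}}", "{{item.output}}"], rows))
# prints: {1: {'output'}}
```

An empty dictionary means every row supplies all referenced fields.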

Reuse a Metric Definition

Because metrics are inline objects, reuse is usually just a Python helper function or module-level factory.

from nemo_evaluator_sdk import F1Metric

def answer_f1_metric() -> F1Metric:
    return F1Metric(
        reference="{{item.expected_answer}}",
        candidate="{{item.generated_answer}}",
        description="Token-level F1 between expected and generated answers.",
    )


metric = answer_f1_metric()
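For readers unfamiliar with token-level F1, here is a minimal standalone sketch of the usual calculation: the harmonic mean of token precision and recall. It is not the SDK's implementation. The `token_f1` function is hypothetical, and lowercasing plus whitespace tokenization are assumptions; the real `F1Metric` may tokenize or normalize differently.

```python
from collections import Counter


def token_f1(reference: str, candidate: str) -> float:
    """Harmonic mean of token precision and recall over whitespace tokens."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(ref_tokens == cand_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("the capital is Paris", "Paris is the capital"))  # same tokens, different order
print(token_f1("Paris", "Paris France"))  # one extra candidate token lowers precision
```

Word order does not affect the score, which is why F1 is more forgiving than exact match for free-form answers.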

Choose Metric Classes

Use the metric-specific pages for configuration details and examples:

| Metric family | Common classes |
| --- | --- |
| Similarity | `ExactMatchMetric`, `F1Metric`, `BLEUMetric`, `ROUGEMetric`, `StringCheckMetric`, `NumberCheckMetric` |
| LLM-as-a-Judge | `LLMJudgeMetric` |
| RAG and agentic | `FaithfulnessMetric`, `ResponseRelevancyMetric`, `TopicAdherenceMetric`, `ToolCallingMetric`, and related RAGAS-backed classes |
| Custom endpoints | Remote metric classes from `nemo_evaluator_sdk.metrics.remote` |

Configure Runtime Parameters

Pass execution settings, such as parallelism and sample limits, through the config argument of evaluator.run(...) or evaluator.submit(...).

from nemo_evaluator_sdk import RunConfig

config = RunConfig(parallelism=4, limit_samples=100)
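The page does not spell out what these two settings do, so the sketch below illustrates one plausible reading in plain Python: `limit_samples` caps how many rows are scored, and `parallelism` bounds the number of concurrent workers. Both interpretations are assumptions, and `score_rows` is a hypothetical helper, not part of the SDK.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def score_rows(
    rows: list,
    score_fn: Callable[[dict], float],
    parallelism: int = 1,
    limit_samples: Optional[int] = None,
) -> list:
    """Score at most `limit_samples` rows using up to `parallelism` worker threads."""
    # Assumed meaning of limit_samples: keep only the first N rows.
    selected = rows if limit_samples is None else rows[:limit_samples]
    # Assumed meaning of parallelism: upper bound on concurrent scoring calls.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # Executor.map preserves input order, so scores line up with rows.
        return list(pool.map(score_fn, selected))


rows = [{"expected": str(i), "output": str(i % 2)} for i in range(10)]
scores = score_rows(
    rows,
    lambda r: float(r["expected"] == r["output"]),
    parallelism=4,
    limit_samples=3,
)
print(scores)  # prints: [1.0, 1.0, 0.0]
```

A small sample limit like this is handy while iterating on a metric, since only the first few rows are scored.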

For online evaluations, provide a model or agent target and use the online parameter classes described in Model Configuration and Agent Configuration.

Submit a Durable Job

from nemo_evaluator_sdk import RunConfig, ExactMatchMetric

metric = ExactMatchMetric(reference="{{item.expected}}", candidate="{{item.output}}")

job = evaluator.submit(
    metric=metric,
    dataset=[
        {"expected": "Paris", "output": "Paris"},
        {"expected": "Berlin", "output": "Munich"},
    ],
    config=RunConfig(parallelism=4),
)

job.wait_until_done()
result = job.get_result()
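Because run and submit take the same metric, dataset, and config arguments, one way to switch between local and durable execution is a thin wrapper. The sketch below is a hypothetical helper, not part of the SDK. It relies only on the calls shown on this page (`run`, `submit`, `wait_until_done`, `get_result`) and uses stub classes so that it runs without a platform connection.

```python
def evaluate(evaluator, metric, dataset, config=None, durable=False):
    """Run locally by default; submit a job and block on it when durable=True."""
    kwargs = {"metric": metric, "dataset": dataset}
    if config is not None:
        kwargs["config"] = config
    if not durable:
        return evaluator.run(**kwargs)
    job = evaluator.submit(**kwargs)
    job.wait_until_done()
    return job.get_result()


# Stand-ins for demonstration only; replace `stub` with the real evaluator object.
class _StubJob:
    def __init__(self, result):
        self._result = result

    def wait_until_done(self):
        pass

    def get_result(self):
        return self._result


class _StubEvaluator:
    def run(self, **kwargs):
        return ("run", sorted(kwargs))

    def submit(self, **kwargs):
        return _StubJob(("submit", sorted(kwargs)))


stub = _StubEvaluator()
print(evaluate(stub, metric="m", dataset=[]))
# prints: ('run', ['dataset', 'metric'])
print(evaluate(stub, metric="m", dataset=[], config="c", durable=True))
# prints: ('submit', ['config', 'dataset', 'metric'])
```

Keeping the switch in one place lets the same evaluation code serve both quick iteration and scheduled runs.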

Metric Results - Work with EvaluationResult, aggregate scores, and row scores
Manage Metric Jobs - Submit, monitor, reconnect to, and download job results
Similarity Metrics - Configure exact match, F1, BLEU, ROUGE, and string/number checks