Manage Metrics

Instantiate the metric class you want to run, then pass it to evaluator.run(...) or evaluator.submit(...) together with a dataset and an optional configuration.

Initialize the SDK

import os

from nemo_evaluator.sdk import Evaluator
from nemo_platform import NeMoPlatform


client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
evaluator: Evaluator = client.evaluator  # this object is an Evaluator resource

Create Metric Objects Inline

Metric objects are normal Python objects from nemo_evaluator_sdk.metrics.*. Keep them close to the evaluation code so the definition, dataset fields, and execution request stay in sync.
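
One way to keep the metric definition and the dataset fields in sync is to check, before running anything, that every row carries the fields the templates refer to. The following is a minimal sketch in plain Python, not part of the SDK: the helper names `referenced_fields` and `rows_missing_fields` are hypothetical, and the regular expression assumes only simple `{{item.<field>}}` placeholders.

```python
import re

# Assumption: templates only use simple placeholders such as
# {{item.expected}} or {{ item.output }}.
ITEM_FIELD = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def referenced_fields(*templates: str) -> set:
    """Collect every item field named in the given template strings."""
    fields = set()
    for template in templates:
        fields.update(ITEM_FIELD.findall(template))
    return fields


def rows_missing_fields(dataset, *templates: str) -> dict:
    """Map row index -> set of fields the templates need but the row lacks."""
    needed = referenced_fields(*templates)
    problems = {}
    for index, row in enumerate(dataset):
        missing = needed - set(row)
        if missing:
            problems[index] = missing
    return problems


dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin"},  # "output" is missing on purpose
]
problems = rows_missing_fields(dataset, "{{item.expected}}", "{{item.output}}")
print(problems)  # {1: {'output'}}
```

A check like this can run in the same module as the metric definition, so a renamed dataset column fails fast instead of surfacing during evaluation.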

from nemo_evaluator_sdk import ExactMatchMetric

metric = ExactMatchMetric(
    reference="{{item.expected}}",
    candidate="{{item.output}}",
)

result = evaluator.run(
    metric=metric,
    dataset=[
        {"expected": "Paris", "output": "Paris"},
        {"expected": "Berlin", "output": "Munich"},
    ],
)

for score in result.aggregate_scores.scores:
    print(f"{score.name}: mean={score.mean}")

Use run for fast local execution while developing a metric. Use submit for durable remote execution through the platform job service.
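
To make the template fields and the aggregate mean in the exact-match example concrete, here is a plain-Python sketch of what such a comparison conceptually computes for the two-row dataset. It is an illustration under stated assumptions, not the SDK's implementation: `render` and `exact_match_scores` are hypothetical helpers, only simple `{{item.<field>}}` placeholders are handled, and the real metric may normalize strings differently.

```python
import re
from statistics import mean

# Assumption: only simple {{item.<field>}} placeholders are supported here.
PLACEHOLDER = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def render(template: str, item: dict) -> str:
    """Replace each {{item.<field>}} placeholder with the row's value."""
    return PLACEHOLDER.sub(lambda match: str(item[match.group(1)]), template)


def exact_match_scores(dataset, reference: str, candidate: str) -> list:
    """Score 1.0 when the rendered strings are identical, else 0.0."""
    return [
        1.0 if render(reference, item) == render(candidate, item) else 0.0
        for item in dataset
    ]


dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
]
scores = exact_match_scores(dataset, "{{item.expected}}", "{{item.output}}")
print(scores, mean(scores))  # [1.0, 0.0] 0.5
```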

Reuse a Metric Definition

Because metrics are inline objects, reuse is usually just a Python helper function or module-level factory.

from nemo_evaluator_sdk import F1Metric

def answer_f1_metric() -> F1Metric:
    return F1Metric(
        reference="{{item.expected_answer}}",
        candidate="{{item.generated_answer}}",
        description="Token-level F1 between expected and generated answers.",
    )


metric = answer_f1_metric()
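
For readers unfamiliar with token-level F1, the sketch below shows the common definition: the harmonic mean of token precision and recall over the overlapping tokens. It is a standalone illustration and makes its own assumptions (lowercasing and whitespace tokenization); the SDK's F1Metric may tokenize or normalize text differently.

```python
from collections import Counter


def token_f1(reference: str, candidate: str) -> float:
    """Token-level F1 over lowercase, whitespace-split tokens (an assumption)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        # Two empty strings count as a match; one empty side scores zero.
        return float(ref_tokens == cand_tokens)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = token_f1("the capital is Paris", "Paris is the capital of France")
print(round(score, 3))  # 0.8
```

In this example all four reference tokens appear in the six-token candidate, so recall is 1.0, precision is 4/6, and the F1 score is 0.8.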

Choose Metric Classes

Use the metric-specific pages for configuration details and examples:

| Metric family | Common classes |
| --- | --- |
| Similarity | ExactMatchMetric, F1Metric, BLEUMetric, ROUGEMetric, StringCheckMetric, NumberCheckMetric |
| LLM-as-a-Judge | LLMJudgeMetric |
| RAG and agentic | FaithfulnessMetric, ResponseRelevancyMetric, TopicAdherenceMetric, ToolCallingMetric, and related RAGAS-backed classes |
| Custom endpoints | Remote metric classes from nemo_evaluator_sdk.metrics.remote |

Configure Runtime Parameters

Pass execution settings through the config argument.

from nemo_evaluator_sdk import RunConfig

config = RunConfig(parallelism=4, limit_samples=100)
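
The sketch below illustrates one plausible reading of these two settings; it is an assumption, not documented SDK behavior. It assumes limit_samples caps how many dataset rows are scored and parallelism bounds the number of concurrent workers. The `score_row` and `run_with_limits` functions are hypothetical stand-ins written with the standard library only.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Optional


def score_row(row: dict) -> float:
    """Hypothetical stand-in scorer: exact match on two fields."""
    return 1.0 if row["expected"] == row["output"] else 0.0


def run_with_limits(dataset, parallelism: int, limit_samples: Optional[int] = None):
    """Score at most limit_samples rows with up to parallelism worker threads."""
    # islice with a stop of None keeps every row.
    rows = list(islice(dataset, limit_samples))
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # Executor.map returns results in input order, regardless of timing.
        return list(pool.map(score_row, rows))


dataset = [
    {"expected": "Paris", "output": "Paris"},
    {"expected": "Berlin", "output": "Munich"},
    {"expected": "Rome", "output": "Rome"},
]
print(run_with_limits(dataset, parallelism=4, limit_samples=2))  # [1.0, 0.0]
```

A small limit_samples value paired with run is a cheap way to smoke-test a metric before scoring a full dataset.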

For online evaluations, provide a model or agent target and use the online parameter classes described in Model Configuration and Agent Configuration.

Submit a Durable Job

from nemo_evaluator_sdk import RunConfig, ExactMatchMetric

metric = ExactMatchMetric(reference="{{item.expected}}", candidate="{{item.output}}")

job = evaluator.submit(
    metric=metric,
    dataset=[
        {"expected": "Paris", "output": "Paris"},
        {"expected": "Berlin", "output": "Munich"},
    ],
    config=RunConfig(parallelism=4),
)

job.wait_until_done()
result = job.get_result()

Related Topics

  • Metric Results - Work with EvaluationResult, aggregate scores, and row scores
  • Manage Metric Jobs - Submit, monitor, reconnect to, and download job results
  • Similarity Metrics - Configure exact match, F1, BLEU, ROUGE, and string/number checks