Evaluation Tutorials | NVIDIA NeMo Platform

Use these tutorials to learn how to run evaluations with NeMo Platform.

Before You Start

Set up a local instance of the platform for the following tutorials.

Run an LLM Judge Eval

Learn how to evaluate a fine-tuned model using the LLM Judge metric with a custom dataset.

intermediate custom evaluation llm judge nemo-evaluator

Define and Run Custom Python Metrics

Learn how to write a domain-specific Python metric, test it locally, and run it through the Evaluator service.

intermediate custom metric remote execution nemo-evaluator

How It Works

For the conceptual overview of how Evaluator separates definition (library) from execution (platform), see About Evaluating → How It Works. For runnable SDK examples, see SDK Resources.