Evaluation Tutorials

Use these tutorials to get familiar with evaluating models on NeMo Platform.

Before You Start

Set up a local instance of the platform for the following tutorials.

Run LLM-as-a-Judge Evaluation

Learn how to evaluate a fine-tuned model on a custom dataset using the LLM-as-a-Judge metric.

intermediate custom evaluation llm judge nemo-evaluator

Define and Run Custom Python Metrics

Learn how to write a domain-specific Python metric, test it locally, and run it through the Evaluator service.

intermediate custom metric remote execution nemo-evaluator
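
To give a sense of what "write a metric and test it locally" can look like before opening the tutorial, here is a minimal sketch of a domain-specific metric written as plain Python. It deliberately does not use the Evaluator SDK: the function names `normalized_exact_match` and `mean_score` are hypothetical, and the way a metric is registered and run through the Evaluator service is covered in the tutorial itself, not here.

```python
import re


def normalized_exact_match(prediction: str, reference: str) -> float:
    """Hypothetical metric: 1.0 if the texts match after normalization, else 0.0."""

    def normalize(text: str) -> str:
        # Lowercase, strip punctuation, and collapse runs of whitespace.
        text = re.sub(r"[^\w\s]", "", text.lower())
        return " ".join(text.split())

    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def mean_score(pairs: list[tuple[str, str]]) -> float:
    """Average the metric over (prediction, reference) pairs; 0.0 for empty input."""
    scores = [normalized_exact_match(p, r) for p, r in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Local smoke test with made-up examples, no platform required.
    samples = [("Paris.", " paris "), ("Lyon", "Paris")]
    print(mean_score(samples))
```

Keeping the scoring logic in a pure function like this makes it easy to check with a few hand-written examples locally before any remote execution is involved.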

How It Works

For the conceptual overview of how Evaluator separates definition (library) from execution (platform), see About Evaluating → How It Works. For runnable SDK examples, see SDK Resources.

Copyright © 2026, NVIDIA Corporation.