Deploy NemoGuard NIMs | NVIDIA NeMo Platform

NemoGuard NIMs are specialized models built for specific use cases supported by the Guardrails service. Learn how to deploy NemoGuard NIMs in your environment and apply them to a guardrail configuration.

NIM	Use Case
`nvidia/llama-3.1-nemotron-safety-guard-8b-v3`	Content safety: classifies inputs and outputs as safe or unsafe across 23 content categories
`nvidia/llama-3.1-nemoguard-8b-topic-control`	Topic control: restricts conversations to a defined set of allowed topics
`nvidia/nemoguard-jailbreak-detect`	Jailbreak detection: detects prompt injection and jailbreak attempts

Prerequisites

Before you begin:

You have access to a running NeMo Platform.
NMP_BASE_URL is set to the NeMo Platform base URL.
Your infrastructure has 1 GPU available per NIM deployment.
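
Step 4 appends a path to NMP_BASE_URL, so a trailing slash or a missing scheme in that variable would produce a malformed URL. The following is a minimal sanity check using only the standard library. The helper name `resolve_base_url` is hypothetical and is not part of the NeMo Platform SDK; the localhost fallback mirrors the default used in Step 1.

```python
import os
from urllib.parse import urlparse


def resolve_base_url(default: str = "http://localhost:8080") -> str:
    """Return NMP_BASE_URL (or the default) without a trailing slash.

    Hypothetical helper for illustration; not part of the NeMo Platform SDK.
    """
    url = os.environ.get("NMP_BASE_URL", default).strip().rstrip("/")
    parsed = urlparse(url)
    # Require an explicit http(s) scheme and a host so that later string
    # concatenation yields a usable URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"NMP_BASE_URL is not a valid http(s) URL: {url!r}")
    return url
```

Calling `resolve_base_url()` once at the start of a script surfaces a misconfigured environment before any deployment is created.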

Step 1: Configure the Client

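Instantiate the NeMoPlatform SDK client. The example reads the base URL from NMP_BASE_URL and falls back to http://localhost:8080 if the variable is unset.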

import os
from nemo_platform import NeMoPlatform

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

Step 2: Deploy the NIMs

Use the Platform’s Inference Gateway service to deploy each NIM. This process creates a DeploymentConfig that specifies the NIM image, and a Deployment that runs it.

Enabling KV cache reuse on the LLM-based NIMs can improve inference speed. The content-safety and topic-control examples enable this feature by setting NIM_ENABLE_KV_CACHE_REUSE=1 through the nim_deployment.additional_envs option; the jailbreak-detection example does not set it.
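
The `--nim-deployment` payloads in the examples that follow differ only in image name, image tag, and whether KV cache reuse is enabled. As a rough sketch, they could be generated with the standard library alone. `build_nim_deployment` is a hypothetical helper, and its field names are copied from the CLI examples on this page rather than from a schema reference.

```python
import json


def build_nim_deployment(image_name: str, image_tag: str,
                         kv_cache_reuse: bool = False, gpu: int = 1) -> str:
    """Return a JSON string for the --nim-deployment option (hypothetical helper)."""
    spec = {"gpu": gpu, "image_name": image_name, "image_tag": image_tag}
    if kv_cache_reuse:
        # Only the LLM-based NIM examples on this page set this variable.
        spec["additional_envs"] = {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
    return json.dumps(spec, indent=2)


content_safety = build_nim_deployment(
    "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3", "1.14.0",
    kv_cache_reuse=True,
)
jailbreak = build_nim_deployment(
    "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect", "1.10.1",
)
print(content_safety)
```

The resulting string can be passed as the single-quoted argument to `--nim-deployment`.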

Deploy a Content-Safety NIM

CLI

nemo inference deployment-configs create \
  --name "nemotron-safety-guard-config" \
  --nim-deployment '{
    "gpu": 1,
    "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
    "image_tag": "1.14.0",
    "additional_envs": {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
  }'

nemo inference deployments create \
  --name "nemotron-safety-guard" \
  --config "nemotron-safety-guard-config"

nemo wait inference deployment nemotron-safety-guard

Deploy a Topic-Control NIM

CLI

nemo inference deployment-configs create \
  --name "nemoguard-topic-control-config" \
  --nim-deployment '{
    "gpu": 1,
    "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control",
    "image_tag": "1.10.1",
    "additional_envs": {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
  }'

nemo inference deployments create \
  --name "nemoguard-topic-control" \
  --config "nemoguard-topic-control-config"

nemo wait inference deployment nemoguard-topic-control

Deploy a Jailbreak-Detection NIM

CLI

nemo inference deployment-configs create \
  --name "nemoguard-jailbreak-config" \
  --nim-deployment '{
    "gpu": 1,
    "image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect",
    "image_tag": "1.10.1"
  }'

nemo inference deployments create \
  --name "nemoguard-jailbreak" \
  --config "nemoguard-jailbreak-config"

nemo wait inference deployment nemoguard-jailbreak

Step 3: Verify the Model Entity Names

After the content safety and topic control NIMs are deployed, the Inference Gateway discovers the models served by each NIM and registers them as Model Entities in your workspace. Use these entities in guardrail configurations with the workspace/name format.

List all Model Entities in your workspace to find the names:

models = client.models.list(workspace="default")
for model in models:
    print(f"{model.workspace}/{model.name}")

The NemoGuard NIMs register Model Entities with the following default names:

NIM	Model Entity Reference
`llama-3.1-nemotron-safety-guard-8b-v3`	`default/nvidia-llama-3-1-nemotron-safety-guard-8b-v3`
`llama-3.1-nemoguard-8b-topic-control`	`default/nvidia-llama-3-1-nemoguard-8b-topic-control`
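
In the table above, each Model Entity name looks like the NIM's full model name (including the `nvidia/` prefix) with `/` and `.` replaced by `-`. That rule is inferred from these two rows and is not documented behavior, so the following is only a sketch for spotting missing entities. The helper names are hypothetical, and `models` can be any iterable of objects with `workspace` and `name` attributes, such as the result of the listing call above.

```python
from types import SimpleNamespace


def expected_entity_ref(nim_model: str, workspace: str = "default") -> str:
    """Guess the Model Entity reference for a NIM model name.

    Assumption: '/' and '.' become '-', as the two table rows suggest.
    """
    return f"{workspace}/{nim_model.replace('/', '-').replace('.', '-')}"


def missing_entities(models, nim_models, workspace="default"):
    """Return the expected references that are absent from `models`."""
    registered = {f"{m.workspace}/{m.name}" for m in models}
    expected = [expected_entity_ref(n, workspace) for n in nim_models]
    return [ref for ref in expected if ref not in registered]


# Stand-in objects for illustration; in practice pass the listed Model Entities.
listed = [SimpleNamespace(workspace="default",
                          name="nvidia-llama-3-1-nemotron-safety-guard-8b-v3")]
nims = ["nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
        "nvidia/llama-3.1-nemoguard-8b-topic-control"]
print(missing_entities(listed, nims))
# ['default/nvidia-llama-3-1-nemoguard-8b-topic-control']
```

An empty list suggests both entities are registered under their default names. If a name differs in your workspace, rely on the listing output instead of the guessed reference.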

The jailbreak detection NIM exposes a /v1/classify endpoint rather than an OpenAI-compatible chat completions endpoint, so it does not register a Model Entity. Instead, reference the NIM by setting nim_base_url to its Inference Gateway URL, as described in Step 4.

Step 4: Use the NIMs in Guardrail Configurations

Content Safety and Topic Control

Reference the Model Entities in your guardrail configuration using the workspace/name format. For a complete example combining content safety and topic control rails, see Executing Input and Output Rails in Parallel.

Jailbreak Detection

Configure the jailbreak detection NIM using the rails.config.jailbreak_detection field. Set nim_base_url to the Inference Gateway provider route exposed by the deployment you created in Step 2. The URL follows the pattern /apis/inference-gateway/v2/workspaces/{workspace}/provider/{deployment_name}/-/v1, where deployment_name matches the deployment name from Step 2.

config = client.guardrail.configs.create(
    name="nemoguard-jailbreak-config",
    description="Jailbreak detection using self-hosted NemoGuard NIM",
    data={
        "rails": {
            "config": {
                "jailbreak_detection": {
                    "nim_base_url": f"{os.environ['NMP_BASE_URL']}/apis/inference-gateway/v2/workspaces/default/provider/nemoguard-jailbreak/-/v1",
                }
            },
            "input": {
                "flows": ["jailbreak detection model"],
            },
        },
    },
)
print(f"Created config: {config.name}")
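
The `nim_base_url` in the example above is written out for the `default` workspace and the `nemoguard-jailbreak` deployment. For other workspaces or deployment names, the URL pattern given in this section can be filled in programmatically. A minimal sketch follows; `jailbreak_nim_base_url` is a hypothetical helper, not an SDK function.

```python
from urllib.parse import quote


def jailbreak_nim_base_url(base_url: str, workspace: str, deployment_name: str) -> str:
    """Fill in the Inference Gateway provider-route pattern from this page."""
    return (
        f"{base_url.rstrip('/')}/apis/inference-gateway/v2"
        f"/workspaces/{quote(workspace, safe='')}"
        f"/provider/{quote(deployment_name, safe='')}/-/v1"
    )


url = jailbreak_nim_base_url("http://localhost:8080/", "default", "nemoguard-jailbreak")
print(url)
# http://localhost:8080/apis/inference-gateway/v2/workspaces/default/provider/nemoguard-jailbreak/-/v1
```

Stripping the trailing slash from the base URL avoids a double slash in the result, and percent-encoding the two path segments guards against names containing reserved characters.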

Cleanup

CLI

nemo guardrail configs delete nemoguard-jailbreak-config

# Note: deleting a deployment frees its GPU(s) once the deletion completes.
nemo inference deployments delete nemotron-safety-guard
nemo inference deployments delete nemoguard-topic-control
nemo inference deployments delete nemoguard-jailbreak

nemo wait inference deployment nemotron-safety-guard --status DELETED
nemo wait inference deployment nemoguard-topic-control --status DELETED
nemo wait inference deployment nemoguard-jailbreak --status DELETED

nemo inference deployment-configs delete nemotron-safety-guard-config
nemo inference deployment-configs delete nemoguard-topic-control-config
nemo inference deployment-configs delete nemoguard-jailbreak-config
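
The cleanup commands above follow a fixed order: the guardrail config is deleted first, then each deployment is deleted and awaited, and the deployment configs are removed last. The sketch below generates the same sequence for an arbitrary set of deployments. `cleanup_commands` is a hypothetical helper that only builds the command strings shown on this page and does not run them. It assumes each deployment config is named `<deployment>-config`, which holds for the names used in this guide but is not a platform rule.

```python
import shlex


def cleanup_commands(guardrail_config: str, deployments: list[str]) -> list[str]:
    """Build the cleanup commands in the order used on this page (not executed)."""
    cmds = [["nemo", "guardrail", "configs", "delete", guardrail_config]]
    cmds += [["nemo", "inference", "deployments", "delete", d] for d in deployments]
    cmds += [["nemo", "wait", "inference", "deployment", d, "--status", "DELETED"]
             for d in deployments]
    # Assumption: each deployment config is named "<deployment>-config".
    cmds += [["nemo", "inference", "deployment-configs", "delete", f"{d}-config"]
             for d in deployments]
    return [shlex.join(c) for c in cmds]


for line in cleanup_commands(
    "nemoguard-jailbreak-config",
    ["nemotron-safety-guard", "nemoguard-topic-control", "nemoguard-jailbreak"],
):
    print(line)
```

Printing the commands before running them gives a chance to review which resources will be removed.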

Next Steps

Improving Content Safety with NemoGuard NIMs - Full content safety tutorial using build.nvidia.com-hosted NIMs
Executing Input and Output Rails in Parallel - Combine multiple rails for comprehensive safety coverage