# Deploy NemoGuard NIMs

<a id="guardrails-deploy-nemoguard-nims" />

NemoGuard NIMs are specialized models, each built for a specific use case supported by the Guardrails service. This guide shows how to deploy them in your environment and reference them in a guardrail configuration.

| NIM                                            | Use Case                                                                                     |
| ---------------------------------------------- | -------------------------------------------------------------------------------------------- |
| `nvidia/llama-3.1-nemotron-safety-guard-8b-v3` | Content safety: classifies inputs and outputs as safe or unsafe across 23 content categories |
| `nvidia/llama-3.1-nemoguard-8b-topic-control`  | Topic control: restricts conversations to a defined set of allowed topics                    |
| `nvidia/nemoguard-jailbreak-detect`            | Jailbreak detection: detects prompt injection and jailbreak attempts                         |

## Prerequisites

Before you begin:

* You have access to a running NeMo Platform.
* `NMP_BASE_URL` is set to the NeMo Platform base URL.
* Your infrastructure has one GPU available for each NIM deployment (three GPUs if you run all three deployments in this guide at the same time).
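
The environment prerequisite can be checked before running any of the steps. The following is a minimal sketch, not part of the NeMo Platform SDK: `resolve_base_url` is a hypothetical helper, and its fallback value simply mirrors the `http://localhost:8080` default used in Step 1.

```python
import os
from urllib.parse import urlparse


def resolve_base_url(env=None, default="http://localhost:8080"):
    """Return the platform base URL from NMP_BASE_URL, without a trailing slash.

    Hypothetical helper for illustration; `default` mirrors the Step 1 fallback.
    """
    env = os.environ if env is None else env
    url = env.get("NMP_BASE_URL", default).strip()
    parsed = urlparse(url)
    # Reject values that are missing a scheme or host, such as "nmp.example.com".
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"NMP_BASE_URL is not a valid http(s) URL: {url!r}")
    return url.rstrip("/")
```

Failing early on a malformed URL tends to be easier to diagnose than a connection error raised later by the client.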

***

## Step 1: Configure the Client

Instantiate the NeMo Platform SDK client.

```python
import os
from nemo_platform import NeMoPlatform

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
```

***

## Step 2: Deploy the NIMs

Use the Platform's Inference Gateway service to deploy each NIM. This process creates a `DeploymentConfig` that specifies the NIM image, and a `Deployment` that runs it.

Enabling [KV cache reuse](https://docs.nvidia.com/nim/large-language-models/latest/kv-cache-reuse.html) on the LLM-based NIMs can improve inference speed. The content safety and topic control examples below enable it by setting `NIM_ENABLE_KV_CACHE_REUSE=1` through the `nim_deployment.additional_envs` option; the jailbreak detection example does not set it.
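
Because the three deployments differ only in image, tag, and the optional environment variable, the `nim_deployment` payload can be assembled by a small function. This is a sketch under the assumption that the payload contains only the fields shown in this guide; `build_nim_deployment` is a hypothetical helper, not an SDK function.

```python
def build_nim_deployment(image_name, image_tag, gpu=1, kv_cache_reuse=False):
    """Build a `nim_deployment` payload using only the fields shown in this guide.

    Hypothetical helper for illustration; it does not call the platform.
    """
    spec = {"gpu": gpu, "image_name": image_name, "image_tag": image_tag}
    if kv_cache_reuse:
        # NIM environment variables are passed as strings.
        spec["additional_envs"] = {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
    return spec


content_safety_spec = build_nim_deployment(
    "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
    "1.14.0",
    kv_cache_reuse=True,
)
```

The resulting dictionary could then be passed as the `nim_deployment` argument of `client.inference.deployment_configs.create`, as in the examples that follow.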

### Deploy a Content-Safety NIM

```bash
nemo inference deployment-configs create \
--name "nemotron-safety-guard-config" \
--nim-deployment '{
"gpu": 1,
"image_name": "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
"image_tag": "1.14.0",
"additional_envs": {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
}'

nemo inference deployments create \
--name "nemotron-safety-guard" \
--config "nemotron-safety-guard-config"

nemo wait inference deployment nemotron-safety-guard
```

```python
client.inference.deployment_configs.create(
    name="nemotron-safety-guard-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
        "image_tag": "1.14.0",
        "additional_envs": {
            "NIM_ENABLE_KV_CACHE_REUSE": "1",
        },
    },
)

client.inference.deployments.create(
    name="nemotron-safety-guard",
    config="nemotron-safety-guard-config",
)

client.models.wait_for_status(
    deployment_name="nemotron-safety-guard",
    desired_status="READY",
)

print("Content safety NIM ready")
```

### Deploy a Topic-Control NIM

```bash
nemo inference deployment-configs create \
--name "nemoguard-topic-control-config" \
--nim-deployment '{
"gpu": 1,
"image_name": "nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control",
"image_tag": "1.10.1",
"additional_envs": {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
}'

nemo inference deployments create \
--name "nemoguard-topic-control" \
--config "nemoguard-topic-control-config"

nemo wait inference deployment nemoguard-topic-control
```

```python
client.inference.deployment_configs.create(
    name="nemoguard-topic-control-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control",
        "image_tag": "1.10.1",
        "additional_envs": {
            "NIM_ENABLE_KV_CACHE_REUSE": "1",
        },
    },
)

client.inference.deployments.create(
    name="nemoguard-topic-control",
    config="nemoguard-topic-control-config",
)

client.models.wait_for_status(
    deployment_name="nemoguard-topic-control",
    desired_status="READY",
)

print("Topic control NIM ready")
```

### Deploy a Jailbreak-Detection NIM

```bash
nemo inference deployment-configs create \
--name "nemoguard-jailbreak-config" \
--nim-deployment '{
"gpu": 1,
"image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect",
"image_tag": "1.10.1"
}'

nemo inference deployments create \
--name "nemoguard-jailbreak" \
--config "nemoguard-jailbreak-config"

nemo wait inference deployment nemoguard-jailbreak
```

```python
client.inference.deployment_configs.create(
    name="nemoguard-jailbreak-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect",
        "image_tag": "1.10.1",
    },
)

client.inference.deployments.create(
    name="nemoguard-jailbreak",
    config="nemoguard-jailbreak-config",
)

client.models.wait_for_status(
    deployment_name="nemoguard-jailbreak",
    desired_status="READY",
)

print("Jailbreak detection NIM ready")
```

***

## Step 3: Verify the Model Entity Names

After the content safety and topic control NIMs are deployed, the Inference Gateway discovers the models served by each NIM and registers them as **Model Entities** in your workspace. Use these entities in guardrail configurations with the `workspace/name` format.

List all Model Entities in your workspace to find the names:

```python
models = client.models.list(workspace="default")
for model in models:
    print(f"{model.workspace}/{model.name}")
```

The NemoGuard NIMs register Model Entities with the following default names:

| NIM                                            | Model Entity Reference                                 |
| ---------------------------------------------- | ------------------------------------------------------ |
| `nvidia/llama-3.1-nemotron-safety-guard-8b-v3` | `default/nvidia-llama-3-1-nemotron-safety-guard-8b-v3` |
| `nvidia/llama-3.1-nemoguard-8b-topic-control`  | `default/nvidia-llama-3-1-nemoguard-8b-topic-control`  |
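
The default names in the table appear to be the full NIM name (including the `nvidia/` prefix) with `/` and `.` replaced by `-`. That rule is inferred from the two rows above rather than documented behavior, so treat the sketch below as a convenience check. Both `default_entity_ref` and `missing_entities` are hypothetical helpers.

```python
def default_entity_ref(nim_name, workspace="default"):
    """Guess the Model Entity reference for a NIM name.

    Assumption: `/` and `.` become `-`, as inferred from the table in this guide.
    """
    entity_name = nim_name.replace("/", "-").replace(".", "-")
    return f"{workspace}/{entity_name}"


def missing_entities(found_refs, expected_refs):
    """Return the expected references that are absent from the listed ones."""
    return sorted(set(expected_refs) - set(found_refs))


expected = [
    default_entity_ref("nvidia/llama-3.1-nemotron-safety-guard-8b-v3"),
    default_entity_ref("nvidia/llama-3.1-nemoguard-8b-topic-control"),
]
```

Passing the `workspace/name` strings printed by the listing loop above as `found_refs` would show whether either expected entity has not been registered yet.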

The jailbreak detection NIM exposes a `/v1/classify` endpoint rather than an OpenAI-compatible chat completions endpoint, so it does not register a Model Entity. Instead, reference the NIM by setting `nim_base_url` to its Inference Gateway URL, as described in [Step 4](#step-4-use-the-nims-in-guardrail-configurations) below.

***

## Step 4: Use the NIMs in Guardrail Configurations

### Content Safety and Topic Control

Reference the Model Entities in your guardrail configuration using the `workspace/name` format. For a complete example combining content safety and topic control rails, see [Executing Input and Output Rails in Parallel](/documentation/guardrail-models/tutorials/parallel-rails).

### Jailbreak Detection

Configure the jailbreak detection NIM using the `rails.config.jailbreak_detection` field. Set `nim_base_url` to the Inference Gateway provider route exposed by the deployment you created in Step 2. The URL follows the pattern `/apis/inference-gateway/v2/workspaces/{workspace}/provider/{deployment_name}/-/v1`, where `deployment_name` matches the deployment name from Step 2.

```python
config = client.guardrail.configs.create(
    name="nemoguard-jailbreak-config",
    description="Jailbreak detection using self-hosted NemoGuard NIM",
    data={
        "rails": {
            "config": {
                "jailbreak_detection": {
                    "nim_base_url": f"{os.environ['NMP_BASE_URL']}/apis/inference-gateway/v2/workspaces/default/provider/nemoguard-jailbreak/-/v1",
                }
            },
            "input": {
                "flows": ["jailbreak detection model"],
            },
        },
    },
)
print(f"Created config: {config.name}")
```
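
The provider route pattern described above can be factored into a small function so that the workspace and deployment name are not hard-coded in the URL string. This is a minimal sketch: `jailbreak_nim_base_url` is a hypothetical helper that only formats the documented pattern and does not verify that the deployment exists.

```python
def jailbreak_nim_base_url(base_url, deployment_name, workspace="default"):
    """Format the Inference Gateway provider route for a deployment.

    Hypothetical helper; follows the pattern
    /apis/inference-gateway/v2/workspaces/{workspace}/provider/{deployment_name}/-/v1
    """
    return (
        f"{base_url.rstrip('/')}/apis/inference-gateway/v2/workspaces/"
        f"{workspace}/provider/{deployment_name}/-/v1"
    )


nim_base_url = jailbreak_nim_base_url("http://localhost:8080", "nemoguard-jailbreak")
```

Stripping a trailing slash from the base URL avoids a double slash in the route when `NMP_BASE_URL` ends with `/`.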

***

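## Cleanup

Delete the guardrail configuration first, then the deployments. Wait for each deployment to reach the `DELETED` status before deleting its deployment config.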

```bash
nemo guardrail configs delete nemoguard-jailbreak-config

# Note: Deleting a deployment frees its GPU(s) once the deletion completes
nemo inference deployments delete nemotron-safety-guard
nemo inference deployments delete nemoguard-topic-control
nemo inference deployments delete nemoguard-jailbreak

nemo wait inference deployment nemotron-safety-guard --status DELETED
nemo wait inference deployment nemoguard-topic-control --status DELETED
nemo wait inference deployment nemoguard-jailbreak --status DELETED

nemo inference deployment-configs delete nemotron-safety-guard-config
nemo inference deployment-configs delete nemoguard-topic-control-config
nemo inference deployment-configs delete nemoguard-jailbreak-config
```

```python
client.guardrail.configs.delete(name="nemoguard-jailbreak-config")

# Note: Deleting a deployment frees its GPU(s) once the deletion completes
client.inference.deployments.delete(name="nemotron-safety-guard")
client.inference.deployments.delete(name="nemoguard-topic-control")
client.inference.deployments.delete(name="nemoguard-jailbreak")

client.models.wait_for_status(
    deployment_name="nemotron-safety-guard", desired_status="DELETED"
)
client.models.wait_for_status(
    deployment_name="nemoguard-topic-control", desired_status="DELETED"
)
client.models.wait_for_status(
    deployment_name="nemoguard-jailbreak", desired_status="DELETED"
)

client.inference.deployment_configs.delete(name="nemotron-safety-guard-config")
client.inference.deployment_configs.delete(name="nemoguard-topic-control-config")
client.inference.deployment_configs.delete(name="nemoguard-jailbreak-config")

print("Cleanup complete")
```

***

## Next Steps

* [Improving Content Safety with NemoGuard NIMs](/documentation/guardrail-models/tutorials/content-safety) - Full content safety tutorial using `build.nvidia.com`-hosted NIMs
* [Executing Input and Output Rails in Parallel](/documentation/guardrail-models/tutorials/parallel-rails) - Combine multiple rails for comprehensive safety coverage