Improving Content Safety with NemoGuard NIMs
Learn how to use NeMo Platform to apply content safety checks to user inputs and LLM outputs with the NVIDIA Nemotron Content Safety NIM. Content safety checks detect and block harmful, abusive, or policy-violating content before it reaches users.
For the content safety checks, this tutorial uses the Llama-3.1-Nemotron-Safety-Guard-8B-v3 NIM, which is trained to classify input or output content as safe or unsafe.
For the main model, this tutorial uses the Llama-3.1-8B-Instruct NIM.
Prerequisites
Before you begin:
- You have access to a running NeMo Platform.
NMP_BASE_URLis set to the NeMo Platform base URL.- A
ModelProvideris configured with an LLM provider. Follow Setup if you haven’t done this yet.
This tutorial uses the following NIMs, available on build.nvidia.com:
mainmodel:meta/llama-3.1-8b-instructcontent_safetymodel:nvidia/llama-3.1-nemotron-safety-guard-8b-v3
What You Will Build
You will:
- Create a Guardrail configuration that uses the NVIDIA NemoGuard Content Safety NIM
- Route model requests through the Inference Gateway service
- Verify that unsafe inputs are blocked and safe inputs are allowed
Step 1: Configure the Client
Instantiate the platform client.
Step 2: Create a Guardrail Configuration
This config executes content safety checks on both user inputs and model outputs. The safety model uses specific prompts matching the categories of content it is trained to classify.
Using Model Entity references (workspace/name format), the plugin resolves task model endpoints through IGW’s route table.
Step 3: Create a VirtualModel
The plugin runs only when a VirtualModel references it in its middleware list. Wire the config on both request_middleware (input rails) and response_middleware (output rails).
Step 4: Verify Blocked Content
Get a pre-configured OpenAI client from the SDK and send a request that the content safety input rail blocks.
Example Response
The Nemotron-Safety-Guard-8B-v3 and Llama-3.1-8B-Instruct NIMs both support multiple languages. For the full list, refer to the model cards:
Send an unsafe request in French that the content safety input rail blocks.
Example Response
Step 5: Verify Allowed Content
Send a safe request and confirm you receive an allowed response.
Example Response
Send a safe request in French and confirm you receive an allowed response.