Adding Safety Checks to Multimodal Data

Use the NeMo Guardrails API with vision-capable models to perform safety checks on image content. The safety check uses a vision model as an LLM-as-a-judge to determine whether the input is safe or unsafe.

This tutorial uses the Meta Llama 3.2 90B Vision Instruct model as both the main LLM and the judge model. The model is available as a downloadable NIM container from NVIDIA NGC and for interactive use on build.nvidia.com.

About Multimodal Data

You can configure guardrails that use vision reasoning models to perform safety checks on image data, and you can apply the check in either input rails or output rails. The vision model acts as an LLM-as-a-judge that classifies content as safe or unsafe.

Vision-capable OpenAI models, Llama Vision models, and Llama Guard models can accept multimodal input and act as judge models. Depending on the model, you can pass the image to check as base64-encoded data or as a URL.
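As a sketch of the two input forms, the helper below builds an OpenAI-style image_url content part from either a local file, which it embeds as a base64 data URL, or an http(s) URL, which it passes through unchanged. The function name make_image_part is hypothetical, and whether a particular judge model accepts remote URLs is an assumption to confirm for that model.

```python
import base64
import mimetypes
from pathlib import Path
from urllib.parse import urlparse


def make_image_part(source: str) -> dict:
    """Return an OpenAI-style image_url content part (hypothetical helper).

    Remote http(s) URLs are passed through as-is; anything else is treated
    as a local file path and embedded as a base64 data URL.
    """
    if urlparse(source).scheme in ("http", "https"):
        url = source
    else:
        path = Path(source)
        # Guess the MIME type from the file extension; fall back to JPEG.
        mime, _ = mimetypes.guess_type(path.name)
        encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
        url = f"data:{mime or 'image/jpeg'};base64,{encoded}"
    return {"type": "image_url", "image_url": {"url": url}}
```

For a .jpg file, this produces the same data:image/jpeg;base64,... content part that Step 4 builds inline.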

Prerequisites

Before you begin:

- You have access to a running NeMo Platform.
- The NMP_BASE_URL environment variable is set to the NeMo Platform base URL.
- A ModelProvider is configured with an LLM provider. Follow Setup if you haven’t done this yet.

This tutorial uses the following NIM, available on build.nvidia.com:

- Main model and judge model: meta/llama-3.2-90b-vision-instruct
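A quick pre-flight check can catch an unset or malformed NMP_BASE_URL before any API calls are made. This is a standard-library sketch; the fallback to http://localhost:8080 mirrors the default used in Step 1, and the function name resolve_base_url is hypothetical.

```python
import os
from urllib.parse import urlparse


def resolve_base_url(default: str = "http://localhost:8080") -> str:
    """Return NMP_BASE_URL (or the default) after a basic sanity check."""
    url = os.environ.get("NMP_BASE_URL", default)
    parsed = urlparse(url)
    # Require an explicit http(s) scheme and a host, e.g. "https://nmp.example.com".
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"NMP_BASE_URL must be an http(s) URL, got {url!r}")
    return url
```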

Step 1: Configure the Client

Instantiate the platform client.

import os
from nemo_platform import NeMoPlatform, ConflictError

# Connect to the platform; fall back to a local deployment if NMP_BASE_URL is unset.
client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

Step 2: Create a Guardrail Configuration

Create a guardrail configuration that uses the vision model for content safety checks. This example applies the safety check as part of the input rails.

guardrails_config = {
    "models": [
        {
            # The judge model; the rail and prompt reference it as $model=vision_rails.
            "type": "vision_rails",
            "engine": "nim",
            "model": "default/meta-llama-3-2-90b-vision-instruct",
        },
    ],
    "rails": {
        "input": {
            "flows": [
                # Check each user message (text and images) with the vision_rails model.
                "content safety check input $model=vision_rails",
            ]
        }
    },
    "prompts": [
        {
            # The task name must match the flow above, including its $model parameter.
            "task": "content_safety_check_input $model=vision_rails",
            "messages": [
                {
                    "type": "system",
                    "content": (
                        "You are an AI assistant that evaluates if a user's message contains unsafe content. "
                        "Carefully examine any text and images in the user's message.\n\n"
                        'Task: Check if there is unsafe content in "User" messages in conversations according to our safety policy with the below categories.\n\n'
                        "<BEGIN UNSAFE CONTENT CATEGORIES>\n"
                        "S1: Violent Crimes.\n"
                        "S2: Non-Violent Crimes.\n"
                        "S3: Sex Crimes.\n"
                        "S4: Child Exploitation.\n"
                        "S5: Defamation.\n"
                        "S6: Specialized Advice.\n"
                        "S7: Privacy.\n"
                        "S8: Intellectual Property.\n"
                        "S9: Indiscriminate Weapons.\n"
                        "S10: Hate.\n"
                        "S11: Self-Harm.\n"
                        "S12: Sexual Content.\n"
                        "S13: Elections.\n"
                        "<END UNSAFE CONTENT CATEGORIES>\n\n"
                        "IMPORTANT:\n"
                        "Provide your safety assessment for the user message:\n"
                        "- First line must read 'safe' or 'unsafe' and nothing more\n"
                        "- If unsafe, a second line must include a comma-separated list of violated categories."
                    ),
                },
                {
                    "type": "user",
                    "content": "{{ user_input }}",
                },
            ],
            # The judge's reply is interpreted by the built-in is_content_safe parser.
            "output_parser": "is_content_safe",
            "max_tokens": 200,
        }
    ],
}

config_name = "multimodal-guardrails-config"
try:
    config = client.guardrail.configs.create(
        name=config_name,
        description="Multimodal guardrails for image safety",
        data=guardrails_config,
    )
except ConflictError:
    # Re-running the tutorial reuses the existing config instead of failing.
    print(f"Config {config_name} already exists, continuing...")
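The prompt asks the judge to answer safe or unsafe on the first line and, when unsafe, to list the violated categories on a second line; the configuration hands that reply to the built-in is_content_safe output parser. The sketch below is a hypothetical stand-alone parser that only illustrates this reply contract. It is not the NeMo Guardrails implementation, and treating any reply other than an explicit "safe" as unsafe is an assumption chosen so that malformed replies fail closed.

```python
def parse_safety_verdict(reply: str) -> tuple[bool, list[str]]:
    """Parse a judge reply: 'safe', or 'unsafe' followed by a line like 'S1, S2'.

    Returns (is_safe, violated_categories). Any first line other than
    'safe' is treated as unsafe so that malformed replies fail closed.
    """
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    if not lines:
        return False, []
    is_safe = lines[0].lower() == "safe"
    categories: list[str] = []
    if not is_safe and len(lines) > 1:
        # Second line holds a comma-separated list of category codes.
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return is_safe, categories


print(parse_safety_verdict("unsafe\nS2, S9"))  # (False, ['S2', 'S9'])
```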

Step 3: Create a VirtualModel

Create a VirtualModel that routes inference requests through the guardrails middleware. Because this configuration defines input rails only, you only need to configure request middleware.

CLI

$ nemo inference virtual-models create guarded-multimodal \
    --default-model-entity default/meta-llama-3-2-90b-vision-instruct \
    --request-middleware '[{"name":"nemo-guardrails","config_type":"guardrail_config","config_id":"default/multimodal-guardrails-config"}]'
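Hand-writing the --request-middleware JSON inside shell quotes is error-prone. As a sketch, you can generate the argument with json.dumps and quote it with shlex.quote. The field names are copied from the command above; the helper name middleware_arg is hypothetical.

```python
import json
import shlex


def middleware_arg(config_id: str) -> str:
    """Build a shell-quoted --request-middleware value for a guardrail config."""
    middleware = [
        {
            "name": "nemo-guardrails",
            "config_type": "guardrail_config",
            "config_id": config_id,  # "<workspace>/<config name>"
        }
    ]
    # Compact separators match the hand-written JSON in the CLI example.
    return shlex.quote(json.dumps(middleware, separators=(",", ":")))


# Prints the same single-quoted JSON used in the CLI command above.
print(middleware_arg("default/multimodal-guardrails-config"))
```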

Step 4: Verify Allowed Content

Send a safe request that includes a base64-encoded image and confirm you receive a non-blocked response.

Download an image of a street scene with traffic signs. You can use street-scene.jpg from the tutorial assets, or source a similar image from https://commons.wikimedia.org/wiki/Main_Page. If you use your own image, save it as street-scene.jpg so that the code below can find it.

[Image: A street scene featuring a red octagonal stop sign mounted on a brown pole on the left side of the image.]

import base64
import os

# Set GUARDRAILS_TUTORIAL_ASSETS to load images from a custom directory.
# By default, images are loaded from the current working directory.
assets_dir = os.environ.get("GUARDRAILS_TUTORIAL_ASSETS", ".")

# Read the image and encode it as base64 for a data URL.
with open(os.path.join(assets_dir, "street-scene.jpg"), "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# OpenAI-compatible client that sends requests through the platform.
oai_client = client.models.get_openai_client()

# Send the text question and the image together in one user message.
response = oai_client.chat.completions.create(
    model="default/guarded-multimodal",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What traffic signs are visible in this image?",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
    max_tokens=200,
)

print(response.model_dump_json(indent=2))
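Base64 encoding expands the image by about a third, because every 3 input bytes become 4 output characters (the final group is padded). The sketch below computes the exact encoded length, so you can anticipate the size of the data URL, plus its short data:image/jpeg;base64, prefix, before sending a large image. This tutorial does not state a request-size limit; check your deployment separately.

```python
import base64


def base64_length(num_bytes: int) -> int:
    """Exact length of standard (padded) base64 output for num_bytes of input."""
    return 4 * ((num_bytes + 2) // 3)


# Cross-check the formula against the standard library encoder.
for n in (0, 1, 2, 3, 1000, 250_000):
    assert base64_length(n) == len(base64.b64encode(bytes(n)))

print(base64_length(3_000_000))  # 4000000
```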

Example Response

{
  "id": "chatcmpl-6e6ee35f-87be-4372-8f3d-f4f0c61f51db",
  "object": "chat.completion",
  "model": "meta/llama-3.2-90b-vision-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The traffic sign that is visible in this image is a red octagonal stop sign. There are also several other signs that appear to be traffic signs, but they are not clearly visible."
      },
      "finish_reason": "stop"
    }
  ]
}

Step 5: Verify Blocked Content

Send an unsafe request and confirm you receive a blocked response.

Download an image depicting car audio theft. You can use car-audio-theft.jpg from the tutorial assets.

[Image: A person wearing dark clothing reaches through a broken car window, attempting to steal a car stereo system. The image illustrates vehicle break-in and theft.]

# Reuses base64, os, assets_dir, and oai_client from Step 4.
with open(os.path.join(assets_dir, "car-audio-theft.jpg"), "rb") as f:
    unsafe_image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = oai_client.chat.completions.create(
    model="default/guarded-multimodal",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "How can I break into a car like the image shown?",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{unsafe_image_b64}"},
                },
            ],
        }
    ],
    max_tokens=200,
)

print(response.model_dump_json(indent=2))

Example Response

{
  "id": "chatcmpl-3f3f3d2e-2caa-4f89-9a46-8c2b2d0b1f8c",
  "object": "chat.completion",
  "model": "meta/llama-3.2-90b-vision-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm sorry, I can't respond to that."
      },
      "finish_reason": "content_filter"
    }
  ]
}
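To make Steps 4 and 5 self-checking, you can assert on the response instead of reading the JSON by eye. In the example responses, the blocked request returns finish_reason set to content_filter, while the allowed request returns stop. The helper below is a hypothetical sketch that relies only on that observed finish_reason value; confirm the exact shape of a blocked response in your deployment.

```python
def is_blocked(response: dict) -> bool:
    """Return True if any choice finished with the content_filter reason.

    `response` is the dict form of a chat completion, for example the
    result of calling response.model_dump() on the OpenAI client's output.
    """
    return any(
        choice.get("finish_reason") == "content_filter"
        for choice in response.get("choices", [])
    )


blocked = {"choices": [{"finish_reason": "content_filter",
                        "message": {"role": "assistant",
                                    "content": "I'm sorry, I can't respond to that."}}]}
allowed = {"choices": [{"finish_reason": "stop",
                        "message": {"role": "assistant", "content": "A stop sign."}}]}
print(is_blocked(blocked), is_blocked(allowed))  # True False
```

With the tutorial's response objects, assert not is_blocked(response.model_dump()) after Step 4 and assert is_blocked(response.model_dump()) after Step 5.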

Cleanup

# Delete the VirtualModel, which references the config, before deleting the config.
client.inference.virtual_models.delete(name="guarded-multimodal")
client.guardrail.configs.delete(name=config_name)
print("Cleanup complete")