Preview a Config | NVIDIA NeMo Platform

This tutorial walks through the anonymizer.preview flow: defining a request, streaming a small anonymized sample, and inspecting the resulting frames.

For detection and replacement strategy details, see the open-source library documentation.

Prerequisites

Complete the tutorial prerequisites, which cover:

A running NeMo Platform cluster with the nemo anonymizer CLI available (see Setup).
An inference provider configured (default examples use nvidia-build).
A fileset named anonymizer-inputs with anonymizer-input.csv uploaded.

What `preview` Does

preview runs the Anonymizer pipeline on a small number of records and streams results back from the plugin service. Both surfaces (Python SDK and CLI) produce the same data; the SDK collects it into a pandas DataFrame while the CLI prints newline-delimited JSON frames.

The plugin emits four anonymizer-specific frame kinds, plus generic stream control frames:

Frame	Purpose
`log`	Pipeline log lines forwarded from the library.
`preview_dataset`	User-facing anonymized records. The final dataset you usually care about.
`trace_dataset`	Internal trace records used by `display_record` in notebooks.
`failed_records`	Per-record failures with reasons. Emitted only when at least one record failed.
`heartbeat`, `done`, `error`	Generic stream control frames.
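A client that consumes the stream has to handle each of these kinds differently. The sketch below is a hypothetical consumer (`consume_frames` is not part of the SDK) that works on frames that have already been parsed into dicts. It assumes only that each frame has a `kind` field, which is what the jq filter in Step 4 relies on. Every other payload field is left untouched.

```python
from typing import Any, Dict, Iterable, List

# Frame kinds that carry data (from the table above).
DATA_KINDS = {"log", "preview_dataset", "trace_dataset", "failed_records"}


def consume_frames(frames: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group frames by kind until a `done` frame; raise on an `error` frame.

    Hypothetical consumer: only the `kind` field is assumed. Other payload
    fields are kept as-is on each frame.
    """
    grouped: Dict[str, List[Dict[str, Any]]] = {}
    for frame in frames:
        kind = frame.get("kind")
        if kind == "heartbeat":
            continue  # keep-alive only; carries no data
        if kind == "done":
            break  # end of stream; ignore anything after it
        if kind == "error":
            # The error payload shape isn't documented here, so report the whole frame.
            raise RuntimeError(f"preview stream reported an error: {frame!r}")
        if kind in DATA_KINDS:
            grouped.setdefault(kind, []).append(frame)
    return grouped
```

Because `failed_records` is emitted only when at least one record failed, a missing `failed_records` key in the result means that no record failed.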

Available preview surfaces:

Surface	Where it runs	Local paths	`model_configs` required
`sdk.anonymizer.preview(...)`	Anonymizer plugin service (remote)	Rejected	Required
`nemo anonymizer preview run`	Local CLI process	Allowed	Optional
`nemo anonymizer preview submit`	Anonymizer plugin service (remote)	Rejected	Required
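The surface table reduces to two checks that apply only to the remote surfaces. The following hypothetical pre-flight helper is not part of the SDK or CLI. Its source classification is a simplified heuristic, not the plugin's actual parser: an http(s) scheme counts as a URL, a `fileset://` prefix or `#` fragment counts as a fileset reference, and anything else is treated as a local path.

```python
from typing import List, Optional, Sequence
from urllib.parse import urlparse

# Surfaces that run on the Anonymizer plugin service (per the table above).
REMOTE_SURFACES = {"sdk.anonymizer.preview", "preview submit"}


def source_form(source: str) -> str:
    """Classify data.source as 'http', 'fileset', or 'local' (simplified heuristic)."""
    if urlparse(source).scheme in ("http", "https"):
        return "http"
    if source.startswith("fileset://") or "#" in source:
        return "fileset"
    return "local"


def preflight(surface: str, source: str, model_configs: Optional[Sequence] = None) -> List[str]:
    """Return the table constraints that this surface/source combination would violate."""
    problems: List[str] = []
    if surface in REMOTE_SURFACES:
        if source_form(source) == "local":
            problems.append("local paths are rejected; use a fileset or http(s) source")
        if not model_configs:
            problems.append("model_configs is required")
    return problems
```

For example, `preflight("preview submit", "/tmp/input.csv")` reports both problems, while the same call with `"preview run"` reports none.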

Step 1: Build a `PreviewRequest`

A preview request bundles the AnonymizerConfig, the input spec, and optional model_configs / selected_models.

```python
import os
from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact
from data_designer.config import ModelConfig
from nemo_anonymizer_plugin.app.input import AnonymizerInputSpec
from nemo_anonymizer_plugin.app.task_config import PreviewRequest

WORKSPACE = os.environ.get("NMP_WORKSPACE", "default")
MODEL_PROVIDER = os.environ.get("NMP_ANON_PROVIDER", "nvidia-build")

config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)

model_configs = [
    ModelConfig(alias="gliner-pii-detector", provider=MODEL_PROVIDER, model="nvidia/gliner-pii"),
    ModelConfig(alias="gpt-oss-120b", provider=MODEL_PROVIDER, model="openai/gpt-oss-120b"),
    ModelConfig(alias="nemotron-30b-thinking", provider=MODEL_PROVIDER, model="nvidia/nemotron-3-nano-30b-a3b"),
]

request = PreviewRequest(
    config=config,
    data=AnonymizerInputSpec(
        source=f"fileset://{WORKSPACE}/anonymizer-inputs#anonymizer-input.csv",
        text_column="biography",
        id_column="id",
    ),
    model_configs=model_configs,
    num_records=2,
)
```

Field reference:

Field	Type	Notes
`config`	`AnonymizerConfig`	The library config. The example uses `Redact`; the plugin also supports `substitute`, `annotate`, `hash`, and `rewrite`.
`data.source`	string	Local path, `http(s)` URL, or fileset reference. See input source forms.
`data.text_column`	string	Column containing text to anonymize. Defaults to `text`.
`data.id_column`	string	Optional record identifier column.
`data.data_summary`	string	Optional short description passed to Anonymizer library prompts.
`model_configs`	list	Data Designer `ModelConfig` entries. `provider` must reference an Inference Gateway provider name (or `workspace/provider`). Omit to use Anonymizer library defaults (CLI local execution only).
`selected_models`	object	Optional `detection` / `replace` / `rewrite` overrides on top of the bundled defaults. Requires `model_configs`.
`num_records`	int (≥ 1)	Number of records to preview. Defaults to 10.
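The defaults and cross-field rules in this table can be written out as a small function. The helper below, `with_documented_defaults`, is hypothetical and works on a plain dict shaped like the request. It mirrors the table only for illustration; the real validation is done by the plugin's own models, not by this function.

```python
from typing import Any, Dict


def with_documented_defaults(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a preview spec dict with the documented defaults filled in.

    Hypothetical helper that mirrors the field reference table only.
    """
    out = dict(spec)
    data = dict(out.get("data", {}))
    data.setdefault("text_column", "text")  # documented default
    out["data"] = data

    out.setdefault("num_records", 10)  # documented default
    n = out["num_records"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(n, bool) or not isinstance(n, int) or n < 1:
        raise ValueError("num_records must be an int >= 1")

    if out.get("selected_models") and not out.get("model_configs"):
        raise ValueError("selected_models requires model_configs")
    return out
```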

Step 2: Run with the Python SDK

```python
import os
from nemo_platform import NeMoPlatform

# WORKSPACE and request come from Step 1.
sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=WORKSPACE,
)
preview = sdk.anonymizer.preview(request)

preview.dataset            # pandas DataFrame of anonymized records
preview.trace_dataset      # detection trace
preview.failed_records     # list[dict] of per-record failures (usually empty)
preview.display_record(0)  # render record 0 with entity highlights in a notebook
```

AnonymizerResource.preview calls the plugin service, so the same constraints as preview submit apply: use a fileset or http(s) source, and include model_configs.
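To spot-check redaction coverage, you can count the placeholders that the Step 1 `format_template` produces. This is a hypothetical helper, not an SDK method. It assumes labels are rendered as `[REDACTED_<label>]`, exactly as the template suggests, and that a label never contains `]`. Pass it the `preview.dataset` column that holds the anonymized text; that column's name is not specified on this page.

```python
import re
from collections import Counter
from typing import Iterable

# Matches placeholders rendered by format_template="[REDACTED_{label}]".
# Assumption: a label never contains "]".
PLACEHOLDER = re.compile(r"\[REDACTED_([^\]]+)\]")


def count_redactions(texts: Iterable[object]) -> Counter:
    """Count redaction placeholders per entity label across anonymized texts."""
    counts: Counter = Counter()
    for text in texts:
        if isinstance(text, str):  # skip missing values such as NaN
            counts.update(PLACEHOLDER.findall(text))
    return counts

# Usage (the column name is a placeholder; check preview.dataset.columns):
# count_redactions(preview.dataset["<anonymized-text-column>"])
```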

Async client

AsyncAnonymizerResource.preview has the same signature and return type:

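```python
import asyncio
import os
from nemo_platform import AsyncNeMoPlatform

async def main():
    async_sdk = AsyncNeMoPlatform(
        base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
        workspace=WORKSPACE,  # defined in Step 1
    )
    # Same signature and return type as the synchronous client.
    return await async_sdk.anonymizer.preview(request)

# In a notebook, where an event loop is already running,
# use `preview = await main()` instead of asyncio.run.
preview = asyncio.run(main())
```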

Step 3: Persist Preview Records

AnonymizerPreviewResult.dataset and trace_dataset are pandas DataFrames, so save them with standard pandas methods:

```python
preview.dataset.to_csv("anonymized-preview.csv", index=False)
preview.dataset.to_parquet("anonymized-preview.parquet", index=False)
```

AnonymizerPreviewResult holds everything in memory; nothing is written to disk unless you save it explicitly.

Step 4: Run from the CLI (Alternative)

The CLI accepts the same request shape as a YAML spec file. Use preview run for local execution (local paths allowed, model_configs optional) or preview submit to run on the plugin service (the same path sdk.anonymizer.preview uses).

Write the spec to YAML:

```python
import yaml
from pathlib import Path

# Serialize the PreviewRequest from Step 1; exclude_none drops fields whose value is None.
spec_path = Path("/tmp/anonymizer-preview.yaml")
spec_path.write_text(yaml.safe_dump(request.model_dump(mode="json", exclude_none=True)))
```

Run preview locally:

```bash
nemo anonymizer preview run \
  --spec-file /tmp/anonymizer-preview.yaml \
  --workspace "${NMP_WORKSPACE:-default}"
```

Or submit to the plugin service:

```bash
nemo anonymizer preview submit \
  --spec-file /tmp/anonymizer-preview.yaml \
  --workspace "${NMP_WORKSPACE:-default}" \
  --base-url "${NMP_BASE_URL:-http://localhost:8080}"
```

Both commands stream NDJSON frames to stdout. Filter with jq:

```bash
nemo anonymizer preview run \
  --spec-file /tmp/anonymizer-preview.yaml \
  --workspace "${NMP_WORKSPACE:-default}" \
  > /tmp/anonymizer-preview.ndjson

jq -R 'fromjson? | select(.kind == "preview_dataset") | .records' \
  /tmp/anonymizer-preview.ndjson
```
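If you would rather drive the CLI from Python, you can express the same filter with the standard library. `iter_frames` and `preview_records` are hypothetical helpers. They assume only what the jq filter above assumes: each frame is a JSON object with a `kind` field, and `preview_dataset` frames carry a `records` list. Lines that fail to parse are skipped, as with `fromjson?`, and this sketch also skips JSON values that are not objects.

```python
import json
import subprocess
from typing import Any, Dict, Iterable, Iterator, List, Sequence


def iter_frames(cmd: Sequence[str]) -> Iterator[Dict[str, Any]]:
    """Run a command and yield each stdout line that parses as a JSON object."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            try:
                frame = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines, like jq's `fromjson?`
            if isinstance(frame, dict):
                yield frame
    # Popen's context manager waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)


def preview_records(frames: Iterable[Dict[str, Any]]) -> List[Any]:
    """Concatenate the `records` lists of all preview_dataset frames."""
    records: List[Any] = []
    for frame in frames:
        if frame.get("kind") == "preview_dataset":
            records.extend(frame.get("records") or [])
    return records

# Example (flags as in the commands above):
# frames = iter_frames(["nemo", "anonymizer", "preview", "run",
#                       "--spec-file", "/tmp/anonymizer-preview.yaml",
#                       "--workspace", "default"])
# records = preview_records(frames)
```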

If preview submit returns 404 from the gateway, the plugin service isn't mounted. Restart nemo services run so that the plugin is discovered and /apis/anonymizer/... is remounted; see Setup.

Input Source Forms

The plugin accepts three forms for data.source:

Form	`sdk.anonymizer.preview` / `preview submit`	`preview run`
Local path (`/tmp/input.csv`)	No	Yes
HTTP(S) URL (`https://.../input.csv`)	Yes	Yes
Fileset reference	Yes	Yes

Fileset references take any of these forms; the workspace and fileset must already exist:

fileset://<workspace>/<fileset>#<path>
<workspace>/<fileset>#<path>
<fileset>#<path>

The #<path> fragment must resolve to a single .csv or .parquet file. The plugin downloads the file before constructing the Anonymizer library input and cleans up the temp directory when the preview completes.
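To catch malformed references before submitting, you could parse them locally. `parse_fileset_ref` is a hypothetical helper that follows the three forms and the .csv/.parquet rule above. It does not check that the workspace or fileset exists. For the short `<fileset>#<path>` form it returns `workspace=None` rather than guessing how the plugin resolves the workspace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FilesetRef:
    workspace: Optional[str]  # None for the short <fileset>#<path> form
    fileset: str
    path: str


def parse_fileset_ref(ref: str) -> FilesetRef:
    """Split a fileset reference into workspace, fileset, and path (hypothetical helper)."""
    has_scheme = ref.startswith("fileset://")
    body = ref[len("fileset://"):] if has_scheme else ref
    location, sep, path = body.partition("#")
    if not sep or not path:
        raise ValueError(f"missing #<path> fragment: {ref!r}")
    if not path.endswith((".csv", ".parquet")):
        raise ValueError(f"fragment must name a .csv or .parquet file: {path!r}")
    parts = location.split("/")
    if len(parts) == 2 and all(parts):
        return FilesetRef(workspace=parts[0], fileset=parts[1], path=path)
    # The fileset:// form always names a workspace; only the bare form may omit it.
    if len(parts) == 1 and parts[0] and not has_scheme:
        return FilesetRef(workspace=None, fileset=parts[0], path=path)
    raise ValueError(f"expected <workspace>/<fileset> or <fileset> before '#': {ref!r}")
```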

Validate the Config Independently

If you want to validate just the AnonymizerConfig (and optionally a model_configs YAML) without running a preview, use nemo anonymizer validate:

```python
from pathlib import Path
import yaml

Path("/tmp/anonymizer-config.yaml").write_text(yaml.safe_dump({
    "replace": {"kind": "redact", "format_template": "[REDACTED_{label}]"},
}))
```

```bash
nemo anonymizer validate --config /tmp/anonymizer-config.yaml
```

validate checks the config against the model selection (for example, that a Substitute strategy has a replacement_generator model defined) and prints "Config is valid." on success. It does not accept data.source, so input sources are validated only during preview or run.

Next Steps

Run a full job and inspect parquet output in the run tutorial.
Refer to SDK Resources for AnonymizerPreviewResult and AnonymizerResource.preview details.
Learn about rewrite and replacement strategy parameters in the library docs.