Tutorials

These tutorials cover the two user-facing surfaces of the Anonymizer plugin: the streaming preview workflow for iterating on a configuration, and the run job for processing full datasets.

Library vs. Service

Anonymizer separates configuration (what to detect and how to replace it) from execution (where the work runs and how models are reached).

Part 1: Build the config (library)

Use anonymizer.config to define the rewrite or replacement strategy and detection options. This code is identical whether you run Anonymizer standalone or through the NeMo Platform service.

```python
from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact

# Replace each detected entity with a placeholder built from its label.
config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)
```
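To make format_template concrete, the sketch below assumes that Redact fills {label} with Python str.format and that detection yields non-overlapping (start, end, label) character spans. The apply_redactions helper, the span tuples, and the labels "name" and "email" are hypothetical illustrations, not part of the Anonymizer API.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, label), with `end` exclusive.
Span = Tuple[int, int, str]


def apply_redactions(text: str, spans: List[Span], template: str = "[REDACTED_{label}]") -> str:
    """Replace each span in `text` with `template` filled in with the span's label."""
    pieces = []
    cursor = 0  # end of the last span copied, as an offset into the original text
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError(f"span starting at {start} overlaps the previous span")
        pieces.append(text[cursor:start])            # untouched text before the span
        pieces.append(template.format(label=label))  # placeholder for the entity
        cursor = end
    pieces.append(text[cursor:])  # trailing text after the last span
    return "".join(pieces)


text = "Bob Smith can be reached at bob.smith@example.com."
email = "bob.smith@example.com"
email_start = text.index(email)
spans = [(0, len("Bob Smith"), "name"), (email_start, email_start + len(email), "email")]
print(apply_redactions(text, spans))
# → [REDACTED_name] can be reached at [REDACTED_email].
```

Building the output from offsets into the original text avoids the drift that in-place replacement would cause whenever a placeholder is longer or shorter than the text it replaces.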

Part 2: Execute (platform)

Submit the config to the Anonymizer service. The plugin owns the request shapes (PreviewRequest, AnonymizerRequest), so they can also describe the input source and model routing:

```python
import os

from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact
from data_designer.config import ModelConfig
from nemo_anonymizer_plugin.app.input import AnonymizerInputSpec
from nemo_anonymizer_plugin.app.task_config import PreviewRequest
from nemo_platform import NeMoPlatform

WORKSPACE = os.environ.get("NMP_WORKSPACE", "default")
# Inference Gateway provider that model calls are routed through.
MODEL_PROVIDER = os.environ.get("NMP_ANON_PROVIDER", "nvidia-build")

# Same library config as in Part 1.
config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)

# Map each model alias to a model served by the provider above.
model_configs = [
    ModelConfig(alias="gliner-pii-detector", provider=MODEL_PROVIDER, model="nvidia/gliner-pii"),
    ModelConfig(alias="gpt-oss-120b", provider=MODEL_PROVIDER, model="openai/gpt-oss-120b"),
    ModelConfig(alias="nemotron-30b-thinking", provider=MODEL_PROVIDER, model="nvidia/nemotron-3-nano-30b-a3b"),
]

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=WORKSPACE,
)
anonymizer = sdk.anonymizer

# Preview 10 records from the fileset created in "Upload an Input Fileset".
preview = anonymizer.preview(PreviewRequest(
    config=config,
    data=AnonymizerInputSpec(
        source=f"fileset://{WORKSPACE}/anonymizer-inputs#anonymizer-input.csv",
        text_column="biography",  # column holding the text to anonymize
        id_column="id",           # column that identifies each record
    ),
    model_configs=model_configs,
    num_records=10,
))
```

Service-Specific Considerations

When using Anonymizer as a NeMo Platform service:

| Feature | Service behavior | Details |
|---|---|---|
| Inference | Routes through the Inference Gateway | Configure providers once and reference them by name from `model_configs`. |
| Input data | Filesets and HTTP(S) URLs; local paths work only with local CLI execution | Create and upload with `sdk.files.filesets.create` and `sdk.files.upload`, then reference the file as `fileset://<workspace>/<fileset>#<path>`. |
| Artifacts | Local or platform-managed | `nemo anonymizer run run` writes artifacts to `persistent/results/artifacts` locally; `nemo anonymizer run submit` stores them in NeMo Platform job storage. |
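To catch a local path before a request reaches the service, a client-side check along these lines mirrors the input-data row above. It is a hypothetical helper, not part of the SDK, and it assumes that any source whose scheme is not fileset, http, or https is a local path; the service's own validation remains authoritative.

```python
from urllib.parse import urlsplit

# Schemes treated as remote, per the input-data row (assumption: nothing else is accepted).
REMOTE_SCHEMES = {"fileset", "http", "https"}


def is_remote_source(source: str) -> bool:
    """Return True for fileset:// and http(s):// sources, False for anything that looks local."""
    return urlsplit(source).scheme.lower() in REMOTE_SCHEMES


def check_source(source: str, local_cli: bool = False) -> None:
    """Reject local paths unless the job runs through local CLI execution."""
    if not local_cli and not is_remote_source(source):
        raise ValueError(f"{source!r} looks like a local path; upload it to a fileset first")


check_source("fileset://default/anonymizer-inputs#anonymizer-input.csv")  # accepted
check_source("anonymizer-input.csv", local_cli=True)                      # accepted for local CLI runs
```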

Prerequisites

Complete Setup to install NeMo Platform, start the services with nemo services run, and configure an inference provider. The root workspace includes the Anonymizer plugin, so nemo services run discovers it automatically and mounts /apis/anonymizer/... on the gateway; no separate plugin installation step is needed. Verify that the CLI is registered:

$ nemo anonymizer --help

You should see the validate, preview, and run command groups.

These tutorials route inference through an Inference Gateway provider, so a NeMo Platform cluster must be running before you preview or run a job. The examples reference the default NVIDIA Build provider created during setup.

nemo setup pre-configures a default/nvidia-build model provider during local startup. This provider routes inference requests to models hosted on build.nvidia.com through the API base URL https://integrate.api.nvidia.com, using the NGC API key (with Public API Endpoints permissions) that you provided during deployment.

You can verify this provider exists by running nemo inference providers list --workspace default.

The tutorials in these docs use this provider for inference, but you can create your own provider and use it instead; the preview example above reads the provider name from the NMP_ANON_PROVIDER environment variable.

Upload an Input Fileset

sdk.anonymizer.preview, nemo anonymizer preview submit, and nemo anonymizer run submit reject local file paths, so these tutorials read their input from a fileset. Create a small CSV that contains PII and upload it to a fileset named anonymizer-inputs:

```python
import os
import tempfile
from pathlib import Path

from nemo_platform import NeMoPlatform
from nemo_platform._exceptions import ConflictError

WORKSPACE = os.environ.get("NMP_WORKSPACE", "default")
FILESET = "anonymizer-inputs"
INPUT_FILENAME = "anonymizer-input.csv"

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=WORKSPACE,
)

# Write a two-row sample CSV with names, a location, an employer, and an email address.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write(
        "id,biography\n"
        "1,Alice Johnson lives in Seattle and works at NVIDIA.\n"
        "2,Bob Smith can be reached at bob.smith@example.com.\n"
    )
    input_path = Path(f.name)

# Create the fileset; reuse it if it already exists.
try:
    sdk.files.filesets.create(
        name=FILESET,
        workspace=WORKSPACE,
        description="Anonymizer input files",
    )
except ConflictError:
    pass  # already exists

# Upload the CSV under the name the tutorials reference.
sdk.files.upload(
    local_path=str(input_path),
    fileset=FILESET,
    workspace=WORKSPACE,
    remote_path=INPUT_FILENAME,
)

# Remove the temporary local copy now that the fileset holds the data.
input_path.unlink()
```

The tutorials reference this file with fileset://{WORKSPACE}/anonymizer-inputs#anonymizer-input.csv.
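This reference packs three parts into one URI: the workspace, the fileset name, and, after #, the file path inside the fileset. The hypothetical helpers below build and split such a reference with the standard library, assuming the fileset://<workspace>/<fileset>#<path> layout shown above is the whole format; they are illustrations, not SDK functions.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class FilesetRef(NamedTuple):
    workspace: str
    fileset: str
    path: str


def fileset_uri(workspace: str, fileset: str, path: str) -> str:
    """Build a fileset:// reference in the layout these tutorials use."""
    return f"fileset://{workspace}/{fileset}#{path}"


def parse_fileset_uri(uri: str) -> FilesetRef:
    """Split a fileset:// reference into workspace, fileset name, and file path."""
    parts = urlsplit(uri)
    if parts.scheme != "fileset":
        raise ValueError(f"not a fileset URI: {uri!r}")
    # netloc holds the workspace, path holds "/<fileset>", fragment holds the file path.
    return FilesetRef(parts.netloc, parts.path.lstrip("/"), parts.fragment)


uri = fileset_uri("default", "anonymizer-inputs", "anonymizer-input.csv")
print(uri)
# → fileset://default/anonymizer-inputs#anonymizer-input.csv
print(parse_fileset_uri(uri))
# → FilesetRef(workspace='default', fileset='anonymizer-inputs', path='anonymizer-input.csv')
```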

Preview a Config

Stream a small anonymized sample while you iterate on AnonymizerConfig and model_configs. This tutorial covers sdk.anonymizer.preview, nemo anonymizer preview run and preview submit, and the NDJSON frame stream.


Run an Anonymizer Job

Run the full pipeline locally with nemo anonymizer run run, or submit it to the Jobs worker with nemo anonymizer run submit. This tutorial also shows how to load the dataset.parquet, trace.parquet, and failed_records.json artifacts.
