Tutorials


These tutorials cover the two user-facing surfaces of the Anonymizer plugin: the streaming preview workflow for iteration, and the run job for full datasets.

Library vs. Service

Anonymizer separates configuration (what to detect and how to replace it) from execution (where the work runs and how models are reached).

Part 1: Build the config (library)

Use anonymizer.config to define the rewrite or replacement strategy and detection options. This code is identical whether you run Anonymizer standalone or through the NeMo Platform service.

from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact

config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)
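To make the effect of format_template concrete, here is a minimal, library-free sketch of how a template such as "[REDACTED_{label}]" could be applied to detected spans. It assumes the template is rendered with Python-style str.format using a single label field; the actual Redact strategy may render it differently. The helper names (span_of, redact) and the labels PERSON and CITY are hypothetical and are not part of anonymizer.config.

```python
# Assumption: the template is rendered with Python-style str.format and a
# single {label} field. The real Redact strategy may work differently.
TEMPLATE = "[REDACTED_{label}]"


def span_of(text: str, value: str, label: str) -> tuple[int, int, str]:
    """Locate `value` in `text` and return a (start, end, label) span."""
    start = text.index(value)
    return start, start + len(value), label


def redact(text: str, spans: list[tuple[int, int, str]], template: str = TEMPLATE) -> str:
    """Replace each non-overlapping span with the rendered template."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])            # text before the entity
        pieces.append(template.format(label=label))  # placeholder for the entity
        cursor = end
    pieces.append(text[cursor:])  # trailing text after the last entity
    return "".join(pieces)


text = "Alice Johnson lives in Seattle."
# PERSON and CITY are hypothetical labels chosen for this illustration.
spans = [span_of(text, "Alice Johnson", "PERSON"), span_of(text, "Seattle", "CITY")]
print(redact(text, spans))
# prints: [REDACTED_PERSON] lives in [REDACTED_CITY].
```

In the real workflow, detection and replacement are performed by Anonymizer itself; the sketch only illustrates what the template placeholder stands for.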

Part 2: Execute (platform)

Submit the config to the Anonymizer service. The plugin owns the request shape (PreviewRequest, AnonymizerRequest) so it can also describe the input source and model routing:

import os

from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact
from data_designer.config import ModelConfig
from nemo_anonymizer_plugin.app.input import AnonymizerInputSpec
from nemo_anonymizer_plugin.app.task_config import PreviewRequest
from nemo_platform import NeMoPlatform

# Workspace and provider default to the values created during setup.
WORKSPACE = os.environ.get("NMP_WORKSPACE", "default")
MODEL_PROVIDER = os.environ.get("NMP_ANON_PROVIDER", "nvidia-build")

# Same library config as in Part 1.
config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)

# Each alias points at a model served by the named provider.
model_configs = [
    ModelConfig(alias="gliner-pii-detector", provider=MODEL_PROVIDER, model="nvidia/gliner-pii"),
    ModelConfig(alias="gpt-oss-120b", provider=MODEL_PROVIDER, model="openai/gpt-oss-120b"),
    ModelConfig(alias="nemotron-30b-thinking", provider=MODEL_PROVIDER, model="nvidia/nemotron-3-nano-30b-a3b"),
]

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=WORKSPACE,
)
anonymizer = sdk.anonymizer

# The request carries the config plus the input source and model routing.
preview = anonymizer.preview(PreviewRequest(
    config=config,
    data=AnonymizerInputSpec(
        source=f"fileset://{WORKSPACE}/anonymizer-inputs#anonymizer-input.csv",
        text_column="biography",
        id_column="id",
    ),
    model_configs=model_configs,
    num_records=10,
))

Service-Specific Considerations

When using Anonymizer as a NeMo Platform service:

| Feature | Difference | Details |
| --- | --- | --- |
| Inference | Routes through the Inference Gateway | Configure providers once and reference them by name from model_configs. |
| Input data | Filesets and HTTP(S) URLs (local paths only in local CLI execution) | Use sdk.files.filesets.create / sdk.files.upload, then reference with #<path>. |
| Artifacts | Local or platform-managed | run run writes to persistent/results/artifacts locally; run submit stores artifacts in NeMo Platform job storage. |
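The input-data rule above can be checked before a request is submitted. The following sketch classifies a source string by its URL scheme using only the standard library. The function names classify_source and accepted_by_service are hypothetical, and the service performs its own validation, which may differ in detail.

```python
from urllib.parse import urlsplit


def classify_source(source: str) -> str:
    """Return 'fileset', 'url', or 'local' based on the source's scheme."""
    scheme = urlsplit(source).scheme.lower()
    if scheme == "fileset":
        return "fileset"
    if scheme in ("http", "https"):
        return "url"
    # Anything else (no scheme, drive letters, etc.) is treated as a local path.
    return "local"


def accepted_by_service(source: str) -> bool:
    """Assumption: the service accepts filesets and HTTP(S) URLs only."""
    return classify_source(source) != "local"


for candidate in (
    "fileset://default/anonymizer-inputs#anonymizer-input.csv",
    "https://example.com/data.csv",
    "data/anonymizer-input.csv",
):
    print(candidate, "->", classify_source(candidate))
```

A check like this is only a convenience for failing early on the client side; the service remains the source of truth for what it accepts.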

Prerequisites

Complete Setup to install NeMo Platform, run nemo services run, and configure an inference provider. The root workspace includes the Anonymizer plugin, so nemo services run discovers it automatically and mounts /apis/anonymizer/... on the gateway — no separate plugin install step is needed. Verify the CLI is registered:

$ nemo anonymizer --help

You should see validate, preview, and run command groups.
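If you want to script this check, for example in a setup smoke test, a sketch along the following lines may help. It runs the same nemo anonymizer --help command and looks for the three group names in the output. It assumes the group names appear as whitespace-separated words in the help text, which depends on how the CLI formats its output; the function names are hypothetical.

```python
import shutil
import subprocess

EXPECTED_GROUPS = ("validate", "preview", "run")


def missing_groups(help_text: str) -> list[str]:
    """Return the expected command groups that do not appear in the help text."""
    words = set(help_text.split())
    return [group for group in EXPECTED_GROUPS if group not in words]


def check_cli() -> list[str]:
    """Run `nemo anonymizer --help` and report any missing command groups."""
    if shutil.which("nemo") is None:
        raise RuntimeError("nemo CLI not found on PATH")
    result = subprocess.run(
        ["nemo", "anonymizer", "--help"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with a non-zero status
    )
    return missing_groups(result.stdout)
```

Calling check_cli() on a machine where the plugin is registered would be expected to return an empty list under the formatting assumption above.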

These tutorials route inference through an Inference Gateway provider, so a NeMo Platform cluster must be running before you preview or run a job. The examples reference the default NVIDIA Build provider created during setup.

nemo setup pre-configures a default/nvidia-build model provider during local startup. This provider routes inference requests to models hosted on build.nvidia.com using the API base URL https://integrate.api.nvidia.com and the NGC API key with Public API Endpoints permissions provided during deployment.

You can verify this provider exists by running nemo inference providers list --workspace default.

The tutorials in these docs use this provider for inference, but you can create your own provider and use it instead.

Upload an Input Fileset

sdk.anonymizer.preview, preview submit, and run submit reject local file paths, so the tutorials read from a fileset. Create a small CSV containing PII and upload it to a fileset named anonymizer-inputs:

import os
import tempfile
from pathlib import Path

from nemo_platform import NeMoPlatform
from nemo_platform._exceptions import ConflictError

WORKSPACE = os.environ.get("NMP_WORKSPACE", "default")
FILESET = "anonymizer-inputs"
INPUT_FILENAME = "anonymizer-input.csv"

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=WORKSPACE,
)

# Write a two-row sample CSV containing PII to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write(
        "id,biography\n"
        "1,Alice Johnson lives in Seattle and works at NVIDIA.\n"
        "2,Bob Smith can be reached at bob.smith@example.com.\n"
    )
    input_path = Path(f.name)

# Create the fileset; ignore the conflict if it already exists.
try:
    sdk.files.filesets.create(
        name=FILESET,
        workspace=WORKSPACE,
        description="Anonymizer input files",
    )
except ConflictError:
    pass  # already exists

# Upload the CSV into the fileset under a stable remote name.
sdk.files.upload(
    local_path=str(input_path),
    fileset=FILESET,
    workspace=WORKSPACE,
    remote_path=INPUT_FILENAME,
)

The tutorials reference this file with fileset://{WORKSPACE}/anonymizer-inputs#anonymizer-input.csv.
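The reference has three parts: the workspace, the fileset name, and the path of the file within the fileset after the #. A small sketch of building and splitting such a reference with the standard library follows. The helper names fileset_uri and parse_fileset_uri are hypothetical and not part of the SDK, and the sketch assumes none of the parts need URL escaping.

```python
from urllib.parse import urlsplit


def fileset_uri(workspace: str, fileset: str, path: str) -> str:
    """Build a fileset://<workspace>/<fileset>#<path> reference."""
    return f"fileset://{workspace}/{fileset}#{path}"


def parse_fileset_uri(uri: str) -> tuple[str, str, str]:
    """Split a fileset reference into (workspace, fileset, path)."""
    parts = urlsplit(uri)
    if parts.scheme != "fileset":
        raise ValueError(f"not a fileset reference: {uri!r}")
    # netloc holds the workspace, the URL path holds the fileset name,
    # and the fragment holds the file's path inside the fileset.
    return parts.netloc, parts.path.lstrip("/"), parts.fragment


uri = fileset_uri("default", "anonymizer-inputs", "anonymizer-input.csv")
print(uri)
# prints: fileset://default/anonymizer-inputs#anonymizer-input.csv
print(parse_fileset_uri(uri))
# prints: ('default', 'anonymizer-inputs', 'anonymizer-input.csv')
```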

Tutorials