Anonymizer Service
The Anonymizer service detects personally identifiable information (PII) in text data on the NeMo Platform and replaces or rewrites it.
Overview
The service wraps the open-source NVIDIA NeMo Anonymizer library and exposes it through the NeMo Platform’s Python SDK and CLI. The library still owns PII detection, replacement, rewrite, and config validation. The platform adds inference routing through the Inference Gateway, fileset-backed inputs, plugin-service execution for streaming preview, and a Jobs-worker path for full anonymization runs.
How It Works: Library + Platform
The library defines what to anonymize and how. The platform decides where the work runs and how models are reached.
The code snippets below are for conceptual demonstration purposes only. For runnable examples, see the tutorials.
1. Build a config with the library
Use anonymizer.config (installed automatically with the nemo-anonymizer-plugin) to define the replacement strategy:
The library handles: PII detection, the four replacement strategies (Substitute, Redact, Annotate, Hash), the Rewrite mode, and config validation.
Learn more: See the open-source library documentation for detailed coverage of detection, replacement strategies, and rewrite mode.
2. Execute on the platform
Submit the config to the Anonymizer service with the NeMo Platform SDK:
For a full anonymization run, execute the job locally or submit it to the Jobs worker:
The SDK equivalent of nemo anonymizer run submit is sdk.anonymizer.run(request). It returns an AnonymizerJobResource, which you can poll with wait_until_done() and retrieve artifacts from with download_artifacts().
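Once artifacts are downloaded, a quick check of what the job produced can be useful. The sketch below assumes download_artifacts() leaves the files in a local directory under the artifact names this page lists for anonymizer.run (dataset.parquet, trace.parquet, metadata.json, and the optional failed_records.json). The directory layout and the summarize_artifacts helper are hypothetical and not part of the SDK.

```python
import json
from pathlib import Path

# Artifact names are taken from this page; the on-disk layout is an assumption.
REQUIRED = ("dataset.parquet", "trace.parquet", "metadata.json")
OPTIONAL_FAILED = "failed_records.json"


def summarize_artifacts(directory):
    """Report which expected artifacts exist in a local directory (hypothetical helper)."""
    root = Path(directory)
    summary = {
        "missing": [name for name in REQUIRED if not (root / name).is_file()],
        "has_failed_records": (root / OPTIONAL_FAILED).is_file(),
        "metadata": None,
    }
    metadata_path = root / "metadata.json"
    if metadata_path.is_file():
        # metadata.json is assumed to be plain JSON; its schema is not described here.
        summary["metadata"] = json.loads(metadata_path.read_text(encoding="utf-8"))
    return summary
```

A non-empty "missing" list, or has_failed_records set to True, is a cue to inspect the run before using the anonymized dataset.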
The platform handles: inference routing through the Inference Gateway, fileset-backed inputs, and authentication.
Key Differences from Standalone Library
When using Anonymizer as a NeMo Platform service:
Replacement Strategies
The library supports four replacement strategies plus a full-passage rewrite mode. The plugin exposes all of them unchanged.
See the library documentation for the configuration shape of each strategy.
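To make the four strategy names concrete, here is a conceptual sketch in plain Python. It does not use the library or its config classes. The apply_strategy helper, the output formats, and the example substitute value are all assumptions chosen for illustration, and the library's actual output for each strategy may differ.

```python
import hashlib


def apply_strategy(text, start, end, label, strategy, substitute="Alex Doe"):
    """Rewrite one detected span text[start:end] (illustrative only, not the library API)."""
    original = text[start:end]
    if strategy == "substitute":
        # Swap in a replacement value of the same entity type.
        new = substitute
    elif strategy == "redact":
        # Drop the value and leave only a placeholder for the entity type.
        new = f"[{label}]"
    elif strategy == "annotate":
        # Keep the value and tag it with its entity type.
        new = f"<{label}>{original}</{label}>"
    elif strategy == "hash":
        # Replace the value with a short, deterministic digest.
        new = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return text[:start] + new + text[end:]


text = "Contact Jane Roe today."
for name in ("substitute", "redact", "annotate", "hash"):
    print(name, "->", apply_strategy(text, 8, 16, "NAME", name))
```

Rewrite mode is left out of the sketch because it regenerates the whole passage rather than editing individual spans.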
What the Plugin Adds
This package is a thin wrapper around the NVIDIA NeMo Anonymizer library. It does not re-document detection, replacement, or rewrite semantics. It adds:
- A nemo anonymizer CLI with validate, preview, and run command groups.
- An sdk.anonymizer SDK accessor (AnonymizerResource, AsyncAnonymizerResource).
- A streaming anonymizer.preview function that emits preview_dataset, trace_dataset, and failed_records frames from the plugin service.
- An anonymizer.run job that writes dataset.parquet, trace.parquet, metadata.json, and optional failed_records.json. The job can execute in the local CLI process (nemo anonymizer run run) or on the NeMo Platform Jobs worker (nemo anonymizer run submit or sdk.anonymizer.run).
- Fileset input handling (fileset://<workspace>/<fileset>#<path>).
- Inference Gateway routing for model providers referenced from model_configs.
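The fileset reference format in the list above follows ordinary URI syntax, so it can be taken apart with the Python standard library. This is a sketch of how such a reference decomposes, not the plugin's own parser. parse_fileset_ref, FilesetRef, and the example workspace and fileset names are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class FilesetRef(NamedTuple):
    workspace: str
    fileset: str
    path: str


def parse_fileset_ref(ref):
    """Split fileset://<workspace>/<fileset>#<path> into its three parts."""
    parts = urlsplit(ref)
    if parts.scheme != "fileset":
        raise ValueError(f"not a fileset reference: {ref!r}")
    workspace = parts.netloc          # text between '//' and the next '/'
    fileset = parts.path.lstrip("/")  # text between that '/' and '#'
    path = parts.fragment             # everything after '#'
    if not (workspace and fileset and path):
        raise ValueError(f"incomplete fileset reference: {ref!r}")
    return FilesetRef(workspace, fileset, path)


ref = parse_fileset_ref("fileset://default/support-tickets#raw/tickets.jsonl")
print(ref.workspace, ref.fileset, ref.path)
```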
Next Steps
- Walk through preview (anonymizer.preview) and job execution (anonymizer.run) end to end.
- Reference for the anonymizer SDK accessor, preview result, and job result objects.
- Reference for nemo anonymizer commands and their spec files.
- Detection, replacement strategies, rewrite mode, and other library internals.