Preview a Config
This tutorial walks through the anonymizer.preview flow: defining a request, streaming a small anonymized sample, and inspecting the resulting frames.
For detection and replacement strategy details, see the open-source library documentation.
Prerequisites
Complete the tutorial prerequisites, which cover:
- A running NeMo Platform cluster with the nemo anonymizer CLI available (see Setup).
- An inference provider configured (default examples use nvidia-build).
- A fileset named anonymizer-inputs with anonymizer-input.csv uploaded.
What preview Does
preview runs the Anonymizer pipeline on a small number of records and streams results back from the plugin service. Both surfaces (Python SDK and CLI) produce the same data; the SDK collects it into a pandas DataFrame while the CLI prints newline-delimited JSON frames.
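To make the newline-delimited JSON format concrete, here is a minimal Python sketch that parses a stream of frames and groups them by kind. The "kind" and "payload" keys, the kind names, and the sample data are placeholders chosen for illustration; they are assumptions, not the plugin's documented frame schema, so adjust them to match the frames you actually receive.

```python
import json
from collections import defaultdict

# Hypothetical NDJSON output: the "kind" and "payload" keys and the kind
# names below are placeholders, not the plugin's documented frame schema.
SAMPLE_STREAM = """\
{"kind": "log", "payload": {"message": "starting"}}
{"kind": "record", "payload": {"id": 1, "text": "<NAME> called"}}

{"kind": "record", "payload": {"id": 2, "text": "<NAME> replied"}}
{"kind": "done", "payload": {}}
"""


def group_frames(lines, kind_key="kind"):
    """Parse NDJSON lines and group the decoded frames by their kind."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines between frames
            continue
        frame = json.loads(line)
        grouped[frame.get(kind_key, "unknown")].append(frame)
    return dict(grouped)


frames = group_frames(SAMPLE_STREAM.splitlines())
# Collect the payloads of the (assumed) record frames into a plain list.
records = [f["payload"] for f in frames.get("record", [])]
print({kind: len(items) for kind, items in frames.items()})
print(records)
```

Because each line is a complete JSON document, the same function works on any iterable of lines, including sys.stdin when CLI output is piped into a script.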
The plugin emits four frame kinds:
Available preview surfaces:
Step 1: Build a PreviewRequest
A preview request bundles the AnonymizerConfig, the input spec, and optional model_configs / selected_models.
Field reference:
Step 2: Run with the Python SDK
AnonymizerResource.preview calls the plugin service, so the same constraints as preview submit apply: use a fileset or http(s) source, and include model_configs.
Async client
AsyncAnonymizerResource.preview has the same signature and return type:
Step 3: Persist Preview Records
AnonymizerPreviewResult.dataset and trace_dataset are pandas DataFrames, so save them with standard pandas methods:
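As a minimal sketch, the snippet below saves a stand-in DataFrame with to_csv and to_parquet. The DataFrame contents, column names, and output file names are invented for illustration; in practice you would use the dataset (or trace_dataset) attribute of the result returned by preview.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for AnonymizerPreviewResult.dataset; the column names are made up.
dataset = pd.DataFrame(
    {"id": [1, 2], "text": ["<NAME> called", "<NAME> replied"]}
)

out_dir = Path(tempfile.mkdtemp(prefix="anonymizer-preview-"))

# CSV needs no extra dependencies.
csv_path = out_dir / "preview.csv"
dataset.to_csv(csv_path, index=False)

# Parquet requires an engine such as pyarrow or fastparquet.
parquet_path = out_dir / "preview.parquet"
try:
    dataset.to_parquet(parquet_path, index=False)
except ImportError:
    parquet_path = None  # no Parquet engine installed

# Read the CSV back to confirm the round trip.
restored = pd.read_csv(csv_path)
print(restored.shape)
```

Passing index=False keeps the pandas row index out of the saved files, which is usually what you want for preview records.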
More about preview results
AnonymizerPreviewResult stores everything in memory; nothing is persisted to disk by default. The dataset field is a regular pandas DataFrame and can be saved with to_csv or to_parquet.
Step 4: Run from the CLI (Alternative)
The CLI accepts the same request shape as a YAML spec file. Use preview run for local execution (allows local paths, model configs optional) or preview submit for the plugin service path (same as sdk.anonymizer.preview).
Write the spec to YAML:
Run preview locally:
Or submit to the plugin service:
Both commands stream NDJSON frames to stdout. Filter with jq:
If preview submit returns 404 against the gateway, the plugin service isn’t mounted. Restart nemo services run so the plugin is discovered and /apis/anonymizer/... is remounted; see Setup.
Input Source Forms
The plugin accepts three forms for data.source:
Fileset references take any of these forms; the workspace and fileset must already exist:
The #<path> fragment must resolve to a single .csv or .parquet file. The plugin downloads the file before constructing the Anonymizer library input and cleans up the temp directory when the preview completes.
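The fragment rule can be checked on the client before a request is sent. The sketch below is a hypothetical pre-check, not the plugin's own validation: the helper name split_fileset_source is invented, the part before # is treated as an opaque string, and the sample reference is made up from the fileset and file names used in this tutorial.

```python
from pathlib import PurePosixPath

SUPPORTED_SUFFIXES = {".csv", ".parquet"}


def split_fileset_source(source):
    """Split '<reference>#<path>' and check that <path> is a .csv or .parquet file.

    The shape of <reference> is left opaque here; only the fragment is checked.
    """
    reference, sep, path = source.partition("#")
    if not sep or not path:
        raise ValueError(f"missing #<path> fragment: {source!r}")
    suffix = PurePosixPath(path).suffix
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported file type {suffix!r} in {source!r}")
    return reference, path


# Hypothetical reference; real fileset references may look different.
print(split_fileset_source("anonymizer-inputs#anonymizer-input.csv"))
```

A check like this only catches malformed strings early. Whether the workspace, fileset, and file actually exist is still determined by the plugin when the preview runs.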
Validate the Config Independently
To validate only the AnonymizerConfig (and optionally a model_configs YAML) without running a preview, use nemo anonymizer validate:
validate checks the config against the model selection (for example, that a Substitute strategy has a replacement_generator model defined) and prints Config is valid. on success. It does not accept data.source, so input-source validation happens during preview or run.
Next Steps
- Run a full job and inspect parquet output in the run tutorial.
- Refer to SDK Resources for AnonymizerPreviewResult and AnonymizerResource.preview details.
- Learn about rewrite and replacement strategy parameters in the library docs.