Anonymizer NeMo Platform SDK Resources

The anonymizer.config module (from the NVIDIA NeMo Anonymizer library) builds AnonymizerConfig objects in a context-agnostic way. Once you are ready to execute that config against the NeMo Platform Anonymizer service, you use objects from the nemo_platform SDK. This page describes the NeMo Platform-specific objects.

AnonymizerResource

The AnonymizerResource is the entry point for working with Anonymizer on NeMo Platform. It wraps the streaming preview endpoint and job submission for the plugin service.

An AnonymizerResource is accessed directly from a NeMoPlatform instance:

import os
from nemo_platform import NeMoPlatform

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
anonymizer = sdk.anonymizer  # AnonymizerResource

An AsyncAnonymizerResource with the same surface is available via AsyncNeMoPlatform.anonymizer.

Method	Description
`preview(request, *, workspace=None)`	Runs a streaming preview against the plugin service and returns an `AnonymizerPreviewResult` after the stream completes.
`run(request, *, workspace=None, wait_until_done=False)`	Submits an `anonymizer.run` job to the NeMo Platform Jobs worker. Returns an `AnonymizerJobResource`. When `wait_until_done=True`, blocks until the job reaches a terminal state.
`get_job_resource(job_name, workspace=None)`	Returns an `AnonymizerJobResource` for an existing job (by job name).

request is a PreviewRequest (for preview) or an AnonymizerRequest (for run), both from nemo_anonymizer_plugin.app.task_config. Both accept the same config, data, model_configs, and selected_models fields; PreviewRequest adds num_records.

Both preview and run call the plugin service, so they require model_configs and reject local file paths in data.source. Use a fileset reference or an http(s) URL instead.
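
Because the plugin service rejects local paths, a client-side check of data.source before calling preview or run can save a round trip. The sketch below is a minimal illustration under stated assumptions: classify_source and require_remote_source are hypothetical helpers, not part of the SDK, and they use a simple heuristic (an http/https scheme means a URL, a fileset:// scheme or a "#" separator means a fileset reference, anything else is treated as a local path). The service's own validation remains authoritative.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Heuristically classify a data.source string (hypothetical helper).

    Returns "url", "fileset", or "local".
    """
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    # Assumption: fileset references are recognized by the fileset:// scheme
    # or by the "#<path>" separator used in the shorter forms.
    if scheme == "fileset" or "#" in source:
        return "fileset"
    return "local"


def require_remote_source(source: str) -> str:
    """Raise before calling preview/run if the source looks like a local path."""
    if classify_source(source) == "local":
        raise ValueError(
            f"{source!r} looks like a local path; the plugin service only "
            "accepts a fileset reference or an http(s) URL"
        )
    return source
```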

AnonymizerPreviewResult

AnonymizerResource.preview collects the frame stream and returns an AnonymizerPreviewResult once the stream completes.

Attribute / Method	Description
`dataset`	`pandas.DataFrame` of anonymized records (the `preview_dataset` frame contents).
`trace_dataset`	`pandas.DataFrame` with detection trace columns (the `trace_dataset` frame contents).
`failed_records`	`list[dict]` of per-record failures with reasons. Empty when nothing failed.
`display_record(index=None)`	Renders a single trace record as HTML in a notebook. When `index` is omitted, cycles through records.
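
The table only states that omitting index cycles through records. One plausible way to model that behavior is an internal cursor that advances on each index-less call and wraps after the last record. RecordCycler is hypothetical and is not necessarily how display_record works; in particular, whether an explicit index moves the cursor is an assumption.

```python
from typing import Optional


class RecordCycler:
    """Hypothetical model of display_record(index=None) cycling."""

    def __init__(self, num_records: int) -> None:
        if num_records < 1:
            raise ValueError("need at least one record to display")
        self._num_records = num_records
        self._cursor = 0

    def next_index(self, index: Optional[int] = None) -> int:
        if index is not None:
            # Explicit index: returned as-is; by assumption the cursor is untouched.
            if not 0 <= index < self._num_records:
                raise IndexError(index)
            return index
        # No index: return the cursor position, then advance and wrap to 0.
        current = self._cursor
        self._cursor = (current + 1) % self._num_records
        return current
```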

More about preview results

AnonymizerPreviewResult holds everything in memory; nothing is persisted to disk by default. The dataset and trace_dataset fields are regular pandas DataFrames and can be saved with to_csv / to_parquet.

AnonymizerJobResource

AnonymizerResource.run returns an AnonymizerJobResource. You can also use AnonymizerResource.get_job_resource to get one for an existing job.

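# run_request is an AnonymizerRequest (see Request Models)
job = sdk.anonymizer.run(run_request)
job.wait_until_done()  # blocks until the job reaches a terminal state
results = job.download_artifacts()  # AnonymizerJobResults
dataset = results.load_dataset()  # pandas.DataFrame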

Method	Description
`get_job()`	Returns the raw job record from the jobs service.
`get_job_status()`	Returns the current `PlatformJobStatus`.
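`check_if_complete(*, raise_if_not_complete=False)`	Returns `True` when the job is `completed`. Otherwise returns `False`, or raises when `raise_if_not_complete=True`; this applies both to jobs that are still running and to jobs that ended in a terminal state other than `completed`.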
`wait_until_done()`	Polls the jobs service until the job reaches a terminal state. Logs progress as it goes.
`get_logs()`	Returns logs from the job as a list of dicts. Handles pagination automatically.
`download_artifacts(path=None)`	Downloads the job artifacts tarball and unarchives it. Returns an `AnonymizerJobResults` object.

The async variant (AsyncAnonymizerJobResource) exposes the same surface with async def methods.
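
wait_until_done and check_if_complete already cover the common cases. To make the polling semantics concrete, or to bound the wait with a timeout, the loop can be sketched generically. poll_until_terminal is a hypothetical helper, not an SDK method: get_status stands in for something like job.get_job_status, and terminal is whatever set of status values the caller treats as final. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Container, Optional, TypeVar

S = TypeVar("S")


def poll_until_terminal(
    get_status: Callable[[], S],
    terminal: Container[S],
    *,
    interval_s: float = 5.0,
    timeout_s: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> S:
    """Poll get_status() until it returns a value in `terminal` (hypothetical)."""
    deadline = None if timeout_s is None else clock() + timeout_s
    while True:
        status = get_status()
        if status in terminal:
            return status
        # Give up once the optional deadline has passed.
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

With a real job you would pass the terminal PlatformJobStatus members; this page names only `completed`, so the full terminal set is not spelled out here.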

AnonymizerJobResults

download_artifacts returns an AnonymizerJobResults object that loads parquet and JSON artifacts into memory. The same class also works for the local run flow: point it at the artifact directory path that the local job results manager logs:

from pathlib import Path
from nemo_anonymizer_plugin.sdk.job_results import AnonymizerJobResults

results = AnonymizerJobResults(Path("/path/to/persistent/results/artifacts"))
dataset = results.load_dataset()

Method	Description
`load_dataset()`	Returns the anonymized dataset as a `pandas.DataFrame` (`dataset.parquet`).
`load_trace()`	Returns the trace dataframe (`trace.parquet`). The `original_text_column` from `metadata.json` is attached for `display_record`.
`load_failed_records()`	Returns `failed_records.json` as `list[dict]`. Returns `[]` when the file isn’t present.
`display_record(index=None)`	Renders a single trace record as HTML in a notebook. When `index` is omitted, cycles through records.

More about job results

AnonymizerJobResults reads files lazily; each method loads the corresponding parquet or JSON file only when it is called. The underlying directory layout is:

<artifacts_dir>/
  dataset.parquet
  trace.parquet
  metadata.json
  failed_records.json   # only when there were failures
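
To illustrate lazy loading with the standard library only, the hypothetical LazyArtifactDir below reads metadata.json and failed_records.json on demand from a directory with this layout, and returns [] when failed_records.json is absent, matching the documented load_failed_records behavior. It is not the SDK's implementation. The parquet files need a dataframe library (for example pandas.read_parquet) and are left out, and the only metadata key it relies on is original_text_column, the one named in the load_trace description.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


class LazyArtifactDir:
    """Hypothetical stdlib reader for the JSON files in an artifacts directory."""

    def __init__(self, artifacts_dir: Path | str) -> None:
        # Nothing is read here; each method opens its file when called.
        self.artifacts_dir = Path(artifacts_dir)

    def load_metadata(self) -> dict[str, Any]:
        with open(self.artifacts_dir / "metadata.json", encoding="utf-8") as f:
            return json.load(f)

    def original_text_column(self) -> str | None:
        # Key named in the load_trace description; other metadata keys are unknown.
        return self.load_metadata().get("original_text_column")

    def load_failed_records(self) -> list[dict[str, Any]]:
        path = self.artifacts_dir / "failed_records.json"
        # failed_records.json only exists when there were failures.
        if not path.exists():
            return []
        with open(path, encoding="utf-8") as f:
            return json.load(f)
```

Because every call re-reads its file, a failed_records.json that appears after the object is created is still picked up.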

By default, download_artifacts saves the tarball contents to a local directory named after the job; pass path= to override.
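
The extraction half of download_artifacts can be sketched with the standard library's tarfile module. extract_artifacts is hypothetical: it assumes the tarball has already been downloaded, mirrors only the documented default of extracting into a directory named after the job unless path is given, and assumes the archive members sit at the top level (the real tarball layout may differ).

```python
import tarfile
from pathlib import Path
from typing import Optional, Union


def extract_artifacts(
    tar_path: Union[str, Path],
    job_name: str,
    path: Optional[Union[str, Path]] = None,
) -> Path:
    """Extract a downloaded artifacts tarball (hypothetical helper)."""
    # Default destination: a local directory named after the job.
    dest = Path(path) if path is not None else Path(job_name)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r" auto-detects gzip/bz2/xz compression.
    with tarfile.open(tar_path, "r") as tar:
        # Use the "data" extraction filter where the running Python supports it,
        # to refuse absolute paths and links that escape the destination.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest
```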

Request Models

Both request models live in nemo_anonymizer_plugin.app.task_config.

Request Fields

AnonymizerRequest defines the execution fields below. Run jobs use AnonymizerRequest directly and process the full input file.

Field	Type	Description
`config`	`AnonymizerConfig`	Upstream library config (replace strategy or rewrite, detection params).
`data`	`AnonymizerInputSpec`	Input source plus column metadata. See below.
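`model_configs`	`list[data_designer.config.ModelConfig] | None`	Model configurations. Optional in the model, but required by the plugin service for both `preview` and `run`.
`selected_models`	`SelectedModelsOverrides | None`	Optional per-workflow role → alias overrides. See `SelectedModelsOverrides` below.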

PreviewRequest extends AnonymizerRequest with one additional field, num_records:

Field	Type	Description
`config`	`AnonymizerConfig`	Upstream library config (replace strategy or rewrite, detection params).
`data`	`AnonymizerInputSpec`	Input source plus column metadata. See below.
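`model_configs`	`list[data_designer.config.ModelConfig] | None`	Model configurations. Optional in the model, but required by the plugin service for both `preview` and `run`.
`selected_models`	`SelectedModelsOverrides | None`	Optional per-workflow role → alias overrides. See `SelectedModelsOverrides` below.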
`num_records`	`int` (≥ 1, default `10`)	Preview-only. Number of records to preview. Capped by the service’s `preview_num_records.max`.
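
The relationship between the two request models, and the preview-only num_records rule, can be modeled with plain dataclasses. These classes are hypothetical stand-ins (the real models in nemo_anonymizer_plugin.app.task_config have richer field types), and effective_num_records assumes that a value above the service's preview_num_records.max is clamped rather than rejected; the table only says it is capped.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AnonymizerRequestSketch:
    """Hypothetical stand-in for AnonymizerRequest (used by run jobs)."""

    config: Any
    data: Any
    model_configs: Optional[list] = None
    selected_models: Optional[dict] = None


@dataclass
class PreviewRequestSketch(AnonymizerRequestSketch):
    """Hypothetical stand-in for PreviewRequest: adds num_records."""

    num_records: int = 10

    def __post_init__(self) -> None:
        # Mirrors the documented lower bound (num_records >= 1).
        if self.num_records < 1:
            raise ValueError("num_records must be >= 1")


def effective_num_records(requested: int, service_max: int) -> int:
    """Records a preview would process, assuming the cap clamps the request."""
    return min(requested, service_max)
```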

AnonymizerInputSpec

AnonymizerInputSpec is the plugin-owned input spec used at the API boundary:

Field	Type	Description
`source`	`str`	Local path, `http(s)` URL, or fileset reference for a CSV / Parquet file. `preview` and `run` reject local paths.
`text_column`	`str` (default `"text"`)	Column containing text to anonymize.
`id_column`	`str | None`	Optional column that identifies each record.
`data_summary`	`str | None`	Optional summary of the input data.

Fileset references can take any of the three forms fileset://<workspace>/<fileset>#<path>, <workspace>/<fileset>#<path>, or <fileset>#<path>, and must resolve to a single .csv or .parquet file.
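
The three accepted forms differ only in how much of the prefix is spelled out. The hypothetical parse_fileset_ref below normalizes them to a (workspace, fileset, path) triple and checks the .csv / .parquet suffix on the client side; whether the reference resolves to exactly one file can only be checked by the service. It assumes the short <fileset>#<path> form resolves against a default workspace supplied by the caller (for example the SDK's workspace), which is an assumption about the service's resolution rules.

```python
from typing import NamedTuple

_PREFIX = "fileset://"
_ALLOWED_SUFFIXES = (".csv", ".parquet")


class FilesetRef(NamedTuple):
    workspace: str
    fileset: str
    path: str


def parse_fileset_ref(ref: str, default_workspace: str) -> FilesetRef:
    """Normalize a fileset reference (hypothetical helper)."""
    body = ref[len(_PREFIX):] if ref.startswith(_PREFIX) else ref
    location, sep, path = body.partition("#")
    if not sep or not path:
        raise ValueError(f"{ref!r} is missing the '#<path>' part")
    parts = location.split("/")
    if len(parts) == 2 and all(parts):
        # fileset://<workspace>/<fileset>#<path> or <workspace>/<fileset>#<path>
        workspace, fileset = parts
    elif len(parts) == 1 and parts[0] and not ref.startswith(_PREFIX):
        # <fileset>#<path>, resolved against the caller's default workspace.
        workspace, fileset = default_workspace, parts[0]
    else:
        raise ValueError(f"{ref!r} is not a recognized fileset reference")
    if not path.lower().endswith(_ALLOWED_SUFFIXES):
        raise ValueError(f"{ref!r} must point to a .csv or .parquet file")
    return FilesetRef(workspace, fileset, path)
```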

SelectedModelsOverrides

Partial role → alias overrides for the three workflows. Each section is optional and is merged on top of the bundled default selection by the library.

Field	Type	Description
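`detection`	`dict[str, str | list[str]] | None`	Role → alias overrides for the detection workflow. A role may map to a single alias or a list of aliases.
`replace`	`dict[str, str] | None`	Role → alias overrides for the replace workflow.
`rewrite`	`dict[str, str] | None`	Role → alias overrides for the rewrite workflow.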

Supplying overrides without model_configs raises a config validation error.
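
The merge described above can be sketched as a per-section dictionary update: for each of detection, replace, and rewrite, role → alias entries from the override replace the matching default entries and leave other roles alone. merge_selected_models is hypothetical; whether the library merges exactly this way (shallowly, per role) is an assumption, and so is treating an empty model_configs list the same as a missing one. It also reproduces the documented rule that overrides without model_configs are rejected.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

WORKFLOWS = ("detection", "replace", "rewrite")


def merge_selected_models(
    defaults: Dict[str, Dict[str, Any]],
    overrides: Optional[Dict[str, Optional[Dict[str, Any]]]],
    model_configs: Optional[list],
) -> Dict[str, Dict[str, Any]]:
    """Overlay partial role -> alias overrides on a default selection."""
    # Documented rule: overrides require model_configs. Treating an empty
    # list as missing is an assumption of this sketch.
    if overrides and not model_configs:
        raise ValueError("selected_models overrides require model_configs")
    merged = deepcopy(defaults)  # never mutate the bundled defaults
    for workflow in WORKFLOWS:
        section = (overrides or {}).get(workflow)
        if section:  # each section is optional
            merged.setdefault(workflow, {}).update(section)
    return merged
```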