# Preview a Config

<a id="anonymizer-tutorials-preview" />

This tutorial walks through the `anonymizer.preview` flow: defining a request, streaming a small anonymized sample, and inspecting the resulting frames.

For detection and replacement strategy details, see the [open-source library documentation](https://github.com/NVIDIA-NeMo/Anonymizer/tree/main/docs).

## Prerequisites

Complete the [tutorials prerequisites](/documentation/anonymize-data/tutorials#prerequisites), which cover:

* A running NeMo Platform cluster with the `nemo anonymizer` CLI available (see [Setup](/documentation/get-started)).
* An inference provider configured (default examples use `nvidia-build`).
* A fileset named `anonymizer-inputs` with `anonymizer-input.csv` uploaded.

## What `preview` Does

`preview` runs the Anonymizer pipeline on a small number of records and streams results back from the plugin service. Both surfaces (Python SDK and CLI) produce the same data; the SDK collects it into a `pandas` DataFrame while the CLI prints newline-delimited JSON frames.

The plugin emits four Anonymizer-specific frame kinds, plus generic stream control frames:

| Frame                        | Purpose                                                                         |
| ---------------------------- | ------------------------------------------------------------------------------- |
| `log`                        | Pipeline log lines forwarded from the library.                                  |
| `preview_dataset`            | User-facing anonymized records. The final dataset you usually care about.       |
| `trace_dataset`              | Internal trace records used by `display_record` in notebooks.                   |
| `failed_records`             | Per-record failures with reasons. Emitted only when at least one record failed. |
| `heartbeat`, `done`, `error` | Generic stream control frames.                                                  |
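
As a rough illustration of how a consumer might treat these frame kinds, the sketch below groups NDJSON frames by kind, skips `heartbeat`, stops at `done`, and raises on `error`. It assumes each frame is a JSON object with a `kind` key (as the `jq` filter later in this tutorial implies). The `collect_frames` helper, the sample stream, and the `message` key are hypothetical, and the layout of an `error` frame is not documented here, so the whole frame is surfaced as-is.

```python
import json

# Frame kinds that carry data, taken from the table above.
DATA_KINDS = {"log", "preview_dataset", "trace_dataset", "failed_records"}


def collect_frames(lines):
    """Group NDJSON frames by kind until a `done` frame arrives (hypothetical helper)."""
    frames = {kind: [] for kind in DATA_KINDS}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        frame = json.loads(line)
        kind = frame.get("kind")
        if kind == "heartbeat":
            continue  # stream control only, nothing to store
        if kind == "error":
            # The error payload layout is an assumption, so report the whole frame.
            raise RuntimeError(f"preview failed: {frame}")
        if kind == "done":
            break  # end of stream; later lines are ignored
        if kind in DATA_KINDS:
            frames[kind].append(frame)
    return frames


# Hypothetical sample stream, for illustration only.
sample = [
    '{"kind": "log", "message": "starting"}',
    '{"kind": "heartbeat"}',
    '{"kind": "preview_dataset", "records": [{"id": 1}]}',
    '{"kind": "done"}',
    '{"kind": "log", "message": "ignored after done"}',
]
grouped = collect_frames(sample)
```

With the SDK you do not need this: `sdk.anonymizer.preview` collects the frames for you. A loop like this is only relevant when reading the CLI output directly.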

Available preview surfaces:

| Surface                          | Where it runs                      | Local paths | `model_configs` required |
| -------------------------------- | ---------------------------------- | ----------- | ------------------------ |
| `sdk.anonymizer.preview(...)`    | Anonymizer plugin service (remote) | Rejected    | Required                 |
| `nemo anonymizer preview run`    | Local CLI process                  | Allowed     | Optional                 |
| `nemo anonymizer preview submit` | Anonymizer plugin service (remote) | Rejected    | Required                 |

## Step 1: Build a `PreviewRequest`

A preview request bundles the `AnonymizerConfig`, the input spec, and optional `model_configs` / `selected_models`.

```python
import os
from anonymizer.config.anonymizer_config import AnonymizerConfig
from anonymizer.config.replace_strategies import Redact
from data_designer.config import ModelConfig
from nemo_anonymizer_plugin.app.input import AnonymizerInputSpec
from nemo_anonymizer_plugin.app.task_config import PreviewRequest

WORKSPACE = os.environ.get("NMP_WORKSPACE", "default")
MODEL_PROVIDER = os.environ.get("NMP_ANON_PROVIDER", "nvidia-build")

config = AnonymizerConfig(
    replace=Redact(format_template="[REDACTED_{label}]"),
)

model_configs = [
    ModelConfig(alias="gliner-pii-detector", provider=MODEL_PROVIDER, model="nvidia/gliner-pii"),
    ModelConfig(alias="gpt-oss-120b", provider=MODEL_PROVIDER, model="openai/gpt-oss-120b"),
    ModelConfig(alias="nemotron-30b-thinking", provider=MODEL_PROVIDER, model="nvidia/nemotron-3-nano-30b-a3b"),
]

request = PreviewRequest(
    config=config,
    data=AnonymizerInputSpec(
        source=f"fileset://{WORKSPACE}/anonymizer-inputs#anonymizer-input.csv",
        text_column="biography",
        id_column="id",
    ),
    model_configs=model_configs,
    num_records=2,
)
```

Field reference:

| Field               | Type                                                                           | Notes                                                                                                                                                                                            |
| ------------------- | ------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `config`            | [`AnonymizerConfig`](https://github.com/NVIDIA-NeMo/Anonymizer/tree/main/docs) | The library config. The example uses `Redact`; the plugin also supports `substitute`, `annotate`, `hash`, and `rewrite`.                                                                         |
| `data.source`       | string                                                                         | Local path, `http(s)` URL, or fileset reference. See [input source forms](#input-source-forms).                                                                                                  |
| `data.text_column`  | string                                                                         | Column containing text to anonymize. Defaults to `text`.                                                                                                                                         |
| `data.id_column`    | string                                                                         | Optional record identifier column.                                                                                                                                                               |
| `data.data_summary` | string                                                                         | Optional short description passed to Anonymizer library prompts.                                                                                                                                 |
| `model_configs`     | list                                                                           | Data Designer `ModelConfig` entries. `provider` must reference an Inference Gateway provider name (or `workspace/provider`). Omit to use Anonymizer library defaults (CLI local execution only). |
| `selected_models`   | object                                                                         | Optional `detection` / `replace` / `rewrite` overrides on top of the bundled defaults. Requires `model_configs`.                                                                                 |
| `num_records`       | int (≥ 1)                                                                      | Number of records to preview. Defaults to 10.                                                                                                                                                    |

## Step 2: Run with the Python SDK

```python
import os
from nemo_platform import NeMoPlatform

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=WORKSPACE,
)
preview = sdk.anonymizer.preview(request)

preview.dataset            # pandas DataFrame of anonymized records
preview.trace_dataset      # detection trace
preview.failed_records     # list[dict] of per-record failures (usually empty)
preview.display_record(0)  # render record 0 with entity highlights in a notebook
```

`AnonymizerResource.preview` calls the plugin service, so the same constraints as `preview submit` apply: use a fileset or `http(s)` source, and include `model_configs`.

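`AsyncAnonymizerResource.preview` has the same signature and return type. The top-level `await` below works in a notebook or another running event loop; in a plain script, wrap the call in a coroutine and run it with `asyncio.run`: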

```python
import os
from nemo_platform import AsyncNeMoPlatform

async_sdk = AsyncNeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace=WORKSPACE,
)
preview = await async_sdk.anonymizer.preview(request)
```

## Step 3: Persist Preview Records

`AnonymizerPreviewResult.dataset` and `trace_dataset` are pandas DataFrames, so save them with standard pandas methods:

```python
preview.dataset.to_csv("anonymized-preview.csv", index=False)
preview.dataset.to_parquet("anonymized-preview.parquet", index=False)
```

`AnonymizerPreviewResult` holds everything in memory; nothing is written to disk unless you save it explicitly.
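
`failed_records` is a `list[dict]` rather than a DataFrame, so one option is to write it as JSON Lines with the standard library. The following is a minimal sketch: `save_failed_records` is a hypothetical helper, and the keys in the sample entry are made up, since the real failure fields depend on the plugin.

```python
import json
import tempfile
from pathlib import Path


def save_failed_records(failed_records, path):
    """Write each failure dict as one JSON line and return the number written."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as handle:
        for record in failed_records:
            # default=str stringifies values json cannot encode (timestamps, for example).
            handle.write(json.dumps(record, default=str) + "\n")
    return len(failed_records)


# Hypothetical failure entry; real keys depend on the plugin.
example_failures = [{"id": "42", "reason": "model timeout"}]
output_path = Path(tempfile.gettempdir()) / "failed-records.jsonl"
written = save_failed_records(example_failures, output_path)
```

In a real session you would pass `preview.failed_records` instead of `example_failures`.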

## Step 4: Run from the CLI (Alternative)

The CLI accepts the same request shape as a YAML spec file. Use `preview run` for local execution (allows local paths, model configs optional) or `preview submit` for the plugin service path (same as `sdk.anonymizer.preview`).

Write the spec to YAML:

```python
import yaml
from pathlib import Path

spec_path = Path("/tmp/anonymizer-preview.yaml")
spec_path.write_text(yaml.safe_dump(request.model_dump(mode="json", exclude_none=True)))
```

Run preview locally:

```bash
nemo anonymizer preview run \
  --spec-file /tmp/anonymizer-preview.yaml \
  --workspace "${NMP_WORKSPACE:-default}"
```

Or submit to the plugin service:

```bash
nemo anonymizer preview submit \
  --spec-file /tmp/anonymizer-preview.yaml \
  --workspace "${NMP_WORKSPACE:-default}" \
  --base-url "${NMP_BASE_URL:-http://localhost:8080}"
```

Both commands stream NDJSON frames to stdout. Filter with `jq`:

```bash
nemo anonymizer preview run \
  --spec-file /tmp/anonymizer-preview.yaml \
  --workspace "${NMP_WORKSPACE:-default}" \
  > /tmp/anonymizer-preview.ndjson

jq -R 'fromjson? | select(.kind == "preview_dataset") | .records' \
  /tmp/anonymizer-preview.ndjson
```
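
If `jq` is not available, roughly the same filter can be written in Python. This sketch assumes what the `jq` expression above implies: each frame is a JSON object with a `kind` key, and `preview_dataset` frames carry a `records` list. `preview_records` is a hypothetical helper. Unlike the `jq` filter, which prints one array per frame, it flattens all records into a single list.

```python
import json


def preview_records(ndjson_lines):
    """Return records from every `preview_dataset` frame, skipping unparseable lines."""
    records = []
    for line in ndjson_lines:
        try:
            frame = json.loads(line)
        except json.JSONDecodeError:
            continue  # like jq's `fromjson?`, drop lines that are not JSON
        if isinstance(frame, dict) and frame.get("kind") == "preview_dataset":
            records.extend(frame.get("records") or [])
    return records


# Hypothetical captured output, for illustration only.
captured = [
    "plain text that is not JSON",
    '{"kind": "log", "message": "running"}',
    '{"kind": "preview_dataset", "records": [{"id": 1}, {"id": 2}]}',
    '{"kind": "done"}',
]
rows = preview_records(captured)
```

To read the file written above, pass an open file handle, for example `preview_records(open("/tmp/anonymizer-preview.ndjson"))`.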

If `preview submit` returns 404 against the gateway, the plugin service isn't mounted. Restart `nemo services run` so the plugin is discovered and remounted under `/apis/anonymizer/...`; see [Setup](/documentation/get-started).

## Input Source Forms

The plugin accepts three forms for `data.source`:

| Form                                  | `sdk.anonymizer.preview` / `preview submit` | `preview run` |
| ------------------------------------- | ------------------------------------------- | ------------- |
| Local path (`/tmp/input.csv`)         | No                                          | Yes           |
| HTTP(S) URL (`https://.../input.csv`) | Yes                                         | Yes           |
| Fileset reference                     | Yes                                         | Yes           |

Fileset references take any of these forms; the workspace and fileset must already exist:

```text
fileset://<workspace>/<fileset>#<path>
<workspace>/<fileset>#<path>
<fileset>#<path>
```

The `#<path>` fragment must resolve to a single `.csv` or `.parquet` file. The plugin downloads the file before constructing the Anonymizer library input and cleans up the temp directory when the preview completes.
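
To catch a rejected source before sending a request, you could classify the `data.source` string on the client side. The sketch below is a heuristic, not the plugin's actual parsing logic: `classify_source` and `allowed_on_plugin_service` are hypothetical helpers, and treating any scheme-less string containing `#` as a fileset reference is an assumption (a local path with `#` in its name would be misclassified).

```python
from urllib.parse import urlparse


def classify_source(source):
    """Return 'http', 'fileset', or 'local' for a data.source string (heuristic)."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "http"
    # Assumption: the short fileset forms are recognized by their '#<path>' fragment.
    if scheme == "fileset" or "#" in source:
        return "fileset"
    return "local"


def allowed_on_plugin_service(source):
    """Local paths are rejected by sdk.anonymizer.preview and `preview submit`."""
    return classify_source(source) != "local"


checks = {
    source: classify_source(source)
    for source in (
        "/tmp/input.csv",
        "https://example.com/input.csv",
        "fileset://default/anonymizer-inputs#anonymizer-input.csv",
        "anonymizer-inputs#anonymizer-input.csv",
    )
}
```

A check like this only mirrors the table above; the plugin remains the authority on which sources it accepts.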

## Validate the Config Independently

To validate only the `AnonymizerConfig` (and optionally a `model_configs` YAML) without running a preview, use `nemo anonymizer validate`:

```python
from pathlib import Path
import yaml

Path("/tmp/anonymizer-config.yaml").write_text(yaml.safe_dump({
    "replace": {"kind": "redact", "format_template": "[REDACTED_{label}]"},
}))
```

```bash
nemo anonymizer validate --config /tmp/anonymizer-config.yaml
```

`validate` checks the config against the model selection (for example, that a `Substitute` strategy has a `replacement_generator` model defined) and prints `Config is valid.` on success. It does not accept `data.source`, so input-source validation happens during `preview` or `run`.

## Next Steps

* Run a full job and inspect parquet output in the [run tutorial](/documentation/anonymize-data/tutorials/run-an-anonymizer-job).
* Refer to [SDK Resources](/documentation/anonymize-data/sdk-resources) for `AnonymizerPreviewResult` and `AnonymizerResource.preview` details.
* Learn about rewrite and replacement strategy parameters in the [library docs](https://github.com/NVIDIA-NeMo/Anonymizer/tree/main/docs).