Getting Started with NeMo Safe Synthesizer

Get started with NeMo Safe Synthesizer for generating private synthetic versions of sensitive tabular datasets on a host GPU.

Prerequisites

Before using NeMo Safe Synthesizer, complete Setup to install the CLI/SDK.

NeMo Safe Synthesizer has the following additional requirements:

  • An NVIDIA GPU on the host machine with at least 80 GB of VRAM (check with nvidia-smi). This is separate from any GPU inside a NIM container; Safe Synthesizer training runs directly on the host.
  • Sufficient disk space for generated datasets (50 GB or more recommended).
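
The two hardware requirements above can be checked before installing anything. The following is a minimal sketch, not part of the NeMo tooling: the function names are hypothetical, and the 80,000 MiB threshold is an assumed approximation of "80 GB" (nvidia-smi reports memory in MiB, and 80 GB cards typically report slightly more than this value).

```shell
#!/bin/sh
# Pre-flight sketch: compare host GPU memory and free disk space against
# the prerequisites. Helper names and thresholds are illustrative assumptions.

MIN_VRAM_MIB=80000   # assumption: "80 GB" approximated in MiB
MIN_DISK_GB=50       # recommended free space from the prerequisites

# meets_minimum VALUE MINIMUM: succeeds when VALUE >= MINIMUM (integers only)
meets_minimum() {
  [ "$1" -ge "$2" ] 2>/dev/null
}

# check_gpu: succeeds when at least one GPU reports enough total memory
check_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found"; return 1; }
  # Prints one line per GPU: total memory in MiB, without header or units
  for gpu_mib in $(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits); do
    if meets_minimum "$gpu_mib" "$MIN_VRAM_MIB"; then
      echo "GPU OK: ${gpu_mib} MiB"
      return 0
    fi
  done
  echo "no GPU with at least ${MIN_VRAM_MIB} MiB found"
  return 1
}

# check_disk [PATH]: succeeds when the filesystem holding PATH has enough free space
check_disk() {
  # df -Pk prints POSIX-format output in 1 KiB blocks; column 4 is available space
  avail_kb=$(df -Pk "${1:-.}" | awk 'NR==2 {print $4}')
  avail_gb=$(( avail_kb / 1024 / 1024 ))
  if meets_minimum "$avail_gb" "$MIN_DISK_GB"; then
    echo "disk OK: ${avail_gb} GB free"
    return 0
  fi
  echo "only ${avail_gb} GB free, ${MIN_DISK_GB} GB recommended"
  return 1
}

# Report both results without aborting on the first failure
check_gpu || echo "GPU check failed"
check_disk . || echo "disk check failed"
```

Pass a different path to check_disk if generated datasets will be written somewhere other than the current directory.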

For general platform troubleshooting (port conflicts, health checks, and so on), refer to Setup.

During local startup, nemo setup pre-configures a default/nvidia-build model provider. This provider routes inference requests to models hosted on build.nvidia.com, using the API base URL https://integrate.api.nvidia.com and the NGC API key (with Public API Endpoints permissions) provided during deployment.

You can verify this provider exists by running nemo inference providers list --workspace default.

The tutorials in these docs use this provider for inference, but you can create and use your own provider instead.


Host-local CLI

For GPU development on your machine, install the Safe Synthesizer plugin from this repository and use nemo safe-synthesizer run-local (see Local and Subprocess Execution):

$ BOOTSTRAP_LOCAL_PLUGIN_DIRS=plugins/nemo-safe-synthesizer make bootstrap-python
$ uv run nemo safe-synthesizer runtime setup
$ uv run nemo safe-synthesizer run-local \
>   --spec-file ./nss-job.json \
>   --data-source ./input.csv \
>   --output-dir ./nss-output

The run-local command launches the Safe Synthesizer task in a separate runtime Python subprocess. The nemo safe-synthesizer CLI currently exposes only the run-local and runtime subcommands; to submit platform jobs, use the Jobs API or the SDK.
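
Because run-local starts a separate subprocess, it can help to confirm the inputs exist before launching it. The wrapper below is a sketch under stated assumptions: preflight and run_local_checked are hypothetical helper names, the check only confirms that the files are present and non-empty (it does not validate the job spec), and creating the output directory up front is a precaution, since this page does not say whether run-local creates it.

```shell
#!/bin/sh
# Sketch of a guarded run-local invocation. Helper names are hypothetical;
# only the run-local flags themselves come from the command shown above.

# preflight SPEC DATA: succeeds only when both files exist and are non-empty
preflight() {
  for input_file in "$1" "$2"; do
    if [ ! -s "$input_file" ]; then
      echo "missing or empty file: $input_file" >&2
      return 1
    fi
  done
}

# run_local_checked SPEC DATA OUTDIR: validate the inputs, then launch run-local
run_local_checked() {
  spec=$1
  data=$2
  out=$3
  preflight "$spec" "$data" || return 1
  # Precaution (assumption): make sure the output directory exists
  mkdir -p "$out" || return 1
  uv run nemo safe-synthesizer run-local \
    --spec-file "$spec" \
    --data-source "$data" \
    --output-dir "$out"
}
```

With the file names used above, the call would be run_local_checked ./nss-job.json ./input.csv ./nss-output.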


Next Steps

Create your first synthetic dataset: