Data Designer CLI


The NeMo Data Designer plugin adds the nemo data-designer command group. Use it to execute Data Designer workloads locally in the CLI process or submit them to NeMo Services.

Configuration Sources

The preview and create commands accept a configuration source path. The most flexible form is a Python file that defines load_config_builder() and returns a configured DataDesignerConfigBuilder instance. This file can have any name; the examples below use product_reviews.py.

import data_designer.config as dd


def load_config_builder() -> dd.DataDesignerConfigBuilder:
    model_configs = [
        dd.ModelConfig(
            provider="default/nvidia-build",
            model="nvidia/nemotron-3-nano-30b-a3b",
            alias="text",
        )
    ]

    config_builder = dd.DataDesignerConfigBuilder(model_configs)
    # Add columns, constraints, seed datasets, processors, and profilers here.
    return config_builder

The same configuration source can usually be used with both run and submit. The resources the configuration references determine whether it is compatible with NeMo Services execution; see Execution Modes.
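To make the load_config_builder() contract concrete, here is a minimal sketch of how a tool could import a configuration source by path and call the function. It is not the plugin's actual loader; the helper name load_builder_from_file and the stand-in file are hypothetical, and only the Python standard library is used.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_builder_from_file(path):
    """Import a Python file by path and call its load_config_builder().

    Hypothetical helper for illustration; the real CLI may load
    configuration sources differently.
    """
    spec = importlib.util.spec_from_file_location("config_source", str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a Python module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not hasattr(module, "load_config_builder"):
        raise AttributeError(f"{path} does not define load_config_builder()")
    return module.load_config_builder()


# Demonstration with a stand-in file. A real configuration source would
# return a DataDesignerConfigBuilder instead of a plain dict.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "any_name.py"
    source.write_text("def load_config_builder():\n    return {'columns': []}\n")
    demo_result = load_builder_from_file(source)
```

Because the file is loaded by path rather than by module name, its filename does not matter, which matches the statement that the file can have any name.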

Run Versus Submit

run executes the Data Designer workload locally, in the CLI process. This can be fully local, but it is not an offline-only mode. A local run can still use the Files API, Secrets API, and Inference Gateway API from a running NeMo Services cluster when the configuration references the corresponding resources.

submit sends the workload to NeMo Services. The Data Designer API and Jobs API coordinate execution, job lifecycle, logs, and artifact persistence. The NeMo Services deployment may itself be local or remote.

| Command | Workload execution | NeMo Services required? |
| --- | --- | --- |
| preview run | Local CLI process | Optional |
| create run | Local CLI process | Optional |
| preview submit | Data Designer API | Yes |
| create submit | Jobs worker | Yes |
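The table above can also be expressed as a small lookup, which may be handy in wrapper scripts that need to decide whether a NeMo Services deployment must be reachable before invoking the CLI. This is an illustrative sketch; the names ExecutionMode, EXECUTION_MODES, and requires_services are hypothetical and not part of the plugin.

```python
from typing import NamedTuple


class ExecutionMode(NamedTuple):
    workload_execution: str   # where the workload runs
    services_required: bool   # False means NeMo Services is optional


# Mirrors the table: (command, mode) -> execution details.
EXECUTION_MODES = {
    ("preview", "run"): ExecutionMode("Local CLI process", False),
    ("create", "run"): ExecutionMode("Local CLI process", False),
    ("preview", "submit"): ExecutionMode("Data Designer API", True),
    ("create", "submit"): ExecutionMode("Jobs worker", True),
}


def requires_services(command, mode):
    """Return True when the command/mode pair needs NeMo Services."""
    return EXECUTION_MODES[(command, mode)].services_required
```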

Preview Locally

Use local preview for fast iteration:

$ nemo data-designer preview run product_reviews.py --num-records 5

The workload runs in your current Python environment. It can use local-only resources, NeMo resources, or both.

Create Locally

Use local create when you want to generate a larger dataset without submitting work to NeMo Services:

$ nemo data-designer create run product_reviews.py --num-records 1000

This executes the plugin job locally. It is useful for development and for workloads that should stay in the local environment.
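If you drive local runs from a Python script, the preview run and create run invocations above can be assembled as argument lists. The sketch below is one possible wrapper, not part of the plugin: build_run_command is a hypothetical helper, and it uses only the positional path and the --num-records flag shown in the examples.

```python
import shlex


def build_run_command(action, config_path, num_records):
    """Build the argv for `nemo data-designer <action> run` (hypothetical helper)."""
    if action not in ("preview", "create"):
        raise ValueError(f"unsupported action: {action!r}")
    if num_records < 1:
        raise ValueError("num_records must be a positive integer")
    return [
        "nemo", "data-designer", action, "run",
        str(config_path),
        "--num-records", str(num_records),
    ]


preview_cmd = build_run_command("preview", "product_reviews.py", 5)
create_cmd = build_run_command("create", "product_reviews.py", 1000)

# Show the equivalent shell command lines.
print(shlex.join(preview_cmd))
print(shlex.join(create_cmd))

# To execute for real (requires the nemo CLI to be installed):
#   import subprocess
#   subprocess.run(preview_cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting issues if the configuration path contains spaces.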

Submit Preview to NeMo Services

Submit preview when you want to exercise the Data Designer API path:

$ nemo data-designer preview submit product_reviews.py --workspace default

Use this when your configuration should run against NeMo resources and go through service-side validation.

Submit Create to NeMo Services

Submit create for service-managed dataset generation:

$ nemo data-designer create submit product_reviews.py --workspace default --profile default

NeMo Services creates and runs a job. Job logs, status, and artifacts are managed by the Jobs API.
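The two submit invocations differ from the local ones mainly in their flags. As a sketch, a script could assemble them as shown below. build_submit_command is a hypothetical helper; it uses only the --workspace and --profile flags that appear in the examples. The examples show --profile only with create submit, so whether preview submit accepts it is an assumption this sketch does not verify.

```python
def build_submit_command(action, config_path, workspace, profile=None):
    """Build the argv for `nemo data-designer <action> submit` (hypothetical helper)."""
    if action not in ("preview", "create"):
        raise ValueError(f"unsupported action: {action!r}")
    cmd = [
        "nemo", "data-designer", action, "submit",
        str(config_path),
        "--workspace", workspace,
    ]
    # --profile is optional here; the documented example passes it
    # with `create submit` only.
    if profile is not None:
        cmd += ["--profile", profile]
    return cmd


submit_preview = build_submit_command("preview", "product_reviews.py", "default")
submit_create = build_submit_command(
    "create", "product_reviews.py", "default", profile="default"
)
```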

Personas

The plugin also provides commands for Nemotron Personas datasets.

Install personas locally for local execution:

$ nemo data-designer personas download --list
$ nemo data-designer personas download --locale en_US

Create a Files API Fileset for a persona locale so submit and SDK execution can read it:

$ nemo data-designer personas make-fileset \
    --locale en_US \
    --api-key-secret system/ngc-api-key

If you need to create the secret during the same command, set an environment variable with the NGC API key and pass --api-key-env-var:

$ nemo data-designer personas make-fileset \
    --locale en_US \
    --api-key-secret system/ngc-api-key \
    --api-key-env-var NGC_API_KEY
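When scripting the make-fileset step, it can help to confirm that the environment variable is actually set before passing --api-key-env-var. The following sketch shows one way to do that. build_make_fileset_command is a hypothetical helper, and the early check is a convenience added here, not documented CLI behavior.

```python
import os


def build_make_fileset_command(locale, secret_name, env_var=None, environ=None):
    """Build the argv for `personas make-fileset` (hypothetical helper).

    When env_var is given, verify that it is set so a missing key is
    caught before the CLI is invoked.
    """
    environ = os.environ if environ is None else environ
    cmd = [
        "nemo", "data-designer", "personas", "make-fileset",
        "--locale", locale,
        "--api-key-secret", secret_name,
    ]
    if env_var is not None:
        if not environ.get(env_var):
            raise RuntimeError(f"environment variable {env_var} is not set")
        # Pass only the variable name; the key itself stays in the environment.
        cmd += ["--api-key-env-var", env_var]
    return cmd


# Example with an explicit environment mapping instead of the real one.
fileset_cmd = build_make_fileset_command(
    "en_US",
    "system/ngc-api-key",
    env_var="NGC_API_KEY",
    environ={"NGC_API_KEY": "placeholder-value"},
)
```

Only the variable name is placed on the command line, so the key value does not end up in shell history or process listings.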

SDK Relationship

The SDK currently executes through the Data Designer API. If you need local in-process execution today, use nemo data-designer ... run.