Tutorials


These tutorials demonstrate how to build Data Designer configurations and execute them through the NeMo Data Designer plugin.

The code snippets on this page are for conceptual demonstration purposes only. For runnable examples, jump ahead to the Basics or Seeding tutorial.

Configuration and Execution

Data Designer separates configuration (building dataset schemas) from execution (generating the data).
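To make the separation concrete, here is a toy model in plain Python. It is not the data_designer API (the real configuration classes appear in Part 1). ColumnSpec, DatasetConfig, and execute are hypothetical names chosen for illustration. The point is that the configuration is pure, serializable data, and a separate executor consumes it. That separation is what lets the same configuration run locally or be submitted elsewhere.

```python
import json
import random
from dataclasses import asdict, dataclass, field


@dataclass
class ColumnSpec:
    # Hypothetical column description: a name plus values to sample from.
    name: str
    choices: list[str]


@dataclass
class DatasetConfig:
    # Configuration only describes the dataset; it generates nothing.
    columns: list[ColumnSpec] = field(default_factory=list)

    def add_column(self, column: ColumnSpec) -> "DatasetConfig":
        self.columns.append(column)
        return self

    def to_json(self) -> str:
        # Serializable, so any executor (local or remote) can consume it.
        return json.dumps(asdict(self))


def execute(config_json: str, num_records: int, seed: int = 0) -> list[dict[str, str]]:
    # Execution is a separate step that only reads the serialized config.
    spec = json.loads(config_json)
    rng = random.Random(seed)
    return [
        {col["name"]: rng.choice(col["choices"]) for col in spec["columns"]}
        for _ in range(num_records)
    ]


config = DatasetConfig().add_column(ColumnSpec("category", ["books", "toys", "garden"]))
records = execute(config.to_json(), num_records=5)
print(records)
```

Because execute sees only the JSON string, swapping it for a different backend requires no change to how the configuration is built.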

Part 1: Build Configs (Library)

Use data_designer.config to define your dataset. See the library documentation for comprehensive guides on column types, constraints, and processors.

```python
import data_designer.config as dd

# model_configs is defined elsewhere; "..." marks elided column arguments.
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(dd.SamplerColumnConfig(...))
config_builder.add_column(dd.LLMTextColumnConfig(...))
```

Part 2: Execute (Plugin)

Run the configuration locally with the CLI, submit it to NeMo Services, or call the Data Designer API from the SDK:

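```shell
# Preview 5 records by running the configuration locally
$ nemo data-designer preview run product_reviews.py --num-records 5
# Submit a 30-record job to NeMo Services in the default workspace
$ nemo data-designer create submit product_reviews.py --workspace default --num-records 30
```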
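If you script these commands, for example to sweep record counts, a small helper that builds the argument lists keeps invocations consistent. This sketch reproduces only the two commands and flags shown above; the helper names preview_run_args and create_submit_args are hypothetical, and actually executing the result assumes the nemo CLI is on your PATH.

```python
import shlex


def preview_run_args(config_path: str, num_records: int) -> list[str]:
    # Mirrors: nemo data-designer preview run <config> --num-records N
    return ["nemo", "data-designer", "preview", "run", config_path,
            "--num-records", str(num_records)]


def create_submit_args(config_path: str, workspace: str, num_records: int) -> list[str]:
    # Mirrors: nemo data-designer create submit <config> --workspace W --num-records N
    return ["nemo", "data-designer", "create", "submit", config_path,
            "--workspace", workspace, "--num-records", str(num_records)]


for n in (5, 10):
    print(shlex.join(preview_run_args("product_reviews.py", n)))
print(shlex.join(create_submit_args("product_reviews.py", "default", 30)))
# To execute one: subprocess.run(preview_run_args("product_reviews.py", 5), check=True)
```

Building a list rather than a single string avoids shell-quoting problems when paths contain spaces.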

The SDK currently executes configurations through the Data Designer API:

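```python
import os

from nemo_platform import NeMoPlatform

# Connect to NeMo Services; NMP_BASE_URL overrides the local default URL.
client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
data_designer = client.data_designer

# config_builder is the DataDesignerConfigBuilder from Part 1.
preview = data_designer.preview(config_builder)  # preview the configuration
job = data_designer.create(config_builder, num_records=1000)  # create 1,000 records
```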

The choice between run and submit primarily determines where the workload executes. A local run can still use the Files API, Secrets API, and Inference Gateway API of a running NeMo Services cluster when the configuration references the corresponding resources. See Execution Modes for details.

Execution-Specific Considerations

When running through the plugin, supported resources depend on the execution mode:

| Feature | CLI run | CLI submit / SDK |
| --- | --- | --- |
| Inference | Local providers and/or Inference Gateway providers | Inference Gateway providers |
| Seed data | Local sources, HuggingFace, or Files API Filesets | HuggingFace or Files API Filesets |
| Secrets | Environment, plaintext, or Secrets API secrets | Secrets API secrets |
| Artifacts | Local execution artifacts | Job artifact storage |
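The table can be encoded as data for a pre-flight check before you move a configuration from run to submit. This is a hypothetical sketch: the resource labels and the unsupported_resources function are illustrative names, not part of the plugin, and the plugin's own validation may behave differently. The "submit" key stands for both CLI submit and the SDK; artifacts are omitted because they are outputs, not configuration inputs.

```python
# Resource sources supported in each execution mode, transcribed from the table.
SUPPORTED = {
    "run": {
        "inference": {"local", "inference_gateway"},
        "seed_data": {"local", "huggingface", "files_api"},
        "secrets": {"environment", "plaintext", "secrets_api"},
    },
    "submit": {  # CLI submit or SDK
        "inference": {"inference_gateway"},
        "seed_data": {"huggingface", "files_api"},
        "secrets": {"secrets_api"},
    },
}


def unsupported_resources(mode: str, used: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per feature, the sources in `used` that `mode` does not support."""
    allowed = SUPPORTED[mode]
    problems = {}
    for feature, sources in used.items():
        bad = sources - allowed.get(feature, set())
        if bad:
            problems[feature] = bad
    return problems


# A config that reads a local seed file and an environment secret works with run...
used = {"inference": {"inference_gateway"}, "seed_data": {"local"}, "secrets": {"environment"}}
print(unsupported_resources("run", used))     # {}
# ...but needs a Files API Fileset and a Secrets API secret before it can be submitted.
print(unsupported_resources("submit", used))  # {'seed_data': {'local'}, 'secrets': {'environment'}}
```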

Prerequisites

These tutorials use an Inference Gateway provider for model calls, so a NeMo Services cluster must be running before you preview or create data, even when you use local CLI run (see Execution Modes for more about this distinction). Complete Setup to make sure NeMo Services is running locally and an inference provider is available. The tutorials reference the default NVIDIA Build model provider, which setup creates as default/nvidia-build.
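Because even a local run calls the Inference Gateway, it can help to confirm the cluster is reachable before previewing. This is a minimal sketch that assumes only that the cluster listens at the URL in NMP_BASE_URL, defaulting to http://localhost:8080 as in the SDK example. It opens a plain TCP connection; it does not call any NeMo Services API and does not verify that default/nvidia-build exists. The function names cluster_reachable and default_base_url are hypothetical.

```python
import os
import socket
from urllib.parse import urlsplit


def default_base_url() -> str:
    # Same environment variable and default as the SDK example.
    return os.environ.get("NMP_BASE_URL", "http://localhost:8080")


def cluster_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(base_url)
    host = parts.hostname or "localhost"
    # Fall back to the scheme's default port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
```

For example, call cluster_reachable(default_base_url()) at the top of a tutorial script and stop with a pointer to Setup if it returns False.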

Tutorials