# Tutorials

<a id="data-designer-tutorials" />

These tutorials demonstrate how to build Data Designer configurations and execute them through the NeMo Data Designer plugin.

The code snippets on this page are conceptual and are not meant to be run as shown.
For runnable examples, jump ahead to the [Basics](/documentation/design-synthetic-data/tutorials/the-basics) or [Seeding](/documentation/design-synthetic-data/tutorials/seeding-with-external-datasets) tutorial.

## Configuration and Execution

Data Designer separates **configuration** (building dataset schemas) from **execution** (generating the data).

**Part 1: Build Configs (Library)**

Use `data_designer.config` to define your dataset. See the [library documentation](https://docs.nvidia.com/nemo/datadesigner/v0.6.1/getting-started/welcome) for comprehensive guides on column types, constraints, and processors.

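```python
import data_designer.config as dd

# `model_configs` is assumed to be defined earlier (not shown in this conceptual snippet).
config_builder = dd.DataDesignerConfigBuilder(model_configs)

# Add columns to the dataset schema; `...` stands for each column's arguments.
config_builder.add_column(dd.SamplerColumnConfig(...))
config_builder.add_column(dd.LLMTextColumnConfig(...))
```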

**Part 2: Execute (Plugin)**

Run the configuration locally with the CLI, submit it to NeMo Services, or call the Data Designer API from the SDK:

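```bash
# Preview 5 records with local execution
nemo data-designer preview run product_reviews.py --num-records 5

# Submit a 30-record generation job to NeMo Services in the default workspace
nemo data-designer create submit product_reviews.py --workspace default --num-records 30
```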

SDK execution uses the Data Designer API today:

```python
import os
from nemo_platform import NeMoPlatform

# Connect to NeMo Services; NMP_BASE_URL overrides the local default.
client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
data_designer = client.data_designer

# `config_builder` is the builder from Part 1.
preview = data_designer.preview(config_builder)
job = data_designer.create(config_builder, num_records=1000)
```

The choice between `run` and `submit` primarily determines where the workload executes: `run` executes locally, and `submit` sends the job to NeMo Services. A local `run` can still use the Files API, Secrets API, and Inference Gateway API from a running NeMo Services cluster when the configuration references the corresponding resources. See [Execution Modes](/documentation/design-synthetic-data/execution-modes) for details.

## Execution-Specific Considerations

When running through the plugin, supported resources depend on the execution mode:

| Feature       | CLI `run`                                          | CLI `submit` / SDK                |
| ------------- | -------------------------------------------------- | --------------------------------- |
| **Inference** | Local providers and/or Inference Gateway providers | Inference Gateway providers       |
| **Seed data** | Local sources, HuggingFace, or Files API Filesets  | HuggingFace or Files API Filesets |
| **Secrets**   | Environment, plaintext, or Secrets API secrets     | Secrets API secrets               |
| **Artifacts** | Local execution artifacts                          | Job artifact storage              |
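One way to read the table is as a checklist for moving a configuration from local `run` to `submit` or the SDK. The sketch below transcribes the Inference, Seed data, and Secrets rows into a plain Python mapping and reports which sources would need to change. The source labels (such as `"files-api"`) and the `unsupported_sources` helper are hypothetical names chosen for illustration; they are not part of the Data Designer library, the CLI, or the SDK.

```python
# Illustrative only: a hand-written transcription of the table above.
# The labels and this helper are hypothetical, not a Data Designer API.
SUPPORTED = {
    "inference": {
        "run": {"local", "inference-gateway"},
        "submit": {"inference-gateway"},
    },
    "seed data": {
        "run": {"local", "huggingface", "files-api"},
        "submit": {"huggingface", "files-api"},
    },
    "secrets": {
        "run": {"environment", "plaintext", "secrets-api"},
        "submit": {"secrets-api"},
    },
}


def unsupported_sources(mode, used):
    """Return the (feature, source) pairs that `mode` does not support.

    `mode` is "run" or "submit"; per the table, the SDK column matches "submit".
    `used` is a list of (feature, source) pairs describing a configuration.
    """
    return [
        (feature, source)
        for feature, source in used
        if source not in SUPPORTED[feature][mode]
    ]


# A configuration that relies on a local seed file and an environment secret.
used = [
    ("inference", "inference-gateway"),
    ("seed data", "local"),
    ("secrets", "environment"),
]
print(unsupported_sources("run", used))
print(unsupported_sources("submit", used))
```

In this example, nothing is flagged for `run`, while the local seed source and the environment secret are flagged for `submit`, which matches the narrower right-hand column of the table.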

## Prerequisites

These tutorials use an [Inference Gateway](/documentation/models-and-inference) provider for model calls, so a NeMo Services cluster must be running before you preview or create data. This applies to local CLI `run` as well; see [Execution Modes](/documentation/design-synthetic-data/execution-modes#local-nemo-services-execution) for more about this distinction.
Complete [Setup](/documentation/get-started) to ensure that NeMo Services is running locally and an inference provider is available.
These tutorials reference the default NVIDIA Build model provider, which is created as `default/nvidia-build` during setup.
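Before starting a tutorial, a quick connectivity check can catch a cluster that is not running. The sketch below uses only the Python standard library and assumes the same `NMP_BASE_URL` variable and `http://localhost:8080` default shown in the SDK snippet on this page. It only tests whether the port accepts TCP connections; it does not call any NeMo Services endpoint, so a successful result does not confirm that the services or the inference provider are healthy. The helper names are hypothetical.

```python
import os
import socket
from urllib.parse import urlsplit


def host_and_port(base_url):
    """Split a base URL into (host, port), falling back to the scheme's default port."""
    parts = urlsplit(base_url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(base_url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(base_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: same environment variable and default as the SDK snippet.
    base_url = os.environ.get("NMP_BASE_URL", "http://localhost:8080")
    print(f"{base_url} reachable: {is_reachable(base_url)}")
```

If the check fails, revisit the setup steps before running the tutorials.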

## Tutorials

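| Tutorial | Level | Tag | Description |
| -------- | ----- | --- | ----------- |
| [Basics](/documentation/design-synthetic-data/tutorials/the-basics) | Beginner | data-designer | Generate a product review dataset using samplers and LLM-generated text. Learn the fundamentals of building configurations and executing jobs. |
| [Seeding](/documentation/design-synthetic-data/tutorials/seeding-with-external-datasets) | Intermediate | data-designer | Use external datasets to ground synthetic data generation. Generate realistic patient medical notes from symptom-to-diagnosis data. |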