# Data Designer

<a id="data-designer" />

Data Designer on NeMo Platform enables high-quality synthetic data generation through the NeMo Data Designer plugin. You can execute workloads locally from the CLI, submit them to a running NeMo Services cluster, or call the Data Designer API from the SDK.

## Overview

Data Designer is a framework for orchestrating complex synthetic data generation workflows. It coordinates LLM calls, manages dependencies between data fields, handles batching and parallelization, and validates generated data against specifications.
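
The library's scheduler is internal, so the following is only a minimal sketch of the general pattern that paragraph describes: requests are grouped into batches and sent with bounded concurrency. It uses the standard-library `asyncio` module. `fake_llm`, `generate_in_batches`, `batch_size`, and `max_concurrency` are hypothetical names and are not part of the Data Designer API.

```python
import asyncio
from typing import Awaitable, Callable


async def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; a real one would hit an endpoint."""
    await asyncio.sleep(0)  # yield to the event loop, as network I/O would
    return f"response to: {prompt}"


async def generate_in_batches(
    prompts: list[str],
    llm: Callable[[str], Awaitable[str]],
    batch_size: int = 4,
    max_concurrency: int = 2,
) -> list[str]:
    """Generate one response per prompt, batch by batch, with bounded concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def call(prompt: str) -> str:
        async with semaphore:  # at most max_concurrency calls in flight
            return await llm(prompt)

    results: list[str] = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start : start + batch_size]
        # gather() returns results in the same order as the batch
        results.extend(await asyncio.gather(*(call(p) for p in batch)))
    return results


if __name__ == "__main__":
    prompts = [f"Describe product {i}" for i in range(10)]
    rows = asyncio.run(generate_in_batches(prompts, fake_llm))
    print(len(rows), rows[0])  # 10 response to: Describe product 0
```

Bounding in-flight requests with a semaphore, rather than sending every request at once, is one common way to stay within model-endpoint rate limits; the library's actual batching and retry behavior may differ.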

The plugin is built on the open-source [NVIDIA NeMo Data Designer library](https://docs.nvidia.com/nemo/datadesigner/v0.6.1/getting-started/welcome) ([GitHub](https://github.com/NVIDIA-NeMo/DataDesigner)). The library provides the configuration and generation engine; the plugin provides the CLI, the SDK, and integration with the Data Designer API, Jobs API, Files API, Secrets API, and Inference Gateway API.

## How It Works

Data Designer separates **configuration** from **execution**.

The code snippets below are for conceptual demonstration purposes only.
For runnable examples, see the [tutorials](/documentation/design-synthetic-data/tutorials).

### 1. Build Configurations

Use `data_designer.config` to define the dataset you want to generate:

```python
import data_designer.config as dd

# Define models
model_configs = [
    dd.ModelConfig(
        provider="default/nvidia-build",
        model="nvidia/nemotron-3-nano-30b-a3b",
        alias="text",
    )
]

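# Build the configuration (column arguments are elided; see the tutorials)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(dd.SamplerColumnConfig(...))  # column of sampled values
config_builder.add_column(dd.LLMTextColumnConfig(...))  # column of LLM-generated text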
```

Configuration code describes the dataset schema, columns, dependencies, constraints, seed data, processors, profilers, and inference settings.
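
Because columns can depend on one another, a generator has to produce them in an order that respects those references. As a hypothetical illustration only (not how the library resolves dependencies internally), the sketch below assumes each column is described by a prompt template that names other columns with Jinja-style double-brace placeholders, and orders the columns with Python's standard `graphlib.TopologicalSorter`. `generation_order` and the example column names are made up for this sketch.

```python
import re
from graphlib import TopologicalSorter

# Matches Jinja-style column references such as "{{ product_category }}".
REFERENCE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def generation_order(templates: dict[str, str]) -> list[str]:
    """Order hypothetical columns so that each follows the columns it references.

    `templates` maps a column name to its prompt template; a column with no
    template (for example, a sampled column) maps to an empty string.
    """
    graph = {
        name: {ref for ref in REFERENCE.findall(template) if ref in templates}
        for name, template in templates.items()
    }
    # static_order() raises graphlib.CycleError if the references form a cycle.
    return list(TopologicalSorter(graph).static_order())


columns = {
    "product_category": "",  # sampled, so it has no dependencies
    "product_name": "Name a {{ product_category }} product.",
    "review": "Review {{ product_name }}, a {{ product_category }} item.",
}
print(generation_order(columns))  # ['product_category', 'product_name', 'review']
```

Columns that reference each other in a cycle have no valid generation order, so surfacing `graphlib.CycleError` early is a useful configuration check.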

**Learn more**: See the [library documentation](https://docs.nvidia.com/nemo/datadesigner/v0.6.1/getting-started/welcome) for comprehensive guides on column types, samplers, constraints, and advanced features.

### 2. Choose Where to Execute

The same configuration can run through different plugin surfaces:

| Interface                             | Execution location               | NeMo Services required? | Best for                                                                 |
| ------------------------------------- | -------------------------------- | ----------------------- | ------------------------------------------------------------------------ |
| `nemo data-designer ... run`          | Local CLI process                | Optional                | Fast local iteration, local files, library-equivalent workload behavior. |
| `nemo data-designer ... submit`       | Data Designer API or Jobs worker | Yes                     | Service-managed execution, logs, artifacts, and shared resources.        |
| `client.data_designer.preview/create` | Data Designer API or Jobs worker | Yes                     | Application code that calls Data Designer programmatically.              |

Choosing `run` or `submit` primarily determines where the plugin executes the workload. A local `run` can be fully local, but it is not an offline-only mode: it can still use the Files API, Secrets API, and Inference Gateway API of a running NeMo Services cluster when the configuration references the corresponding resources.

See [Execution Modes](/documentation/design-synthetic-data/execution-modes) for the full model.
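
To make the split between configuration and execution concrete, here is a hypothetical sketch in which one immutable configuration object is handed, unchanged, to two interchangeable executors: one that produces rows in-process (the role `run` plays) and one that only records a submission (standing in for `submit`). `DatasetConfig`, `Executor`, `LocalExecutor`, and `ServiceExecutor` are illustrative names, not classes from the Data Designer library or the NeMo SDK, and the sketch makes no network calls.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical, minimal stand-in for a Data Designer configuration."""

    columns: tuple[str, ...]
    num_records: int


class Executor(Protocol):
    """Anything that can execute a DatasetConfig."""

    def execute(self, config: DatasetConfig) -> str: ...


class LocalExecutor:
    """Plays the role of `run`: produces placeholder rows in this process."""

    def execute(self, config: DatasetConfig) -> str:
        rows = [
            {column: f"{column}-{i}" for column in config.columns}
            for i in range(config.num_records)
        ]
        return f"generated {len(rows)} rows locally"


@dataclass
class ServiceExecutor:
    """Plays the role of `submit`: records the submission instead of calling a service."""

    submitted: list[DatasetConfig] = field(default_factory=list)

    def execute(self, config: DatasetConfig) -> str:
        self.submitted.append(config)  # a real client would send the config to an API
        return f"submitted job #{len(self.submitted)}"


config = DatasetConfig(columns=("product_category", "product_name"), num_records=3)
executors: list[Executor] = [LocalExecutor(), ServiceExecutor()]
for executor in executors:
    # The same configuration object is passed to every executor unchanged.
    print(executor.execute(config))
```

Keeping the configuration as plain, immutable data is one way to let the same object move between execution locations without modification.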

## NeMo Services Integration

When you use CLI `submit`, SDK execution, or NeMo resources from a local `run`, the plugin integrates with these NeMo Services APIs:

| Integration               | What it provides                                                    |
| ------------------------- | ------------------------------------------------------------------- |
| **Inference Gateway API** | Centralized model providers and OpenAI-compatible inference routes. |
| **Files API**             | Filesets for seed data and persona datasets.                        |
| **Secrets API**           | API keys and tokens referenced from Data Designer configurations.   |
| **Jobs API**              | Service-managed `create` workloads, logs, status, and artifacts.    |

These integrations are required for `submit` and SDK execution. They are optional for CLI `run` execution, depending on the resources your configuration references.
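
The rule in that paragraph can be expressed as a small lookup. This hypothetical sketch takes a literal reading of it: service-side execution (`submit` or the SDK) relies on all four integrations, while a local `run` contacts only the APIs behind the resource kinds its configuration references. The mode strings (`run`, `submit`, `sdk`), the resource-kind labels, and `services_for` are assumptions made for illustration; they do not correspond to any CLI flag or SDK call.

```python
# Hypothetical mapping from the kind of resource a configuration references
# to the NeMo Services API that serves it (per the integration table above).
SERVICE_FOR_RESOURCE = {
    "fileset": "Files API",  # seed data and persona datasets
    "secret": "Secrets API",  # API keys and tokens
    "model_provider": "Inference Gateway API",  # centralized model providers
}

ALL_SERVICES = set(SERVICE_FOR_RESOURCE.values()) | {"Jobs API"}


def services_for(mode: str, referenced_kinds: set[str]) -> set[str]:
    """Return the NeMo Services APIs a workload relies on (illustrative only)."""
    if mode in {"submit", "sdk"}:
        # Service-side execution always needs a running NeMo Services cluster.
        return set(ALL_SERVICES)
    if mode == "run":
        # A local run contacts only the APIs behind the resources it references;
        # an empty result means the run can be fully local.
        return {SERVICE_FOR_RESOURCE[kind] for kind in referenced_kinds}
    raise ValueError(f"unknown mode: {mode!r}")


print(sorted(services_for("run", {"fileset"})))  # ['Files API']
print(sorted(services_for("run", set())))  # []
```

Under this reading, a local `run` that references only a seed fileset would need just the Files API from the cluster.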

## Next Steps

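- [Execution Modes](/documentation/design-synthetic-data/execution-modes): Understand local execution, NeMo Services execution, and NeMo resources.
- Run previews and create datasets with `nemo data-designer`.
- [Tutorials](/documentation/design-synthetic-data/tutorials): Learn through examples covering the basics, seeding, and more.
- Move configurations between local CLI and NeMo Services execution.
- [Library documentation](https://docs.nvidia.com/nemo/datadesigner/v0.6.1/getting-started/welcome): Comprehensive guides on column types, constraints, and advanced features.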