Parameters Reference


This page summarizes the main configuration groups available when creating NeMo Safe Synthesizer jobs. For generated REST API schema details, see the Safe Synthesizer API Reference.

Job Spec (Plugin / REST)

Top-level fields of the Safe Synthesizer job spec (set alongside `config`):

| Field | Description |
| --- | --- |
| `data_source` | Input data as a platform fileset URL (`workspace/fileset#path`). With `run-local`, supply the data through `--data-source` instead; the spec value can then be any placeholder. |
| `pretrained_model_job` | A previously completed job whose adapter result (stored in Files) is reused for generation-only synthesis. Format: `<job>` or `<workspace>/<job>`. Mutually exclusive with `config.training.pretrained_model`. |
| `hf_token_secret` | Name of the platform secret that holds the Hugging Face token used during model initialization. |
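The two reference formats in the table can be checked locally before a job is submitted. The sketch below is a minimal, hypothetical helper: `FilesetRef`, `parse_fileset_url`, and `parse_job_ref` are not part of the Safe Synthesizer SDK, and it assumes that a bare `<job>` reference resolves in the caller's current workspace.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FilesetRef:
    """Hypothetical container for the parts of a data_source URL."""
    workspace: str
    fileset: str
    path: str


def parse_fileset_url(url: str) -> FilesetRef:
    """Split a `workspace/fileset#path` URL into its three parts."""
    location, hash_sign, path = url.partition("#")
    workspace, slash, fileset = location.partition("/")
    if not (hash_sign and slash and workspace and fileset and path) or "/" in fileset:
        raise ValueError(f"expected 'workspace/fileset#path', got {url!r}")
    return FilesetRef(workspace, fileset, path)


def parse_job_ref(ref: str, current_workspace: str) -> Tuple[str, str]:
    """Resolve a pretrained_model_job value of the form `<job>` or `<workspace>/<job>`.

    Assumption (not documented): a bare `<job>` refers to a job in current_workspace.
    """
    if not ref:
        raise ValueError("job reference must not be empty")
    workspace, slash, job = ref.partition("/")
    if not slash:
        return current_workspace, ref
    if not workspace or not job or "/" in job:
        raise ValueError(f"expected '<job>' or '<workspace>/<job>', got {ref!r}")
    return workspace, job
```

For example, `parse_fileset_url("default/customers#data/train.csv")` returns `FilesetRef(workspace='default', fileset='customers', path='data/train.csv')`; the workspace, fileset, and path values are placeholders.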

For host-local runs, see Local and Subprocess Execution. To reuse a local adapter, set `config.training.pretrained_model` rather than `pretrained_model_job`.
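The mutual-exclusion rule for `pretrained_model_job`, together with the host-local note, can be expressed as a pre-submission check. This is a minimal sketch over a plain dictionary that mirrors the job spec; `adapter_source` and its `run_local` flag are hypothetical, and rejecting `pretrained_model_job` in host-local runs is an interpretation of the note rather than documented validation behavior.

```python
from typing import Any, Mapping, Optional


def adapter_source(spec: Mapping[str, Any], run_local: bool = False) -> Optional[str]:
    """Report which adapter-reuse field a job spec uses, if any.

    Returns "job" for pretrained_model_job, "local" for
    config.training.pretrained_model, or None when neither is set.
    """
    job_ref = spec.get("pretrained_model_job")
    config = spec.get("config") or {}
    training = config.get("training") or {}
    local_adapter = training.get("pretrained_model")

    # Documented rule: the two fields are mutually exclusive.
    if job_ref and local_adapter:
        raise ValueError(
            "set pretrained_model_job or config.training.pretrained_model, not both"
        )
    # Interpretation of the host-local note (hypothetical check): local runs
    # reuse adapters through config.training.pretrained_model.
    if run_local and job_ref:
        raise ValueError("host-local runs should use config.training.pretrained_model")
    if job_ref:
        return "job"
    if local_adapter:
        return "local"
    return None
```

A spec with neither field set returns `None`, which corresponds to a normal train-then-generate job.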

Top-Level Configuration

The SafeSynthesizerParameters schema defines the main configuration structure for Safe Synthesizer jobs.

SafeSynthesizerParameters

All fields are optional at the top level. For constraints on nested fields, search the Safe Synthesizer API Reference for the schema name listed in the Type column.

| Field | Type | Constraints / description |
| --- | --- | --- |
| `data` | `DataParameters` | Controls grouping, ordering, and train/evaluation holdout behavior. |
| `evaluation` | `EvaluationParameters` | Controls synthetic data quality and privacy evaluation settings. |
| `training` | `TrainingHyperparams` | Controls fine-tuning behavior such as learning rate, batch size, LoRA settings, and optimization settings. |
| `generation` | `GenerateParameters` | Controls synthetic data generation, including record count, sampling, structured generation, and validation. |
| `privacy` | `DifferentialPrivacyHyperparams` | Configures DP-SGD. When omitted, differential privacy is disabled. |
| `time_series` | `TimeSeriesParameters` | Configures the experimental time-series mode, including timestamp and interval handling. |
| `replace_pii` | `PiiReplacerConfig` | Configures PII detection and replacement. If you provide this config directly, `steps` must contain 1 to 10 transformation steps. |
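Because every top-level field is optional, an empty dictionary is a valid starting point, and the only constraint stated at this level is the `replace_pii.steps` count. The sketch below is a hypothetical local lint over a plain dictionary, not the SDK's validation. The unrecognized-key check is an extra safeguard against typos and makes no claim about whether the service rejects unknown keys; step entries in the tests are opaque placeholders, since the step schema lives in the API Reference.

```python
from typing import Any, List, Mapping

# The seven optional top-level fields of SafeSynthesizerParameters.
TOP_LEVEL_FIELDS = {
    "data", "evaluation", "training", "generation",
    "privacy", "time_series", "replace_pii",
}


def check_parameters(params: Mapping[str, Any]) -> List[str]:
    """Return human-readable problems found in a SafeSynthesizerParameters-like dict."""
    problems: List[str] = []

    # Local typo check: flag keys that are not among the documented fields.
    unknown = sorted(set(params) - TOP_LEVEL_FIELDS)
    if unknown:
        problems.append(f"unrecognized top-level fields: {unknown}")

    # Documented constraint: a directly provided replace_pii needs 1 to 10 steps.
    replace_pii = params.get("replace_pii")
    if replace_pii is not None:
        steps = replace_pii.get("steps") or []
        if not 1 <= len(steps) <= 10:
            problems.append(
                f"replace_pii.steps must have 1 to 10 entries, got {len(steps)}"
            )

    return problems
```

For example, `check_parameters({})` returns an empty list, while `check_parameters({"replace_pii": {"steps": []}})` reports the empty step list.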

Data Parameters

Controls how the input data is shaped and used, including grouping, ordering, and holdout settings.


Training Parameters

Hyperparameters for model fine-tuning, including learning rate, batch size, and LoRA configuration.


Generation Parameters

Configuration for synthetic data generation after training, including number of records, temperature, and structured generation options.


Differential Privacy Parameters

Hyperparameters for training with differential privacy using DP-SGD. Enable them when you need a formal differential privacy guarantee.
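For readers new to DP-SGD, the core update clips each example's gradient to a maximum L2 norm, sums the clipped gradients, adds Gaussian noise scaled to that norm, and averages. The pure-Python sketch below shows one such step on a toy weight vector. It illustrates the general technique only: it is not Safe Synthesizer's training code, and `clip_norm` and `noise_multiplier` are illustrative names, not necessarily fields of `DifferentialPrivacyHyperparams`. Turning a target budget such as `epsilon=8.0` into a noise level requires a privacy accountant, which is out of scope here.

```python
import math
import random
from typing import List, Sequence


def clip_gradient(grad: Sequence[float], clip_norm: float) -> List[float]:
    """Scale a per-example gradient down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return list(grad)
    factor = clip_norm / norm
    return [g * factor for g in grad]


def dp_sgd_step(
    weights: Sequence[float],
    per_example_grads: Sequence[Sequence[float]],
    lr: float,
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Apply one DP-SGD update: clip per example, sum, add Gaussian noise, average."""
    batch_size = len(per_example_grads)
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(len(weights))]
    # Noise standard deviation is proportional to the clipping bound (the sensitivity).
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [w - lr * n / batch_size for w, n in zip(weights, noisy)]
```

Setting `noise_multiplier=0.0` reduces the step to clipped SGD, which is useful for checking the clipping logic; any actual privacy guarantee requires a positive noise multiplier.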


Evaluation Parameters

Configuration for synthetic data quality and privacy assessment, including membership inference attack (MIA) and attribute inference attack (AIA) evaluation and PII replay detection.
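Conceptually, PII replay detection looks for sensitive values from the training data that reappear verbatim in the synthetic output. The sketch below computes a simple per-column replay rate to illustrate the idea; `pii_replay_rates` is a hypothetical helper, not the metric Safe Synthesizer reports, and it assumes the PII columns are already known.

```python
from typing import Dict, Iterable, List, Mapping


def pii_replay_rates(
    train_rows: Iterable[Mapping[str, str]],
    synth_rows: Iterable[Mapping[str, str]],
    pii_columns: List[str],
) -> Dict[str, float]:
    """Fraction of non-empty synthetic values per column that copy a training value."""
    train = list(train_rows)
    synth = list(synth_rows)
    rates: Dict[str, float] = {}
    for column in pii_columns:
        # Distinct non-empty training values for this column.
        seen = {row[column] for row in train if row.get(column)}
        # Non-empty synthetic values for the same column.
        values = [row[column] for row in synth if row.get(column)]
        replayed = sum(1 for value in values if value in seen)
        rates[column] = replayed / len(values) if values else 0.0
    return rates
```

With two training emails and synthetic output containing one copied email and one new email, the rate for that column is `0.5`; empty values are ignored.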


PII Replacement Configuration

Configuration for PII detection and replacement. See the PII replacement documentation for conceptual details.

Column Classification Config (replace_pii.globals.classify)

Column classification is configured through the SDK builder's `.with_classify_model_provider(provider_name)` method. The provider name can be unqualified (the builder prepends the current workspace) or fully qualified as `workspace/provider_name`.

If omitted, column classification is skipped and PII detection falls back to heuristic defaults, which may reduce accuracy.
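The provider naming rule is simple enough to state as code. The hypothetical helper below mirrors the documented behavior (prepend the current workspace to an unqualified name, keep a `workspace/provider_name` value unchanged); it is not the builder's implementation, and `my-llm-provider` is a placeholder name.

```python
def qualify_provider(provider_name: str, current_workspace: str) -> str:
    """Return a fully qualified `workspace/provider_name` string."""
    if not provider_name:
        raise ValueError("provider name must not be empty")
    if "/" in provider_name:
        # Already fully qualified: use as given.
        return provider_name
    # Unqualified: prepend the current workspace, as the builder does.
    return f"{current_workspace}/{provider_name}"
```

Following the same rule, a builder working in the `default` workspace should resolve `.with_classify_model_provider("my-llm-provider")` to `default/my-llm-provider`.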


Example Configuration

Here’s an example showing a complete job configuration using the Python SDK:

```python
import os
import pandas as pd

from nemo_platform import NeMoPlatform
from nemo_safe_synthesizer_plugin.sdk.job_builder import SafeSynthesizerJobBuilder

# Placeholders: substitute your own input DataFrame and platform URL.
df: pd.DataFrame = pd.DataFrame()
client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

builder = (
    SafeSynthesizerJobBuilder(client)
    .with_data_source(df)
    # Fine-tuning settings.
    .with_train(
        num_input_records_to_sample=10000,
        learning_rate=0.0005,
        batch_size=1,
    )
    # Generate 5,000 synthetic records.
    .with_generate(
        num_records=5000,
        temperature=0.9,
    )
    # Enable DP-SGD with a privacy budget of epsilon = 8.0.
    .with_differential_privacy(
        dp_enabled=True,
        epsilon=8.0,
    )
    # Enable PII replacement (no arguments passed here).
    .with_replace_pii()
    .synthesize()
)
job = builder.create_job(name="my-job", project="my-project")
```