# PII Replacement

<a id="about-pii-replacement" />

PII (Personally Identifiable Information) replacement is a critical privacy protection step that detects and replaces sensitive information in your datasets before synthesis. Because the most sensitive values, such as names, addresses, and other identifiers, are replaced before the model sees the data, the model cannot learn them.

## How It Works

The PII replacement pipeline operates in multiple stages:

1. **Detection**: Identifies PII entities using configurable detection methods
2. **Classification**: Categorizes detected entities by type (name, email, address, and so on)
3. **Transformation**: Replaces or redacts PII using configurable rules
4. **Validation**: Verifies that sensitive information has been properly handled
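The four stages can be sketched end to end in plain Python. This is a minimal illustration, not the Safe Synthesizer implementation: the `Entity` class, the two regex patterns, and the `replace_pii` helper are hypothetical, and a toy regex detector stands in for the configurable detection methods described below. Detection and classification happen in one pass here because each pattern is tied to a single entity type.

```python
import re
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical record for one detected PII span."""
    start: int
    end: int
    label: str


# Stages 1 and 2: detect spans and classify them by type (toy regex detector).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect(text):
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append(Entity(m.start(), m.end(), label))
    return sorted(found, key=lambda e: e.start)


# Stage 3: transform, here by redacting each span with a placeholder token.
def transform(text, entities):
    # Work right to left so earlier offsets stay valid (assumes no overlaps).
    for e in sorted(entities, key=lambda e: e.start, reverse=True):
        text = text[:e.start] + f"<{e.label}>" + text[e.end:]
    return text


# Stage 4: validate by re-running detection on the output.
def validate(text):
    return not detect(text)


def replace_pii(text):
    cleaned = transform(text, detect(text))
    if not validate(cleaned):
        raise ValueError("PII remained after transformation")
    return cleaned


print(replace_pii("Contact jane.doe@example.com, SSN 123-45-6789."))
# Contact <email>, SSN <ssn>.
```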

## Detection Methods

NeMo Safe Synthesizer supports multiple PII detection approaches:

### Nemotron PII Detection

Uses the Nemotron PII model for entity recognition:

* Zero-shot entity detection
* Supports custom entity types
* High accuracy for standard PII categories
* Configurable confidence thresholds
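A confidence threshold decides which model detections are treated as PII. The sketch below assumes each detection carries a score between 0 and 1 and applies a global threshold with optional per-type overrides. The `Detection` class, the `apply_threshold` function, the scores, and the threshold values are illustrative only; they are not the Nemotron PII output format or Safe Synthesizer configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    text: str
    label: str
    score: float  # hypothetical model confidence in [0, 1]


def apply_threshold(detections, threshold=0.5, per_label=None):
    """Keep detections whose score meets the per-label or global threshold."""
    per_label = per_label or {}
    return [d for d in detections if d.score >= per_label.get(d.label, threshold)]


detections = [
    Detection("Jane", "first_name", 0.93),
    Detection("Main", "last_name", 0.41),
    Detection("555-0100", "phone_number", 0.62),
]
# Require higher confidence for phone numbers than for other types.
kept = apply_threshold(detections, threshold=0.5, per_label={"phone_number": 0.7})
print([d.text for d in kept])  # ['Jane']
```

Raising a threshold trades recall for precision: fewer false positives, but more real PII can slip through undetected.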

### LLM Classification

Leverages language models for PII detection:

* Contextual understanding of entities
* Handles complex PII patterns
* Flexible entity definitions
* Configurable prompts and models
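With LLM classification, the configurable parts are the prompt and the model that answers it. The following sketch assumes a prompt that asks for a JSON list and validates the reply against the allowed entity types. `build_prompt`, `parse_response`, the prompt wording, and the reply are all hypothetical; no model is called, and the reply is hand-written.

```python
import json

ENTITY_TYPES = ["first_name", "last_name", "email", "occupation"]

# Hypothetical prompt template; the real prompt is configurable.
PROMPT_TEMPLATE = (
    "Identify PII in the text below. Allowed entity types: {types}.\n"
    "Reply with a JSON list of objects with keys 'value' and 'type'.\n\n"
    "Text: {text}"
)


def build_prompt(text, entity_types=ENTITY_TYPES):
    return PROMPT_TEMPLATE.format(types=", ".join(entity_types), text=text)


def parse_response(raw, entity_types=ENTITY_TYPES):
    """Parse the model's JSON reply, dropping malformed items and unknown types."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [
        (item["value"], item["type"])
        for item in items
        if isinstance(item, dict)
        and item.get("type") in entity_types
        and isinstance(item.get("value"), str)
    ]


prompt = build_prompt("Dr. Ana Ruiz, a cardiologist, is at ana@example.org.")
# A real pipeline would send `prompt` to the configured model; this reply is hand-written.
raw = (
    '[{"value": "Ana", "type": "first_name"},'
    ' {"value": "Ruiz", "type": "last_name"},'
    ' {"value": "ana@example.org", "type": "email"},'
    ' {"value": "Dr.", "type": "title"}]'
)
print(parse_response(raw))
# [('Ana', 'first_name'), ('Ruiz', 'last_name'), ('ana@example.org', 'email')]
```

Validating the reply matters because a model can return types that were not requested (here, `title`) or malformed JSON.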

### Regex Detection

Pattern-based detection for structured PII:

* Fast and deterministic
* Ideal for known formats (SSN, phone numbers)
* Customizable patterns
* Low computational overhead
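A regex detector maps each entity type to a pattern. The patterns below are illustrative and assume US formats (`123-45-6789` for SSNs; `(555) 010-4477` or `555.010.4478` for phone numbers). `REGEX_ENTITIES` and `regex_detect` are hypothetical names, not part of the Safe Synthesizer API.

```python
import re

# Illustrative US-format patterns (assumption); other locales need other patterns.
REGEX_ENTITIES = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # (555) 010-4477, 555-010-4477, 555.010.4478, or 555 010 4477
    "phone_number": re.compile(
        r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"
    ),
}


def regex_detect(text):
    """Return (entity_type, match, start, end) tuples ordered by position."""
    hits = []
    for entity_type, pattern in REGEX_ENTITIES.items():
        for m in pattern.finditer(text):
            hits.append((entity_type, m.group(), m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[2])


text = "SSN 123-45-6789, call (555) 010-4477 or 555.010.4478."
for entity_type, value, start, end in regex_detect(text):
    print(entity_type, value, start, end)
```

Because the patterns are fixed, the results are deterministic, but any value outside the expected formats (for example, an international phone number) is missed.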

## Replacement Strategies

After detection, PII can be handled in multiple ways:

* **Replacement**: Generate realistic replacements using the [Faker library](https://faker.readthedocs.io/en/master/) or custom expressions.
* **Redaction**: Substitute with placeholder tokens.
* **Hashing**: Convert to a one-way digital fingerprint; identical inputs produce identical hashes.
* **Custom Rules**: Define your own transformation logic.
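The four strategies can be compared on single values using only the standard library. This is a hedged sketch: `FIRST_NAMES` and `replace_first_name` stand in for a Faker provider such as `Faker().first_name()`, and the function names and secret key are hypothetical. The hashing example uses a keyed hash (HMAC-SHA256) rather than a plain hash, because plain hashes of low-entropy values such as SSNs can be reversed by enumerating every possible input.

```python
import hashlib
import hmac
import random

# Hypothetical secret key; in practice, load it from a secret store.
HASH_KEY = b"replace-with-a-secret-key"

# Stand-in for a Faker provider such as Faker().first_name().
FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor"]


def replace_first_name(_value, rng):
    """Replacement: return a realistic but unrelated first name."""
    return rng.choice(FIRST_NAMES)


def redact(_value, entity_type):
    """Redaction: substitute a placeholder token named after the entity type."""
    return f"[{entity_type.upper()}]"


def hash_value(value, key=HASH_KEY):
    """Hashing: keyed one-way fingerprint; equal inputs give equal outputs."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(value):
    """Custom rule: keep the domain and the first character of the local part."""
    local, _, domain = value.partition("@")
    return f"{local[:1]}***@{domain}"


rng = random.Random(0)
print(replace_first_name("Jane", rng) in FIRST_NAMES)  # True
print(redact("jane@example.com", "email"))  # [EMAIL]
print(hash_value("123-45-6789") == hash_value("123-45-6789"))  # True
print(mask_email("jane@example.com"))  # j***@example.com
```

Hashing keeps equal values linkable across rows, which preserves joins and counts, while redaction collapses every value of a type into the same token.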

## Supported Entity Types

Nemotron PII is fine-tuned to recognize the following entity types out of the box, grouped by category:

### Personal Information

* `first_name` - Given names
* `last_name` - Surnames and family names
* `name` - Full names
* `email` - Email addresses
* `phone_number` - Phone numbers in various formats
* `fax_number` - Fax numbers in various formats

### Addresses

* `address` - Complete physical addresses (for example, 123 Main Street, Anytown, CA 90210)
* `street_address` - Street addresses (for example, 123 Main Street)
* `city` - City names
* `county` - County names
* `state` - State/province names
* `postcode` - Postal/ZIP codes
* `country` - Country names

### Personal Identifiers

* `ssn` - Social Security Numbers
* `national_id` - National ID numbers
* `tax_id` - Tax ID numbers
* `certificate_license_number` - Certificate and license numbers, such as driver's license numbers
* `unique_identifier` - Generic unique IDs
* `customer_id` - Customer identifiers
* `employee_id` - Employee identifiers

### Financial Information

* `credit_debit_card` - Credit and debit card numbers
* `cvv` - Card verification values (CVV codes)
* `pin` - Personal identification numbers
* `account_number` - Bank account numbers
* `bank_routing_number` - Bank routing numbers
* `swift_bic` - SWIFT/BIC codes
* `iban` - International bank account numbers

### Medical Information

* `medical_record_number` - Medical record numbers
* `health_plan_beneficiary_number` - Insurance IDs
* `biometric_identifier` - Biometric data references

### Technical Identifiers

* `url` - Web URLs
* `ipv4` - IPv4 addresses
* `ipv6` - IPv6 addresses
* `mac_address` - Hardware MAC addresses
* `api_key` - API keys and tokens
* `user_name` - Usernames
* `password` - Passwords
* `http_cookie` - HTTP cookies
* `device_identifier` - Device IDs

### Vehicle Identifiers

* `vehicle_identifier` - Vehicle identification numbers (VINs)
* `license_plate` - License plates

### Geographic Information

* `latitude` - Latitude coordinates
* `longitude` - Longitude coordinates
* `coordinate` - Coordinate pairs

### Quasi-Identifiers

* `date` - Date values
* `date_time` - Date and time values
* `date_of_birth` - Birth dates
* `time` - Time values
* `age` - Ages
* `blood_type` - Blood type information
* `gender` - Gender information
* `sexuality` - Sexual orientation
* `political_view` - Political affiliations
* `race_ethnicity` - Race and ethnicity information
* `religious_belief` - Religious affiliations
* `language` - Language preferences
* `education_level` - Education level
* `occupation` - Professional titles
* `employment_status` - Employment information
* `company_name` - Organization names

### Custom Entity Types

Beyond these built-in types, you can define custom entities using:

* **Nemotron PII**: Fast, accurate zero-shot named entity recognition (NER) for standard and custom entity types
* **Regex**: Deterministic pattern matching, best for consistent formats (SSN, credit cards)
* **LLM**: Contextual understanding, handles complex patterns and ambiguous cases

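**Example:** The following `classify` configuration requests four built-in entity types plus `project_code`, a custom entity type: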

```json
{
  "classify": {
    "enable": true,
    "entities": [
      "first_name",
      "last_name",
      "email",
      "employee_id",
      "project_code"
    ]
  }
}
```
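In the example above, `project_code` is the only name that does not appear in the built-in lists on this page, so it is the entity that needs a custom definition. The sketch below checks a `classify` block for such names. `BUILT_IN_ENTITIES` mirrors the lists on this page, and `split_entities` is a hypothetical helper, not a Safe Synthesizer function.

```python
import json

# Built-in entity types, copied from the category lists on this page.
BUILT_IN_ENTITIES = frozenset({
    # Personal information
    "first_name", "last_name", "name", "email", "phone_number", "fax_number",
    # Addresses
    "address", "street_address", "city", "county", "state", "postcode", "country",
    # Personal identifiers
    "ssn", "national_id", "tax_id", "certificate_license_number",
    "unique_identifier", "customer_id", "employee_id",
    # Financial information
    "credit_debit_card", "cvv", "pin", "account_number", "bank_routing_number",
    "swift_bic", "iban",
    # Medical information
    "medical_record_number", "health_plan_beneficiary_number", "biometric_identifier",
    # Technical identifiers
    "url", "ipv4", "ipv6", "mac_address", "api_key", "user_name", "password",
    "http_cookie", "device_identifier",
    # Vehicle identifiers
    "vehicle_identifier", "license_plate",
    # Geographic information
    "latitude", "longitude", "coordinate",
    # Quasi-identifiers
    "date", "date_time", "date_of_birth", "time", "age", "blood_type", "gender",
    "sexuality", "political_view", "race_ethnicity", "religious_belief", "language",
    "education_level", "occupation", "employment_status", "company_name",
})


def split_entities(config_text):
    """Return (built_in, custom) entity names from a `classify` config."""
    entities = json.loads(config_text)["classify"]["entities"]
    built_in = [e for e in entities if e in BUILT_IN_ENTITIES]
    custom = [e for e in entities if e not in BUILT_IN_ENTITIES]
    return built_in, custom


config = """
{"classify": {"enable": true,
  "entities": ["first_name", "last_name", "email", "employee_id", "project_code"]}}
"""
built_in, custom = split_entities(config)
print(custom)  # ['project_code']
```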

## Configuration

PII replacement is configured through the `replace_pii` section. For the full schema, refer to the [parameters reference](/documentation/synthesize-safe-data/about/parameters-reference).

```json
{
  "replace_pii": {
    "globals": {
      "locales": [
        "en_US"
      ]
    },
    "steps": [
      {
        "rows": {
          "update": [
            {
              "entity": [
                "email",
                "phone_number"
              ],
              "value": "column.entity | fake"
            }
          ]
        }
      }
    ]
  }
}
```
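One way to read this configuration: each step's `rows.update` rules name entity types and a `value` expression, and columns whose detected entity type matches a rule are rewritten. The sketch below assumes that reading, and further assumes that `column.entity | fake` means "generate a fake value for this column's detected entity type". `update_rules`, `apply_rules`, the `column_entities` mapping, and the `fake` stand-in are hypothetical, and the sketch does not evaluate the expression itself.

```python
import json

CONFIG = json.loads("""
{
  "replace_pii": {
    "globals": {"locales": ["en_US"]},
    "steps": [
      {"rows": {"update": [
        {"entity": ["email", "phone_number"], "value": "column.entity | fake"}
      ]}}
    ]
  }
}
""")


def update_rules(config):
    """Flatten every rows.update rule into (entity_type, value_expression) pairs."""
    pairs = []
    for step in config["replace_pii"]["steps"]:
        for rule in step.get("rows", {}).get("update", []):
            for entity in rule["entity"]:
                pairs.append((entity, rule["value"]))
    return pairs


def apply_rules(row, column_entities, rules, fake):
    """Rewrite columns whose detected entity type is targeted by a rule.

    `column_entities` maps column name to detected entity type (assumed input);
    `fake(entity_type)` stands in for evaluating `column.entity | fake`.
    """
    targeted = {entity for entity, _ in rules}
    return {
        col: fake(column_entities[col]) if column_entities.get(col) in targeted else val
        for col, val in row.items()
    }


def fake(entity_type):
    return f"<fake {entity_type}>"


rules = update_rules(CONFIG)
row = {"name": "Jane Doe", "email": "jane@example.com", "phone": "555-0100"}
detected = {"name": "name", "email": "email", "phone": "phone_number"}
result = apply_rules(row, detected, rules, fake)
print(result)
# {'name': 'Jane Doe', 'email': '<fake email>', 'phone': '<fake phone_number>'}
```

Under this reading, only the `email` and `phone` columns change; `name` stays as-is because no rule in this configuration targets the `name` entity.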

## When to Use PII Replacement

Consider using PII replacement when:

* Your data contains names, addresses, or other direct identifiers
* Compliance requires PII removal before processing
* You want to ensure the model cannot memorize sensitive values
* You need to share synthetic data with external parties

PII replacement is always recommended as a preprocessing step before synthesis.

## Related Topics

* [Safe Synthesizer 101](/documentation/synthesize-safe-data/tutorials/safe-synthesizer-101): Getting started tutorial with PII replacement
* [Tutorials](/documentation/synthesize-safe-data/tutorials): More tutorials