# Manage Files

<a id="manage-files" />

NeMo Platform provides a file storage interface through the **Files** service.
The Files service supports multiple storage backends and can be used to store datasets for training, evaluation results, model artifacts, and other files.

## Concepts

* **Fileset**: A named container that holds files. Filesets are uniquely identified by a **name** within a given **workspace**.

* **Storage Backend**: Each fileset is backed by a storage backend where the files are actually persisted. Supported backends include:
  * `local`: Local filesystem storage (default, read/write)
  * `s3`: Amazon S3 or S3-compatible storage such as MinIO (read/write)
  * `ngc`: NVIDIA GPU Cloud storage (read-only)
  * `huggingface`: HuggingFace Hub repositories (read-only)

  Read-only backends allow you to create a fileset that acts as a handle to external resources. This provides a unified interface to access files from different sources using the same SDK methods, and allows other platform services to reference external data through a fileset.

* **Purpose**: A fileset field that indicates the intended use. Each purpose enables specific **metadata** fields under the corresponding key. The available metadata fields for each purpose are listed below:

  Use `purpose="generic"` (default) for other files that don't fit the `dataset` or `model` categories.

  **Metadata fields**: No purpose-specific metadata fields.

  Use `purpose="dataset"` for training and evaluation data.

  **Metadata fields** (`metadata.dataset.*`):

  | Field                     | Type     | Description                                                          |
  | ------------------------- | -------- | -------------------------------------------------------------------- |
  | `metadata.dataset.schema` | `object` | Schema describing the dataset format (e.g., column names and types). |

  Use `purpose="model"` for model weights and checkpoints.

  **Metadata fields** (`metadata.model.*`):

  | Field                                          | Type      | Description                                                                                                                                                               |
  | ---------------------------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
  | `metadata.model.tool_calling.chat_template`    | `string`  | Jinja2 chat template for the model. Propagated to the model entity spec by the model-spec background task.                                                                |
  | `metadata.model.tool_calling.tool_call_parser` | `string`  | Name of the tool call parser (e.g., `hermes`, `llama3_json`, `mistral`).                                                                                                  |
  | `metadata.model.tool_calling.tool_call_plugin` | `string`  | Reference to a fileset containing a custom tool call plugin Python file (`{workspace}/{fileset_name}`). Requires `models.tool_call_plugin.enabled` at the platform level. |
  | `metadata.model.tool_calling.auto_tool_choice` | `boolean` | Whether to enable automatic tool choice.                                                                                                                                  |

  These fields are merged into the model entity spec by the model-spec background task.

* **Custom Fields**: Arbitrary key-value data attached to a fileset via `custom_fields` for user-defined metadata.
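
The dotted field names in the tables above read as paths into a nested `metadata` object. As a minimal sketch, assuming that reading is correct, the helper below expands dotted keys into the nested dictionary shape. `expand_dotted` is a hypothetical name and is not part of the SDK; whether a particular create call accepts the result as its `metadata` argument is not confirmed here.

```python
from typing import Any


def expand_dotted(fields: dict[str, Any]) -> dict[str, Any]:
    """Expand dotted keys such as 'metadata.model.x' into nested dicts."""
    nested: dict[str, Any] = {}
    for dotted_key, value in fields.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create intermediate objects on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Field names are taken from the model-purpose table; values are illustrative.
payload = expand_dotted(
    {
        "metadata.model.tool_calling.tool_call_parser": "hermes",
        "metadata.model.tool_calling.auto_tool_choice": True,
    }
)
print(payload)
```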

***

## Managing Filesets

Fileset management operations (create, retrieve, list, delete) are available through the CLI (`nemo files filesets`) or the SDK (`client.files.filesets`).

CLI commands use the workspace from your current context by default. Use `--workspace` to specify a different workspace:

```bash
nemo files filesets list --workspace my-workspace
```

### Creating Filesets

Creating a fileset involves specifying a name and workspace. You can optionally provide a description, purpose, and custom storage configuration.

```bash
nemo files filesets create my-files \
  --description "Training data for model fine-tuning"
```

```json
{
  "id": "fileset-TeufFfapeKBrMtpBb42zdv",
  "created_at": "2026-01-20T03:00:00",
  "custom_fields": {},
  "description": "Training data for model fine-tuning",
  "metadata": {
    "dataset": null
  },
  "name": "my-files",
  "project": "",
  "purpose": "generic",
  "storage": {
    "path": "/var/mnt/filesets/default/my-files",
    "read_chunk_size": 16777216,
    "type": "local",
    "write_buffer_size": 16777216
  },
  "updated_at": "2026-01-20T03:00:00",
  "workspace": "default"
}
```

```python
import os

from nemo_platform import NeMoPlatform

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

# Create a fileset
fileset = client.files.filesets.create(
    name="my-files",
    description="Training data for model fine-tuning",
)

print(fileset.model_dump_json(indent=2))
```

```json
{
  "id": "fileset-TeufFfapeKBrMtpBb42zdv",
  "created_at": "2026-01-20T03:00:00",
  "custom_fields": {},
  "description": "Training data for model fine-tuning",
  "metadata": {
    "dataset": null
  },
  "name": "my-files",
  "project": "",
  "purpose": "generic",
  "storage": {
    "path": "/var/mnt/filesets/default/my-files",
    "read_chunk_size": 16777216,
    "type": "local",
    "write_buffer_size": 16777216
  },
  "updated_at": "2026-01-20T03:00:00",
  "workspace": "default"
}
```

### Listing Filesets

List all filesets in a given workspace:

```bash
nemo files filesets list
```

```text
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ name     ┃ workspace ┃ created_at          ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ my-files │ default   │ 2026-01-20T03:00:00 │
└──────────┴───────────┴─────────────────────┘
```

```python
filesets = client.files.filesets.list()

for fileset in filesets:
    print(f"{fileset.name}: {fileset.description}")
```

Filter filesets by purpose or storage type:

```bash
# List only dataset filesets
nemo files filesets list --filter.purpose dataset

# List filesets using local storage
nemo files filesets list --filter.storage-type local
```

```python
# List only dataset filesets
datasets = client.files.filesets.list(filter={"purpose": "dataset"})

# List filesets using local storage
local_filesets = client.files.filesets.list(filter={"storage_type": "local"})
```

Use pagination for large result sets:

```bash
# The "-" prefix sorts in descending order (newest first)
nemo files filesets list --page 1 --page-size 10 --sort "-created_at"
```

```python
filesets = client.files.filesets.list(
    page=1,
    page_size=10,
    sort="-created_at",  # The "-" prefix sorts descending (newest first)
)
```

### Deleting Filesets

Delete an entire fileset:

```bash
nemo files filesets delete my-files
```

```text
✓ Deleted successfully
```

```python
deleted_fileset = client.files.filesets.delete(name="my-files")

print(f"Deleted fileset: {deleted_fileset.name}")
```

Deleting a fileset is permanent and cannot be undone. For `local` and `s3` storage backends, this also deletes all underlying files.

***

## Managing Files Within Filesets

High-level file operations are available through the CLI (`nemo files`) or the SDK (`client.files`), which provide convenient methods for uploading, downloading, and listing files.

For advanced use cases, an [fsspec](https://filesystem-spec.readthedocs.io/en/latest/)-compatible filesystem is available at `client.files.fsspec`. Refer to the [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/usage.html#use-a-file-system) for additional methods.

### Uploading Files

Upload files to a fileset:

```bash
# Upload a single file
nemo files upload ./data.jsonl my-files --remote-path training/data.jsonl

# Upload an entire directory
nemo files upload ./training_data/ my-files --remote-path training/
```

```text
Uploading ━━━━━━━━━━━━━━━━ 100% • 3/3 files
Completed upload to my-files#training/
```

Upload without specifying a fileset to auto-create one:

```bash
# Auto-creates a new fileset with a generated name (fileset-<8 hex chars>)
nemo files upload ./data.jsonl
```

```text
Uploading ━━━━━━━━━━━━━━━━ 100% • 1/1 files
Completed upload to fileset-a1b2c3d4
```

```python
# Upload a single file
client.files.upload(
    fileset="my-files",
    local_path="./data.jsonl",
    remote_path="training/data.jsonl",
)

# Upload an entire directory
client.files.upload(
    fileset="my-files",
    local_path="./training_data/",
    remote_path="training/",
)

# Auto-create a new fileset (generates name like "fileset-a1b2c3d4")
result = client.files.upload(
    local_path="./data.jsonl",
    fileset_auto_create=True,
)
print(f"Uploaded to fileset: {result.name}")
```

If `fileset` is omitted, a new fileset is automatically created with a unique name following the pattern `fileset-<8-hex>` (e.g., `fileset-a1b2c3d4`). The generated name is returned so you can reference it in subsequent operations.
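
When scripts need to tell auto-generated filesets apart from hand-named ones (for example, to clean up scratch uploads), the naming pattern can be checked with a regular expression. This is a minimal sketch: `is_auto_generated_name` is a hypothetical helper, and it assumes the eight hex characters are lowercase, as in the example name.

```python
import re

# Assumption: the 8 hex characters are lowercase, as in "fileset-a1b2c3d4".
AUTO_NAME_RE = re.compile(r"fileset-[0-9a-f]{8}")


def is_auto_generated_name(name: str) -> bool:
    """Return True if the name matches the fileset-<8-hex> pattern."""
    return AUTO_NAME_RE.fullmatch(name) is not None


print(is_auto_generated_name("fileset-a1b2c3d4"))
print(is_auto_generated_name("my-files"))
```

A hand-picked name can also match this pattern, so treat a match as a hint rather than proof that the fileset was auto-created.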

### Listing Files

List all files in a fileset:

```bash
nemo files list my-files
```

```text
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┓
┃ PATH                      ┃ SIZE ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━┩
│ training/data.jsonl       │ 1024 │
│ training/validation.jsonl │ 512  │
└───────────────────────────┴──────┘
```

```python
response = client.files.list(fileset="my-files")

for file in response.data:
    print(f"{file.path}: {file.size} bytes")
```

List files under a specific directory:

```bash
nemo files list my-files --remote-path training/
```

```python
training_files = client.files.list(fileset="my-files", remote_path="training/")
```
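
Listing results can be summarized client-side, for example to see how much data sits under each top-level directory. The sketch below relies only on the `path` and `size` attributes shown in the listing example. `FileEntry` is a stand-in class defined here so the code runs without a server, and `size_by_top_level` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FileEntry:
    """Stand-in for a listing entry; only `path` and `size` are assumed."""

    path: str
    size: int


def size_by_top_level(entries: Iterable[FileEntry]) -> dict[str, int]:
    """Sum file sizes in bytes, grouped by the first path component."""
    totals: dict[str, int] = defaultdict(int)
    for entry in entries:
        top, sep, _ = entry.path.partition("/")
        # Files at the fileset root are grouped under the empty string.
        totals[top if sep else ""] += entry.size
    return dict(totals)


entries = [
    FileEntry("training/data.jsonl", 1024),
    FileEntry("training/validation.jsonl", 512),
    FileEntry("config.json", 64),
]
print(size_by_top_level(entries))
```

With the SDK, the same function could be applied to `response.data` from `client.files.list(...)`, provided each item exposes `path` and `size` as in the example above.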

### Downloading Files

Download files to a local path:

```bash
# Download a single file
nemo files download my-files --remote-path training/data.jsonl -o ./data.jsonl

# Download an entire directory
nemo files download my-files --remote-path training/ -o ./training_data/
```

```text
Downloading ━━━━━━━━━━━━━━━━ 100% • 2/2 files
Downloaded my-files#training/ to './training_data/'
```

```python
# Download a single file
client.files.download(
    fileset="my-files",
    remote_path="training/data.jsonl",
    local_path="./data.jsonl",
)

# Download an entire directory
client.files.download(
    fileset="my-files",
    remote_path="training/",
    local_path="./training_data/",
)
```

Read file content into memory (SDK only):

```python
content = client.files.download_content(
    fileset="my-files",
    remote_path="config.json",
)
print(content.decode("utf-8"))
```
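
Because the content arrives as bytes, small JSONL files can be parsed in memory without writing to disk. The sketch below shows one way to do that. `parse_jsonl` is a hypothetical helper, and the sample bytes stand in for the value returned by `download_content`.

```python
import json


def parse_jsonl(content: bytes) -> list[dict]:
    """Decode UTF-8 bytes and parse one JSON object per non-empty line."""
    records = []
    # Split on "\n" only, so unusual Unicode line separators inside
    # JSON strings do not break a record apart.
    for line in content.decode("utf-8").split("\n"):
        if line.strip():
            records.append(json.loads(line))
    return records


# Sample bytes standing in for the return value of download_content.
sample = (
    b'{"prompt": "hi", "completion": "hello"}\n'
    b"\n"
    b'{"prompt": "bye", "completion": "goodbye"}\n'
)
records = parse_jsonl(sample)
print(len(records))
```

For large files, downloading to a local path and streaming line by line is likely a better fit than holding the whole file in memory.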

### Deleting Files

Delete files from a fileset:

```bash
nemo files delete my-files --remote-path training/old-data.jsonl
```

```text
Deleted my-files#training/old-data.jsonl
```

```python
client.files.delete(
    fileset="my-files",
    remote_path="training/old-data.jsonl",
)
```

### Using Progress Callbacks

The CLI displays progress bars automatically during uploads and downloads. This section covers custom progress handling in the SDK.

Track progress during large file transfers using the `RichProgressCallback` context manager:

```python
from nemo_platform.filesets import RichProgressCallback

# Upload a directory with progress bar
with RichProgressCallback(description="Uploading dataset") as callback:
    client.files.upload(
        fileset="my-files",
        local_path="./large_dataset/",
        remote_path="",
        callback=callback,
    )

# Download all files from a fileset with progress bar
with RichProgressCallback(description="Downloading dataset") as callback:
    client.files.download(
        fileset="my-files",
        remote_path="",
        local_path="./downloaded_data/",
        callback=callback,
    )
```

***

## Use Cases

### Using External Storage Backends

Connect to files stored in NVIDIA GPU Cloud (NGC):

```bash
# Create a secret to store your NGC API key
echo "$NGC_API_KEY" | nemo secrets create my-ngc-api-key --from-file -

# Create a fileset pointing to NGC storage
nemo files filesets create my-nemotron-personas-dataset-en_us \
  --description "Nemotron Personas USA" \
  --storage '{
    "type": "ngc",
    "org": "nvidia",
    "team": "nemotron-personas",
    "resource": "nemotron-personas-dataset-en_us",
    "version": "0.0.2",
    "api_key_secret": "my-ngc-api-key"
  }'
```

```python
import os

# Create a secret to store your NGC API key (read from the NGC_API_KEY environment variable)
secret = client.secrets.create(name="my-ngc-api-key", value=os.getenv("NGC_API_KEY"))

# Create a fileset pointing to NGC storage
ngc_fileset = client.files.filesets.create(
    name="my-nemotron-personas-dataset-en_us",
    description="Nemotron Personas USA",
    storage={
        "type": "ngc",
        "org": "nvidia",
        "team": "nemotron-personas",
        "resource": "nemotron-personas-dataset-en_us",
        "version": "0.0.2",
        "api_key_secret": secret.name,
    },
)
```

Connect to a HuggingFace repository:

```bash
# Create a secret to store your HuggingFace token (needed for gated and private repos)
echo "$HF_TOKEN" | nemo secrets create hf_token --from-file -

# Create a fileset pointing to a HuggingFace repo
nemo files filesets create hf-dataset \
  --description "Dataset from HuggingFace" \
  --storage '{
    "type": "huggingface",
    "repo_id": "nvidia/Nemotron-Personas-Japan",
    "repo_type": "dataset",
    "token_secret": "hf_token"
  }'
```

```python
import os

# Create a secret to store your HuggingFace token (needed for gated and private repos)
secret = client.secrets.create(name="hf_token", value=os.getenv("HF_TOKEN"))

# Create a fileset pointing to a HuggingFace repo
hf_fileset = client.files.filesets.create(
    name="hf-dataset",
    description="Dataset from HuggingFace",
    storage={
        "type": "huggingface",
        "repo_id": "nvidia/Nemotron-Personas-Japan",
        "repo_type": "dataset",
        "token_secret": secret.name,  # Optional, needed for gated and private repos
    },
)
```

Connect to an S3 bucket or S3-compatible storage (e.g., MinIO, Ceph):

```bash
# Create a fileset backed by S3 storage using SDK credential chain
# (uses AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars, IRSA, instance profiles, etc.)
# The "prefix" field is optional - use it to scope the fileset to a folder within the bucket
nemo files filesets create s3-training-data \
  --description "Training data stored in S3" \
  --storage '{
    "type": "s3",
    "bucket": "my-ml-bucket",
    "prefix": "datasets/training",
    "region": "us-east-1",
    "use_sdk_auth": true
  }'

# Upload data to S3
nemo files upload ./training_data/ s3-training-data

# Download data from S3
nemo files download s3-training-data -o ./downloaded_data/
```

```python
# Create a fileset backed by S3 storage using SDK credential chain
# (uses AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars, IRSA, instance profiles, etc.)
s3_fileset = client.files.filesets.create(
    name="s3-training-data",
    description="Training data stored in S3",
    storage={
        "type": "s3",
        "bucket": "my-ml-bucket",
        "prefix": "datasets/training",  # Optional: scope to a folder within the bucket
        "region": "us-east-1",
        "use_sdk_auth": True,  # Use AWS SDK credential chain (default)
    },
)

# Upload data to S3
client.files.upload(
    fileset="s3-training-data",
    local_path="./training_data/",
    remote_path="",
)

# Download data from S3
client.files.download(
    fileset="s3-training-data",
    remote_path="",
    local_path="./downloaded_data/",
)
```

For S3-compatible storage like MinIO, use explicit credentials and a custom endpoint:

```bash
# Create secrets to store your S3 credentials
echo "$S3_ACCESS_KEY" | nemo secrets create s3_access_key --from-file -
echo "$S3_SECRET_KEY" | nemo secrets create s3_secret_key --from-file -

nemo files filesets create minio-fileset \
  --description "Data stored in MinIO" \
  --storage '{
    "type": "s3",
    "bucket": "my-bucket",
    "endpoint_url": "http://minio.example.com:9000",
    "region": "us-east-1",
    "use_sdk_auth": false,
    "access_key_id_secret": "s3_access_key",
    "secret_access_key_secret": "s3_secret_key"
  }'
```

```python
import os

# Create secrets to store your S3 credentials
access_key = client.secrets.create(
    name="s3_access_key", value=os.getenv("S3_ACCESS_KEY")
)
secret_key = client.secrets.create(
    name="s3_secret_key", value=os.getenv("S3_SECRET_KEY")
)

s3_fileset = client.files.filesets.create(
    name="minio-fileset",
    description="Data stored in MinIO",
    storage={
        "type": "s3",
        "bucket": "my-bucket",
        "endpoint_url": "http://minio.example.com:9000",  # Custom S3 endpoint
        "region": "us-east-1",
        "use_sdk_auth": False,  # Use explicit credentials instead of SDK auth
        "access_key_id_secret": access_key.name,
        "secret_access_key_secret": secret_key.name,
    },
)
```
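
The two S3 configurations above differ only in how credentials are supplied. As a rough sketch, a small builder can produce either shape from one set of arguments. `build_s3_storage` is a hypothetical helper, not part of the SDK. It uses only the field names from the examples above, and its validation rule (both secret names or neither) is an assumption of this sketch rather than documented platform behavior.

```python
from typing import Any, Optional


def build_s3_storage(
    bucket: str,
    region: str,
    prefix: Optional[str] = None,
    endpoint_url: Optional[str] = None,
    access_key_id_secret: Optional[str] = None,
    secret_access_key_secret: Optional[str] = None,
) -> dict[str, Any]:
    """Build an S3 storage dict using the field names from the examples."""
    explicit = (access_key_id_secret, secret_access_key_secret)
    if any(explicit) and not all(explicit):
        raise ValueError("Provide both credential secret names or neither.")

    storage: dict[str, Any] = {"type": "s3", "bucket": bucket, "region": region}
    if prefix:
        storage["prefix"] = prefix
    if endpoint_url:
        storage["endpoint_url"] = endpoint_url
    if all(explicit):
        # Explicit credentials, referenced by secret name.
        storage["use_sdk_auth"] = False
        storage["access_key_id_secret"] = access_key_id_secret
        storage["secret_access_key_secret"] = secret_access_key_secret
    else:
        # Fall back to the AWS SDK credential chain.
        storage["use_sdk_auth"] = True
    return storage


aws_storage = build_s3_storage(
    "my-ml-bucket", "us-east-1", prefix="datasets/training"
)
minio_storage = build_s3_storage(
    "my-bucket",
    "us-east-1",
    endpoint_url="http://minio.example.com:9000",
    access_key_id_secret="s3_access_key",
    secret_access_key_secret="s3_secret_key",
)
print(aws_storage)
print(minio_storage)
```

Either dictionary could then be passed as the `storage` argument to `client.files.filesets.create(...)`, as in the examples above.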