Manage Files
NeMo Platform provides a file storage interface through the Files service. The Files service supports multiple storage backends and can be used to store datasets for training, evaluation results, model artifacts, and other files.
Concepts
- Fileset: A named container that holds files. Filesets are uniquely identified by a name within a given workspace.
- Storage Backend: Each fileset is backed by a storage backend where the files are actually persisted. Supported backends include:
  - local: Local filesystem storage (default, read/write)
  - s3: Amazon S3 or S3-compatible storage such as MinIO (read/write)
  - ngc: NVIDIA GPU Cloud storage (read-only)
  - huggingface: HuggingFace Hub repositories (read-only)
Read-only backends allow you to create a fileset that acts as a handle to external resources. This provides a unified interface to access files from different sources using the same SDK methods, and allows other platform services to reference external data through a fileset.
- Purpose: A fileset field that indicates the intended use. Each purpose enables specific metadata fields under the corresponding key. The available purposes are:
  - generic: Use purpose="generic" (default) for other files that don’t fit the dataset or model categories. Metadata fields: No purpose-specific metadata fields.
  - dataset: Metadata fields under the dataset key.
  - model: Metadata fields under the model key. These fields are merged into the model entity spec by the model-spec background task.
- Custom Fields: Arbitrary key-value data attached to a fileset via custom_fields for user-defined metadata.
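The concepts above can be pictured with a small toy model. This is only an illustrative sketch, not the SDK's data model: the `FilesetRecord` and `FilesetRegistry` names are hypothetical, and only the backend names, the read/write versus read-only split, the default values, and the per-workspace uniqueness rule come from the text.

```python
from dataclasses import dataclass, field

# Backend capabilities as listed above (True = read/write, False = read-only).
BACKEND_WRITABLE = {
    "local": True,         # default backend
    "s3": True,
    "ngc": False,
    "huggingface": False,
}


@dataclass
class FilesetRecord:
    """Hypothetical stand-in for a fileset; not the SDK's class."""
    name: str
    workspace: str
    storage: str = "local"      # default backend per the text
    purpose: str = "generic"    # default purpose per the text
    custom_fields: dict = field(default_factory=dict)

    @property
    def read_only(self) -> bool:
        return not BACKEND_WRITABLE[self.storage]


class FilesetRegistry:
    """Toy registry that enforces name uniqueness within a workspace."""

    def __init__(self) -> None:
        self._filesets: dict = {}

    def create(self, fileset: FilesetRecord) -> FilesetRecord:
        key = (fileset.workspace, fileset.name)
        if key in self._filesets:
            raise ValueError(
                f"fileset {fileset.name!r} already exists in workspace {fileset.workspace!r}"
            )
        self._filesets[key] = fileset
        return fileset


registry = FilesetRegistry()
# The same name is allowed in two different workspaces.
registry.create(FilesetRecord("training-data", "team-a"))
registry.create(FilesetRecord("training-data", "team-b", storage="ngc"))
```

Keying the registry on the (workspace, name) pair mirrors the rule that a name only has to be unique inside its own workspace.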
Managing Filesets
Fileset management operations (create, retrieve, list, delete) are available through the CLI (nemo files filesets) or the SDK (client.files.filesets).
CLI commands use the workspace from your current context by default. Use --workspace to specify a different workspace.
Creating Filesets
Creating a fileset involves specifying a name and workspace. You can optionally provide a description, purpose, and custom storage configuration.
CLI
Python SDK
Listing Filesets
List all filesets in a given workspace:
CLI
Python SDK
Filter filesets by purpose or storage type:
CLI
Python SDK
Use pagination for large result sets:
CLI
Python SDK
Deleting Filesets
Delete an entire fileset:
CLI
Python SDK
Deleting a fileset is permanent and cannot be undone. For local and s3 storage backends, this also deletes all underlying files.
Managing Files Within Filesets
High-level file operations are available through the CLI (nemo files) or the SDK (client.files), which provide convenient methods for uploading, downloading, and listing files.
For advanced use cases, an fsspec-compatible filesystem is available at client.files.fsspec. Refer to the fsspec documentation for additional methods.
Uploading Files
Upload files to a fileset:
CLI
Python SDK
Upload without specifying a fileset to auto-create one:
If fileset is omitted, a new fileset is automatically created with a unique name following the pattern fileset-<8-hex> (e.g., fileset-a1b2c3d4). The generated name is returned so you can reference it in subsequent operations.
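To make the naming pattern concrete, the following sketch generates and recognizes names of the form fileset-<8-hex>. It is not the platform's generator: `make_fileset_name` and `looks_auto_generated` are hypothetical helpers, and lowercase hex digits are assumed based on the example name above.

```python
import re
import secrets

# Assumption: eight lowercase hex digits, as in the example "fileset-a1b2c3d4".
FILESET_NAME_RE = re.compile(r"fileset-[0-9a-f]{8}")


def make_fileset_name() -> str:
    """Return a random name shaped like fileset-<8-hex> (illustrative only)."""
    # token_hex(4) draws 4 random bytes and renders them as 8 hex characters.
    return f"fileset-{secrets.token_hex(4)}"


def looks_auto_generated(name: str) -> bool:
    """Check whether a name matches the auto-generated pattern."""
    return FILESET_NAME_RE.fullmatch(name) is not None


example_name = make_fileset_name()
```

A check like `looks_auto_generated` could be handy in cleanup scripts that need to tell auto-created filesets apart from ones that were named explicitly.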
Listing Files
List all files in a fileset:
CLI
Python SDK
List files under a specific directory:
CLI
Python SDK
Downloading Files
Download files to a local path:
CLI
Python SDK
Read file content into memory (SDK only):
Deleting Files
Delete files from a fileset:
CLI
Python SDK
Using Progress Callbacks
The CLI displays progress bars automatically during uploads and downloads. This section covers custom progress handling in the SDK.
Track progress during large file transfers using the RichProgressCallback context manager:
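The idea of a progress callback used as a context manager can be illustrated with plain Python. The sketch below does not use the SDK: `CountingProgress` and `copy_with_progress` are hypothetical names, and the real RichProgressCallback may expose a different interface.

```python
import io


class CountingProgress:
    """Hypothetical progress callback; a stand-in for the SDK's RichProgressCallback."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.done = 0
        self.finished = False

    def __enter__(self) -> "CountingProgress":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Mark the transfer as closed; returning False lets exceptions propagate.
        self.finished = True
        return False

    def update(self, n: int) -> None:
        """Record that n more bytes were transferred."""
        self.done += n


def copy_with_progress(src, dst, callback, chunk_size: int = 4) -> None:
    """Copy src to dst in chunks, reporting each chunk to the callback."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        callback.update(len(chunk))


payload = b"0123456789"
source, target = io.BytesIO(payload), io.BytesIO()
with CountingProgress(total=len(payload)) as progress:
    copy_with_progress(source, target, progress)
```

Wrapping the callback in a `with` block guarantees that any display it owns is closed even if the transfer raises partway through.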
Use Cases
Using External Storage Backends
Connect to files stored in NVIDIA GPU Cloud (NGC):
CLI
Python SDK
Connect to a HuggingFace repository:
CLI
Python SDK
Connect to an S3 bucket or S3-compatible storage (e.g., MinIO, Ceph):
CLI
Python SDK
For S3-compatible storage like MinIO, use explicit credentials and a custom endpoint: