Storage Guide¶
This guide helps you choose and configure persistent storage for Kubernetes clusters running on TrueNAS via the Omni infrastructure provider.
Longhorn with a dedicated data disk is the default storage approach. We chose Longhorn over NFS because NFS has significant networking complexity (firewall rules, port 2049 reachability, subnet auto-detection), many critical applications cannot run on NFS at all (see incompatibility list below), and NFS volumes cannot be snapshotted or backed up with standard Kubernetes tools like Velero CSI snapshots. Longhorn gives you block storage, built-in snapshots, S3 backup, and zero TrueNAS-side dependencies. NFS auto-storage remains available as an opt-in alternative for simple read-heavy workloads.
Choosing a Storage System¶
Software That Does Not Support NFS¶
Many popular Kubernetes workloads explicitly require block storage or are known to corrupt data on NFS. If you plan to run any of these, you must use Longhorn (or another block storage provider):
| Software | Why NFS Fails |
|---|---|
| PostgreSQL / CloudNativePG | NFS file-locking semantics cause WAL corruption, timeouts, and data loss under concurrent writes. CloudNativePG docs explicitly recommend local/block storage. |
| Elasticsearch / OpenSearch | Lucene relies on filesystem behavior NFS does not provide. Elastic explicitly states NFS is not supported -- data corruption and index failures will occur. |
| Redis Enterprise | Redis docs state NFS is not supported -- requires block storage with EXT4/XFS. NFS locking is incompatible with Redis persistence. |
| MongoDB | Data directories fail to persist correctly on NFS. Missing subdirectories and silent data loss reported in Kubernetes NFS deployments. |
| OpenBao / Vault | Raft consensus storage requires consistent fsync semantics. NFS cannot guarantee the write ordering that Raft needs for safe leader election and log replication. Use block storage with integrated Raft, or Consul as the backend. |
| etcd | Requires low-latency, fdatasync-safe storage. NFS latency and locking cause leader election failures and cluster instability. |
| Loki (log aggregation) | Grafana docs warn against NFS for Loki -- shared filesystem causes "a bad experience." Production Loki should use S3 or block storage. |
| Prometheus | TSDB requires consistent block writes. NFS adds latency that causes scrape timeouts and compaction failures under load. |
| MySQL / MariaDB | InnoDB requires O_DIRECT and fsync guarantees that NFS does not reliably provide, leading to silent corruption on crash recovery. |
| CockroachDB / TiDB | Distributed SQL databases with Raft consensus -- same fsync requirements as etcd. NFS breaks replication consistency. |
General rule: If the software uses a write-ahead log (WAL), Raft consensus, or Lucene indexing, it will not work reliably on NFS.
When NFS Does Not Work (Infrastructure)¶
Beyond software compatibility, NFS auto-storage requires the provider to have TrueNAS API access and the cluster nodes to reach TrueNAS on port 2049. It will not work in these scenarios:
- Provider deployed to a remote Kubernetes cluster -- The provider has WebSocket API access, but the cluster VMs may not have network access to TrueNAS port 2049 (NFS). The provider can create the share, but pods can't mount it.
- Provider deployed via Helm to a different site -- Multi-site or edge deployments where the cluster is not on the same LAN as TrueNAS.
- Firewall blocks NFS -- If a firewall sits between the cluster network and TrueNAS and does not allow NFS traffic (TCP 2049, plus portmapper on 111).
- TrueNAS NFS service disabled or unavailable -- Some TrueNAS configurations intentionally disable NFS (e.g., iSCSI-only setups, or when the NFS service conflicts with other workloads).
- Air-gapped or restricted networks -- Environments where cluster nodes cannot make outbound connections to the NAS.
- Shared TrueNAS with NFS conflicts -- When other NFS consumers on the same TrueNAS box have specific export requirements that conflict with the provider's auto-created shares.
In all of these cases, use Longhorn -- it runs entirely inside the cluster and has zero NAS-side dependencies.
Decision Matrix¶
| | Longhorn (Recommended) | NFS (Auto Storage) |
|---|---|---|
| How it works | Storage software runs inside the cluster, replicates data across VM disks | TrueNAS serves an NFS share; a provisioner creates subdirectories for each PV |
| TrueNAS dependency | None -- self-contained | Requires API access + NFS port reachable from cluster |
| Extra VM disks needed | Yes (one per worker node) | No |
| Storage type | Block (better for databases) | File (NFS overhead on random I/O) |
| Data lives on | Virtual disks inside VMs (replicated by Longhorn) | TrueNAS ZFS pool (snapshots, scrub, replication) |
| Setup complexity | Medium -- Helm install + Talos config patch | Low -- one toggle in MachineClass |
| Access modes | ReadWriteOnce (single node) | ReadWriteMany (multiple pods) |
| Survives TrueNAS outage | Yes -- data is on local VM disks | No -- NFS mount goes offline |
| Snapshots & backup | Kubernetes VolumeSnapshots, Velero CSI integration, backup to S3 | No CSI snapshot support -- Velero can only do file-level restic/kopia backup, not crash-consistent snapshots |
| Project health | Active CNCF incubating project | nfs-subdir-external-provisioner unmaintained since 2022 |
Choose Longhorn if: You want reliable, self-contained storage that works regardless of your network topology or TrueNAS configuration, with proper snapshot and backup support. This is the right choice for most users.
Choose NFS if: You only need shared read-heavy storage (media files, static assets), your cluster nodes can reach TrueNAS over NFS, and you don't need Kubernetes-native snapshots or database workloads.
Both approaches can coexist in the same cluster -- install Longhorn alongside NFS and use StorageClass selectors per workload.
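For example, a database PVC can target Longhorn while a shared media PVC targets the NFS class. A minimal sketch -- the `nfs-client` class name is illustrative; use whatever name your NFS provisioner actually registers:

```yaml
# Block storage for a database workload (Longhorn)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data
spec:
  storageClassName: longhorn
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
---
# Shared read-heavy storage (NFS); class name depends on your provisioner
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-library
spec:
  storageClassName: nfs-client
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 200Gi
```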
Option 1: Longhorn (Recommended)¶
Longhorn is a CNCF incubating project that provides Kubernetes-native distributed block storage. It runs entirely inside your cluster with no TrueNAS API dependency.
Requirements¶
- Extra virtual disks attached to worker VMs (use `additional_disks` in MachineClass)
- Talos machine config patches for Longhorn compatibility
Setup¶
1. Add a storage disk to your MachineClass:
```yaml
providerdata: |
  cpus: 4
  memory: 8192
  disk_size: 40
  pool: default
  network_interface: br100
  storage_disk_size: 100  # GiB, dedicated to Longhorn
```
`storage_disk_size` is a shorthand that adds a data disk to each VM. You can also use the full `additional_disks: [{size: 100}]` syntax if you need per-disk pool or encryption options.

Automatic volume formatting + mount (v0.14.3+). When `storage_disk_size` is set (or any `additional_disks` entry is declared), the provider emits a Talos `UserVolumeConfig` patch so the disk is formatted (xfs by default) and mounted at `/var/mnt/<name>` inside the guest. The `storage_disk_size` shorthand uses `name: longhorn`, so the mount lands at `/var/mnt/longhorn`, which matches Longhorn's `defaultDataPath`. Older releases (≤ v0.14.2) attached the disk but required a manual `UserVolumeConfig` patch; that manual step is no longer needed.
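If you need those per-disk options, the longer form looks roughly like this (a sketch based on the shorthand above; the per-disk `pool` override mirrors the top-level `pool` field, and any encryption option names should be checked against the provider's MachineClass reference):

```yaml
additional_disks:
  - size: 100       # GiB, equivalent to storage_disk_size: 100
    pool: default   # optional: place this disk on a specific TrueNAS pool
```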
2. Apply Talos machine config patches for Longhorn:
Longhorn needs specific Talos configuration. Apply this as a config patch in Omni:
```yaml
machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn
        type: bind
        source: /var/mnt/longhorn
        options:
          - bind
          - rshared
          - rw
  sysctls:
    vm.overcommit_memory: "1"
```
Note `source: /var/mnt/longhorn` -- that's the mount path the provider's auto-emitted `UserVolumeConfig` creates. The `/var/lib/longhorn` destination is where Longhorn expects its data path inside its own containers.
See the Longhorn Talos Linux support guide for the latest configuration requirements.
3. Install Longhorn via Helm:
```bash
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --set defaultSettings.defaultDataPath=/var/lib/longhorn
```
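If you prefer to keep chart settings in version control rather than `--set` flags, the same override can live in a values file -- a minimal sketch:

```yaml
# values-longhorn.yaml -- pass to Helm with -f values-longhorn.yaml
defaultSettings:
  defaultDataPath: /var/lib/longhorn
```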
4. Set as default StorageClass:
```bash
kubectl patch storageclass longhorn -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```
Longhorn's Helm install creates two StorageClasses: `longhorn` (dynamic provisioning -- use this for all PVCs) and `longhorn-static` (for pre-existing volumes you created manually). Kubernetes StorageClasses have no description field, so the naming convention is the only way to communicate intent.
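Once volumes are provisioned from the `longhorn` class, they can be snapshotted through the standard CSI snapshot API, assuming the snapshot CRDs and controller are installed in the cluster. A minimal sketch, using a hypothetical PVC named `pg-data`:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot
driver: driver.longhorn.io   # Longhorn's CSI driver
deletionPolicy: Delete
parameters:
  type: snap                 # in-cluster snapshot; "bak" pushes to the configured S3 backup target
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pg-data-snap
spec:
  volumeSnapshotClassName: longhorn-snapshot
  source:
    persistentVolumeClaimName: pg-data
```

Velero's CSI integration drives this same snapshot machinery during scheduled backups.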
Why Longhorn¶
- No TrueNAS dependency -- cluster storage is self-contained and works in any deployment topology
- Block storage -- significantly better performance for databases (PostgreSQL, MySQL, etcd) and random I/O
- Built-in replication across nodes, snapshots, and backup to S3
- Active CNCF project with broad community support and regular releases
- Web UI for monitoring volumes, replicas, and node health
Trade-offs¶
- Requires extra VM disks (adds ZFS write amplification since Longhorn replicates on top of TrueNAS-managed zvols)
- Storage capacity limited by total disk space across worker nodes
- Doesn't leverage TrueNAS ZFS features (snapshots, replication, scrubbing) for cluster data
- ReadWriteOnce only -- no shared volumes across pods on different nodes
Advanced: democratic-csi¶
For users who want per-PV ZFS dataset isolation with dynamic provisioning, democratic-csi is purpose-built for TrueNAS. Each PV gets its own ZFS dataset (NFS) or zvol (iSCSI).
This is more complex to set up than Longhorn but gives you:
- Per-PV ZFS dataset/zvol isolation
- ZFS snapshots exposed as Kubernetes VolumeSnapshots
- Both NFS and iSCSI protocols
| Mode | Auth | Notes |
|---|---|---|
| SSH-based (`freenas-nfs`, `freenas-iscsi`) | SSH to TrueNAS | Stable, battle-tested. Requires SSH access with root/sudo. |
| API-based (`freenas-api-nfs`, `freenas-api-iscsi`) | REST API | Experimental. 1 GB minimum volume size. REST v2.0 compatibility with TrueNAS 25.04+ should be verified. |
iSCSI mode requires the `iscsi-tools` Talos system extension.
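One way to request the extension is through a Talos Image Factory schematic; in an Omni-managed cluster you would typically select it in the machine's schematic/extension settings instead. A generic sketch of the schematic form:

```yaml
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
```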
See the democratic-csi documentation for setup instructions.
Talos Extension Requirements¶
| Storage Option | Extensions Needed |
|---|---|
| Longhorn | iscsi-tools, util-linux-tools (see the Longhorn Talos Linux support guide) |
| democratic-csi (iSCSI) | iscsi-tools |
Further Reading¶
- Longhorn Talos Support -- Longhorn on Talos Linux
- Siderolabs CSI Storage Guide -- Talos-specific CSI documentation
- democratic-csi -- TrueNAS-native CSI driver