ZFS

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. ZFS is scalable, and includes extensive protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z, native NFSv4 ACLs, and can be very precisely configured. The two main implementations, by Oracle and by the OpenZFS project, are extremely similar, making ZFS widely available within Unix-like systems.

The ZFS name stands for nothing—briefly assigned the backronym "Zettabyte File System", it is no longer considered an initialism.[5] Originally, ZFS was proprietary, closed-source software developed internally by Sun as part of Solaris, with a team led by the CTO of Sun's storage business unit and Sun Fellow Jeff Bonwick.[6][7] In 2005, the bulk of Solaris, including ZFS, was licensed as open-source software under the Common Development and Distribution License (CDDL), as the OpenSolaris project. ZFS became a standard feature of Solaris 10 in June 2006.

In 2010, Sun Microsystems was acquired by Oracle and ZFS became a registered trademark belonging to Oracle.[8] Oracle stopped releasing updated source code for new OpenSolaris and ZFS development, effectively reverting Oracle's ZFS to closed source. In response, the illumos project was founded to maintain and enhance the existing open source Solaris, and in 2013 OpenZFS was founded to coordinate the development of open source ZFS.[9][10][11] OpenZFS maintains and manages the core ZFS code, while organizations using ZFS maintain the specific code and validation processes required for ZFS to integrate within their systems. OpenZFS is widely used in Unix-like systems.[12][13][14] In 2017, one analyst described OpenZFS as "the only proven Open Source data-validating enterprise file system".[15]

ZFS
Developer(s): Sun Microsystems originally, Oracle Corporation since 2010. See also OpenZFS (open source fork).
Full name: ZFS
Introduced: November 2005 with OpenSolaris
Structures
Directory contents: Extensible hash table
Limits
Max. volume size: 256 trillion yobibytes (2^128 bytes)[2]
Max. file size: 16 exbibytes (2^64 bytes)
Max. number of files:
  • Per directory: 2^48
  • Per file system: unlimited[2]
Max. filename length: 255 ASCII characters (fewer for multibyte character standards such as Unicode)
Features
Forks: Yes (called "extended attributes", but they are full-fledged streams)
Attributes: POSIX
File system permissions: POSIX, NFSv4 ACLs
Transparent compression: Yes
Transparent encryption: Yes[3]
Data deduplication: Yes
Copy-on-write: Yes
Other
Supported operating systems: Solaris, OpenSolaris, illumos distributions, OpenIndiana, FreeBSD, Mac OS X Server 10.5 (limited to read-only), NetBSD, Linux via third-party kernel module ("ZFS on Linux")[4] or ZFS-FUSE, OSv

Overview and design goals

ZFS compared to other file systems

The management of stored data generally involves two aspects: the physical volume management of one or more block storage devices such as hard drives and SD cards and their organization into logical block devices as seen by the operating system (often involving a volume manager, RAID controller, array manager, or suitable device driver), and the management of data and files that are stored on these logical block devices (a file system or other data storage).

Example: A RAID array of 2 hard drives and an SSD caching disk is controlled by Intel's RST system, part of the chipset and firmware built into a desktop computer. The Windows user sees this as a single volume, containing an NTFS-formatted drive of their data, and NTFS is not necessarily aware of the manipulations that may be required (such as reading from/writing to the cache drive or rebuilding the RAID array if a disk fails). The management of the individual devices and their presentation as a single device is distinct from the management of the files held on that apparent device.

ZFS is unusual because, unlike most other storage systems, it unifies both of these roles, acting as both the volume manager and the file system. It therefore has complete knowledge of the physical disks and volumes (including their condition and status and their logical arrangement into volumes) and of all the files stored on them. ZFS is designed to ensure (subject to suitable hardware) that data stored on disks cannot be lost due to physical errors, misprocessing by the hardware or operating system, or bit rot and data corruption that may happen over time. Its complete control of the storage system is used to ensure that every step, whether related to file management or disk management, is verified, confirmed, corrected if needed, and optimized, in a way that storage controller cards and separate volume and file managers cannot achieve.

ZFS also includes a mechanism for dataset- and pool-level snapshots and replication, including snapshot cloning, which the FreeBSD documentation describes as one of its "most powerful features", with capabilities that "even other file systems with snapshot functionality lack".[16] Very large numbers of snapshots can be taken without degrading performance, allowing snapshots to be taken before risky system operations and software changes, or an entire production ("live") file system to be fully snapshotted several times an hour to mitigate data loss due to user error or malicious activity. Snapshots can be rolled back "live", or previous file system states can be viewed, even on very large file systems, leading to savings in comparison to formal backup and restore processes.[16] Snapshots can also be cloned to form new independent file systems. A pool-level snapshot (known as a "checkpoint") is also available, which allows rollback of operations that may affect the entire pool's structure or that add or remove entire datasets.

Summary of key differentiating features

Examples of features specific to ZFS include:

  • Designed for long-term storage of data, with datastore sizes that can scale indefinitely, zero data loss, and high configurability.
  • Hierarchical checksumming of all data and metadata, ensuring that the entire storage system can be verified on use and confirmed to be correctly stored, or remedied if corrupt. Checksums are stored with a block's parent block, rather than with the block itself. This contrasts with many file systems where checksums (if held) are stored with the data, so that if the data is lost or corrupt, the checksum is also likely to be lost or incorrect.
  • Can store a user-specified number of copies of data or metadata, or selected types of data, to improve the ability to recover from data corruption of important files and structures.
  • Automatic rollback of recent changes to the file system and data, in some circumstances, in the event of an error or inconsistency.
  • Automated and (usually) silent self-healing of data inconsistencies and write failures when detected, for all errors where the data is capable of reconstruction. Data can be reconstructed using all of the following: error detection and correction checksums stored in each block's parent block; multiple copies of data (including checksums) held on the disk; write intentions logged on the SLOG (ZIL) for writes that should have occurred but did not occur (after a power failure); parity data from RAID/RAIDZ disks and volumes; copies of data from mirrored disks and volumes.
  • Native handling of standard RAID levels and additional ZFS RAID layouts ("RAIDZ"). The RAIDZ levels stripe data across only the disks required, for efficiency (many RAID systems stripe indiscriminately across all devices), and checksumming allows rebuilding of inconsistent or corrupted data to be minimised to those blocks with defects;
  • Native handling of tiered storage and caching devices, which is usually a volume related task. Because ZFS also understands the file system, it can use file-related knowledge to inform, integrate and optimize its tiered storage handling which a separate device cannot;
  • Native handling of snapshots and backup/replication, which can be made efficient by integrating the volume and file handling. Relevant tools are provided at a low level and require external scripts and software for utilization.
  • Native data compression and deduplication, although the latter is largely handled in RAM and is memory hungry.
  • Efficient rebuilding of RAID arrays — a RAID controller often has to rebuild an entire disk, but ZFS can combine disk and file knowledge to limit any rebuilding to data which is actually missing or corrupt, greatly speeding up rebuilding;
  • Unaffected by RAID hardware changes which affect many other systems. On many systems, if self-contained RAID hardware such as a RAID card fails, or the data is moved to another RAID system, the file system will lack information that was on the original RAID hardware, which is needed to manage data on the RAID array. This can lead to a total loss of data unless near-identical hardware can be acquired and used as a "stepping stone". Since ZFS manages RAID itself, a ZFS pool can be migrated to other hardware, or the operating system can be reinstalled, and the RAIDZ structures and data will be recognized and immediately accessible by ZFS again.
  • Ability to identify data that would have been found in a cache but has been discarded recently instead; this allows ZFS to reassess its caching decisions in light of later use and facilitates very high cache hit levels (ZFS cache hit rates are typically over 80%);
  • Alternative caching strategies can be used for data that would otherwise cause delays in data handling. For example, synchronous writes which are capable of slowing down the storage system can be converted to asynchronous writes by being written to a fast separate caching device, known as the SLOG (sometimes called the ZIL – ZFS Intent Log).
  • Highly tunable – many internal parameters can be configured for optimal functionality.
  • Can be used for high availability clusters and computing, although not fully designed for this use.
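The idea of remembering data that was recently discarded from a cache can be illustrated with a minimal sketch. It assumes a plain LRU cache plus a "ghost" list that keeps only the keys of evicted entries; a miss whose key is found on the ghost list is a miss that a larger or differently balanced cache would have avoided. ZFS's ARC balances separate recency and frequency lists and is considerably more involved; the class and method names here are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class GhostLRUCache:
    """LRU cache that also remembers recently evicted keys ("ghosts").

    A simplified illustration of the ghost-list signal only; this is not
    the ARC algorithm used by ZFS.
    """

    def __init__(self, capacity: int, ghost_capacity: int) -> None:
        self.capacity = capacity
        self.ghost_capacity = ghost_capacity
        self.data = OrderedDict()    # key -> cached value, least recently used first
        self.ghosts = OrderedDict()  # keys of evicted entries only, no data
        self.hits = self.misses = self.ghost_hits = 0

    def get(self, key: str, load: Callable[[str], bytes]) -> bytes:
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # now the most recently used entry
            return self.data[key]
        self.misses += 1
        if key in self.ghosts:
            self.ghost_hits += 1  # evicting this key earlier was a mistake
            del self.ghosts[key]
        value = load(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            evicted, _ = self.data.popitem(last=False)  # drop the LRU entry
            self.ghosts[evicted] = None
            if len(self.ghosts) > self.ghost_capacity:
                self.ghosts.popitem(last=False)
        return value


if __name__ == "__main__":
    cache = GhostLRUCache(capacity=2, ghost_capacity=4)
    for key in ["a", "b", "c", "a"]:
        cache.get(key, lambda k: k.encode())
    print(cache.hits, cache.misses, cache.ghost_hits)  # 0 4 1
```

A cache that counts ghost hits can use them as evidence for growing the part of the cache that would have retained those entries, which is the kind of self-correction the list item describes.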

Inappropriately specified systems

Unlike many file systems, ZFS is intended to work towards specific aims. Its primary targets are enterprise standard data management and commercial environments, with hardware capable of supporting ZFS's capabilities for data resilience and the resources needed to serve data efficiently. If the system or its configuration is poorly matched to ZFS, then ZFS may underperform significantly. In their 2017 ZFS benchmarks, Calomel stated that:[17]

"On mailing lists and forums there are posts which state ZFS is slow and unresponsive. We have shown in the previous section you can get incredible speeds out of the file system if you understand the limitations of your hardware and how to properly setup your raid. We suspect that many of the objectors of ZFS have setup their ZFS system using slow or otherwise substandard I/O subsystems."

Common system design failures include:

  • Inadequate RAM — ZFS may require a large amount of memory (typically for the adaptive replacement cache (ARC)[18], and if deduplication is used, for the deduplication block table[19]).
  • Inadequate disk free space — ZFS uses copy-on-write for data storage; its performance may suffer if the disk pool gets too close to full. Around 70% is a recommended limit for good performance. Above a certain percentage, typically set to around 80%, ZFS switches to a space-conserving rather than speed-oriented approach, and performance plummets as it focuses on preserving working space on the volume.
  • No efficient dedicated SLOG device, when synchronous writing is prominent — this is notably the case for NFS and ESXi; even SSD-based systems may need a separate ZFS intent log ("SLOG") device for expected performance. During normal operation the SLOG device is only written to; it is read only when recovering from a system error. It can often be small: for example, in FreeNAS, the SLOG device only needs to store the largest amount of data likely to be written in about 10 seconds (or the size of two 'transaction groups'), although it can be made larger to allow a longer lifetime of the device. SLOG is therefore unusual in that its main criteria are pure write functionality, very low write latency, and loss protection – usually little else matters.
  • Lack of suitable caches, or misdesigned/suboptimally configured caches — for example, ZFS can cache read data in RAM ("ARC") or a separate device ("L2ARC"); in some cases adding extra ARC is needed, in other cases adding extra L2ARC is needed, and in some situations adding extra L2ARC can even degrade performance, by forcing RAM to be used for lookup data for the slower L2ARC, at the cost of less room for data in the ARC.
  • Use of hardware RAID cards — perhaps in the mistaken belief that these will 'help' ZFS. While this is routine for other file systems, ZFS handles RAID natively, and is designed to work with a raw and unmodified low-level view of storage devices, so it can fully use its functionality such as S.M.A.R.T. disk health monitoring. A separate RAID card may leave ZFS less efficient and reliable. For example, ZFS checksums all data, but most RAID cards will not do this as effectively, or for cached data. Separate cards can also mislead ZFS about the state of data, for example after a crash, or by mis-signalling exactly when data has safely been written, and in some cases this can lead to issues and data loss. Separate cards can also slow down the system, sometimes greatly, by adding CAS latency to every data read/write operation, or by undertaking full rebuilds of damaged arrays where ZFS would have only needed to do minor repairs of a few seconds.
  • Use of poor quality components — Calomel identify poor quality RAID and network cards as common culprits for low performance.[17] Developer Jeff Bonwick also identifies inadequate quality hard drives that misleadingly report data as written before it has actually been written, in order to appear faster than they are.[20]
  • Poor configuration/tuning — ZFS options allow for a wide range of tuning, and mis-tuning can affect performance. For example, suitable memory caching parameters for file shares on NFS are likely to be different from those required for block access shares using iSCSI and Fibre Channel. A memory cache that would be appropriate for the former can cause timeout errors and start-stop issues as data caches are flushed: because the time permitted for a response is likely to be much shorter on these kinds of connections, the client may believe the connection has failed if there is a delay due to "writing out" a large cache. Similarly, many settings allow the balance between latency (smoothness) and throughput to be modified; inappropriate caches or settings can cause "freezing", slowness and "burstiness", or even connection timeouts.
  • Inappropriate use of deduplication — ZFS supports deduplication, a space-saving technique. But deduplication in ZFS typically requires very large or extreme amounts of RAM to cache the entirety of the pool's deduplication data, which can require tens or hundreds of gigabytes of RAM. This is because ZFS performs deduplication on the fly as data is written. It also places a very heavy load on the CPU, which must calculate and compare data for every block to be written to disk. Therefore, as a rule, deduplication requires a system to be designed and specified from the outset to handle the extra workload involved. Performance can be heavily impacted — often unacceptably so — if the deduplication capability is enabled without sufficient testing, and without balancing impact and expected benefits. Reputable ZFS commentators such as Oracle[21] and iXsystems,[22] as well as ZFS onlookers and bloggers,[23][24] strongly recommend this facility not be used in most cases at the present time, since it can often result in reduced performance and increased resource usage, without significant benefit in return.
  • No attempts made to identify/resolve issues using ZFS tools — ZFS exposes performance data for many of its inner operations, allowing troubleshooting of performance issues with precision. It may, in some cases, need these tools to be used, to gain the best performance it can provide.
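Two of the rules of thumb above lend themselves to simple arithmetic. The sketch below estimates a SLOG size as the expected synchronous write rate multiplied by about 10 seconds, and classifies pool fullness against the approximate 70% and 80% figures quoted above. These thresholds are the article's approximations rather than ZFS constants, and the function names and the 10 Gbit/s link in the example are hypothetical.

```python
def slog_size_bytes(write_bytes_per_sec: float, seconds: float = 10.0) -> float:
    """Estimate SLOG capacity: enough to hold the synchronous writes that
    can arrive in `seconds` (about 10 s per the rule of thumb above)."""
    return write_bytes_per_sec * seconds


def pool_fill_advice(used_bytes: int, pool_bytes: int) -> str:
    """Classify pool fullness using the approximate 70% / 80% guidance."""
    percent_used = 100 * used_bytes / pool_bytes
    if percent_used < 70:
        return "ok"
    if percent_used < 80:
        return "above recommended limit"
    return "space-conserving allocation likely; expect slowdowns"


if __name__ == "__main__":
    # A 10 Gbit/s link can deliver at most about 1.25e9 bytes per second.
    print(slog_size_bytes(1.25e9) / 1e9, "GB")  # 12.5 GB
    print(pool_fill_advice(used_bytes=75, pool_bytes=100))
```

The network link is used as the upper bound here because, for NFS or iSCSI clients, synchronous writes cannot arrive faster than the link delivers them.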

Terminology and storage structure

Because ZFS acts as both volume manager and file system, the terminology and layout of ZFS storage covers two aspects:

  1. How physical devices such as hard drives are organized into vdevs (virtual devices - ZFS's fundamental "blocks" of redundant storage) which are used to create redundant storage for a ZFS pool or zpool (the top level of data container in a ZFS system); and

  2. How datasets (file systems) and volumes (also known as zvols, a block device) - the two kinds of data structures which ZFS is capable of presenting to a user - are held within a pool, and the features and capabilities they present to the user.

ZFS commands allow examination of the physical storage in terms of devices, the vdevs they are organized into, the data pools stored across those vdevs, and in various other ways. Various commands expose in-depth statistics on ZFS's internal status and performance, to allow settings to be optimized.

Physical storage structure: devices and virtual devices

The physical devices used by ZFS (such as hard drives (HDDs) and SSDs) are organized into groups known as vdevs ("virtual devices") before being used to store data.

vdevs are a fundamental part of ZFS. They can be conceived as groups of disks that each provide redundancy against failure of their physical devices. Each vdev must be able to maintain the integrity of the data it holds, and must contain enough disks that the risk of data loss within it is acceptably tiny. If any vdev were to become unreadable (due to disk errors or otherwise), then the entire pool it is part of will also fail. (See data recovery below)

Each vdev can be one of:

  • a single device, or
  • multiple devices in a mirrored configuration, or
  • multiple devices in a ZFS RAID ("RaidZ") configuration.

Each vdev acts as an independent unit of redundant storage. Devices might not be in a vdev if they are unused spare disks, disks formatted with non-ZFS filing systems, offline disks, or cache devices.

The physical structure of a pool is defined by configuring as many vdevs of any type as needed and adding them to the pool. ZFS exposes and manages the individual disks within the system, as well as the vdevs, pools, datasets and volumes into which they are organized. Within any pool, ZFS automatically stripes data across all the vdevs making up the pool, for speed and efficiency.

Each vdev that the user defines is completely independent of every other vdev, so different types of vdev can be mixed arbitrarily in a single ZFS system. If data redundancy is required (so that data is protected against physical device failure), then the user ensures it when organizing devices into vdevs, by using either a mirrored vdev or a RaidZ vdev. Data on a single-device vdev may be lost if the device develops a fault. Data on a mirrored or RaidZ vdev will only be lost if enough disks fail at the same time (or before the system has resilvered any replacements for recently failed disks). A ZFS vdev will continue to function in service if it is capable of providing at least one copy of the data stored on it, although it may become slower due to error fixing and resilvering as part of its self-repair and data integrity processes. However, ZFS is designed not to become unreasonably slow due to self-repair (unless directed to do so by an administrator), since one of its goals is to be capable of uninterrupted continual use even during self-checking and self-repair.

Since ZFS device redundancy is at vdev level, this also means that if a pool is stored across several vdevs, and one of these vdevs completely fails, then the entire pool content will be lost. This is similar to other RAID and redundancy systems, which require the data to be stored or capable of reconstruction from enough other devices to ensure data is unlikely to be lost due to physical devices failing. Therefore, it is intended that vdevs should be made of either mirrored devices or a RaidZ array of devices, with sufficient redundancy, for important data, so that ZFS can automatically limit and where possible avoid data loss if a device fails. Backups and replication are also an expected part of data protection.
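The redundancy rules above can be expressed as a small model. This is a sketch under simplifying assumptions: each vdev tolerates a fixed number of failed devices (none for a single disk, all but one for a mirror, and one, two or three for RAIDZ1, RAIDZ2 and RAIDZ3), and because data is striped across every vdev, the pool survives only if every vdev does. It ignores partial damage, resilvering windows and checksum errors, and the class and function names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass

# Simultaneous device failures each non-mirror vdev layout can absorb.
TOLERATED = {"single": 0, "raidz1": 1, "raidz2": 2, "raidz3": 3}


@dataclass
class Vdev:
    kind: str        # "single", "mirror", "raidz1", "raidz2" or "raidz3"
    width: int       # number of devices in the vdev
    failed: int = 0  # devices currently failed

    def tolerated(self) -> int:
        if self.kind == "mirror":
            return self.width - 1  # any one surviving side holds all the data
        return TOLERATED[self.kind]

    def online(self) -> bool:
        return self.failed <= self.tolerated()


def pool_online(vdevs: list[Vdev]) -> bool:
    # Data is striped across all vdevs, so losing any one vdev loses the pool.
    return all(v.online() for v in vdevs)


if __name__ == "__main__":
    pool = [Vdev("mirror", 2, failed=1), Vdev("raidz2", 6, failed=2)]
    print(pool_online(pool))  # True: both vdevs are degraded but intact
    pool.append(Vdev("single", 1, failed=1))
    print(pool_online(pool))  # False: one failed vdev takes the whole pool down
```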

Vdevs can be manipulated while in active use. A single disk can have additional devices added to create a mirrored vdev, and a mirrored vdev can have physical devices added or removed to leave a larger or smaller number of mirrored devices, or a single device. A RaidZ vdev cannot be converted to or from a mirror, although additional vdevs can always be added to expand storage capacity (which can be any kind including RaidZ). A device in any vdev can be marked for removal, and ZFS will de-allocate data from it to allow it to be removed or replaced.

Of note, the devices in a vdev do not have to be the same size, but ZFS may not use the full capacity of all disks in a vdev, if some are larger than others. This only applies to devices within a single vdev. As vdevs are independent, ZFS does not care if different vdevs have different sizes or are built from different devices.

Also, because a vdev cannot be shrunk in size, it is common to set aside a small amount of unused space (for example 1-2GB on a multi-TB disk), so that if a disk needs replacing, it can be replaced with another disk of the same nominal capacity whose actual capacity is slightly smaller due to manufacturing variances.
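The two capacity points above can be sketched as arithmetic. The RAIDZ estimate below treats every member disk as if it were as small as the smallest one and subtracts the parity disks' worth of space; this is a common rough approximation that ignores RAIDZ padding, metadata and other allocation overhead, not ZFS's exact accounting. The 2 GB reserve mirrors the 1-2GB example above, and the function names are hypothetical.

```python
from __future__ import annotations


def raidz_capacity_estimate(disk_sizes_gb: list[int], parity: int) -> int:
    """Rough usable size of a RAIDZ vdev in GB: each member counts only as
    large as the smallest disk, minus `parity` disks' worth of space.
    Ignores padding, metadata and allocation overhead."""
    if len(disk_sizes_gb) <= parity:
        raise ValueError("need more disks than parity devices")
    return (len(disk_sizes_gb) - parity) * min(disk_sizes_gb)


def partition_size_gb(disk_size_gb: int, reserve_gb: int = 2) -> int:
    """Space to hand to ZFS when deliberately leaving a small unused margin,
    so that a replacement disk of the same nominal size still fits."""
    return disk_size_gb - reserve_gb


if __name__ == "__main__":
    # Three 4000 GB disks and one 6000 GB disk with single parity:
    # the extra 2000 GB on the larger disk goes unused.
    print(raidz_capacity_estimate([4000, 4000, 4000, 6000], parity=1))  # 12000
    print(partition_size_gb(4000))  # 3998
```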

Cache devices

In addition to devices used for main data storage, ZFS also allows and manages devices used for caching purposes. These can be single devices or multiple mirrored devices, and are fully dedicated to the type of cache designated. Cache devices and their detailed settings can be created, modified and removed at any time during live use. A list of ZFS cache types is given later in this article.

Partitions

ZFS can handle devices formatted into partitions for certain purposes, but this is not common use. Generally caches and data pools are given complete devices (or multiple complete devices).

Data structures: Pools, datasets and volumes

The top level of data management is a ZFS pool (or zpool). A ZFS system can have multiple pools defined. The vdevs to be used for a pool are specified when the pool is created (others can be added later), and ZFS will use all of the specified vdevs to maximize performance when storing data – a form of striping across the vdevs. Therefore, it is important to ensure that each vdev is sufficiently redundant, as loss of any vdev in a pool would cause loss of the pool, as with any other striping.

A ZFS pool can be expanded at any time by adding new vdevs, including when the system is 'live'. The storage space / vdevs already allocated to a pool cannot be shrunk, as data is stored across all vdevs in the pool (even if it is not yet full). However, as explained above, the individual vdevs can each be modified at any time (within stated limits), and new vdevs added at any time, since the addition or removal of mirrors, or marking of a redundant disk as offline, do not affect the ability of that vdev to store data.

Within pools, ZFS recognizes two types of data store:

  • A pool can contain datasets, which are containers storing a native ZFS file system. Datasets can contain other datasets ("nested datasets"), which are transparent for file system purposes. A dataset within another dataset is treated much like a directory for the purposes of file system navigation, but it allows a branch of a file system to have different settings for compression, deduplication and other properties. This is because file system settings are per-dataset (and can be inherited by nested datasets).
  • A pool can also contain volumes (also known as zvols), which can be used as block storage devices by other systems. An example of a volume would be an iSCSI or Fibre Channel target for another system, used to create network-attached storage, a storage area network (SAN), or any other ZFS-backed raw block storage capability. The volume will be seen by other systems as a bare storage device which they can use as they like. Capabilities such as snapshots, redundancy, "scrubbing" (data integrity and repair checks), deduplication, compression, cache usage, and replication are operational but not exposed to the remote system, which "sees" only a bare block storage device. Because ZFS does not create a file storage system on the block device or control how the storage space is used, it cannot create nested ZFS datasets or volumes within a volume.
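Per-dataset settings with inheritance can be pictured as a tree lookup: a nested dataset uses its own value if one is set, otherwise the nearest ancestor's. The sketch below is a hypothetical model of that lookup only; on a real system such properties are set and inspected with commands like `zfs set compression=lz4 tank` and `zfs get compression`, and unset properties fall back to built-in defaults, which this model represents simply as `None`. The pool name `tank` is an example.

```python
from __future__ import annotations


class Dataset:
    """Hypothetical model of a dataset tree with inherited properties."""

    def __init__(self, name: str, parent: Dataset | None = None, **props: str) -> None:
        self.name = name
        self.parent = parent
        self.local_props = dict(props)  # properties set directly on this dataset
        self.children: list[Dataset] = []
        if parent is not None:
            parent.children.append(self)

    def get(self, prop: str) -> str | None:
        """Return the local value, else the nearest ancestor's value."""
        node = self
        while node is not None:
            if prop in node.local_props:
                return node.local_props[prop]
            node = node.parent
        return None  # unset everywhere; real ZFS would use a default


if __name__ == "__main__":
    pool = Dataset("tank", compression="lz4")
    home = Dataset("tank/home", pool)
    db = Dataset("tank/db", pool, compression="off")
    print(home.get("compression"))  # lz4 (inherited from tank)
    print(db.get("compression"))    # off (set locally)
```

Volumes are deliberately absent from this model: as described above, a zvol is a bare block device and cannot hold nested datasets.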

Since volumes are presented as block devices, they can also be formatted with any other file system, to add ZFS features to that file system, although this is not usual practice. For example, a ZFS volume can be created, and then the block device it presents can be partitioned and formatted with a file system such as ext4 or NTFS. This can be done either locally or over a network (using iSCSI or similar). The resulting file system will be accessible as normal, but will also gain ZFS benefits such as data resilience, data integrity/scrubbing, snapshots, and additional options for data compression.[25]

Snapshots

Snapshots are an integral feature of ZFS. They provide immutable (read-only) copies of the file system at a single point in time, and even very large file systems can be snapshotted many times every hour, or sustain tens of thousands of snapshots. Snapshot versions of individual files, or an entire dataset or pool, can easily be accessed, searched and restored. An entire snapshot can be cloned to create a new "copy", copied to a separate server as a replicated backup, or the pool or dataset can quickly be rolled back to any specific snapshot. Snapshots can also be compared to each other, or to the current data, to check for modified data. Snapshots do not take much disk space, but when data is deleted, its space will not be marked as free until the data is no longer referenced by the current file system or by any snapshot.
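The space rule in the last sentence can be modelled as set arithmetic: a block stays allocated while the live file system or any snapshot still references it. The sketch below models only that accounting, under the assumption that blocks are identified by simple names; real ZFS snapshots are created without copying block lists, whereas this model copies a set purely to record what each snapshot references. All names are hypothetical.

```python
from __future__ import annotations


class SnapshotSpaceModel:
    """Tracks which blocks remain allocated given live data and snapshots."""

    def __init__(self) -> None:
        self.live: set[str] = set()                     # blocks in the current file system
        self.snapshots: dict[str, frozenset[str]] = {}  # snapshot name -> blocks it pins

    def write(self, block: str) -> None:
        self.live.add(block)

    def delete(self, block: str) -> None:
        self.live.discard(block)

    def snapshot(self, name: str) -> None:
        # Records which blocks the snapshot references (not how ZFS does it).
        self.snapshots[name] = frozenset(self.live)

    def destroy_snapshot(self, name: str) -> None:
        del self.snapshots[name]

    def allocated(self) -> set[str]:
        """Blocks that cannot be freed: referenced live or by any snapshot."""
        referenced = set(self.live)
        for blocks in self.snapshots.values():
            referenced |= blocks
        return referenced


if __name__ == "__main__":
    fs = SnapshotSpaceModel()
    fs.write("a")
    fs.write("b")
    fs.snapshot("daily")
    fs.delete("a")
    print(sorted(fs.allocated()))  # ['a', 'b']: "a" is pinned by the snapshot
    fs.destroy_snapshot("daily")
    print(sorted(fs.allocated()))  # ['b']: "a" can now be freed
```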

As such, snapshots are also an easy way to mitigate the impact of ransomware.[26]

Other terminology

  • Scrub / scrubbing – ZFS can periodically or on demand check all data, and all copies of that data, held in an entire pool, dataset or volume (including nested datasets and volumes), to confirm that all copies match the expected integrity checksums, and correct them if not. This is an intensive process that can run in the background, adjusting its activity to match how busy the system is.
  • (Re-)silver / (re-)silvering – ZFS automatically remedies any defects found, and regenerates its data onto any new or replacement disks added to a vdev, or to multiple vdevs. (Re-)silvering is the ZFS equivalent of rebuilding a RAID array, but as ZFS has complete knowledge of how storage is being used, and which data is reliable, it can often avoid the full rebuild that other RAID rebuilds require, and copy and verify only the minimum data needed to restore the array to full operation.

Resizing of vdevs, pools, datasets and volumes

Generally, ZFS does not expect to reduce the size of a pool, and does not have tools to reduce the set of vdevs that a pool is stored on. (Tools to remove vdevs have been rolled out in Oracle ZFS[27] and also exist for some derivatives of OpenZFS, but are not yet generally released in OpenZFS across platforms.[28]) Therefore, as of 2018, to remove an entire vdev that is in active use, or to reduce the size of a pool, the data stored on it must be moved to another pool or a temporary copy made (or, if easier, it can be deleted and later restored from backups/copies) so that the devices making up the vdev can be freed for other use, or the pool deleted and recreated using fewer vdevs or a smaller size.

Additional capacity can be added to a pool at any time, simply by adding more devices if needed, defining the unused devices into vdevs and adding the new vdevs to the pool.

The capacity of an individual vdev is generally fixed when it is defined. There is one exception to this rule: single drive and mirrored vdevs can be expanded to larger (but not smaller) capacities, without affecting the vdev's operation, by adding larger disks and replacing/removing smaller disks, as shown in the example below.

A pool can be expanded into unused space, and the datasets and volumes within a pool can be likewise expanded to use any unused pool space. Datasets do not need a fixed size and can dynamically grow as data is stored, but volumes, being block devices, need to have their size defined by the user, and must be manually resized as required (which can be done 'live').

Resizing example:

  • A vdev is made up initially from a single 4TB hard drive, and data stored on it. (Note: not recommended in practice due to risk of data loss.)
  • Two 6TB drives are attached to the vdev while 'live'. The vdev is now configured as a 3-way mirror. Its size is still limited to 4TB (the extra 2TB on each of the new disks being unusable). ZFS will automatically copy data to the new disks (resilvering).
  • The original disk is detached, again while 'live'. The vdev that remains contains two 6TB disks and is now a 2-way 6TB mirror, of which 4TB is being used. The pool can now be expanded by 2TB to use the extra space, and it will then be a 2-way mirrored vdev with 6TB raw capacity. The datasets or volumes in the pool can use the extra space.
  • If desired a further disk can be detached, leaving a single device vdev of 6TB (not recommended). Alternatively, a set of disks can be added, either configured as a new vdev (to add to the pool or use for a second pool), or attached as extra mirrors for the existing vdev.
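The resizing example above can be replayed with a small simulation. It is a hypothetical model, not ZFS code: a mirrored vdev's capacity is limited by its smallest member, a disk smaller than the capacity already in use cannot join, and capacity only grows when explicitly expanded. On a real system the corresponding operations are `zpool attach` and `zpool detach`, with growth governed by the pool's `autoexpand` property or an explicit online-expand; resilvering is not modelled.

```python
class MirrorVdev:
    """Hypothetical mirrored vdev; sizes in TB."""

    def __init__(self, *disk_sizes_tb: int) -> None:
        self.disks = list(disk_sizes_tb)
        self.in_use_tb = min(self.disks)  # capacity currently exposed to the pool

    def attach(self, size_tb: int) -> None:
        # A disk smaller than the capacity already in use cannot join the mirror.
        if size_tb < self.in_use_tb:
            raise ValueError("device is too small")
        self.disks.append(size_tb)  # real ZFS would now resilver onto it

    def detach(self, size_tb: int) -> None:
        if len(self.disks) == 1:
            raise ValueError("cannot detach the last disk")
        self.disks.remove(size_tb)

    def expandable_to_tb(self) -> int:
        return min(self.disks)  # every side must hold a full copy

    def expand(self) -> None:
        # Capacity can grow to the smallest remaining disk, but never shrinks.
        self.in_use_tb = max(self.in_use_tb, self.expandable_to_tb())


if __name__ == "__main__":
    v = MirrorVdev(4)   # single 4TB disk
    v.attach(6)
    v.attach(6)         # 3-way mirror, still 4TB usable
    v.detach(4)         # 2-way mirror of 6TB disks
    print(v.in_use_tb, v.expandable_to_tb())  # 4 6
    v.expand()
    print(v.in_use_tb)  # 6
```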

Features

Data integrity

One major feature that distinguishes ZFS from other file systems is that it is designed with a focus on data integrity by protecting the user's data on disk against silent data corruption caused by data degradation, current spikes, bugs in disk firmware, phantom writes (the previous write did not make it to disk), misdirected reads/writes (the disk accesses the wrong block), DMA parity errors between the array and server memory or from the driver (since the checksum validates data inside the array), driver errors (data winds up in the wrong buffer inside the kernel), accidental overwrites (such as swapping to a live file system), etc.

A 1999 study showed that none of the then-major and widespread file systems (such as UFS, Ext,[29] XFS, JFS, or NTFS), nor hardware RAID (which has some issues with data integrity), provided sufficient protection against data corruption problems.[30][31][32][33] Initial research indicates that ZFS protects data better than earlier efforts.[34][35] It is also faster than UFS[36][37] and can be seen as its replacement.

Within ZFS, data integrity is achieved by using a Fletcher-based checksum or a SHA-256 hash throughout the file system tree.[38] Each block of data is checksummed and the checksum value is then saved in the pointer to that block—rather than at the actual block itself. Next, the block pointer is checksummed, with the value being saved at its pointer. This checksumming continues all the way up the file system's data hierarchy to the root node, which is also checksummed, thus creating a Merkle tree.[38] In-flight data corruption or phantom reads/writes (the data written/read checksums correctly but is actually wrong) are undetectable by most filesystems as they store the checksum with the data. ZFS stores the checksum of each block in its parent block pointer so the entire pool self-validates.[38]
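The key idea, that each block's checksum lives in the pointer held by its parent, can be shown with a two-level tree. This minimal sketch uses SHA-256 (one of the two options named above) via Python's `hashlib`; the in-memory "disk", the 40-byte pointer encoding and all names are invented for illustration, and real ZFS block pointers carry far more information.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

POINTER_SIZE = 8 + 32  # 8-byte address + 32-byte SHA-256 checksum


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class BlockPointer:
    """Refers to a child block and carries that child's checksum."""
    address: int
    checksum: bytes

    def serialize(self) -> bytes:
        return self.address.to_bytes(8, "big") + self.checksum


class Disk:
    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}

    def write(self, data: bytes) -> BlockPointer:
        address = len(self.blocks)
        self.blocks[address] = data
        return BlockPointer(address, checksum(data))

    def read(self, ptr: BlockPointer) -> bytes:
        data = self.blocks[ptr.address]
        # Verify against the checksum held by the parent, not by the block.
        if checksum(data) != ptr.checksum:
            raise IOError(f"checksum mismatch at block {ptr.address}")
        return data


def write_tree(disk: Disk, chunks: list[bytes]) -> BlockPointer:
    """Write data blocks, then one indirect block holding their pointers.
    The returned root pointer checksums the indirect block, so it
    transitively covers every data block (a two-level Merkle tree)."""
    leaf_ptrs = [disk.write(chunk) for chunk in chunks]
    indirect = b"".join(p.serialize() for p in leaf_ptrs)
    return disk.write(indirect)


def read_tree(disk: Disk, root: BlockPointer) -> bytes:
    indirect = disk.read(root)
    out = []
    for i in range(0, len(indirect), POINTER_SIZE):
        entry = indirect[i:i + POINTER_SIZE]
        out.append(disk.read(BlockPointer(int.from_bytes(entry[:8], "big"), entry[8:])))
    return b"".join(out)


if __name__ == "__main__":
    disk = Disk()
    root = write_tree(disk, [b"hello ", b"world"])
    print(read_tree(disk, root))  # b'hello world'
    disk.blocks[0] = b"HELLO "    # silent corruption of a data block
    try:
        read_tree(disk, root)
    except IOError as err:
        print(err)                # checksum mismatch at block 0
```

Because the corrupted block's own bytes are never trusted to describe themselves, a phantom or misdirected write that leaves a self-consistent but wrong block is still caught by the parent's checksum.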

When a block is accessed, regardless of whether it is data or metadata, its checksum is calculated and compared with the stored checksum value of what it "should" be. If the checksums match, the data are passed up the programming stack to the process that asked for them. If the values do not match, ZFS can heal the data if the storage pool provides data redundancy (such as with internal mirroring), assuming that a copy of the data is undamaged and has a matching checksum.[39] Additional in-pool redundancy can optionally be provided by specifying copies=2 (or copies=3). With this setting, data will be stored twice (or three times) on the disk, effectively halving (or, for copies=3, reducing to one third) the storage capacity of the disk.[40] Additionally, some kinds of data used by ZFS to manage the pool are stored multiple times by default for safety, even with the default copies=1 setting.

If other copies of the damaged data exist or can be reconstructed from checksums and parity data, ZFS will use a copy of the data (or recreate it via a RAID recovery mechanism), and recalculate the checksum—ideally resulting in the reproduction of the originally expected value. If the data passes this integrity check, the system can then update all faulty copies with known-good data and redundancy will be restored.
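The following is a minimal sketch of the self-healing read path on a mirror. It assumes a hypothetical `Mirror` class whose sides are plain dictionaries. As described above, the expected checksum comes from the parent block pointer. The first copy that matches it is returned, and any copies found to be damaged along the way are rewritten with the known-good data.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class Mirror:
    """Hypothetical N-way mirror: every side holds a copy of every block."""

    def __init__(self, sides: int = 2) -> None:
        self.sides: list[dict[int, bytes]] = [{} for _ in range(sides)]

    def write(self, address: int, data: bytes) -> bytes:
        for side in self.sides:
            side[address] = data
        return checksum(data)  # the caller keeps this in the parent block pointer

    def read(self, address: int, expected: bytes) -> tuple[bytes, int]:
        """Return the data and how many damaged copies were rewritten.

        Like a normal read (unlike a scrub), sides after the first good copy are not checked.
        """
        damaged = []
        for side in self.sides:
            data = side.get(address)
            if data is not None and checksum(data) == expected:
                for bad_side in damaged:
                    bad_side[address] = data  # self-heal with known-good data
                return data, len(damaged)
            damaged.append(side)
        raise OSError(f"block {address}: no copy matches its checksum")
```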

Consistency of data held in memory, such as cached data in the ARC, is not checked by default, because ZFS is expected to run on enterprise-quality hardware with error-correcting (ECC) RAM. The capability to check in-memory data does exist, however, and can be enabled using "debug flags".

RAID ("RaidZ")

For ZFS to be able to guarantee data integrity, it needs multiple copies of the data, usually spread across multiple disks. Typically this is achieved by using either a RAID controller or so-called "soft" RAID (built into a file system).

Avoidance of hardware RAID controllers

While ZFS can work with hardware RAID devices, it will usually work more efficiently and protect data better if it has raw access to all storage devices. This means disks should not be connected through a hardware, firmware or other "soft" RAID, or through any other controller that modifies the usual ZFS-to-disk I/O path. ZFS relies on the disk for an honest view to determine the moment data is confirmed as safely written, and it has numerous algorithms designed to optimize its use of caching, cache flushing, and disk handling.

A third-party device may perform caching, present drives to ZFS as a single system, or hide the low-level view ZFS relies upon. In that case there is a much greater chance that the system will perform less optimally, and that ZFS will not be able to prevent a failure or recover from it as quickly or fully. For example, if a hardware RAID card is used, ZFS may not be able to determine the condition of disks or whether the RAID array is degraded or rebuilding, and it may not know of all data corruption. It also cannot place data optimally across the disks, make only selective repairs, or control how repairs are balanced with ongoing use. It may even be unable to make repairs it could usually make, because the hardware RAID card will interfere. RAID controllers also usually add controller-dependent data to the drives, which prevents software RAID from accessing the user data. The data can be read with a compatible hardware RAID controller, but such a controller is not always available. If the controller card develops a fault, a replacement may not be obtainable, and other cards may not understand the manufacturer's custom data, which is needed to manage and restore an array on a new card.

Therefore, unlike most other systems, where RAID cards or similar are used to offload resources and processing and enhance performance and reliability, with ZFS it is strongly recommended these methods not be used as they typically reduce the system's performance and reliability.

If disks must be connected through a RAID or other controller, it is recommended to use a plain HBA (host bus adapter) or fanout card, or to configure the card in JBOD mode (i.e. turn off RAID and caching functions). This allows devices to be attached while the ZFS-to-disk I/O pathway remains unchanged. A RAID card in JBOD mode may still interfere, depending on its design or if it has a cache. It may also detach drives that do not respond in time, as has been seen with many energy-efficient consumer-grade hard drives. Such cards may therefore require Time-Limited Error Recovery (TLER)/CCTL/ERC-enabled drives to prevent drive dropouts, so not all cards are suitable even with RAID functions disabled.[41]

ZFS' approach: RAID-Z and mirroring

Instead of hardware RAID, ZFS employs "soft" RAID, offering RAID-Z (parity based like RAID 5 and similar) and disk mirroring (similar to RAID 1). The schemes are highly flexible.

RAID-Z is a data/parity distribution scheme like RAID-5, but uses dynamic stripe width: every block is its own RAID stripe, regardless of blocksize, resulting in every RAID-Z write being a full-stripe write. This, when combined with the copy-on-write transactional semantics of ZFS, eliminates the write hole error. RAID-Z is also faster than traditional RAID 5 because it does not need to perform the usual read-modify-write sequence.[42]
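Because each block is its own stripe, the space a block consumes depends on its size and on the vdev's width and parity level. The function below is modelled on the allocation-size arithmetic in OpenZFS's `vdev_raidz_asize`. The block is rounded to whole sectors, each stripe row gets its own parity sectors, and the total is padded to a multiple of parity + 1. Treat the function as an approximation for illustration, and treat its parameter names as assumptions.

```python
def raidz_asize(psize: int, cols: int, nparity: int, ashift: int = 12) -> int:
    """Approximate bytes allocated for one block of psize bytes on a RAID-Z vdev."""
    if cols <= nparity:
        raise ValueError("a RAID-Z vdev needs more disks than parity levels")
    sector = 1 << ashift
    data = -(-psize // sector)             # data sectors, rounded up
    ndata = cols - nparity                 # data sectors per full stripe row
    rows = -(-data // ndata)               # stripe rows this block spans
    total = data + nparity * rows          # every row carries its own parity
    # Pad to a multiple of (nparity + 1) sectors, as OpenZFS does, so freed gaps stay usable.
    total = -(-total // (nparity + 1)) * (nparity + 1)
    return total * sector


if __name__ == "__main__":
    for size in (4096, 8192, 128 * 1024):
        print(size, raidz_asize(size, cols=6, nparity=2))
```

On a six-disk RAID-Z2 vdev with 4 KiB sectors, a 4 KiB block occupies three sectors (one data, two parity), while a 128 KiB block uses 192 KiB. Small blocks therefore pay proportionally more parity overhead. No block ever shares a stripe with another, which is why no read-modify-write is needed.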

As all stripes are of different sizes, RAID-Z reconstruction has to traverse the filesystem metadata to determine the actual RAID-Z geometry. This would be impossible if the filesystem and the RAID array were separate products, whereas it becomes feasible when there is an integrated view of the logical and physical structure of the data. Going through the metadata means that ZFS can validate every block against its 256-bit checksum as it goes, whereas traditional RAID products usually cannot do this.[42]

In addition to handling whole-disk failures, RAID-Z can also detect and correct silent data corruption, offering "self-healing data": when reading a RAID-Z block, ZFS compares it against its checksum, and if the data disks did not return the right answer, ZFS reads the parity and then figures out which disk returned bad data. Then, it repairs the damaged data and returns good data to the requestor.[42]
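The step of working out which disk returned bad data can be illustrated with single (XOR) parity. This sketch assumes that no disk reports an error and only the block checksum fails. Each data column is therefore tried in turn as the culprit and rebuilt from parity, until the reassembled block matches its checksum. RAID-Z2 and RAID-Z3 use more elaborate parity arithmetic, but the checksum-driven search follows the same idea. All names are hypothetical.

```python
import hashlib
from functools import reduce
from typing import Optional


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def write_stripe(data: bytes, ndata: int) -> tuple[list[bytes], bytes, bytes]:
    """Split a block into ndata equal data columns plus one XOR parity column."""
    width = -(-len(data) // ndata)
    padded = data.ljust(width * ndata, b"\0")
    columns = [padded[i * width:(i + 1) * width] for i in range(ndata)]
    parity = reduce(xor, columns)
    return columns, parity, checksum(padded)  # the checksum goes to the block pointer


def read_stripe(columns: list[bytes], parity: bytes,
                expected: bytes) -> tuple[bytes, Optional[int]]:
    """Return the block and the index of the column that was repaired (or None)."""
    block = b"".join(columns)
    if checksum(block) == expected:
        return block, None
    # No disk reported an error, so try each column in turn as the one that lied.
    for suspect in range(len(columns)):
        others = [c for i, c in enumerate(columns) if i != suspect]
        rebuilt = reduce(xor, others, parity)
        candidate = columns[:suspect] + [rebuilt] + columns[suspect + 1:]
        block = b"".join(candidate)
        if checksum(block) == expected:
            columns[suspect] = rebuilt  # repair the damaged column
            return block, suspect
    raise OSError("damage exceeds what single parity can reconstruct")
```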

RAID-Z and mirroring do not require any special hardware: they do not need NVRAM for reliability, and they do not need write buffering for good performance or data protection. With RAID-Z, ZFS provides fast, reliable storage using cheap, commodity disks.[42]

There are five different RAID-Z modes: striping (similar to RAID 0, offers no redundancy), RAID-Z1 (similar to RAID 5, allows one disk to fail), RAID-Z2 (similar to RAID 6, allows two disks to fail), RAID-Z3 (a RAID 7 configuration, allows three disks to fail), and mirroring (similar to RAID 1, allows all but one disk to fail).[44]
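The trade-off between the modes can be summarised with a rough helper. It assumes equal-sized disks and ignores parity padding, metadata and reserved space, so real usable capacity will be somewhat lower. `vdev_summary` and the layout labels are hypothetical.

```python
PARITY = {"stripe": 0, "raidz1": 1, "raidz2": 2, "raidz3": 3}


def vdev_summary(layout: str, disks: int, disk_tb: float) -> tuple[float, int]:
    """Approximate usable capacity (TB) and number of disk failures one vdev survives."""
    if layout == "mirror":
        return disk_tb, disks - 1  # every disk holds the same data
    parity = PARITY[layout]
    if disks <= parity:
        raise ValueError(f"{layout} needs more than {parity} disks")
    return (disks - parity) * disk_tb, parity
```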

The need for RAID-Z3 arose in the late 2000s as multi-terabyte drives became more common. Capacity increased without a corresponding increase in throughput, so rebuilding an array after a drive failure could take "weeks or even months" to complete.[43] During this time, the older disks in the array are stressed by the additional workload, which could result in data corruption or drive failure. By adding a third level of parity, RAID-Z3 increases redundancy and so reduces the chance of data loss.[45]

Resilvering and scrub (array syncing and integrity checking)

ZFS has no tool equivalent to fsck (the standard Unix and Linux data checking and repair tool for file systems).[46] Instead, ZFS has a built-in scrub function. When run (typically on a regular schedule), a scrub examines all data and repairs silent corruption and other problems. Some differences are:

  • fsck must be run on an offline filesystem, which means the filesystem must be unmounted and is not usable while being repaired, while scrub is designed to be used on a mounted, live filesystem, and does not need the ZFS filesystem to be taken offline.

  • fsck usually only checks metadata (such as the journal log) but never checks the data itself. This means, after an fsck, the data might still not match the original data as stored.

  • fsck cannot always validate and repair data when checksums are stored with data (often the case in many file systems), because the checksums may also be corrupted or unreadable. ZFS always stores checksums separately from the data they verify, improving reliability and the ability of scrub to repair the volume. ZFS also stores multiple copies of data – metadata in particular may have upwards of 4 or 6 copies (multiple copies per disk and multiple disk mirrors per volume), greatly improving the ability of scrub to detect and repair extensive damage to the volume, compared to fsck.

  • scrub checks everything, including metadata and the data. The effect can be observed by comparing fsck to scrub times – sometimes a fsck on a large RAID completes in a few minutes, which means only the metadata was checked. Traversing all metadata and data on a large RAID takes many hours, which is exactly what scrub does.
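A scrub can be pictured as a full tree walk. In this toy sketch (`Node`, `make` and `scrub` are hypothetical names), every block is checked against the checksum held by its parent, whether it holds metadata or user data. Damaged copies are rewritten from a good one. As in a real pool, blocks below an unreadable indirect block cannot be reached.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Node:
    """Toy block with redundant copies; children stand in for block pointers."""
    copies: list[bytes]
    expected: bytes                                  # checksum held by the parent pointer
    children: list["Node"] = field(default_factory=list)


def make(data: bytes, children: tuple = (), ncopies: int = 2) -> Node:
    return Node([data] * ncopies, checksum(data), list(children))


def scrub(root: Node) -> dict[str, int]:
    """Visit every reachable block, metadata and data alike, verifying and repairing each."""
    stats = {"visited": 0, "repaired": 0, "unrecoverable": 0}
    stack = [root]
    while stack:
        node = stack.pop()
        stats["visited"] += 1
        good = next((c for c in node.copies if checksum(c) == node.expected), None)
        if good is None:
            stats["unrecoverable"] += 1
            continue  # its pointers cannot be trusted, so the subtree is unreachable
        for i, stored in enumerate(node.copies):
            if checksum(stored) != node.expected:
                node.copies[i] = good
                stats["repaired"] += 1
        stack.extend(node.children)
    return stats
```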

The official recommendation from Sun/Oracle is to scrub enterprise-level disks once a month, and cheaper commodity disks once a week.[47][48]

Capacity

ZFS is a 128-bit file system,[49][50] so it can address 1.84 × 10^19 times more data than 64-bit systems such as Btrfs. The maximum limits of ZFS are designed to be so large that they should never be encountered in practice. For instance, fully populating a single zpool with 2^128 bits of data would require 3×10^24 TB hard disk drives.[51]

Some theoretical limits in ZFS are:

  • 2^48: number of entries in any individual directory[52]

  • 16 exbibytes (2^64 bytes): maximum size of a single file

  • 16 exbibytes: maximum size of any attribute

  • 256 quadrillion zebibytes (2^128 bytes): maximum size of any zpool

  • 2^56: number of attributes of a file (actually constrained to 2^48 for the number of files in a directory)

  • 2^64: number of devices in any zpool

  • 2^64: number of zpools in a system

  • 2^64: number of file systems in a zpool
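The 1.84 × 10^19 ratio between 128-bit and 64-bit address spaces and the 16-exbibyte file-size limit both follow from powers of two. They can be checked with exact integer arithmetic:

```python
EXBIBYTE = 2 ** 60

ratio = 2 ** 128 // 2 ** 64          # 128-bit vs 64-bit address space
max_file_eib = 2 ** 64 // EXBIBYTE   # largest single file, in exbibytes

print(f"{ratio:.3e}")
print(max_file_eib)
```

The first line prints 1.845e+19 (that is, 2^64), and the second prints 16.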

Encryption

With Oracle Solaris, the encryption capability in ZFS[53] is embedded into the I/O pipeline. During writes, a block may be compressed, encrypted, checksummed and then deduplicated, in that order. The policy for encryption is set at the dataset level when datasets (file systems or ZVOLs) are created. The wrapping keys provided by the user/administrator can be changed at any time without taking the file system offline. The default behaviour is for the wrapping key to be inherited by any child data sets. The data encryption keys are randomly generated at dataset creation time. Only descendant datasets (snapshots and clones) share data encryption keys.[54] A command to switch to a new data encryption key for the clone or at any time is provided—this does not re-encrypt already existing data, instead utilising an encrypted master-key mechanism.
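Because wrapping keys are separate from data encryption keys, the wrapping key can change without any data being rewritten. The sketch below shows only that key hierarchy. Its cipher is a deliberately simple TOY (a SHA-256 keystream XORed with the data). It is not secure, and it is not the AES-based encryption ZFS actually uses. `EncryptedDataset` and its methods are hypothetical.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY stream cipher (SHA-256 in counter mode, XOR). Not secure, not what ZFS uses.

    Applying it twice with the same key and nonce returns the original bytes.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class EncryptedDataset:
    """Hypothetical dataset: a random data key, kept only in wrapped form."""

    def __init__(self, wrapping_key: bytes) -> None:
        data_key = secrets.token_bytes(32)  # generated once, at dataset creation time
        self._wrap_nonce = secrets.token_bytes(16)
        self.wrapped_key = toy_cipher(wrapping_key, self._wrap_nonce, data_key)
        self.blocks: list[tuple[bytes, bytes]] = []  # (nonce, ciphertext)

    def _data_key(self, wrapping_key: bytes) -> bytes:
        return toy_cipher(wrapping_key, self._wrap_nonce, self.wrapped_key)

    def write(self, wrapping_key: bytes, plaintext: bytes) -> None:
        nonce = secrets.token_bytes(16)
        self.blocks.append((nonce, toy_cipher(self._data_key(wrapping_key), nonce, plaintext)))

    def read(self, wrapping_key: bytes, index: int) -> bytes:
        nonce, ciphertext = self.blocks[index]
        return toy_cipher(self._data_key(wrapping_key), nonce, ciphertext)

    def change_wrapping_key(self, old: bytes, new: bytes) -> None:
        """Re-wrap the data key only; existing ciphertext blocks are left untouched."""
        data_key = self._data_key(old)
        self._wrap_nonce = secrets.token_bytes(16)
        self.wrapped_key = toy_cipher(new, self._wrap_nonce, data_key)
```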

Read/write efficiency

ZFS will automatically allocate data storage across all vdevs in a pool (and all devices in each vdev) in a way that generally maximises the performance of the pool. ZFS will also update its write strategy to take account of new disks added to a pool, when they are added.

As a general rule, ZFS allocates writes across vdevs based on the free space in each vdev. Vdevs that proportionately hold less data are therefore given more writes when new data is stored. As the pool fills, this prevents a situation where some vdevs become full and writes are forced onto a limited number of devices. It also means that when data is read (and reads are much more frequent than writes in most uses), different parts of the data can be read from as many disks as possible at the same time, giving much higher read performance. Pools and vdevs should therefore be managed, and new storage added, so that a pool does not end up with some vdevs almost full and others almost empty, as this will make the pool less efficient.
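The free-space bias can be illustrated with a weighted random choice. This is a simplification assumed for illustration. ZFS's actual allocator (the metaslab allocator) is considerably more elaborate, and `allocate` is a hypothetical helper.

```python
import random


def allocate(free: list[int], nblocks: int, rng: random.Random) -> list[int]:
    """Place nblocks single-block writes, biased towards vdevs with more free space."""
    free = list(free)
    placed = [0] * len(free)
    for _ in range(nblocks):
        if not any(free):
            raise OSError("pool is full")
        i = rng.choices(range(len(free)), weights=free, k=1)[0]
        free[i] -= 1
        placed[i] += 1
    return placed
```

A vdev with nine times the free space of its neighbour receives roughly nine times as many of the new blocks, so the two gradually converge in fullness.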

Other features

Storage devices, spares, and quotas

Pools can have hot spares to compensate for failing disks. When mirroring, block devices can be grouped according to physical chassis, so that the filesystem can continue in the case of the failure of an entire chassis.

Storage pool composition is not limited to similar devices, but can consist of ad-hoc, heterogeneous collections of devices, which ZFS seamlessly pools together, subsequently doling out space to diverse filesystems as needed. Arbitrary storage device types can be added to existing pools to expand their size.[55]

The storage capacity of all vdevs is available to all of the file system instances in the zpool. A quota can be set to limit the amount of space a file system instance can occupy, and a reservation can be set to guarantee that space will be available to a file system instance.
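The following is a simplified accounting model for quotas and reservations. It assumes a single level of datasets and ignores snapshots, `refquota`/`refreservation` and other details. A reservation counts against the pool even while unused, and a quota caps one dataset regardless of free space. The `Pool` and `Dataset` classes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    used: int = 0
    quota: Optional[int] = None  # upper limit on this dataset's usage
    reservation: int = 0         # space guaranteed to this dataset


class Pool:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.datasets: dict[str, Dataset] = {}

    def available(self, name: str) -> int:
        ds = self.datasets[name]
        # Other datasets consume at least their reservation, even if they have not used it.
        others = sum(max(d.used, d.reservation)
                     for n, d in self.datasets.items() if n != name)
        room = self.capacity - others - ds.used
        if ds.quota is not None:
            room = min(room, ds.quota - ds.used)
        return max(room, 0)

    def write(self, name: str, nbytes: int) -> None:
        if nbytes > self.available(name):
            raise OSError(f"{name}: out of space (quota or pool limit)")
        self.datasets[name].used += nbytes
```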

Caching mechanisms: ARC, L2ARC, Transaction groups, ZIL, SLOG, Special VDEV

ZFS uses different layers of disk cache to speed up read and write operations. Ideally, all data should be stored in RAM, but that is usually too expensive. Therefore, data is automatically cached in a hierarchy to optimize performance versus cost;[56] these are often called "hybrid storage pools".[57] Frequently accessed data will be stored in RAM, and less frequently accessed data can be stored on slower media, such as solid state drives (SSDs). Data that is not often accessed is not cached and left on the slow hard drives. If old data is suddenly read a lot, ZFS will automatically move it to SSDs or to RAM.

ZFS caching mechanisms include one each for reads and writes, and in each case, two levels of caching can exist, one in computer memory (RAM) and one on fast storage (usually solid state drives (SSDs)), for a total of four caches.

First level cache (stored in RAM)

  • Read cache: known as ARC, due to its use of a variant of the adaptive replacement cache (ARC) algorithm. RAM will always be used for caching, so this level is always present. The efficiency of the ARC algorithm means that disks often will not need to be accessed, provided the ARC is sufficiently large. If RAM is too small, there will hardly be any ARC at all. In this case ZFS always needs to access the underlying disks, which impacts performance considerably.

  • Write cache: handled by means of "transaction groups". Writes are collated over a short period (typically 5 to 30 seconds) up to a given limit, and each group is ideally written to disk while the next group is being collated. This allows writes to be organized more efficiently for the underlying disks, at the risk of losing the most recent transactions upon power interruption or hardware fault. In practice the power loss risk is avoided by ZFS write journaling and by the SLOG/ZIL second-tier write cache pool (see below). Writes will only be lost if a write failure happens at the same time as a total loss of the second-tier SLOG pool, and then only if the settings for synchronous writing and SLOG use allow such a situation to arise. If data is received faster than it can be written, data receipt is paused until the disks can catch up.

Second level cache (stored on fast storage devices, which can be added to or removed from a "live" system without disruption in current versions of ZFS, although not always in older versions)

  • Read cache: known as L2ARC ("Level 2 ARC"), optional. ZFS will cache as much data in L2ARC as it can, which can be tens or hundreds of gigabytes in many cases. L2ARC will also considerably speed up deduplication if the entire deduplication table can be cached in L2ARC. It can take several hours to fully populate the L2ARC from empty, before ZFS has decided which data are "hot" and should be cached. If the L2ARC device is lost, all reads will go out to the disks, which slows down performance, but nothing else will happen (no data will be lost).

  • Write cache: known as SLOG or ZIL ("ZFS Intent Log"); the terms are often used incorrectly. A SLOG (secondary log device) is an optional dedicated cache on a separate device that records writes in case of a system issue. If a SLOG device exists, it will be used for the ZFS Intent Log as a second-level log. If no separate log device is provided, the ZIL will be created on the main storage devices instead. Technically, then, the SLOG refers to the dedicated disk to which the ZIL is offloaded in order to speed up the pool. Strictly speaking, ZFS does not use the SLOG device to cache its disk writes. Rather, it uses the SLOG to ensure that writes are captured to a permanent storage medium as quickly as possible. In the event of power loss or write failure, no data that was acknowledged as written will then be lost. The SLOG device allows ZFS to store writes quickly and report them as written, even for much slower storage devices such as HDDs. In the normal course of activity, the SLOG is never referred to or read, and it does not act as a cache. Its purpose is to safeguard data in flight during the few seconds taken for collation and "writing out", in case the eventual write were to fail.

If all goes well, the storage pool will be updated at some point within the next 5 to 60 seconds, when the current transaction group is written out to disk (see above). At that point the saved writes on the SLOG are simply ignored and overwritten. The write may eventually fail, or the system may suffer a crash or fault that prevents it. In that case ZFS can identify all the writes it has confirmed as written by reading back the SLOG (the only time it is read from), and use this to completely repair the data loss. This becomes crucial when a large number of synchronous writes take place (such as with ESXi, NFS and some databases),[58] where the client requires confirmation of successful writing before continuing its activity. The SLOG allows ZFS to confirm writing much more quickly than if it had to write to the main store every time, without the risk of misleading the client about the state of data storage. If there is no SLOG device, part of the main data pool will be used for the same purpose, although this is slower. If the log device itself is lost, the latest writes may be lost, so the log device should be mirrored. In earlier versions of ZFS, loss of the log device could result in loss of the entire zpool, although this is no longer the case. One should therefore upgrade ZFS if planning to use a separate log device.
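The interplay between transaction groups and the intent log can be modelled in a few lines. This is a toy with a single open transaction group held in RAM, and `Pool`, `sync_txg` and `crash_and_recover` are hypothetical names. Asynchronous writes exist only in RAM until the group is written out. Synchronous writes are also recorded in the log, which is read back only after a crash.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Toy model: one open transaction group in RAM, an intent log, and the main pool."""
    main: dict[str, bytes] = field(default_factory=dict)         # committed pool state
    dirty: dict[str, bytes] = field(default_factory=dict)        # open txg, held in RAM
    log: list[tuple[str, bytes]] = field(default_factory=list)   # ZIL records (on SLOG if present)

    def write(self, name: str, data: bytes, sync: bool = False) -> None:
        self.dirty[name] = data
        if sync:
            # A synchronous write is acknowledged only after its log record is stable.
            self.log.append((name, data))

    def sync_txg(self) -> None:
        """Write the whole group out; its log records are now redundant and dropped unread."""
        self.main.update(self.dirty)
        self.dirty.clear()
        self.log.clear()

    def crash_and_recover(self) -> None:
        """RAM is lost; replaying the log (the only time it is read) restores sync writes."""
        self.dirty.clear()
        for name, data in self.log:
            self.main[name] = data
        self.log.clear()
```

An asynchronous write that has not yet been synced is lost in a crash. An acknowledged synchronous write survives through log replay.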

A number of other caches, cache divisions, and queues also exist within ZFS. For example, each VDEV has its own data cache, and the ARC cache is divided between data stored by the user and metadata used by ZFS, with control over the balance between these.
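The ARC is based on the adaptive replacement cache algorithm of Megiddo and Modha. ZFS uses a modified variant (for example, it caches variable-sized buffers and resizes itself under memory pressure). The following is therefore a sketch of the textbook algorithm only, counting entries rather than bytes. T1 holds blocks seen once recently, and T2 holds blocks seen at least twice. The ghost lists B1 and B2 remember recently evicted keys so that the target size `p` of T1 can adapt.

```python
from collections import OrderedDict


class ARC:
    """Textbook ARC: T1/T2 hold cached entries, B1/B2 remember recently evicted keys."""

    def __init__(self, c: int) -> None:
        self.c = c       # cache capacity, in entries
        self.p = 0.0     # adaptive target size for T1
        self.t1, self.t2 = OrderedDict(), OrderedDict()  # seen once / seen repeatedly
        self.b1, self.b2 = OrderedDict(), OrderedDict()  # ghost lists: keys only

    def _replace(self, hit_in_b2: bool) -> None:
        """Evict the LRU entry of T1 or T2 into the matching ghost list."""
        if self.t1 and (not self.t2 or len(self.t1) > self.p
                        or (hit_in_b2 and len(self.t1) == self.p)):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = None
        else:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = None

    def get(self, key, load):
        """Return (value, hit). `load` fetches the value from disk on a miss."""
        if key in self.t1:                # second access: promote to the frequency list
            self.t2[key] = self.t1.pop(key)
            return self.t2[key], True
        if key in self.t2:
            self.t2.move_to_end(key)
            return self.t2[key], True
        value = load(key)
        if key in self.b1:                # ghost hit: T1 was too small, grow its target
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(False)
            del self.b1[key]
            self.t2[key] = value
        elif key in self.b2:              # ghost hit: T2 was too small, shrink T1's target
            self.p = max(0.0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(True)
            del self.b2[key]
            self.t2[key] = value
        else:                             # brand-new key
            l1 = len(self.t1) + len(self.b1)
            total = l1 + len(self.t2) + len(self.b2)
            if l1 == self.c:
                if len(self.t1) < self.c:
                    self.b1.popitem(last=False)
                    self._replace(False)
                else:
                    self.t1.popitem(last=False)
            elif total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)
                self._replace(False)
            self.t1[key] = value
        return value, False
```

With a capacity of three, two keys that are each read twice survive a scan of ten one-off keys. A plain LRU cache of the same size would evict them.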

Special VDEV Class

In ZFS 0.8 and later, a Special VDEV class can be configured to preferentially store filesystem metadata and, optionally, the Data Deduplication Table (DDT) and small filesystem blocks. For example, a Special VDEV can be created on fast solid-state storage to hold the metadata, while the regular file data is stored on spinning disks. This speeds up metadata-intensive operations such as filesystem traversal, scrub, and resilver, without the expense of storing the entire filesystem on solid-state storage.
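The routing decision can be sketched as follows. `special_small_blocks` is the name of the OpenZFS dataset property that sets the small-block threshold, where 0 disables it. The function itself and its other parameter names are hypothetical. The sketch also ignores what happens when the special class fills up.

```python
def allocation_class(kind: str, block_size: int, special_small_blocks: int = 0,
                     ddt_on_special: bool = True) -> str:
    """Return the preferred vdev class ('special' or 'normal') for one block.

    kind is 'metadata', 'ddt' or 'data' (hypothetical labels).
    """
    if kind == "metadata":
        return "special"
    if kind == "ddt":
        return "special" if ddt_on_special else "normal"
    # special_small_blocks == 0 means small data blocks are not redirected.
    if special_small_blocks and block_size <= special_small_blocks:
        return "special"
    return "normal"
```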

Copy-on-write transactional model

ZFS uses a copy-on-write transactional object model. All block pointers within the filesystem contain a 256-bit checksum or 256-bit hash (currently a choice between Fletcher-2, Fletcher-4, or SHA-256)[59] of the target block, which is verified when the block is read. Blocks containing active data are never overwritten in place; instead, a new block is allocated, modified data is written to it, then any metadata blocks referencing it are similarly read, reallocated, and written. To reduce the overhead of this process, multiple updates are grouped into transaction groups, and ZIL (intent log) write cache is used when synchronous write semantics are required. The blocks are arranged in a tree, as are their checksums (see Merkle signature scheme).
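Copy-on-write updates can be sketched as path copying on an immutable tree. Only the blocks between the modified leaf and the root are rewritten, untouched subtrees are shared, and the old root still describes the previous, intact state. The `Block` and `cow_update` names are hypothetical. For brevity, the checksum is recomputed recursively rather than stored in the pointers.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Immutable toy block: a leaf payload, or an indirect block with children."""
    data: bytes = b""
    children: tuple[Block, ...] = ()

    @property
    def checksum(self) -> bytes:
        h = hashlib.sha256(self.data)
        for child in self.children:
            h.update(child.checksum)
        return h.digest()


def cow_update(node: Block, path: list[int], new_data: bytes) -> Block:
    """Return a new tree with the leaf at `path` replaced; `node` is never modified."""
    if not path:
        return Block(new_data)
    i, rest = path[0], path[1:]
    kids = list(node.children)
    kids[i] = cow_update(kids[i], rest, new_data)  # copy this level, reuse the siblings
    return Block(node.data, tuple(kids))
```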

Snapshots and clones

An advantage of copy-on-write is that, when ZFS writes new data, the blocks containing the old data can be retained, allowing a snapshot version of the file system to be maintained. ZFS snapshots are consistent: they reflect the entire data as it existed at a single point in time. They can also be created extremely quickly, since all the data composing the snapshot is already stored, and the entire storage pool is often snapshotted several times per hour. They are also space efficient, since any unchanged data is shared among the file system and its snapshots. Snapshots are inherently read-only, ensuring they will not be modified after creation, although they should not be relied on as a sole means of backup. Entire snapshots can be restored, as can individual files and directories within snapshots.
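Tracking each block's birth transaction group (txg) shows why snapshots are cheap. Roughly, ZFS compares a freed block's birth txg with the txg of the most recent snapshot. A block born after that snapshot can be reclaimed immediately, while an older block is still part of a snapshot and must be kept. The sketch below models only that rule and ignores the bookkeeping ZFS keeps for multiple snapshots. Its class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy dataset tracking the birth txg of each live block."""
    txg: int = 1
    snapshots: dict[str, int] = field(default_factory=dict)  # name -> txg it captured
    live: dict[str, int] = field(default_factory=dict)       # block id -> birth txg
    freed: list[str] = field(default_factory=list)
    kept_by_snapshot: list[str] = field(default_factory=list)

    def write(self, block_id: str) -> None:
        self.live[block_id] = self.txg

    def snapshot(self, name: str) -> None:
        # Constant time: record the txg and start a new one; no data is copied.
        self.snapshots[name] = self.txg
        self.txg += 1

    def free(self, block_id: str) -> None:
        """Called when a block is deleted or, under copy-on-write, overwritten."""
        birth = self.live.pop(block_id)
        latest = max(self.snapshots.values(), default=0)
        if birth > latest:
            self.freed.append(block_id)             # no snapshot can see it: reclaim now
        else:
            self.kept_by_snapshot.append(block_id)  # still part of a snapshot
```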

Writeable snapshots ("clones") can also be created, resulting in two independent file systems that share a set of blocks. As changes are made to any of the clone file systems, new data blocks are created to reflect those changes, but any unchanged blocks continue to be shared, no matter how many clones exist. This is an implementation of the Copy-on-write principle.

Sending and receiving snapshots

ZFS file systems can be moved to other pools, including pools on remote hosts over the network, because the send command creates a stream representation of the file system's state. This stream can either describe the complete contents of the file system at a given snapshot, or it can be a delta between snapshots. Computing the delta stream is very efficient, and its size depends on the number of blocks changed between the snapshots. This provides an efficient strategy, e.g., for synchronizing offsite backups or high availability mirrors of a pool.
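Every changed block receives a new birth txg, so an incremental stream can be built by selecting the blocks born after the source snapshot. In this hypothetical sketch, blocks are keyed by their logical position in the file system, since their physical addresses change under copy-on-write. The stream also omits the frees and object metadata that a real send stream carries.

```python
def send_incremental(blocks: dict[int, tuple[int, bytes]],
                     from_txg: int) -> list[tuple[int, bytes]]:
    """Select blocks born after the source snapshot's txg.

    `blocks` maps a logical block position to (birth txg, contents) in the newer snapshot.
    """
    return sorted((pos, data) for pos, (birth, data) in blocks.items() if birth > from_txg)


def receive(replica: dict[int, bytes], stream: list[tuple[int, bytes]]) -> None:
    """Apply a stream on top of a replica that already matches the source snapshot."""
    for pos, data in stream:
        replica[pos] = data
```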

Dynamic striping

Dynamic striping across all devices to maximize throughput means that as additional devices are added to the zpool, the stripe width automatically expands to include them; thus, all disks in a pool are used, which balances the write load across them.

Variable block sizes

ZFS uses variable-sized blocks, with 128 KB as the default size. Available features allow the administrator to tune the maximum block size which is used, as certain workloads do not perform well with large blocks. If data compression is enabled, variable block sizes are used. If a block can be compressed to fit into a smaller block size, the smaller size is used on the disk to use less storage and improve IO throughput (though at the cost of increased CPU use for the compression and decompression operations).[60]
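The effect of compression on block size can be illustrated by rounding the compressed record up to whole sectors. The sketch assumes 4 KiB sectors and uses zlib (from the Python standard library) as a stand-in for LZ4 or gzip. It keeps compression only if it saves at least one eighth of the record and at least one whole sector. This approximates OpenZFS behaviour and should be treated as an assumption.

```python
import os
import zlib

SECTOR = 4096  # assumes 4 KiB sectors (ashift=12)


def round_up(n: int) -> int:
    return -(-n // SECTOR) * SECTOR


def stored_size(record: bytes) -> tuple[int, bool]:
    """Return (bytes allocated on disk, whether the compressed form was kept)."""
    compressed = zlib.compress(record)  # zlib stands in for LZ4/gzip here
    # Keep compression only if it saves at least 1/8 and still saves a whole sector.
    if (len(compressed) <= len(record) - (len(record) >> 3)
            and round_up(len(compressed)) < round_up(len(record))):
        return round_up(len(compressed)), True
    return round_up(len(record)), False


if __name__ == "__main__":
    text = b"ZFS " * 32768          # 128 KiB, very compressible
    noise = os.urandom(128 * 1024)  # 128 KiB, incompressible
    print(stored_size(text), stored_size(noise))
```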

Lightweight filesystem creation

In ZFS, filesystem manipulation within a storage pool is easier than volume manipulation within a traditional filesystem; the time and effort required to create or expand a ZFS filesystem is closer to that of making a new directory than it is to volume manipulation in some other systems.

Adaptive endianness

Pools and their associated ZFS file systems can be moved between different platform architectures, including systems implementing different byte orders. The ZFS block pointer format stores filesystem metadata in an endian-adaptive way; individual metadata blocks are written with the native byte order of the system writing the block. When reading, if the stored endianness does not match the endianness of the system, the metadata is byte-swapped in memory.
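The following is a minimal sketch of reader-side byte swapping using Python's `struct` module. The one-byte order tag and the two-field header are hypothetical stand-ins for the byte-order information ZFS keeps with its metadata.

```python
import struct
import sys


def write_header(values: tuple[int, int], byteorder: str = sys.byteorder) -> bytes:
    """Store two uint64 fields in the writer's byte order, prefixed by an order tag."""
    fmt = ("<" if byteorder == "little" else ">") + "QQ"
    tag = b"L" if byteorder == "little" else b"B"
    return tag + struct.pack(fmt, *values)


def read_header(raw: bytes) -> tuple[int, int]:
    """Decode using the stored order; struct swaps bytes only if it differs from the host's."""
    fmt = ("<" if raw[:1] == b"L" else ">") + "QQ"
    return struct.unpack(fmt, raw[1:])
```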

This does not affect the stored data; as is usual in POSIX systems, files appear to applications as simple arrays of bytes, so applications creating and reading data remain responsible for doing so in a way independent of the underlying system's endianness.

Deduplication

Data deduplication capabilities were added to the ZFS source repository at the end of October 2009,[61] and relevant OpenSolaris ZFS development packages have been available since December 3, 2009 (build 128).

Effective use of deduplication may require large RAM capacity; recommendations range between 1 and 5 GB of RAM for every TB of storage.[62][63][64] An accurate assessment of the memory required for deduplication is made by referring to the number of unique blocks in the pool, and the number of bytes on disk and in RAM ("core") required to store each record - these figures are reported by inbuilt commands such as zpool and zdb. Insufficient physical memory or lack of ZFS cache can result in virtual memory thrashing when using deduplication, which can cause performance to plummet, or result in complete memory starvation. Because deduplication occurs at write-time, it is also very CPU-intensive and this can also significantly slow down a system.
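The deduplication table is essentially a map from each block checksum to an existing copy and a reference count, consulted on every write. The sketch below shows that lookup alongside a back-of-the-envelope memory estimate. The figure of about 320 bytes per in-memory table entry is a commonly quoted rule of thumb, not a value from this article, and real entry sizes vary. The commands mentioned above report the actual figures for a given pool.

```python
import hashlib


class DedupTable:
    """Toy DDT: block checksum -> [address of the stored copy, reference count]."""

    def __init__(self) -> None:
        self.entries: dict[bytes, list[int]] = {}
        self.next_address = 0

    def write(self, block: bytes) -> int:
        key = hashlib.sha256(block).digest()
        entry = self.entries.get(key)
        if entry is not None:        # duplicate: bump the refcount, store nothing new
            entry[1] += 1
            return entry[0]
        address = self.next_address  # unique: allocate and record a new entry
        self.next_address += 1
        self.entries[key] = [address, 1]
        return address


def ddt_ram_estimate(pool_bytes: int, avg_block: int, bytes_per_entry: int = 320) -> int:
    """Rough in-RAM table size, assuming every block is unique (the worst case)."""
    return (pool_bytes // avg_block) * bytes_per_entry
```

For a 1 TiB pool with an average block size of 64 KiB, this gives 2^24 entries and 5 GiB of RAM, at the upper end of the range quoted above.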

Other storage vendors use modified versions of ZFS to achieve very high data compression ratios. Two examples in 2012 were GreenBytes[65] and Tegile.[66] In May 2014, Oracle bought GreenBytes for its ZFS deduplication and replication technology.[67]

As described above, deduplication is usually not recommended due to its heavy resource requirements (especially RAM) and impact on performance (especially when writing), other than in specific circumstances where the system and data are well-suited to this space-saving technique.

Additional capabilities

  • Explicit I/O priority with deadline scheduling.

  • Claimed globally optimal I/O sorting and aggregation.

  • Multiple independent prefetch streams with automatic length and stride detection.

  • Parallel, constant-time directory operations.

  • End-to-end checksumming, using a kind of "Data Integrity Field", allowing data corruption detection (and recovery if you have redundancy in the pool). A choice of 3 hashes can be used, optimized for speed (fletcher), standardization and security (SHA256) and salted hashes (Skein).[68]

  • Transparent filesystem compression. Supports LZJB, gzip[69] and LZ4.

  • Intelligent scrubbing and resilvering (resyncing).[70]

  • Load and space usage sharing among disks in the pool.[71]

  • Ditto blocks: Configurable data replication per filesystem, with zero, one or two extra copies requested per write for user data, and with that same base number of copies plus one or two for metadata (according to metadata importance).[72] If the pool has several devices, ZFS tries to replicate over different devices. Ditto blocks are primarily an additional protection against corrupted sectors, not against total disk failure.[73]

  • ZFS design (copy-on-write + superblocks) is safe when using disks with write cache enabled, if they honor the write barriers. This feature provides safety and a performance boost compared with some other filesystems.

  • On Solaris, when entire disks are added to a ZFS pool, ZFS automatically enables their write cache. This is not done when ZFS only manages discrete slices of the disk, since it does not know if other slices are managed by non-write-cache safe filesystems, like UFS. The FreeBSD implementation can handle disk flushes for partitions thanks to its GEOM framework, and therefore does not suffer from this limitation.

  • Per-user, per-group, per-project, and per-dataset quota limits.[74]

  • Filesystem encryption since Solaris 11 Express[3] (on some other systems ZFS can utilize encrypted disks for a similar effect; GELI on FreeBSD can be used this way to create fully encrypted ZFS storage).

  • Pools can be imported in read-only mode.

  • It is possible to recover data by rolling back entire transactions at the time of importing the zpool.

  • ZFS is not a clustered filesystem; however, clustered ZFS is available from third parties.

  • Snapshots can be taken manually or automatically. The older versions of the stored data that they contain can be exposed as full read-only file systems. They can also be exposed as historic versions of files and folders when used with CIFS (also known as SMB, Samba or file shares); this is known as "Previous versions", "VSS shadow copies", or "File history" on Windows, or AFP and "Apple Time Machine" on Apple devices.[75]

  • Disks can be marked as 'spare'. A data pool can be set to automatically and transparently handle disk faults by activating a spare disk and beginning to resilver the data that was on the suspect disk onto it, when needed.

Limitations

Limitations in preventing data corruption

The authors of a 2010 study examined the ability of file systems to detect and prevent data corruption, with particular focus on ZFS. They observed that ZFS itself is effective in detecting and correcting data errors on storage devices, but that it assumes data in RAM is "safe" and not prone to error. The study comments that "a single bit flip in memory causes a small but non-negligible percentage of runs to experience a failure, with the probability of committing bad data to disk varying from 0% to 3.6% (according to the workload)". It also notes that when ZFS caches pages, stores copies of metadata in RAM, or holds data in its "dirty" cache for writing to disk, it does not test whether the checksums still match the data at the point of use.[76] Much of this risk can be mitigated in one of two ways:

  • According to the authors, by using ECC RAM; however, the authors considered that adding error detection related to the page cache and heap would allow ZFS to handle certain classes of error more robustly.[76]
  • One of the main architects of ZFS, Matt Ahrens, explains there is an option to enable checksumming of data in memory by using the ZFS_DEBUG_MODIFY flag (zfs_flags=0x10) which addresses these concerns.[77]
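Checking in-memory data can be sketched as a cached buffer that records its checksum when stored and verifies it again before use. This is roughly what the debug flag described above enables for ZFS's buffers. `CheckedBuffer` is a hypothetical class, not a ZFS interface.

```python
import hashlib


class CheckedBuffer:
    """Cached buffer that stores its checksum and re-verifies it before every use."""

    def __init__(self, data: bytes) -> None:
        self._data = bytearray(data)
        self._checksum = hashlib.sha256(data).digest()

    def use(self) -> bytes:
        if hashlib.sha256(self._data).digest() != self._checksum:
            raise RuntimeError("cached buffer changed in memory since it was stored")
        return bytes(self._data)
```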

Other limitations specific to ZFS

  • Capacity expansion is normally achieved by adding groups of disks as a top-level vdev: simple device, RAID-Z, RAID Z2, RAID Z3, or mirrored. Newly written data will dynamically start to use all available vdevs. It is also possible to expand the array by iteratively swapping each drive in the array with a bigger drive and waiting for ZFS to self-heal; the heal time will depend on the amount of stored information, not the disk size.

  • As of Solaris 10 Update 11 and Solaris 11.2, it was neither possible to reduce the number of top-level vdevs in a pool, nor to otherwise reduce pool capacity.[78] This functionality was said to be in development in 2007.[79] Enhancements to allow reduction of vdevs are under development in OpenZFS.[80]

  • As of 2008 it was not possible to add a disk as a column to a RAID Z, RAID Z2 or RAID Z3 vdev. However, a new RAID Z vdev can be created instead and added to the zpool.[81]

  • Some traditional nested RAID configurations, such as RAID 51 (a mirror of RAID 5 groups), are not configurable in ZFS. Vdevs can only be composed of raw disks or files, not other vdevs. However, a ZFS pool effectively creates a stripe (RAID 0) across its vdevs, so the equivalent of a RAID 50 or RAID 60 is common.

  • Reconfiguring the number of devices in a top-level vdev requires copying data offline, destroying the pool, and recreating it with the new top-level vdev configuration. There are two exceptions. Extra redundancy can be added to an existing mirror at any time. If all top-level vdevs are mirrors with sufficient redundancy, the zpool split[82] command can remove a device from each top-level vdev in the pool, creating a second pool with identical data.

  • The IOPS performance of a ZFS storage pool can suffer if the ZFS RAID is not appropriately configured. In one way or another, this applies to all types of RAID. Suppose the zpool consists of only one group of disks, say eight disks in RAID Z2. Its IOPS performance will then be that of a single disk: write speed will be equivalent to 6 disks, but random read speed will be similar to a single disk. There are ways to mitigate this IOPS problem, for instance by adding SSDs as L2ARC cache, which can boost IOPS into the hundreds of thousands.[83] In short, when using RAID Z, a zpool should consist of several vdevs of 8–12 disks each. Creating a zpool with a single large vdev (say, 20 disks) is not recommended. Its IOPS performance will be that of a single disk, and its resilver time will be very long (possibly weeks with future large drives).

  • Online shrink is not supported.

  • Resilvering (repairing) a crashed disk in a ZFS RAID can take a long time. This is not unique to ZFS; it applies to all types of RAID in one way or another. Very large volumes can take several days to repair or to return to full redundancy after severe data corruption or failure, and during this time a second disk may fail, especially as the repair puts additional stress on the system as a whole. Configurations that only allow recovery from a single disk failure, such as RAID Z1 (similar to RAID 5), should therefore be avoided; with large disks, one should use RAID Z2 (tolerating two failed disks) or RAID Z3 (tolerating three).[84] However, ZFS RAID differs from conventional RAID in that it reconstructs only live data and metadata when replacing a disk, not the entire disk including blank and garbage blocks. Replacing a member disk in a ZFS pool that is only partially full therefore takes proportionally less time than with conventional RAID.[70]
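The rules of thumb above can be turned into rough arithmetic. The following Python sketch is a simplified model, not a ZFS tool: it assumes each RAID-Z vdev delivers about one member disk's random IOPS, that streaming writes scale with the number of non-parity disks, and that resilver time scales with allocated data rather than disk size. The names (`RaidzVdev`, `random_iops`, `streaming_write_mbps`, `resilver_hours`) and all per-disk figures are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RaidzVdev:
    disks: int   # member disks in this RAID-Z group
    parity: int  # 1 for RAID-Z1, 2 for RAID-Z2, 3 for RAID-Z3


def random_iops(vdevs: list[RaidzVdev], disk_iops: float) -> float:
    # Rule of thumb: each RAID-Z vdev gives roughly one disk's random IOPS,
    # and the pool stripes I/O across its top-level vdevs.
    return len(vdevs) * disk_iops


def streaming_write_mbps(vdevs: list[RaidzVdev], disk_mbps: float) -> float:
    # Sequential writes scale with the data (non-parity) disks of every vdev.
    return sum(v.disks - v.parity for v in vdevs) * disk_mbps


def resilver_hours(allocated_tb: float, rebuild_mbps: float) -> float:
    # ZFS rebuilds only live data, so time depends on allocated terabytes
    # (1 TB = 1,000,000 MB), not on the raw size of the replaced disk.
    return allocated_tb * 1_000_000 / rebuild_mbps / 3600


one_wide = [RaidzVdev(disks=20, parity=2)]
two_narrow = [RaidzVdev(disks=10, parity=2), RaidzVdev(disks=10, parity=2)]
print(random_iops(one_wide, 150), random_iops(two_narrow, 150))  # 150 300
print(streaming_write_mbps(one_wide, 200), streaming_write_mbps(two_narrow, 200))  # 3600 3200
```

Under this model, splitting 20 disks into two RAID-Z2 vdevs doubles random IOPS at the cost of two more parity disks, while resilvering 4 TB of allocated data at an assumed 100 MB/s takes about 11 hours whether the replaced disk holds 4 TB or 16 TB.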

Other good practices

For ZFS to protect data against disk failure, it needs to be configured with redundant storage: either RAID-Z or mirroring (so all data is copied to at least two disks). If a single disk is used, redundant copies of the data should be enabled, which duplicates the data on the same logical drive. This is far less safe, since the data remains vulnerable to failure of the single disk. Still, ZFS copies are a useful feature on notebooks and desktop computers, since their disks are large and copies provide at least some limited redundancy with just a single drive.
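The trade-off between these layouts can be sketched as simple capacity arithmetic. This Python example is an illustrative model that ignores metadata, padding and allocation overhead; the function names are hypothetical, and "copies" refers to the ZFS dataset property of that name, which makes ZFS store each block N times, here all on one disk.

```python
from __future__ import annotations


def single_disk_copies(disk_tb: float, copies: int) -> tuple[float, int]:
    # copies=N writes every block N times on the same disk: usable space is
    # divided by N, which can help repair isolated bad blocks, but a
    # whole-disk failure still loses everything (0 failures tolerated).
    return disk_tb / copies, 0


def mirror(disk_tb: float, ways: int) -> tuple[float, int]:
    # An N-way mirror stores one disk's worth of data and survives N - 1
    # whole-disk failures.
    return disk_tb, ways - 1


def raidz(disk_tb: float, disks: int, parity: int) -> tuple[float, int]:
    # RAID-Z with P parity keeps (disks - P) disks of data and survives
    # P whole-disk failures.
    return disk_tb * (disks - parity), parity


for name, (usable, tolerated) in [
    ("1 disk, copies=2", single_disk_copies(4.0, 2)),
    ("2-way mirror", mirror(4.0, 2)),
    ("6-disk RAID-Z2", raidz(4.0, 6, 2)),
]:
    print(f"{name}: {usable:.1f} TB usable, survives {tolerated} disk failure(s)")
```

The single-disk case halves capacity without surviving any whole-disk failure, which is why the text calls it far less safe than mirroring or RAID-Z. Note that the copies property only affects data written after it is set.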

Data recovery

Historically, ZFS has not shipped with tools such as fsck to repair damaged file systems, because the file system itself was designed to self-repair, so long as it had been built with sufficient attention to the design of storage and redundancy of data. If the pool was compromised because of poor hardware, inadequate design or redundancy, or unfortunate mishap, to the point that ZFS was unable to mount the pool, traditionally there were no tools which allowed an end-user to attempt partial salvage of the stored data. This led to threads in online forums where ZFS developers sometimes tried to provide ad-hoc help to home and other small-scale users facing loss of data due to inadequate design or poor system management.[85]

Modern ZFS has improved considerably on this situation over time, and continues to do so:

  • Removal or abrupt failure of caching devices no longer causes pool loss. (At worst, loss of the ZIL may lose very recent transactions, but the ZIL does not usually store more than a few seconds' worth of recent transactions. Loss of the L2ARC cache does not affect data.)
  • If the pool is unmountable, modern versions of ZFS will attempt to identify the most recent consistent point to which the pool can be recovered, at the cost of losing some of the most recent changes to its contents. Copy-on-write means that older versions of data, including top-level records and metadata, may still exist even though they have been superseded; if so, the pool can be wound back to a consistent state based on them. The older the data, the more likely it is that at least some of its blocks have been overwritten and that some data will be irrecoverable, so there is a limit to how far the pool can be wound back.
  • Informally, tools exist to probe why ZFS is unable to mount a pool, and to guide the user or a developer through the manual changes required to force the pool to mount. These include using zdb (ZFS debug) to find a valid importable point in the pool, using dtrace or similar to identify the issue causing the mount failure, or manually bypassing the health checks that cause the mount process to abort, allowing the damaged pool to be mounted.
  • As of March 2018, a range of significantly enhanced methods are gradually being rolled out within OpenZFS. These include:[85]
      • Code refactoring, and more detailed diagnostic and debug information on mount failures, to simplify diagnosis and fixing of corrupt pool issues;
      • The ability to trust or distrust the stored pool configuration. This is particularly powerful, as it allows a pool to be mounted even when top-level vdevs are missing or faulty or top-level data is suspect, and also to rewind beyond a pool configuration change if that change was connected to the problem. Once the corrupt pool is mounted, readable files can be copied for safety, and it may turn out that data can be rebuilt even for missing vdevs by using copies stored elsewhere in the pool.
      • The ability to fix the situation where a disk needed by one pool was accidentally removed and added to a different pool, causing it to lose the metadata related to the first pool, which then becomes unreadable.
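The rewind idea described above can be illustrated with a toy model. In this Python sketch, each hypothetical `Uberblock` (ZFS's name for the root of a transaction group's block tree) simply lists the block ids it reaches; the search walks from the newest transaction group to older ones and returns the first whose tree contains no damaged or overwritten block, giving up beyond a rewind limit. It is not ZFS's actual import code (on real systems, this kind of recovery is requested with `zpool import -F`), and `find_rewind_point`, `bad_blocks` and `max_rewind` are names invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Uberblock:
    txg: int                # transaction group number (higher = newer)
    blocks: frozenset[str]  # ids of every block reachable from this root


def find_rewind_point(
    uberblocks: list[Uberblock], bad_blocks: set[str], max_rewind: int
) -> Uberblock | None:
    """Return the newest root whose whole tree is intact, or None."""
    if not uberblocks:
        return None
    newest = max(u.txg for u in uberblocks)
    for ub in sorted(uberblocks, key=lambda u: u.txg, reverse=True):
        if newest - ub.txg > max_rewind:
            break  # older roots are likely to reference reused blocks
        # Copy-on-write never overwrote this tree in place, so it is usable
        # unless one of its blocks was later damaged or reallocated.
        if not (ub.blocks & bad_blocks):
            return ub
    return None


history = [
    Uberblock(100, frozenset({"root100", "dir", "file-v3"})),
    Uberblock(99, frozenset({"root99", "dir", "file-v2"})),
    Uberblock(98, frozenset({"root98", "dir", "file-v1"})),
]
# The newest tree references a corrupt block, so the pool rewinds one txg.
print(find_rewind_point(history, {"file-v3"}, max_rewind=5).txg)  # 99
```

The rewind limit models the point made above: the further back the search goes, the more likely an old tree references blocks that have since been reused, so recovery gives up rather than return inconsistent data.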

Platforms

Solaris

Solaris 10 update 2 and later

ZFS is part of Sun's own Solaris operating system and is thus available on both SPARC and x86-based systems.

Solaris 11

After Oracle's Solaris 11 Express release, the OS/Net consolidation (the main OS code) was made proprietary and closed-source,[86] and further ZFS upgrades and implementations inside Solaris (such as encryption) are not compatible with other non-proprietary implementations which use previous versions of ZFS.

When creating a new ZFS pool, to retain the ability to access the pool from other non-proprietary Solaris-based distributions, it is recommended to upgrade to Solaris 11 Express from OpenSolaris (snv_134b) and thereby stay at ZFS version 28.

OpenSolaris

OpenSolaris 2008.05, 2008.11 and 2009.06 use ZFS as their default filesystem. There are over a dozen 3rd-party distributions, of which nearly a dozen are mentioned here. [196] (OpenIndiana and illumos are two new distributions not included on the OpenSolaris distribution reference page.)

OpenIndiana

OpenIndiana uses OpenZFS with feature flags as implemented in Illumos. It used ZFS pool version 28 up to version 151a3.[87]

By upgrading from OpenSolaris snv_134 to both OpenIndiana and Solaris 11 Express, one also has the ability to upgrade and separately boot Solaris 11 Express on the same ZFS pool, but one should not install Solaris 11 Express first because of ZFS incompatibilities introduced by Oracle past ZFS version 28.[88]

BSD

macOS

OpenZFS on OSX (abbreviated to O3X) is an implementation of ZFS for macOS.[89] O3X is under active development, with close relation to ZFS on Linux and illumos' ZFS implementation, while maintaining feature flag compatibility with ZFS on Linux. O3X implements zpool version 5000, and includes the Solaris Porting Layer (SPL) originally written for MacZFS, which has been further enhanced to include a memory management layer based on the illumos kmem and vmem allocators. O3X is fully featured, supporting LZ4 compression, deduplication, ARC, L2ARC, and SLOG.

MacZFS is free software providing support for ZFS on macOS. The stable legacy branch provides up to ZFS pool version 8 and ZFS filesystem version 2. The development branch, based on ZFS on Linux and OpenZFS, provides updated ZFS functionality, such as up to ZFS zpool version 5000 and feature flags.[90][91]

A proprietary implementation of ZFS (Zevo) was available at no cost from GreenBytes, Inc., implementing up to ZFS file system version 5 and ZFS pool version 28.[92] Zevo offered a limited ZFS feature set, pending further commercial development; GreenBytes was sold to Oracle in 2014, and Zevo's future is unknown.

DragonFlyBSD

Edward O'Callaghan started the initial port of ZFS to DragonFlyBSD.[93]

NetBSD

The NetBSD ZFS port was started as a part of the 2007 Google Summer of Code and in August 2009, the code was merged into NetBSD's source tree.[94]

FreeBSD

Paweł Jakub Dawidek ported ZFS to FreeBSD, and it has been part of FreeBSD since version 7.0.[95] This includes zfsboot, which allows booting FreeBSD directly from a ZFS volume.[96][97]

FreeBSD's ZFS implementation is fully functional; the only missing features are kernel CIFS server and iSCSI, but the latter can be added using externally available packages.[98] Samba can be used to provide a userspace CIFS server.

FreeBSD 7-STABLE (the branch to which updates to the 7.x series are committed) uses zpool version 6.

FreeBSD 8 includes a much-updated implementation of ZFS, and zpool version 13 is supported.[99] zpool version 14 support was added to the 8-STABLE branch on January 11, 2010,[100] and is included in FreeBSD release 8.1. zpool version 15 is supported in release 8.2.[101] The 8-STABLE branch gained support for zpool version 28 and zfs version 5 in early June 2011.[102] These changes were released mid-April 2012 with FreeBSD 8.3.[103]

FreeBSD 9.0-RELEASE uses ZFS Pool version 28.[104][105]

FreeBSD 9.2-RELEASE is the first FreeBSD version to use the new implementation based on "feature flags", and thus pool version 5000.[106]

MidnightBSD

MidnightBSD, a desktop operating system derived from FreeBSD, supports ZFS storage pool version 6 as of 0.3-RELEASE. This was derived from code included in FreeBSD 7.0-RELEASE. An update to storage pool 28 is in progress in 0.4-CURRENT and based on 9-STABLE sources around FreeBSD 9.1-RELEASE code.

TrueOS (formerly PC-BSD)

TrueOS (formerly known as PC-BSD) is a desktop-oriented distribution of FreeBSD, which inherits its ZFS support.

FreeNAS

FreeNAS, an embedded open source network-attached storage (NAS) distribution based on FreeBSD, has the same ZFS support as FreeBSD and PC-BSD.

ZFS Guru

ZFS Guru is an embedded open source network-attached storage (NAS) distribution based on FreeBSD.[107]

pfSense

pfSense, an open source BSD based router, supports ZFS, including installation and booting to ZFS pools, as of version 2.4.

NAS4Free

NAS4Free, an embedded open source network-attached storage (NAS) distribution based on FreeBSD, has the same ZFS support as FreeBSD, ZFS storage pool version 5000. This project is a continuation of FreeNAS 7 series project.[108]

Debian GNU/kFreeBSD

Being based on the FreeBSD kernel, Debian GNU/kFreeBSD has ZFS support from the kernel. However, additional userland tools are required.[109] It is possible to use ZFS as the root or /boot file system,[110] in which case the required GRUB configuration has been performed by the Debian installer since the Wheezy release.[111]

As of 31 January 2013, the ZPool version available is 14 for the Squeeze release, and 28 for the Wheezy-9 release.[112]

Linux

Although the ZFS filesystem supports Linux-based operating systems, difficulties arise for Linux distribution maintainers wishing to provide native support for ZFS in their products due to potential legal incompatibilities between the CDDL license used by the ZFS code, and the GPL license used by the Linux kernel. To enable ZFS support within Linux, a loadable kernel module containing the CDDL-licensed ZFS code must be compiled and loaded into the kernel. According to the Free Software Foundation, the wording of the GPL license legally prohibits redistribution of the resulting product as a derivative work,[113][114] though this viewpoint has caused some controversy.[115][116]

ZFS on FUSE

One potential workaround to licensing incompatibility was trialed in 2006, with an experimental port of the ZFS code to Linux's FUSE system. The filesystem ran entirely in userspace instead of being integrated into the Linux kernel, and was therefore not considered a derivative work of the kernel. This approach was functional, but suffered from significant performance penalties when compared with integrating the filesystem as a native kernel module running in kernel space.[117] As of 2016, the ZFS on FUSE project appears to be defunct.

Native ZFS on Linux

A native port of ZFS for Linux produced by the Lawrence Livermore National Laboratory (LLNL) was released in March 2013,[118][119] following these key events:[120]

  • 2008: prototype to determine viability

  • 2009: initial ZVOL and Lustre support

  • 2010: development moved to GitHub

  • 2011: POSIX layer added

  • 2011: community of early adopters

  • 2012: production usage of ZFS

  • 2013: stable GA release

As of August 2014, ZFS on Linux uses the OpenZFS pool version number 5000, which indicates that the features it supports are defined via feature flags. This pool version is an unchanging number that is expected to never conflict with version numbers given by Oracle.[121]

KQ InfoTech

Another native port for Linux was developed by KQ InfoTech in 2010.[122][123] This port used the zvol implementation from the Lawrence Livermore National Laboratory as a starting point. A release supporting zpool v28 was announced in January 2011.[124] In April 2011, KQ Infotech was acquired by sTec, Inc., and their work on ZFS ceased.[125] Source code of this port can be found on GitHub.[126]

The work of KQ InfoTech was ultimately integrated into the LLNL's native port of ZFS for Linux.[125]

Source code distribution

While the license incompatibility may arise with the distribution of compiled binaries containing ZFS code, it is generally agreed that distribution of the source code itself is not affected by this. In Gentoo, configuring a ZFS root filesystem is well documented and the required packages can be installed from its package repository.[127] Slackware also provides documentation on supporting ZFS, both as a kernel module[128] and when built into the kernel.[129]

Ubuntu integration

The question of the CDDL license's compatibility with the GPL license resurfaced in 2015, when the Linux distribution Ubuntu announced that it intended to make precompiled OpenZFS binary kernel modules available to end-users directly from the distribution's official package repositories.[130] In 2016, Ubuntu announced that a legal review resulted in the conclusion that providing support for ZFS via a binary kernel module was not in violation of the provisions of the GPL license.[131] Others,[132] such as the Software Freedom Law Center[133] followed Ubuntu's conclusion, while the FSF and SFC reiterated their opposing view.[134][135]

Ubuntu 16.04 LTS ("Xenial Xerus"), released on April 21, 2016, allows the user to install the OpenZFS binary packages directly from the Ubuntu software repositories.[136][137][138][139] As of 2019, no legal challenge has been brought against Canonical regarding the distribution of these packages.

As of 2019, Ubuntu does not support installing ZFS as the root filesystem; only MAAS allows ZFS to be installed as a root filesystem.[140]

Microsoft Windows

A port of open source ZFS was attempted in 2010, but after a hiatus of over one year, development ceased in 2012.[141] In October 2017, a new port of OpenZFS was announced by Jörgen Lundman at the OpenZFS Developer Summit.[142][143]

List of operating systems supporting ZFS

List of operating systems, distributions and add-ons that support ZFS, the zpool version each supports, and the Solaris build it is based on (if any):

OS | Zpool version | Sun/Oracle build # | Comments
--- | --- | --- | ---
Oracle Solaris 11.3 | 37 | 0.5.11-0.175.3.1.0.5.0 |
Oracle Solaris 10 1/13 (U11) | 32 | |
Oracle Solaris 11.2 | 35 | 0.5.11-0.175.2.0.0.42.0 |
Oracle Solaris 11 2011.11 | 34 | b175 |
Oracle Solaris Express 11 2010.11 | 31 | b151a | licensed for testing only
OpenSolaris 2009.06 | 14 | b111b |
OpenSolaris (last dev) | 22 | b134 |
OpenIndiana | 5000 | b147 | distribution based on illumos; creates a name clash naming their build code 'b151a'
Nexenta Core 3.0.1 | 26 | b134+ | GNU userland
NexentaStor Community 3.0.1 | 26 | b134+ | up to 18 TB, web admin
NexentaStor Community 3.1.0 | 28 | b134+ | GNU userland
NexentaStor Community 4.0 | 5000 | b134+ | up to 18 TB, web admin
NexentaStor Enterprise | 28 | b134+ | not free, web admin
GNU/kFreeBSD "Squeeze" (as of 1/31/2013) | 14 | | Requires package "zfsutils"
GNU/kFreeBSD "Wheezy-9" (as of 2/21/2013) | 28 | | Requires package "zfsutils"
FreeBSD | 5000 | |
zfs-fuse 0.7.2 | 23 | | suffered from performance issues; defunct
ZFS on Linux 0.6.5.8 | 5000 | | 0.6.0 release candidate has POSIX layer
KQ Infotech's ZFS on Linux | 28 | | defunct; code integrated into LLNL-supported ZFS on Linux
BeleniX 0.8b1 | 14 | b111 | small-size live-CD distribution; once based on OpenSolaris
Schillix 0.7.2 | 28 | b147 | small-size live-CD distribution; as SchilliX-ON [197] 0.8.0 based on OpenSolaris
StormOS "hail" | | | distribution once based on Nexenta Core 2.0+, Debian Linux; superseded by Dyson OS
Jaris | | | Japanese Solaris distribution; once based on OpenSolaris
MilaX 0.5 | 20 | b128a | small-size live-CD distribution; once based on OpenSolaris
FreeNAS 8.0.2 / 8.2 | 15 | |
FreeNAS 8.3.0 | 28 | | based on FreeBSD 8.3
FreeNAS 9.1.0 | 5000 | | based on FreeBSD 9.1
NAS4Free 10.2.0.2/10.3.0.3 | 5000 | | based on FreeBSD 10.2/10.3
Korona 4.5.0 | 22 | b134 | KDE
EON NAS (v0.6) | 22 | b130 | embedded NAS
EON NAS (v1.0beta) | 28 | b151a | embedded NAS
napp-it [198] | 28/5000 | Illumos/Solaris | Storage appliance; OpenIndiana (Hipster), OmniOS, Solaris 11, Linux (ZFS management)
OmniOS CE [199] | 28/5000 | illumos-OmniOS branch | minimal stable/LTS storage server distribution based on Illumos, community driven
SmartOS | 28/5000 | Illumos b151+ | minimal live distribution based on Illumos (USB/CD boot); cloud and hypervisor use (KVM)
macOS 10.5, 10.6, 10.7, 10.8, 10.9 | 5000 | | via MacZFS; superseded [200] by OpenZFS on OS X [201]
macOS 10.6, 10.7, 10.8 | 28 | | via ZEVO; superseded [202] by OpenZFS on OS X [203]
NetBSD | 22 | |
MidnightBSD | 6 | |
Ubuntu Linux 16.04 LTS, 18.04 LTS, 18.10 | 5000 | | native support via installable binary module [204], wiki.ubuntu.com/ZFS [205]
ZFSGuru 10.1.100 | 5000 | |

History

Development history

Original development

ZFS was designed and implemented by a team at Sun led by Jeff Bonwick, Bill Moore[144] and Matthew Ahrens. It was announced on September 14, 2004,[145] but development started in 2001.[146] Source code for ZFS was integrated into the main trunk of Solaris development on October 31, 2005,[50] and released as part of build 27 of OpenSolaris on November 16, 2005. Sun announced that ZFS was included in the 6/06 update to Solaris 10 in June 2006, one year after the opening of the OpenSolaris community.[147]

The name at one point was said to stand for "Zettabyte File System",[148] but by 2006 was no longer considered to be an abbreviation.[5] A ZFS file system can store up to 256 quadrillion zettabytes (ZB).
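The capacity figure can be checked with a short calculation, assuming it derives from ZFS's 128-bit addressing (a maximum of 2^128 bytes) and that both "quadrillion" and "zettabyte" are read in binary units, that is 2^50 and 1 ZiB = 2^70 bytes:

```latex
2^{128}\ \text{bytes} = 2^{8} \cdot 2^{50} \cdot 2^{70}\ \text{bytes} = 256 \cdot 2^{50}\ \text{ZiB} = 2^{58}\ \text{ZiB} \approx 2.88 \times 10^{17}\ \text{ZiB}
```

In decimal units the same 2^128 bytes (about 3.40 × 10^38 bytes) is roughly 340 quadrillion zettabytes, so the quoted figure is best read as an order of magnitude.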

In September 2007, NetApp sued Sun claiming that ZFS infringed some of NetApp's patents on Write Anywhere File Layout. Sun counter-sued in October the same year claiming the opposite. The lawsuits were ended in 2010 with an undisclosed settlement.[149]

Open source

The following is a list of events in the development of open-source ZFS implementations:[120][150]

  • 2005: Source code was released as part of OpenSolaris.

  • 2006: Development of a FUSE port for Linux started.

  • 2007: Apple started porting ZFS to Mac OS X.

  • 2008: A port to FreeBSD was released as part of FreeBSD 7.0.

  • 2008: Development of a native Linux port started.

  • 2009: Apple's ZFS project closed. The MacZFS project continued to develop the code.

  • 2010: OpenSolaris was discontinued. Further development of ZFS on Solaris was no longer open source.

  • 2010: illumos was founded as an open source successor,[151] and continued to develop ZFS in the open. Ports of ZFS to other platforms continued porting upstream changes from illumos.

  • 2013: The OpenZFS project begins, aiming at coordinated open-source development of ZFS. The OpenZFS project provides a common foundation for any interested groups and organizations to contribute and collaborate towards a common open source ZFS core, and in addition, to also maintain any specific code and validation processes needed for core ZFS code to work with their own individual systems.

Apple/MacOS

The first indication of Apple Inc.'s interest in ZFS was an April 2006 post on the opensolaris.org zfs-discuss mailing list, where an Apple employee mentioned being interested in porting ZFS to their Mac OS X operating system.[152] In the release version of Mac OS X 10.5, ZFS was available in read-only mode from the command line, without the ability to create zpools or write to them.[153] Before the 10.5 release, Apple released the "ZFS Beta Seed v1.1", which allowed read-write access and the creation of zpools;[154] however, the installer for the "ZFS Beta Seed v1.1" has been reported to work only on version 10.5.0, and has not been updated for version 10.5.1 and above.[155] In August 2007, Apple opened a ZFS project on their Mac OS Forge web site. On that site, Apple provided the source code and binaries of their port of ZFS, which includes read-write access, but there was no installer available[156] until a third-party developer created one.[157] In October 2009, Apple announced a shutdown of the ZFS project on Mac OS Forge; its own hosting of and involvement in ZFS was summarily discontinued. No explanation was given, just the following statement: "The ZFS project has been discontinued. The mailing list and repository will also be removed shortly." Apple would eventually release the legally required, CDDL-derived portion of the source code of its final public beta of ZFS, code named "10a286". Complete ZFS support was once advertised as a feature of Snow Leopard Server (Mac OS X Server 10.6).[158] However, by the time the operating system was released, all references to this feature had been silently removed from its features page.[159] Apple has not commented regarding the omission.

Apple's "10a286" source code release, and versions of the previously released source and binaries, have been preserved, and new development has been adopted by a group of enthusiasts.[160][161] The MacZFS project[162] acted quickly to mirror the public archives of Apple's project before the materials would have disappeared from the internet, and then to resume its development elsewhere. The MacZFS community has curated and matured the project, supporting ZFS for all Mac OS releases since 10.5. The project has an active mailing list.[206] As of July 2012, MacZFS implements zpool version 8 and ZFS version 2, from the October 2008 release of Solaris. Additional historical information and commentary can be found on the MacZFS web site and FAQ.[163]

The 17 September 2013 launch of OpenZFS included ZFS-OSX, which was to become a new version of MacZFS, as the distribution for Darwin.[164]

Commercial and open source products

  • 2008: Sun shipped a line of ZFS-based 7000-series storage appliances.[165]

  • 2013: Oracle shipped ZS3 series of ZFS-based filers and seized first place in the SPC-2 benchmark with one of them.[166]

  • 2013: iXsystems ships ZFS-based NAS devices called FreeNAS for SOHO and TrueNAS for the enterprise.[167]

  • 2014: Netgear ships a line of ZFS-based NAS devices called ReadyDATA, designed to be used in the enterprise.[168]

  • 2015: rsync.net announces a cloud storage platform that allows customers to provision their own zpool and import and export data using zfs send and zfs receive.[169][170]

Detailed release history

In Oracle Solaris ZFS, as new features are introduced, the version numbers of the pool and file system are incremented to designate the format and features available. Features that are available in specific file system versions require a specific pool version.[171][172]

Distributed development of OpenZFS involves feature flags[90] and pool version 5000, an unchanging number that is expected to never conflict with version numbers given by Oracle. Legacy version numbers still exist for pool versions 1–28, implied by version 5000.[173] Illumos uses pool version 5000 for this purpose.[174][175] Future on-disk format changes are enabled or disabled independently via feature flags.
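The versioning scheme described above can be summarized as a compatibility check. This Python sketch is a simplification under stated assumptions: it treats numbered versions up to 28 as readable by open implementations, Oracle's later numbers as unreadable by them, and version 5000 as importable only when every active feature flag is supported. Real OpenZFS also allows some "read-only compatible" features to be imported read-only, which this sketch ignores; the function name is hypothetical and the feature names in the example are illustrative.

```python
from __future__ import annotations

LAST_SHARED_VERSION = 28      # last numbered pool version common to Oracle and open ZFS
FEATURE_FLAGS_VERSION = 5000  # open ZFS: "consult the pool's feature flags"


def open_zfs_can_import(
    pool_version: int, active_features: set[str], supported_features: set[str]
) -> bool:
    if 1 <= pool_version <= LAST_SHARED_VERSION:
        # Legacy numbered formats are understood by open implementations.
        return True
    if pool_version == FEATURE_FLAGS_VERSION:
        # Importable only if every feature in use is known to this system.
        return active_features <= supported_features
    # Any other number, such as Oracle-only versions 29 and later,
    # denotes a format that open implementations cannot read.
    return False


supported = {"async_destroy", "lz4_compress"}
print(open_zfs_can_import(28, set(), supported))               # True
print(open_zfs_can_import(37, set(), supported))               # False
print(open_zfs_can_import(5000, {"lz4_compress"}, supported))  # True
```

Because 5000 never changes, the decision for feature-flag pools rests entirely on the set comparison, which is what lets independent implementations add on-disk changes without colliding with Oracle's numbering.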

ZFS filesystem version | Release | Significant changes
--- | --- | ---
1 | OpenSolaris Nevada[176] build 36 | First release
2 | OpenSolaris Nevada b69 | Enhanced directory entries. In particular, directory entries now store the object type. For example, file, directory, named pipe, and so on, in addition to the object number.
3 | OpenSolaris Nevada b77 | Support for sharing ZFS file systems over SMB. Case insensitivity support. System attribute support. Integrated anti-virus support.
4 | OpenSolaris Nevada b114 | Properties: userquota, groupquota, userused and groupused
5 | OpenSolaris Nevada b137 | System attributes; symlinks now their own object type
6 | Solaris 11.1 | Multilevel file system support
ZFS pool version | Release | Significant changes
--- | --- | ---
1 | OpenSolaris Nevada[176] b36 | First release
2 | OpenSolaris Nevada b38 | Ditto Blocks
3 | OpenSolaris Nevada b42 | Hot spares, double-parity RAID-Z (raidz2), improved RAID-Z accounting
4 | OpenSolaris Nevada b62 | zpool history
5 | OpenSolaris Nevada b62 | gzip compression for ZFS datasets
6 | OpenSolaris Nevada b62 | "bootfs" pool property
7 | OpenSolaris Nevada b68 | ZIL: adds the capability to specify a separate Intent Log device or devices
8 | OpenSolaris Nevada b69 | ability to delegate zfs(1M) administrative tasks to ordinary users
9 | OpenSolaris Nevada b77 | CIFS server support, dataset quotas
10 | OpenSolaris Nevada b77 | Devices can be added to a storage pool as "cache devices"
11 | OpenSolaris Nevada b94 | Improved zpool scrub / resilver performance
12 | OpenSolaris Nevada b96 | Snapshot properties
13 | OpenSolaris Nevada b98 | Properties: usedbysnapshots, usedbychildren, usedbyrefreservation, and usedbydataset
14 | OpenSolaris Nevada b103 | passthrough-x aclinherit property support
15 | OpenSolaris Nevada b114 | Properties: userquota, groupquota, userused and groupused; also required FS v4
16 | OpenSolaris Nevada b116 | STMF property support
17 | OpenSolaris Nevada b120 | triple-parity RAID-Z
18 | OpenSolaris Nevada b121 | ZFS snapshot holds
19 | OpenSolaris Nevada b125 | ZFS log device removal
20 | OpenSolaris Nevada b128 | zle compression algorithm that is needed to support the ZFS deduplication properties in ZFS pool version 21, which were released concurrently
21 | OpenSolaris Nevada b128 | Deduplication
22 | OpenSolaris Nevada b128 | zfs receive properties
23 | OpenSolaris Nevada b135 | slim ZIL
24 | OpenSolaris Nevada b137 | System attributes. Symlinks now their own object type. Also requires FS v5.
25 | OpenSolaris Nevada b140 | Improved pool scrubbing and resilvering statistics
26 | OpenSolaris Nevada b141 | Improved snapshot deletion performance
27 | OpenSolaris Nevada b145 | Improved snapshot creation performance (particularly recursive snapshots)
28 | OpenSolaris Nevada b147 | Multiple virtual device replacements
29 | Solaris Nevada b148 | RAID-Z/mirror hybrid allocator
30 | Solaris Nevada b149 | ZFS encryption
31 | Solaris Nevada b150 | Improved 'zfs list' performance
32 | Solaris Nevada b151 | One MB block support
33 | Solaris Nevada b163 | Improved share support
34 | Solaris 11.1 (0.5.11-0.175.1.0.0.24.2) | Sharing with inheritance
35 | Solaris 11.2 (0.5.11-0.175.2.0.0.42.0) | Sequential resilver
36 | Solaris 11.3 | Efficient log block allocation
37 | Solaris 11.3 | LZ4 compression
38 | Solaris 11.4 | xcopy with encryption
39 | Solaris 11.4 | reduce resilver restart
40 | Solaris 11.4 | Deduplication 2
41 | Solaris 11.4 | Asynchronous dataset destroy
42 | Solaris 11.4 | Reguid: ability to change the pool guid
43 | Solaris 11.4, Oracle ZFS Storage Simulator 8.7[177] | RAID-Z improvements and cloud device support.[178]
44 | Solaris 11.4[179] | Device removal
45 | Solaris 11.4 SRU 11[180] | Lazy deadlists
46 | Solaris 11.4 SRU 12[181] | Compact file metadata for encryption
5000 | OpenZFS | Unchanging pool version to signify that the pool indicates new features after pool version 28 using ZFS feature flags rather than by incrementing the pool version

Note: The Solaris version under development by Sun since the release of Solaris 10 in 2005 was codenamed 'Nevada', and was derived from what was the OpenSolaris codebase. 'Solaris Nevada' is the codename for the next-generation Solaris OS intended to eventually succeed Solaris 10, and this new code was then pulled successively into new OpenSolaris 'Nevada' snapshot builds.[176] OpenSolaris is now discontinued, and OpenIndiana forked from it.[182][183] A final build (b134) of OpenSolaris was published by Oracle (2010-Nov-12) as an upgrade path to Solaris 11 Express.

See also

  • APFS – for Apple operating systems

  • Btrfs – for Linux

  • Comparison of file systems

  • ext4

  • HAMMER – a file system with a similar feature set for DragonFly BSD

  • LFS – BSD Log Structured Filesystem

  • List of file systems

  • LVM – Logical Volume Manager (Linux), supports snapshots

  • LZJB – data compression algorithm used in ZFS

  • NILFS – a Linux file system with checksumming (but not scrubbing), also supporting snapshots

  • ReFS – a Microsoft file system with built-in resiliency features

  • Reiser4

  • Sun Open Storage

  • Veritas File System and Veritas Volume Manager – similar to ZFS

  • Versioning file systems – List of versioning file systems

  • Write Anywhere File Layout – a similar file system by NetApp

References

[1] While RAID 7 is not a standard RAID level, it has been proposed as a catch-all term for any >3 parity RAID configuration. Leventhal, Adam (December 17, 2009). "Triple-Parity RAID and Beyond". Queue. 7 (11): 30. doi:10.1145/1661785.1670144. Retrieved April 12, 2019.
[2] "What Is ZFS?". Oracle Solaris ZFS Administration Guide. Oracle. Retrieved December 29, 2015.
[3] "What's new in Solaris 11 Express 2010.11" (PDF). Oracle. Retrieved November 17, 2010.
[4] "1.1 What about the licensing issue?". Retrieved November 18, 2010.
[5] Jeff Bonwick (May 3, 2006). "You say zeta, I say zetta". Jeff Bonwick's Blog. Archived from the original on February 23, 2017. Retrieved April 21, 2017. So we finally decided to unpimp the name back to ZFS, which doesn't stand for anything.
[6] "The Birth of ZFS". OpenZFS. Retrieved October 21, 2015.
[7] "Sun's ZFS Creator to Quit Oracle and Join Startup". eWeek. Retrieved September 29, 2010.
[8] "Status Information for Serial Number 85901629 (ZFS)". United States Patent and Trademark Office. Retrieved October 21, 2013.
[9] "The OpenZFS project launches". LWN.net. September 17, 2013. Retrieved October 1, 2013.
[10] "OpenZFS Announcement". OpenZFS. September 17, 2013. Retrieved September 19, 2013.
[11] open-zfs.org/History "OpenZFS is the truly open source successor to the ZFS project [...] Effects of the fork (2010 to date)"
[12] Sean Michael Kerner (September 18, 2013). "LinuxCon: OpenZFS moves Open Source Storage Forward". infostor.com. Retrieved October 9, 2013.
[13] "The OpenZFS project launches". LWN.net. September 17, 2013. Retrieved October 1, 2013.
[14] "OpenZFS – Communities co-operating on ZFS code and features". freebsdnews.net. September 23, 2013. Retrieved March 14, 2014.
[15] iXsystems. "Open ZFS vs. Btrfs | and other file systems". Blog, 4 August 2017.
[16] "19.4. zfs Administration". www.freebsd.org.
[17] "ZFS Raidz Performance, Capacity and Integrity Comparison @ Calomel.org". calomel.org.
[18] "Explanation of ARC and L2ARC". ZFS Build. April 15, 2010. Archived from the original on February 5, 2019. Retrieved February 5, 2019.
[19] Gonzalez, Constantin (July 27, 2011). "ZFS: To Dedupe or not to Dedupe..." Constant Thinking. Archived from the original on February 5, 2019. Retrieved February 5, 2019.
[20] zfs-discuss list post by Jeff Bonwick, 2008-10-10.