ZFS - enable or disable the disk cache?

Question 1

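You should absolutely enable the disk cache.

The rationale is that ZFS assumes the disk cache is enabled, and it flushes any critical write (i.e. sync writes and uberblock rewrites) through the appropriate SATA/SAS commands (ATA FLUSH, FUA, etc.).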
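To make "sync write" concrete from the application side: it is an ordinary write followed by fsync(), and only once fsync() returns may the program treat the data as durable. Below is a minimal sketch in C; the helper name durable_write is made up for illustration and is not part of ZFS or of any library.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the string `buf` to `path` and return 0 only
 * after fsync() has reported success. Returns -1 on any failure.
 */
static int durable_write(const char *path, const char *buf)
{
    size_t len = strlen(buf);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    /* A short write is treated as a failure to keep the sketch small. */
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling durable_write("/tmp/example.txt", "hello\n") returns 0 once the filesystem has acknowledged the fsync(). Whether the bytes are really on stable media at that point still depends on the disk honoring the flush.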

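Leaving the disk cache enabled lets you benefit from the write-combining capability of modern disks without affecting the pool's reliability.

This obviously assumes that your disks really honor cache flush commands, which is the norm for modern (post-2006) disks. If your disks lie about cache flushes, then you should disable the cache.

As additional information, I suggest reading the description of the zfs_nocacheflush tunable:

ZFS uses barriers (volatile cache flush commands) to ensure data is committed to permanent media by devices. This ensures consistent on-media state for devices where caches are volatile (e.g. HDDs).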
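If you want to check what the kernel currently believes about a disk's cache, recent Linux kernels expose it in /sys/block/<disk>/queue/write_cache, which reads "write back" or "write through". The sketch below assumes that sysfs file is present (Linux only); the enum and function names are my own, not a library API.

```c
#include <stdio.h>
#include <string.h>

/* Result of the lookup; names are specific to this sketch. */
enum wc_mode { WC_UNKNOWN = -1, WC_WRITE_THROUGH = 0, WC_WRITE_BACK = 1 };

/* Classify the text read from a queue/write_cache file. */
static enum wc_mode parse_write_cache(const char *text)
{
    if (strncmp(text, "write back", 10) == 0)
        return WC_WRITE_BACK;
    if (strncmp(text, "write through", 13) == 0)
        return WC_WRITE_THROUGH;
    return WC_UNKNOWN;
}

/* Read /sys/block/<disk>/queue/write_cache for a disk name such as "sda". */
static enum wc_mode disk_write_cache(const char *disk)
{
    char path[256];
    char buf[64] = "";
    FILE *f;

    snprintf(path, sizeof path, "/sys/block/%s/queue/write_cache", disk);
    f = fopen(path, "r");
    if (!f)
        return WC_UNKNOWN;      /* no such disk, or attribute not available */
    if (!fgets(buf, sizeof buf, f))
        buf[0] = '\0';
    fclose(f);
    return parse_write_cache(buf);
}
```

disk_write_cache("sda") returning WC_WRITE_BACK means the kernel treats the drive cache as volatile and will send flushes to it. It says nothing about whether the drive firmware actually obeys them.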

Question 2

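You can do that if you want to. It won't make a big difference. ZFS uses part of RAM as a write cache and flushes it to disk periodically.

With 4 disks this sounds like a small installation, so benchmark both configurations and see whether there is any noticeable benefit.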
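On OpenZFS for Linux, how often that in-RAM write cache is flushed is governed by the zfs_txg_timeout module parameter (in seconds), which is readable under /sys/module/zfs/parameters/. The original answer does not mention this parameter, so take the following as a rough sketch only; the helper names are hypothetical, and the same helper can read zfs_nocacheflush as well.

```c
#include <stdio.h>

/* Read one integer from a text file; returns 0 on success, -1 on failure. */
static int read_long_from_file(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Look up an OpenZFS module parameter by name (Linux only). */
static int read_zfs_param(const char *name, long *value)
{
    char path[256];

    snprintf(path, sizeof path, "/sys/module/zfs/parameters/%s", name);
    return read_long_from_file(path, value);
}
```

For example, read_zfs_param("zfs_txg_timeout", &secs) fills secs when the zfs module is loaded and returns -1 otherwise. Recording these values alongside each benchmark run makes the two configurations easier to compare.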

Question 3

I'm inclined to agree, although my setup may not be optimal. My pool:

zdata                           2.82T   822G     73    412  40.0M  46.1M
  raidz1-0                      2.82T   822G     73    412  40.0M  46.1M
    wwn-0x50014ee0019b83a6          -      -     16    106  10.0M  11.5M
    wwn-0x50014ee2b3f6d328          -      -     20    102  10.0M  11.5M
    wwn-0x50014ee25ea101ef          -      -     18    105  10.0M  11.5M
    wwn-0x50014ee057084591          -      -     16     97  9.94M  11.5M
logs                                -      -      -      -      -      -
  wwn-0x50000f0056424431-part5   132K   112M      0      0      0      0
cache                               -      -      -      -      -      -
  wwn-0x50000f0056424431-part4  30.7G   270M      0      5  2.45K   517K
------------------------------  -----  -----  -----  -----  -----  -----

Rationale: this is a dedicated NAS running Arch Linux with a Promise SATA2 controller. Since the Samsung SSD that holds Arch still had plenty of free space, I decided to use it as a log and cache device and added it to the ZFS pool. Given that the Promise controller is only a PCI device, I expected the log and cache on the SSD to improve performance. In day-to-day use I have not seen any performance gain.

Question 4

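To improve random write IOPS, we should enable the drives' write-back cache (on spinning disks).
This feature needs software support, for example ZFS. It does not apply to ext4 or XFS.
For ZFS on Linux / Solaris / FreeBSD, the ZFS community recommends attaching disks directly over SAS/SATA, or through an I/O expander over a SCSI fabric (SAS / Fibre Channel).
Hardware RAID adapters disable all drive write caches by default in RAID mode, but they can run in JBOD mode for all devices with the drive write caches enabled.

On Linux, directly attached devices produce log lines like the following; the exact content depends on the vendor's SAS/SATA device firmware.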

[sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
or
[sdcl] Write cache: enabled, read cache: disabled, supports DPO and FUA
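If you have many disks, it can be handy to pull these three facts out of the log automatically. The following is a small sketch in C that parses lines of the shape shown above; the struct and function names are invented for this example, and it only recognizes the exact wording of these two sample lines.

```c
#include <stdbool.h>
#include <string.h>

/* What one "Write cache:" log line tells us (names are for this sketch). */
struct sd_cache_info {
    bool write_cache;   /* "Write cache: enabled" */
    bool read_cache;    /* "read cache: enabled"  */
    bool fua;           /* "supports DPO and FUA" */
};

/* Returns true if `line` looks like an sd cache message and fills `out`. */
static bool parse_sd_cache_line(const char *line, struct sd_cache_info *out)
{
    static const char wc_key[] = "Write cache: ";
    static const char rc_key[] = "read cache: ";
    const char *wc = strstr(line, wc_key);
    const char *rc = strstr(line, rc_key);

    if (!wc || !rc)
        return false;

    out->write_cache = strncmp(wc + strlen(wc_key), "enabled", 7) == 0;
    out->read_cache = strncmp(rc + strlen(rc_key), "enabled", 7) == 0;
    out->fua = strstr(line, "supports DPO and FUA") != NULL;
    return true;
}
```

Feeding it the second sample line yields write_cache = true, read_cache = false, fua = true.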

From writeback_cache_control.txt in the Linux kernel documentation:

but it means the operating
system needs to force data out to the non-volatile storage when it performs
a data integrity operation like fsync, sync or an unmount

Forced Unit Access
-----------------

The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the
filesystem and will make sure that I/O completion for this request is only
signaled after the data has been committed to non-volatile storage.

This is the Linux documentation in blk-flush.c:

 * If the device has writeback cache and supports FUA, REQ_PREFLUSH is
 * translated to PREFLUSH but REQ_FUA is passed down directly with DATA.
 *
 * If the device has writeback cache and doesn't support FUA, REQ_PREFLUSH
 * is translated to PREFLUSH and REQ_FUA to POSTFLUSH.
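The two rules quoted above can be written down as a tiny decision function. This is a simplified userspace model, not the kernel code: the flag and step names are hypothetical, and the no-write-back-cache case (flush flags make no difference and the request runs as a normal write) is taken from my reading of the rest of that same comment.

```c
#include <stdbool.h>

/* Request flags and resulting steps; hypothetical names for this model. */
enum { RQ_PREFLUSH = 1 << 0, RQ_FUA = 1 << 1 };
enum { STEP_PREFLUSH = 1 << 0, STEP_DATA = 1 << 1, STEP_POSTFLUSH = 1 << 2 };

/*
 * Decide which steps a write request needs, given the device capabilities.
 *   has_wc:   device has a volatile write-back cache
 *   has_fua:  device supports FUA writes natively
 *   has_data: request carries data (an empty flush does not)
 */
static unsigned flush_steps(bool has_wc, bool has_fua,
                            unsigned rq_flags, bool has_data)
{
    unsigned steps = 0;

    if (has_data)
        steps |= STEP_DATA;

    if (has_wc) {
        /* Flush the cache before the data when the request asks for it. */
        if (rq_flags & RQ_PREFLUSH)
            steps |= STEP_PREFLUSH;
        /* No native FUA: emulate it with a flush after the data. */
        if (!has_fua && (rq_flags & RQ_FUA))
            steps |= STEP_POSTFLUSH;
    }
    return steps;
}
```

With a write-back cache and no FUA support, a request carrying RQ_PREFLUSH | RQ_FUA becomes pre-flush, data, post-flush. With FUA support the post-flush disappears because the FUA bit travels with the data write itself.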

Code before Linux 4.7:

/**
 * blk_queue_flush - configure queue's cache flush capability
 * @q:          the request queue for the device
 * @flush:      0, REQ_FLUSH or REQ_FLUSH | REQ_FUA
 *
 * Tell block layer cache flush capability of @q.  If it supports
 * flushing, REQ_FLUSH should be set.  If it supports bypassing
 * write cache for individual writes, REQ_FUA should be set.
 */
void blk_queue_flush(struct request_queue *q, unsigned int flush)
{
        WARN_ON_ONCE(flush & ~(REQ_FLUSH | REQ_FUA));

        if (WARN_ON_ONCE(!(flush & REQ_FLUSH) && (flush & REQ_FUA)))
                flush &= ~REQ_FUA;

        q->flush_flags = flush & (REQ_FLUSH | REQ_FUA);
}

Linux 4.7 and later:

/**
 * blk_queue_write_cache - configure queue's write cache
 * @q:          the request queue for the device
 * @wc:         write back cache on or off
 * @fua:        device supports FUA writes, if true
 *
 * Tell the block layer about the write cache of @q.
 */
void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
{
        if (wc)
                blk_queue_flag_set(QUEUE_FLAG_WC, q);
        else
                blk_queue_flag_clear(QUEUE_FLAG_WC, q);
        if (fua)
                blk_queue_flag_set(QUEUE_FLAG_FUA, q);
        else
                blk_queue_flag_clear(QUEUE_FLAG_FUA, q);

        wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
}
EXPORT_SYMBOL_GPL(blk_queue_write_cache);

How OpenZFS flushes data on Linux:

/*
 * 4.7 API,
 * The blk_queue_write_cache() interface has replaced blk_queue_flush()
 * interface.  However, the new interface is GPL-only thus we implement
 * our own trivial wrapper when the GPL-only version is detected.
 *
 * 2.6.36 - 4.6 API,
 * The blk_queue_flush() interface has replaced blk_queue_ordered()
 * interface.  However, while the old interface was available to all the
 * new one is GPL-only.   Thus if the GPL-only version is detected we
 * implement our own trivial helper.
 */

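The software side looks adequate.
If the hardware (a black box) misbehaves, software cannot prevent it.

Good luck, and back up all your important data.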
