diff --git a/Documentation/ch-checksumming.rst b/Documentation/ch-checksumming.rst
index 5e47a6bfb..b7fde46fe 100644
--- a/Documentation/ch-checksumming.rst
+++ b/Documentation/ch-checksumming.rst
@@ -3,6 +3,24 @@
 writing and verified after reading the blocks from devices. The whole metadata
 block has an inline checksum stored in the b-tree node header. Each data block
 has a detached checksum stored in the checksum tree.
+.. note::
+   Since a data checksum is calculated just before submitting to the block
+   device, btrfs has a strong requirement that the corresponding data block must
+   not be modified until the writeback is finished.
+
+   This requirement is met for a buffered write as btrfs has full control over
+   its page cache, but a direct write (``O_DIRECT``) bypasses the page cache, and
+   btrfs cannot control the direct IO buffer (as it can be in user space memory).
+   It is thus possible for a user space program to modify its direct write buffer
+   before the buffer is fully written back, which can lead to a data checksum mismatch.
+
+   To avoid such a checksum mismatch, since v6.14 btrfs forces a direct
+   write to fall back to a buffered one if the inode requires a data checksum.
+   This brings a small performance penalty. If the end user requires true
+   zero-copy direct writes, they should set the ``NODATASUM`` flag for the inode
+   and make sure the direct IO buffer is fully aligned to the btrfs block size.
+
+
 There are several checksum algorithms supported. The default and backward
 compatible algorithm is *crc32c*. Since kernel 5.5 there are three more with
 different characteristics and trade-offs regarding speed and strength. The following list