
Extreme load when deleting large file on full ZFS Filesystem #6823

Closed
FireDrunk opened this issue Nov 4, 2017 · 2 comments
Labels
Status: Stale (no recent activity for issue)

Comments


FireDrunk (Contributor) commented Nov 4, 2017

System information

Distribution Name | Fedora
Distribution Version | 25 (Minimal)
Linux Kernel | 4.12.13-200.fc25.x86_64
Architecture | x86_64
ZFS Version | 0.7.2-1.fc25 (DKMS)
SPL Version | 0.7.2-1.fc25 (DKMS)

System Specifications:
Xeon E5-2620v3 (6-core)
32GB ECC Registered RAM
Supermicro X8SRH Motherboard
A separate SSD to boot Fedora from (unrelated to the pool)
6 * HGST 4K7000 7200RPM hard disks in RAIDZ2
Disks are attached to local SATA ports (not the onboard SAS ports)
A scrub was done fairly recently (~40 days ago)

Describe the problem you're observing

Extreme load and an unresponsive system (with regard to I/O).

I filled a (sub)filesystem to 0 bytes free while making a dd backup of a large USB-attached hard disk. The pool had >1.2T free and the disk was ~500G, so I figured it shouldn't be a problem, especially since I also piped it through gzip.
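For reference, the backup pipeline was roughly of the following shape (the device path and output file name here are illustrative, not the exact command used):

# read the USB disk raw and store it as a gzip-compressed image on the ZFS dataset
dd if=/dev/sdX bs=1M status=progress | gzip > /archive/Backups/usb-disk.img.gz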

Unfortunately the filesystem became full anyway. To free up some space I tried deleting the target file, which took quite a long time (several minutes for ~275G).

During that time the system load increased to >60 and the machine was barely responsive to I/O (the pool is not the root pool; my OS is on a separate SSD). The main pool consists of 6 * 4TB HGST 7200RPM hard drives configured in RAIDZ2.

I noticed the sluggishness because I was installing an RPM at the same time, which took considerably longer than usual (tens of seconds instead of a few seconds for a 51MB RPM). When I started top, I noticed the load. After a few minutes the extreme load subsided, but the delete job has now been running for more than 20 minutes.

The zpool list output:

NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
archive  21.8T  20.9T   847G         -    45%    96%  2.30x  ONLINE  -

The zfs list output:

NAME                             USED  AVAIL  REFER  MOUNTPOINT
archive                         14.5T   207G   352K  /archive
archive/Backups                 1.44T   207G  1.44T  /archive/Backups
archive/CrashPlanBackups         320K   207G   192K  /archive/CrashPlanBackups
archive/Docker                  11.7G   207G  6.34G  /archive/Docker
archive/Docker/Storage           216K   207G   216K  /archive/Docker/Storage
archive/Documents               1.21G   207G  1.17G  /archive/Documents
archive/Games                   20.0G   207G  20.0G  /archive/Games
archive/Incoming                1.18T   207G  1.18T  /archive/Incoming
archive/Movies                  5.38T   207G  5.38T  /archive/Movies
archive/Music                   51.7G   207G  51.7G  /archive/Music
archive/Programs                 303G   207G   259G  /archive/Programs
archive/Scans                   17.2M   207G  17.0M  /archive/Scans
archive/Series                  5.98T   207G  5.98T  /archive/Series
archive/VM                      94.6G   207G  54.7G  /archive/VM
archive/git                      340M   207G   339M  /archive/git

Describe how to reproduce the problem

I've no idea whether this is reproducible on other systems, but I can reproduce it by filling up my filesystem and deleting the file :-)
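Roughly, that boils down to the following (the dataset name and fill method here are illustrative; the important part is that the dataset has dedup and lz4 compression enabled, like the one I was writing to):

# a test dataset with the same relevant properties as the one that was written to
zfs create -o compression=lz4 -o dedup=on archive/test

# fill the dataset until the write fails with ENOSPC
dd if=/dev/urandom of=/archive/test/fill.bin bs=1M

# deleting the large file again is what triggers the extreme load
rm /archive/test/fill.bin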

Include any warning/errors/backtraces from the system logs

Top output snippet during the load > 60

  943 root       1 -19       0      0      0 S   5.0  0.0  38:33.67 z_wr_iss
    2 root      20   0       0      0      0 S   4.6  0.0  48:24.43 kthreadd
14944 root      20   0       0      0      0 D   3.6  0.0   0:00.63 kworker/u24:2
  776 root       0 -20       0      0      0 S   3.0  0.0  26:47.64 spl_dynamic_tas
  946 root       0 -20       0      0      0 S   2.0  0.0  19:34.87 z_wr_int_0
  947 root       0 -20       0      0      0 S   2.0  0.0  19:35.06 z_wr_int_1
  948 root       0 -20       0      0      0 S   2.0  0.0  19:35.84 z_wr_int_2
  949 root       0 -20       0      0      0 S   2.0  0.0  19:34.68 z_wr_int_3
  951 root       0 -20       0      0      0 S   2.0  0.0  19:35.51 z_wr_int_5
  952 root       0 -20       0      0      0 S   2.0  0.0  19:35.26 z_wr_int_6
  937 root       1 -19       0      0      0 S   1.7  0.0  38:34.74 z_wr_iss
  941 root       1 -19       0      0      0 S   1.7  0.0  38:35.65 z_wr_iss
  950 root       0 -20       0      0      0 S   1.7  0.0  19:35.21 z_wr_int_4
  953 root       0 -20       0      0      0 S   1.7  0.0  19:34.90 z_wr_int_7
  936 root       1 -19       0      0      0 S   1.3  0.0  38:35.36 z_wr_iss
  938 root       1 -19       0      0      0 S   1.3  0.0  38:37.74 z_wr_iss
  939 root       1 -19       0      0      0 S   1.3  0.0  38:37.94 z_wr_iss
  940 root       1 -19       0      0      0 S   1.3  0.0  38:38.68 z_wr_iss
  942 root       1 -19       0      0      0 S   1.3  0.0  38:35.38 z_wr_iss
  944 root       1 -19       0      0      0 S   1.3  0.0  38:38.28 z_wr_iss

I double-checked the free space, because this is the third time I've written this pool to 0 bytes free (stupid me, I know). The weird thing is that the zpool had >1.2T free, but the filesystem could only write <300G. (I understand that the zpool free space is reported without parity, but the difference is much bigger than that.)

Some caveats: the pool is 'rather' old and has undergone some migrations.
Also, it's 49% fragmented, since it is rather heavily used :-)
Compression (LZ4) is enabled for all filesystems, dedup only for some specific filesystems; it IS enabled for the target filesystem that was being written to.
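For completeness, the per-dataset settings can be checked with something like this:

# list dedup and compression settings for every filesystem in the pool
zfs get -r -t filesystem dedup,compression archive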

GregorKopka (Contributor) commented:

> Compression (LZ4) is enabled for all filesystems, dedup only for some specific filesystems; it IS enabled for the target filesystem that was being written to.

You're seeing the delete penalty of dedup: the DDT has to be updated for every block that is freed, which will take a while when:

  • having to release ~275G
  • running with the effective IOPS of a single disk (one raidz vdev)
  • especially if the DDT does not fit into ARC completely and has to be re-read constantly.

Bottom line: not a bug, but a case of (the downsides of) dedup.
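If you want to see how large the DDT actually is (and whether it stands a chance of fitting into ARC), something along these lines should show it; the pool name is taken from the report above:

# summary line reports the number of DDT entries plus their on-disk and in-core size
zpool status -D archive

# detailed DDT histogram, if you are curious
zdb -DD archive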

As a side note: the FREE output of zpool list is completely meaningless for judging capacity (it does not account for redundancy, plus spa_slop_shift eats away space on top of that); always use zfs list -d0 if you want an estimate of how much additional stuff you might be able to pour into a pool.
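For example, on the pool above that boils down to the following; the output shown is just the first line of the zfs list output from the report, and the 207G AVAIL already reflects raidz2 parity and the slop reserve (by default 1/32 of the pool, spa_slop_shift=5):

# depth 0: only the pool-level dataset; its AVAIL is the realistic writable space
zfs list -d0 archive
NAME                             USED  AVAIL  REFER  MOUNTPOINT
archive                         14.5T   207G   352K  /archive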


stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

The stale bot added the Status: Stale label on Aug 25, 2020.
The stale bot closed this issue as completed on Nov 24, 2020.