Drop caches never returned on ZFS backed Lustre OSS system. #1598

Closed
prakashsurya opened this issue Jul 18, 2013 · 0 comments

I ran into a situation where a request to drop caches never returned. The command was:

# echo 3 > /proc/sys/vm/drop_caches                                         

I ended up crashing the node with sysrq. Below is some information from the resulting crash dump, along with the "interesting" threads I see in it:

System information:

crash> sys                                                                  
      KERNEL: /tftpboot/images/lcraterz.x86_64/usr/lib/debug/lib/modules/2.6.32-358.6.1.3chaos.ch5.1.x86_64/vmlinux
    DUMPFILE: /tftpboot/dumps/vmcore-zwicky-lcz-oss1-2013-07-18-18:32:10  [PARTIAL DUMP]
        CPUS: 12                                                            
        DATE: Thu Jul 18 11:31:51 2013                                      
      UPTIME: 02:59:36                                                      
LOAD AVERAGE: 32.08, 32.46, 22.69                                           
       TASKS: 872                                                           
    NODENAME: zwicky-lcz-oss1                                               
     RELEASE: 2.6.32-358.6.1.3chaos.ch5.1.x86_64                            
     VERSION: #1 SMP Tue May 14 16:23:32 PDT 2013                           
     MACHINE: x86_64  (2800 Mhz)                                            
      MEMORY: 24 GB                                                         
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)                

Running processes:

crash> ps | grep '>'                                                        
>   138      2   1  ffff880338ce3500  RU   0.0       0      0  [kswapd0]    
>   139      2  11  ffff880338ce2aa0  RU   0.0       0      0  [kswapd1]    
>  2698      1   7  ffff880339f4a040  RU   0.0   10928    692  irqbalance   
>  3198      1   3  ffff88063189c040  RU   0.0 1116820   3048  opensm       
>  4869   4816   0  ffff8802302d5500  RU   0.0    4104    788  iostat       
> 11000  10999   5  ffff88062fc02080  RU   0.0  108296   1944  bash         
> 11486      1   4  ffff8804b33e4080  RU   0.0   15420   2096  strings      
> 11487  11441   9  ffff8800be50d540  RU   0.0  108296    464  bash         
> 11489  11488  10  ffff88007f191540  RU   0.0   39212    800  crond        
> 12099      2   6  ffff88031a166aa0  RU   0.0       0      0  [ll_ost02_000]
> 12160      1   2  ffff88031a296040  RU   0.0   29776   5044  cerebrod     
> 12788      2   8  ffff8806380caae0  RU   0.0       0      0  [ll_ost02_012]

Stacks of "interesting" threads, along with some gdb source info:

PID: 138    TASK: ffff880338ce3500  CPU: 1   COMMAND: "kswapd0"             
 #0 [ffff880028227e90] crash_nmi_callback+0x46 at ffffffff8102d306          
 #1 [ffff880028227ea0] notifier_call_chain+0x55 at ffffffff81513b45         
 #2 [ffff880028227ee0] atomic_notifier_call_chain+0x1a at ffffffff81513baa  
 #3 [ffff880028227ef0] notify_die+0x2e at ffffffff8109cf5e                  
 #4 [ffff880028227f20] do_nmi+0x1bb at ffffffff8151180b                     
 #5 [ffff880028227f50] nmi+0x20 at ffffffff815110d0                         
    [exception RIP: __spl_kmem_cache_generic_shrinker+82]                   
    RIP: ffffffffa05a69c2  RSP: ffff880338ce5c60  RFLAGS: 00000286          
    RAX: ffff880614dc8070  RBX: ffff8805eda70000  RCX: 0000000000000034     
    RDX: 0000000000000010  RSI: ffff8805eda78040  RDI: ffff88034aa08c28     
    RBP: ffff880338ce5c80   R8: 8080000000000000   R9: f9f459fb29a01010     
    R10: 0000000000000000  R11: 0000000000000000  R12: 000000000051574e     
    R13: ffff880338ce5c90  R14: 00000000000000d0  R15: 0000000000013200     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
--- <NMI exception stack> ---                                               
 #6 [ffff880338ce5c60] __spl_kmem_cache_generic_shrinker+0x52 at ffffffffa05a69c2 [spl]
 #7 [ffff880338ce5c88] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
 #8 [ffff880338ce5ca8] shrink_slab+0x12a at ffffffff81131a9a                
 #9 [ffff880338ce5d08] balance_pgdat+0x59a at ffffffff81134c8a              
#10 [ffff880338ce5e28] kswapd+0x134 at ffffffff81135044                     
#11 [ffff880338ce5ee8] kthread+0x96 at ffffffff81096c76                     
#12 [ffff880338ce5f48] child_rip+0xa at ffffffff8100c0ca                    

(gdb) l *__spl_kmem_cache_generic_shrinker+0x52                             
0x79f2 is in __spl_kmem_cache_generic_shrinker (/usr/src/debug/spl-kmod-0.6.1/spl-0.6.1/module/spl/spl-kmem.c:2121).
2116                     * Presume everything alloc'ed in reclaimable, this ensures
2117                     * we are called again with nr_to_scan > 0 so can try and
2118                     * reclaim.  The exact number is not important either so
2119                     * we forgo taking this already highly contented lock.
2120                     */                                                 
2121                    unused += skc->skc_obj_alloc;                       
2122            }                                                           
2123            up_read(&spl_kmem_cache_sem);                               
2124                                                                        
2125            return (unused * sysctl_vfs_cache_pressure) / 100;          
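
To make the return expression in the listing above concrete, here is a minimal sketch of just that arithmetic. It assumes the closing brace at line 2122 ends a loop over every registered SPL cache, which the listing itself does not show. The function name and the per-cache counts are hypothetical and are not taken from this dump. The value 100 is the kernel's default `vfs_cache_pressure`. C integer overflow of `unused` is not modeled.

```python
# Sketch of the return-value arithmetic from the gdb listing above.
# Assumption: the enclosing block iterates over every registered SPL cache.

def shrinker_reported_objects(obj_alloc_per_cache, vfs_cache_pressure=100):
    """Return the object count the shrinker would report back to the VM."""
    unused = 0
    for skc_obj_alloc in obj_alloc_per_cache:
        # Every allocated object is presumed reclaimable (see the comment
        # at lines 2116-2119 of the listing).
        unused += skc_obj_alloc
    # Integer division, mirroring the C expression at line 2125.
    return (unused * vfs_cache_pressure) // 100


# Hypothetical per-cache allocation counts (NOT from this crash dump).
example_caches = [120000, 4500000, 730000]
print(shrinker_reported_objects(example_caches))      # default pressure of 100
print(shrinker_reported_objects(example_caches, 50))  # halved pressure
```

With the default pressure the reported count is simply the sum of allocated objects, so under this model it only falls when objects are actually freed.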

PID: 139    TASK: ffff880338ce2aa0  CPU: 11  COMMAND: "kswapd1"             
 #0 [ffff88034aca7e90] crash_nmi_callback+0x46 at ffffffff8102d306          
 #1 [ffff88034aca7ea0] notifier_call_chain+0x55 at ffffffff81513b45         
 #2 [ffff88034aca7ee0] atomic_notifier_call_chain+0x1a at ffffffff81513baa  
 #3 [ffff88034aca7ef0] notify_die+0x2e at ffffffff8109cf5e                  
 #4 [ffff88034aca7f20] do_nmi+0x1bb at ffffffff8151180b                     
 #5 [ffff88034aca7f50] nmi+0x20 at ffffffff815110d0                         
    [exception RIP: spl_kmem_cache_reap_now+330]                            
    RIP: ffffffffa05a476a  RSP: ffff880338ce9c00  RFLAGS: 00000246          
    RAX: ffff880338ce9bb0  RBX: ffff8806270b0000  RCX: 0000000000000005     
    RDX: 0000000000008b07  RSI: ffff880338ce9ba0  RDI: ffff8806270b8090     
    RBP: ffff880338ce9c50   R8: 8080000000000000   R9: f9f441fb2fa01010     
    R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000000004     
    R13: ffff880338ce9c90  R14: ffff8806270b8040  R15: 0000000000013200     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
--- <NMI exception stack> ---                                               
 #6 [ffff880338ce9c00] spl_kmem_cache_reap_now+0x14a at ffffffffa05a476a [spl]
 #7 [ffff880338ce9c58] __spl_kmem_cache_generic_shrinker+0x4b at ffffffffa05a69bb [spl]
 #8 [ffff880338ce9c88] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
 #9 [ffff880338ce9ca8] shrink_slab+0x12a at ffffffff81131a9a                
#10 [ffff880338ce9d08] balance_pgdat+0x59a at ffffffff81134c8a              
#11 [ffff880338ce9e28] kswapd+0x134 at ffffffff81135044                     
#12 [ffff880338ce9ee8] kthread+0x96 at ffffffff81096c76                     
#13 [ffff880338ce9f48] child_rip+0xa at ffffffff8100c0ca                    

(gdb) l *spl_kmem_cache_reap_now+0x14a                                      
0x579a is in spl_kmem_cache_reap_now (/usr/src/debug/spl-kmod-0.6.1/spl-0.6.1/module/spl/spl-kmem.c:2201).
2196            }                                                           
2197                                                                        
2198            spl_slab_reclaim(skc, count, 1);                            
2199            clear_bit(KMC_BIT_REAPING, &skc->skc_flags);                
2200            smp_mb__after_clear_bit();                                  
2201            wake_up_bit(&skc->skc_flags, KMC_BIT_REAPING);              
2202                                                                        
2203            atomic_dec(&skc->skc_ref);                                  
2204                                                                        
2205            SEXIT;                      

PID: 2698   TASK: ffff880339f4a040  CPU: 7   COMMAND: "irqbalance"          
 #0 [ffff88034ac27e90] crash_nmi_callback+0x46 at ffffffff8102d306          
 #1 [ffff88034ac27ea0] notifier_call_chain+0x55 at ffffffff81513b45         
 #2 [ffff88034ac27ee0] atomic_notifier_call_chain+0x1a at ffffffff81513baa  
 #3 [ffff88034ac27ef0] notify_die+0x2e at ffffffff8109cf5e                  
 #4 [ffff88034ac27f20] do_nmi+0x1bb at ffffffff8151180b                     
 #5 [ffff88034ac27f50] nmi+0x20 at ffffffff815110d0                         
    [exception RIP: spl_slab_reclaim+564]                                   
    RIP: ffffffffa05a4474  RSP: ffff8803241ad648  RFLAGS: 00000202          
    RAX: ffff8805edad80a8  RBX: ffff8805edad0000  RCX: 0000000000000005     
    RDX: 000000000000df27  RSI: 0000000000000004  RDI: ffff8805edad8090     
    RBP: ffff8803241ad6e8   R8: 8080000000000000   R9: f9f18cfbdce01010     
    R10: 0000000000000000  R11: 0000000000000001  R12: ffff8805edad8080     
    R13: 0000000000000000  R14: ffff8805edad8090  R15: 0000000000013200     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
--- <NMI exception stack> ---                                               
 #6 [ffff8803241ad648] spl_slab_reclaim+0x234 at ffffffffa05a4474 [spl]     
 #7 [ffff8803241ad6f0] spl_kmem_cache_reap_now+0x144 at ffffffffa05a4764 [spl]
 #8 [ffff8803241ad750] __spl_kmem_cache_generic_shrinker+0x4b at ffffffffa05a69bb [spl]
 #9 [ffff8803241ad780] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
#10 [ffff8803241ad7a0] shrink_slab+0x12a at ffffffff81131a9a                
#11 [ffff8803241ad800] do_try_to_free_pages+0x3f7 at ffffffff81133e77       
#12 [ffff8803241ad8a0] try_to_free_pages+0x92 at ffffffff81134262           
#13 [ffff8803241ad940] __alloc_pages_nodemask+0x478 at ffffffff8112bab8     
#14 [ffff8803241ada80] alloc_pages_vma+0x9a at ffffffff8116064a             
#15 [ffff8803241adad0] handle_pte_fault+0x76b at ffffffff8114396b           
#16 [ffff8803241adbb0] handle_mm_fault+0x23a at ffffffff81143f8a            
#17 [ffff8803241adc20] __do_page_fault+0x139 at ffffffff810474c9            
#18 [ffff8803241add40] do_page_fault+0x3e at ffffffff81513a8e               
#19 [ffff8803241add70] page_fault+0x25 at ffffffff81510e45                  
    [exception RIP: copy_user_generic_string+50]                            
    RIP: ffffffff81282442  RSP: ffff8803241ade20  RFLAGS: 00010297          
    RAX: ffff8803241ac000  RBX: ffff88063a1ef800  RCX: 0000000000000007     
    RDX: 0000000000000007  RSI: ffff880193649000  RDI: 00007ffff7ff0000     
    RBP: ffff8803241ade98   R8: 0000000000000020   R9: 0000000000000005     
    R10: 0000000000000000  R11: 0000000000000002  R12: ffff880497556440     
    R13: 0000000000000007  R14: 0000000000000007  R15: ffff8803241ade58     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
#20 [ffff8803241ade20] seq_read+0x2d2 at ffffffff811a4e32                   
#21 [ffff8803241adea0] proc_reg_read+0x7e at ffffffff811e974e               
#22 [ffff8803241adef0] vfs_read+0xb5 at ffffffff81181395                    
#23 [ffff8803241adf30] sys_read+0x51 at ffffffff811814d1                    
#24 [ffff8803241adf80] system_call_fastpath+0x16 at ffffffff8100b072        
    RIP: 00007ffff748a5f0  RSP: 00007fffffffbbe0  RFLAGS: 00010206          
    RAX: 0000000000000000  RBX: ffffffff8100b072  RCX: 00007ffff7ff0008     
    RDX: 0000000000000400  RSI: 00007ffff7ff0000  RDI: 0000000000000003     
    RBP: ffffffffffffffff   R8: 00000000ffffffff   R9: 0000000000000000     
    R10: 0000000000000022  R11: 0000000000000246  R12: 0000000000000000     
    R13: 00007fffffffca18  R14: 00007ffff820cb70  R15: 0000000000000000     
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b                          

(gdb) l *spl_slab_reclaim+0x234                                             
0x54a4 is in spl_slab_reclaim (/usr/src/debug/spl-kmod-0.6.1/spl-0.6.1/module/spl/spl-kmem.c:1104).
1099             * are freed.  This is all done outside the skc->skc_lock since
1100             * this allows the destructor to sleep, and allows us to perform
1101             * a conditional reschedule when a freeing a large number of
1102             * objects and slabs back to the system.                    
1103             */                                                         
1104            if (skc->skc_flags & KMC_OFFSLAB)                           
1105                    size = spl_offslab_size(skc);                       
1106                                                                        
1107            list_for_each_entry_safe(sko, n, &sko_list, sko_list) {     
1108                    ASSERT(sko->sko_magic == SKO_MAGIC);         

PID: 3198   TASK: ffff88063189c040  CPU: 3   COMMAND: "opensm"              
 #0 [ffff880028267e90] crash_nmi_callback+0x46 at ffffffff8102d306          
 #1 [ffff880028267ea0] notifier_call_chain+0x55 at ffffffff81513b45         
 #2 [ffff880028267ee0] atomic_notifier_call_chain+0x1a at ffffffff81513baa  
 #3 [ffff880028267ef0] notify_die+0x2e at ffffffff8109cf5e                  
 #4 [ffff880028267f20] do_nmi+0x1bb at ffffffff8151180b                     
 #5 [ffff880028267f50] nmi+0x20 at ffffffff815110d0                         
    [exception RIP: __spl_kmem_cache_generic_shrinker+82]                   
    RIP: ffffffffa05a69c2  RSP: ffff88063208b6b8  RFLAGS: 00000246          
    RAX: ffff880619b78070  RBX: ffff880626b50000  RCX: 0000000000000005     
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffffffffa05d4200     
    RBP: ffff88063208b6d8   R8: 8080000000000000   R9: f9ccec0505201010     
    R10: 0000000000000000  R11: 0000000000000001  R12: 00000000004cbdcc     
    R13: ffff88063208b6e8  R14: 00000000000200da  R15: 0000000000013200     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
--- <NMI exception stack> ---                                               
 #6 [ffff88063208b6b8] __spl_kmem_cache_generic_shrinker+0x52 at ffffffffa05a69c2 [spl]
 #7 [ffff88063208b6e0] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
 #8 [ffff88063208b700] shrink_slab+0x11a at ffffffff81131a8a                
 #9 [ffff88063208b760] do_try_to_free_pages+0x3f7 at ffffffff81133e77       
#10 [ffff88063208b800] try_to_free_pages+0x92 at ffffffff81134262           
#11 [ffff88063208b8a0] __alloc_pages_nodemask+0x478 at ffffffff8112bab8     
#12 [ffff88063208b9e0] alloc_pages_vma+0x9a at ffffffff8116064a             
#13 [ffff88063208ba30] shmem_alloc_page+0x55 at ffffffff81135eb5            
#14 [ffff88063208bb20] shmem_getpage_gfp+0x2a7 at ffffffff81137cc7          
#15 [ffff88063208bbd0] shmem_write_begin+0x38 at ffffffff811380b8           
#16 [ffff88063208bbe0] generic_file_buffered_write+0x123 at ffffffff8111a2e3
#17 [ffff88063208bcb0] __generic_file_aio_write+0x260 at ffffffff8111bd50   
#18 [ffff88063208bd70] generic_file_aio_write+0x88 at ffffffff8111c008      
#19 [ffff88063208bdc0] do_sync_write+0xfa at ffffffff8118096a               
#20 [ffff88063208bef0] vfs_write+0xb8 at ffffffff81180c68                   
#21 [ffff88063208bf30] sys_write+0x51 at ffffffff81181561                   
#22 [ffff88063208bf80] system_call_fastpath+0x16 at ffffffff8100b072        
    RIP: 00007ffff6cc966d  RSP: 00007fffedbaf3a0  RFLAGS: 00010202          
    RAX: 0000000000000001  RBX: ffffffff8100b072  RCX: 00007fffdc001f90     
    RDX: 0000000000000088  RSI: 00007fffb3fff000  RDI: 0000000000000007     
    RBP: 00007fffb3fff000   R8: 00007fffedbb0700   R9: 00000000006d6660     
    R10: 0000000000000000  R11: 0000000000000293  R12: 0000000000000088     
    R13: 00007fffdc002070  R14: 0000000000000088  R15: 00007fffdc001f80     
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b                          

(gdb) l *__spl_kmem_cache_generic_shrinker+0x52                             
0x79f2 is in __spl_kmem_cache_generic_shrinker (/usr/src/debug/spl-kmod-0.6.1/spl-0.6.1/module/spl/spl-kmem.c:2121).
2116                     * Presume everything alloc'ed in reclaimable, this ensures
2117                     * we are called again with nr_to_scan > 0 so can try and
2118                     * reclaim.  The exact number is not important either so
2119                     * we forgo taking this already highly contented lock.
2120                     */                                                 
2121                    unused += skc->skc_obj_alloc;                       
2122            }                                                           
2123            up_read(&spl_kmem_cache_sem);                               
2124                                                                        
2125            return (unused * sysctl_vfs_cache_pressure) / 100;          

PID: 4869   TASK: ffff8802302d5500  CPU: 0   COMMAND: "iostat"              
 #0 [ffff8800282039a0] machine_kexec+0x18b at ffffffff81035bfb              
 #1 [ffff880028203a00] crash_kexec+0x72 at ffffffff810c0932                 
 #2 [ffff880028203ad0] oops_end+0xc0 at ffffffff81511b40                    
 #3 [ffff880028203b00] no_context+0xfb at ffffffff81046bfb                  
 #4 [ffff880028203b50] __bad_area_nosemaphore+0x125 at ffffffff81046e85     
 #5 [ffff880028203ba0] bad_area_nosemaphore+0x13 at ffffffff81046f53        
 #6 [ffff880028203bb0] __do_page_fault+0x321 at ffffffff810476b1            
 #7 [ffff880028203cd0] do_page_fault+0x3e at ffffffff81513a8e               
 #8 [ffff880028203d00] page_fault+0x25 at ffffffff81510e45                  
    [exception RIP: sysrq_handle_crash+22]                                  
    RIP: ffffffff8133dcf6  RSP: ffff880028203db8  RFLAGS: 00010096          
    RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000001efb     
    RDX: 0000000000000000  RSI: ffff88063acd6800  RDI: 0000000000000063     
    RBP: ffff880028203db8   R8: 0000000000000000   R9: ffffffff8163fdc0     
    R10: 0000000000000001  R11: 0000000000000000  R12: ffff88063acd6800     
    R13: ffffffff81afffc0  R14: 0000000000000082  R15: 0000000000000008     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
 #9 [ffff880028203dc0] __handle_sysrq+0x132 at ffffffff8133dfb2             
#10 [ffff880028203e10] handle_sysrq+0x2b at ffffffff8133e09b                
#11 [ffff880028203e20] serial8250_handle_port+0x218 at ffffffff81358d18     
#12 [ffff880028203e80] serial8250_interrupt+0x8b at ffffffff81358eab        
#13 [ffff880028203ed0] handle_IRQ_event+0x60 at ffffffff810e12b0            
#14 [ffff880028203f20] handle_edge_irq+0xde at ffffffff810e39fe             
#15 [ffff880028203f60] handle_irq+0x49 at ffffffff8100de89                  
#16 [ffff880028203f80] do_IRQ+0x6c at ffffffff815175fc                      
--- <IRQ stack> ---                                                         
#17 [ffff88018d1d3598] ret_from_intr at ffffffff8100b9d3                    
    [exception RIP: spl_slab_reclaim+564]                                   
    RIP: ffffffffa05a4474  RSP: ffff88018d1d3648  RFLAGS: 00000202          
    RAX: ffff8805f2ef80a8  RBX: ffff88018d1d36e8  RCX: 0000000000000005     
    RDX: 000000000000f866  RSI: 0000000000000004  RDI: ffff8805f2ef8090     
    RBP: ffffffff8100b9ce   R8: 8080000000000000   R9: f9c17a07e1a01010     
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff88018d1d3638     
    R13: ffffffff8100bb8e  R14: ffff8802302d5500  R15: ffff88011127aae8     
    ORIG_RAX: ffffffffffffffcb  CS: 0010  SS: 0018                          
#18 [ffff88018d1d36f0] spl_kmem_cache_reap_now+0x144 at ffffffffa05a4764 [spl]
#19 [ffff88018d1d3750] __spl_kmem_cache_generic_shrinker+0x4b at ffffffffa05a69bb [spl]
#20 [ffff88018d1d3780] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
#21 [ffff88018d1d37a0] shrink_slab+0x12a at ffffffff81131a9a                
#22 [ffff88018d1d3800] do_try_to_free_pages+0x3f7 at ffffffff81133e77       
#23 [ffff88018d1d38a0] try_to_free_pages+0x92 at ffffffff81134262           
#24 [ffff88018d1d3940] __alloc_pages_nodemask+0x478 at ffffffff8112bab8     
#25 [ffff88018d1d3a80] alloc_pages_vma+0x9a at ffffffff8116064a             
#26 [ffff88018d1d3ad0] handle_pte_fault+0x76b at ffffffff8114396b           
#27 [ffff88018d1d3bb0] handle_mm_fault+0x23a at ffffffff81143f8a            
#28 [ffff88018d1d3c20] __do_page_fault+0x139 at ffffffff810474c9            
#29 [ffff88018d1d3d40] do_page_fault+0x3e at ffffffff81513a8e               
#30 [ffff88018d1d3d70] page_fault+0x25 at ffffffff81510e45                  
    [exception RIP: copy_user_generic_string+45]                            
    RIP: ffffffff8128243d  RSP: ffff88018d1d3e20  RFLAGS: 00010202          
    RAX: ffff88018d1d2000  RBX: ffff88049ebf5300  RCX: 0000000000000002     
    RDX: 0000000000000002  RSI: ffff8804d8c60000  RDI: 00002aaaaaacd000     
    RBP: ffff88018d1d3e98   R8: 00000000fffffffe   R9: 0000000000000000     
    R10: 0000000000000001  R11: 0000000000000246  R12: ffff8803f2dc3540     
    R13: 0000000000000012  R14: 0000000000000012  R15: ffff88018d1d3e58     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
#31 [ffff88018d1d3e20] seq_read+0x2d2 at ffffffff811a4e32                   
#32 [ffff88018d1d3ea0] proc_reg_read+0x7e at ffffffff811e974e               
#33 [ffff88018d1d3ef0] vfs_read+0xb5 at ffffffff81181395                    
#34 [ffff88018d1d3f30] sys_read+0x51 at ffffffff811814d1                    
#35 [ffff88018d1d3f80] system_call_fastpath+0x16 at ffffffff8100b072        
    RIP: 00002aaaaada85f0  RSP: 00007fffffffe0f8  RFLAGS: 00000246          
    RAX: 0000000000000000  RBX: ffffffff8100b072  RCX: ffffffffffffffff     
    RDX: 0000000000000400  RSI: 00002aaaaaacd000  RDI: 0000000000000003     
    RBP: 000000000000007f   R8: 00000000ffffffff   R9: 0000000000000000     
    R10: 0000000000000022  R11: 0000000000000246  R12: 0000000000000000     
    R13: 000000000000000a  R14: 00000000006142e0  R15: 0000000000000000     
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b                          

(gdb) l *spl_kmem_cache_reap_now+0x144                                      
0x5794 is in spl_kmem_cache_reap_now (/usr/src/kernels/2.6.32-358.6.1.3chaos.ch5.1.x86_64/arch/x86/include/asm/bitops.h:103).
98       */                                                                 
99      static __always_inline void                                         
100     clear_bit(int nr, volatile unsigned long *addr)                     
101     {                                                                   
102             if (IS_IMMEDIATE(nr)) {                                     
103                     asm volatile(LOCK_PREFIX "andb %1,%0"               
104                             : CONST_MASK_ADDR(nr, addr)                 
105                             : "iq" ((u8)~CONST_MASK(nr)));              
106             } else {                                                    
107                     asm volatile(LOCK_PREFIX "btr %1,%0"                

PID: 11000  TASK: ffff88062fc02080  CPU: 5   COMMAND: "bash"                
 #0 [ffff8800282a7e90] crash_nmi_callback+0x46 at ffffffff8102d306          
 #1 [ffff8800282a7ea0] notifier_call_chain+0x55 at ffffffff81513b45         
 #2 [ffff8800282a7ee0] atomic_notifier_call_chain+0x1a at ffffffff81513baa  
 #3 [ffff8800282a7ef0] notify_die+0x2e at ffffffff8109cf5e                  
 #4 [ffff8800282a7f20] do_nmi+0x1bb at ffffffff8151180b                     
 #5 [ffff8800282a7f50] nmi+0x20 at ffffffff815110d0                         
    [exception RIP: spl_kmem_cache_reap_now+350]                            
    RIP: ffffffffa05a477e  RSP: ffff8806213a5d38  RFLAGS: 00000246          
    RAX: ffff88034aa08c30  RBX: ffff8805efc10000  RCX: 0000000000000034     
    RDX: 0000000000000010  RSI: ffff8805efc18040  RDI: ffff88034aa08c28     
    RBP: ffff8806213a5d88   R8: 8080000000000000   R9: f9f23ffbb0201010     
    R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000000004     
    R13: ffff8806213a5dc8  R14: ffff8805efc18040  R15: 0000000000013200     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
--- <NMI exception stack> ---                                               
 #6 [ffff8806213a5d38] spl_kmem_cache_reap_now+0x15e at ffffffffa05a477e [spl]
 #7 [ffff8806213a5d90] __spl_kmem_cache_generic_shrinker+0x4b at ffffffffa05a69bb [spl]
 #8 [ffff8806213a5dc0] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
 #9 [ffff8806213a5de0] shrink_slab+0x12a at ffffffff81131a9a                
#10 [ffff8806213a5e40] drop_caches_sysctl_handler+0x1d3 at ffffffff811ae123 
#11 [ffff8806213a5e80] proc_sys_call_handler+0x97 at ffffffff811f3177       
#12 [ffff8806213a5ee0] proc_sys_write+0x14 at ffffffff811f31c4              
#13 [ffff8806213a5ef0] vfs_write+0xb8 at ffffffff81180c68                   
#14 [ffff8806213a5f30] sys_write+0x51 at ffffffff81181561                   
#15 [ffff8806213a5f80] system_call_fastpath+0x16 at ffffffff8100b072        
    RIP: 00002aaaab1ce650  RSP: 00007fffffffe210  RFLAGS: 00010202          
    RAX: 0000000000000001  RBX: ffffffff8100b072  RCX: 0000000000000033     
    RDX: 0000000000000002  RSI: 00002aaab152d000  RDI: 0000000000000001     
    RBP: 00002aaab152d000   R8: 000000000000000a   R9: 00002aaaab486700     
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000002     
    R13: 00002aaaab480780  R14: 0000000000000002  R15: 00000000006ea9c0     
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b                          

(gdb) l *spl_kmem_cache_reap_now+0x15e                                      
0x57ae is in spl_kmem_cache_reap_now (/usr/src/debug/spl-kmod-0.6.1/spl-0.6.1/module/spl/spl-kmem.c:2205).
2200            smp_mb__after_clear_bit();                                  
2201            wake_up_bit(&skc->skc_flags, KMC_BIT_REAPING);              
2202                                                                        
2203            atomic_dec(&skc->skc_ref);                                  
2204                                                                        
2205            SEXIT;                                                      
2206    }                                                                   
2207    EXPORT_SYMBOL(spl_kmem_cache_reap_now);                             
2208                                                                        
2209    /*                                                                  
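
Every backtrace listed so far passes through `shrink_slab`. With this many stacks, tallying that by script can be easier than by eye. The sketch below is a hypothetical helper, not part of `crash`. It assumes `bt` output in the frame format used in this report (`#N [addr] symbol+0xoff at addr`) and returns the tasks whose stacks contain a given symbol. The sample text is an abridged excerpt of the stacks above.

```python
import re

# Matches a task header such as:
#   PID: 138    TASK: ffff880338ce3500  CPU: 1   COMMAND: "kswapd0"
PID_RE = re.compile(r'^PID:\s+(\d+)\s.*COMMAND:\s+"([^"]+)"')
# Matches a frame line such as:
#   #8 [ffff880338ce5ca8] shrink_slab+0x12a at ffffffff81131a9a
FRAME_RE = re.compile(r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\w+)")


def tasks_containing(bt_text, symbol):
    """Return (pid, command) for each backtrace whose frames include symbol."""
    hits = []
    current = None
    for line in bt_text.splitlines():
        header = PID_RE.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            continue
        frame = FRAME_RE.match(line)
        if frame and current and frame.group(1) == symbol and current not in hits:
            hits.append(current)
    return hits


# Abridged excerpt of two of the backtraces in this report.
sample = '''\
PID: 138    TASK: ffff880338ce3500  CPU: 1   COMMAND: "kswapd0"
 #6 [ffff880338ce5c60] __spl_kmem_cache_generic_shrinker+0x52 at ffffffffa05a69c2 [spl]
 #7 [ffff880338ce5c88] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
 #8 [ffff880338ce5ca8] shrink_slab+0x12a at ffffffff81131a9a

PID: 11000  TASK: ffff88062fc02080  CPU: 5   COMMAND: "bash"
 #9 [ffff8806213a5de0] shrink_slab+0x12a at ffffffff81131a9a
#10 [ffff8806213a5e40] drop_caches_sysctl_handler+0x1d3 at ffffffff811ae123
'''

print(tasks_containing(sample, "shrink_slab"))
print(tasks_containing(sample, "drop_caches_sysctl_handler"))
```

Run against the full `bt` output, the same call with `spl_kmem_cache_reap_now` or `spl_slab_reclaim` would separate the tasks actively reaping from those only summing counts.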

PID: 11486  TASK: ffff8804b33e4080  CPU: 4   COMMAND: "strings"             
 #0 [ffff880028287e90] crash_nmi_callback+0x46 at ffffffff8102d306          
 #1 [ffff880028287ea0] notifier_call_chain+0x55 at ffffffff81513b45         
 #2 [ffff880028287ee0] atomic_notifier_call_chain+0x1a at ffffffff81513baa  
 #3 [ffff880028287ef0] notify_die+0x2e at ffffffff8109cf5e                  
 #4 [ffff880028287f20] do_nmi+0x1bb at ffffffff8151180b                     
 #5 [ffff880028287f50] nmi+0x20 at ffffffff815110d0                         
    [exception RIP: spl_kmem_cache_reap_now+119]                            
    RIP: ffffffffa05a4697  RSP: ffff880555897758  RFLAGS: 00000202          
    RAX: 0000000000000000  RBX: ffff8805f2630000  RCX: 0000000000000005     
    RDX: 0000000000000010  RSI: 0000000000000004  RDI: ffff8805f2630000     
    RBP: ffff8805558977a8   R8: 8080000000000000   R9: f9f441fb2fa01010     
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000004     
    R13: ffff8805558977e8  R14: ffff8805f2638040  R15: 0000000000013200     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
--- <NMI exception stack> ---                                               
 #6 [ffff880555897758] spl_kmem_cache_reap_now+0x77 at ffffffffa05a4697 [spl]
 #7 [ffff8805558977b0] __spl_kmem_cache_generic_shrinker+0x4b at ffffffffa05a69bb [spl]
 #8 [ffff8805558977e0] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
 #9 [ffff880555897800] shrink_slab+0x12a at ffffffff81131a9a                
#10 [ffff880555897860] do_try_to_free_pages+0x3f7 at ffffffff81133e77       
#11 [ffff880555897900] try_to_free_pages+0x92 at ffffffff81134262           
#12 [ffff8805558979a0] __alloc_pages_nodemask+0x478 at ffffffff8112bab8     
#13 [ffff880555897ae0] alloc_pages_current+0xaa at ffffffff8116054a         
#14 [ffff880555897b10] __page_cache_alloc+0x87 at ffffffff81119d67          
#15 [ffff880555897b40] __do_page_cache_readahead+0xdb at ffffffff8112e48b   
#16 [ffff880555897bd0] ra_submit+0x21 at ffffffff8112e5e1                   
#17 [ffff880555897be0] ondemand_readahead+0x115 at ffffffff8112e955         
#18 [ffff880555897c40] page_cache_async_readahead+0x90 at ffffffff8112eb10  
#19 [ffff880555897c90] generic_file_aio_read+0x503 at ffffffff8111b643      
#20 [ffff880555897d70] nfs_file_read+0xca at ffffffffa02c75ca [nfs]         
#21 [ffff880555897dc0] do_sync_read+0xfa at ffffffff81180aaa                
#22 [ffff880555897ef0] vfs_read+0xb5 at ffffffff81181395                    
#23 [ffff880555897f30] sys_read+0x51 at ffffffff811814d1                    
#24 [ffff880555897f80] system_call_fastpath+0x16 at ffffffff8100b072        
    RIP: 00002aaaab2a55f0  RSP: 00007fffffffe090  RFLAGS: 00010202          
    RAX: 0000000000000000  RBX: ffffffff8100b072  RCX: 00002aaaab2af45a     
    RDX: 0000000000500000  RSI: 00002aaaab55f010  RDI: 0000000000000004     
    RBP: 0000000000000000   R8: 00002aaaab55f010   R9: 00002aaaab55d700     
    R10: 00002aaaab55d700  R11: 0000000000000246  R12: 00002aaaab55f010     
    R13: 0000000000519b05  R14: 00000000006091b0  R15: 0000000000519b10     
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b                          

(gdb) l *spl_kmem_cache_reap_now+0x77                                       
0x56c7 is in spl_kmem_cache_reap_now (/usr/src/debug/spl-kmod-0.6.1/spl-0.6.1/module/spl/spl-kmem.c:2167).
2162             * a reclaim function the cache will be skipped to avoid deadlock.
2163             *                                                          
2164             * Longer term this would be the correct place to add the code which
2165             * repacks the slabs in order minimize fragmentation.       
2166             */                                                         
2167            if (skc->skc_reclaim) {                                     
2168                    uint64_t objects = UINT64_MAX;                      
2169                    int do_reclaim;                                     
2170                                                                        
2171                    do {                                                

PID: 11487  TASK: ffff8800be50d540  CPU: 9   COMMAND: "bash"                
 #0 [ffff88034ac67e90] crash_nmi_callback+0x46 at ffffffff8102d306          
 #1 [ffff88034ac67ea0] notifier_call_chain+0x55 at ffffffff81513b45         
 #2 [ffff88034ac67ee0] atomic_notifier_call_chain+0x1a at ffffffff81513baa  
 #3 [ffff88034ac67ef0] notify_die+0x2e at ffffffff8109cf5e                  
 #4 [ffff88034ac67f20] do_nmi+0x1bb at ffffffff8151180b                     
 #5 [ffff88034ac67f50] nmi+0x20 at ffffffff815110d0                         
    [exception RIP: spl_kmem_cache_reap_now+66]                             
    RIP: ffffffffa05a4662  RSP: ffff880082ba3838  RFLAGS: 00000247          
    RAX: 0000000000000004  RBX: ffff88062fa70000  RCX: 0000000000000005     
    RDX: 0000000000000000  RSI: 0000000000000004  RDI: ffff88062fa70000     
    RBP: ffff880082ba3888   R8: 8080000000000000   R9: f9edaefcd4601010     
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000004     
    R13: ffff880082ba38c8  R14: ffff88062fa78040  R15: 0000000000013200     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
--- <NMI exception stack> ---                                               
 #6 [ffff880082ba3838] spl_kmem_cache_reap_now+0x42 at ffffffffa05a4662 [spl]
 #7 [ffff880082ba3890] __spl_kmem_cache_generic_shrinker+0x4b at ffffffffa05a69bb [spl]
 #8 [ffff880082ba38c0] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
 #9 [ffff880082ba38e0] shrink_slab+0x12a at ffffffff81131a9a                
#10 [ffff880082ba3940] do_try_to_free_pages+0x3f7 at ffffffff81133e77       
#11 [ffff880082ba39e0] try_to_free_pages+0x92 at ffffffff81134262           
#12 [ffff880082ba3a80] __alloc_pages_nodemask+0x478 at ffffffff8112bab8     
#13 [ffff880082ba3bc0] alloc_pages_vma+0x9a at ffffffff8116064a             
#14 [ffff880082ba3c10] do_wp_page+0xfd at ffffffff811424ad                  
#15 [ffff880082ba3cb0] handle_pte_fault+0x2cd at ffffffff811434cd           
#16 [ffff880082ba3d90] handle_mm_fault+0x23a at ffffffff81143f8a            
#17 [ffff880082ba3e00] __do_page_fault+0x139 at ffffffff810474c9            
#18 [ffff880082ba3f20] do_page_fault+0x3e at ffffffff81513a8e               
#19 [ffff880082ba3f50] page_fault+0x25 at ffffffff81510e45                  
    RIP: 000000000044fb45  RSP: 00007fffffffea60  RFLAGS: 00010246          
    RAX: 0000000000000004  RBX: 0000000000000002  RCX: 00002aaaab125a3e     
    RDX: 0000000000000004  RSI: 0000000000000000  RDI: 0000000000000002     
    RBP: 0000000000000002   R8: 00007fffffffe920   R9: 0000000000000000     
    R10: 0000000000000008  R11: 0000000000000206  R12: 000000000044fbb0     
    R13: 0000000000000000  R14: 0000000000000000  R15: 00000000006ea6e0     
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b                          

(gdb) l *spl_kmem_cache_reap_now+0x42                                       
0x5692 is in spl_kmem_cache_reap_now (/usr/src/kernels/2.6.32-358.6.1.3chaos.ch5.1.x86_64/arch/x86/include/asm/bitops.h:201).
196      */                                                                 
197     static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
198     {                                                                   
199             int oldbit;                                                 
200                                                                         
201             asm volatile(LOCK_PREFIX "bts %2,%1\n\t"                    
202                          "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
203                                                                         
204             return oldbit;                                              
205     }                                                                   

PID: 11489  TASK: ffff88007f191540  CPU: 10  COMMAND: "crond"               
 #0 [ffff88034ac87e90] crash_nmi_callback+0x46 at ffffffff8102d306          
 #1 [ffff88034ac87ea0] notifier_call_chain+0x55 at ffffffff81513b45         
 #2 [ffff88034ac87ee0] atomic_notifier_call_chain+0x1a at ffffffff81513baa  
 #3 [ffff88034ac87ef0] notify_die+0x2e at ffffffff8109cf5e                  
 #4 [ffff88034ac87f20] do_nmi+0x1bb at ffffffff8151180b                     
 #5 [ffff88034ac87f50] nmi+0x20 at ffffffff815110d0                         
    [exception RIP: spl_slab_reclaim+589]                                   
    RIP: ffffffffa05a448d  RSP: ffff88001dabd788  RFLAGS: 00000246          
    RAX: ffff88001dabd7d8  RBX: ffff880626b50000  RCX: 0000000000000005     
    RDX: 00000000000066d1  RSI: 0000000000000004  RDI: ffff880626b58090     
    RBP: ffff88001dabd828   R8: 8080000000000000   R9: 0000000000000001     
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff88001dabd7d8     
    R13: 0000000000000000  R14: ffff88001dabd7c0  R15: 0000000000013200     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
--- <NMI exception stack> ---                                               
 #6 [ffff88001dabd788] spl_slab_reclaim+0x24d at ffffffffa05a448d [spl]     
 #7 [ffff88001dabd830] spl_kmem_cache_reap_now+0x144 at ffffffffa05a4764 [spl]
 #8 [ffff88001dabd890] __spl_kmem_cache_generic_shrinker+0x4b at ffffffffa05a69bb [spl]
 #9 [ffff88001dabd8c0] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
#10 [ffff88001dabd8e0] shrink_slab+0x12a at ffffffff81131a9a                
#11 [ffff88001dabd940] do_try_to_free_pages+0x3f7 at ffffffff81133e77       
#12 [ffff88001dabd9e0] try_to_free_pages+0x92 at ffffffff81134262           
#13 [ffff88001dabda80] __alloc_pages_nodemask+0x478 at ffffffff8112bab8     
#14 [ffff88001dabdbc0] alloc_pages_vma+0x9a at ffffffff8116064a             
#15 [ffff88001dabdc10] do_wp_page+0xfd at ffffffff811424ad                  
#16 [ffff88001dabdcb0] handle_pte_fault+0x2cd at ffffffff811434cd           
#17 [ffff88001dabdd90] handle_mm_fault+0x23a at ffffffff81143f8a            
#18 [ffff88001dabde00] __do_page_fault+0x139 at ffffffff810474c9            
#19 [ffff88001dabdf20] do_page_fault+0x3e at ffffffff81513a8e               
#20 [ffff88001dabdf50] page_fault+0x25 at ffffffff81510e45                  
    RIP: 00007ffff7ff9e39  RSP: 00007ffffffde520  RFLAGS: 00010246          
    RAX: 00000000ffffffff  RBX: 0000000000000000  RCX: 00007ffff829a370     
    RDX: 0000000000000048  RSI: 0000000000000001  RDI: 0000000000000000     
    RBP: 00007ffff829a370   R8: 00007fffffffeef0   R9: 0000000000000001     
    R10: 0000000000000055  R11: 676f6c2f7665642f  R12: 00007ffff7ffc24f     
    R13: 00007ffff82809d0  R14: 0000000000002ce1  R15: 0000000000000000     
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b                          

(gdb) l *spl_slab_reclaim+0x24d                                             
0x54bd is in spl_slab_reclaim (/usr/src/debug/spl-kmod-0.6.1/spl-0.6.1/module/spl/spl-kmem.c:1107).
1102             * objects and slabs back to the system.                    
1103             */                                                         
1104            if (skc->skc_flags & KMC_OFFSLAB)                           
1105                    size = spl_offslab_size(skc);                       
1106                                                                        
1107            list_for_each_entry_safe(sko, n, &sko_list, sko_list) {     
1108                    ASSERT(sko->sko_magic == SKO_MAGIC);                
1109                                                                        
1110                    if (skc->skc_dtor)                                  
1111                            skc->skc_dtor(sko->sko_addr, skc->skc_private);

PID: 12099  TASK: ffff88031a166aa0  CPU: 6   COMMAND: "ll_ost02_000"        
 #0 [ffff88034ac07e90] crash_nmi_callback+0x46 at ffffffff8102d306          
 #1 [ffff88034ac07ea0] notifier_call_chain+0x55 at ffffffff81513b45         
 #2 [ffff88034ac07ee0] atomic_notifier_call_chain+0x1a at ffffffff81513baa  
 #3 [ffff88034ac07ef0] notify_die+0x2e at ffffffff8109cf5e                  
 #4 [ffff88034ac07f20] do_nmi+0x1bb at ffffffff8151180b                     
 #5 [ffff88034ac07f50] nmi+0x20 at ffffffff815110d0                         
    [exception RIP: __spl_kmem_cache_generic_shrinker+82]                   
    RIP: ffffffffa05a69c2  RSP: ffff8801f09bf520  RFLAGS: 00000246          
    RAX: ffff880624068070  RBX: ffff880615100000  RCX: 0000000000000034     
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffffffffa05d4200     
    RBP: ffff8801f09bf540   R8: 8080000000000000   R9: f9edaefcd4601010     
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000515780     
    R13: ffff8801f09bf550  R14: 0000000000000050  R15: 0000000000013200     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
--- <NMI exception stack> ---                                               
 #6 [ffff8801f09bf520] __spl_kmem_cache_generic_shrinker+0x52 at ffffffffa05a69c2 [spl]
 #7 [ffff8801f09bf548] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
 #8 [ffff8801f09bf568] shrink_slab+0x11a at ffffffff81131a8a                
 #9 [ffff8801f09bf5c8] do_try_to_free_pages+0x3f7 at ffffffff81133e77       
#10 [ffff8801f09bf668] try_to_free_pages+0x92 at ffffffff81134262           
#11 [ffff8801f09bf708] __alloc_pages_nodemask+0x478 at ffffffff8112bab8     
#12 [ffff8801f09bf848] kmem_getpages+0x62 at ffffffff811666a2               
#13 [ffff8801f09bf878] fallback_alloc+0x1ba at ffffffff811672ba             
#14 [ffff8801f09bf8f8] ____cache_alloc_node+0x99 at ffffffff81167039        
#15 [ffff8801f09bf958] kmem_cache_alloc+0x11b at ffffffff81167fbb           
#16 [ffff8801f09bf998] cfs_mem_cache_alloc+0x22 at ffffffffa08eda72 [libcfs]
#17 [ffff8801f09bf9b8] ldlm_resource_get+0x152 at ffffffffa0b915a2 [ptlrpc] 
#18 [ffff8801f09bfa28] ldlm_lock_create+0x55 at ffffffffa0b8c115 [ptlrpc]   
#19 [ffff8801f09bfa78] ldlm_cli_enqueue_local+0xbe at ffffffffa0ba977e [ptlrpc]
#20 [ffff8801f09bfb08] ofd_destroy_by_fid+0x321 at ffffffffa114d391 [ofd]   
#21 [ffff8801f09bfc08] ofd_destroy+0x1a7 at ffffffffa11500d7 [ofd]          
#22 [ffff8801f09bfc78] ost_handle+0x4349 at ffffffffa11247a9 [ost]          
#23 [ffff8801f09bfdb8] ptlrpc_server_handle_request+0x398 at ffffffffa0be3738 [ptlrpc]
#24 [ffff8801f09bfeb8] ptlrpc_main+0xace at ffffffffa0be4ace [ptlrpc]       
#25 [ffff8801f09bff48] child_rip+0xa at ffffffff8100c0ca                    

(gdb) l *__spl_kmem_cache_generic_shrinker+0x52                             
0x79f2 is in __spl_kmem_cache_generic_shrinker (/usr/src/debug/spl-kmod-0.6.1/spl-0.6.1/module/spl/spl-kmem.c:2121).
2116                     * Presume everything alloc'ed in reclaimable, this ensures
2117                     * we are called again with nr_to_scan > 0 so can try and
2118                     * reclaim.  The exact number is not important either so
2119                     * we forgo taking this already highly contented lock.
2120                     */                                                 
2121                    unused += skc->skc_obj_alloc;                       
2122            }                                                           
2123            up_read(&spl_kmem_cache_sem);                               
2124                                                                        
2125            return (unused * sysctl_vfs_cache_pressure) / 100;          

PID: 12160  TASK: ffff88031a296040  CPU: 2   COMMAND: "cerebrod"            
 #0 [ffff880028247e90] crash_nmi_callback+0x46 at ffffffff8102d306          
 #1 [ffff880028247ea0] notifier_call_chain+0x55 at ffffffff81513b45         
 #2 [ffff880028247ee0] atomic_notifier_call_chain+0x1a at ffffffff81513baa  
 #3 [ffff880028247ef0] notify_die+0x2e at ffffffff8109cf5e                  
 #4 [ffff880028247f20] do_nmi+0x1bb at ffffffff8151180b                     
 #5 [ffff880028247f50] nmi+0x20 at ffffffff815110d0                         
    [exception RIP: spl_kmem_cache_reap_now+66]                             
    RIP: ffffffffa05a4662  RSP: ffff88027d77d718  RFLAGS: 00000247          
    RAX: 0000000000000004  RBX: ffff8805ed400000  RCX: 0000000000000005     
    RDX: 0000000000000010  RSI: 0000000000000004  RDI: ffff8805ed400000     
    RBP: ffff88027d77d768   R8: 8080000000000000   R9: f9c13407f3201010     
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000004     
    R13: ffff88027d77d7a8  R14: ffff8805ed408040  R15: 0000000000013200     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
--- <NMI exception stack> ---                                               
 #6 [ffff88027d77d718] spl_kmem_cache_reap_now+0x42 at ffffffffa05a4662 [spl]
 #7 [ffff88027d77d770] __spl_kmem_cache_generic_shrinker+0x4b at ffffffffa05a69bb [spl]
 #8 [ffff88027d77d7a0] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
 #9 [ffff88027d77d7c0] shrink_slab+0x12a at ffffffff81131a9a                
#10 [ffff88027d77d820] do_try_to_free_pages+0x3f7 at ffffffff81133e77       
#11 [ffff88027d77d8c0] try_to_free_pages+0x92 at ffffffff81134262           
#12 [ffff88027d77d960] __alloc_pages_nodemask+0x478 at ffffffff8112bab8     
#13 [ffff88027d77daa0] alloc_pages_vma+0x9a at ffffffff8116064a             
#14 [ffff88027d77daf0] handle_pte_fault+0x76b at ffffffff8114396b           
#15 [ffff88027d77dbd0] handle_mm_fault+0x23a at ffffffff81143f8a            
#16 [ffff88027d77dc40] __do_page_fault+0x139 at ffffffff810474c9            
#17 [ffff88027d77dd60] do_page_fault+0x3e at ffffffff81513a8e               
#18 [ffff88027d77dd90] page_fault+0x25 at ffffffff81510e45                  
    [exception RIP: copy_user_generic_string+45]                            
    RIP: ffffffff8128243d  RSP: ffff88027d77de40  RFLAGS: 00010202          
    RAX: ffff88027d77c000  RBX: ffff88027d77df48  RCX: 0000000000000002     
    RDX: 0000000000000001  RSI: ffff88003a2c8000  RDI: 00002aaaac5bd000     
    RBP: ffff88027d77de98   R8: 0000000000000073   R9: ffff8806149180c4     
    R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000000011     
    R13: 00002aaaac5bd000  R14: 0000000000000011  R15: ffff88003a2c8000     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
#19 [ffff88027d77de40] lprocfs_fops_read+0x133 at ffffffffa0a154f3 [obdclass]
#20 [ffff88027d77dea0] proc_reg_read+0x7e at ffffffff811e974e               
#21 [ffff88027d77def0] vfs_read+0xb5 at ffffffff81181395                    
#22 [ffff88027d77df30] sys_read+0x51 at ffffffff811814d1                    
#23 [ffff88027d77df80] system_call_fastpath+0x16 at ffffffff8100b072        
    RIP: 00002aaaab3cc5f0  RSP: 00007fffffffdc60  RFLAGS: 00000206          
    RAX: 0000000000000000  RBX: ffffffff8100b072  RCX: 000000000063e288     
    RDX: 0000000000000400  RSI: 00002aaaac5bd000  RDI: 0000000000000001     
    RBP: 00007fffffffdb50   R8: 00000000ffffffff   R9: 0000000000000000     
    R10: 0000000000000022  R11: 0000000000000246  R12: 0000000000000000     
    R13: 0000000000000000  R14: 000000000063dfc0  R15: 0000000000000073     
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b                          

PID: 12788  TASK: ffff8806380caae0  CPU: 8   COMMAND: "ll_ost02_012"        
 #0 [ffff88034ac47e90] crash_nmi_callback+0x46 at ffffffff8102d306          
 #1 [ffff88034ac47ea0] notifier_call_chain+0x55 at ffffffff81513b45         
 #2 [ffff88034ac47ee0] atomic_notifier_call_chain+0x1a at ffffffff81513baa  
 #3 [ffff88034ac47ef0] notify_die+0x2e at ffffffff8109cf5e                  
 #4 [ffff88034ac47f20] do_nmi+0x1bb at ffffffff8151180b                     
 #5 [ffff88034ac47f50] nmi+0x20 at ffffffff815110d0                         
    [exception RIP: spl_kmem_cache_reap_now+350]                            
    RIP: ffffffffa05a477e  RSP: ffff8806086a94c0  RFLAGS: 00000246          
    RAX: ffff88034aa08bd0  RBX: ffff8806261f0000  RCX: 0000000000000034     
    RDX: 0000000000000010  RSI: ffff8806261f8040  RDI: ffff88034aa08bc8     
    RBP: ffff8806086a9510   R8: 8080000000000000   R9: f9bbe20947a01010     
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000004     
    R13: ffff8806086a9550  R14: ffff8806261f8040  R15: 0000000000013200     
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018                          
--- <NMI exception stack> ---                                               
 #6 [ffff8806086a94c0] spl_kmem_cache_reap_now+0x15e at ffffffffa05a477e [spl]
 #7 [ffff8806086a9518] __spl_kmem_cache_generic_shrinker+0x4b at ffffffffa05a69bb [spl]
 #8 [ffff8806086a9548] spl_kmem_cache_generic_shrinker+0x20 at ffffffffa05a6a70 [spl]
 #9 [ffff8806086a9568] shrink_slab+0x12a at ffffffff81131a9a                
#10 [ffff8806086a95c8] do_try_to_free_pages+0x3f7 at ffffffff81133e77       
#11 [ffff8806086a9668] try_to_free_pages+0x92 at ffffffff81134262           
#12 [ffff8806086a9708] __alloc_pages_nodemask+0x478 at ffffffff8112bab8     
#13 [ffff8806086a9848] kmem_getpages+0x62 at ffffffff811666a2               
#14 [ffff8806086a9878] fallback_alloc+0x1ba at ffffffff811672ba             
#15 [ffff8806086a98f8] ____cache_alloc_node+0x99 at ffffffff81167039        
#16 [ffff8806086a9958] kmem_cache_alloc+0x11b at ffffffff81167fbb           
#17 [ffff8806086a9998] cfs_mem_cache_alloc+0x22 at ffffffffa08eda72 [libcfs]
#18 [ffff8806086a99b8] ldlm_resource_get+0x152 at ffffffffa0b915a2 [ptlrpc] 
#19 [ffff8806086a9a28] ldlm_lock_create+0x55 at ffffffffa0b8c115 [ptlrpc]   
#20 [ffff8806086a9a78] ldlm_cli_enqueue_local+0xbe at ffffffffa0ba977e [ptlrpc]
#21 [ffff8806086a9b08] ofd_destroy_by_fid+0x321 at ffffffffa114d391 [ofd]   
#22 [ffff8806086a9c08] ofd_destroy+0x1a7 at ffffffffa11500d7 [ofd]          
#23 [ffff8806086a9c78] ost_handle+0x4349 at ffffffffa11247a9 [ost]          
#24 [ffff8806086a9db8] ptlrpc_server_handle_request+0x398 at ffffffffa0be3738 [ptlrpc]
#25 [ffff8806086a9eb8] ptlrpc_main+0xace at ffffffffa0be4ace [ptlrpc]       
#26 [ffff8806086a9f48] child_rip+0xa at ffffffff8100c0ca                    

(gdb) l *spl_kmem_cache_reap_now+0x15e                                      
0x57ae is in spl_kmem_cache_reap_now (/usr/src/debug/spl-kmod-0.6.1/spl-0.6.1/module/spl/spl-kmem.c:2205).
2200            smp_mb__after_clear_bit();                                  
2201            wake_up_bit(&skc->skc_flags, KMC_BIT_REAPING);              
2202                                                                        
2203            atomic_dec(&skc->skc_ref);                                  
2204                                                                        
2205            SEXIT;                                                      
2206    }                                                                   
2207    EXPORT_SYMBOL(spl_kmem_cache_reap_now);                             
2208                                                                        
2209    /*   
behlendorf added a commit to behlendorf/spl that referenced this issue Jul 26, 2013
It has been observed that it's possible to get in a state where
shrink_slab() will spin, repeatedly invoking the generic kmem cache
shrinker.  It fails to detect it's not making forward progress
reclaiming from the cache and doesn't give up.  To ensure this
never occurs we unconditionally return -1 after reclaiming what
we can.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs/zfs#1276
Issue openzfs/zfs#1598
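
The effect described in this commit message can be illustrated with a small userspace model. This is only a sketch, assuming the caller behaves roughly like the batch loop in the 2.6.32 `shrink_slab()` seen in the backtraces: it asks the shrinker for a count, requests a scan of one batch, and stops early only when the shrinker returns -1. The names `model_cache`, `model_shrinker`, `model_shrink_slab`, and the batch size are hypothetical and chosen for illustration; none of this is SPL or kernel code.

```c
/* Batch size used by the model loop (an assumption for this sketch). */
#define MODEL_SHRINK_BATCH 128

/* Hypothetical cache state for the model; not an SPL structure. */
struct model_cache {
    long objects;   /* objects the cache reports as reclaimable */
    long freeable;  /* objects a scan can actually free */
    int  give_up;   /* nonzero: return -1 after reclaiming what we can */
    long calls;     /* number of scan requests received */
};

/*
 * Model shrinker callback.  nr_to_scan == 0 is a query for the object
 * count; any other value asks the cache to reclaim up to that many objects.
 */
long model_shrinker(struct model_cache *c, long nr_to_scan)
{
    long freed;

    if (nr_to_scan == 0)
        return c->objects;

    c->calls++;
    freed = nr_to_scan < c->freeable ? nr_to_scan : c->freeable;
    c->freeable -= freed;
    c->objects -= freed;

    /* With give_up set, tell the caller to stop instead of reporting a count. */
    return c->give_up ? -1 : c->objects;
}

/*
 * Model of the caller: scan in batches until the scan budget is used up
 * or the shrinker returns -1.  Returns the number of objects counted as
 * reclaimed by the caller.
 */
long model_shrink_slab(struct model_cache *c, long total_scan)
{
    long reclaimed = 0;

    while (total_scan >= MODEL_SHRINK_BATCH) {
        long before = model_shrinker(c, 0);
        long after = model_shrinker(c, MODEL_SHRINK_BATCH);

        if (after == -1)
            break;  /* shrinker asked us to stop */
        if (after < before)
            reclaimed += before - after;
        total_scan -= MODEL_SHRINK_BATCH;
    }
    return reclaimed;
}
```

In this model, a cache that reports many objects but can free none of them is called once per batch for the whole scan budget, while the same cache with `give_up` set is called exactly once. The real spin also depends on how the kernel sizes the scan budget and on concurrent callers, which the model does not attempt to capture.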
ryao pushed a commit to ryao/spl that referenced this issue Jul 30, 2013
ryao pushed a commit to ryao/spl that referenced this issue Aug 7, 2013