PAGE_FAULT_IN_NONPAGED_AREA with Dokan 0.74 #46

Closed
ghost opened this issue Aug 22, 2015 · 29 comments

@ghost

ghost commented Aug 22, 2015

Just when I thought it was safe to develop on my laptop again.
I think this was caused when I was sitting at a breakpoint in Visual Studio.

3: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except,
it must be protected by a Probe.  Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: ffffffffffffffe8, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff800cc121c8f, If non-zero, the instruction address which referenced the bad memory
    address.
Arg4: 0000000000000000, (reserved)

Debugging Details:
------------------


Could not read faulting driver name

READ_ADDRESS: unable to get nt!MmSpecialPoolStart
unable to get nt!MmSpecialPoolEnd
unable to get nt!MmPagedPoolEnd
unable to get nt!MmNonPagedPoolStart
unable to get nt!MmSizeOfNonPagedPoolInBytes
 ffffffffffffffe8 

FAULTING_IP: 
nt!ObQueryNameStringMode+5f
fffff800`cc121c8f 410fb64618      movzx   eax,byte ptr [r14+18h]

MM_INTERNAL_CODE:  0

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  AV

PROCESS_NAME:  CCleaner64.exe

CURRENT_IRQL:  2

ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre

TRAP_FRAME:  ffffd0002663ed50 -- (.trap 0xffffd0002663ed50)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=00ffffffffffffff rbx=0000000000000000 rcx=0000000000000000
rdx=00000000000000ff rsi=0000000000000000 rdi=0000000000000000
rip=fffff800cc121c8f rsp=ffffd0002663eee0 rbp=0000000000000000
 r8=00000000000000fe  r9=ffffd0002663f078 r10=fffff801ea485b80
r11=ffffd0002663f148 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         ov up ei pl nz na po cy
nt!ObQueryNameStringMode+0x5f:
fffff800`cc121c8f 410fb64618      movzx   eax,byte ptr [r14+18h] ds:00000000`00000018=??
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff800cbe2904a to fffff800cbdd8d00

STACK_TEXT:  
ffffd000`2663eb08 fffff800`cbe2904a : 00000000`00000050 ffffffff`ffffffe8 00000000`00000000 ffffd000`2663ed50 : nt!KeBugCheckEx
ffffd000`2663eb10 fffff800`cbcaa536 : 00000000`00000000 00000000`00000000 ffffd000`2663ed50 ffffe000`c0e40180 : nt! ?? ::FNODOBFM::`string'+0x4174a
ffffd000`2663ec00 fffff800`cbde1dbd : 00000000`00008000 fffff800`cbefd68d ffffe000`c2dfd000 ffffd000`2663eda0 : nt!MmAccessFault+0x696
ffffd000`2663ed50 fffff800`cc121c8f : 00000000`00000800 00000000`00000801 20206f49`00000008 fffff800`cbcd93e6 : nt!KiPageFault+0x13d
ffffd000`2663eee0 fffff800`cc1b5126 : 00000000`00000000 ffffe000`bb5f9f00 ffffe000`000000fe ffffd000`2663f078 : nt!ObQueryNameStringMode+0x5f
ffffd000`2663f000 fffff801`ea49954c : ffffd000`2663f0f0 00000000`00000000 00000000`c000014f fffff800`cbd31a4d : nt!ObQueryNameString+0xe
ffffd000`2663f040 fffff801`ea498322 : 00000000`00000000 ffffd000`2663f0b0 00000000`00000000 00000000`00000000 : FLTMGR!FltpGetObjectName+0x30
ffffd000`2663f070 fffff801`ea492d8c : ffffe000`be1f2980 ffffe000`c2204080 00000000`00000000 00000000`00000000 : FLTMGR!FltpFsControlMountVolume+0xa2
ffffd000`2663f150 fffff800`cc16eae2 : fffff800`cbfca540 00000000`00000000 ffffe000`c0e40080 ffffe000`c0e40080 : FLTMGR!FltpFsControl+0x14c
ffffd000`2663f1b0 fffff800`cbd53364 : ffffe000`c2204080 ffffe000`bfc6a990 ffffe000`c2204080 ffffd000`2663f480 : nt!IopMountVolume+0x35a
ffffd000`2663f430 fffff800`cc0b7367 : 00000000`00000005 00000000`00000000 ffffd000`2663f790 00000000`00000000 : nt!IopCheckVpbMounted+0x154
ffffd000`2663f480 fffff800`cc0b29d1 : ffffc000`b5e2a718 ffffc000`b5e2a718 ffffd000`2663f790 ffffe000`c2204050 : nt!IopParseDevice+0x4a7
ffffd000`2663f690 fffff800`cc11138c : ffffe000`bb3c7001 ffffd000`2663f8b8 00000000`00000040 ffffe000`ba20bf20 : nt!ObpLookupObjectName+0x711
ffffd000`2663f830 fffff800`cc10d69c : 00000000`00000001 ffffe000`bfc6a990 00000000`009ef2e0 00000000`009ef2d0 : nt!ObOpenObjectByName+0x1ec
ffffd000`2663f960 fffff800`cc10d25c : 00000000`009ef2b8 00000000`00000000 00000000`009ef2e0 00000000`009ef2d0 : nt!IopCreateFile+0x38c
ffffd000`2663fa00 fffff800`cbde3363 : 00000000`00000102 00000000`00000001 00000000`00000001 00000000`00000000 : nt!NtOpenFile+0x58
ffffd000`2663fa90 00007fff`be6e382a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`009ef268 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007fff`be6e382a


STACK_COMMAND:  kb

FOLLOWUP_IP: 
nt!ObQueryNameStringMode+5f
fffff800`cc121c8f 410fb64618      movzx   eax,byte ptr [r14+18h]

SYMBOL_STACK_INDEX:  4

SYMBOL_NAME:  nt!ObQueryNameStringMode+5f

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  55c9bcb6

IMAGE_VERSION:  10.0.10240.16431

BUCKET_ID_FUNC_OFFSET:  5f

FAILURE_BUCKET_ID:  AV_nt!ObQueryNameStringMode

BUCKET_ID:  AV_nt!ObQueryNameStringMode

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:av_nt!obquerynamestringmode

FAILURE_ID_HASH:  {9eb41a12-81ba-71f6-0c3d-31180b493b51}

Followup: MachineOwner
---------

@ghost
Author

ghost commented Aug 22, 2015

Whoa, didn't see CCleaner there before. I'll disable it - I wonder what it tried to access?

@marinkobabic
Contributor

It tried to access the volume. If you can reproduce the problem by doing the same thing again, I will install CCleaner on my machine :-)

@ghost
Author

ghost commented Aug 22, 2015

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except,
it must be protected by a Probe.  Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: ffffffffffffffe9, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff802d7d07c8f, If non-zero, the instruction address which referenced the bad memory
    address.
Arg4: 0000000000000000, (reserved)

Debugging Details:
------------------


Could not read faulting driver name

READ_ADDRESS: unable to get nt!MmSpecialPoolStart
unable to get nt!MmSpecialPoolEnd
unable to get nt!MmPagedPoolEnd
unable to get nt!MmNonPagedPoolStart
unable to get nt!MmSizeOfNonPagedPoolInBytes
 ffffffffffffffe9 

FAULTING_IP: 
nt!ObQueryNameStringMode+5f
fffff802`d7d07c8f 410fb64618      movzx   eax,byte ptr [r14+18h]

MM_INTERNAL_CODE:  0

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  AV

PROCESS_NAME:  explorer.exe

CURRENT_IRQL:  2

ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre

TRAP_FRAME:  ffffd00027042fa0 -- (.trap 0xffffd00027042fa0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=00ffffffffffffff rbx=0000000000000000 rcx=0000000000000001
rdx=00000000000000ff rsi=0000000000000000 rdi=0000000000000000
rip=fffff802d7d07c8f rsp=ffffd00027043130 rbp=0000000000000001
 r8=00000000000000fe  r9=ffffd000270432c8 r10=fffff80130225b80
r11=ffffd00027043398 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         ov up ei pl nz na po cy
nt!ObQueryNameStringMode+0x5f:
fffff802`d7d07c8f 410fb64618      movzx   eax,byte ptr [r14+18h] ds:00000000`00000018=??
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff802d7a0f04a to fffff802d79bed00

STACK_TEXT:  
ffffd000`27042d58 fffff802`d7a0f04a : 00000000`00000050 ffffffff`ffffffe9 00000000`00000000 ffffd000`27042fa0 : nt!KeBugCheckEx
ffffd000`27042d60 fffff802`d7890536 : 00000000`00000000 00000000`00000000 ffffd000`27042fa0 00000000`00000002 : nt! ?? ::FNODOBFM::`string'+0x4174a
ffffd000`27042e50 fffff802`d79c7dbd : 00000000`00000000 ffffe001`ccb10001 00000000`00000006 ffffe001`00000006 : nt!MmAccessFault+0x696
ffffd000`27042fa0 fffff802`d7d07c8f : ffffd000`27043376 ffffc001`8356c060 ffffc001`8356c060 ffffd000`27043200 : nt!KiPageFault+0x13d
ffffd000`27043130 fffff802`d7d9b126 : 00000000`00000001 ffffe001`cd6de030 00000000`000000fe ffffd000`270432c8 : nt!ObQueryNameStringMode+0x5f
ffffd000`27043250 fffff801`3023954c : ffffd000`27043660 00000000`00000001 ffffc001`8356c060 fffff802`d7917a4d : nt!ObQueryNameString+0xe
ffffd000`27043290 fffff801`30238322 : 00000000`00000000 fffff802`d791718f 00000000`00000000 00000000`00000000 : FLTMGR!FltpGetObjectName+0x30
ffffd000`270432c0 fffff801`30232d8c : ffffe001`c91da040 ffffe001`c540bb70 00000000`00000000 00000000`00000000 : FLTMGR!FltpFsControlMountVolume+0xa2
ffffd000`270433a0 fffff802`d7d54ae2 : fffff802`d7bb0540 00000000`00000000 ffffe001`cc1b6800 ffffe001`cc1b6800 : FLTMGR!FltpFsControl+0x14c
ffffd000`27043400 fffff802`d7939364 : ffffe001`c540bb70 ffffd000`27043c00 ffffe001`c540bb70 ffffd000`270436d0 : nt!IopMountVolume+0x35a
ffffd000`27043680 fffff802`d7c9d367 : 00000000`00000045 00000000`00000000 ffffd000`270439e0 00000000`00000000 : nt!IopCheckVpbMounted+0x154
ffffd000`270436d0 fffff802`d7c989d1 : ffffc001`7dc2a718 ffffc001`7dc2a718 ffffd000`270439e0 ffffe001`c540bb40 : nt!IopParseDevice+0x4a7
ffffd000`270438e0 fffff802`d7cf738c : ffffe001`cc3d4401 ffffd000`27043b08 ffffe001`00000040 ffffe001`c520f440 : nt!ObpLookupObjectName+0x711
ffffd000`27043a80 fffff802`d7d7b6eb : ffffd001`00000001 00000000`041fe050 00000000`104a3ab4 00000000`000000b8 : nt!ObOpenObjectByName+0x1ec
ffffd000`27043bb0 fffff802`d79c9363 : ffffe001`cc1b6800 00000000`000003e8 ffffe001`cc1b6800 ffffd000`27043ec0 : nt!NtQueryAttributesFile+0x13b
ffffd000`27043e40 00007fff`e90e38ca : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`041fdfe8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007fff`e90e38ca


STACK_COMMAND:  kb

FOLLOWUP_IP: 
nt!ObQueryNameStringMode+5f
fffff802`d7d07c8f 410fb64618      movzx   eax,byte ptr [r14+18h]

SYMBOL_STACK_INDEX:  4

SYMBOL_NAME:  nt!ObQueryNameStringMode+5f

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  55c9bcb6

IMAGE_VERSION:  10.0.10240.16431

BUCKET_ID_FUNC_OFFSET:  5f

FAILURE_BUCKET_ID:  AV_nt!ObQueryNameStringMode

BUCKET_ID:  AV_nt!ObQueryNameStringMode

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:av_nt!obquerynamestringmode

FAILURE_ID_HASH:  {9eb41a12-81ba-71f6-0c3d-31180b493b51}

Followup: MachineOwner
---------

I don't think it was CCleaner after all, but I can't reproduce this reliably. It doesn't happen on Server 2012 Core (my VM), only on Windows 10. I had exited my filesystem exe but still had a command line open at T:. On Server 2012, running dir returns "The system cannot find the file specified", but on Windows 10 it causes a hang and then a crash.

I also managed to get the BSOD in #26 again but let's solve this one first.

@ghost
Author

ghost commented Aug 22, 2015

OK, I might have reproduced it in a strange way.

  • Mount a drive using mirror.exe
  • Open a command prompt and cd T:\
  • Attach windbg to mirror.exe so it halts
  • dir T:\
  • Exit windbg
  • dir T:\
  • Crash

If it doesn't happen the first time, it may happen the second, and there might be a fair delay before the crash.

@Liryna
Member

Liryna commented Aug 22, 2015

@ghost
Author

ghost commented Aug 22, 2015

@Liryna sorry I hadn't finished writing the comment, I am testing now.

Edit: see #46 (comment)

@Liryna
Member

Liryna commented Aug 23, 2015

I am currently working on the .NET wrapper (on Win 10) and I am probably facing the same issue as you.

It seems that the sys driver is still alive even when the application is paused at a breakpoint or stopped. The device is like a zombie and does not unmount as it should.

In my case, when I stop DokanNetMirror during debugging or at a breakpoint, the Dokan drive remains but is impossible to access, and the system becomes unstable until it crashes.

@marinkobabic
Contributor

Yes, the driver is still alive. Depending on how many threads you have defined, those threads will each try to execute your code. If keep alive is turned on, the method will be executed successfully and the driver will not unmount. If you are debugging for a long time, the driver will tell the system that there are not enough resources. You should never get a blue screen.

By the way you could make your DokanError enum as ulong. This would simplify the interface and the communication to driver. No translation between DokanError and NTSTATUS needed, because DokanError is the NTSTATUS.
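
The idea of an error type whose values already are NTSTATUS codes can be sketched in C, the language of the driver and mirror.c, rather than in the .NET wrapper. This is only an illustration: `sketch_error_t`, the `SKETCH_*` names and both helper functions are hypothetical and not part of Dokan; the numeric values are the standard codes from ntstatus.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical error type whose values ARE NTSTATUS codes.
   An unsigned 32-bit type is used so the sketch builds without
   Windows headers. */
typedef uint32_t sketch_error_t;

#define SKETCH_SUCCESS               ((sketch_error_t)0x00000000u) /* STATUS_SUCCESS */
#define SKETCH_ACCESS_DENIED         ((sketch_error_t)0xC0000022u) /* STATUS_ACCESS_DENIED */
#define SKETCH_OBJECT_NAME_NOT_FOUND ((sketch_error_t)0xC0000034u) /* STATUS_OBJECT_NAME_NOT_FOUND */

/* Because the values already match, no lookup table is needed:
   the conversion to NTSTATUS is the identity. */
static uint32_t sketch_to_ntstatus(sketch_error_t err) {
    return err;
}

/* The top two bits of an NTSTATUS encode the severity;
   both bits set (0xC.......) means "error". */
static int sketch_is_error(sketch_error_t err) {
    return (err >> 30) == 3u;
}

/* Usage example. */
static void sketch_error_example(void) {
    assert(sketch_to_ntstatus(SKETCH_ACCESS_DENIED) == 0xC0000022u);
    assert(sketch_is_error(SKETCH_OBJECT_NAME_NOT_FOUND));
    assert(!sketch_is_error(SKETCH_SUCCESS));
}
```

On Windows, NTSTATUS is declared as a signed 32-bit LONG, so the same bit patterns print as negative numbers when viewed as signed integers; the sketch uses an unsigned type only to stay portable.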

@Liryna
Member

Liryna commented Aug 23, 2015

@marinkobabic DokanOptions.KeepAlive 👍 I always forget Thank you 😄

I also changed DokanError to the long type (since there are negative values).

@marinkobabic
Contributor

@Liryna
This is exactly the point. You should not have a negative value. NTSTATUS has no negative value. The developer must know what should be returned to the file system. DokanError provides the most commonly used options, so choose one of them. Code like -1 should not exist.

@Liryna
Member

Liryna commented Aug 23, 2015

😣!
Why was this done...
But changing this would break compatibility...

@marinkobabic
Contributor

Nobody is forced to move to new version :-)

@Liryna
Member

Liryna commented Aug 23, 2015

About @voltagex's first issue, with the step "Attach windbg to mirror.exe so it halts":
WinDbg does not only break mirror.exe but also the service (or leaves it in an unstable state).
This makes it impossible for the sys driver to contact the service for unmounting.

I have to restart the service to make it work again.

@ghost
Author

ghost commented Aug 23, 2015 via email

@Liryna
Member

Liryna commented Aug 23, 2015

Dokan is using a service to mount and unmount the device.

When you enable the DOKAN_OPTION_KEEP_ALIVE option (https://github.com/dokan-dev/dokany/blob/master/dokan_mirror/mirror.c#L1126),
the sys driver checks X times whether the userland software is still there.
When it is not, the sys driver contacts the service to force an unmount.
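
The keep-alive behaviour described above can be sketched as a small watchdog counter. This is not the Dokan driver's code: `keepalive_t`, `keepalive_seen` and `keepalive_tick` are hypothetical names, and the way the real driver detects a live process is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical watchdog state kept for one mounted device. */
typedef struct {
    unsigned missed;      /* consecutive checks with no sign of life */
    unsigned max_missed;  /* the "X times" from the description above */
} keepalive_t;

/* Called whenever the user-mode process proves it is still running. */
static void keepalive_seen(keepalive_t *k) {
    k->missed = 0;
}

/* Called on every periodic check. Returns true once the process has
   been silent for max_missed checks in a row, i.e. when the driver
   would ask the service to force an unmount. */
static bool keepalive_tick(keepalive_t *k) {
    if (k->missed < k->max_missed) {
        k->missed++;
    }
    return k->missed >= k->max_missed;
}

/* Usage example: three silent checks in a row trigger the unmount. */
static void keepalive_example(void) {
    keepalive_t k = { 0, 3 };
    assert(!keepalive_tick(&k));  /* 1 missed */
    assert(!keepalive_tick(&k));  /* 2 missed */
    keepalive_seen(&k);           /* process answered, counter resets */
    assert(!keepalive_tick(&k));  /* 1 missed */
    assert(!keepalive_tick(&k));  /* 2 missed */
    assert(keepalive_tick(&k));   /* 3 missed: force unmount */
}
```

A process frozen under a debugger never calls the equivalent of `keepalive_seen`, so under this model it looks the same as a process that has exited.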

But when you attach to mirror with windbg, the service becomes unusable for some reason (no idea why). This makes the driver unable to unmount.

The drive stays alive and answers every call with "not enough resources".
Some software (CCleaner, cmd.exe) that tries to access it becomes unstable as well, because it does not get a proper answer.

As @marinkobabic explained in another post, this could be fixed by removing the service. See #45

@marinkobabic
Contributor

Will try to reproduce the issue tomorrow. The service should not be affected. After the mount you can even shut down the service and everything should work properly except the unmount. But as you described above, the BSOD happened while debugging and not during the unmount.

You can mount, then shut down the service, start debugging, and see if the BSOD still occurs.

@marinkobabic
Contributor

I was unfortunately not able to reproduce the BSOD reported here.

What happens in the background and makes the system unstable:
If the developer uses WinDbg to attach to the user-mode application, there seems to be no more multithreading. The whole application stops and a lot of IRPs time out. This is different with Visual Studio: only one thread hangs and the other IRPs are processed.

It seems that a lot of STATUS_INSUFFICIENT_RESOURCES results cause the system to become unstable, because the crash does not happen inside the Dokan library.

What should be changed:

  1. Generally, the developer should have the option to set the global timeout for the IRPs and also to call ResetTimeout if an operation takes longer than normal. This way we will not have a lot of timed-out IRPs and the system remains stable during debugging.
  2. On timeout, the driver should not return STATUS_INSUFFICIENT_RESOURCES (it is not true). This actually seems to make the whole system unstable. The return value should be STATUS_RETRY or STATUS_IO_TIMEOUT. I prefer the first option.
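
The two proposed changes can be sketched together as follows. This is a user-mode illustration under stated assumptions, not the driver's implementation: `pending_request_t` and the `request_*` functions are hypothetical, the 15 s and 30 s values are arbitrary, and the status values are the standard NTSTATUS codes repeated locally so the block builds without Windows headers.

```c
#include <assert.h>
#include <stdint.h>

/* Standard NTSTATUS values, repeated here for portability. */
#define SK_STATUS_PENDING 0x00000103u  /* STATUS_PENDING */
#define SK_STATUS_RETRY   0xC000022Du  /* STATUS_RETRY */

/* Hypothetical bookkeeping for one request waiting on user mode. */
typedef struct {
    uint64_t deadline_ms;  /* absolute time at which the request expires */
} pending_request_t;

/* Point 1: start the clock with a timeout chosen by the developer. */
static void request_start(pending_request_t *r, uint64_t now_ms, uint64_t timeout_ms) {
    r->deadline_ms = now_ms + timeout_ms;
}

/* Point 1: ResetTimeout-style extension for a slow operation. */
static void request_reset_timeout(pending_request_t *r, uint64_t now_ms, uint64_t extra_ms) {
    r->deadline_ms = now_ms + extra_ms;
}

/* Point 2: an expired request completes with STATUS_RETRY rather
   than STATUS_INSUFFICIENT_RESOURCES. */
static uint32_t request_check(const pending_request_t *r, uint64_t now_ms) {
    return now_ms >= r->deadline_ms ? SK_STATUS_RETRY : SK_STATUS_PENDING;
}

/* Usage example. */
static void request_example(void) {
    pending_request_t r;
    request_start(&r, 0, 15000);              /* expires at 15 s */
    assert(request_check(&r, 10000) == SK_STATUS_PENDING);
    request_reset_timeout(&r, 10000, 30000);  /* now expires at 40 s */
    assert(request_check(&r, 30000) == SK_STATUS_PENDING);
    assert(request_check(&r, 40000) == SK_STATUS_RETRY);
}
```

With a long enough timeout, or with periodic resets, a developer sitting at a breakpoint would not cause a burst of expired requests in this model.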

@Liryna
Could you please increase the timeout in the driver, return STATUS_RETRY on timeout, and test locally on your machine whether you can debug stably without any issues?

@Liryna
Member

Liryna commented Aug 26, 2015

@marinkobabic Are you testing on Win8.1?
It is impossible for me to reproduce it on Win8.1. The device is practically instantly removed.
The behaviour that I described was on my Win10 machine. I will try on it later.

I agree that we could give the ability to the user to set the IRP timeout and ResetTimeout.

Otherwise, it seems that every crash reported so far occurs during a Device Name request.
@voltagex Could you try to build mirror.c and change dokanOperations->GetVolumeInformation = MirrorGetVolumeInformation; to dokanOperations->GetVolumeInformation = NULL; ?
https://github.com/dokan-dev/dokany/blob/master/dokan_mirror/mirror.c#L1152
And try to reproduce the crash.
This will let the driver answer the Device Name request itself.

Probably this function should never fail?
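
The effect of setting a callback to NULL can be sketched with a cut-down operations table. Everything here is hypothetical (`sketch_operations_t`, `GetVolumeName`, `query_volume_name`, and the "DEFAULT" placeholder name); it only illustrates the general pattern of falling back to a built-in answer when the user-mode callback is absent, and does not reproduce Dokan's actual dispatch code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical optional callback: fills buf with a volume name. */
typedef int (*get_volume_name_fn)(char *buf, size_t len);

/* Hypothetical, cut-down operations table. */
typedef struct {
    get_volume_name_fn GetVolumeName;  /* NULL means "not implemented" */
} sketch_operations_t;

/* Dispatcher: call the user callback when present; otherwise answer
   with a built-in default so the request never waits on user code. */
static int query_volume_name(const sketch_operations_t *ops, char *buf, size_t len) {
    if (ops->GetVolumeName != NULL) {
        return ops->GetVolumeName(buf, len);
    }
    if (len == 0) {
        return -1;  /* no room for even the terminator */
    }
    strncpy(buf, "DEFAULT", len - 1);
    buf[len - 1] = '\0';
    return 0;
}

/* Usage example: with a NULL callback the default name is returned. */
static void query_example(void) {
    sketch_operations_t ops = { NULL };
    char name[16];
    assert(query_volume_name(&ops, name, sizeof name) == 0);
    assert(strcmp(name, "DEFAULT") == 0);
}
```

If the crash disappears with the callback set to NULL, that would point at the path through the user-mode callback rather than at the request itself.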

@marinkobabic
Contributor

@Liryna
I was testing on Windows 10 and I had multiple different BSODs while blocking one, two or more threads and causing them to time out this way. Then I used the task manager to kill the application.

KeepAlive was not used in this case. So then we have the next question: why is keep alive an option? Is there a reason to turn it off?

@Liryna
Member

Liryna commented Aug 26, 2015

@marinkobabic hahaha! Exactly! I thought the same!
I think it is the same answer as for why NTSTATUS * -1.

If you see no reason either, we should force KeepAlive.

@marinkobabic
Contributor

@Liryna
So you are going to introduce some breaking changes? 😏
One more for the todo list. Please also add "use LookasideList" for fcb and ccb to the todo list.

@Liryna
Member

Liryna commented Aug 26, 2015

Done.
We will do both at the same time :)

It would be nice to fix this BSOD before beginning the TODO. That way we will start from a "stable" release.

@viciousviper

@marinkobabic

What should be changed:

  1. Generally, the developer should have the option to set the global timeout for the IRPs and also to call ResetTimeout if an operation takes longer than normal. This way we will not have a lot of timed-out IRPs and the system remains stable during debugging.

Would it be possible to make the timeout configurable per drive via an option of the Mount method?

This would improve the situation for scenarios with heterogeneous Dokan drives, e.g. local and networked drives mounted at the same time.
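
A per-drive timeout of the kind requested here could look roughly like the sketch below. The struct, its field names and the default value are all hypothetical and chosen for illustration; the thread does not show how the option was eventually exposed.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary global default used only by this sketch. */
#define SKETCH_DEFAULT_TIMEOUT_MS 15000u

/* Hypothetical per-mount options; timeout_ms == 0 means
   "use the global default". */
typedef struct {
    const char *mount_point;
    uint32_t    timeout_ms;
} sketch_mount_options_t;

/* Resolve the timeout that applies to one mounted drive. */
static uint32_t effective_timeout_ms(const sketch_mount_options_t *opt) {
    return opt->timeout_ms != 0 ? opt->timeout_ms : SKETCH_DEFAULT_TIMEOUT_MS;
}

/* Usage example: a local drive keeps the default, a networked
   drive gets a longer timeout. */
static void mount_options_example(void) {
    sketch_mount_options_t local  = { "T:\\", 0 };
    sketch_mount_options_t remote = { "N:\\", 60000 };
    assert(effective_timeout_ms(&local) == 15000u);
    assert(effective_timeout_ms(&remote) == 60000u);
}
```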

@marinkobabic
Contributor

@viciousviper
Will change the code according to your request. My laptop is currently not working, so there will be some delay :-(

@ghost
Author

ghost commented Sep 21, 2015

Any updates on this?

@Liryna
Member

Liryna commented Sep 21, 2015

#55
It has been done but no release has been made yet; we are waiting for more changes before releasing 0.8.0.

@Liryna
Member

Liryna commented Sep 28, 2015

@voltagex https://github.com/dokan-dev/dokany/releases/tag/v0.8.0
This is a pre-release, so probably unstable. We would like some feedback about it!

@ghost
Author

ghost commented Oct 10, 2015

@Liryna 0.8 RC2 - I still get a BSOD, but it really doesn't look like it was caused by Dokan! Will post another issue just in case

@Liryna
Member

Liryna commented Oct 10, 2015

Fixed and released with #55 and the pre-release 0.8.0.
Feel free to reopen in case you face a timeout issue.

@Liryna Liryna closed this as completed Oct 10, 2015