Driver load BSOD with Aomei driver #384
BSOD type 2
|
ATTEMPTED_WRITE_TO_READONLY_MEMORY
|
unloading driver
|
@lundman commented 8 hours ago:
Maybe it's worth asking on the OSR Developer Community forum? NTDEV looks like a close match.
The last time I posted on OSR was circa 2000, so things might have changed since. But the community seemed very friendly. :) |
Ah yeah, I want to chase it down a bit further before I post over there. |
Oh! Thanks for the personal update :) |
Probably still on there... perhaps I come across as too aristocratic... :) |
This is the earliest crash (in terms of stack depth) I've seen:
|
OK, so this could be an indication:
We are called, we pass down to StorPort, which ends up calling us again, and we try to call StorPort again. This is not ideal; it is odd that StorPort would call us at all. |
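The loop described above (we call StorPort, StorPort calls us, we call StorPort again) can be illustrated with a simple depth guard. This is a hypothetical user-mode C11 sketch, not the fix that was actually made: `our_dispatch`, `lower_layer` and the depth counter are invented names, and a real kernel driver would keep such state per request or per device rather than in a thread-local variable.

```c
/* Hypothetical re-entrancy guard, as a user-mode C11 sketch.
 * ASSUMPTION: a real kernel driver cannot rely on _Thread_local;
 * it would keep this flag in per-request or per-device state. */
static _Thread_local int dispatch_depth = 0;

/* Counts how many times we actually forwarded a request downwards. */
static int forwarded_count = 0;

/* Hypothetical stand-in for "pass the request down to StorPort". */
static int lower_layer(int request);

/* Our dispatch routine: forwards once, refuses to forward again when re-entered. */
static int our_dispatch(int request)
{
    if (dispatch_depth > 0)
        return -1; /* re-entered from below: do not call down again */

    dispatch_depth++;
    forwarded_count++;
    int status = lower_layer(request);
    dispatch_depth--;
    return status;
}

/* Simulates the observed behaviour: the lower layer calls back into us. */
static int lower_layer(int request)
{
    int inner = our_dispatch(request); /* the unexpected callback */
    return inner == -1 ? 0 : inner;    /* treat the refused re-entry as handled */
}
```

Each top-level call to `our_dispatch` forwards exactly once; the nested callback is refused instead of bouncing back down to the lower layer.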
OK, so fixing that, and the things around it, does not fix the memory corruption; that still exists somewhere outside of the dispatcher. |
It would be interesting to see the changes nevertheless. I guess they will be committed sooner or later. |
Heh, they are all pretty desperate attempts to narrow the issue down. Next is commenting out large chunks in the hope that it either goes away, or not. |
OK, I have taken out everything ZFS: no SPL, no ZFS, no dispatcher, no dprintf, no WPP. All that is left is DriverStart() calling StorPortInitialize(). Empty driver, still a BSOD.
Using
|
Then there is this crash:
My x64 is a bit rusty, but it seems to check whether rax is NULL; since it isn't, it does not jump. Then, at the next store to rax+0x10, rax is suddenly NULL, and we crash! Neat. So something would be changing my rax on the fly, if I read that correctly. I believe only preemption/interrupts can execute between those instructions, so is something destroying rax somewhere? In the x64 calling convention rax is volatile across calls, but an interrupt must still restore it before execution resumes here. UPDATE
|
Could you step through that code by chance? My improvised guess is that:
|
Here is the whole function:
I don't see any jumps to |
It looks you're right:
Very strange indeed. |
Which usually means an interrupt: hardware, preemption, DPC or similar. But I've commented out all of those things, calls to __cpuidex(), assembly, etc. That could explain why the crashes are random, and often involve NETIO (the most common source of interrupts on my system). |
I turned on storport verifylevel 1, and Driver Verifier for ambdrv, volsnap and openzfs. This is one of the quickest crashes so far: volsnap
This looks corrupt (the stack, maybe), so I'm not sure it adds any more clues. Going up, frame #4 in the stack is even at address 0x0807060504030201, which seems "suspect". It comes from volsnap:
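The value 0x0807060504030201 is exactly what the bytes 01 02 03 04 05 06 07 08, stored in that order, read back as on a little-endian x64 CPU. It is also not a canonical x64 address (bits 63 through 48 must all match bit 47), which supports the suspicion that this stack slot holds data rather than a real return address. The sketch below only demonstrates that byte-order reading; `slot_from_bytes` and `looks_like_counting_pattern` are invented helper names, and the code says nothing about where the pattern came from.

```c
#include <stdint.h>
#include <string.h>

/* Interpret eight consecutive bytes as a 64-bit value in native byte
 * order, the way an x64 (little-endian) CPU reads a pointer-sized slot. */
static uint64_t slot_from_bytes(const uint8_t bytes[8])
{
    uint64_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Returns 1 if the value's bytes, lowest first, are 01 02 .. 08,
 * i.e. the value 0x0807060504030201; independent of host byte order. */
static int looks_like_counting_pattern(uint64_t slot)
{
    for (int i = 0; i < 8; i++) {
        if (((slot >> (8 * i)) & 0xFF) != (uint64_t)(i + 1))
            return 0;
    }
    return 1;
}
```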
IofCallDriver(DeviceObject, goes into
Maybe we return something on the stack we shouldn't. |
Interesting that the last call to us
|
What I have seen with PowerShell `Get-Volume` (Open-ZFS ..f74):

DriveLetter  FriendlyName  FileSystemType  DriveType  HealthStatus  OperationalStatus  SizeRemaining  Size
I            Volume2       NTFS            Fixed      Healthy       OK                 3.88 GB        3.91 GB

Drive D is a ZFS pool and the only entry with a drive letter; it shows an unknown filesystem type, unknown status and size 0. |
@guenther-alka, your output is so messed up that it took me half an hour to reconstruct the structure:
Also, I guess you're posting to the wrong issue. |
@lundman commented 2 days ago:
[...]
@lundman commented 2 days ago:
Umm, is there anything left to comment out except for
This time I would rather blame logic coming from other drivers.
That would be very interesting.
To me it already suggests that Aomei is buggy. |
I found a forum where folks skilled in NT kernel crashes might be lurking, e.g. this Storport.sys BSOD case. |
I took out the |
If Windows crashes with ZFS for an unknown reason, my point was to check where ZFS behaves differently from other volumes, and undefined/unknown states are one such point. |
Yeah, the difference there is interesting. I do suspect we aren't handling MOUNTMGR correctly, so we will have to return to it. But that issue kicks in when something is mounted, and I can crash just by loading/unloading the driver, without ever using the ZFS parts. But let's put that on the list to look into. |
So, instead of calling our zvol's and call it's
But it suggests that it isn't the code. I tried adding various Registry entries from the .inf file, without luck. Now I am curious whether Service-Fabric-2's miniport driver loads with Aomei at all. |
Err, the actual stack:
|
Since it is not actually calling us, we can't skip ambakdrv.sys based on the "name of caller". I suppose we could skip loading StorPort if we detect that it is running on the system. |
OK to get some progress, what I will do for now is, the default value of |
I have been trying to find out why loading openzfs.sys will BSOD if Aomei's BackUpper is installed. The crash is generally not instant, but comes a few seconds (1-10 s) later. The stack is indicative of a stack overflow or list corruption, and is never related to either driver. It usually happens while receiving network IO.
I do not think there is a bug in Aomei; rather, the crash is simply triggered by it querying openzfs.sys.
However, I can ignore all Driver->MajorFunction calls from Aomei, and we will STILL crash. So it isn't anything returned by the dispatcher. But what else is there?
If I do not call StorPortInitialize(), we do not crash. I do not think there is a StorPort issue; rather, StorPort creates a device type that Aomei is interested in, and without StorPort, whatever code triggers the BSOD is simply never executed. It doesn't always BSOD: of 5 loads, it crashes about 4 times. Sometimes it loads fine and I can use it, benchmark it, etc. So presumably there is a race somewhere, but not in the calls to MajorFunctions.
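Because the crash is intermittent (about 4 of 5 loads), a single clean load after commenting something out proves little. One rough way to decide how many clean loads to wait for is to compute how likely a run of clean loads is by luck alone. This sketch assumes every load crashes independently with the same probability, which a race may not honour; `chance_all_clean` and `loads_needed` are hypothetical helper names.

```c
/* Chance that n independent loads all come up clean by pure luck,
 * given a per-load crash probability p_crash.
 * ASSUMPTION: loads are independent and the crash rate is constant. */
static double chance_all_clean(double p_crash, int n)
{
    double p = 1.0;
    for (int i = 0; i < n; i++)
        p *= 1.0 - p_crash;
    return p;
}

/* Smallest number of consecutive clean loads that would occur by luck
 * with probability below alpha if nothing had actually been fixed.
 * Returns -1 for inputs where no such number exists. */
static int loads_needed(double p_crash, double alpha)
{
    if (p_crash <= 0.0 || p_crash > 1.0 || alpha <= 0.0)
        return -1;
    int n = 0;
    while (chance_all_clean(p_crash, n) >= alpha)
        n++;
    return n;
}
```

With the roughly 4-in-5 crash rate, `loads_needed(0.8, 0.01)` is 3 and `loads_needed(0.8, 0.001)` is 5 (0.2^5 = 0.00032), so five clean loads in a row would be a fairly convincing sign that a commented-out chunk mattered.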
I am unsure what to do next. A lot of sources recommend verifier, but this is a driver load issue, so I do not know if that will help. I doubt the Aomei developers are interested in debugging.
Example stack of BSOD:
!analyze -v