Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

start/pause/stop: Readd 'generateHandler' helper #46

Closed
wants to merge 1 commit into from

Conversation

cfergeau
Copy link
Contributor

@cfergeau cfergeau commented Sep 10, 2022

This reintroduces code which was removed in 481c580.
The Block_copy/Block_release calls from this helper seem to be needed on my
x86_64 macOS11 machine. Without this, example/linux/virtualization segfaults
in startWithCompletionHandler.

This fixes this ASAN error:

=================================================================
==56003==ERROR: AddressSanitizer: requested allocation size 0x53cb4e83f8b48 (0x53cb4e83f9b48 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T6)
    #0 0x43f0400 in wrap_malloc+0xa0 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x44400)
    #1 0x7fff205753ba in _Block_copy+0x5e (libsystem_blocks.dylib:x86_64+0x13ba)
    #2 0x7fff6f594307 in Base::BlockPtr<void (bool)> Base::BlockPtr<void (bool)>::from_callable<-[VZVirtualMachine startWithCompletionHandler:]::$_13>(-[VZVirtualMachine startWithCompletionHandler:]::$_13)::'lambda'(void*, bool)::__invoke(void*, bool)+0xb37 (Virtualization:x86_64+0x24307)
    #3 0x43ef5fa in __wrap_dispatch_async_block_invoke+0xca (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x435fa)
    #4 0x7fff20691622 in _dispatch_call_block_and_release+0xb (libdispatch.dylib:x86_64+0x2622)
    #5 0x7fff20692805 in _dispatch_client_callout+0x7 (libdispatch.dylib:x86_64+0x3805)
    #6 0x7fff206985e9 in _dispatch_lane_serial_drain+0x25d (libdispatch.dylib:x86_64+0x95e9)
    #7 0x7fff206990ac in _dispatch_lane_invoke+0x16d (libdispatch.dylib:x86_64+0xa0ac)
    #8 0x7fff206a2c0c in _dispatch_workloop_worker_thread+0x32a (libdispatch.dylib:x86_64+0x13c0c)
    #9 0x7fff2083945c in _pthread_wqthread+0x139 (libsystem_pthread.dylib:x86_64+0x345c)
    #10 0x7fff2083842e in start_wqthread+0xe (libsystem_pthread.dylib:x86_64+0x242e)

This reintroduces code which was removed in 481c580.
The Block_copy/Block_release calls from this helper seem to be needed on my
x86_64 macOS11 machine. Without this, example/linux/virtualization segfaults
in startWithCompletionHandler.

This fixes this ASAN error:
=================================================================
==56003==ERROR: AddressSanitizer: requested allocation size 0x53cb4e83f8b48 (0x53cb4e83f9b48 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T6)
    #0 0x43f0400 in wrap_malloc+0xa0 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x44400)
    Code-Hex#1 0x7fff205753ba in _Block_copy+0x5e (libsystem_blocks.dylib:x86_64+0x13ba)
    Code-Hex#2 0x7fff6f594307 in Base::BlockPtr<void (bool)> Base::BlockPtr<void (bool)>::from_callable<-[VZVirtualMachine startWithCompletionHandler:]::$_13>(-[VZVirtualMachine startWithCompletionHandler:]::$_13)::'lambda'(void*, bool)::__invoke(void*, bool)+0xb37 (Virtualization:x86_64+0x24307)
    Code-Hex#3 0x43ef5fa in __wrap_dispatch_async_block_invoke+0xca (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x435fa)
    Code-Hex#4 0x7fff20691622 in _dispatch_call_block_and_release+0xb (libdispatch.dylib:x86_64+0x2622)
    Code-Hex#5 0x7fff20692805 in _dispatch_client_callout+0x7 (libdispatch.dylib:x86_64+0x3805)
    Code-Hex#6 0x7fff206985e9 in _dispatch_lane_serial_drain+0x25d (libdispatch.dylib:x86_64+0x95e9)
    Code-Hex#7 0x7fff206990ac in _dispatch_lane_invoke+0x16d (libdispatch.dylib:x86_64+0xa0ac)
    Code-Hex#8 0x7fff206a2c0c in _dispatch_workloop_worker_thread+0x32a (libdispatch.dylib:x86_64+0x13c0c)
    Code-Hex#9 0x7fff2083945c in _pthread_wqthread+0x139 (libsystem_pthread.dylib:x86_64+0x345c)
    Code-Hex#10 0x7fff2083842e in start_wqthread+0xe (libsystem_pthread.dylib:x86_64+0x242e)
@Code-Hex
Copy link
Owner

@cfergeau Thanks.

It seems the problem is defer handler.Delete() place. We have to call in the completion handler.
I want to make sure this fix is really good, so please write a test code.

cfergeau added a commit to cfergeau/vz that referenced this pull request Sep 19, 2022
cfergeau added a commit to cfergeau/vz that referenced this pull request Sep 19, 2022
@cfergeau
Copy link
Contributor Author

I tested git master with this change, but I still get the asan error described in the commit log.
This is related to an allocation which is far too big, not memory corruption/use after free. I don't know i f the nested ^{ could be causing this?

diff --git a/virtualization.go b/virtualization.go
index 019dacb..fd80a4e 100644
--- a/virtualization.go
+++ b/virtualization.go
@@ -222,6 +222,7 @@ func virtualMachineCompletionHandler(cgoHandlerPtr, errPtr unsafe.Pointer) {
 	} else {
 		handler(nil)
 	}
+	cgoHandler.Delete()
 }
 
 func makeHandler(fn func(error)) (func(error), chan struct{}) {
@@ -239,7 +240,6 @@ func makeHandler(fn func(error)) (func(error), chan struct{}) {
 func (v *VirtualMachine) Start(fn func(error)) {
 	h, done := makeHandler(fn)
 	handler := cgo.NewHandle(h)
-	defer handler.Delete()
 	C.startWithCompletionHandler(v.Ptr(), v.dispatchQueue, unsafe.Pointer(&handler))
 	<-done
 }
@@ -251,7 +251,6 @@ func (v *VirtualMachine) Start(fn func(error)) {
 func (v *VirtualMachine) Pause(fn func(error)) {
 	h, done := makeHandler(fn)
 	handler := cgo.NewHandle(h)
-	defer handler.Delete()
 	C.pauseWithCompletionHandler(v.Ptr(), v.dispatchQueue, unsafe.Pointer(&handler))
 	<-done
 }
@@ -263,7 +262,6 @@ func (v *VirtualMachine) Pause(fn func(error)) {
 func (v *VirtualMachine) Resume(fn func(error)) {
 	h, done := makeHandler(fn)
 	handler := cgo.NewHandle(h)
-	defer handler.Delete()
 	C.resumeWithCompletionHandler(v.Ptr(), v.dispatchQueue, unsafe.Pointer(&handler))
 	<-done
 }
@@ -299,7 +297,6 @@ func (v *VirtualMachine) Stop(fn func(error)) {
 	}
 	h, done := makeHandler(fn)
 	handler := cgo.NewHandle(h)
-	defer handler.Delete()
 	C.stopWithCompletionHandler(v.Ptr(), v.dispatchQueue, unsafe.Pointer(&handler))
 	<-done
 }

cfergeau added a commit to cfergeau/vz that referenced this pull request Sep 19, 2022
@Code-Hex
Copy link
Owner

I think you may use go build -asan instead of clang. Because those allocations are not completed in just C world.

Please write test code. Thanks

@cfergeau
Copy link
Contributor Author

I think you may use go build -asan instead of clang. Because those allocations are not completed in just C world.

Please write test code. Thanks

-asan does not work on x86 with the go version I have :( But if I remove all -fsanitize=address from LDFLAGS, and try to run virtualization from git master, or with the changes you suggested, both segfault. With the change from this PR, there is no such segfault.

@Code-Hex
Copy link
Owner

Code-Hex commented Sep 19, 2022

@cfergeau I mean, writing a test code is writing TestXXX function in xxx_test.go. In current PR, you should write TestStart, TestStop, TestResume, TestRequestStop functions in virtualization_test.go

I'm very sorry but I can't trust without some test code.


-asan option is introduced in go1.18

https://tip.golang.org/doc/go1.18

@cfergeau
Copy link
Contributor Author

cfergeau commented Sep 20, 2022

@cfergeau I mean, writing a test code is writing TestXXX function in xxx_test.go. In current PR, you should write TestStart, TestStop, TestResume, TestRequestStop functions in virtualization_test.go

I'm very sorry but I can't trust without some test code.

It's good that you insist on it, especially because I forgot to mention that I'd look into these test functions, but I did not have time yet for it

@Code-Hex Code-Hex closed this Sep 20, 2022
@Code-Hex
Copy link
Owner

Code-Hex commented Sep 20, 2022

@cfergeau I'm sorry I can't accept your PR because I don't have a way to check fixed or not

@cfergeau
Copy link
Contributor Author

I am working on adding tests, but so far I've only been able to reproduce the issue on x86_64 machines. Can we keep this open a while longer while I finish the tests?

@gbraad
Copy link

gbraad commented Sep 20, 2022

Why close this PR?

I forgot to mention that I'd look into these test functions, but I did not have time yet for it

I believe the clue here is 'not ... yet'.
These will be added, but wasn't possible yet at the time of writing the comment: #46 (comment)

@Code-Hex
Copy link
Owner

@cfergeau @gbraad I'm sorry I was misreading it.
But maybe I should write some test code because you seems need to take a lot of time. I'll fix this issue while I'm at it. Thanks 😊

@cfergeau
Copy link
Contributor Author

cfergeau commented Sep 20, 2022

But maybe I should write some test code because you seems need to take a lot of time. I'll fix this issue while I'm at it. Thanks blush

So far I have written cfergeau@36da76c which is missing vmlinuz/initrd, but hopefully good enough for the tests I want to do :)

@cfergeau
Copy link
Contributor Author

Can this be reopened? I've added tests triggered in github actions in https://github.com/cfergeau/vz/tree/asan
The failures which are fixed in this PR should be visible in this run: https://github.com/cfergeau/vz/actions/runs/3089765099/jobs/4997751003

@gbraad
Copy link

gbraad commented Sep 20, 2022

because you seems need to take a lot of time. I'll fix this issue while I'm at it. Thanks

I think he has time to work on this. According to the planning of our current sprint he allocated time to work on time syncing issues on macOS virtualization, vz/vfkit related changes and some other small side issues. So, I think this can be re-opened... unless you have other ideas and/or more time?

@cfergeau
Copy link
Contributor Author

https://github.com/cfergeau/vz/actions/runs/3089765099/jobs/4997751402 is all red, the macOS11 failure is fixed by cfergeau@bb57273 and the macOS12 failure is fixed by cfergeau@cdf340a

With these 2 changes, https://github.com/cfergeau/vz/actions/runs/3089887405/jobs/4998004598 is green.

@Code-Hex
Copy link
Owner

@cfergeau @gbraad From the beginning, only a reply such as "OK, I will fix later" should be sufficient. I was merely responding to the reply, "I don't have time."

Anyway, I will write the base testing code and then fix it. Also add the Contribution Guidelines so that modifications without test code are not accepted. Thanks

Repository owner locked and limited conversation to collaborators Sep 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants