
llm-tool: crash during async generation #3

Closed

zach-brockway opened this issue Feb 26, 2024 · 11 comments

@zach-brockway

Steps to reproduce:

  1. Move the @main annotation in LLMTool.swift from SyncGenerator to AsyncGenerator.
  2. Run the llm-tool target.

Result: Thread 4: EXC_BAD_ACCESS (code=2, address=0x16ff1bfe0)
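For reference, step 1 amounts to something like the following (declaration details are assumed here; the actual file is more involved):

import ArgumentParser

// Sketch of the repro change. Before: the synchronous command owns @main.
struct SyncGenerator: ParsableCommand {
    mutating func run() throws { /* generate on the calling thread */ }
}

// After: moving @main here routes generation through Swift concurrency,
// which is what triggers the crash below.
@main
struct AsyncGenerator: AsyncParsableCommand {
    mutating func run() async throws { /* generate inside a Task */ }
}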

Call stack:

#0	0x00000001050d4ac0 in mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3::operator()(mlx::core::array const&, bool) const at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/Cmlx/mlx/mlx/transforms.cpp:55
#1	0x00000001050d4ab0 in decltype(std::declval<mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&>()(std::declval<mlx::core::array const&>(), std::declval<bool>())) std::__1::__invoke[abi:v160006]<mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool>(mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool&&) at /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__functional/invoke.h:394
#2	0x00000001050d4a50 in void std::__1::__invoke_void_return_wrapper<void, true>::__call<mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool>(mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool&&) at /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__functional/invoke.h:487

(... snip 5000 frames of mlx::core::eval recursion...)

#5320	0x00000001050c8d9c in mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&) at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/Cmlx/mlx/mlx/transforms.cpp:119
#5321	0x0000000104537828 in ::mlx_eval(mlx_vector_array) at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/Cmlx/include/mlx/c/transforms.cpp:20
#5322	0x0000000105199c80 in eval(_:) at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/MLX/Transforms+Eval.swift:12
#5323	0x0000000104494928 in closure #1 in generate(prompt:model:temp:) at /Users/zach/Documents/Development/GitHub/mlx/mlx-swift-examples/Libraries/LLM/Util.swift:103
#5324	0x0000000104494f14 in partial apply for closure #1 in generate(prompt:model:temp:) ()
#5325	0x0000000104495910 in thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out τ_0_0) ()
#5326	0x0000000104495a6c in partial apply for thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out τ_0_0) ()
@davidkoski
Collaborator

This produces really deep call stacks when compiled in Debug. These are mostly tail calls, so if you compile in Release it will work. I will document this.

@zach-brockway
Author

Thanks for the quick reply! I wondered if that was what was happening, but after trying a couple of things I got the impression that the stack size can't really be controlled with Swift concurrency (unless/until more custom-executor support lands, maybe?). It also wasn't immediately clear to me how to, e.g., link an optimized build of the dependency into an otherwise Debug build of my app. I don't have a ton of Swift experience, so I'm grateful for any suggestions!

@davidkoski
Collaborator

Yeah, I don't know that the stack size can be controlled either. Another option (maybe) is to put @MainActor on it. That would work fine for a command-line tool like this, but wouldn't work for an application, e.g. if you wanted to run this on iOS. It might work for you, depending on what you are doing.
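For illustration, a sketch of that idea (the wrapper name, and the exact signature of the generate(prompt:model:temp:) seen in the stack trace, are assumptions):

import MLX

// Hypothetical wrapper: @MainActor pins execution to the main thread,
// whose stack is much larger than a cooperative-pool thread's, so the
// deep Debug-build eval() recursion has room. Fine for a CLI, but it
// would block the UI in an app.
@MainActor
func generateOnMain(prompt: MLXArray, model: LLMModel) async {
    _ = await generate(prompt: prompt, model: model, temp: 1.0)
}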

As for how to do that when you have a Debug build, one thing that I tried much earlier was to add -O3 in mlx-swift's Package.swift.

For example around line 87:

            cxxSettings: [
                .headerSearchPath("mlx"),
                .headerSearchPath("include/mlx-c"),

You can add something like .unsafeOptions(["-O3"]),. This isn't suitable for the Package.swift in the repo, as it makes the package unbuildable in most situations (due to limitations around unsafe options), but I think it ought to work for development.

@zach-brockway
Author

Makes sense. Yeah, unfortunately, I was trying to wrap it with SwiftUI. I'll poke around at it more this evening after work. Thanks again!

@davidkoski
Collaborator

I added some notes in the README: 8870b0d

@zach-brockway
Author

zach-brockway commented Feb 27, 2024

Thanks. Sadly, I tried .unsafeFlags(["-O3"]) in both cSettings and cxxSettings for the Cmlx target without any luck.

I ended up kludging together a variation of generate() that's a cheesy mishmash of NSThread + async; I'm not even sure this semaphore backpressure hack will actually work the way I intended, but FWIW:

import Foundation
import AsyncAlgorithms  // AsyncChannel, AsyncBufferSequence
import MLX
import LLM              // LLMModel, sample (from mlx-swift-examples)

public class LLMGeneratorThread: Thread {
    // Bounded buffer that async consumers iterate over.
    public let buffer: AsyncBufferSequence<AsyncChannel<Int>>

    let channel = AsyncChannel<Int>()
    let prompt: MLXArray
    let model: LLMModel
    let sem = DispatchSemaphore(value: 0)

    public init(prompt: MLXArray, model: LLMModel) {
        self.prompt = prompt
        self.model = model
        self.buffer = channel.buffer(policy: .bounded(10))

        super.init()

        // The point of using Thread: its stack size can be set explicitly,
        // unlike the stacks in Swift concurrency's cooperative pool.
        self.stackSize = 8 * 1024 * 1024
    }

    public override func main() {
        var y = prompt
        var cache = [(MLXArray, MLXArray)]()
        while !isCancelled {
            var logits: MLXArray
            (logits, cache) = model(
                expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)
            y = sample(logits: logits[-1, axis: 1], temp: 1.0)
            eval(y)  // the deeply recursive call that needs the large stack

            let token = y.item(Int.self)
            // Backpressure hack: block this thread until the async send
            // completes, so tokens aren't produced faster than consumed.
            Task {
                await channel.send(token)
                sem.signal()
            }
            sem.wait()
        }
    }
}
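Consuming it from async context might then look something like this (a usage sketch; detokenization omitted):

// Hypothetical usage: start the thread, then iterate the bounded buffer.
let generator = LLMGeneratorThread(prompt: promptTokens, model: model)
generator.start()
var count = 0
for await token in generator.buffer {
    print(token)               // raw token ids; detokenization omitted
    count += 1
    if count >= 256 { break }  // the thread loops until cancelled
}
generator.cancel()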

@davidkoski
Collaborator

Yeah, mixing the two like this is not ideal. I wonder if you could use this code under #if DEBUG and have an #else with straight async code for Release builds? Also not ideal, but if you encapsulate it, at least it's one particular spot to watch.
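Something like this, perhaps (the generateAsync(prompt:model:into:) in the Release branch is a made-up name, standing in for a plain async implementation):

import AsyncAlgorithms
import MLX

// Hypothetical wrapper encapsulating the two strategies in one spot.
public func generateTokens(prompt: MLXArray, model: LLMModel)
    -> AsyncBufferSequence<AsyncChannel<Int>>
{
    #if DEBUG
    // Debug: deep eval() recursion needs the Thread's enlarged stack.
    let thread = LLMGeneratorThread(prompt: prompt, model: model)
    thread.start()
    return thread.buffer
    #else
    // Release: tail calls are optimized away, so plain async code is fine.
    let channel = AsyncChannel<Int>()
    Task { await generateAsync(prompt: prompt, model: model, into: channel) }
    return channel.buffer(policy: .bounded(10))
    #endif
}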

@johnmai-dev
Contributor

> You can add something like .unsafeOptions(["-O3"]),.

Maybe unsafeFlags?

@davidkoski
Copy link
Collaborator

> Maybe unsafeFlags?

Yeah, that's the ticket :-)

@johnmai-dev
Contributor

johnmai-dev commented Mar 6, 2024

@davidkoski @zach-brockway I have tried all the optimization flags; only -O3 does not work. -O, -O1, -O2, and -Os all work. 😂
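So a working local change to mlx-swift's Package.swift would look something like this (surrounding settings abbreviated; remember that .unsafeFlags keeps the package from building as a remote dependency, so this is for local development only):

            cxxSettings: [
                .headerSearchPath("mlx"),
                .headerSearchPath("include/mlx-c"),
                // -O, -O1, -O2, and -Os reportedly work; -O3 does not.
                .unsafeFlags(["-O2"]),
            ],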


@davidkoski
Collaborator

The deep recursion issues are fixed as of ml-explore/mlx-swift#67

Eval in an async context is safe. You can even run Debug builds!
