
llm-tool: crash during async generation #3

Closed

zach-brockway opened this issue Feb 26, 2024 · 11 comments

@zach-brockway

Steps to reproduce:

  1. Move the @main annotation in LLMTool.swift from SyncGenerator to AsyncGenerator.
  2. Run the llm-tool target.

Result: Thread 4: EXC_BAD_ACCESS (code=2, address=0x16ff1bfe0)
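For reference, step 1 amounts to something like the following (declaration details are assumed here; the actual file is more involved):

import ArgumentParser

// Sketch of the repro change. Before: the synchronous command owns @main.
struct SyncGenerator: ParsableCommand {
    mutating func run() throws { /* generate on the calling thread */ }
}

// After: moving @main here routes generation through Swift concurrency,
// which is what triggers the crash below.
@main
struct AsyncGenerator: AsyncParsableCommand {
    mutating func run() async throws { /* generate inside a Task */ }
}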

Call stack:

#0	0x00000001050d4ac0 in mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3::operator()(mlx::core::array const&, bool) const at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/Cmlx/mlx/mlx/transforms.cpp:55
#1	0x00000001050d4ab0 in decltype(std::declval<mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&>()(std::declval<mlx::core::array const&>(), std::declval<bool>())) std::__1::__invoke[abi:v160006]<mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool>(mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool&&) at /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__functional/invoke.h:394
#2	0x00000001050d4a50 in void std::__1::__invoke_void_return_wrapper<void, true>::__call<mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool>(mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&)::$_3&, mlx::core::array const&, bool&&) at /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__functional/invoke.h:487

(... snip 5000 frames of mlx::core::eval recursion...)

#5320	0x00000001050c8d9c in mlx::core::eval(std::__1::vector<mlx::core::array, std::__1::allocator<mlx::core::array>> const&) at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/Cmlx/mlx/mlx/transforms.cpp:119
#5321	0x0000000104537828 in ::mlx_eval(mlx_vector_array) at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/Cmlx/include/mlx/c/transforms.cpp:20
#5322	0x0000000105199c80 in eval(_:) at /Users/zach/Library/Developer/Xcode/DerivedData/mlx-swift-examples-ffvauqjflgpefffxjxtewhqyjymm/SourcePackages/checkouts/mlx-swift/Source/MLX/Transforms+Eval.swift:12
#5323	0x0000000104494928 in closure #1 in generate(prompt:model:temp:) at /Users/zach/Documents/Development/GitHub/mlx/mlx-swift-examples/Libraries/LLM/Util.swift:103
#5324	0x0000000104494f14 in partial apply for closure #1 in generate(prompt:model:temp:) ()
#5325	0x0000000104495910 in thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out τ_0_0) ()
#5326	0x0000000104495a6c in partial apply for thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out τ_0_0) ()
@davidkoski
Collaborator

This produces really deep call stacks when compiled in Debug. These are mostly tail calls, so if you compile in Release it will work. I will document this.

@zach-brockway
Author

Thanks for the quick reply! I wondered if that was what was happening, but after trying a couple of things I got the impression that the stack size can't really be controlled with Swift concurrency (unless/until more custom-executor support lands, maybe?). It also wasn't immediately clear to me how to, e.g., link an optimized build of the dependency into an otherwise Debug build of my app. I don't have a ton of Swift experience, so I'm grateful for any suggestions!

@davidkoski
Collaborator

Yeah, I don't know that the stack size can be controlled either. Another option (maybe) is to put @MainActor on it. That would work fine for a command-line tool like this, but wouldn't work for an application, e.g. if you wanted to run this on iOS. It might work for you, depending on what you are doing.
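For illustration, a sketch of that idea (the wrapper name, and the exact signature of the generate(prompt:model:temp:) seen in the stack trace, are assumptions):

import MLX

// Hypothetical wrapper: @MainActor pins execution to the main thread,
// whose stack is much larger than a cooperative-pool thread's, so the
// deep Debug-build eval() recursion has room. Fine for a CLI, but it
// would block the UI in an app.
@MainActor
func generateOnMain(prompt: MLXArray, model: LLMModel) async {
    _ = await generate(prompt: prompt, model: model, temp: 1.0)
}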

As for how to do that when you have a Debug build, one thing that I tried much earlier was to add -O3 in mlx-swift's Package.swift.

For example around line 87:

            cxxSettings: [
                .headerSearchPath("mlx"),
                .headerSearchPath("include/mlx-c"),

You can add something like .unsafeOptions(["-O3"]),. This isn't suitable for the Package.swift in the repo, as it makes the package unbuildable in most situations (due to limitations around unsafe options), but I think it ought to work for development.

@zach-brockway
Author

Makes sense. Yeah, unfortunately, I was trying to wrap it with SwiftUI. I'll poke around at it more this evening after work. Thanks again!

@davidkoski
Collaborator

I added some notes in the README: 8870b0d

@zach-brockway
Author

zach-brockway commented Feb 27, 2024

Thanks. Sadly, I tried .unsafeFlags(["-O3"]) in both cSettings and cxxSettings for the Cmlx target without any luck.

I ended up kludging together a variation of generate() that's a cheesy mishmash of NSThread + async; I'm not even sure this semaphore backpressure hack will actually work the way I intended, but FWIW:

import Foundation
import AsyncAlgorithms  // AsyncChannel, AsyncBufferSequence
import MLX
import LLM              // LLMModel, sample (from mlx-swift-examples)

public class LLMGeneratorThread: Thread {
    // Bounded buffer that async consumers iterate over.
    public let buffer: AsyncBufferSequence<AsyncChannel<Int>>

    let channel = AsyncChannel<Int>()
    let prompt: MLXArray
    let model: LLMModel
    let sem = DispatchSemaphore(value: 0)

    public init(prompt: MLXArray, model: LLMModel) {
        self.prompt = prompt
        self.model = model
        self.buffer = channel.buffer(policy: .bounded(10))

        super.init()

        // The point of using Thread: its stack size can be set explicitly,
        // unlike the stacks in Swift concurrency's cooperative pool.
        self.stackSize = 8 * 1024 * 1024
    }

    public override func main() {
        var y = prompt
        var cache = [(MLXArray, MLXArray)]()
        while !isCancelled {
            var logits: MLXArray
            (logits, cache) = model(
                expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)
            y = sample(logits: logits[-1, axis: 1], temp: 1.0)
            eval(y)  // the deeply recursive call that needs the large stack

            let token = y.item(Int.self)
            // Backpressure hack: block this thread until the async send
            // completes, so tokens aren't produced faster than consumed.
            Task {
                await channel.send(token)
                sem.signal()
            }
            sem.wait()
        }
    }
}
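Consuming it from async context might then look something like this (a usage sketch; detokenization omitted):

// Hypothetical usage: start the thread, then iterate the bounded buffer.
let generator = LLMGeneratorThread(prompt: promptTokens, model: model)
generator.start()
var count = 0
for await token in generator.buffer {
    print(token)               // raw token ids; detokenization omitted
    count += 1
    if count >= 256 { break }  // the thread loops until cancelled
}
generator.cancel()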

@davidkoski
Collaborator

Yeah, mixing the two like this is not ideal. I wonder if you could use this code under #if DEBUG and have an #else with straight async code for Release builds? Also not ideal, but if you encapsulate it, at least it's one particular spot to watch.
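Something like this, perhaps (the generateAsync(prompt:model:into:) in the Release branch is a made-up name, standing in for a plain async implementation):

import AsyncAlgorithms
import MLX

// Hypothetical wrapper encapsulating the two strategies in one spot.
public func generateTokens(prompt: MLXArray, model: LLMModel)
    -> AsyncBufferSequence<AsyncChannel<Int>>
{
    #if DEBUG
    // Debug: deep eval() recursion needs the Thread's enlarged stack.
    let thread = LLMGeneratorThread(prompt: prompt, model: model)
    thread.start()
    return thread.buffer
    #else
    // Release: tail calls are optimized away, so plain async code is fine.
    let channel = AsyncChannel<Int>()
    Task { await generateAsync(prompt: prompt, model: model, into: channel) }
    return channel.buffer(policy: .bounded(10))
    #endif
}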

@johnmai-dev
Contributor

> You can add something like .unsafeOptions(["-O3"]),.

Maybe unsafeFlags?

@davidkoski
Copy link
Collaborator

> Maybe unsafeFlags?

Yeah, that's the ticket :-)

@johnmai-dev
Contributor

johnmai-dev commented Mar 6, 2024

@davidkoski @zach-brockway I have tried all the optimization flags; only -O3 does not work. -O, -O1, -O2, and -Os all work. 😂
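So a working local change to mlx-swift's Package.swift would look something like this (surrounding settings abbreviated; remember that .unsafeFlags keeps the package from building as a remote dependency, so this is for local development only):

            cxxSettings: [
                .headerSearchPath("mlx"),
                .headerSearchPath("include/mlx-c"),
                // -O, -O1, -O2, and -Os reportedly work; -O3 does not.
                .unsafeFlags(["-O2"]),
            ],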


@davidkoski
Collaborator

The deep recursion issues are fixed as of ml-explore/mlx-swift#67

Eval in an async context is safe. You can even run Debug builds!
