llm-tool: crash during async generation #3
This gets really deep call stacks when compiled Debug. These are mostly tail calls, and if you compile Release this will work. I will document this.
Thanks for the quick reply! I wondered if that was what was happening, but after trying a couple of things, I got the impression the stack size can't really be controlled with Swift concurrency (unless/until more executor stuff lands, maybe?). It wasn't immediately clear to me how to, e.g., link an optimized build of the dependency into an otherwise debug build of my app, either. I don't have a ton of Swift experience, so I'm grateful for any suggestions!
Yeah, I don't know that the stack size can be controlled either. Another option (maybe) is to put the synchronous generator back in place. As for how to link an optimized build of the dependency when you have a Debug build, one thing that I tried much earlier was to add optimization flags in the Package.swift. For example, around line 87 you can add something like this:
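To make that concrete, here is a minimal sketch of that kind of Package.swift tweak. The target name `Cmlx` and the choice of C++ settings are assumptions for illustration, not the actual mlx-swift manifest:

```swift
// swift-tools-version:5.9
// Sketch, assuming a C++ core target named "Cmlx" (adjust to the real
// target in the package you are patching).
import PackageDescription

let package = Package(
    name: "Example",
    targets: [
        .target(
            name: "Cmlx",
            cxxSettings: [
                // With optimization on, mlx::core::eval's tail calls
                // collapse and the stack stays shallow even when the
                // surrounding app builds Debug.
                .unsafeFlags(["-O1"])
            ]
        )
        // For a Swift target the analogue would be:
        //     swiftSettings: [.unsafeFlags(["-O"])]
    ]
)
```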
Makes sense. Yeah, unfortunately, I was trying to wrap it with SwiftUI. I'll poke around at it more this evening after work. Thanks again!
I added some notes in the README: 8870b0d
Thanks. Sadly, I tried that without any luck. I ended up kludging together a variation of the generation loop as a `Thread` subclass, since `Thread` lets you raise the stack size:

```swift
import AsyncAlgorithms
import Foundation
import MLX
// LLMModel, sample, and expandedDimensions come from the llm-tool example / MLX.

public class LLMGeneratorThread: Thread {
    /// Generated tokens, consumable from async code with `for await`.
    public let buffer: AsyncBufferSequence<AsyncChannel<Int>>

    let channel = AsyncChannel<Int>()
    let prompt: MLXArray
    let model: LLMModel
    let sem = DispatchSemaphore(value: 0)

    public init(prompt: MLXArray, model: LLMModel) {
        self.prompt = prompt
        self.model = model
        self.buffer = channel.buffer(policy: .bounded(10))
        super.init()
        // The point of the Thread subclass: a stack large enough for the
        // deep eval recursion in Debug builds.
        self.stackSize = 8 * 1024 * 1024
    }

    public override func main() {
        var y = prompt
        var cache = [(MLXArray, MLXArray)]()
        while !isCancelled {
            var logits: MLXArray
            (logits, cache) = model(
                expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)
            y = sample(logits: logits[-1, axis: 1], temp: 1.0)
            eval(y)
            let token = y.item(Int.self)
            // Bridge back to async: send the token on the channel, then
            // block this thread until the send has completed.
            Task {
                await channel.send(token)
                sem.signal()
            }
            sem.wait()
        }
    }
}
```
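For context, a minimal sketch of how a consumer might drive this from async code, assuming `prompt` and `model` are already in scope:

```swift
// Start the generator on its own (big-stack) thread.
let generator = LLMGeneratorThread(prompt: prompt, model: model)
generator.start()

// Consume tokens from async code (e.g. a SwiftUI view model). The channel
// in the sketch above never finishes, so break out when done and cancel.
var produced = 0
for await token in generator.buffer {
    print(token)        // detokenize / append to UI state here
    produced += 1
    if produced >= 100 { break }
}
generator.cancel()
```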
Yeah, mixing the two like this is not ideal. I wonder if you could use this code for…
Yeah, that's the ticket :-)
@davidkoski @zach-brockway I have tried all the optimization flags; only -O3 does not work. -O, -O1, -O2, and -Os all work. 😂
The deep recursion issues are fixed as of ml-explore/mlx-swift#67. Eval in an async context is safe now; you can even run Debug builds!
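In other words, a plain AsyncStream version should now be safe. A sketch, reusing the assumed helper names (`LLMModel`, `sample`) from the snippet above:

```swift
// Sketch: with mlx-swift#67 in place, eval can run inside an async task
// without overflowing the stack, so the Thread/semaphore dance above is
// no longer needed.
func tokens(prompt: MLXArray, model: LLMModel) -> AsyncStream<Int> {
    AsyncStream { continuation in
        Task {
            var y = prompt
            var cache = [(MLXArray, MLXArray)]()
            while !Task.isCancelled {
                var logits: MLXArray
                (logits, cache) = model(
                    expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)
                y = sample(logits: logits[-1, axis: 1], temp: 1.0)
                eval(y)  // safe in an async context as of mlx-swift#67
                continuation.yield(y.item(Int.self))
            }
            continuation.finish()
        }
    }
}
```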
Steps to reproduce: switch the `@main` annotation from `SyncGenerator` to `AsyncGenerator`.

Result:

```
Thread 4: EXC_BAD_ACCESS (code=2, address=0x16ff1bfe0)
```

Call stack:

```
(... snip 5000 frames of mlx::core::eval recursion ...)
```
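For context, the switch amounts to something like this. A sketch assuming swift-argument-parser entry points; the command bodies are placeholders:

```swift
import ArgumentParser

// Works: synchronous entry point.
// @main
// struct SyncGenerator: ParsableCommand {
//     mutating func run() throws { /* generation loop calling eval(...) */ }
// }

// Crashes in Debug builds: async entry point.
@main
struct AsyncGenerator: AsyncParsableCommand {
    mutating func run() async throws { /* generation loop calling eval(...) */ }
}
```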
recursion...)The text was updated successfully, but these errors were encountered: