Trace model outputs to a binary file #477
Conversation
I ran a test dump and then parsed it with the analyzer script, and it works.
I think this is a great idea. Make it a "trace.cpp" tool, like the existing "quantize.cpp" and "perplexity.cpp".
@anzz1 Thank you for expressing your interest. I'm replying to #331 (comment) here:
Yes, I plan to finalize a v1 format once the specifications are better defined. Right now it's pretty basic: there is a single record type, for a specific application. I try to keep in mind that this should stay as simple as possible while remaining a helpful debugging tool. I have a lot of ideas for improving the format, but have not yet weighed them properly. The activations could be encoded in f16, or made sparse by removing the tail; in any case it depends a lot on the downstream task. To ensure extensibility we could use a generic object format like BSON. A simple serializer can be built from a few functions, without requiring tagged unions or dynamic types. This might also be a candidate for the ggml model header, but it is not as easy to write a deserializer in a lean way.
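To illustrate the "simple serializer from a few functions" idea, here is a minimal sketch in Python of a versioned binary record format read with `struct`. The magic tag, field layout, and function names are hypothetical, not the actual format in this PR:

```python
import struct
import io

# Hypothetical 4-byte magic/version tag; the PR's real header differs.
MAGIC = b"TRC1"

def write_record(out, token_id, logits):
    """Append one activation record: token id, count, then f32 logits."""
    out.write(struct.pack("<ii", token_id, len(logits)))
    out.write(struct.pack(f"<{len(logits)}f", *logits))

def read_records(inp):
    """Yield (token_id, logits) tuples until EOF."""
    assert inp.read(4) == MAGIC, "unexpected magic/version"
    while True:
        header = inp.read(8)
        if len(header) < 8:
            return
        token_id, n = struct.unpack("<ii", header)
        yield token_id, list(struct.unpack(f"<{n}f", inp.read(4 * n)))

# Round-trip through an in-memory buffer.
buf = io.BytesIO()
buf.write(MAGIC)
write_record(buf, 42, [0.1, -2.5, 3.0])
buf.seek(0)
for token_id, logits in read_records(buf):
    print(token_id, logits)
```

Bumping the magic tag when the layout changes is the simplest way to keep old traces readable; a deserializer only needs to branch on the version once.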
The main value is indeed post-mortem analysis. However, Transformer outputs are easily reproduced by feeding back the prompt and inferred text, so it's largely about ease of use and the amount of CPU time wasted on a replay. Also, I have yet to see tools that bring llama activations and weights into a REPL for numerical analysis (but Python bindings may eventually get there too).
I did something very similar in #246. You can already get a long way by uncommenting the debug code, but some of it broke during refactoring.
I have the same issue: my Westmere-era machine lacks AVX, so I have to connect to a friend's WSL2 system.
Same here 😄 I feel like I'm shooting in the dark with such a complex, noise-sensitive system. I'm sure C gurus are doing fine with printf and awk. Honestly, I'm just not feeling very confident in what I'm doing without better access to the data.
This adds a `--trace` option that exports the decoder activations to a file. The format can be read with the included parser: it's a basic app designed to help build analysis tools and tests. For now, it only replicates the softmax and the top-k and nucleus filtering.
The format is versioned and should be easy to extend with more data (e.g. embeddings, cache insertions).
I'm using it for the same purpose as #246 (which is retired now) - to perform numerical analysis in Python for exploring #331.
Is this approach acceptable and useful to others?
This might become redundant once Python bindings allow doing the same thing in process memory, without going through a serialized format.