Trace model outputs to a binary file #477
Conversation
I ran a test dump and then parsed it with the analyzer script, and it works.
I think this is a great idea. Make it a "trace.cpp" tool, like the existing "quantize.cpp" and "perplexity.cpp".
@anzz1 Thank you for expressing your interest. I'm replying to #331 (comment) here:
Yes, I plan to finalize a v1 format once the specifications are better defined. Right now it's pretty basic: there is a single record type, for a specific application. I try to keep in mind that this should stay as simple as possible while remaining a helpful debugging tool. I have a lot of ideas for improving the format, but have not yet weighed them properly. The activations could be encoded in f16, or made sparse by removing the tail; in any case it depends a lot on the downstream task. To ensure extensibility we could use a generic object format like BSON. A simple serializer can be built from a few functions, without requiring tagged unions or dynamic types. This might also be a candidate for the ggml model header, but it is not as easy to write a deserializer in a lean way.
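To illustrate the "simple serializer from a few functions" idea, here is a minimal sketch in Python of a versioned binary record format read with `struct`. The magic tag, field layout, and function names are hypothetical, not the actual format in this PR:

```python
import struct
import io

# Hypothetical 4-byte magic/version tag; the PR's real header differs.
MAGIC = b"TRC1"

def write_record(out, token_id, logits):
    """Append one activation record: token id, count, then f32 logits."""
    out.write(struct.pack("<ii", token_id, len(logits)))
    out.write(struct.pack(f"<{len(logits)}f", *logits))

def read_records(inp):
    """Yield (token_id, logits) tuples until EOF."""
    assert inp.read(4) == MAGIC, "unexpected magic/version"
    while True:
        header = inp.read(8)
        if len(header) < 8:
            return
        token_id, n = struct.unpack("<ii", header)
        yield token_id, list(struct.unpack(f"<{n}f", inp.read(4 * n)))

# Round-trip through an in-memory buffer.
buf = io.BytesIO()
buf.write(MAGIC)
write_record(buf, 42, [0.1, -2.5, 3.0])
buf.seek(0)
for token_id, logits in read_records(buf):
    print(token_id, logits)
```

Bumping the magic tag when the layout changes is the simplest way to keep old traces readable; a deserializer only needs to branch on the version once.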
The main value is indeed post-mortem analysis. However, Transformer outputs are easily reproduced by feeding back the prompt and inferred text, so it's largely about ease of use and the amount of CPU time wasted on a replay. Also, I have yet to see tools that bring llama activations and weights into a REPL for numerical analysis (but Python bindings may eventually get there too).
I did something very similar in #246. You can already get a long way by uncommenting the debug code, but some of it broke during refactoring.
I have the same issue: my Westmere-era machine lacks AVX, so I have to connect to a friend's WSL2 system.
Same here 😄 I feel like I'm shooting in the dark with such a complex, noise-sensitive system. I'm sure C gurus are doing fine with printf and awk. Honestly, I'm just not feeling very confident in what I'm doing without better access to the data.
This adds a `--trace` option that exports the decoder activations to a file. The format can be read with the included parser: it's a basic app designed to help build analysis tools and tests. For now, it only replicates the softmax and the top-k and nucleus filtering.
The format is versioned and should be easy to extend with more data (e.g. embeddings, cache insertions).
I'm using it for the same purpose as #246 (which is retired now) - to perform numerical analysis in Python for exploring #331.
Is this approach acceptable and useful to others?
This might become redundant once Python bindings allow doing the same thing in process memory, without going through a serialized format.