A quick-and-dirty, minimal WebAssembly port of @danielgrittner's llama2.rs.
- Clone the repo:

```shell
git clone https://github.com/mtb0x1/llama2.rs.wasm
cd llama2.rs.wasm/port3/
```
- Download @Karpathy's baby Llama2 checkpoints (original instructions), pretrained on the TinyStories dataset, and place them in the `www` folder:

```shell
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin
wget -P www/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories110M.bin
```
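To sanity-check a downloaded checkpoint, you can read its header. This is a hedged sketch assuming the files follow llama2.c's legacy checkpoint layout (seven leading little-endian `int32` config fields); the function name is mine, not from this repo:

```python
import struct
import tempfile, os

def read_llama2c_config(path):
    # llama2.c checkpoints start with seven little-endian int32 fields;
    # the names below follow llama2.c's Config struct.
    with open(path, "rb") as f:
        fields = struct.unpack("<7i", f.read(28))
    names = ("dim", "hidden_dim", "n_layers", "n_heads",
             "n_kv_heads", "vocab_size", "seq_len")
    return dict(zip(names, fields))

# Demo with a synthetic header (values are illustrative only):
demo = struct.pack("<7i", 288, 768, 6, 6, 6, 32000, 256)
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(demo)
cfg = read_llama2c_config(tmp.name)
os.unlink(tmp.name)
print(cfg["dim"], cfg["vocab_size"])  # 288 32000
```

Run it against `www/stories42M.bin` to confirm the file is not truncated and the fields look plausible (e.g. `vocab_size` of 32000).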
`stories42M.bin` is used by default (for now, @todo); you can change this in `index.html`.
- Build (requires wasm-pack):

```shell
wasm-pack build --release --target web --out-dir www/pkg/
```
- Serve the `www` folder with a minimal web server (requires Python 3; any other web server works too):

```shell
cd www && python3 -m http.server 8080
```
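Recent versions of Python's built-in server already send the right MIME type for `.wasm`, but if your browser rejects the module during streaming instantiation, a small custom server can force `application/wasm`. A minimal sketch (class name is mine, not from this repo):

```python
import http.server

class WasmHandler(http.server.SimpleHTTPRequestHandler):
    # Ensure .wasm files are served as application/wasm so that
    # WebAssembly.instantiateStreaming() accepts them in the browser.
    extensions_map = {
        **http.server.SimpleHTTPRequestHandler.extensions_map,
        ".wasm": "application/wasm",
    }

# Uncomment to serve the current directory (run from www/) on port 8080:
# http.server.HTTPServer(("", 8080), WasmHandler).serve_forever()
```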
- Go to http://localhost:8080/
- Open the browser console to see the output (@todo)
- (Optional) If you make changes, reload the browser and clear the cache afterwards.
Benchmark settings:
- Temperature: 0.9
- Sequence length: 20
| tok/s | 15M | 42M | 110M | 7B |
|---|---|---|---|---|
| wasm v1 | ? | ? | ? | ? |
Not really sure about the results (yet!).
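For context on the `Temperature: 0.9` setting above: temperature divides the logits before the softmax, so values below 1.0 sharpen the next-token distribution and values above 1.0 flatten it. A minimal sketch of temperature sampling (plain Python, not the repo's actual Rust code):

```python
import math
import random

def sample_with_temperature(logits, temperature=0.9):
    # Scale logits by 1/temperature: < 1.0 sharpens, > 1.0 flattens.
    # (temperature == 0 would mean greedy argmax; not handled here.)
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample an index from the resulting distribution.
    r = random.random()
    cdf = 0.0
    for i, p in enumerate(probs):
        cdf += p
        if r < cdf:
            return i
    return len(probs) - 1

idx = sample_with_temperature([1.0, 2.0, 3.0], temperature=0.9)
print(idx)  # some index in 0..2, biased toward 2
```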
- Tests
- Display bench results in the web page instead of the browser console (WIP; needs cleanup and removal of the console.info hack)
- Inference based on user inputs (done)
- Optimization: SIMD, rayon (WIP), etc.
License: MIT