Llama #270

Ivan-Zhou · 2023-08-02T16:50:18Z

Implement Llama based on HF implementation and the paper.

A few notable differences from GPT-2:

Rotary Positional Embedding;
No dropout is used (therefore, many of the __call__() methods don't need a key as input).
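For readers unfamiliar with rotary positional embedding, here is a minimal sketch of the idea in plain Python. It is not the code from this PR: the function names (rotary_angles, rotate_half, apply_rotary) and the base of 10000 are assumptions chosen for illustration, following the common "rotate half" formulation, and a real implementation would work on JAX arrays rather than lists.

```python
import math


def rotary_angles(position, head_size, base=10000.0):
    """Angles for one position; each of the head_size // 2 pairs has its own frequency."""
    half = head_size // 2
    inv_freq = [1.0 / (base ** (2 * i / head_size)) for i in range(half)]
    angles = [position * f for f in inv_freq]
    # Duplicate so that element i and element i + half share the same angle.
    return angles + angles


def rotate_half(x):
    """Map (x1, x2) to (-x2, x1), where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rotary(x, position, base=10000.0):
    """Rotate each (x[i], x[i + half]) pair by its position-dependent angle."""
    angles = rotary_angles(position, len(x), base)
    rotated = rotate_half(x)
    return [xi * math.cos(a) + ri * math.sin(a) for xi, ri, a in zip(x, rotated, angles)]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


# Hypothetical query/key vectors for a single head of size 4.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# The attention score depends only on the relative offset (2 in both cases).
score_a = dot(apply_rotary(q, 3), apply_rotary(k, 1))
score_b = dot(apply_rotary(q, 5), apply_rotary(k, 3))
assert abs(score_a - score_b) < 1e-9
```

Because the rotation is deterministic and involves no random masking, nothing in this path needs a PRNG key, which is consistent with the second point above.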

Tasks

Add Llama code based on HF implementation
Refactor with Haliax
Match Levanter's state dict with HF's
Roundtrip test
Ensure perf roughly matches GPT-2 implementation at scale
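The state dict matching and roundtrip tasks can be pictured with a small sketch. This is only an illustration under stated assumptions: the HF-side key names are the ones used by HF Llama checkpoints, but the Levanter-side names in PREFIX_PAIRS and the helpers rename_keys, to_levanter, and to_hf are hypothetical and are not taken from this PR.

```python
# Hypothetical (HF prefix, Levanter-style prefix) pairs; illustrative only.
PREFIX_PAIRS = [
    ("model.embed_tokens", "embeddings.token_embeddings"),
    ("model.layers", "transformer.layers"),
    ("model.norm", "transformer.norm"),
]


def rename_keys(state_dict, pairs):
    """Return a copy of state_dict with the first matching prefix replaced per key."""
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in pairs:
            if key == old or key.startswith(old + "."):
                new_key = new + key[len(old):]
                break
        renamed[new_key] = value
    return renamed


def to_levanter(hf_state_dict):
    return rename_keys(hf_state_dict, PREFIX_PAIRS)


def to_hf(levanter_state_dict):
    # Invert the mapping by swapping each pair.
    return rename_keys(levanter_state_dict, [(new, old) for old, new in PREFIX_PAIRS])


# Toy "weights" stand in for real arrays.
hf_state = {
    "model.embed_tokens.weight": [0.1, 0.2],
    "model.layers.0.self_attn.q_proj.weight": [0.3],
    "model.norm.weight": [1.0],
    "lm_head.weight": [0.5],
}

# A roundtrip test checks that converting there and back loses nothing.
assert to_hf(to_levanter(hf_state)) == hf_state
```

A real roundtrip test would additionally compare model outputs between the two implementations, not just the key names.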

src/levanter/models/llama.py

dlwh · 2023-08-16T15:15:27Z

btw i fixed the problem in latest haliax. You should still make them static fields, but you won't get an error there

On Wed, Aug 16, 2023 at 7:58 AM Ivan Zhou wrote:

In src/levanter/models/llama.py:

> +        residual = x
> +        x = self.ln_2(x)
> +        mlp_output = self.mlp(x)
> +        output = residual + mlp_output
> +        return output
> +
> +
> +class LlamaTransformer(StateDictSerializationMixin, eqx.Module):
> +    config: LlamaConfig = eqx.static_field()
> +    layers: Stacked[LlamaDecoderLayer]
> +    ln_f: hnn.LayerNorm
> +
> +    @staticmethod
> +    def init(config: LlamaConfig, *, key) -> "LlamaTransformer":
> +        # TODO: here it reports an error that is related to _get_rotary_emb() in LlamaAttention
> +        # TypeError: Output from batched function Axis(name='head_size', size=4) with type <class 'haliax.axis.Axis'> is not a valid JAX type

Will make a try

src/levanter/models/llama.py

tests/test_llama.py

Co-authored-by: David Hall

Ivan-Zhou · 2023-08-30T04:34:38Z

Untying word_embeddings from the LM head is done. The only pending issue is JAX leakage when loading HF weights into Levanter's model.

tests/test_llama.py

src/levanter/models/llama.py

tests/test_llama.py

src/levanter/compat/hf_checkpoints.py

Ivan-Zhou · 2023-08-30T19:59:37Z

Many thanks to @dlwh for helping with the roundtrip tests and massively improving the code style & taste in this PR 👍

Ivan-Zhou added 12 commits July 30, 2023 16:47

Add Llama config, Mlp, Attention, and RotaryEmb

ca6ada5

address integration test

667a56c

test compare with hf implementation

0e77701

Finish LlamaRotaryEmbedding

ed10ee1

Add LlamaLinearScalingRotaryEmbedding

6034aee

Add LlamaDynamicNTKScalingRotaryEmbedding

e9a7c56

Refactor to simplified class differences

6e97f5f

Implement _get_rotary_emb

271be4b

work on attention

669e6e7

test initialize attention

e45b39b

_apply_rotary_pos_emb

bbcd0ec

update llama and test

442b93d

dlwh reviewed Aug 8, 2023

Ivan-Zhou added 5 commits August 13, 2023 13:26

Merge branch 'main' into llama

673fa17

update llama and test

40bb7b4

Finish Llama Attention

466c1db

Finish implementing LlamaLMHeadModel

f9db049

fix build

6781a8c

Ivan-Zhou marked this pull request as ready for review August 14, 2023 00:56

fix build

abb70fa

raisin reviewed Aug 14, 2023

src/levanter/models/llama.py

Ivan-Zhou added 2 commits August 15, 2023 19:54

remove max_position_embeddings

f21cd1c

Fix issues found from testing

6e222b7

Ivan-Zhou commented Aug 16, 2023

src/levanter/models/llama.py (outdated)

Ivan-Zhou and others added 5 commits August 19, 2023 17:46

Fix issues found from end-to-end tests

182327e

Fix torch import issue

46939d5

Refactor RoPE

92a7f23

remove ()

65f5888

NamedArray type hint

3f028b9

make test pass even without auth token

a022bb1

dlwh requested changes Aug 28, 2023

Ivan-Zhou and others added 10 commits August 28, 2023 19:26

Intermediate -> Mlp

ec742c4

Update src/levanter/models/llama.py

a64e63a

Co-authored-by: David Hall

Merge branch 'llama' of github.com:stanford-crfm/levanter into llama

1820ea9

Update src/levanter/models/llama.py

1bd25ca

Co-authored-by: David Hall

Update src/levanter/models/llama.py

d7412d4

Co-authored-by: David Hall

Update src/levanter/models/llama.py

2e3c1fb

Co-authored-by: David Hall

Fix issues from pre-commit checks

05f684f

Start from llama 2 hf in roundtrip

f6157df

Update model_id in the round trip test

5109b55

Untie weight at LMHead Linear Layer

0a5d36d

dlwh and others added 2 commits August 30, 2023 09:12

fix round trip test, use compile time eval for the cos/sin cache

c056942

Update src/levanter/models/llama.py

8380283

dlwh reviewed Aug 30, 2023

tests/test_llama.py (outdated)

dlwh and others added 6 commits August 30, 2023 09:20

Update tests/test_llama.py

93eeebe

let's just use llama names where reasonable

dca9157

implement LmHeadModel in LLama

9ccd795

use haliax's built in attention

5d8a102

Merge remote-tracking branch 'origin/main' into llama

d6e780c

update for latest main: tokenizer resizing

8c6a20d

dlwh approved these changes Aug 30, 2023

Ivan-Zhou commented Aug 30, 2023

src/levanter/models/llama.py

Ivan-Zhou commented Aug 30, 2023

tests/test_llama.py

Ivan-Zhou commented Aug 30, 2023

src/levanter/compat/hf_checkpoints.py

Ivan-Zhou merged commit d07ff37 into main Aug 30, 2023

Ivan-Zhou deleted the llama branch August 30, 2023 19:59

dlwh mentioned this pull request Aug 30, 2023

Add LLaMA #235

Closed

2 tasks
