decoding LM vocab #19
Comments
The mapping between token IDs and words is stored in the tokenizer, which is this line: https://github.com/HazyResearch/state-spaces/blob/83a9f136a6353648681cdd5dcc2a0eac48a69340/src/dataloaders/lm.py#L431 If you've already trained your model, the tokenizer should be cached: https://github.com/HazyResearch/state-spaces/blob/83a9f136a6353648681cdd5dcc2a0eac48a69340/src/dataloaders/lm.py#L469 We have not actually tried to generate from the trained LM, so unfortunately we can't help you with this. Let us know if you get it working and maybe we can incorporate it into a PR.
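For illustration, here is a minimal sketch of loading a cached vocabulary and mapping integer IDs back to words. The cache path is hypothetical, and it assumes the cached object is a Transformer-XL-style `Vocab` saved with `torch.save`, exposing `idx2sym` (index to token string) and `sym2idx` (token string to index); adjust the path and attribute names to whatever your run actually produced.

```python
import torch

# Hypothetical location -- point this at wherever your run cached the vocab/tokenizer.
CACHE_PATH = "data/wt103/cache/vocab.pt"

# Assumption: the cache holds a Vocab object with idx2sym / sym2idx mappings,
# as in the Transformer-XL-style vocabulary the dataloader is based on.
vocab = torch.load(CACHE_PATH)

def decode(token_ids):
    """Map a sequence of integer token IDs back to a whitespace-joined string."""
    return " ".join(vocab.idx2sym[i] for i in token_ids)

def encode(text):
    """Map a whitespace-tokenized string to a list of integer token IDs.

    Falls back to the <unk> entry for out-of-vocabulary words; the exact
    unknown-token symbol is an assumption and may differ in your vocab.
    """
    unk = vocab.sym2idx.get("<unk>", 0)
    return [vocab.sym2idx.get(tok, unk) for tok in text.split()]

# Usage: decode([10, 42, 7]) returns whatever words those IDs map to in your trained vocab.
```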
Great, thanks. I'm still working on getting it running; if I do, I'll let you know.
The generation script has been improved, and we now have a trained WikiText-103 checkpoint that generates text. Instructions can be found here.
Hello, I trained a model using something like the wt103 task and modified the SaShiMi generation script to generate text like a CLM: basically, conditioning on a text string, it generates the next N words sequentially in the same loop as the SaShiMi generation script. I believe I have it working; however, I don't know which word in the vocab each integer output corresponds to. Is there a hash table or something that stores the vocab somewhere easily accessible? Sorry, I can't seem to find any obvious place where it would reside. Thank you for your help.
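For what it's worth, a rough sketch of the kind of conditional generation loop described above, with the ID-to-word mapping applied at the end. `model` and `vocab` are placeholders rather than the actual repo API: it assumes `vocab` exposes `sym2idx` / `idx2sym` and that `model(input_ids)` returns logits of shape (batch, seq_len, vocab_size), so adapt it to the real interfaces (e.g. the stateful step-by-step decoding used in the SaShiMi script).

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, vocab, prompt, n_new_tokens=50, temperature=1.0, device="cuda"):
    """Condition on a text prompt, then sample n_new_tokens autoregressively."""
    model.eval()

    # Encode the prompt (whitespace tokenization, matching a word-level vocab).
    ids = [vocab.sym2idx[tok] for tok in prompt.split()]
    x = torch.tensor(ids, dtype=torch.long, device=device).unsqueeze(0)

    for _ in range(n_new_tokens):
        logits = model(x)                                   # (1, seq_len, vocab_size)
        next_logits = logits[0, -1] / temperature           # logits for the next token
        probs = F.softmax(next_logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)   # sample one token ID
        x = torch.cat([x, next_id.view(1, 1)], dim=1)       # append and continue

    # Map all generated IDs back to words via the vocab's index-to-symbol table.
    return " ".join(vocab.idx2sym[i] for i in x[0].tolist())
```

This re-feeds the full sequence at every step for simplicity; the recurrent/stateful decoding in the actual generation script avoids that quadratic cost.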