Thank you for this great package!
I tried modifying the example "Copy Task" code to have a 50% chance of producing a 9-token string and otherwise a 10-token string:
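Roughly, the change looks like this (a reconstructed sketch, not the exact example code; `generate_batch`, `NUM_TOKENS`, `PAD_ID`, and the padding scheme are my naming assumptions):

```python
import random
import torch

NUM_TOKENS = 256   # assumed vocab size from the copy-task example
MAX_LEN = 10
PAD_ID = 0         # assumed padding token id

def generate_batch(batch_size):
    seqs = []
    for _ in range(batch_size):
        # 50% chance of a 9-token sequence, otherwise 10 tokens
        length = 9 if random.random() < 0.5 else 10
        seq = torch.randint(1, NUM_TOKENS, (length,))
        if length < MAX_LEN:
            # pad short sequences so the batch stacks into one tensor
            pad = torch.full((MAX_LEN - length,), PAD_ID, dtype=torch.long)
            seq = torch.cat([seq, pad])
        seqs.append(seq)
    src = torch.stack(seqs)
    mask = src != PAD_ID   # True at real tokens, False at padding
    return src, mask
```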
When I train this, the model learns to always produce a 10-token string.
I originally noticed this when I changed the code to produce only 1- or 2-token sequences, and after training it likewise only ever produced 2-token sequences. I suspect there is some issue with masking or maybe with the loss function, but I haven't figured it out yet.
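One way a masking bug could produce exactly this behavior (a guess, not verified against the package internals): if the padding id doubles as the only stop signal and is excluded from the loss via `ignore_index`, then the final position of a 9-token example contributes no gradient, so the model is never penalized for emitting a 10th token. A minimal sketch of the contrast, with `PAD_ID` and `EOS_ID` as hypothetical ids:

```python
import torch
import torch.nn.functional as F

PAD_ID = 0    # hypothetical padding id
EOS_ID = 255  # hypothetical id reserved for end-of-sequence

# Targets for one 9-token example padded out to length 10.
targets = torch.tensor([[5, 3, 7, 2, 9, 4, 1, 8, 6, PAD_ID]])
logits = torch.randn(1, 10, 256)  # (batch, seq_len, vocab)

# If PAD is the only stop marker but is ignored by the loss, the
# final position carries no gradient: the model can emit a 10th
# real token and still drive the loss on the other positions to 0.
loss_no_stop_signal = F.cross_entropy(
    logits.transpose(1, 2), targets, ignore_index=PAD_ID
)

# Writing an explicit EOS target (distinct from PAD) into the last
# position puts "where to stop" back into the loss.
targets_with_eos = targets.clone()
targets_with_eos[0, 9] = EOS_ID
loss_with_stop_signal = F.cross_entropy(
    logits.transpose(1, 2), targets_with_eos, ignore_index=PAD_ID
)
```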
FWIW, the loss never gets extremely low (~1e-5) the way it does when training only with 10-token sequences; it plateaus at about 0.5.