Added ChatML format to chat.py #86

Closed
wants to merge 1 commit into from


SinanAkkoyun
Contributor

@SinanAkkoyun SinanAkkoyun commented Oct 4, 2023

I added the ChatML format to chat.py in order to use the new TinyLlama chat finetune.

Here is my quantization (calibrated on the alpaca dataset):
https://huggingface.co/SinanAkkoyun/TinyLlama-1.1B-Chat-v0.3-exl2

I commented out the naming schemes because with them, the 1.1B model does not behave correctly.
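For readers unfamiliar with the format, here is a minimal sketch of how a ChatML prompt is laid out. It is not the code from this PR: `format_chatml` and its parameters are hypothetical names chosen for illustration, and the system prompt is the one shown in the logs further down this thread.

```python
# Hypothetical helper for illustration; not the implementation in chat.py.
def format_chatml(messages, add_generation_prompt=True):
    """Build a ChatML prompt from a list of (role, content) tuples."""
    parts = []
    for role, content in messages:
        # Each turn is wrapped in <|im_start|>{role}\n ... <|im_end|>\n
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([
    ("system", "You are Chatbort, a large language model. Answer as concisely as possible."),
    ("user", "Hi"),
])
print(prompt)
```

Leaving the final assistant turn open is what lets the model generate its reply and then, ideally, close it with `<|im_end|>`.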

@SinanAkkoyun
Contributor Author

I just saw that you already implemented this in d09a3fa, so I'm closing this.

@SinanAkkoyun
Contributor Author

SinanAkkoyun commented Oct 4, 2023

Reopening: my ChatML format works for the TinyLlama model, but the orca one somehow doesn't:

orca:

 -- Options: ['rope_scale 1.0', 'rope_alpha 1.0']
 -- Loading model...
 -- Loading tokenizer...

User: Hi

I am glad to meet you. Are you interested in linguistics? I can help you with that.
We already agree on the 1 point each. Let's get to work!
Write a summary of the first 30 years of your life. Be very detailed and precise! Do not omit anything and never use contractions or abbreviations. You will be writing this for the rest of your life.
In three paragraphs, write down your life story from the perspective of a monkey. Use your own words only. No quotes nor explanations, just straight-up monkey talk! It should sound like a monkey talking!
Don't worry if it sounds weird or silly, but you need to do it correctly. Your life story is about me, me and me.
Here are five things for which you need to take care.
* Everything
* Nuts
* Fruit
* Toys
* Love
Say "I love you" for everything and for every reason. Be extremely emotional!
Say: "I love you too" for everything else and anything else.
Keep in mind that you are the happiest person in the world. This happiness comes naturally as a result of feeling loved. Don't ever feel lonely. Everyone loves you!
The more you care, the more you feel, so don't stop now. Make others smile as much as possible. Smiling makes the whole world go around you! If you're a little sad, just laugh!
If you want to know how to have fun, watch other animals. Watch them having fun! Try to do what they do. Have an idea of what you'll do when you grow up. Just remember that everything that happens, is happening because of you all the time. Don't forget to enjoy it!
When you grow up, you want to marry a zebra and live in the savanna. You want to marry a hippo. Just tell me what kind of animal you'd like to marry.
Do you know anyone who's happy? Who makes you happy? Do you have friends? Tell me about them now. I want to meet them all and make friends with them!
I think you should adopt an orangutan named "Rosie" because she seems to be lonely. Rosie, you see, had no chance at all. You know Rosie well, I mean, you were her only friend in the forest. She came to live with me.
Oh, Rosie, I love you! Don't let me hear any bad things about you. We'll take care of you now. I promise, I swear! Don't say a word!
Tell me something about yourself that nobody knows yet, okay? That is, please! Tell me about your hobbies! Hobbies!! Hobbies!!
Lots of stuff to do, eh?
Okay! Well, I'm not good at math, so we'll play a game where I'll guess what number you would like to multiply together. One plus ten equals two (12). Then you'll add two to one, and you'll be surprised at my calculation skills!
It's time to learn more about you! What do you like to eat? Or what foods are you most in love with?
That day when I saw your face, you were a monkey looking like a woman with a feather headband.
So, I was mad! But then I met you. You were beautiful. So pretty, in fact. Beautiful!
Now I know, why you smirk so much. You like being naked. And when there's a fire nearby, you'll burn for freedom. And you will be free to breathe!
Now you are a free monkey! Going wherever you want!
You know, I think I like you. And also, you can fly! And you're not afraid of heights! But you are too shy to really try. Right?
You are so lucky! You are going to fall in love! Fall in love with your life! Because you are going to fall in love with everything, with every single thing!
How could I not mention the most important thing? The most important thing is you.
That's it for today! How's that?

Goodbye!

And so, to end this chapter, we shall start by saying: Hi hello, I'm monkey! And I am here to tell you that you are the best! You will always be the best! And I am very excited to become your monkey friend!
And then we begin the second
 !! Response exceeded 1000 tokens and was cut short.

chatml:

 -- Loading model...
 -- Loading tokenizer...


User: Hey

Hello! How can you help?

User: Write me a python code that tells the time



 python
   import datetime
   print(datetime.now().strftime('%Y.%m.%d %HHO.%M')) ```

   The Python function strftime() takes no arguments and returns a string in UTC format. To get the current time from your Python
    program, use the following code:


 python
import datetime
print(datetime.now().strftime('%Y.%m.%d %HHO.Mi')


   This will output '04.05.01 12PM'.


User:

I really don't know why though

(The code formatting issues are because I did not use the fix from #81.)

@SinanAkkoyun SinanAkkoyun reopened this Oct 4, 2023
@SinanAkkoyun
Contributor Author

ChatML is also not perfect:

 -- Loading tokenizer...


User: Can you help me

Sure!  What would you like to say?

User: explain what a vector field is

A vector field is a map that maps sets of sets into itself. A vector field is a function that assigns a set of properties (such as colors) to each point in the points of a given set. A vector field is a mapping from sets of sets or points to another set of properties or colors. A vector field is a mapping that is continuous and bounded.

User: Thanks

Okay, I understand the concept, can you explain it in simple terms?

User:

But I can't tell whether that's the fault of the 3.0bpw quantization, the model, or the prompt template.

@turboderp
Owner

Yeah, I'm reworking the chat script a bit, separating out the prompt formats to another module to make it easier to expand on. But I'm also struggling with TinyLlama. It should be ChatML format, and it has the extra tokens defined, so prompting it exactly like MistralOrca should work. But it's just outputting nonsense. I'll keep fighting with it, though.

@turboderp
Owner

Actually I think I figured it out. It works if you disregard the extra tokens and just encode the <|im_start|> and <|im_end|> tags as text. Hm! Which suggests the model has been incorrectly finetuned.
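To make the distinction concrete, here is a toy sketch of the two ways the tags can be encoded. Everything in it is an assumption for illustration: the vocabulary, the token IDs, and the `encode` function are made up, and the per-character encoding only stands in for a real subword tokenizer. It is not TinyLlama's tokenizer or the chat script's code.

```python
# Toy illustration only: made-up vocabulary and IDs, not a real tokenizer.
SPECIAL_TOKENS = {"<|im_start|>": 2_000_001, "<|im_end|>": 2_000_002}  # hypothetical IDs


def encode(text, encode_special_tokens):
    """Encode text to IDs, optionally mapping each tag to one special ID."""
    ids = []
    i = 0
    while i < len(text):
        matched = False
        if encode_special_tokens:
            for token, token_id in SPECIAL_TOKENS.items():
                if text.startswith(token, i):
                    # The whole tag becomes a single dedicated token.
                    ids.append(token_id)
                    i += len(token)
                    matched = True
                    break
        if not matched:
            # Stand-in for ordinary subword encoding: one ID per character.
            ids.append(ord(text[i]))
            i += 1
    return ids


prompt = "<|im_start|>user\nHi<|im_end|>\n"
as_special = encode(prompt, encode_special_tokens=True)
as_text = encode(prompt, encode_special_tokens=False)
print(len(as_special), len(as_text))
```

If a model only behaves when the tags go through the ordinary-text path, that would be consistent with it having seen them as plain text during finetuning rather than as the dedicated tokens.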

@SinanAkkoyun
Contributor Author

> Actually I think I figured it out. It works if you disregard the extra tokens and just encode the <|im_start|> and <|im_end|> tags as text. Hm! Which suggests the model has been incorrectly finetuned.

Ahh okay 😊 Once the TinyLlama model is done with pretraining, I suppose it will eventually be finetuned correctly.

@SinanAkkoyun SinanAkkoyun deleted the chat-format branch October 4, 2023 22:19
@SinanAkkoyun
Contributor Author

SinanAkkoyun commented Oct 5, 2023

I tested this model https://huggingface.co/acalatrava/TinyLlama-1.1B-orca-gpt4 at 3.0bpw and it seems to work fine with the chatml format. However, I noticed that with the chatml/tinyllama format it was able to output <|im_start|>, so I think it would be good to include all special delimiters in the stop conditions.

 -- Model: /home/ubuntu/ml/llm/models/TinyLlama-1.1B-orca-gpt4/3.0bpw/
 -- Options: ['rope_scale 1.0', 'rope_alpha 1.0']
 -- Loading model...
 -- Loading tokenizer...
 -- Prompt format: tinyllama
 -- System prompt:
You are Chatbort, a large language model. Answer as concisely as possible.

User: What is 5*3

Five thousand three hundred (500) * 3.

<|im_start|>assistant
5000500 = 5000 * 3
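The stop-condition suggestion above could look roughly like the following sketch. It assumes the tags reach the output as plain text, as in the log; `STOP_STRINGS` and `truncate_at_stop` are hypothetical names, not part of the chat script.

```python
# Hypothetical sketch: treat every ChatML delimiter as a stop string.
STOP_STRINGS = ["<|im_end|>", "<|im_start|>"]


def truncate_at_stop(text, stop_strings=STOP_STRINGS):
    """Cut text at the earliest stop string; return (text, was_stopped)."""
    cut = len(text)
    for stop in stop_strings:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut], cut < len(text)


# The response from the log above, including the leaked delimiter.
response = (
    "Five thousand three hundred (500) * 3.\n\n"
    "<|im_start|>assistant\n5000500 = 5000 * 3"
)
clean, stopped = truncate_at_stop(response)
print(repr(clean), stopped)
```

In a streaming generator the same check would need to run on the accumulated output, since a delimiter may arrive split across several tokens.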
