Llama 2 7b chat model output quality is low #2093
Comments
Could you provide your deployment config? I'm trying to help here. Logs would also help.
I used a serving.properties file with the following configuration: My endpoint config is very simple: What I am getting: Also, the output quality degraded significantly with the DJL container compared to the TGI container.
Could you share a sample prompt you use and the parameters? And the expected output, if possible?
I have mentioned the sample prompt in the issue description. Repeating it below for reference: Expected output: I am using a fine-tuned model that was trained on the prompt and answer format mentioned above.
I have a fine-tuned Llama 2 7B chat model that I am deploying to an endpoint using the DJL container. When I tested the model after deployment, the output quality had degraded (the model seems to echo the same answer for some of the questions asked).
Before using DJL container, I was using TGI container and the model was working absolutely fine.
I understand that the two containers may run inference differently, but is there a way to override the inference code?
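One difference that can be checked before touching the inference code is the generation settings, since the two containers may apply different defaults. Below is a minimal sketch that pins them explicitly in the request body. It assumes the endpoint accepts a JSON payload with `inputs` and `parameters` keys and that the parameter names follow the Hugging Face `generate` conventions; neither is confirmed in this thread, and `build_payload` is a hypothetical helper.

```python
import json


def build_payload(prompt: str, max_new_tokens: int = 64) -> str:
    """Serialize a request body with explicit generation parameters.

    The "inputs"/"parameters" layout and the parameter names are
    assumptions; check them against the container's documentation.
    """
    parameters = {
        "max_new_tokens": max_new_tokens,  # cap on the answer length
        "do_sample": False,                # greedy decoding, so runs are comparable
    }
    return json.dumps({"inputs": prompt, "parameters": parameters})


if __name__ == "__main__":
    print(build_payload("[INST] example prompt [/INST]"))
```

Sending the same payload to both the TGI and the DJL endpoint would show whether the quality gap remains when the settings match.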
Following is the sample prompt that I am using to prompt the model:
"[INST] <<SYS>>
Respond only with the answer and do not provide any explanation or additional text. If you don't know the answer to a question, please answer with 'I dont know'.Answer should be as short as possible.
<</SYS>>
Below context is text extracted from a medical document. Answer the question asked based on the context given.
Context: {text}
Question: {question} [/INST]"
The model is fine-tuned on the prompt format shown above, so inference needs to pass the prompt through in exactly this format for the model to interpret it and give the answer.
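A minimal sketch of assembling the prompt on the client side, so that the same string reaches either container unchanged. The `<<SYS>>` and `<</SYS>>` markers are the standard Llama 2 chat system tags and are assumed here; the exact line breaks and the helper name `build_prompt` are also assumptions for illustration.

```python
# Hypothetical helper; not part of DJL Serving or TGI.
SYSTEM_PROMPT = (
    "Respond only with the answer and do not provide any explanation or "
    "additional text. If you don't know the answer to a question, please "
    "answer with 'I dont know'.Answer should be as short as possible."
)

INSTRUCTION = (
    "Below context is text extracted from a medical document. "
    "Answer the question asked based on the context given."
)


def build_prompt(text: str, question: str) -> str:
    """Fill the fine-tuning template with a context and a question."""
    return (
        f"[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n"
        f"{INSTRUCTION}\n"
        f"Context: {text}\n"
        f"Question: {question} [/INST]"
    )


if __name__ == "__main__":
    print(build_prompt("Patient is 54 years old.", "How old is the patient?"))
```

Logging the final string on both deployments is a quick way to confirm that neither container adds its own template or special tokens around it.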
Any resources/suggestions would be really helpful.