
Use of causal models for generation #82

Open
dipankarsrirag opened this issue Jun 26, 2024 · 3 comments

Comments

@dipankarsrirag

dipankarsrirag commented Jun 26, 2024

This is amazing work. I have been working on something that requires me to evaluate the generated outputs of models like Mistral, using a prompt like:
"Fill the [MASK] token in the sentence. Generate a single output."

Earlier, I would simply instruction-fine-tune a Mistral model, but now I would like to explore using these models with bi-directional attention.

I see that the library allows me to access the backbone model underneath, but it is not clear to me whether that backbone uses bi-directional attention. Can you please clarify? If it does, I could simply call backbone.generate() for my purpose.
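
Concretely, I have something like this in mind (just a sketch; I am assuming the backbone and tokenizer are exposed as `angle.backbone` and `angle.tokenizer`):

```python
import torch
from angle_emb import AnglE

# Sketch only: load AnglE on top of Mistral, then generate with the raw backbone.
angle = AnglE.from_pretrained('mistralai/Mistral-7b-Instruct-v0.2', is_llm=True)
backbone = angle.backbone    # assumed: the underlying HF causal LM
tokenizer = angle.tokenizer  # assumed: the matching tokenizer

prompt = ("Fill the [MASK] token in the sentence. Generate a single output.\n"
          "The cat sat on the [MASK].")
inputs = tokenizer(prompt, return_tensors='pt')
with torch.no_grad():
    out = backbone.generate(**inputs, max_new_tokens=8)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True))
```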

Thanks in advance!

@SeanLee97
Owner

Hi @dipankarsrirag, thanks for your kind words. AnglE supports bi-directional LLMs.

If you want to train AnglE embeddings with bi-directional LLMs, you can refer to this documentation, under Examples / b. LLM-based.

If you just want to test the prompt with a bi-directional LLM, you can directly use our BiLLM toolkit: https://github.com/WhereIsAI/BiLLM. It is compatible with Hugging Face transformers.
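
For example, something like this (a minimal sketch; please check the BiLLM README for the exact API, in particular the `BiLLM_START_INDEX` environment variable and the patched model classes):

```python
import os
# Assumption per the BiLLM README: layers from this index onward become
# bi-directional; it must be set before importing billm.
os.environ['BiLLM_START_INDEX'] = '0'

from transformers import AutoTokenizer
from billm import MistralForCausalLM  # drop-in replacement for the HF class

model_id = 'mistralai/Mistral-7b-Instruct-v0.2'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MistralForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Fill the [MASK] token in the sentence. Generate a single output.\n"
                   "The cat sat on the [MASK].", return_tensors='pt')
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0],
                       skip_special_tokens=True))
```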

@dipankarsrirag
Author

Hi @SeanLee97, thanks for the quick reply. I have been working with AnglE for the past few hours now and just need a clarification:

  1. When I initialise a bi-directional LLM with AnglE like this:

     ```python
     import torch
     from angle_emb import AnglE

     angle = AnglE.from_pretrained(
         'mistralai/Mistral-7b-Instruct-v0.2',
         is_llm=True,
         apply_billm=True,
         billm_model_class='MistralForCausalLM',
         load_kbit=4,
         torch_dtype=torch.bfloat16,
         pooling_strategy='last',
         trust_remote_code=True,
     )
     ```

     would the model returned by `model = angle.backbone` have its attention changed to bi-directional?

  2. I have a mask-filling task where each input is a <masked_sentence, target_word> pair, which according to the documentation is the Prompts.C format. But when I use the angle.fit() method for fine-tuning, I get an error saying that only the Prompts.A format is supported. This made me fall back to using SFTTrainer with the model directly; see the sketch below. Is this correct? If not, how should I do it?
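
For reference, this is roughly what I did with SFTTrainer (a sketch; `train_ds` is a placeholder for my <masked_sentence, target_word> pairs already rendered to plain text, and the trl API may differ across versions):

```python
from trl import SFTTrainer
from transformers import TrainingArguments

# Sketch: fine-tune the (BiLLM-patched) backbone directly with trl's SFTTrainer.
trainer = SFTTrainer(
    model=angle.backbone,        # backbone from the AnglE init above
    tokenizer=angle.tokenizer,
    train_dataset=train_ds,      # hypothetical dataset with a 'text' column
    dataset_text_field='text',
    args=TrainingArguments(output_dir='sft-out', per_device_train_batch_size=4),
)
trainer.train()
```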

@SeanLee97
Owner

SeanLee97 commented Jun 27, 2024

Hi @dipankarsrirag, here are the answers to your questions:

  1. Yes. When you set is_llm=True and apply_billm=True, the backbone will be bi-directional.

  2. The Prompts setting only works for the inference phase. If you use angle-trainer and want to apply a prompt to all text columns in the training stage, you can specify it via --prompt_template "Here is your custom prompt {text}". If you use custom code, you can assign a prompt to prompt_template in AngleDataTokenizer; see this documentation and the sketch below. In other situations, for example applying a prompt to only one specific text column, please set the prompt manually, i.e., during preprocessing.
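
A sketch of the custom-code path (the data file and the fit arguments are placeholders; see the documentation for the full argument list):

```python
from datasets import load_dataset
from angle_emb import AnglE, AngleDataTokenizer

angle = AnglE.from_pretrained(
    'mistralai/Mistral-7b-Instruct-v0.2',
    is_llm=True,
    apply_billm=True,
    billm_model_class='MistralForCausalLM',
    pooling_strategy='last',
)

ds = load_dataset('json', data_files='train.jsonl')['train']  # placeholder data
# Apply the custom prompt to all text columns while tokenizing for training.
ds = ds.map(
    AngleDataTokenizer(angle.tokenizer, angle.max_length,
                       prompt_template='Here is your custom prompt {text}'),
    num_proc=8,
)
angle.fit(train_ds=ds, output_dir='ckpts', batch_size=8, epochs=1)
```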
