-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Qwen - supervised finetuning script and guide for SFT. #660
Comments
Related issues#324: bigcode/tiny_starcoder_py · Hugging Face### DetailsSimilarity score: 0.9 > **Note:** > > [bigcode/tiny_starcoder_py · Hugging Face](https://huggingface.co/bigcode/tiny_starcoder_py) > > TinyStarCoderPy > > This is a 164M parameters model with the same architecture as StarCoder (8k context length, MQA & FIM). It was trained on the Python data from StarCoderData for ~6 epochs which amounts to 100B tokens. > > Use > > Intended use > > The model was trained on GitHub code, to assist with some tasks like Assisted Generation. For pure code completion, we advise using our 15B models StarCoder or StarCoderBase. > > Generation > > ```python > # pip install -q transformers > from transformers import AutoModelForCausalLM, AutoTokenizer > > checkpoint = "bigcode/tiny_starcoder_py" > device = "cuda" # for GPU usage or "cpu" for CPU usage > > tokenizer = AutoTokenizer.from_pretrained(checkpoint) > model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device) > > inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device) > outputs = model.generate(inputs) > print(tokenizer.decode(outputs[0])) > ``` > > Fill-in-the-middle > > Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output: > > ```python > input_text = "def print_one_two_three():\n print('one')\n \n print('three')" > inputs = tokenizer.encode(input_text, return_tensors="pt").to(device) > outputs = model.generate(inputs) > print(tokenizer.decode(outputs[0])) > ``` > > Training > > Model > > - Architecture: GPT-2 model with multi-query attention and Fill-in-the-Middle objective > - Pretraining steps: 50k > - Pretraining tokens: 100 billion > - Precision: bfloat16 > > Hardware > > - GPUs: 32 Tesla A100 > - Training time: 18 hours > > Software > > - Orchestration: Megatron-LM > - Neural networks: PyTorch > - BP16 if applicable: apex > > License > > The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/bigcode/tiny_starcoder_py/blob/main/LICENSE). > > #### Suggested labels > > - { "key": "llm-pretraining", "value": "Information related to the pretraining process of Large Language Models" }#167: Model training code from NousResearch/StripedHyenaTrainer### DetailsSimilarity score: 0.89 - [ ] [NousResearch/StripedHyenaTrainer](https://github.com/NousResearch/StripedHyenaTrainer)
First, tokenize your data python tokenization.py accelerate launch finetune.py #389: AWQ Quantization support - New generic converter for all HF llama-like models - Tutorials - OpenNMT### DetailsSimilarity score: 0.89 - [ ] [AWQ Quantization support - New generic converter for all HF llama-like models - Tutorials - OpenNMT](https://forum.opennmt.net/t/awq-quantization-support-new-generic-converter-for-all-hf-llama-like-models/5569)Quantization and Acceleration We have added support for already quantized models, and revamped the converter for all llama-like models, whether they are quantized or not. Here's an example of the syntax: python tools/convert_HF_llamalike.py --model_dir "TheBloke/Nous-Hermes-Llama2-AWQ" --output "/dataAI/llama2-7B/Hermes/Nous-Hermes-onmt.pt" --format safetensors
For llama-like models, we download the After converting, you will need a config file to run transforms: [sentencepiece]
#### Subword
src_subword_model: "/dataAI/llama2-7B/Hermes/tokenizer.model"
tgt_subword_model: "/dataAI/llama2-7B/Hermes/tokenizer.model"
# Model info
model: "/dataAI/llama2-7B/Hermes/Nous-Hermes-onmt.pt"
# Inference
# ... When considering your priority:
Please read more here: GitHub - casper-hansen/AutoAWQ Important Note:
Offline Quantizer Script:
Enjoy! VS: Fast Inference with vLLM Recently, Mistral reported 100 tokens/second for Mistral-7B at batch size 1 and 1250 tokens/sec for a batch of 60 prompts using vLLM. When using Mistral-instruct-v0.2-onmt-awq, the performance was as follows:
This was with a GEMM model. To make a fair comparison, adjust the throughput for the step0 (prompt prefill) time. Suggested labels{ "key": "llm-quantization", "value": "Discussions and tools for handling quantized large language models" }#431: awq llama quantization### DetailsSimilarity score: 0.89 - [ ] [awq llama quantization](huggingface.co)Quantization and AccelerationWe have added support for already quantized models, and revamped the converter for all llama-like models, whether they are quantized or not. Model ConversionHere's an example of the syntax for converting a model: tools/convert_HF_llamalike.py --model_dir "TheBloke/Nous-Hermes-Llama2-AWQ" --output "/dataAI/llama2-7B/Hermes/Nous-Hermes-onmt.pt" --format safetensors
For llama-like models, we download the tokenizer.model and generate a vocab file during the process. If the model is a AWQ quantized model, we will convert it to an OpenNMT-py AWQ quantized model. Config FileAfter converting, you will need a config file to run transforms: [sentencepiece]
Subword:
src_subword_model: "/dataAI/llama2-7B/Hermes/tokenizer.model"
tgt_subword_model: "/dataAI/llama2-7B/Hermes/tokenizer.model"
Model info:
model: "/dataAI/llama2-7B/Hermes/Nous-Hermes-onmt.pt"
Inference:
# ... PriorityWhen considering your priority:
Important Note
Offline Quantizer ScriptWe will provide an offline quantizer script for OpenNMT-py generic models. However, for small NMT models, AWQ may make things slower, so it might not be relevant for NMT. vLLM PerformanceRecently, Mistral reported 100 tokens/second for Mistral-7B at batch size 1 and 1250 tokens/sec for a batch of 60 prompts using vLLM. When using Mistral-instruct-v0.2-onmt-awq, the performance was as follows:
Suggested labelsnull#383: deepseek-ai/deepseek-coder-5.7bmqa-base · Hugging Face### DetailsSimilarity score: 0.89 - [ ] [deepseek-ai/deepseek-coder-5.7bmqa-base · Hugging Face](https://huggingface.co/deepseek-ai/deepseek-coder-5.7bmqa-base)Deepseek Coder IntroductionDeepseek Coder is a series of code language models, each trained from scratch on 2T tokens with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a project-level code corpus with a window size of 16K and an extra fill-in-the-blank task, supporting project-level code completion and infilling. Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks. Key Features
Model Summary
How to UseThis section provides examples of how to use the Deepseek Coder model for code completion, code insertion, and repository-level code completion tasks. Code Completionfrom transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) Code Insertionfrom transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()
input_text = """<|begin|>def quick_sort(arr):
if len(arr) <= 1:
return arr
pivot = arr[0]
left = []
right = []
<|hole|>
if arr[i] < pivot:
left.append(arr[i])
else:
right.append(arr[i])
return quick_sort(left) + [pivot] + quick_sort(right)<|end|>"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):]) Repository Level Code Completionfrom transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()
input_text = """#utils.py
import torch
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
def load_data():
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Standardize the data
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Convert numpy data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.int64)
y_test = torch.tensor(y_test, dtype=torch.int64)
return X_train, X_test, y_train, y_test
def evaluate_predictions(y_test, y_pred):
return accuracy_score(y_test, y_pred)
#model.py
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
class IrisClassifier(nn.Module):
def __init__(self):
super(IrisClassifier, self).__init__()
self.fc = nn.Sequential(
nn.Linear(4, 16),
nn.ReLU(),
nn.Linear(16, 3)
)
def forward(self, x):
return self.fc(x)
def train_model(self, X_train, y_train, epochs, lr, batch_size):
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(self.parameters(), lr=lr)
# Create DataLoader for batches
dataset = TensorDataset(X_train, y_train)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
for epoch in range(epochs):
for batch_X, batch_y in dataloader:
optimizer.zero_grad()
outputs = self(batch_X)
loss = criterion(outputs, batch_y)
loss.backward()
optimizer.step()
def predict(self, X_test):
with torch.no_grad():
outputs = self(X_test)
_, predicted = outputs.max(1)
return predicted.numpy()
#main.py
from utils import load_data, evaluate_predictions
from model import IrisClassifier as Classifier
def main():
# Model training and evaluation
"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=140)
print(tokenizer.decode(outputs[0])) LicenseThis code repository is licensed under the MIT License. The use of Deepseek Coder models is subject to the Model License. DeepSeek Coder supports commercial use. See the LICENSE-MODEL for more details. ContactIf you have any questions, please raise an issue or contact us at [email protected]. Suggested labels{ "key": "llm-experiments", "value": "Experiments and results related to Large Language Models" } { "key": "AI-Chatbots", "value": "Topics related to advanced chatbot platforms integrating multiple AI models" }#309: openai/human-eval: Code for the paper "Evaluating Large Language Models Trained on Code"### DetailsSimilarity score: 0.88 - [ ] [openai/human-eval: Code for the paper "Evaluating Large Language Models Trained on Code"](https://github.com/openai/human-eval)HumanEval: Hand-Written Evaluation Set This is an evaluation harness for the HumanEval problem solving dataset described in the paper "Evaluating Large Language Models Trained on Code". Installation Make sure to use python 3.7 or later: $ conda create -n codex python=3.7 $ git clone https://github.com/openai/human-eval This program exists to run untrusted model-generated code. Users are strongly encouraged not to do so outside of a robust security sandbox. The execution call in execution.py is deliberately commented out to ensure users read this disclaimer before running code in a potentially unsafe manner. See the comment in execution.py for more information and instructions. After following the above instructions to enable execution, generate samples and save them in the following JSON Lines (jsonl) format, where each sample is formatted into a single line like so: {"task_id": "Corresponding HumanEval task ID", "completion": "Completion only without the prompt"} Here is nearly functional example code (you just have to provide generate_one_completion to make it work) that saves generated completions to samples.jsonl. from human_eval.data import write_jsonl, read_problems problems = read_problems() num_samples_per_task = 200 $ evaluate_functional_correctness samples.jsonl As a quick sanity-check, the example samples should yield 0.5 pass@1. $ evaluate_functional_correctness data/example_samples.jsonl --problem_file=data/example_problem.jsonl $ evaluate_functional_correctness --help Known Issues While evaluation uses very little memory, you might see the following error message when the system is running out of RAM. Since this may cause some correct programs to fail, we recommend that you free some memory and try again. malloc: can't allocate region Please cite using the following bibtex entry: @Article{chen2021codex, Suggested labels{ "key": "llm-evaluation", "value": "Evaluating Large Language Models performance and behavior through human-written evaluation sets" } |
Example - Qwen
DESCRIPTION:
Here we provide a very simple script for supervised finetuning, which is revised from the training script in
Fastchat
. The script is used to finetune Qwen with Hugging Face Trainer. You can check the script here. This script for supervised finetuning (SFT) has the following features:In the following, we introduce more details about the usage of the script.
Installation
Before you start, make sure you have installed the following packages:
Data Preparation
For data preparation, we advise you to organize the data in a jsonl file, where each line is a dictionary as demonstrated below:
Above are two examples of each data sample in the dataset. Each sample is a JSON object with the following fields: type, messages, and source. messages is required while the others are optional for you to label your data format and data source. The messages field is a list of JSON objects, each of which has two fields: role and content. role can be system, user, or assistant. content is the text of the message. source is the source of the data, which can be self-made, alpaca, open-hermes, or any other string.
To make the jsonl file, you can use json to save a list of dictionaries to the jsonl file:
Quickstart
For you to start finetuning quickly, we directly provide a shell script for you to run without paying attention to details. You need different hyperparameters for different types of training, e.g., single-GPU / multi-GPU training, full-parameter tuning, LoRA, or Q-LoRA.
Specify the
<model_path>
for your model,<data_path>
for your data, and<config_path>
for your deepspeed configuration. If you use LoRA or Q-LoRA, just add--use_lora True
or--q_lora True
based on your requirements. This is the simplest way to start finetuning. If you want to change more hyperparameters, you can dive into the script and modify those parameters.Advanced Usages
In this section, we introduce the details of the scripts, including the core python script as well as the corresponding shell script.
Shell Script
Before we introduce the python code, we provide a brief introduction to the shell script with commands. We provide some guidance inside the shell script and here we take
finetune.sh
as an example.To set up the environment variables for distributed training (or single-GPU training), specify the following variables:
GPUS_PER_NODE
,NNODES
,NODE_RANK
,MASTER_ADDR
, andMASTER_PORT
. No need to worry too much about them as we provide the default settings for you. In the command, you can pass in the argument-m
and-d
to specify the model path and data path, respectively. You can also pass in the argument--deepspeed
to specify the deepspeed configuration file. We provide two configuration files for ZeRO2 and ZeRO3, and you can choose one based on your requirements. In most cases, we recommend using ZeRO3 for multi-GPU training except for Q-LoRA, where we recommend using ZeRO2.There are a series of hyperparameters to tune. Passing in
--bf16
or--fp16
to specify the precision for mixed precision training. The other significant hyperparameters include:--output_dir
: the path of your output models or adapters.--num_train_epochs
: the number of training epochs.--gradient_accumulation_steps
: the number of gradient accumulation steps.--per_device_train_batch_size
: the batch size per GPU for training, and the total batch size is equal toper_device_train_batch_size * number_of_gpus * gradient_accumulation_steps
.--learning_rate
: the learning rate.--warmup_steps
: the number of warmup steps.--lr_scheduler_type
: the type of learning rate scheduler.--weight_decay
: the value of weight decay.--adam_beta2
: the value of in Adam.--model_max_length
: the maximum sequence length.--use_lora
: whether to use LoRA. Adding--q_lora
can enable Q-LoRA.--gradient_checkpointing
: whether to use gradient checkpointing.URL: https://qwen.readthedocs.io/en/latest/training/SFT/example.html
Suggested labels
The text was updated successfully, but these errors were encountered: