While trying to reproduce LLaVA v1.5 on Llama 3 at the pretraining stage, I found that the preprocess function uses preprocess_v1 rather than preprocess_plain, even though --version is set to plain, as in the official v1.5 pretrain.sh training script.
I debugged the code and found the following:
if model_args.version == "v0":
    if tokenizer.pad_token is None:
        smart_tokenizer_and_embedding_resize(
            special_tokens_dict=dict(pad_token="[PAD]"),
            tokenizer=tokenizer,
            model=model,
        )
elif model_args.version == "v0.5":
    tokenizer.pad_token = tokenizer.unk_token
else:
    # tokenizer.pad_token = tokenizer.unk_token
    tokenizer.pad_token = tokenizer.eos_token  # changed from unk to eos as the pad token for llama3
    if model_args.version in conversation_lib.conv_templates:
        print("a")  # my debug print: this branch runs, so "plain" is found in conv_templates
        conversation_lib.default_conversation = conversation_lib.conv_templates[model_args.version]
    else:
        conversation_lib.default_conversation = conversation_lib.conv_templates["vicuna_v1"]
This code block does set default_conversation to the plain template. However, once trainer.train() starts and the dataset's __getitem__ is called:
def __getitem__(self, i) -> Dict[str, torch.Tensor]:
    sources = self.list_data_dict[i]
    if isinstance(i, int):
        sources = [sources]
    assert len(sources) == 1, "Don't know why it is wrapped to a list"  # FIXME
    if 'image' in sources[0]:
        image_file = self.list_data_dict[i]['image']
        image_folder = self.data_args.image_folder
        processor = self.data_args.image_processor
        image = Image.open(os.path.join(image_folder, image_file)).convert('RGB')
        if self.data_args.image_aspect_ratio == 'pad':
            def expand2square(pil_img, background_color):
                width, height = pil_img.size
                if width == height:
                    return pil_img
                elif width > height:
                    result = Image.new(pil_img.mode, (width, width), background_color)
                    result.paste(pil_img, (0, (width - height) // 2))
                    return result
                else:
                    result = Image.new(pil_img.mode, (height, height), background_color)
                    result.paste(pil_img, ((height - width) // 2, 0))
                    return result
            image = expand2square(image, tuple(int(x * 255) for x in processor.image_mean))
            image = processor.preprocess(image, return_tensors='pt')['pixel_values'][0]
        else:
            image = processor.preprocess(image, return_tensors='pt')['pixel_values'][0]
        sources = preprocess_multimodal(
            copy.deepcopy([e["conversations"] for e in sources]),
            self.data_args)
    else:
        sources = copy.deepcopy([e["conversations"] for e in sources])
    data_dict = preprocess(
        sources,
        self.tokenizer,
        has_image=('image' in self.list_data_dict[i]))
    if isinstance(i, int):
        data_dict = dict(input_ids=data_dict["input_ids"][0],
                         labels=data_dict["labels"][0])
    # image exist in the data
    if 'image' in self.list_data_dict[i]:
        data_dict['image'] = image
    elif self.data_args.is_multimodal:
        # image does not exist in the data, but the model is multimodal
        crop_size = self.data_args.image_processor.crop_size
        data_dict['image'] = torch.zeros(3, crop_size['height'], crop_size['width'])
    return data_dict
By the time execution reaches the dataset's __getitem__, conversation_lib.default_conversation is the v1 template again, so preprocess() ends up calling preprocess_v1.
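To confirm which template the dataset actually sees, I can log it right before the preprocess() call in __getitem__ (a debugging sketch; version and sep_style are attributes of LLaVA's Conversation dataclass, and os is already imported in train.py):

# Debugging sketch: report which conversation template this process is using.
print(f"[pid {os.getpid()}] default_conversation.version = "
      f"{conversation_lib.default_conversation.version}, "
      f"sep_style = {conversation_lib.default_conversation.sep_style}")
data_dict = preprocess(
    sources,
    self.tokenizer,
    has_image=('image' in self.list_data_dict[i]))

If the printed pid differs from the main training process and the version there is still v1, one possible cause is that the dataloader workers re-import llava/conversation.py, whose module-level default_conversation is the vicuna v1 template, so the assignment made in train() never reaches them (this can happen when workers are started with the spawn method rather than fork).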
Has anyone encountered the same issue?
Does the official LLaVA actually use preprocess_v1 during the pretraining stage?
Below is my training script: