fix(vlm): handle legacy conversation data format and check image in data #2018
Merged
NanoCode012 force-pushed the fix/mm_chat_template branch from 44ed2c1 to 69db32b (November 11, 2024 14:53)
Would love to have a unit test for this in place.
winglian reviewed Nov 19, 2024 (4 reviews)
NanoCode012 changed the title from "fix: handle legacy conversation data format and check image in data" to "fix(vlm): handle legacy conversation data format and check image in data" (Nov 23, 2024)
winglian reviewed Nov 25, 2024
winglian approved these changes Nov 30, 2024
bursteratom pushed a commit that referenced this pull request Dec 4, 2024
…ata (#2018) [skip ci]
* fix: handle legacy conversation data format and check image in data
* feat: add test for llama vision
* feat: add max_steps to test
* fix: incorrect indent and return preprocess
* feat: use smaller model and dataset
* chore: add extra config for sharegpt dataset
bursteratom pushed a commit that referenced this pull request Dec 4, 2024 (same commit message)
djsaunde pushed a commit that referenced this pull request Dec 16, 2024 (same commit message)
djsaunde pushed a commit that referenced this pull request Dec 17, 2024 (same commit message)
Description
The current vision data processing expects data in the OAI format, with images. This PR allows passing a text-only dataset when training vision models and converts datasets in the old sharegpt-like format.
The caveat is that it still does not allow mixing text-only and text+image data points within a batch. We would need to patch upstream for this.
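As a rough illustration of the conversion described above, here is a minimal sketch. It is not the code in this PR: the helper names (`convert_legacy_turn`, `normalize_example`, `batch_is_mixed`), the role mapping, and the field names `conversations`, `messages`, and `images` are assumptions chosen for the example.

```python
from __future__ import annotations

from typing import Any

# Assumed mapping from legacy sharegpt-style speaker tags to OAI-style roles.
LEGACY_ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def convert_legacy_turn(turn: dict[str, Any]) -> dict[str, Any]:
    """Convert one {"from", "value"} turn into a {"role", "content"} turn."""
    if "role" in turn and "content" in turn:
        return turn  # already in the newer format, leave untouched
    role = LEGACY_ROLE_MAP.get(turn["from"], turn["from"])
    return {"role": role, "content": [{"type": "text", "text": turn["value"]}]}


def normalize_example(example: dict[str, Any]) -> dict[str, Any]:
    """Return a row with a "messages" list and an "images" list (possibly empty)."""
    turns = example.get("messages") or example.get("conversations") or []
    images = example.get("images") or []  # text-only rows get an empty list
    return {
        "messages": [convert_legacy_turn(t) for t in turns],
        "images": list(images),
    }


def batch_is_mixed(examples: list[dict[str, Any]]) -> bool:
    """True if a batch mixes text-only and text+image rows (the unsupported case)."""
    has_image = [bool(e["images"]) for e in examples]
    return any(has_image) and not all(has_image)
```

A collator could call something like `batch_is_mixed` up front and raise a clear error, rather than failing later inside the processor, until mixed batches are supported upstream.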
Motivation and Context
A normal sharegpt dataset could not be passed to vision model training.
How has this been tested?
Not yet tested after the refactor.
Screenshots (if appropriate)
Types of changes
Social Handles (Optional)
@hjc-puro