Implement the Phi 3 vision model #351

Merged
merged 184 commits into from
Jun 7, 2024

Conversation

EricLBuehler
Owner

@EricLBuehler EricLBuehler commented May 27, 2024

This PR implements microsoft/Phi-3-vision-128k-instruct.

🚨 This PR is going to merge soon! 🚨

Ironing out bugs status:

Finalization status:

  • Replace some asserts with error messages
  • Rust example
  • Toml and normal loaders
  • Python API
  • Examples and cookbooks

Dev Status:

  • CLIP
    • MLP
    • Embeddings
    • Encoder
  • Phi 3 model
  • Phi3ImageProcessor
    • Normalize
    • Convert to RGB
    • Resize to (336, 336) with bicubic interpolation
    • HD transform on image
    • Pad to max_num_crops
    • Reshape HD images
    • Calculate and pass image sizes
    • Pad to max num crops
    • Create pixel values
  • Phi3Loader
  • Phi3Processor
    • Refactor to allow Processor to have access to the prepared image inputs
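
A minimal, size-only sketch of the HD transform and padding steps listed above, assuming they mirror the reference Hugging Face processor for this model. The names `hd_transform_size` and `tile_count`, and the tile budget passed as `max_crops`, are illustrative assumptions and not code from this PR. No pixels are resampled here; only the output dimensions are computed.

```rust
/// Side length of one crop, from the "(336,336)" item in the list above.
const CROP: u32 = 336;

/// Hypothetical helper: the (width, height) an image would have after the
/// HD transform and padding, computed without touching any pixels.
/// Assumes `width`, `height`, and `max_crops` are all at least 1.
fn hd_transform_size(width: u32, height: u32, max_crops: u32) -> (u32, u32) {
    // Work in landscape orientation and swap back at the end.
    let transposed = width < height;
    let (w, h) = if transposed { (height, width) } else { (width, height) };
    let ratio = f64::from(w) / f64::from(h);

    // Find the largest `scale` (tiles along the long side) whose grid of
    // scale x ceil(scale / ratio) tiles still fits within `max_crops`.
    let mut scale = 1u32;
    while f64::from(scale) * (f64::from(scale) / ratio).ceil() <= f64::from(max_crops) {
        scale += 1;
    }
    scale -= 1;

    // Resize so the long side is a whole number of tiles, keeping the aspect ratio.
    let new_w = scale * CROP;
    let new_h = (f64::from(new_w) / ratio) as u32;

    // Pad the short side up to the next multiple of the tile size.
    let padded_h = (new_h + CROP - 1) / CROP * CROP;

    if transposed { (padded_h, new_w) } else { (new_w, padded_h) }
}

/// Hypothetical helper: number of CROP x CROP tiles in a padded image.
fn tile_count(width: u32, height: u32) -> u32 {
    (width / CROP) * (height / CROP)
}
```

With an assumed budget of 16 tiles, `hd_transform_size(1000, 500, 16)` gives `(1680, 1008)`, which `tile_count` reports as a 5 x 3 grid of 15 tiles; any remaining slots up to `max_num_crops` would then be filled by the padding step.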

@EricLBuehler EricLBuehler merged commit 5a7ebb7 into master Jun 7, 2024
11 checks passed
@EricLBuehler EricLBuehler deleted the phi3_vision branch June 7, 2024 19:49
EricLBuehler added a commit that referenced this pull request Jun 8, 2024
* Initial work on phi3v

* Add the image embedding layer

* Lints

* Implement the loader

* Add infrastructure for phi3 image processor

* Merge

* Merge

* Merge

* Merge

* Partially implement padding

* Implement the hd transform step

* Work on the image processor

* Clippy

* Complete the phi3v inputs processor

* Rename

* Merge

* Merge

* Rename to phi3v and fix deser

* Fix varbuilder

* Fix varbuilder

* Default for do convert rgb

* Some defaults

* Allow no processor config

* Setup debug flag

* Add phi3v

* Implement messages flattening

* Update

* Rewrite the pad, hd transform

* Clippy

* Detect num channels

* Fix reshape

* Fix global image channel dim

* Fix assert

* Fix dtype

* Fix gt

* Fix image id neg

* Fix dim0 of pixel values

* Fix dtype

* Check if model supports gemm

* Fix some shape errors

* Fix some shape errors

* Fix rank of slice_assign

* Fix image toks

* Properly downcase

* Fix response

* Fix response

* Allow no images in prompt

* Output correct hidden state

* Fix nonzero and add test

* Fix n image toks

* Add mistralrs_vision

* Typo

* Fix and add tests

* Fix indexing

* Fix test condition

* Fix unsqueeze

* Fix dtype for norm

* Update clip

* Clippy

* Run clip in f32

* Run in bf16

* Run in bf16 again

* Fix dtype

* Set toks to have correct context lens

* Set toks to have correct context lens

* Support multiple GGUF files (#379)

* Move to gguf module

* Add content abstraction for multiple gguf files

* Fix test

* Allow specifying and loading multiple gguf files

* Update docs and examples

* Print some info

* Merge

* Organize normal loading metadata (#381)

* Organize normal loading metadata

* Fix

* Bump version 0.1.13 -> 0.1.14 (#382)

* Patch incorrect unwrap and bump version (#383)

* Patch incorrect unwrap

* Bump version to 0.1.15

* More verbose logging during loading (#385)

* More verbose logging when loading

* More logging

* Refactor enabling debug logging (#387)

* Refactor enabling debug logging

* Fix reversed order

* Merge

* Merge

* Merge

* Use precise gelu

* Use correct kernel

* Debugging commit

* Add fused bias linear

* Finish merge

* Use fused layer in clip

* Save progress

* Remove debugs

* Update example

* Resize exact

* Update interpolate

* Fix batch dim

* Update test and transform

* It works

* Add some examples

* Allow more than one image

* Add support in python api

* Add to toml selector

* Update python api

* Overhaul readme and docs

* Update

* Export vision arch

* Export vision arch

* Export vision arch

* Fix max img dim

* Fix unwrap
Labels: models (Additions to model or architectures), new feature (New feature or request)