Implement the Phi 3 vision model #351

EricLBuehler · 2024-05-27T14:20:54Z

This PR implements microsoft/Phi-3-vision-128k-instruct.

🚨 This PR is going to merge soon! 🚨

Ironing out bugs status:

Incorrect n image toks
Incorrect tokenization (probably due to the extra space?)
It generates readable text!
Sometimes the generated text's meaning doesn't match the image (probably a simple bug)
Erroneous panic on https://upload.wikimedia.org/wikipedia/commons/e/e7/Everest_North_Face_toward_Base_Camp_Tibet_Luca_Galuzzi_2006.jpg
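For context on the "Incorrect n image toks" item, here is a minimal sketch of how the number of image placeholder tokens could be derived from the padded image size. It assumes the formula used by the reference Hugging Face image processor for Phi-3-vision (336-pixel crops, 144 tokens per crop); the PR description does not confirm that mistral.rs computes it this way. The helper name `num_image_tokens` and the constant `CROP` are hypothetical.

```rust
/// Assumed side length, in pixels, of one vision-encoder crop.
const CROP: usize = 336;

/// Hypothetical helper: number of image placeholder tokens for an image whose
/// height and width have already been padded to multiples of `CROP`.
///
/// Assumed breakdown (taken from the reference processor, not from this PR):
/// - 144 tokens for each crop, plus 144 for one extra global view,
/// - 1 separator token,
/// - 12 tokens per row of crops, plus 12 for the global view.
fn num_image_tokens(height: usize, width: usize) -> usize {
    let rows = height / CROP;
    let cols = width / CROP;
    (rows * cols + 1) * 144 + 1 + (rows + 1) * 12
}
```

Under these assumptions a single 336x336 crop yields `num_image_tokens(336, 336) == 313`. A mismatch between this count and the number of placeholder tokens inserted into the prompt is one plausible way the bug listed above could show up.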

Finalization status:

Dev Status:

* Initial work on phi3v
* Add the image embedding layer
* Lints
* Implement the loader
* Add infrastructure for phi3 image processor
* Merge
* Merge
* Merge
* Merge
* Partially implement padding
* Implement the hd transform step
* Work on the image processor
* Clippy
* Complete the phi3v inputs processor
* Rename
* Merge
* Merge
* Rename to phi3v and fix deser
* Fix varbuilder
* Fix varbuilder
* Default for do convert rgb
* Some defaults
* Allow no processor config
* Setup debug flag
* Add phi3v
* Implement messages flattening
* Update
* Rewrite the pad, hd transform
* Clippy
* Detect num channels
* Fix reshape
* Fix global image channel dim
* Fix assert
* Fix dtype
* Fix gt
* Fix image id neg
* Fix dim0 of pixel values
* Fix dtype
* Check if model supports gemm
* Fix some shape errors
* Fix some shape errors
* Fix rank of slice_assign
* Fix image toks
* Properly downcase
* Fix response
* Fix response
* Allow no images in prompt
* Output correct hidden state
* Fix nonzero and add test
* Fix n image toks
* Add mistralrs_vision
* Typo
* Fix and add tests
* Fix indexing
* Fix test condition
* Fix unsqueeze
* Fix dtype for norm
* Update clip
* Clippy
* Run clip in f32
* Run in bf16
* Run in bf16 again
* Fix dtype
* Set toks to have correct context lens
* Set toks to have correct context lens
* Support multiple GGUF files (#379)
* Move to gguf module
* Add content abstraction for multiple gguf files
* Fix test
* Allow specifying and loading multiple gguf files
* Update docs and examples
* Print some info
* Merge
* Organize normal loading metadata (#381)
* Organize normal loading metadata
* Fix
* Bump version 0.1.13 -> 0.1.14 (#382)
* Patch incorrect unwrap and bump version (#383)
* Patch incorrect unwrap
* Bump version to 0.1.15
* More verbose logging during loading (#385)
* More verbose logging when loading
* More logging
* Refactor enabling debug logging (#387)
* Refactor enabling debug logging
* Fix reversed order
* Merge
* Merge
* Merge
* Use precise gelu
* Use correct kernel
* Debugging commit
* Add fused bias linear
* Finish merge
* Use fused layer in clip
* Save progress
* Remove debugs
* Update example
* Resize exact
* Update interpolate
* Fix batch dim
* Update test and transform
* It works
* Add some examples
* Allow more than one image
* Add support in python api
* Add to toml selector
* Update python api
* Overhaul readme and docs
* Update
* Export vision arch
* Export vision arch
* Export vision arch
* Fix max img dim
* Fix unwrap

EricLBuehler added 30 commits May 14, 2024 15:46

Begin works on idefics

bf003ed

Begin works on idefics

410de48

Implement the vision transformer part

95d6394

Merge branch 'master' into idefics2

4543f40

Add the connector model

c7f8791

Add config

83575dc

Merge branch 'master' into idefics2

ad69fc5

Merge

7ff2b01

Merge branch 'master' into idefics2

69e4859

Merge branch 'master' into idefics2

fab36bc

Merge branch 'master' into idefics2

0ce1152

Merge

345982c

More progress

b1b7bf8

Merge branch 'master' into idefics2

5797660

Implement the bucketize functions

31e9e9e

Complete the bucketize, unfold functions and finish idefic2 global forward

6d5af54

Merge branch 'master' into idefics2

e543ba2

Merge branch 'master' into idefics2

477e319

Mask

8eb9251

Clippy

3e77b03

Add framework for image pre processors

0580be3

Implement utility functions for image preprocessor

32f470a

Implement some functions for image processor

c4bc747

Merge branch 'master' into idefics2

7582a81

Clippy

9e76a4f

Merge branch 'master' into idefics2

33f3d0a

Calculate pixel values

e7b5fd6

Pass and integrate pixel attention mask

34994bc

Add vision pipeline and major refactor

5932aea

Add model category state

e8efbe8

EricLBuehler added 13 commits June 6, 2024 20:20

Add fused bias linear

40963c9

Merge branch 'master' into phi3_vision

1f2bf87

Finish merge

428b36f

Use fused layer in clip

1a89341

Save progress

3ccd3e6

Remove debugs

1327893

Update example

2b8cb17

Resize exact

e7dff6c

Update interpolate

3b6cbbc

Fix batch dim

298e56e

Update test and transform

14e3f2f

It works

ced3cab

Add some examples

cbccb41

EricLBuehler force-pushed the phi3_vision branch from 7124056 to cbccb41 Compare June 7, 2024 14:56

EricLBuehler added 12 commits June 7, 2024 11:06

Merge branch 'master' into phi3_vision

6827df2

Allow more than one image

21443aa

Add support in python api

1aba518

Add to toml selector

cdd71ce

Update python api

ee69dfd

Overhaul readme and docs

d7a7c3c

Update

77885e6

Export vision arch

af5b83d

Export vision arch

34a21c4

Export vision arch

f70370c

Fix max img dim

d65e884

Fix unwrap

600ef37

EricLBuehler merged commit 5a7ebb7 into master Jun 7, 2024
11 checks passed

EricLBuehler deleted the phi3_vision branch June 7, 2024 19:49

polarathene mentioned this pull request Jun 7, 2024

Refactor: GGUF metadata tokenizer #389

Merged
