Implement the Phi 3 vision model #351

Merged Jun 7, 2024 — 184 commits

Commits
bf003ed
Begin works on idefics
EricLBuehler May 14, 2024
410de48
Begin works on idefics
EricLBuehler May 14, 2024
95d6394
Implement the vision transformer part
EricLBuehler May 15, 2024
4543f40
Merge branch 'master' into idefics2
EricLBuehler May 15, 2024
c7f8791
Add the connector model
EricLBuehler May 15, 2024
83575dc
Add config
EricLBuehler May 15, 2024
ad69fc5
Merge branch 'master' into idefics2
EricLBuehler May 15, 2024
7ff2b01
Merge
EricLBuehler May 15, 2024
69e4859
Merge branch 'master' into idefics2
EricLBuehler May 15, 2024
fab36bc
Merge branch 'master' into idefics2
EricLBuehler May 15, 2024
0ce1152
Merge branch 'master' into idefics2
EricLBuehler May 16, 2024
345982c
Merge
EricLBuehler May 16, 2024
b1b7bf8
More progress
EricLBuehler May 16, 2024
5797660
Merge branch 'master' into idefics2
EricLBuehler May 18, 2024
31e9e9e
Implement the bucketize functions
EricLBuehler May 20, 2024
6d5af54
Complete the bucketize, unfold functions and finish idefic2 global fo…
EricLBuehler May 21, 2024
e543ba2
Merge branch 'master' into idefics2
EricLBuehler May 21, 2024
477e319
Merge branch 'master' into idefics2
EricLBuehler May 21, 2024
8eb9251
Mask
EricLBuehler May 21, 2024
3e77b03
Clippy
EricLBuehler May 21, 2024
0580be3
Add framework for image pre processors
EricLBuehler May 21, 2024
32f470a
Implement utility functions for image preprocessor
EricLBuehler May 21, 2024
c4bc747
Implement some functions for image processor
EricLBuehler May 21, 2024
7582a81
Merge branch 'master' into idefics2
EricLBuehler May 22, 2024
9e76a4f
Clippy
EricLBuehler May 22, 2024
33f3d0a
Merge branch 'master' into idefics2
EricLBuehler May 22, 2024
e7b5fd6
Calculate pixel values
EricLBuehler May 22, 2024
34994bc
Pass and integrate pixel attention mask
EricLBuehler May 22, 2024
5932aea
Add vision pipeline and major refactor
EricLBuehler May 23, 2024
e8efbe8
Add model category state
EricLBuehler May 23, 2024
5eb7f7a
Remove some todos
EricLBuehler May 23, 2024
9585476
Merge branch 'master' into idefics2
EricLBuehler May 23, 2024
18beaca
Merge branch 'master' into idefics2
EricLBuehler May 23, 2024
4fe2fe6
Get rid of some todos
EricLBuehler May 23, 2024
c84e8fb
Refactor slightly
EricLBuehler May 23, 2024
60c6d41
Prepare inputs for vision model
EricLBuehler May 23, 2024
ab8c6de
Clippy
EricLBuehler May 23, 2024
eca3f16
Add better defaults for image processor
EricLBuehler May 23, 2024
fe24e08
Implement scheduling based on image dims
EricLBuehler May 23, 2024
b0d16f7
Implement scheduling based on image dims
EricLBuehler May 23, 2024
818b740
Better scheduling with pad_to
EricLBuehler May 23, 2024
b6c2747
Properly get images from request
EricLBuehler May 23, 2024
e3167a5
Implement for the http interface
EricLBuehler May 23, 2024
83ee92b
Fix
EricLBuehler May 23, 2024
cfe1b33
Implement preprocessor usage and load processor config
EricLBuehler May 24, 2024
30b5407
Allow handling of content messages
EricLBuehler May 24, 2024
f37a378
Add processor infrastructure
EricLBuehler May 24, 2024
0feb9b0
Load processor based on vision model kind
EricLBuehler May 24, 2024
4185ca2
Add a new test for vision chat templates
EricLBuehler May 24, 2024
ccb0f3c
Clippy
EricLBuehler May 24, 2024
a401b44
Add the vision plain model
EricLBuehler May 24, 2024
7c39e8e
A batch of fixes
EricLBuehler May 24, 2024
9ba563b
Remove arc get_mut usage for adding special tokens
EricLBuehler May 24, 2024
cc8c122
Add idefics2 example
EricLBuehler May 24, 2024
043e071
Fixes with http
EricLBuehler May 24, 2024
4a2bdde
Fixes
EricLBuehler May 24, 2024
a27797d
Calculate padding shapes properly
EricLBuehler May 25, 2024
747f7cd
Fix
EricLBuehler May 25, 2024
8989957
Fixes
EricLBuehler May 25, 2024
469da8c
Fix index select
EricLBuehler May 25, 2024
b10bfd4
Track
EricLBuehler May 25, 2024
2e09066
Merge branch 'master' into idefics2
EricLBuehler May 26, 2024
c09dd81
Fix vision attention mask
EricLBuehler May 26, 2024
b1eac47
Merge branch 'master' into idefics2
EricLBuehler May 26, 2024
fd8412e
Intial work on phi3v
EricLBuehler May 27, 2024
31e62d2
Add the image embedding layer
EricLBuehler May 27, 2024
c519e81
Lints
EricLBuehler May 27, 2024
de1a8d5
Implement the loader
EricLBuehler May 27, 2024
00d2fb5
Add infrastructure for phi3 image processor
EricLBuehler May 28, 2024
e0d9a5b
Merge branch 'master' into phi3_vision
EricLBuehler May 28, 2024
e83bcf1
Merge
EricLBuehler May 28, 2024
71aec32
Merge branch 'master' into phi3_vision
EricLBuehler May 28, 2024
0921b90
Merge
EricLBuehler May 28, 2024
737a9fc
Merge branch 'master' into phi3_vision
EricLBuehler May 28, 2024
7c4c1c0
Merge
EricLBuehler May 28, 2024
b2036bf
Merge
EricLBuehler May 29, 2024
a56c09a
Merge
EricLBuehler May 29, 2024
87bb4ae
Partially implement padding
EricLBuehler May 29, 2024
17589a8
Implement the hd transform step
EricLBuehler May 29, 2024
2b742f6
Merge branch 'master' into phi3_vision
EricLBuehler May 29, 2024
50f7830
Work on the image processor
EricLBuehler May 29, 2024
0960640
Clippy
EricLBuehler May 29, 2024
f550bee
Complete the phi3v inputs processor
EricLBuehler May 31, 2024
37a5b8f
Rename
EricLBuehler May 31, 2024
68e57b6
Merge branch 'master' into phi3_vision
EricLBuehler May 31, 2024
a8d30bd
Merge branch 'master' into phi3_vision
EricLBuehler May 31, 2024
124720a
Merge
EricLBuehler May 31, 2024
0ec9aec
Merge branch 'master' into phi3_vision
EricLBuehler May 31, 2024
5a27c94
Merge branch 'master' into phi3_vision
EricLBuehler May 31, 2024
971ffa9
Merge
EricLBuehler May 31, 2024
24092e0
Rename to phi3v and fix deser
EricLBuehler May 31, 2024
88a2df8
Fix varbuilder
EricLBuehler May 31, 2024
b126d28
Fix varbuilder
EricLBuehler May 31, 2024
6e1a6a8
Default for do convert rgb
EricLBuehler May 31, 2024
989eb32
Some defaults
EricLBuehler May 31, 2024
f42b527
Allow no processor config
EricLBuehler May 31, 2024
7a1b8ce
Setup debug flag
EricLBuehler May 31, 2024
c1d6b48
Add phi3v
EricLBuehler May 31, 2024
1135b46
Implement messages flattening
EricLBuehler May 31, 2024
402fa16
Update
EricLBuehler May 31, 2024
5634a3a
Rewrite the pad, hd transform
EricLBuehler Jun 1, 2024
95f7952
Clippy
EricLBuehler Jun 1, 2024
6889b6e
Detect num channels
EricLBuehler Jun 1, 2024
fc545d8
Fix reshape
EricLBuehler Jun 1, 2024
b153315
Fix global image channel dim
EricLBuehler Jun 1, 2024
34b4020
Fix assert
EricLBuehler Jun 1, 2024
46e4de4
Fix dtype
EricLBuehler Jun 1, 2024
c740f20
Fix gt
EricLBuehler Jun 1, 2024
a49ee63
Fix image id neg
EricLBuehler Jun 1, 2024
6da7303
Fix dim0 of pixel values
EricLBuehler Jun 1, 2024
c39bc50
Fix dtype
EricLBuehler Jun 1, 2024
5e66df0
Check if model supports gemm
EricLBuehler Jun 1, 2024
ff0cf0f
Fix some shape errors
EricLBuehler Jun 1, 2024
d2f955d
Fix some shape errors
EricLBuehler Jun 1, 2024
2dfa979
Fix rank of slice_assign
EricLBuehler Jun 1, 2024
5561db2
Fix image toks
EricLBuehler Jun 1, 2024
1aae49e
Properly downcase
EricLBuehler Jun 1, 2024
6ce1d28
Fix response
EricLBuehler Jun 1, 2024
98269bc
Fix response
EricLBuehler Jun 1, 2024
da3a7da
Allow no images in prompt
EricLBuehler Jun 1, 2024
b8b31c6
Output correct hidden state
EricLBuehler Jun 1, 2024
ca6390b
Fix nonzero and add test
EricLBuehler Jun 1, 2024
a37b32d
Fix n image toks
EricLBuehler Jun 2, 2024
54093a0
Merge branch 'master' into phi3_vision
EricLBuehler Jun 2, 2024
5392dbe
Add mistralrs_vision
EricLBuehler Jun 2, 2024
2872365
Typo
EricLBuehler Jun 2, 2024
4f26273
Fix and add tests
EricLBuehler Jun 2, 2024
25e0a9e
Fix indexing
EricLBuehler Jun 2, 2024
52ec34b
Fix test condition
EricLBuehler Jun 2, 2024
399c178
Fix unsqueeze
EricLBuehler Jun 2, 2024
20aa9a1
Fix dtype for norm
EricLBuehler Jun 2, 2024
40a38ab
Merge
EricLBuehler Jun 3, 2024
9ed7b6f
Update clip
EricLBuehler Jun 3, 2024
cf3693b
Clippy
EricLBuehler Jun 3, 2024
76323ec
Run clip in f32
EricLBuehler Jun 3, 2024
3d3cacd
Run in bf16
EricLBuehler Jun 3, 2024
e6ad82b
Merge branch 'master' into phi3_vision
EricLBuehler Jun 3, 2024
d550662
Run in bf16 again
EricLBuehler Jun 3, 2024
75fc861
Fix dtype
EricLBuehler Jun 3, 2024
3550300
Set toks to have correct context lens
EricLBuehler Jun 4, 2024
7eebd0d
Set toks to have correct context lens
EricLBuehler Jun 4, 2024
401d603
Merge branch 'master' into phi3_vision
EricLBuehler Jun 4, 2024
a8c2b41
Support multiple GGUF files (#379)
EricLBuehler Jun 5, 2024
9b46c1c
Merge branch 'master' into phi3_vision
EricLBuehler Jun 5, 2024
bec9a4b
Merge
EricLBuehler Jun 5, 2024
19ca7ac
Organize normal loading metadata (#381)
EricLBuehler Jun 5, 2024
818808b
Bump version 0.1.13 -> 0.1.14 (#382)
EricLBuehler Jun 5, 2024
9712da6
Patch incorrect unwrap and bump version (#383)
EricLBuehler Jun 5, 2024
798adb4
More verbose logging during loading (#385)
EricLBuehler Jun 5, 2024
89dea1b
Refactor enabling debug logging (#387)
EricLBuehler Jun 5, 2024
5c5476d
Merge branch 'master' into phi3_vision
EricLBuehler Jun 5, 2024
5029063
Merge
EricLBuehler Jun 5, 2024
9b7543b
Merge
EricLBuehler Jun 5, 2024
c6ed513
Refactor enabling debug logging (#387)
EricLBuehler Jun 5, 2024
998aa96
Merge
EricLBuehler Jun 5, 2024
2330f56
Use precise gelu
EricLBuehler Jun 5, 2024
65e1a79
Use correct kernel
EricLBuehler Jun 5, 2024
17a87bc
Debugging commit
EricLBuehler Jun 5, 2024
8d4888b
Merge branch 'master' into phi3_vision
EricLBuehler Jun 6, 2024
40963c9
Add fused bias linear
EricLBuehler Jun 7, 2024
1f2bf87
Merge branch 'master' into phi3_vision
EricLBuehler Jun 7, 2024
428b36f
Finish merge
EricLBuehler Jun 7, 2024
1a89341
Use fused layer in clip
EricLBuehler Jun 7, 2024
3ccd3e6
Save progress
EricLBuehler Jun 7, 2024
1327893
Remove debugs
EricLBuehler Jun 7, 2024
2b8cb17
Update example
EricLBuehler Jun 7, 2024
e7dff6c
Resize exact
EricLBuehler Jun 7, 2024
3b6cbbc
Update interpolate
EricLBuehler Jun 7, 2024
298e56e
Fix batch dim
EricLBuehler Jun 7, 2024
14e3f2f
Update test and transform
EricLBuehler Jun 7, 2024
ced3cab
It works
EricLBuehler Jun 7, 2024
cbccb41
Add some examples
EricLBuehler Jun 7, 2024
6827df2
Merge branch 'master' into phi3_vision
EricLBuehler Jun 7, 2024
21443aa
Allow more than one image
EricLBuehler Jun 7, 2024
1aba518
Add support in python api
EricLBuehler Jun 7, 2024
cdd71ce
Add to toml selector
EricLBuehler Jun 7, 2024
ee69dfd
Update python api
EricLBuehler Jun 7, 2024
d7a7c3c
Overhaul readme and docs
EricLBuehler Jun 7, 2024
77885e6
Update
EricLBuehler Jun 7, 2024
af5b83d
Export vision arch
EricLBuehler Jun 7, 2024
34a21c4
Export vision arch
EricLBuehler Jun 7, 2024
f70370c
Export vision arch
EricLBuehler Jun 7, 2024
d65e884
Fix max img dim
EricLBuehler Jun 7, 2024
600ef37
Fix unwrap
EricLBuehler Jun 7, 2024
5 changes: 4 additions & 1 deletion Cargo.toml
@@ -5,6 +5,7 @@ members = [
"mistralrs-pyo3",
"mistralrs",
"mistralrs-bench",
"mistralrs-vision",
]
resolver = "2"

@@ -32,10 +33,12 @@ tracing = "0.1.40"
tracing-subscriber = { version = "0.3.18", features = ["env-filter"] }
futures = "0.3"
clap = { version = "4.5.1", features = ["derive"] }
pyo3 = { version = "0.21.0", features = ["full", "extension-module"] }
pyo3 = { version = "0.21.0", features = ["full", "extension-module", "either"] }
tokio = { version = "1.36.0", features = ["full", "rt-multi-thread"] }
once_cell = "1.19.0"
image = "0.25.1"
reqwest = { version = "0.12.4", features = ["blocking"] }
base64 = "0.22.1"

[profile.profiling]
inherits = "release"
122 changes: 83 additions & 39 deletions README.md
@@ -13,19 +13,47 @@ Blazingly fast LLM inference.

Mistral.rs is a fast LLM inference platform supporting inference on a variety of devices, quantization, and easy-to-use applications with an OpenAI-compatible HTTP server and Python bindings.

## Upcoming features
- More models: please submit requests [here](https://github.com/EricLBuehler/mistral.rs/issues/156).
- X-LoRA: Scalings `topk` and softmax `topk` ([#48](https://github.com/EricLBuehler/mistral.rs/issues/48)).
- Parallel linear layers (sharding) ([#50](https://github.com/EricLBuehler/mistral.rs/issues/50)).
- Vision models: Idefics 2 ([#309](https://github.com/EricLBuehler/mistral.rs/pull/309)).
Please submit requests for new models [here](https://github.com/EricLBuehler/mistral.rs/issues/156).

**Running the new Llama 3 model**
## Get started fast 🚀

`cargo run --release --features ... -- -i plain -m meta-llama/Meta-Llama-3-8B-Instruct -a llama`
1) [Install](#installation-and-build)

**Running the new Phi 3 model with 128K context window**
2) [Get models](#getting-models)

`cargo run --release --features ... -- -i plain -m microsoft/Phi-3-mini-128k-instruct -a phi3`
3) Deploy with our easy-to-use APIs
- [Python](examples/python)
- [Rust](mistralrs/examples)
- [OpenAI compatible HTTP server](examples/http.md)

## Quick examples
- 🦙 Run the Llama 3 model

*After following installation instructions*

```bash
./mistralrs_server -i plain -m meta-llama/Meta-Llama-3-8B-Instruct -a llama
```

- φ³ Run the Phi 3 model with 128K context window

*After following installation instructions*

```bash
./mistralrs_server -i plain -m microsoft/Phi-3-mini-128k-instruct -a phi3
```

- φ³📷 Run the Phi 3 vision model: [documentation and guide here](docs/PHI3V.md)

<img src="https://static.vecteezy.com/system/resources/previews/012/168/187/large_2x/beautiful-sunset-on-the-beach-with-palm-tree-for-travel-and-vacation-free-photo.JPG" alt="Sunset on a beach" width = "400" height = "267">

*After following installation instructions*

```bash
./mistralrs_server --port 1234 vision-plain -m microsoft/Phi-3-vision-128k-instruct -a phi3v
```
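Once the vision server is running, it can be driven by any OpenAI-compatible client. As a minimal sketch, the snippet below only *constructs* the JSON payload such a request would carry — the port mirrors the launch command above, while the model name, image URL, and exact field names are assumptions following the standard OpenAI chat-completions convention:

```python
import json

# Hypothetical OpenAI-style chat-completion payload with an image part.
# "phi3v" as the model name and the example URL are assumptions; the
# content-part structure follows the OpenAI multimodal convention.
payload = {
    "model": "phi3v",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sunset.jpg"},
                },
                {"type": "text", "text": "What is shown in this image?"},
            ],
        }
    ],
    "max_tokens": 256,
}

# Serialize for an HTTP POST to http://localhost:1234/v1/chat/completions.
body = json.dumps(payload)
```

See the PHI3V documentation linked above for the exact request shape the server expects.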

- Other models: [see supported models](#supported-models) and [how to run them](#run-with-the-cli)

## Description
**Fast**:
@@ -69,19 +97,24 @@ https://github.com/EricLBuehler/mistral.rs/assets/65165915/3396abcd-8d44-4bf7-95
Please see [this section](#supported-models) for details on quantization and LoRA support.

## APIs and Integrations
**Rust Library API**

Rust multithreaded API for easy integration into any application.
<details>
<summary><b>Rust Crate</b></summary>

Rust multithreaded/async API for easy integration into any application.

- [Docs](https://ericlbuehler.github.io/mistral.rs/mistralrs/)
- [Examples](mistralrs/examples/)
- To install: Add `mistralrs = { git = "https://github.com/EricLBuehler/mistral.rs.git" }`

**Python API**
</details>

<details>
<summary><b>Python API</b></summary>

Python API for mistral.rs.

- [Installation](mistralrs-pyo3/README.md)
- [Installation including PyPI](mistralrs-pyo3/README.md)
- [Docs](mistralrs-pyo3/API.md)
- [Example](examples/python/python_api.py)
- [Cookbook](examples/python/cookbook.ipynb)
@@ -113,18 +146,26 @@ print(res.choices[0].message.content)
print(res.usage)
```

**HTTP Server**
</details>

<details>
<summary><b>HTTP Server</b></summary>

OpenAI-compatible API server

- [API Docs](examples/http.md).
- [Running](README.md#run)
- [Example](examples/server/chat.py)

**Llama Index integration**
</details>

<details>
<summary><b>Llama Index integration</b></summary>

- Docs: https://docs.llamaindex.ai/en/stable/examples/llm/mistral_rs/

</details>

---

## Supported accelerators
@@ -149,13 +190,11 @@ Enabling features is done by passing `--features ...` to the build system. When
|A10 GPU, CUDA|78|78|[mistral-7b](TheBloke/Mistral-7B-Instruct-v0.1-GGUF)|4_K_M|
|Intel Xeon 8358 CPU, AVX|6|19|[mistral-7b](TheBloke/Mistral-7B-Instruct-v0.1-GGUF)|4_K_M|
|Raspberry Pi 5 (8GB), Neon|2|3|[mistral-7b](TheBloke/Mistral-7B-Instruct-v0.1-GGUF)|2_K|
|A100 GPU, CUDA|110|119|[mistral-7b](TheBloke/Mistral-7B-Instruct-v0.1-GGUF)|4_K_M|
|A100 GPU, CUDA|119|119|[mistral-7b](TheBloke/Mistral-7B-Instruct-v0.1-GGUF)|4_K_M|

Please submit more benchmarks by raising an issue!

## Usage
### Installation and Build
To install mistral.rs, one should ensure they have Rust installed by following [this](https://rustup.rs/) link. Additionally, the Hugging Face token should be provided in `~/.cache/huggingface/token` by running `huggingface-cli login` to enable automatic download of gated models.
## Installation and Build

1) Install required packages
- `openssl` (ex., `sudo apt install libssl-dev`)
@@ -224,6 +263,7 @@ To install mistral.rs, one should ensure they have Rust installed by following [
There are 3 ways to run a model with mistral.rs:
- From Hugging Face Hub (easiest)
- From local files
- Running a GGUF model fully locally

### Getting models from Hugging Face Hub

@@ -284,16 +324,14 @@ please consider using the method demonstrated in examples below, where the token
**Supported GGUF tokenizer types**
- `llama`

## Run

To start a server serving Mistral GGUF on `localhost:1234`,
```bash
./mistralrs_server --port 1234 --log output.log gguf -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF -t mistralai/Mistral-7B-Instruct-v0.1 -f mistral-7b-instruct-v0.1.Q4_K_M.gguf
```
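With the server listening on port 1234, any OpenAI-compatible client can talk to it. The sketch below builds (but does not send) such a request using only the Python standard library; the `/v1/chat/completions` path and field names assume the standard OpenAI convention, and the model name is a placeholder:

```python
import json
from urllib import request


def build_chat_request(prompt: str, port: int = 1234) -> request.Request:
    """Build (but do not send) a chat-completions request for the local server."""
    payload = {
        # Placeholder model name; the server serves whatever model it was
        # launched with, so this field may be routed or ignored.
        "model": "mistral",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return request.Request(
        url=f"http://localhost:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("Say hello!")
# Sending would be: request.urlopen(req) — requires the server to be running.
```

The API docs linked in the HTTP Server section above describe the endpoint in full.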
## Run with the CLI

Mistral.rs uses subcommands to control the model type. They generally follow the format `<XLORA/LORA>-<QUANTIZATION>`. Please run `./mistralrs_server --help` to see the subcommands.

Additionally, for models without quantization, the model architecture should be provided as the `--arch` or `-a` argument in contrast to GGUF models which encode the architecture in the file. It should be one of the following:
Additionally, for models without quantization, the model architecture should be provided as the `--arch` or `-a` argument, in contrast to GGUF models, which encode the architecture in the file.

### Architecture for plain models

- `mistral`
- `gemma`
- `mixtral`
@@ -302,6 +340,10 @@ Additionally, for models without quantization, the model architecture should be
- `phi3`
- `qwen2`

### Architecture for vision models

- `phi3v`

**Interactive mode:**

You can launch interactive mode, a simple chat application running in the terminal, by passing `-i`:
@@ -310,7 +352,7 @@ You can launch interactive mode, a simple chat application running in the termin
./mistralrs_server -i plain -m microsoft/Phi-3-mini-128k-instruct -a phi3
```

### Quick examples:
## More quick examples

- X-LoRA with no quantization

@@ -362,13 +404,14 @@ Example:
./mistralrs_server --port 1234 toml -f toml-selectors/gguf.toml
```

**Command line docs**

Command line docs [here](docs/CMD_LINE_DOCS.md)

---

## Supported models

Mistral.rs supports several model categories:
- text
- vision (see [the docs](docs/VISION_MODELS.md))

**Quantization support**
|Model|GGUF|GGML|
|--|--|--|
@@ -379,13 +422,15 @@ Command line docs [here](docs/CMD_LINE_DOCS.md)
|Phi 2|✅| |
|Phi 3|✅| |
|Qwen 2| | |
|Phi 3 Vision| | |

**Device mapping support**
|Model|Supported|
|--|--|
|Normal|✅|
|Plain|✅|
|GGUF|✅|
|GGML| |
|Vision Plain| |

**X-LoRA and LoRA support**
|Model|X-LoRA|X-LoRA+GGUF|X-LoRA+GGML|
@@ -397,17 +442,19 @@ Command line docs [here](docs/CMD_LINE_DOCS.md)
|Phi 2|✅| | |
|Phi 3|✅|✅| |
|Qwen 2| | | |
|Phi 3 Vision| | | |

**Using derivative models**
### Using derivative models

To use a derivative model, select the model architecture using the correct subcommand. To see what can be passed for the architecture, pass `--help` after the subcommand. When using a model other than the default, specify the following for each model type:

- **Normal**: Model id
- **Plain**: Model id
- **Quantized**: Quantized model id, quantized filename, and tokenizer id
- **X-LoRA**: Model id, X-LoRA ordering
- **X-LoRA quantized**: Quantized model id, quantized filename, tokenizer id, and X-LoRA ordering
- **LoRA**: Model id, LoRA ordering
- **LoRA quantized**: Quantized model id, quantized filename, tokenizer id, and LoRA ordering
- **Vision Plain**: Model id

See [this](#adapter-ordering-file) section to determine if it is necessary to prepare an X-LoRA/LoRA ordering file; it is always necessary if the target modules or architecture have changed, or if the adapter order has changed.

@@ -421,16 +468,13 @@ For example, when using a Zephyr model:

An adapter model is a model with X-LoRA or LoRA. X-LoRA support is provided by selecting the `x-lora-*` architecture, and LoRA support by selecting the `lora-*` architecture. Please find docs for adapter models [here](docs/ADAPTER_MODELS.md).

---

### Chat Templates and Tokenizer
Mistral.rs will attempt to automatically load a chat template and tokenizer. This enables high flexibility across models and ensures accurate and flexible chat templating. However, this behavior can be customized. Please find detailed documentation [here](docs/CHAT_TOK.md).

## Contributing
If you have any problems or want to contribute something, please raise an issue or pull request!


If you want to add a new model, please see [our guide](docs/ADDING_MODELS.md).
Thank you for contributing! If you have any problems or want to contribute something, please raise an issue or pull request.
If you want to add a new model, please contact us via an issue and we can coordinate how to do this.

## FAQ
- Setting the environment variable `MISTRALRS_DEBUG=1` enables the following: