This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Enable Qwen1-5 #146

Merged
merged 15 commits into from
Mar 5, 2024

@intellinjun (Contributor) commented Mar 1, 2024

Type of Change

feature or bug fix or documentation or others
API changed or not

Description

Enable Qwen1.5 (qwen1-5).
Detailed description:

  • New convert function
  • New forward function
  • GGUF format support
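
For readers unfamiliar with the GGUF item above: as a rough illustration only (this is not code from the PR), the sketch below reads the fixed-size header of a GGUF v2/v3 file, which starts with the 4-byte magic `GGUF`, a little-endian uint32 version, and then uint64 tensor and metadata key-value counts. The function name `read_gguf_header` and the demo values are hypothetical.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(stream):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF v2/v3 stream.

    Hypothetical helper for illustration; not part of Neural Speed.
    """
    magic = stream.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    # Version is a little-endian unsigned 32-bit integer.
    (version,) = struct.unpack("<I", stream.read(4))
    if version < 2:
        # GGUF v1 stored 32-bit counts; this sketch only handles v2 and later.
        raise ValueError("GGUF v1 headers are not handled in this sketch")
    # Tensor count and metadata key-value count are little-endian uint64.
    tensor_count, kv_count = struct.unpack("<QQ", stream.read(16))
    return version, tensor_count, kv_count


# Synthetic header bytes for demonstration only (not taken from a real model file).
demo = io.BytesIO(GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 19))
header = read_gguf_header(demo)
print(header)
```

To inspect a real file such as one from the Qwen1.5 GGUF releases, the same function could be passed a file object opened with `open(path, "rb")`.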

intellinjun and others added 3 commits February 27, 2024 17:47
@intellinjun intellinjun marked this pull request as ready for review March 1, 2024 05:32
@a32543254 (Contributor) left a comment

LGTM

neural_speed/models/qwen/qwen_utils.cpp (outdated, resolved)
@airMeng (Contributor) left a comment

An extension test is needed

intellinjun and others added 3 commits March 1, 2024 15:12
neural_speed/models/qwen/qwen.cpp (resolved)
@intellinjun (Contributor, Author) commented Mar 4, 2024

@Zhenzhong1 (Contributor) commented

Please check this https://huggingface.co/Qwen/Qwen1.5-7B-Chat-GGUF and update supported_models.md

Signed-off-by: intellinjun <[email protected]>
@a32543254 (Contributor) commented

Please also add Mixtral 8x7B to the supported model list.

@intellinjun (Contributor, Author) commented

qwen-1_5 32 32 1 32 q4_j_i8_g128   New 27.15 4190.4 72.9 914.54 27.38 72.9
qwen-1_5 1024 32 1 32 q4_j_i8_g128   New 29.77 4343.65 1041.29 1964.16 35.02 1041.29
qwen-1_5 2012 32 1 32 q4_j_i8_g128   New 31.32 4712.84 2147.69 3118.68 31.95 2147.69
qwen-1_5 32 32 1 48 q4_j_i8_g128   New 24.56 5532.91 57.06 818.42 24.73 57.06
qwen-1_5 1024 32 1 48 q4_j_i8_g128   New 28.16 5279.89 859.34 1732.33 27.37 859.34
qwen-1_5 2012 32 1 48 q4_j_i8_g128   New 29.84 5257.13 1776.02 2701.09 30.09 1776.02
qwen-1_5 32 32 1 56 q4_j_i8_g128   New 25.31 5481.34 62.05 846.77 25.51 62.05
qwen-1_5 1024 32 1 56 q4_j_i8_g128   New 27.56 5338.05 769.49 1623.89 27.81 769.49
qwen-1_5 2012 32 1 56 q4_j_i8_g128   New 30.27 5314.82 1582.25 2520.77 30.45 1582.25
qwen-1_5 32 32 1 32 q4_j_i8_g32   New 31.7 5164.92 190.52 1173.22 31.96 190.52
qwen-1_5 1024 32 1 32 q4_j_i8_g32   New 36.67 4472.71 2072.26 3208.92 44.24 2072.26
qwen-1_5 2012 32 1 32 q4_j_i8_g32   New 36.7 4930.47 4091.69 5229.29 36.9 4091.69
qwen-1_5 32 32 1 48 q4_j_i8_g32   New 29.94 5442.93 307.86 1236.01 30.06 307.86
qwen-1_5 1024 32 1 48 q4_j_i8_g32   New 31.97 5262.03 1756.79 2747.76 32.16 1756.79
qwen-1_5 2012 32 1 48 q4_j_i8_g32   New 35.19 5311.04 3680.12 4771.07 35.39 3680.12
qwen-1_5 32 32 1 56 q4_j_i8_g32   New 30.69 5432.81 160.35 1111.7 30.89 160.35
qwen-1_5 1024 32 1 56 q4_j_i8_g32   New 32.89 5323.78 1641.17 2660.72 33.09 1641.17
qwen-1_5 2012 32 1 56 q4_j_i8_g32   New 36.06 5347.23 3443.91 4561.86 36.54 3443.91
qwen-1_5 32 32 1 32 q4_0   New 34.39 4796.24 457.98 1524.17 39.44 457.98
qwen-1_5 1024 32 1 32 q4_0   New 33.11 4762.58 12449.43 13475.96 33.43 12449.43
qwen-1_5 2012 32 1 32 q4_0   New 35.47 5655.83 26196.5 27296.14 35.66 26196.5
qwen-1_5 32 32 1 48 q4_0   New 27.01 5344.82 329.98 1167.39 27.25 329.98
qwen-1_5 1024 32 1 48 q4_0   New 30.27 5277.15 9383.2 10321.59 30.46 9383.2
qwen-1_5 2012 32 1 48 q4_0   New 32.89 5542.24 19753.62 20773.2 33.06 19753.62
qwen-1_5 32 32 1 56 q4_0   New 27.48 5377.97 314.21 1166.22 27.74 314.21
qwen-1_5 1024 32 1 56 q4_0   New 30.22 5327.12 8735.25 9672.2 30.58 8735.25
qwen-1_5 2012 32 1 56 q4_0   New 32.63 5489.69 18234.3 19245.84 32.87 18234.3
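
The rows above were posted without column headers, so any reading of them is an inference. One pattern does hold in the numbers: the fourth numeric value matches the third plus 31 times the first, which is what one would expect if those columns were total latency, first-token latency, and next-token latency for 32 output tokens. The sketch below checks that relationship on three rows. The column meanings, the millisecond unit in the names, and the helper `total_latency_ms` are assumptions, not something the PR states.

```python
def total_latency_ms(first_token_ms, next_token_ms, output_tokens):
    """Estimate total latency as the first token plus the remaining next-token steps.

    Assumes the first-token/next-token column reading described above.
    """
    return first_token_ms + (output_tokens - 1) * next_token_ms


# (assumed next-token latency, assumed first-token latency, reported total)
# copied from three rows of the table above; all use 32 output tokens.
rows = [
    (27.15, 72.9, 914.54),       # 32 input, 32 threads, q4_j_i8_g128
    (29.77, 1041.29, 1964.16),   # 1024 input, 32 threads, q4_j_i8_g128
    (35.47, 26196.5, 27296.14),  # 2012 input, 32 threads, q4_0
]

for next_ms, first_ms, reported in rows:
    estimate = total_latency_ms(first_ms, next_ms, output_tokens=32)
    # Small differences are expected because the table values are rounded.
    print(f"estimate={estimate:.2f} reported={reported:.2f}")
```

If this reading is right, the long-prompt q4_0 rows are dominated by first-token (prompt processing) time, while the per-token decode cost stays in the same range across all three weight formats.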

@VincyZhang VincyZhang merged commit 750b356 into main Mar 5, 2024
11 checks passed
6 participants