Fixes for codellama #2768

Merged: 1 commit merged into master from codellama-fixes on Aug 24, 2023

Conversation

slaren (Collaborator) commented Aug 24, 2023

Changes convert.py to allow a missing vocab_size in params.json and adds an enum value for the 34B model.
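
For context, a minimal illustrative sketch (not the PR's actual diff) of the kind of fallback this enables when params.json has no usable vocab_size; the function name and the fallback source are assumptions:

```python
def resolve_vocab_size(params: dict, tensor_shapes: dict) -> int:
    """Illustrative fallback: use vocab_size from params.json when it is present
    and positive, otherwise recover it from tok_embeddings.weight, whose first
    dimension is the vocabulary size."""
    vocab_size = params.get("vocab_size", -1)
    if vocab_size > 0:
        return vocab_size
    return tensor_shapes["tok_embeddings.weight"][0]


# params.json without vocab_size, plus the 34B embedding shape quoted later in
# this thread ([32000, 8192]) -> 32000
print(resolve_vocab_size({"dim": 8192, "n_layers": 48}, {"tok_embeddings.weight": (32000, 8192)}))
```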

Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6

| model | backend | n_gpu_layers | test | t/s |
| --- | --- | --- | --- | --- |
| LLaMA v2 34B mostly Q4_K - Small | CUDA | 99 | pp 512 | 530.11 ± 0.60 |
| LLaMA v2 34B mostly Q4_K - Small | CUDA | 99 | tg 128 | 18.55 ± 0.02 |

slaren (Collaborator, Author) commented Aug 24, 2023

Short perplexity test:
7b q5_k_m: [1]6.3455,[2]7.2255,[3]9.2265,[4]10.3542,[5]10.4333,[6]9.9436,[7]10.4459,[8]10.3019,[9]10.8475,[10]11.3143,[11]11.8393,[12]11.8581,
13b q5_k_m: [1]5.8744,[2]7.0650,[3]7.9541,[4]9.2436,[5]9.7071,[6]9.6262,[7]9.7747,[8]9.8889,[9]10.2961,[10]10.7754,[11]11.1154,[12]11.1430,
34b q4_k_m: [1]5.3722,[2]6.8634,[3]18.5997,[4]18.0402,[5]20.8216,[6]18.8895,[7]24.5365,[8]23.1267,[9]30.7042,[10]32.4732,[11]38.5647,[12]35.2927,

The 34B perplexity seems to be increasing a bit too much; might have to look into it.

slaren merged commit fea95c6 into master on Aug 24, 2023
slaren deleted the codellama-fixes branch on August 24, 2023 at 15:44
slaren (Collaborator, Author) commented Aug 24, 2023

Final perplexity for 34B q4_k_m is 63.1600. It might simply be due to the different dataset, but something may be wrong.
For comparison, 7B q5_k_m: 10.1548

llama_model_loader: - kv   0:                       general.architecture str
llama_model_loader: - kv   1:                               general.name str
llama_model_loader: - kv   2:                       llama.context_length u32
llama_model_loader: - kv   3:                     llama.embedding_length u32
llama_model_loader: - kv   4:                          llama.block_count u32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32
llama_model_loader: - kv   7:                 llama.attention.head_count u32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv  10:                          general.file_type u32
llama_model_loader: - kv  11:                       tokenizer.ggml.model str
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr
llama_model_loader: - kv  15:               general.quantization_version u32
llama_model_loader: - type  f32:   97 tensors
llama_model_loader: - type q4_K:  289 tensors
llama_model_loader: - type q6_K:   49 tensors
llm_load_print_meta: format         = GGUF V1 (latest)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32000
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 4096
llm_load_print_meta: n_ctx          = 512
llm_load_print_meta: n_embd         = 8192
llm_load_print_meta: n_head         = 64
llm_load_print_meta: n_head_kv      = 8
llm_load_print_meta: n_layer        = 48
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 8
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 22016
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 34B
llm_load_print_meta: model ftype    = mostly Q4_K - Medium
llm_load_print_meta: model size     = 33.74 B
llm_load_print_meta: general.name   = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.13 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  =  140.76 MB (+   96.00 MB per state)
llm_load_tensors: offloading 48 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 51/51 layers to GPU
llm_load_tensors: VRAM used: 19238 MB
....................................................................................................
llama_new_context_with_model: kv self size  =   96.00 MB
llama_new_context_with_model: compute buffer total size =  119.41 MB
llama_new_context_with_model: VRAM scratch buffer: 118.00 MB

system_info: n_threads = 1 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity: tokenizing the input ..
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 0.99 seconds per pass - ETA 10.77 minutes
[1]5.3722,[2]6.8634,[3]18.5997,[4]18.0402,[5]20.8216,[6]18.8895,[7]24.5365,[8]23.1267,[9]30.7042,[10]32.4732,[11]38.5647,[12]35.2927,[13]35.3977,[14]40.8779,[15]50.0018,[16]44.6572,[17]42.0392,[18]47.4379,[19]42.1442,[20]45.5599,[21]48.5124,[22]50.1700,[23]46.9413,[24]49.1561,[25]48.3779,[26]44.4299,[27]43.9957,[28]42.0548,[29]43.4902,[30]43.0095,[31]43.9184,[32]42.1812,[33]42.3804,[34]44.5481,[35]45.8786,[36]44.1231,[37]45.8371,[38]46.3058,[39]47.8326,[40]49.2811,[41]47.9219,[42]47.5123,[43]48.8565,[44]49.9723,[45]51.3837,[46]52.2847,[47]51.7251,[48]53.2990,[49]55.3860,[50]56.1291,[51]56.6182,[52]56.4147,[53]57.7867,[54]59.2960,[55]56.8689,[56]55.7473,[57]55.4897,[58]55.7443,[59]54.9167,[60]56.6449,[61]57.6218,[62]56.8220,[63]57.7998,[64]58.3952,[65]58.9115,[66]59.2383,[67]60.9043,[68]60.6064,[69]61.1762,[70]61.9318,[71]63.0806,[72]61.8627,[73]62.5991,[74]63.5537,[75]64.4164,[76]64.6967,[77]64.8567,[78]64.9029,[79]65.9337,[80]64.3252,[81]64.6137,[82]63.1367,[83]61.4556,[84]62.1606,[85]62.9551,[86]62.4851,[87]62.9987,[88]62.0748,[89]62.3586,[90]62.7458,[91]63.4891,[92]64.8459,[93]65.2256,[94]65.7529,[95]66.7645,[96]67.4958,[97]66.2010,[98]66.1251,[99]65.4783,[100]66.1518,[101]66.7187,[102]67.5872,[103]66.4947,[104]66.4975,[105]67.1665,[106]67.8499,[107]67.4641,[108]68.1799,[109]67.6239,[110]68.2992,[111]69.2309,[112]69.8823,[113]68.8550,[114]68.9047,[115]69.3004,[116]68.2032,[117]67.1597,[118]67.8393,[119]68.4106,[120]69.3725,[121]69.1116,[122]68.4353,[123]68.2800,[124]68.0477,[125]67.9936,[126]68.6714,[127]68.1732,[128]68.9529,[129]69.4866,[130]70.0720,[131]70.8555,[132]69.7161,[133]68.5270,[134]69.1271,[135]68.9255,[136]69.2516,[137]69.4818,[138]69.5431,[139]69.7582,[140]70.5190,[141]70.7511,[142]69.8141,[143]70.1926,[144]70.3150,[145]70.4458,[146]69.4104,[147]69.6069,[148]69.8050,[149]69.8345,[150]69.0615,[151]69.2670,[152]69.2654,[153]69.4935,[154]69.6644,[155]70.3173,[156]69.5862,[157]69.7767,[158]69.1744,[159]69.5966,[160]68.7871,[161]68.4688,[162]68.1816,[163]67.2025,[164]66.6679,[165]66.4302,[166]66.5904,[167]65.4620,[168]64.5073,[169]64.7642,[170]64.6514,[171]64.4600,[172]64.4797,[173]64.3071,[174]64.3380,[175]64.4337,[176]64.5575,[177]63.7445,[178]64.0980,[179]64.1851,[180]64.3830,[181]63.7599,[182]63.8480,[183]64.0105,[184]64.2083,[185]63.7754,[186]63.9489,[187]63.6941,[188]64.3531,[189]64.6876,[190]64.3726,[191]65.3649,[192]65.8998,[193]66.5370,[194]67.0823,[195]67.5497,[196]68.0094,[197]68.3190,[198]68.0144,[199]67.5266,[200]67.0679,[201]66.3224,[202]66.9096,[203]66.4625,[204]67.1994,[205]66.8987,[206]67.5093,[207]67.6564,[208]68.1417,[209]68.3621,[210]68.5502,[211]69.0086,[212]69.3362,[213]69.2351,[214]68.7124,[215]68.7493,[216]68.9633,[217]68.8705,[218]68.4066,[219]68.6391,[220]68.1637,[221]68.5218,[222]68.7122,[223]68.7446,[224]68.9553,[225]68.8326,[226]69.4308,[227]69.9229,[228]70.4930,[229]70.8096,[230]70.3801,[231]70.7470,[232]70.8915,[233]70.7748,[234]70.7758,[235]70.8871,[236]70.1911,[237]70.4651,[238]70.7678,[239]70.8104,[240]70.7656,[241]70.3186,[242]70.6932,[243]70.8081,[244]70.1327,[245]70.0990,[246]69.4259,[247]69.5657,[248]69.5496,[249]69.7259,[250]69.1864,[251]69.0940,[252]69.1617,[253]69.4336,[254]69.4729,[255]69.4173,[256]68.8389,[257]68.5638,[258]68.8591,[259]69.0085,[260]69.0906,[261]68.7520,[262]68.6570,[263]68.6046,[264]68.5303,[265]68.7943,[266]68.6624,[267]68.6287,[268]68.2216,[269]67.7220,[270]67.7484,[271]67.3327,[272]66.8540,[273]67.0361,[274]67.0683,[275]67.4446,[276]67.2734,[277]67.5035,[278]67.3129,[279]67.5298,[280]67.7587,[281]67.4256,[282]67.5814
,[283]67.8374,[284]67.9660,[285]67.5812,[286]67.7368,[287]67.6688,[288]67.2755,[289]66.8072,[290]66.2715,[291]66.3255,[292]66.4076,[293]65.8640,[294]66.0011,[295]66.0790,[296]66.2347,[297]66.3860,[298]66.1118,[299]66.2495,[300]65.6643,[301]65.2227,[302]64.7275,[303]64.6542,[304]64.5785,[305]64.3063,[306]63.8716,[307]63.9514,[308]64.0732,[309]63.8202,[310]63.9167,[311]63.9081,[312]63.5917,[313]63.2475,[314]63.5068,[315]62.9764,[316]62.7250,[317]62.2487,[318]61.6874,[319]61.4673,[320]61.7716,[321]61.9084,[322]61.5076,[323]61.1308,[324]60.8254,[325]60.7589,[326]60.8205,[327]60.4973,[328]60.7845,[329]61.0918,[330]61.3219,[331]61.5470,[332]61.5831,[333]61.8672,[334]62.0101,[335]62.1867,[336]62.1230,[337]62.2017,[338]61.9051,[339]61.9544,[340]62.1529,[341]62.3669,[342]62.0460,[343]62.2432,[344]62.3634,[345]62.0058,[346]62.1173,[347]62.2805,[348]62.3061,[349]62.4143,[350]62.5531,[351]62.6050,[352]62.4007,[353]62.4609,[354]62.7498,[355]63.1266,[356]63.4078,[357]63.4392,[358]63.7693,[359]64.1597,[360]63.7484,[361]63.4239,[362]63.2723,[363]63.6244,[364]63.6976,[365]63.8372,[366]63.8226,[367]64.0798,[368]64.0947,[369]64.2145,[370]64.3675,[371]64.0243,[372]64.1487,[373]64.2855,[374]64.2745,[375]64.3428,[376]64.5512,[377]64.1334,[378]64.3588,[379]64.5686,[380]64.5594,[381]64.6034,[382]64.7722,[383]64.8803,[384]64.7609,[385]64.9052,[386]65.0941,[387]65.2881,[388]65.5088,[389]65.4680,[390]65.2245,[391]65.3933,[392]65.3887,[393]65.1051,[394]64.9781,[395]64.7609,[396]64.4236,[397]64.6313,[398]64.3495,[399]64.5929,[400]64.8183,[401]64.5031,[402]64.8714,[403]64.5638,[404]64.8526,[405]65.0616,[406]65.1516,[407]65.1070,[408]65.0348,[409]65.3795,[410]65.6611,[411]65.9290,[412]65.7917,[413]66.0696,[414]66.2572,[415]66.4053,[416]66.3929,[417]66.6110,[418]66.7518,[419]66.8844,[420]67.1552,[421]67.3992,[422]67.6390,[423]67.4414,[424]67.6983,[425]67.5210,[426]67.7952,[427]67.5891,[428]67.8714,[429]67.8638,[430]67.7827,[431]67.8524,[432]68.0215,[433]68.0250,[434]67.7200,[435]67.8556,[436]67.6048,[437]67.7432,[438]67.8956,[439]67.8467,[440]67.9014,[441]67.9599,[442]68.0346,[443]68.1949,[444]68.3085,[445]68.4728,[446]68.5187,[447]68.7361,[448]68.6779,[449]68.7231,[450]68.4663,[451]68.6147,[452]68.3170,[453]68.3787,[454]68.1536,[455]67.8817,[456]67.9065,[457]68.1159,[458]68.1468,[459]68.3322,[460]68.1498,[461]67.8467,[462]67.5300,[463]67.6604,[464]67.3659,[465]67.1501,[466]66.8277,[467]66.7319,[468]66.5062,[469]66.2973,[470]66.3300,[471]66.0265,[472]66.0738,[473]65.7277,[474]65.4825,[475]65.6804,[476]65.8642,[477]66.0072,[478]65.9554,[479]66.2864,[480]66.4256,[481]66.3546,[482]66.4843,[483]66.3786,[484]66.6304,[485]66.6555,[486]66.3278,[487]66.2141,[488]66.3310,[489]66.0924,[490]66.3018,[491]66.0328,[492]65.7608,[493]65.5610,[494]65.3118,[495]65.3401,[496]65.1191,[497]64.8696,[498]64.6983,[499]64.4070,[500]64.4543,[501]64.2186,[502]64.0064,[503]64.1239,[504]63.9233,[505]63.6977,[506]63.4843,[507]63.6471,[508]63.7544,[509]64.0045,[510]64.0976,[511]63.9092,[512]63.7219,[513]63.8090,[514]63.6563,[515]63.8533,[516]63.7793,[517]63.8389,[518]63.8802,[519]63.9988,[520]64.0833,[521]64.0832,[522]64.0158,[523]64.1246,[524]64.2134,[525]64.3350,[526]64.1216,[527]64.2748,[528]64.0177,[529]64.0979,[530]64.0446,[531]64.1003,[532]63.9388,[533]64.1983,[534]64.0059,[535]64.0748,[536]63.7816,[537]63.9685,[538]64.0965,[539]64.3060,[540]64.1924,[541]64.3957,[542]64.1433,[543]64.1330,[544]64.2264,[545]64.2380,[546]64.2640,[547]64.3215,[548]64.2925,[549]64.3039,[550]64.2860,[551]64.3630,[552]64.5191,[553]64.6287,[554]64.3604,[555]64.0817,[556]
64.1676,[557]64.3209,[558]64.3894,[559]64.4420,[560]64.4271,[561]64.4476,[562]64.4579,[563]64.6372,[564]64.6691,[565]64.8215,[566]64.9431,[567]65.0567,[568]64.8633,[569]64.9597,[570]64.7805,[571]64.8904,[572]65.0106,[573]65.1171,[574]65.0246,[575]65.0512,[576]64.8336,[577]64.8709,[578]64.9671,[579]65.1184,[580]65.1459,[581]64.9673,[582]64.7708,[583]64.6237,[584]64.4728,[585]64.1978,[586]64.1859,[587]63.9967,[588]64.1102,[589]63.9630,[590]64.0796,[591]64.1820,[592]64.3731,[593]64.2488,[594]64.0475,[595]64.1433,[596]64.1900,[597]63.9622,[598]63.8459,[599]63.6558,[600]63.8030,[601]63.8747,[602]63.9622,[603]63.8415,[604]63.8371,[605]63.9250,[606]63.7467,[607]63.5332,[608]63.2838,[609]63.3835,[610]63.4859,[611]63.5186,[612]63.7273,[613]63.6170,[614]63.3900,[615]63.3528,[616]63.5236,[617]63.4472,[618]63.3583,[619]63.2271,[620]63.0657,[621]62.8304,[622]62.6378,[623]62.7276,[624]62.7355,[625]62.6736,[626]62.7857,[627]62.7025,[628]62.5853,[629]62.3819,[630]62.4502,[631]62.6082,[632]62.5224,[633]62.5602,[634]62.4200,[635]62.5675,[636]62.6213,[637]62.6924,[638]62.8519,[639]62.9235,[640]63.0890,[641]62.8776,[642]62.8108,[643]62.8572,[644]62.8797,[645]62.7199,[646]62.7847,[647]62.7171,[648]62.5697,[649]62.6333,[650]62.8029,[651]63.0165,[652]63.1887,[653]63.3320,[654]63.3318,[655]63.1600,

llama_print_timings:        load time =  5183.42 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time = 650506.37 ms / 335360 tokens (    1.94 ms per token,   515.54 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time = 677581.77 ms

slaren (Collaborator, Author) commented Aug 24, 2023

The issue may be due to the long-context fine-tuning. Using --rope-freq-base 1e6, the results look much better.

[image attachment]

ggerganov (Owner):

We have to add rope_theta to the convert.py script and write it into the metadata of the model.

slaren (Collaborator, Author) commented Aug 24, 2023

OK, that's definitely an issue. The final ppl for 34B q4_k_m with --rope-freq-base 1e6 is 5.7811.

llama_model_loader: - kv   0:                       general.architecture str
llama_model_loader: - kv   1:                               general.name str
llama_model_loader: - kv   2:                       llama.context_length u32
llama_model_loader: - kv   3:                     llama.embedding_length u32
llama_model_loader: - kv   4:                          llama.block_count u32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32
llama_model_loader: - kv   7:                 llama.attention.head_count u32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv  10:                          general.file_type u32
llama_model_loader: - kv  11:                       tokenizer.ggml.model str
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr
llama_model_loader: - kv  15:               general.quantization_version u32
llama_model_loader: - type  f32:   97 tensors
llama_model_loader: - type q4_K:  289 tensors
llama_model_loader: - type q6_K:   49 tensors
llm_load_print_meta: format         = GGUF V1 (latest)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32000
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 4096
llm_load_print_meta: n_ctx          = 512
llm_load_print_meta: n_embd         = 8192
llm_load_print_meta: n_head         = 64
llm_load_print_meta: n_head_kv      = 8
llm_load_print_meta: n_layer        = 48
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 8
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 22016
llm_load_print_meta: freq_base      = 1000000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 34B
llm_load_print_meta: model ftype    = mostly Q4_K - Medium
llm_load_print_meta: model size     = 33.74 B
llm_load_print_meta: general.name   = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.13 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  =  140.76 MB (+   96.00 MB per state)
llm_load_tensors: offloading 48 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 51/51 layers to GPU
llm_load_tensors: VRAM used: 19238 MB
....................................................................................................
llama_new_context_with_model: kv self size  =   96.00 MB
llama_new_context_with_model: compute buffer total size =  119.41 MB
llama_new_context_with_model: VRAM scratch buffer: 118.00 MB

system_info: n_threads = 1 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity: tokenizing the input ..
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 1.00 seconds per pass - ETA 10.88 minutes
[1]4.2372,[2]4.7320,[3]5.5051,[4]6.3183,[5]6.4880,[6]6.4093,[7]6.5309,[8]6.5565,[9]6.8206,[10]7.0476,[11]7.2814,[12]7.3205,[13]7.2565,[14]7.3485,[15]7.5977,[16]7.2239,[17]7.0971,[18]7.0294,[19]6.6729,[20]6.6685,[21]6.5518,[22]6.3390,[23]6.2829,[24]6.1878,[25]6.1765,[26]5.9847,[27]5.7738,[28]5.6386,[29]5.5307,[30]5.3570,[31]5.2739,[32]5.3031,[33]5.2486,[34]5.2887,[35]5.3046,[36]5.3362,[37]5.3191,[38]5.3193,[39]5.3439,[40]5.3798,[41]5.3973,[42]5.4398,[43]5.4031,[44]5.4433,[45]5.4429,[46]5.4097,[47]5.4334,[48]5.4142,[49]5.4021,[50]5.3661,[51]5.3801,[52]5.3764,[53]5.4184,[54]5.4072,[55]5.3937,[56]5.4206,[57]5.4382,[58]5.4544,[59]5.4711,[60]5.5030,[61]5.4940,[62]5.5545,[63]5.5759,[64]5.5782,[65]5.6078,[66]5.6046,[67]5.6132,[68]5.6288,[69]5.6525,[70]5.6789,[71]5.7012,[72]5.7352,[73]5.7758,[74]5.7811,[75]5.7850,[76]5.7937,[77]5.8052,[78]5.7914,[79]5.8198,[80]5.8168,[81]5.8321,[82]5.8398,[83]5.7918,[84]5.7946,[85]5.7940,[86]5.7804,[87]5.7329,[88]5.7193,[89]5.7046,[90]5.6888,[91]5.7118,[92]5.7031,[93]5.6973,[94]5.6979,[95]5.7305,[96]5.7293,[97]5.7267,[98]5.7199,[99]5.7056,[100]5.7005,[101]5.7222,[102]5.7176,[103]5.7371,[104]5.7421,[105]5.7402,[106]5.7571,[107]5.7603,[108]5.7683,[109]5.7672,[110]5.7628,[111]5.7838,[112]5.8061,[113]5.8043,[114]5.8025,[115]5.8081,[116]5.8000,[117]5.8072,[118]5.8323,[119]5.8557,[120]5.8899,[121]5.9036,[122]5.9262,[123]5.9611,[124]5.9752,[125]5.9661,[126]6.0024,[127]6.0339,[128]6.0583,[129]6.0453,[130]6.0491,[131]6.0429,[132]6.0352,[133]6.0147,[134]6.0197,[135]6.0119,[136]5.9990,[137]5.9899,[138]5.9669,[139]5.9587,[140]5.9516,[141]5.9228,[142]5.9162,[143]5.8873,[144]5.8662,[145]5.8512,[146]5.8402,[147]5.8395,[148]5.8380,[149]5.8297,[150]5.8262,[151]5.8293,[152]5.8183,[153]5.8056,[154]5.7982,[155]5.8032,[156]5.8023,[157]5.8175,[158]5.8185,[159]5.8227,[160]5.8300,[161]5.8414,[162]5.8157,[163]5.8066,[164]5.7872,[165]5.7615,[166]5.7366,[167]5.7018,[168]5.6776,[169]5.6686,[170]5.6591,[171]5.6381,[172]5.6227,[173]5.6066,[174]5.5824,[175]5.5619,[176]5.5499,[177]5.5312,[178]5.5122,[179]5.4977,[180]5.4907,[181]5.4753,[182]5.4583,[183]5.4466,[184]5.4455,[185]5.4420,[186]5.4456,[187]5.4566,[188]5.4579,[189]5.4803,[190]5.4809,[191]5.5010,[192]5.5167,[193]5.5298,[194]5.5432,[195]5.5652,[196]5.5787,[197]5.5996,[198]5.6119,[199]5.6158,[200]5.6211,[201]5.6152,[202]5.6305,[203]5.6400,[204]5.6377,[205]5.6518,[206]5.6578,[207]5.6537,[208]5.6669,[209]5.6712,[210]5.6753,[211]5.6881,[212]5.6958,[213]5.7033,[214]5.7065,[215]5.7077,[216]5.7186,[217]5.7336,[218]5.7462,[219]5.7446,[220]5.7430,[221]5.7343,[222]5.7304,[223]5.7209,[224]5.7141,[225]5.7087,[226]5.7265,[227]5.7304,[228]5.7380,[229]5.7426,[230]5.7369,[231]5.7487,[232]5.7380,[233]5.7210,[234]5.7059,[235]5.6820,[236]5.6794,[237]5.6711,[238]5.6748,[239]5.6639,[240]5.6540,[241]5.6547,[242]5.6552,[243]5.6521,[244]5.6418,[245]5.6389,[246]5.6281,[247]5.6184,[248]5.6122,[249]5.6094,[250]5.6134,[251]5.6047,[252]5.6015,[253]5.5930,[254]5.5861,[255]5.5745,[256]5.5577,[257]5.5464,[258]5.5379,[259]5.5360,[260]5.5281,[261]5.5234,[262]5.5196,[263]5.5132,[264]5.4880,[265]5.4891,[266]5.4855,[267]5.4798,[268]5.4881,[269]5.4898,[270]5.4923,[271]5.5008,[272]5.5048,[273]5.5066,[274]5.5055,[275]5.5106,[276]5.5165,[277]5.5285,[278]5.5376,[279]5.5451,[280]5.5482,[281]5.5584,[282]5.5649,[283]5.5780,[284]5.5876,[285]5.5969,[286]5.6099,[287]5.6085,[288]5.6138,[289]5.6064,[290]5.5959,[291]5.5852,[292]5.5742,[293]5.5635,[294]5.5650,[295]5.5655,[296]5.5713,[297]5.5717,[298]5.5760,[299]5.5750,[300]5.5669,[301]5.5683,[302]5.5637,[303]5.5564,[304]5.5486,[305]5.5461,[30
6]5.5347,[307]5.5373,[308]5.5376,[309]5.5261,[310]5.5231,[311]5.5192,[312]5.5215,[313]5.5185,[314]5.5199,[315]5.5058,[316]5.5025,[317]5.4878,[318]5.4697,[319]5.4834,[320]5.4943,[321]5.4977,[322]5.4925,[323]5.4887,[324]5.4899,[325]5.5017,[326]5.5030,[327]5.5049,[328]5.5085,[329]5.5123,[330]5.5162,[331]5.5278,[332]5.5243,[333]5.5315,[334]5.5268,[335]5.5211,[336]5.5234,[337]5.5223,[338]5.5222,[339]5.5185,[340]5.5132,[341]5.5180,[342]5.5199,[343]5.5236,[344]5.5245,[345]5.5255,[346]5.5233,[347]5.5262,[348]5.5283,[349]5.5314,[350]5.5305,[351]5.5303,[352]5.5297,[353]5.5242,[354]5.5238,[355]5.5298,[356]5.5350,[357]5.5323,[358]5.5420,[359]5.5458,[360]5.5433,[361]5.5427,[362]5.5507,[363]5.5611,[364]5.5675,[365]5.5717,[366]5.5741,[367]5.5817,[368]5.5797,[369]5.5816,[370]5.5846,[371]5.5795,[372]5.5846,[373]5.5890,[374]5.5876,[375]5.5875,[376]5.5952,[377]5.5923,[378]5.5950,[379]5.5998,[380]5.5939,[381]5.5917,[382]5.5877,[383]5.5872,[384]5.5880,[385]5.5881,[386]5.5874,[387]5.5896,[388]5.5870,[389]5.5845,[390]5.5792,[391]5.5740,[392]5.5725,[393]5.5746,[394]5.5791,[395]5.5775,[396]5.5718,[397]5.5804,[398]5.5867,[399]5.5953,[400]5.5946,[401]5.5957,[402]5.5981,[403]5.6007,[404]5.6063,[405]5.6021,[406]5.6002,[407]5.6029,[408]5.6044,[409]5.6164,[410]5.6269,[411]5.6376,[412]5.6538,[413]5.6655,[414]5.6725,[415]5.6791,[416]5.6871,[417]5.6974,[418]5.7001,[419]5.7059,[420]5.7138,[421]5.7247,[422]5.7293,[423]5.7369,[424]5.7471,[425]5.7567,[426]5.7647,[427]5.7691,[428]5.7773,[429]5.7807,[430]5.7896,[431]5.8027,[432]5.8053,[433]5.8037,[434]5.8000,[435]5.8028,[436]5.8056,[437]5.8156,[438]5.8245,[439]5.8218,[440]5.8220,[441]5.8191,[442]5.8181,[443]5.8192,[444]5.8205,[445]5.8189,[446]5.8215,[447]5.8235,[448]5.8271,[449]5.8256,[450]5.8267,[451]5.8229,[452]5.8228,[453]5.8164,[454]5.8121,[455]5.8147,[456]5.8194,[457]5.8227,[458]5.8217,[459]5.8218,[460]5.8309,[461]5.8296,[462]5.8304,[463]5.8341,[464]5.8333,[465]5.8324,[466]5.8270,[467]5.8310,[468]5.8332,[469]5.8364,[470]5.8370,[471]5.8349,[472]5.8407,[473]5.8359,[474]5.8398,[475]5.8379,[476]5.8406,[477]5.8360,[478]5.8367,[479]5.8466,[480]5.8527,[481]5.8555,[482]5.8522,[483]5.8507,[484]5.8540,[485]5.8545,[486]5.8508,[487]5.8521,[488]5.8505,[489]5.8475,[490]5.8483,[491]5.8478,[492]5.8454,[493]5.8431,[494]5.8416,[495]5.8417,[496]5.8396,[497]5.8369,[498]5.8371,[499]5.8337,[500]5.8258,[501]5.8212,[502]5.8232,[503]5.8241,[504]5.8169,[505]5.8203,[506]5.8212,[507]5.8141,[508]5.8091,[509]5.8083,[510]5.8091,[511]5.8131,[512]5.8154,[513]5.8173,[514]5.8225,[515]5.8179,[516]5.8173,[517]5.8182,[518]5.8179,[519]5.8203,[520]5.8210,[521]5.8219,[522]5.8243,[523]5.8248,[524]5.8310,[525]5.8341,[526]5.8345,[527]5.8370,[528]5.8325,[529]5.8331,[530]5.8275,[531]5.8264,[532]5.8323,[533]5.8348,[534]5.8334,[535]5.8360,[536]5.8321,[537]5.8301,[538]5.8352,[539]5.8361,[540]5.8379,[541]5.8401,[542]5.8395,[543]5.8411,[544]5.8416,[545]5.8399,[546]5.8406,[547]5.8373,[548]5.8312,[549]5.8311,[550]5.8289,[551]5.8265,[552]5.8247,[553]5.8215,[554]5.8193,[555]5.8165,[556]5.8158,[557]5.8202,[558]5.8175,[559]5.8172,[560]5.8147,[561]5.8152,[562]5.8122,[563]5.8122,[564]5.8171,[565]5.8188,[566]5.8196,[567]5.8192,[568]5.8199,[569]5.8186,[570]5.8210,[571]5.8222,[572]5.8217,[573]5.8220,[574]5.8184,[575]5.8171,[576]5.8169,[577]5.8154,[578]5.8140,[579]5.8136,[580]5.8075,[581]5.8045,[582]5.8050,[583]5.8072,[584]5.8078,[585]5.7998,[586]5.7930,[587]5.7930,[588]5.7967,[589]5.8017,[590]5.8037,[591]5.8058,[592]5.8042,[593]5.8020,[594]5.8026,[595]5.8006,[596]5.8040,[597]5.8015,[598]5.7991,[599]5.8010,[600]5.8000,[601]5.7994,[602]5
.8026,[603]5.8041,[604]5.8053,[605]5.8082,[606]5.8100,[607]5.8101,[608]5.8069,[609]5.8072,[610]5.8110,[611]5.8088,[612]5.8104,[613]5.8066,[614]5.8014,[615]5.7944,[616]5.7954,[617]5.7883,[618]5.7826,[619]5.7770,[620]5.7639,[621]5.7573,[622]5.7555,[623]5.7561,[624]5.7553,[625]5.7558,[626]5.7544,[627]5.7571,[628]5.7577,[629]5.7569,[630]5.7603,[631]5.7645,[632]5.7705,[633]5.7687,[634]5.7722,[635]5.7732,[636]5.7700,[637]5.7671,[638]5.7688,[639]5.7653,[640]5.7670,[641]5.7673,[642]5.7735,[643]5.7755,[644]5.7762,[645]5.7748,[646]5.7788,[647]5.7770,[648]5.7779,[649]5.7775,[650]5.7791,[651]5.7837,[652]5.7848,[653]5.7883,[654]5.7821,[655]5.7811,

llama_print_timings:        load time =  5546.65 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time = 649534.48 ms / 335360 tokens (    1.94 ms per token,   516.31 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time = 677274.58 ms

The correct value of rope_freq_base seems to be in params.json as "rope_theta": 1000000, but it looks like convert.py doesn't export this value currently. Is there a metadata key in GGUF for this parameter?

jxy (Contributor) commented Aug 24, 2023

We need to add rope-freq-base to GGUF.

ggerganov (Owner):

We don't have one yet - we should introduce it and add it to the spec: ggerganov/ggml#302

Btw, the vocab now is 32016.
The K-quants require the Y dimension to also be divisible by 256. Why is this needed? Don't we need just the X dimension (i.e. ne[0]) to be divisible by 256?

llama.cpp/llama.cpp

Lines 4533 to 4541 in ef955fb

if (new_type == GGML_TYPE_Q2_K || new_type == GGML_TYPE_Q3_K || new_type == GGML_TYPE_Q4_K ||
    new_type == GGML_TYPE_Q5_K || new_type == GGML_TYPE_Q6_K) {
    int nx = tensor->ne[0];
    int ny = tensor->ne[1];
    if (nx % QK_K != 0 || ny % QK_K != 0) {
        LLAMA_LOG_INFO("\n\nTensor sizes %d x %d are not divisible by %d, required for k-quants.\n", nx, ny, QK_K);
        convert_incompatible_tensor = true;
    }
}
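
For intuition on that question, a small sketch (assuming QK_K = 256) of why only the row length ne[0] has to divide evenly: the k-quants split each row into super-blocks independently, so the row count ne[1] does not need to be a multiple of 256:

```python
QK_K = 256  # k-quant super-block size, in elements along a row


def kquant_superblocks(ne0: int, ne1: int) -> int:
    """Each of the ne1 rows splits into ne0 // QK_K super-blocks; ne1 itself
    never needs to be a multiple of QK_K."""
    if ne0 % QK_K != 0:
        raise ValueError("row length ne[0] must be a multiple of QK_K")
    return (ne0 // QK_K) * ne1


# e.g. the 7B output tensor with the extended vocab, ne = [4096, 32016]:
# 32016 is not a multiple of 256, yet every 4096-element row splits evenly.
print(kquant_superblocks(4096, 32016))  # 512256
```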

slaren (Collaborator, Author) commented Aug 24, 2023

Btw, the vocab now is 32016.

Is this with all models? For 34B, the sizes of the tensors suggest an n_vocab of 32000:

tok_embeddings.weight                            -> token_embd.weight                        | UnquantizedDataType(name='BF16') | [32000, 8192]
norm.weight                                      -> output_norm.weight                       | UnquantizedDataType(name='BF16') | [8192]
output.weight                                    -> output.weight                            | UnquantizedDataType(name='BF16') | [32000, 8192]

jxy (Contributor) commented Aug 24, 2023

7B and 13B are tuned for infilling, which uses special tokens.

ggerganov (Owner) commented Aug 24, 2023

Here are some results on M2 Ultra:

| model | backend | size | test | t/s |
| --- | --- | --- | --- | --- |
| codellama 7B F16 | Metal | 13G | pp 512 | 663.32 ± 1.10 |
| codellama 7B mostly Q8_0 | Metal | 6.7G | pp 512 | 631.39 ± 0.30 |
| codellama 7B mostly Q6_K | Metal | 5.3G | pp 512 | 562.97 ± 0.54 |
| codellama 7B mostly Q5_K - Medium | Metal | 4.6G | pp 512 | 562.58 ± 0.22 |
| codellama 7B mostly Q4_K - Medium | Metal | 3.9G | pp 512 | 589.01 ± 0.09 |
| codellama 7B mostly Q4_1 | Metal | 3.9G | pp 512 | 635.65 ± 0.30 |
| codellama 7B mostly Q4_0 | Metal | 3.5G | pp 512 | 633.74 ± 0.30 |
| codellama 7B mostly Q3_K - Medium | Metal | 3.2G | pp 512 | 582.17 ± 0.36 |
| codellama 7B mostly Q2_K | Metal | 2.8G | pp 512 | 581.32 ± 0.99 |
| codellama 7B F16 | Metal | 13G | tg 64 | 29.61 ± 0.05 |
| codellama 7B mostly Q8_0 | Metal | 6.7G | tg 64 | 61.56 ± 0.14 |
| codellama 7B mostly Q6_K | Metal | 5.3G | tg 64 | 67.49 ± 0.03 |
| codellama 7B mostly Q5_K - Medium | Metal | 4.6G | tg 64 | 68.46 ± 0.15 |
| codellama 7B mostly Q4_K - Medium | Metal | 3.9G | tg 64 | 79.03 ± 0.03 |
| codellama 7B mostly Q4_1 | Metal | 3.9G | tg 64 | 82.60 ± 0.12 |
| codellama 7B mostly Q4_0 | Metal | 3.5G | tg 64 | 87.73 ± 0.30 |
| codellama 7B mostly Q3_K - Medium | Metal | 3.2G | tg 64 | 75.79 ± 0.08 |
| codellama 7B mostly Q2_K | Metal | 2.8G | tg 64 | 74.56 ± 0.12 |

build: 01f2224 (1053)

| model | backend | size | test | t/s |
| --- | --- | --- | --- | --- |
| codellama 13B F16 | Metal | 24G | pp 512 | 390.99 ± 0.06 |
| codellama 13B mostly Q8_0 | Metal | 13G | pp 512 | 368.56 ± 0.22 |
| codellama 13B mostly Q6_K | Metal | 10G | pp 512 | 324.54 ± 0.04 |
| codellama 13B mostly Q5_K - Medium | Metal | 8.8G | pp 512 | 321.51 ± 0.05 |
| codellama 13B mostly Q4_K - Medium | Metal | 7.5G | pp 512 | 340.60 ± 0.12 |
| codellama 13B mostly Q4_1 | Metal | 7.6G | pp 512 | 371.14 ± 0.05 |
| codellama 13B mostly Q4_0 | Metal | 6.8G | pp 512 | 369.43 ± 0.06 |
| codellama 13B mostly Q3_K - Medium | Metal | 6.1G | pp 512 | 336.34 ± 0.14 |
| codellama 13B mostly Q2_K | Metal | 5.3G | pp 512 | 336.66 ± 0.08 |
| codellama 13B F16 | Metal | 24G | tg 64 | 16.44 ± 0.02 |
| codellama 13B mostly Q8_0 | Metal | 13G | tg 64 | 36.69 ± 0.05 |
| codellama 13B mostly Q6_K | Metal | 10G | tg 64 | 41.11 ± 0.07 |
| codellama 13B mostly Q5_K - Medium | Metal | 8.8G | tg 64 | 42.46 ± 0.03 |
| codellama 13B mostly Q4_K - Medium | Metal | 7.5G | tg 64 | 48.80 ± 0.04 |
| codellama 13B mostly Q4_1 | Metal | 7.6G | tg 64 | 51.26 ± 0.05 |
| codellama 13B mostly Q4_0 | Metal | 6.8G | tg 64 | 55.35 ± 0.09 |
| codellama 13B mostly Q3_K - Medium | Metal | 6.1G | tg 64 | 46.22 ± 0.03 |
| codellama 13B mostly Q2_K | Metal | 5.3G | tg 64 | 47.41 ± 0.05 |

build: 01f2224 (1053)

| model | backend | size | test | t/s |
| --- | --- | --- | --- | --- |
| codellama 34B F16 | Metal | 63G | pp 512 | 149.52 ± 0.34 |
| codellama 34B mostly Q8_0 | Metal | 33G | pp 512 | 140.89 ± 0.03 |
| codellama 34B mostly Q6_K | Metal | 26G | pp 512 | 123.76 ± 0.04 |
| codellama 34B mostly Q5_K - Medium | Metal | 22G | pp 512 | 123.63 ± 0.01 |
| codellama 34B mostly Q4_K - Medium | Metal | 19G | pp 512 | 130.65 ± 0.01 |
| codellama 34B mostly Q4_1 | Metal | 20G | pp 512 | 142.01 ± 0.03 |
| codellama 34B mostly Q4_0 | Metal | 18G | pp 512 | 141.55 ± 0.02 |
| codellama 34B mostly Q3_K - Medium | Metal | 15G | pp 512 | 128.37 ± 0.00 |
| codellama 34B mostly Q2_K | Metal | 13G | pp 512 | 128.07 ± 0.04 |
| codellama 34B F16 | Metal | 63G | tg 64 | 7.32 ± 0.00 |
| codellama 34B mostly Q8_0 | Metal | 33G | tg 64 | 16.85 ± 0.01 |
| codellama 34B mostly Q6_K | Metal | 26G | tg 64 | 19.22 ± 0.00 |
| codellama 34B mostly Q5_K - Medium | Metal | 22G | tg 64 | 20.84 ± 0.01 |
| codellama 34B mostly Q4_K - Medium | Metal | 19G | tg 64 | 25.36 ± 0.00 |
| codellama 34B mostly Q4_1 | Metal | 20G | tg 64 | 25.75 ± 0.01 |
| codellama 34B mostly Q4_0 | Metal | 18G | tg 64 | 27.93 ± 0.01 |
| codellama 34B mostly Q3_K - Medium | Metal | 15G | tg 64 | 23.89 ± 0.00 |
| codellama 34B mostly Q2_K | Metal | 13G | tg 64 | 23.52 ± 0.04 |

build: 01f2224 (1053)

Anyone working on adding the rope base to the metadata?
If not, I'll add it in about 15 minutes.

slaren (Collaborator, Author) commented Aug 24, 2023

Anyone working on adding the rope base to the metadata?

I am not working on it; I was waiting for some input since I don't know all the details of GGUF.

ggerganov (Owner) commented Aug 24, 2023

You have to add the KV constant in gguf.py and in llama.cpp, similar to LLM_KV_ROPE_SCALE_LINEAR.
Just grep for all uses of LLM_KV_ROPE_SCALE_LINEAR and replicate it as a new KV, for example LLM_KV_ROPE_BASE.

And in convert.py in add_meta_arch() add a new call:

self.gguf.add_rope_base(params.f_rope_base)
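
For illustration, a minimal self-contained sketch of that pattern; the key name, helper name, and the conditional write are assumptions following the existing rope.scale_linear convention, not the final gguf.py API:

```python
# Stand-in sketch: the new KV would mirror the existing rope.scale_linear pair.
KEY_ROPE_FREQ_BASE = "{arch}.rope.freq_base"  # assumed key name


def add_rope_freq_base(kv_store: dict, arch: str, value: float) -> None:
    """Stand-in for a GGUFWriter.add_rope_freq_base() helper that would write
    the rope frequency base as a float32 metadata value."""
    kv_store[KEY_ROPE_FREQ_BASE.format(arch=arch)] = float(value)


# In convert.py's add_meta_arch(), the call would only be made when the model
# actually defines the parameter (CodeLlama's params.json: "rope_theta": 1000000):
metadata: dict = {}
f_rope_freq_base = 1_000_000.0
if f_rope_freq_base is not None:
    add_rope_freq_base(metadata, "llama", f_rope_freq_base)
print(metadata)  # {'llama.rope.freq_base': 1000000.0}
```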

TheBloke (Contributor):

Does this affect all the new Code Llama models or only 34B? Something I'm reading elsewhere suggests all; is that right?

slaren (Collaborator, Author) commented Aug 24, 2023

Does this affect all the new Code Llama models or only 34B? Something I'm reading elsewhere suggests all; is that right?

My plan is to only affect the CodeLlama models: the rope freq base will be added as optional metadata that will be omitted for the other models, so they won't change. But that may change after the review.

ggerganov (Owner):

All new Code Llama models are affected - without this change one would need to provide the rope base manually, which is inconvenient.

slaren (Collaborator, Author) commented Aug 24, 2023

@TheBloke the change has been merged; it should be safe to convert the models now.

ggerganov (Owner) commented Aug 24, 2023

Just a heads up - I expect to tune the quantum mixtures to some extent in the near future.
For example, currently Q4_0 does not use a high-bit output tensor (e.g. Q6_K) because the tensor's ne[1] is not a multiple of 256. Not sure why we have this restriction, but we'll probably fix it, and that would result in a new quantum model.

slaren (Collaborator, Author) commented Aug 24, 2023

Additionally, it might be a good idea to convert them with --ctx 16384, since the converter will default to 4096. Maybe we should change convert.py to use this value automatically if rope_theta is 1e6?
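
As an illustration of that heuristic (field names and the fallback values are assumptions, not the actual convert.py logic):

```python
def guess_n_ctx(params: dict, n_ctx_override: int | None = None) -> int:
    """Illustrative heuristic: honour an explicit --ctx value, otherwise assume
    the 16k training context whenever rope_theta marks a CodeLlama checkpoint,
    and fall back to the LLaMA v2 default of 4096."""
    if n_ctx_override is not None:
        return n_ctx_override
    if params.get("rope_theta") == 1_000_000:
        return 16384
    return 4096


print(guess_n_ctx({"rope_theta": 1000000}))   # 16384
print(guess_n_ctx({}, n_ctx_override=2048))   # 2048
```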

slaren (Collaborator, Author) commented Aug 24, 2023

@TheBloke I tried your codellama-7b-python.Q4_K_M.gguf and it fails with this error:

error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected  4096, 32016, got  4096, 32000,     1,     1

I tried converting this model myself, and it works for me, so I am not sure what went wrong there. Maybe you used a different tokenizer.model?
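
For reference, one quick way to check which tokenizer.model a conversion actually used is to count its pieces - 32016 for the extended CodeLlama vocab vs 32000 for the base one. A small sketch, assuming the sentencepiece package is installed:

```python
import sys

import sentencepiece as spm  # pip install sentencepiece


def tokenizer_vocab_size(path: str) -> int:
    """Return the number of pieces in a SentencePiece tokenizer.model."""
    sp = spm.SentencePieceProcessor()
    sp.Load(path)
    return sp.GetPieceSize()


if __name__ == "__main__":
    # e.g. python check_vocab.py /path/to/tokenizer.model
    print(tokenizer_vocab_size(sys.argv[1]))
```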

Jipok commented Aug 24, 2023

@slaren Q8 model works for me:
./main -m ~/Downloads/codellama-7b-instruct.Q8_0.gguf -e -p "<s>[INST] How does hpa work in kubernetes?[/INST]" -s 0 --temp 0 --rope-freq-base 1e6

TheBloke (Contributor) commented Aug 24, 2023

Ugh, yeah, I see what went wrong: I converted to HF first, and convert_llama_weights_to_hf reads tokenizer.model from the root directory, not the model weight dir, so I must have done them all with the same tokenizer.model.

I'm re-doing everything now.

TheBloke (Contributor) commented Aug 24, 2023

I'm confused re rope_freq_base - I have rope_theta in my config.json, but convert.py is not picking it up?

(pytorch2)  ubuntu@a10:/workspace/git/gguf-llama (master ✔) ᐅ grep rope_theta /workspace/models_codellama/7B/config.json
    "rope_theta": 1000000

(pytorch2)  ubuntu@a10:/workspace/git/gguf-llama (master ✔) ᐅ python3 ./convert.py --outtype f16 --outfile /workspace/process/codellama-7b/gguf/codellama-7b.fp16.gguf /workspace/models_codellama/7B
Loading model file /workspace/models_codellama/7B/model-00001-of-00002.safetensors
Loading model file /workspace/models_codellama/7B/model-00001-of-00002.safetensors
Loading model file /workspace/models_codellama/7B/model-00002-of-00002.safetensors
params = Params(n_vocab=32016, n_embd=4096, n_mult=5504, n_layer=32, n_ctx=16384, n_ff=11008, n_head=32, n_head_kv=32, f_norm_eps=1e-05, f_rope_freq_base=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('/workspace/models_codellama/7B'))
Loading vocab file '/workspace/models_codellama/7B/tokenizer.model', type 'spm'

f_rope_freq_base=None? And then when I do inference on this fp16:

llm_load_print_meta: format         = GGUF V1 (latest)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32016
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 16384
llm_load_print_meta: n_ctx          = 4096
llm_load_print_meta: n_embd         = 4096
llm_load_print_meta: n_head         = 32
llm_load_print_meta: n_head_kv      = 32
llm_load_print_meta: n_layer        = 32
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 11008
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 7B
llm_load_print_meta: model ftype    = mostly F16
llm_load_print_meta: model size     = 6.74 B
llm_load_print_meta: general.name   = LLaMA
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.09 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  = 12853.35 MB (+ 2048.00 MB per state)
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/35 layers to GPU
llm_load_tensors: VRAM used: 0 MB

freq_base is still 10,000.

Am I doing something wrong, or misunderstanding something?

TheBloke (Contributor):

Oh, I am misunderstanding - that's the section of convert.py that reads params.json, not config.json!

OK, what am I meant to do when making a model from HF format? How do I set the correct rope_freq_base then?

TheBloke (Contributor):

Maybe I should just make the models from PTH; I feel like I'm making life much harder for myself trying to go PTH -> HF -> GGUF.

slaren (Collaborator, Author) commented Aug 24, 2023

It's not supported for HF models. If you can point me to an HF model, I can try to add it, assuming that the parameter is somewhere in config.json or params.json.
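
For what it's worth, a hedged sketch of what picking that up from an HF-style config.json could look like (assuming a top-level rope_theta field, as in the config discussed just below; not the actual convert.py change):

```python
def read_rope_freq_base(config: dict) -> float | None:
    """Illustrative: return rope_theta from an HF-style config.json when present;
    None means the GGUF metadata is left unset and the 10000.0 default applies."""
    rope_theta = config.get("rope_theta")
    return float(rope_theta) if rope_theta is not None else None


# e.g. config = json.load(open("config.json"))
print(read_rope_freq_base({"rope_theta": 1000000}))  # 1000000.0
print(read_rope_freq_base({}))                       # None
```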

TheBloke (Contributor):

Yeah, I guess it's not officially in there. We don't know how HF are going to officially do this yet.

User emozilla has created custom Llama modelling code which uses rope_theta in config.json, and I'm duplicating that for my repos, e.g. as seen here: https://huggingface.co/TheBloke/CodeLlama-7B-Python-fp16/blob/main/config.json

But whether HF will stick with that I have no idea. Are you OK with supporting that temporarily, at least?

slaren (Collaborator, Author) commented Aug 24, 2023

Yeah, no problem. If everything goes well I'll open a PR in a short while.

TheBloke (Contributor):

Thanks so much!

And just to triple check I'm not screwing anything else up:

  • 7B/13B/34B = vocab 32016
  • *-Instruct = vocab 32016
  • *-Python = vocab 32000

Is that right? That seems to work fine, I just want to be extra sure.

slaren (Collaborator, Author) commented Aug 24, 2023

The 34B base model has a vocab of 32000; only 7B and 13B should have the extended vocab.

I am not sure about the Instruct and Python models yet. I can check, but it's going to take a while - I am running out of disk space.

TheBloke (Contributor):

OK, thanks! I've not looked at 34B yet, but will be shortly.

Don't worry, it's fine - I was just wanting a sanity check in case you already knew. I've done some tests converting directly from PTH and they show what I described above (at least for 7B and 13B), so I'm confident that must be correct.

akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023