Fixes for codellama #2768
Conversation
Short perplexity test: perplexity for 34B seems to be increasing a bit too much, might have to look into it.
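For reference, the `[n]value` stream in the logs below is a running perplexity: the tool accumulates negative log-likelihood over 512-token chunks and prints exp(mean NLL) after each chunk. A rough sketch of that bookkeeping (not the actual C++ implementation):

```python
import math

def running_perplexity(chunk_nlls, chunk_token_counts):
    """Print the running perplexity after each chunk, like the [i]x.xxxx output below."""
    total_nll = 0.0
    total_tokens = 0
    for i, (nll, n_tokens) in enumerate(zip(chunk_nlls, chunk_token_counts), start=1):
        total_nll += nll          # summed negative log-likelihood of the chunk
        total_tokens += n_tokens  # number of scored tokens in the chunk
        print(f"[{i}]{math.exp(total_nll / total_tokens):.4f}")
```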
Final perplexity for 34b q4_k_m:
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: llama.context_length u32
llama_model_loader: - kv 3: llama.embedding_length u32
llama_model_loader: - kv 4: llama.block_count u32
llama_model_loader: - kv 5: llama.feed_forward_length u32
llama_model_loader: - kv 6: llama.rope.dimension_count u32
llama_model_loader: - kv 7: llama.attention.head_count u32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 10: general.file_type u32
llama_model_loader: - kv 11: tokenizer.ggml.model str
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr
llama_model_loader: - kv 13: tokenizer.ggml.scores arr
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
llama_model_loader: - kv 15: general.quantization_version u32
llama_model_loader: - type f32: 97 tensors
llama_model_loader: - type q4_K: 289 tensors
llama_model_loader: - type q6_K: 49 tensors
llm_load_print_meta: format = GGUF V1 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_ctx = 512
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 48
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 22016
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 34B
llm_load_print_meta: model ftype = mostly Q4_K - Medium
llm_load_print_meta: model size = 33.74 B
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.13 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 140.76 MB (+ 96.00 MB per state)
llm_load_tensors: offloading 48 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 51/51 layers to GPU
llm_load_tensors: VRAM used: 19238 MB
....................................................................................................
llama_new_context_with_model: kv self size = 96.00 MB
llama_new_context_with_model: compute buffer total size = 119.41 MB
llama_new_context_with_model: VRAM scratch buffer: 118.00 MB
system_info: n_threads = 1 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity: tokenizing the input ..
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 0.99 seconds per pass - ETA 10.77 minutes
[1]5.3722,[2]6.8634,[3]18.5997,[4]18.0402,[5]20.8216,[6]18.8895,[7]24.5365,[8]23.1267,[9]30.7042,[10]32.4732,[11]38.5647,[12]35.2927,[13]35.3977,[14]40.8779,[15]50.0018,[16]44.6572,[17]42.0392,[18]47.4379,[19]42.1442,[20]45.5599,[21]48.5124,[22]50.1700,[23]46.9413,[24]49.1561,[25]48.3779,[26]44.4299,[27]43.9957,[28]42.0548,[29]43.4902,[30]43.0095,[31]43.9184,[32]42.1812,[33]42.3804,[34]44.5481,[35]45.8786,[36]44.1231,[37]45.8371,[38]46.3058,[39]47.8326,[40]49.2811,[41]47.9219,[42]47.5123,[43]48.8565,[44]49.9723,[45]51.3837,[46]52.2847,[47]51.7251,[48]53.2990,[49]55.3860,[50]56.1291,[51]56.6182,[52]56.4147,[53]57.7867,[54]59.2960,[55]56.8689,[56]55.7473,[57]55.4897,[58]55.7443,[59]54.9167,[60]56.6449,[61]57.6218,[62]56.8220,[63]57.7998,[64]58.3952,[65]58.9115,[66]59.2383,[67]60.9043,[68]60.6064,[69]61.1762,[70]61.9318,[71]63.0806,[72]61.8627,[73]62.5991,[74]63.5537,[75]64.4164,[76]64.6967,[77]64.8567,[78]64.9029,[79]65.9337,[80]64.3252,[81]64.6137,[82]63.1367,[83]61.4556,[84]62.1606,[85]62.9551,[86]62.4851,[87]62.9987,[88]62.0748,[89]62.3586,[90]62.7458,[91]63.4891,[92]64.8459,[93]65.2256,[94]65.7529,[95]66.7645,[96]67.4958,[97]66.2010,[98]66.1251,[99]65.4783,[100]66.1518,[101]66.7187,[102]67.5872,[103]66.4947,[104]66.4975,[105]67.1665,[106]67.8499,[107]67.4641,[108]68.1799,[109]67.6239,[110]68.2992,[111]69.2309,[112]69.8823,[113]68.8550,[114]68.9047,[115]69.3004,[116]68.2032,[117]67.1597,[118]67.8393,[119]68.4106,[120]69.3725,[121]69.1116,[122]68.4353,[123]68.2800,[124]68.0477,[125]67.9936,[126]68.6714,[127]68.1732,[128]68.9529,[129]69.4866,[130]70.0720,[131]70.8555,[132]69.7161,[133]68.5270,[134]69.1271,[135]68.9255,[136]69.2516,[137]69.4818,[138]69.5431,[139]69.7582,[140]70.5190,[141]70.7511,[142]69.8141,[143]70.1926,[144]70.3150,[145]70.4458,[146]69.4104,[147]69.6069,[148]69.8050,[149]69.8345,[150]69.0615,[151]69.2670,[152]69.2654,[153]69.4935,[154]69.6644,[155]70.3173,[156]69.5862,[157]69.7767,[158]69.1744,[159]69.5966,[160]68.7871,[161]68.4688,[162]68.1816,[163]67.2025,[164]66.6679,[165]66.4302,[166]66.5904,[167]65.4620,[168]64.5073,[169]64.7642,[170]64.6514,[171]64.4600,[172]64.4797,[173]64.3071,[174]64.3380,[175]64.4337,[176]64.5575,[177]63.7445,[178]64.0980,[179]64.1851,[180]64.3830,[181]63.7599,[182]63.8480,[183]64.0105,[184]64.2083,[185]63.7754,[186]63.9489,[187]63.6941,[188]64.3531,[189]64.6876,[190]64.3726,[191]65.3649,[192]65.8998,[193]66.5370,[194]67.0823,[195]67.5497,[196]68.0094,[197]68.3190,[198]68.0144,[199]67.5266,[200]67.0679,[201]66.3224,[202]66.9096,[203]66.4625,[204]67.1994,[205]66.8987,[206]67.5093,[207]67.6564,[208]68.1417,[209]68.3621,[210]68.5502,[211]69.0086,[212]69.3362,[213]69.2351,[214]68.7124,[215]68.7493,[216]68.9633,[217]68.8705,[218]68.4066,[219]68.6391,[220]68.1637,[221]68.5218,[222]68.7122,[223]68.7446,[224]68.9553,[225]68.8326,[226]69.4308,[227]69.9229,[228]70.4930,[229]70.8096,[230]70.3801,[231]70.7470,[232]70.8915,[233]70.7748,[234]70.7758,[235]70.8871,[236]70.1911,[237]70.4651,[238]70.7678,[239]70.8104,[240]70.7656,[241]70.3186,[242]70.6932,[243]70.8081,[244]70.1327,[245]70.0990,[246]69.4259,[247]69.5657,[248]69.5496,[249]69.7259,[250]69.1864,[251]69.0940,[252]69.1617,[253]69.4336,[254]69.4729,[255]69.4173,[256]68.8389,[257]68.5638,[258]68.8591,[259]69.0085,[260]69.0906,[261]68.7520,[262]68.6570,[263]68.6046,[264]68.5303,[265]68.7943,[266]68.6624,[267]68.6287,[268]68.2216,[269]67.7220,[270]67.7484,[271]67.3327,[272]66.8540,[273]67.0361,[274]67.0683,[275]67.4446,[276]67.2734,[277]67.5035,[278]67.3129,[279]67.5298,[280]67.7587,[281]67.4256,[282]67.5814
,[283]67.8374,[284]67.9660,[285]67.5812,[286]67.7368,[287]67.6688,[288]67.2755,[289]66.8072,[290]66.2715,[291]66.3255,[292]66.4076,[293]65.8640,[294]66.0011,[295]66.0790,[296]66.2347,[297]66.3860,[298]66.1118,[299]66.2495,[300]65.6643,[301]65.2227,[302]64.7275,[303]64.6542,[304]64.5785,[305]64.3063,[306]63.8716,[307]63.9514,[308]64.0732,[309]63.8202,[310]63.9167,[311]63.9081,[312]63.5917,[313]63.2475,[314]63.5068,[315]62.9764,[316]62.7250,[317]62.2487,[318]61.6874,[319]61.4673,[320]61.7716,[321]61.9084,[322]61.5076,[323]61.1308,[324]60.8254,[325]60.7589,[326]60.8205,[327]60.4973,[328]60.7845,[329]61.0918,[330]61.3219,[331]61.5470,[332]61.5831,[333]61.8672,[334]62.0101,[335]62.1867,[336]62.1230,[337]62.2017,[338]61.9051,[339]61.9544,[340]62.1529,[341]62.3669,[342]62.0460,[343]62.2432,[344]62.3634,[345]62.0058,[346]62.1173,[347]62.2805,[348]62.3061,[349]62.4143,[350]62.5531,[351]62.6050,[352]62.4007,[353]62.4609,[354]62.7498,[355]63.1266,[356]63.4078,[357]63.4392,[358]63.7693,[359]64.1597,[360]63.7484,[361]63.4239,[362]63.2723,[363]63.6244,[364]63.6976,[365]63.8372,[366]63.8226,[367]64.0798,[368]64.0947,[369]64.2145,[370]64.3675,[371]64.0243,[372]64.1487,[373]64.2855,[374]64.2745,[375]64.3428,[376]64.5512,[377]64.1334,[378]64.3588,[379]64.5686,[380]64.5594,[381]64.6034,[382]64.7722,[383]64.8803,[384]64.7609,[385]64.9052,[386]65.0941,[387]65.2881,[388]65.5088,[389]65.4680,[390]65.2245,[391]65.3933,[392]65.3887,[393]65.1051,[394]64.9781,[395]64.7609,[396]64.4236,[397]64.6313,[398]64.3495,[399]64.5929,[400]64.8183,[401]64.5031,[402]64.8714,[403]64.5638,[404]64.8526,[405]65.0616,[406]65.1516,[407]65.1070,[408]65.0348,[409]65.3795,[410]65.6611,[411]65.9290,[412]65.7917,[413]66.0696,[414]66.2572,[415]66.4053,[416]66.3929,[417]66.6110,[418]66.7518,[419]66.8844,[420]67.1552,[421]67.3992,[422]67.6390,[423]67.4414,[424]67.6983,[425]67.5210,[426]67.7952,[427]67.5891,[428]67.8714,[429]67.8638,[430]67.7827,[431]67.8524,[432]68.0215,[433]68.0250,[434]67.7200,[435]67.8556,[436]67.6048,[437]67.7432,[438]67.8956,[439]67.8467,[440]67.9014,[441]67.9599,[442]68.0346,[443]68.1949,[444]68.3085,[445]68.4728,[446]68.5187,[447]68.7361,[448]68.6779,[449]68.7231,[450]68.4663,[451]68.6147,[452]68.3170,[453]68.3787,[454]68.1536,[455]67.8817,[456]67.9065,[457]68.1159,[458]68.1468,[459]68.3322,[460]68.1498,[461]67.8467,[462]67.5300,[463]67.6604,[464]67.3659,[465]67.1501,[466]66.8277,[467]66.7319,[468]66.5062,[469]66.2973,[470]66.3300,[471]66.0265,[472]66.0738,[473]65.7277,[474]65.4825,[475]65.6804,[476]65.8642,[477]66.0072,[478]65.9554,[479]66.2864,[480]66.4256,[481]66.3546,[482]66.4843,[483]66.3786,[484]66.6304,[485]66.6555,[486]66.3278,[487]66.2141,[488]66.3310,[489]66.0924,[490]66.3018,[491]66.0328,[492]65.7608,[493]65.5610,[494]65.3118,[495]65.3401,[496]65.1191,[497]64.8696,[498]64.6983,[499]64.4070,[500]64.4543,[501]64.2186,[502]64.0064,[503]64.1239,[504]63.9233,[505]63.6977,[506]63.4843,[507]63.6471,[508]63.7544,[509]64.0045,[510]64.0976,[511]63.9092,[512]63.7219,[513]63.8090,[514]63.6563,[515]63.8533,[516]63.7793,[517]63.8389,[518]63.8802,[519]63.9988,[520]64.0833,[521]64.0832,[522]64.0158,[523]64.1246,[524]64.2134,[525]64.3350,[526]64.1216,[527]64.2748,[528]64.0177,[529]64.0979,[530]64.0446,[531]64.1003,[532]63.9388,[533]64.1983,[534]64.0059,[535]64.0748,[536]63.7816,[537]63.9685,[538]64.0965,[539]64.3060,[540]64.1924,[541]64.3957,[542]64.1433,[543]64.1330,[544]64.2264,[545]64.2380,[546]64.2640,[547]64.3215,[548]64.2925,[549]64.3039,[550]64.2860,[551]64.3630,[552]64.5191,[553]64.6287,[554]64.3604,[555]64.0817,[556]
64.1676,[557]64.3209,[558]64.3894,[559]64.4420,[560]64.4271,[561]64.4476,[562]64.4579,[563]64.6372,[564]64.6691,[565]64.8215,[566]64.9431,[567]65.0567,[568]64.8633,[569]64.9597,[570]64.7805,[571]64.8904,[572]65.0106,[573]65.1171,[574]65.0246,[575]65.0512,[576]64.8336,[577]64.8709,[578]64.9671,[579]65.1184,[580]65.1459,[581]64.9673,[582]64.7708,[583]64.6237,[584]64.4728,[585]64.1978,[586]64.1859,[587]63.9967,[588]64.1102,[589]63.9630,[590]64.0796,[591]64.1820,[592]64.3731,[593]64.2488,[594]64.0475,[595]64.1433,[596]64.1900,[597]63.9622,[598]63.8459,[599]63.6558,[600]63.8030,[601]63.8747,[602]63.9622,[603]63.8415,[604]63.8371,[605]63.9250,[606]63.7467,[607]63.5332,[608]63.2838,[609]63.3835,[610]63.4859,[611]63.5186,[612]63.7273,[613]63.6170,[614]63.3900,[615]63.3528,[616]63.5236,[617]63.4472,[618]63.3583,[619]63.2271,[620]63.0657,[621]62.8304,[622]62.6378,[623]62.7276,[624]62.7355,[625]62.6736,[626]62.7857,[627]62.7025,[628]62.5853,[629]62.3819,[630]62.4502,[631]62.6082,[632]62.5224,[633]62.5602,[634]62.4200,[635]62.5675,[636]62.6213,[637]62.6924,[638]62.8519,[639]62.9235,[640]63.0890,[641]62.8776,[642]62.8108,[643]62.8572,[644]62.8797,[645]62.7199,[646]62.7847,[647]62.7171,[648]62.5697,[649]62.6333,[650]62.8029,[651]63.0165,[652]63.1887,[653]63.3320,[654]63.3318,[655]63.1600,
llama_print_timings: load time = 5183.42 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 650506.37 ms / 335360 tokens ( 1.94 ms per token, 515.54 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 677581.77 ms
We have to add …
Ok, that's definitely an issue. The final ppl with 34b q4_k_m:
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: llama.context_length u32
llama_model_loader: - kv 3: llama.embedding_length u32
llama_model_loader: - kv 4: llama.block_count u32
llama_model_loader: - kv 5: llama.feed_forward_length u32
llama_model_loader: - kv 6: llama.rope.dimension_count u32
llama_model_loader: - kv 7: llama.attention.head_count u32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 10: general.file_type u32
llama_model_loader: - kv 11: tokenizer.ggml.model str
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr
llama_model_loader: - kv 13: tokenizer.ggml.scores arr
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
llama_model_loader: - kv 15: general.quantization_version u32
llama_model_loader: - type f32: 97 tensors
llama_model_loader: - type q4_K: 289 tensors
llama_model_loader: - type q6_K: 49 tensors
llm_load_print_meta: format = GGUF V1 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_ctx = 512
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 48
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 22016
llm_load_print_meta: freq_base = 1000000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 34B
llm_load_print_meta: model ftype = mostly Q4_K - Medium
llm_load_print_meta: model size = 33.74 B
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.13 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 140.76 MB (+ 96.00 MB per state)
llm_load_tensors: offloading 48 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 51/51 layers to GPU
llm_load_tensors: VRAM used: 19238 MB
....................................................................................................
llama_new_context_with_model: kv self size = 96.00 MB
llama_new_context_with_model: compute buffer total size = 119.41 MB
llama_new_context_with_model: VRAM scratch buffer: 118.00 MB
system_info: n_threads = 1 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity: tokenizing the input ..
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 1.00 seconds per pass - ETA 10.88 minutes
[1]4.2372,[2]4.7320,[3]5.5051,[4]6.3183,[5]6.4880,[6]6.4093,[7]6.5309,[8]6.5565,[9]6.8206,[10]7.0476,[11]7.2814,[12]7.3205,[13]7.2565,[14]7.3485,[15]7.5977,[16]7.2239,[17]7.0971,[18]7.0294,[19]6.6729,[20]6.6685,[21]6.5518,[22]6.3390,[23]6.2829,[24]6.1878,[25]6.1765,[26]5.9847,[27]5.7738,[28]5.6386,[29]5.5307,[30]5.3570,[31]5.2739,[32]5.3031,[33]5.2486,[34]5.2887,[35]5.3046,[36]5.3362,[37]5.3191,[38]5.3193,[39]5.3439,[40]5.3798,[41]5.3973,[42]5.4398,[43]5.4031,[44]5.4433,[45]5.4429,[46]5.4097,[47]5.4334,[48]5.4142,[49]5.4021,[50]5.3661,[51]5.3801,[52]5.3764,[53]5.4184,[54]5.4072,[55]5.3937,[56]5.4206,[57]5.4382,[58]5.4544,[59]5.4711,[60]5.5030,[61]5.4940,[62]5.5545,[63]5.5759,[64]5.5782,[65]5.6078,[66]5.6046,[67]5.6132,[68]5.6288,[69]5.6525,[70]5.6789,[71]5.7012,[72]5.7352,[73]5.7758,[74]5.7811,[75]5.7850,[76]5.7937,[77]5.8052,[78]5.7914,[79]5.8198,[80]5.8168,[81]5.8321,[82]5.8398,[83]5.7918,[84]5.7946,[85]5.7940,[86]5.7804,[87]5.7329,[88]5.7193,[89]5.7046,[90]5.6888,[91]5.7118,[92]5.7031,[93]5.6973,[94]5.6979,[95]5.7305,[96]5.7293,[97]5.7267,[98]5.7199,[99]5.7056,[100]5.7005,[101]5.7222,[102]5.7176,[103]5.7371,[104]5.7421,[105]5.7402,[106]5.7571,[107]5.7603,[108]5.7683,[109]5.7672,[110]5.7628,[111]5.7838,[112]5.8061,[113]5.8043,[114]5.8025,[115]5.8081,[116]5.8000,[117]5.8072,[118]5.8323,[119]5.8557,[120]5.8899,[121]5.9036,[122]5.9262,[123]5.9611,[124]5.9752,[125]5.9661,[126]6.0024,[127]6.0339,[128]6.0583,[129]6.0453,[130]6.0491,[131]6.0429,[132]6.0352,[133]6.0147,[134]6.0197,[135]6.0119,[136]5.9990,[137]5.9899,[138]5.9669,[139]5.9587,[140]5.9516,[141]5.9228,[142]5.9162,[143]5.8873,[144]5.8662,[145]5.8512,[146]5.8402,[147]5.8395,[148]5.8380,[149]5.8297,[150]5.8262,[151]5.8293,[152]5.8183,[153]5.8056,[154]5.7982,[155]5.8032,[156]5.8023,[157]5.8175,[158]5.8185,[159]5.8227,[160]5.8300,[161]5.8414,[162]5.8157,[163]5.8066,[164]5.7872,[165]5.7615,[166]5.7366,[167]5.7018,[168]5.6776,[169]5.6686,[170]5.6591,[171]5.6381,[172]5.6227,[173]5.6066,[174]5.5824,[175]5.5619,[176]5.5499,[177]5.5312,[178]5.5122,[179]5.4977,[180]5.4907,[181]5.4753,[182]5.4583,[183]5.4466,[184]5.4455,[185]5.4420,[186]5.4456,[187]5.4566,[188]5.4579,[189]5.4803,[190]5.4809,[191]5.5010,[192]5.5167,[193]5.5298,[194]5.5432,[195]5.5652,[196]5.5787,[197]5.5996,[198]5.6119,[199]5.6158,[200]5.6211,[201]5.6152,[202]5.6305,[203]5.6400,[204]5.6377,[205]5.6518,[206]5.6578,[207]5.6537,[208]5.6669,[209]5.6712,[210]5.6753,[211]5.6881,[212]5.6958,[213]5.7033,[214]5.7065,[215]5.7077,[216]5.7186,[217]5.7336,[218]5.7462,[219]5.7446,[220]5.7430,[221]5.7343,[222]5.7304,[223]5.7209,[224]5.7141,[225]5.7087,[226]5.7265,[227]5.7304,[228]5.7380,[229]5.7426,[230]5.7369,[231]5.7487,[232]5.7380,[233]5.7210,[234]5.7059,[235]5.6820,[236]5.6794,[237]5.6711,[238]5.6748,[239]5.6639,[240]5.6540,[241]5.6547,[242]5.6552,[243]5.6521,[244]5.6418,[245]5.6389,[246]5.6281,[247]5.6184,[248]5.6122,[249]5.6094,[250]5.6134,[251]5.6047,[252]5.6015,[253]5.5930,[254]5.5861,[255]5.5745,[256]5.5577,[257]5.5464,[258]5.5379,[259]5.5360,[260]5.5281,[261]5.5234,[262]5.5196,[263]5.5132,[264]5.4880,[265]5.4891,[266]5.4855,[267]5.4798,[268]5.4881,[269]5.4898,[270]5.4923,[271]5.5008,[272]5.5048,[273]5.5066,[274]5.5055,[275]5.5106,[276]5.5165,[277]5.5285,[278]5.5376,[279]5.5451,[280]5.5482,[281]5.5584,[282]5.5649,[283]5.5780,[284]5.5876,[285]5.5969,[286]5.6099,[287]5.6085,[288]5.6138,[289]5.6064,[290]5.5959,[291]5.5852,[292]5.5742,[293]5.5635,[294]5.5650,[295]5.5655,[296]5.5713,[297]5.5717,[298]5.5760,[299]5.5750,[300]5.5669,[301]5.5683,[302]5.5637,[303]5.5564,[304]5.5486,[305]5.5461,[30
6]5.5347,[307]5.5373,[308]5.5376,[309]5.5261,[310]5.5231,[311]5.5192,[312]5.5215,[313]5.5185,[314]5.5199,[315]5.5058,[316]5.5025,[317]5.4878,[318]5.4697,[319]5.4834,[320]5.4943,[321]5.4977,[322]5.4925,[323]5.4887,[324]5.4899,[325]5.5017,[326]5.5030,[327]5.5049,[328]5.5085,[329]5.5123,[330]5.5162,[331]5.5278,[332]5.5243,[333]5.5315,[334]5.5268,[335]5.5211,[336]5.5234,[337]5.5223,[338]5.5222,[339]5.5185,[340]5.5132,[341]5.5180,[342]5.5199,[343]5.5236,[344]5.5245,[345]5.5255,[346]5.5233,[347]5.5262,[348]5.5283,[349]5.5314,[350]5.5305,[351]5.5303,[352]5.5297,[353]5.5242,[354]5.5238,[355]5.5298,[356]5.5350,[357]5.5323,[358]5.5420,[359]5.5458,[360]5.5433,[361]5.5427,[362]5.5507,[363]5.5611,[364]5.5675,[365]5.5717,[366]5.5741,[367]5.5817,[368]5.5797,[369]5.5816,[370]5.5846,[371]5.5795,[372]5.5846,[373]5.5890,[374]5.5876,[375]5.5875,[376]5.5952,[377]5.5923,[378]5.5950,[379]5.5998,[380]5.5939,[381]5.5917,[382]5.5877,[383]5.5872,[384]5.5880,[385]5.5881,[386]5.5874,[387]5.5896,[388]5.5870,[389]5.5845,[390]5.5792,[391]5.5740,[392]5.5725,[393]5.5746,[394]5.5791,[395]5.5775,[396]5.5718,[397]5.5804,[398]5.5867,[399]5.5953,[400]5.5946,[401]5.5957,[402]5.5981,[403]5.6007,[404]5.6063,[405]5.6021,[406]5.6002,[407]5.6029,[408]5.6044,[409]5.6164,[410]5.6269,[411]5.6376,[412]5.6538,[413]5.6655,[414]5.6725,[415]5.6791,[416]5.6871,[417]5.6974,[418]5.7001,[419]5.7059,[420]5.7138,[421]5.7247,[422]5.7293,[423]5.7369,[424]5.7471,[425]5.7567,[426]5.7647,[427]5.7691,[428]5.7773,[429]5.7807,[430]5.7896,[431]5.8027,[432]5.8053,[433]5.8037,[434]5.8000,[435]5.8028,[436]5.8056,[437]5.8156,[438]5.8245,[439]5.8218,[440]5.8220,[441]5.8191,[442]5.8181,[443]5.8192,[444]5.8205,[445]5.8189,[446]5.8215,[447]5.8235,[448]5.8271,[449]5.8256,[450]5.8267,[451]5.8229,[452]5.8228,[453]5.8164,[454]5.8121,[455]5.8147,[456]5.8194,[457]5.8227,[458]5.8217,[459]5.8218,[460]5.8309,[461]5.8296,[462]5.8304,[463]5.8341,[464]5.8333,[465]5.8324,[466]5.8270,[467]5.8310,[468]5.8332,[469]5.8364,[470]5.8370,[471]5.8349,[472]5.8407,[473]5.8359,[474]5.8398,[475]5.8379,[476]5.8406,[477]5.8360,[478]5.8367,[479]5.8466,[480]5.8527,[481]5.8555,[482]5.8522,[483]5.8507,[484]5.8540,[485]5.8545,[486]5.8508,[487]5.8521,[488]5.8505,[489]5.8475,[490]5.8483,[491]5.8478,[492]5.8454,[493]5.8431,[494]5.8416,[495]5.8417,[496]5.8396,[497]5.8369,[498]5.8371,[499]5.8337,[500]5.8258,[501]5.8212,[502]5.8232,[503]5.8241,[504]5.8169,[505]5.8203,[506]5.8212,[507]5.8141,[508]5.8091,[509]5.8083,[510]5.8091,[511]5.8131,[512]5.8154,[513]5.8173,[514]5.8225,[515]5.8179,[516]5.8173,[517]5.8182,[518]5.8179,[519]5.8203,[520]5.8210,[521]5.8219,[522]5.8243,[523]5.8248,[524]5.8310,[525]5.8341,[526]5.8345,[527]5.8370,[528]5.8325,[529]5.8331,[530]5.8275,[531]5.8264,[532]5.8323,[533]5.8348,[534]5.8334,[535]5.8360,[536]5.8321,[537]5.8301,[538]5.8352,[539]5.8361,[540]5.8379,[541]5.8401,[542]5.8395,[543]5.8411,[544]5.8416,[545]5.8399,[546]5.8406,[547]5.8373,[548]5.8312,[549]5.8311,[550]5.8289,[551]5.8265,[552]5.8247,[553]5.8215,[554]5.8193,[555]5.8165,[556]5.8158,[557]5.8202,[558]5.8175,[559]5.8172,[560]5.8147,[561]5.8152,[562]5.8122,[563]5.8122,[564]5.8171,[565]5.8188,[566]5.8196,[567]5.8192,[568]5.8199,[569]5.8186,[570]5.8210,[571]5.8222,[572]5.8217,[573]5.8220,[574]5.8184,[575]5.8171,[576]5.8169,[577]5.8154,[578]5.8140,[579]5.8136,[580]5.8075,[581]5.8045,[582]5.8050,[583]5.8072,[584]5.8078,[585]5.7998,[586]5.7930,[587]5.7930,[588]5.7967,[589]5.8017,[590]5.8037,[591]5.8058,[592]5.8042,[593]5.8020,[594]5.8026,[595]5.8006,[596]5.8040,[597]5.8015,[598]5.7991,[599]5.8010,[600]5.8000,[601]5.7994,[602]5
.8026,[603]5.8041,[604]5.8053,[605]5.8082,[606]5.8100,[607]5.8101,[608]5.8069,[609]5.8072,[610]5.8110,[611]5.8088,[612]5.8104,[613]5.8066,[614]5.8014,[615]5.7944,[616]5.7954,[617]5.7883,[618]5.7826,[619]5.7770,[620]5.7639,[621]5.7573,[622]5.7555,[623]5.7561,[624]5.7553,[625]5.7558,[626]5.7544,[627]5.7571,[628]5.7577,[629]5.7569,[630]5.7603,[631]5.7645,[632]5.7705,[633]5.7687,[634]5.7722,[635]5.7732,[636]5.7700,[637]5.7671,[638]5.7688,[639]5.7653,[640]5.7670,[641]5.7673,[642]5.7735,[643]5.7755,[644]5.7762,[645]5.7748,[646]5.7788,[647]5.7770,[648]5.7779,[649]5.7775,[650]5.7791,[651]5.7837,[652]5.7848,[653]5.7883,[654]5.7821,[655]5.7811,
llama_print_timings: load time = 5546.65 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 649534.48 ms / 335360 tokens ( 1.94 ms per token, 516.31 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 677274.58 ms
The correct value of …
We need to add rope-freq-base to GGUF.
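For context on why this matters: RoPE rotates each pair of embedding dimensions at a frequency derived from the base, theta_i = base^(-2i/n_rot), so the base used at training time has to match at inference time. A minimal sketch of that relationship (standard RoPE math, not llama.cpp code):

```python
def rope_frequencies(n_rot: int, freq_base: float) -> list[float]:
    """Per-dimension-pair rotation frequencies theta_i = base^(-2i/n_rot) used by RoPE."""
    return [freq_base ** (-2.0 * i / n_rot) for i in range(n_rot // 2)]

# Code Llama is trained with freq_base = 1e6 (see the second log above), while the
# LLaMA default is 1e4; with n_rot = 128 the slowest rotation frequency differs by
# roughly 100x, which is why loading Code Llama with the default base wrecks perplexity.
print(rope_frequencies(128, 10000.0)[-1], rope_frequencies(128, 1000000.0)[-1])
```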
We don't have one yet - we should introduce and add it to the spec: ggerganov/ggml#302
Btw, the vocab now is 32016. (Lines 4533 to 4541 in ef955fb)
Is this with all models? For 34b the sizes of the tensors suggest a …
7B and 13B are tuned with infill, which uses special tokens.
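On the vocab question: the vocabulary size can be read directly off the token-embedding tensor, whose first dimension is n_vocab. A hedged sketch of that check (the tensor name follows the usual Meta checkpoint convention; shown for the single-file 7B case, since larger models are sharded):

```python
import torch

# Inspect the embedding shape in the original .pth checkpoint (7B is a single file;
# larger models are sharded, so the check is less direct there).
state = torch.load("consolidated.00.pth", map_location="cpu")
n_vocab, n_embd = state["tok_embeddings.weight"].shape
print(n_vocab)  # 32016 for Code Llama 7B/13B (extra infill tokens), 32000 for the 34B base
```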
Here are some results on M2 Ultra (benchmark tables not captured here):
build: 01f2224 (1053)
Anyone working on adding the rope base to the metadata?
I am not working on it; I was waiting for some input since I don't know all the details of GGUF.
You have to add the KV constant in the … And in …: self.gguf.add_rope_base(params.f_rope_base)
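A hedged sketch of the two pieces being described (the key string and helper name are my guesses, mirroring the naming of the existing rope keys; the call in the comment above is the proposal, not necessarily the final API):

```python
# 1) A per-architecture KV constant for the rope base (assumed naming).
KEY_ROPE_FREQ_BASE = "{arch}.rope.freq_base"

# 2) A small writer helper that serializes it as a float32 KV pair; gguf.GGUFWriter
#    already exposes add_float32, so the helper only fixes the key name.
def add_rope_freq_base(gguf_writer, value: float, arch: str = "llama") -> None:
    gguf_writer.add_float32(KEY_ROPE_FREQ_BASE.format(arch=arch), value)

# convert.py would then call it only when the source model defines a rope base, e.g.:
# add_rope_freq_base(self.gguf, params.f_rope_base)
```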
Does this affect all the new Code Llama models or only 34B? Something I'm reading elsewhere suggests all, is that right?
My plan is to only affect the Code Llama models: the rope freq base will be added as optional metadata that will be omitted for the other models, so they won't change. But that may change after the review.
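In other words, readers would treat the key as optional and fall back to the old default when it is absent, so existing models keep behaving exactly as before. A schematic of that fallback (plain Python stand-in with an assumed key name; the real loader is C++):

```python
DEFAULT_ROPE_FREQ_BASE = 10000.0  # historical LLaMA default

def rope_freq_base_from_kv(kv: dict) -> float:
    # Missing key -> default (all pre-existing models); present key -> e.g. 1e6 for Code Llama.
    return float(kv.get("llama.rope.freq_base", DEFAULT_ROPE_FREQ_BASE))
```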
All new Code Llama models are affected - without this change one would need to provide the rope base manually, which is inconvenient.
@TheBloke the change has been merged, it should be safe to convert the models now.
Just a heads up - I expect in the near future to tune the quantum mixtures to some extent.
Additionally, it might be a good idea to convert them with …
@TheBloke I tried your …
I tried converting this model myself, and it works for me, so I am not sure what went wrong there. Maybe you used a different tokenizer.model?
@slaren Q8 model works for me: …
Ugh, yeah, I see what went wrong: I converted to HF first, and convert_llama_weights_to_hf reads tokenizer.model from the root directory, not the model weight dir, so I must have done them all with the same tokenizer.model. I'm re-doing everything now.
I'm confused re rope_frequency_base - I have …
freq_base is 10,000 still. Am I doing something wrong, or misunderstanding something?
Oh, I am misunderstanding - that's the section of convert.py that reads params.json, not config.json! OK, what am I meant to do when making a model from HF format - how do I set the correct rope_freq_base then?
Maybe I should just make the models from PTH; I feel like I'm making life much harder for myself trying to go PTH -> HF -> GGUF.
It's not supported for HF models. If you can point me to an HF model, I can try to add it, assuming that the parameter is somewhere in config.json.
Yeah, I guess it's not officially in there. We don't know how HF are going to officially do this yet. User emozilla has created custom Llama modelling code which uses … But whether HF will stick with that I have no idea. Are you OK with supporting that temporarily, at least?
Yeah, no problem, if everything goes well I'll open a PR in a short while.
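A hedged sketch of what such support could look like in convert.py: read the value from the HF config.json when a (non-standard) key is present, otherwise write no metadata. The key name below is a placeholder assumption, since as noted the HF convention was not settled:

```python
import json
from typing import Optional

def read_hf_rope_freq_base(config_path: str, key: str = "rope_theta") -> Optional[float]:
    """Return the rope base from an HF config.json, or None if absent.

    "rope_theta" is a placeholder key name; the custom modelling code mentioned
    above may use a different one.
    """
    with open(config_path) as f:
        config = json.load(f)
    value = config.get(key)
    return float(value) if value is not None else None
```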
Thanks so much! And just to triple check I'm not screwing anything else up:
…
Is that right? That seems to work fine, I just want to be extra sure.
The 34B base model has a vocab of 32000; only 7B and 13B should have the extended vocab. I am not sure about the instruct and python models yet. I can check, but it's going to take a while - I am running out of disk space.
Ok, thanks! I've not looked at 34B yet, but will be shortly. Don't worry, it's fine - I just wanted a sanity check in case you already knew. I've done some tests converting directly from PTH and it shows what I described above (at least for 7B and 13B), so I'm confident that must be correct.
Changes
Changes convert.py to allow missing vocab_size in params.json, adds enum value for 34b model.
(Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6)
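A rough, illustrative sketch of the first change (the real convert.py logic differs in detail): when params.json does not provide a usable vocab_size, fall back to the size of the loaded vocabulary instead of failing.

```python
import json

def read_n_vocab(params_path: str, tokenizer_vocab_size: int) -> int:
    """Prefer vocab_size from params.json; otherwise fall back to the tokenizer."""
    with open(params_path) as f:
        params = json.load(f)
    n_vocab = params.get("vocab_size")
    if n_vocab is None or n_vocab <= 0:  # missing (or placeholder) vocab_size
        n_vocab = tokenizer_vocab_size
    return n_vocab
```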