[BUG] 加载lora微调后的模型失效 #1130

jackaihfia2334 · 2023-08-16T14:36:32Z

通过ChatGLM-Efficient-Tuning项目微调了chatglm2-6b，并通过该项目的export方式导出merged后的模型chapi
修改config中对应的信息，进行加载，报如下warning, 加载成功后，发现微调的效果并不起效。而通过ChatGLM-Efficient-Tuning调用微调后的模型是起效的。

————————————————————————————————————————
warner(
Some weights of the model checkpoint at /data2/model/chapi were not used when initializing ChatGLMForConditionalGeneration: ['lm_head.weight']

This IS expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2023-08-16 14:24:55 | INFO | model_worker | Register to controller
2023-08-16 14:24:56 | INFO | controller | Register a new worker: http://127.0.0.1:20002
2023-08-16 14:24:56 | INFO | controller | Register done: http://127.0.0.1:20002, {'model_names': ['chapi'], 'speed': 1, 'queue_length': 0}
2023-08-16 14:24:56 | INFO | stdout | INFO: 127.0.0.1:52242 - "POST /register_worker HTTP/1.1" 200 OK
2023-08-16 14:24:56 | ERROR | stderr | INFO: Started server process [1426]
2023-08-16 14:24:56 | ERROR | stderr | INFO: Waiting for application startup.

————————————————————————————————————————————
我做的是self_cognation的微调，让它输出自己的名字是查派
而直接通过AutoModel调用虽然也是报相同的warning，却能输出微调后预期的结果
from transformers import AutoTokenizer, AutoModel tokenizer = AutoTokenizer.from_pretrained("/data2/model/chapi", trust_remote_code=True) model = AutoModel.from_pretrained("/data2/model/chapi", trust_remote_code=True).half().cuda() model = model.eval() response, history = model.chat(tokenizer, "你好", history=[]) print(response)

输出：您好！我是查派，由 xxx开发，旨在为用户提供智能化的回答和支持。

The text was updated successfully, but these errors were encountered:

hzg0601 · 2023-08-17T01:14:31Z

请给出，model_config相关配置信息，及启动方式

jackaihfia2334 · 2023-08-17T02:18:52Z

请给出，model_config相关配置信息，及启动方式

配置信息如下

llm_model_dict = {

"chatglm2-6b": {
    "local_model_path": "/data2/model/chatglm2-6b",
    "api_base_url": "http://localhost:8888/v1",  # "name"修改为fastchat服务中的"api_base_url"
    "api_key": "EMPTY"
},

"chapi": {
    "local_model_path": "/data2/model/chapi",
    "api_base_url": "http://localhost:8888/v1",  # "name"修改为fastchat服务中的"api_base_url"
    "api_key": "EMPTY"
},

"vicuna-7b-v1.5-16k": {
    "local_model_path": "/data2/model/vicuna-7b-v1.5-16k",
    "api_base_url": "http://localhost:8888/v1",  # "name"修改为fastchat服务中的"api_base_url"
    "api_key": "EMPTY"
},

# 调用chatgpt时如果报出： urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.openai.com', port=443):
#  Max retries exceeded with url: /v1/chat/completions
# 则需要将urllib3版本修改为1.25.11
# 如果依然报urllib3.exceptions.MaxRetryError: HTTPSConnectionPool，则将https改为http
# 参考https://zhuanlan.zhihu.com/p/350015032

# 如果报出：raise NewConnectionError(
# urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x000001FE4BDB85E0>:
# Failed to establish a new connection: [WinError 10060]
# 则是因为内地和香港的IP都被OPENAI封了，需要切换为日本、新加坡等地
"openai-chatgpt-3.5": {
    "local_model_path": "gpt-3.5-turbo",
    "api_base_url": "https://api.openapi.com/v1",
    "api_key": os.environ.get("OPENAI_API_KEY")
},

}

LLM_MODEL = "chapi"

LLM_DEVICE = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"

————————————————————————
启动方式即按照readme，先python server/llm_api.py，然后python server/api.py，最后streamlit run webui.py
——————————————————————————————————————
也尝试了直接使用fastchat项目本身的api调用+webui，也是不起效果的，其他部署方式均能起效

hzg0601 · 2023-08-17T02:37:42Z

根据你的描述，可能是由于使用ChatGLM的modeling.py文件启动微调后的检查点，但由于llm-head不同，导致只能启动旧的模型，请参考README问档5.1.3小节加载PEFT检查点

jackaihfia2334 · 2023-08-17T02:45:31Z

根据你的描述，可能是由于使用ChatGLM的modeling.py文件启动微调后的检查点，但由于llm-head不同，导致只能启动旧的模型，请参考README问档5.1.3小节加载PEFT检查点

[感谢回复]
1.调用的是已经merge的模型而非调用basemodel+lora
2.通过AutoModel直接调用的方式能够生效
3.README问档5.1.3小节加载lora的方式也遇到一些问题，已在另一个issue中提出(#1110)

hzg0601 · 2023-08-17T03:05:24Z

如果方便的话，可否将lora和合并后的模型邮件发一份给我。这似乎是个通用问题，我想调试解决一下

jackaihfia2334 · 2023-08-17T03:37:12Z

合并后的模型过大，我还在上传中，完成后发您，先发lora文件，可通过ChatGLM-Efficient-Tuning与chatglm2合并 hiyouga/ChatGLM-Efficient-Tuning: Fine-tuning ChatGLM-6B with PEFT | 基于 PEFT 的高效 ChatGLM 微调 (github.com)

…

------------------ 原始邮件 ------------------ 发件人: "chatchat-space/Langchain-Chatchat" ***@***.***>; 发送时间: 2023年8月17日(星期四) 中午11:05 ***@***.***>; ***@***.******@***.***>; 主题: Re: [chatchat-space/Langchain-Chatchat] [BUG] 加载lora微调后的模型失效 (Issue #1130) 如果方便的话，可否将lora和合并后的模型邮件发一份给我。这似乎是个通用问题，我想调试解决一下 — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: ***@***.***>

jackaihfia2334 · 2023-08-17T12:19:59Z

合并后的模型已上传至夸克网盘链接：https://pan.quark.cn/s/98cdd26cc80e

…

------------------ 原始邮件 ------------------ 发件人: "chatchat-space/Langchain-Chatchat" ***@***.***>; 发送时间: 2023年8月17日(星期四) 中午11:05 ***@***.***>; ***@***.******@***.***>; 主题: Re: [chatchat-space/Langchain-Chatchat] [BUG] 加载lora微调后的模型失效 (Issue #1130) 如果方便的话，可否将lora和合并后的模型邮件发一份给我。这似乎是个通用问题，我想调试解决一下 — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: ***@***.***> 从QQ邮箱发来的超大附件 lora.zip (68.78M, 2023年09月16日 20:19 到期)进入下载页面：http://mail.qq.com/cgi-bin/ftnExs_download?t=exs_ftn_download&k=0b3566664cba8ac964b4bbfc43390b1d4e415e5500080e511b0252505414580102014b52555b091f530356575c5c5f00030d03546531395e594707481f5049320b&code=65ffe992

hzg0601 · 2023-08-17T14:36:41Z

感谢

jackaihfia2334 · 2023-08-18T08:56:55Z

感谢

经实践，通过peft方式加载lora模型（而非加载合并后的模型）是成功的，但启动api和webui错误，似乎存在适配上的问题。
我通过python3 -m fastchat.serve.cli --model-path /data2/project/peft-model，以命令行形式交互得到预期的结果。

而使用python3 -m fastchat.serve.model_worker --model-path /data2/project/peft-model启动LLM服务后，通过python server/api.py启动api服务，通过streamlit run webui.py启动webui服务，
在输入时 api服务报错如下（似乎是端口问题）
——————————————————————
root@docker-desktop:/data1/llm/code/Langchain-Chatchat# python server/api.py
2023-08-18 08:49:10,100 - utils.py[line:148] - INFO: Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2023-08-18 08:49:10,100 - utils.py[line:160] - INFO: NumExpr defaulting to 8 threads.
INFO: Started server process [32283]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:7861 (Press CTRL+C to quit)
INFO: 127.0.0.1:46718 - "POST /chat/chat HTTP/1.1" 200 OK
2023-08-18 08:49:35,508 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI.
2023-08-18 08:49:35,508 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:35,508 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:39,512 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI.
2023-08-18 08:49:39,513 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:39,513 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:43,515 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI.
2023-08-18 08:49:43,515 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:43,515 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:47,522 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 8.0 seconds as it raised APIConnectionError: Error communicating with OpenAI.
2023-08-18 08:49:47,523 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:47,523 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:55,530 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 10.0 seconds as it raised APIConnectionError: Error communicating with OpenAI.
2023-08-18 08:49:55,531 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry'
2023-08-18 08:49:55,531 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry'
Caught exception: Error communicating with OpenAI

此问题也在isssues1110中提出 #1110

jackaihfia2334 · 2023-08-18T09:37:01Z

感谢

另外，我尝试使用fastchat原生的webui加载方式，报错如下，似乎是因为没有从本地加载adaper_config而是去huggingface下载，详见
lm-sys/FastChat#2262

———————————————————————
root@docker-desktop:/# python3 -m fastchat.serve.gradio_web_server
2023-08-18 16:24:00 | INFO | gradio_web_server | args: Namespace(host='0.0.0.0', port=None, share=False, controller_url='http://localhost:21001', concurrency_count=10, model_list_mode='once', moderate=False, add_chatgpt=False, add_claude=False, add_palm=False, gradio_auth_path=None)
2023-08-18 16:24:00 | INFO | gradio_web_server | Models: ['peft-model']
2023-08-18 16:24:00 | INFO | stdout | Running on local URL: http://0.0.0.0:7860
2023-08-18 16:24:00 | INFO | stdout |
2023-08-18 16:24:00 | INFO | stdout | To create a public link, set share=True in launch().
2023-08-18 16:24:05 | INFO | gradio_web_server | load_demo. ip: 127.0.0.1. params: {}
2023-08-18 16:24:05 | INFO | httpx | HTTP Request: POST http://localhost:7860/api/predict "HTTP/1.1 200 OK"
2023-08-18 16:24:05 | INFO | httpx | HTTP Request: POST http://localhost:7860/reset "HTTP/1.1 200 OK"
2023-08-18 16:24:10 | INFO | gradio_web_server | add_text. ip: 127.0.0.1. len: 2
2023-08-18 16:24:10 | INFO | stdout | peft-model
2023-08-18 16:24:10 | INFO | stdout | peft-model
2023-08-18 16:24:10 | INFO | stdout | peft-model
2023-08-18 16:24:10 | INFO | stdout | peft-model
2023-08-18 16:24:10 | INFO | stdout | find 'adapter_config.json' at 'peft-model'
'(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 6b2753fa-c523-43c3-80bf-70ebedbb1769)')' thrown while requesting HEAD https://huggingface.co/peft-model/resolve/main/adapter_config.json
2023-08-18 16:24:20 | WARNING | huggingface_hub.utils._http | '(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 6b2753fa-c523-43c3-80bf-70ebedbb1769)')' thrown while requesting HEAD https://huggingface.co/peft-model/resolve/main/adapter_config.json
2023-08-18 16:24:20 | ERROR | stderr | Traceback (most recent call last):
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/peft/utils/config.py", line 119, in from_pretrained
2023-08-18 16:24:20 | ERROR | stderr | config_file = hf_hub_download(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
2023-08-18 16:24:20 | ERROR | stderr | return fn(*args, **kwargs)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1291, in hf_hub_download
2023-08-18 16:24:20 | ERROR | stderr | raise LocalEntryNotFoundError(
2023-08-18 16:24:20 | ERROR | stderr | huggingface_hub.utils._errors.LocalEntryNotFoundError: Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on.
2023-08-18 16:24:20 | ERROR | stderr |
2023-08-18 16:24:20 | ERROR | stderr | During handling of the above exception, another exception occurred:
2023-08-18 16:24:20 | ERROR | stderr |
2023-08-18 16:24:20 | ERROR | stderr | Traceback (most recent call last):
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 442, in run_predict
2023-08-18 16:24:20 | ERROR | stderr | output = await app.get_blocks().process_api(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1392, in process_api
2023-08-18 16:24:20 | ERROR | stderr | result = await self.call_function(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1097, in call_function
2023-08-18 16:24:20 | ERROR | stderr | prediction = await anyio.to_thread.run_sync(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
2023-08-18 16:24:20 | ERROR | stderr | return await get_asynclib().run_sync_in_worker_thread(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
2023-08-18 16:24:20 | ERROR | stderr | return await future
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
2023-08-18 16:24:20 | ERROR | stderr | result = context.run(func, *args)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 703, in wrapper
2023-08-18 16:24:20 | ERROR | stderr | response = f(*args, **kwargs)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/gradio_web_server.py", line 210, in add_text
2023-08-18 16:24:20 | ERROR | stderr | state = State(model_selector)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/gradio_web_server.py", line 68, in init
2023-08-18 16:24:20 | ERROR | stderr | self.conv = get_conversation_template(model_name)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/fastchat/model/model_adapter.py", line 291, in get_conversation_template
2023-08-18 16:24:20 | ERROR | stderr | return adapter.get_default_conv_template(model_path)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/fastchat/model/model_adapter.py", line 498, in get_default_conv_template
2023-08-18 16:24:20 | ERROR | stderr | config = PeftConfig.from_pretrained(model_path)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/peft/utils/config.py", line 123, in from_pretrained
2023-08-18 16:24:20 | ERROR | stderr | raise ValueError(f"Can't find '{CONFIG_NAME}' at '{pretrained_model_name_or_path}'")
2023-08-18 16:24:20 | ERROR | stderr | ValueError: Can't find 'adapter_config.json' at 'peft-model'
2023-08-18 16:24:20 | INFO | httpx | HTTP Request: POST http://localhost:7860/api/predict "HTTP/1.1 500 Internal Server Error"
2023-08-18 16:24:20 | INFO | httpx | HTTP Request: POST http://localhost:7860/reset "HTTP/1.1 200 OK"
2023-08-18 16:24:20 | INFO | gradio_web_server | bot_response. ip: 127.0.0.1
2023-08-18 16:24:20 | ERROR | stderr | Traceback (most recent call last):
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 442, in run_predict
2023-08-18 16:24:20 | ERROR | stderr | output = await app.get_blocks().process_api(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1392, in process_api
2023-08-18 16:24:20 | ERROR | stderr | result = await self.call_function(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1111, in call_function
2023-08-18 16:24:20 | ERROR | stderr | prediction = await utils.async_iteration(iterator)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 346, in async_iteration
2023-08-18 16:24:20 | ERROR | stderr | return await iterator.anext()
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 339, in anext
2023-08-18 16:24:20 | ERROR | stderr | return await anyio.to_thread.run_sync(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
2023-08-18 16:24:20 | ERROR | stderr | return await get_asynclib().run_sync_in_worker_thread(
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
2023-08-18 16:24:20 | ERROR | stderr | return await future
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
2023-08-18 16:24:20 | ERROR | stderr | result = context.run(func, *args)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 322, in run_sync_iterator_async
2023-08-18 16:24:20 | ERROR | stderr | return next(iterator)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 691, in gen_wrapper
2023-08-18 16:24:20 | ERROR | stderr | yield from f(*args, **kwargs)
2023-08-18 16:24:20 | ERROR | stderr | File "/usr/local/lib/python3.10/dist-packages/fastchat/serve/gradio_web_server.py", line 300, in bot_response
2023-08-18 16:24:20 | ERROR | stderr | if state.skip_next:
2023-08-18 16:24:20 | ERROR | stderr | AttributeError: 'NoneType' object has no attribute 'skip_next'
2023-08-18 16:24:20 | INFO | httpx | HTTP Request: POST http://localhost:7860/api/predict "HTTP/1.1 500 Internal Server Error"
2023-08-18 16:24:20 | INFO | httpx | HTTP Request: POST http://localhost:7860/reset "HTTP/1.1 200 OK"

wu-xiaohua · 2023-08-18T15:40:44Z

如果方便的话，可否将lora和合并后的模型邮件发一份给我。这似乎是个通用问题，我想调试解决一下

同样遇到合并后的模型有这个问题

如果方便的话，可否将lora和合并后的模型邮件发一份给我。这似乎是个通用问题，我想调试解决一下

chenkaiC4 · 2023-08-19T05:14:03Z

感谢

经实践，通过peft方式加载lora模型（而非加载合并后的模型）部署是成功的，但api和webui似乎存在适配上的问题。我通过python3 -m fastchat.serve.cli --model-path /data2/project/peft-model，以命令行形式交互得到预期的结果。

而使用python3 -m fastchat.serve.model_worker --model-path /data2/project/peft-model启动LLM服务后，通过python server/api.py启动api服务，通过streamlit run webui.py启动webui服务，在输入时 api服务报错如下（似乎是端口问题） —————————————————————— root@docker-desktop:/data1/llm/code/Langchain-Chatchat# python server/api.py 2023-08-18 08:49:10,100 - utils.py[line:148] - INFO: Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8. 2023-08-18 08:49:10,100 - utils.py[line:160] - INFO: NumExpr defaulting to 8 threads. INFO: Started server process [32283] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://0.0.0.0:7861 (Press CTRL+C to quit) INFO: 127.0.0.1:46718 - "POST /chat/chat HTTP/1.1" 200 OK 2023-08-18 08:49:35,508 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI. 2023-08-18 08:49:35,508 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:35,508 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:39,512 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI. 2023-08-18 08:49:39,513 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:39,513 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:43,515 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI. 2023-08-18 08:49:43,515 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:43,515 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:47,522 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 8.0 seconds as it raised APIConnectionError: Error communicating with OpenAI. 2023-08-18 08:49:47,523 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:47,523 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:55,530 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 10.0 seconds as it raised APIConnectionError: Error communicating with OpenAI. 2023-08-18 08:49:55,531 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:55,531 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry' Caught exception: Error communicating with OpenAI

此问题也在isssues1110中提出 #1110

@jackaihfia2334 请教一个问题：

我用的也是 0.2.0版本，用chatglm2 的 ptuning 训练的，得到了训练后的模型在 ptuning/output/whoami-pt-128-2e-2/checkpoint-300目录下，使用 python3 -m fastchat.serve.cli 加载模型，该如何制定参数？原来的 chatgml2-6b 模型需要引入吗？

jackaihfia2334 · 2023-08-19T05:30:15Z

感谢

经实践，通过peft方式加载lora模型（而非加载合并后的模型）部署是成功的，但api和webui似乎存在适配上的问题。我通过python3 -m fastchat.serve.cli --model-path /data2/project/peft-model，以命令行形式交互得到预期的结果。
而使用python3 -m fastchat.serve.model_worker --model-path /data2/project/peft-model启动LLM服务后，通过python server/api.py启动api服务，通过streamlit run webui.py启动webui服务，在输入时 api服务报错如下（似乎是端口问题） —————————————————————— root@docker-desktop:/data1/llm/code/Langchain-Chatchat# python server/api.py 2023-08-18 08:49:10,100 - utils.py[line:148] - INFO: Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8. 2023-08-18 08:49:10,100 - utils.py[line:160] - INFO: NumExpr defaulting to 8 threads. INFO: Started server process [32283] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://0.0.0.0:7861 (Press CTRL+C to quit) INFO: 127.0.0.1:46718 - "POST /chat/chat HTTP/1.1" 200 OK 2023-08-18 08:49:35,508 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI. 2023-08-18 08:49:35,508 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:35,508 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:39,512 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI. 2023-08-18 08:49:39,513 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:39,513 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:43,515 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI. 2023-08-18 08:49:43,515 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:43,515 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:47,522 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 8.0 seconds as it raised APIConnectionError: Error communicating with OpenAI. 2023-08-18 08:49:47,523 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:47,523 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:55,530 - before_sleep.py[line:65] - WARNING: Retrying langchain.chat_models.openai.acompletion_with_retry.._completion_with_retry in 10.0 seconds as it raised APIConnectionError: Error communicating with OpenAI. 2023-08-18 08:49:55,531 - manager.py[line:364] - WARNING: Error in AsyncIteratorCallbackHandler.on_retry callback: 'AsyncIteratorCallbackHandler' object has no attribute 'on_retry' 2023-08-18 08:49:55,531 - manager.py[line:364] - WARNING: Error in StdOutCallbackHandler.on_retry callback: 'StdOutCallbackHandler' object has no attribute 'on_retry' Caught exception: Error communicating with OpenAI
此问题也在isssues1110中提出 #1110

@jackaihfia2334 请教一个问题：

我用的也是 0.2.0版本，用chatglm2 的 ptuning 训练的，得到了训练后的模型在 ptuning/output/whoami-pt-128-2e-2/checkpoint-300目录下，使用 python3 -m fastchat.serve.cli 加载模型，该如何制定参数？原来的 chatgml2-6b 模型需要引入吗？

ptuning模型路径中需包含peft，例如改名为ptuning/output/whoami-pt-128-2e-2/peft-model。
原来的 chatgml2-6b 模型不需要手动引入，只需在你的peft-model的adapter_config中指定好base模型（ chatgml2-6b 模型）的路径，一般是本身就自动生成的，可以检查一下。然后输入python3 -m fastchat.serve.cli --model-path XXXX即可（XXX为你的peft模型路径）
可参考 lm-sys/FastChat#2219

chenkaiC4 · 2023-08-19T07:33:35Z

@jackaihfia2334 感谢回复，我参考你的提示做了，但是 P-Tuning v2后，生成的数据下，没有 adapter_config.json，只有一个 config.json。下面是我的目录结构：

运行指令：
python3 -m fastchat.serve.cli --model-path /home/ubuntu/code/ChatGLM2-6B/ptuning/output/whoami-pt-128-2e-2/peft-model

报错：

ValueError: Can't find 'adapter_config.json' at '/home/ubuntu/code/ChatGLM2-6B/ptuning/output/whoami-pt-128-2e-2/peft-model'

然后手动添加了 adapter_config.json 文件，

{
    "base_model_name_or_path": "/home/ubuntu/code/ChatGLM2-6B/chatglm2-6b"
}

报错：

这个是P-Tuning v2 的数据格式对不上吗？

jackaihfia2334 · 2023-08-19T07:45:42Z

@jackaihfia2334 感谢回复，我参考你的提示做了，但是 P-Tuning v2后，生成的数据下，没有 adapter_config.json，只有一个 config.json。下面是我的目录结构：

运行指令： python3 -m fastchat.serve.cli --model-path /home/ubuntu/code/ChatGLM2-6B/ptuning/output/whoami-pt-128-2e-2/peft-model

报错：

ValueError: Can't find 'adapter_config.json' at '/home/ubuntu/code/ChatGLM2-6B/ptuning/output/whoami-pt-128-2e-2/peft-model'

然后手动添加了 adapter_config.json 文件，
{
    "base_model_name_or_path": "/home/ubuntu/code/ChatGLM2-6B/chatglm2-6b"
}
报错：

这个是P-Tuning v2 的数据格式对不上吗？

maybe fastchat对p-tuning不支持或者有其他支持方式，可能需要去fastchat的项目里查看一下提个issue

chenkaiC4 · 2023-08-19T07:49:45Z

感谢 @jackaihfia2334 握爪

jackaihfia2334 · 2023-08-19T07:55:50Z

@jackaihfia2334 感谢回复，我参考你的提示做了，但是 P-Tuning v2后，生成的数据下，没有 adapter_config.json，只有一个 config.json。下面是我的目录结构：

运行指令： python3 -m fastchat.serve.cli --model-path /home/ubuntu/code/ChatGLM2-6B/ptuning/output/whoami-pt-128-2e-2/peft-model

报错：

ValueError: Can't find 'adapter_config.json' at '/home/ubuntu/code/ChatGLM2-6B/ptuning/output/whoami-pt-128-2e-2/peft-model'

然后手动添加了 adapter_config.json 文件，
{
    "base_model_name_or_path": "/home/ubuntu/code/ChatGLM2-6B/chatglm2-6b"
}
报错：

这个是P-Tuning v2 的数据格式对不上吗？

factchat的源码里对peft-model都是去匹配adapter_config.json的，你可以试试看把你的config.json重命名为adapter_config.json。然后做相应的修改（估计还会有其他bug）从你的报错来看是缺少peft_type参数

我使用lora微调得到的adapter_config.json内容如下，供参考
——————————————————————————————————
{
"auto_mapping": null,
"base_model_name_or_path": "/data2/model/chatglm2-6b",
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layers_pattern": null,
"layers_to_transform": null,
"lora_alpha": 32.0,
"lora_dropout": 0,
"modules_to_save": null,
"peft_type": "LORA",
"r": 8,
"revision": null,
"target_modules": [
"query_key_value"
],
"task_type": "CAUSAL_LM"
}
——————————————————

chenkaiC4 · 2023-08-19T08:14:29Z

@jackaihfia2334 老哥太感谢了，我试了下，还是有问题，我准备弃坑了。改用你用推荐的 ChatGLM-Efficient-Tuning，不过这个项目不更新了，现在是 https://github.com/hiyouga/LLaMA-Efficient-Tuning，请问目前是用老版本的 ChatGLM-Efficient-Tuning 还是新的 LLaMA-Efficient-Tuning？我也是要跑下 self_cognation 的训练。目前你能集成到Langchain-Chatchat中，用接口调用了吗？

jackaihfia2334 · 2023-08-19T08:23:36Z

@jackaihfia2334 老哥太感谢了，我试了下，还是有问题，我准备弃坑了。改用你用推荐的 ChatGLM-Efficient-Tuning，不过这个项目不更新了，现在是 https://github.com/hiyouga/LLaMA-Efficient-Tuning，请问目前是用老版本的 ChatGLM-Efficient-Tuning 还是新的 LLaMA-Efficient-Tuning？我也是要跑下 self_cognation 的训练。目前你能集成到Langchain-Chatchat中，用接口调用了吗？

LLaMA-Efficient-Tuning这个新版本我也没用过，打赏试试看。集成到Langchain-Chatchat就是我在这个issue提的问题，llm服务可以启动，但是gradio_webui_surverr不适配，没法在网页上部署。我使用fastchat原生的webui也不成功，bug我也贴在上面，作者团队会后续修正。我也在fasthchat里提了issue。lm-sys/FastChat#2262

经过实践，使用fastchat的命令行可以成功（python3 -m fastchat.serve.cli）。我自己魔改fastchat原生的webui（fastchat.serve.gradio_web_server）也可以成功，但方式有点笨，等待官方后续修改。

chenkaiC4 · 2023-08-19T08:28:45Z

@jackaihfia2334 API 接口是正常的吗？我目前只需要API接口能访问就行。

jackaihfia2334 · 2023-08-19T08:33:19Z

@jackaihfia2334 API 接口是正常的吗？我目前只需要API接口能访问就行。

往上翻翻，我贴的很具体了

jackaihfia2334 · 2023-08-19T08:52:35Z

@jackaihfia2334 API 接口是正常的吗？我目前只需要API接口能访问就行。

就是python -m fastchat.serve.model_worker成功python server/api.py报错
使用python3 -m fastchat.serve.model_worker --model-path /data2/project/peft-model启动LLM服务后，通过python server/api.py启动api服务，通过streamlit run webui.py启动webui服务，
在输入时 api服务报错如下（似乎是端口问题）

chenkaiC4 · 2023-08-19T08:54:19Z

看到了，这个很奇怪的。按理模型加载后，前、后端的逻辑通过 HTTP 接口走，这块没有改动，本不应该报错的。

chenkaiC4 · 2023-08-19T08:57:52Z

#1130 (comment)
的确像是没找到llm服务，然后超时 retry了

chenkaiC4 · 2023-08-20T12:49:28Z

@jackaihfia2334 我用 https://github.com/hiyouga/LLaMA-Efficient-Tuning 训练后，用它的测试，能达到预期。

然后我用命令行方式：python3 -m fastchat.serve.cli --model-path /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_checkpoints，模型加载了，但是只加载了chatgml2的原始模型，问答也达不到预期效果。
下面是我生成的 adapter_config.json

{
  "auto_mapping": null,
  "base_model_name_or_path": "/home/ubuntu/code/glmchain/chatglm2-6b",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "lora_alpha": 32.0,
  "lora_dropout": 0.1,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "revision": null,
  "target_modules": [
    "query_key_value"
  ],
  "task_type": "CAUSAL_LM"
}

其中，base_model_name_or_path 中 /home/ubuntu/code/glmchain/chatglm2-6b，是本机下载的原生的chatglm2-6b模型位置。

下图是lora训练后的checkpoints结构：

看起来和你的没有区别，但cli 的问答达不到预期效果。

能看下你lora训练后的checkpoints 文件夹里的文件目录吗？

【更新】
使用老版本的代码：https://github.com/hiyouga/ChatGLM-Efficient-Tuning，训练后，运行 python3 -m fastchat.serve.cli --model /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_checkpoints ，问答达到预期。应该是新版本里，lora 和 fastchat 有不适配的地方。

chenkaiC4 · 2023-08-20T13:55:38Z

@jackaihfia2334 问题解决了 😸 。操作如下：

采用老版本的代码进行训练，使用的是 https://github.com/hiyouga/ChatGLM-Efficient-Tuning 进行 lora 训练。
训练完的checkpoint 目录，需要带有 peft，我的是 peft_chatglm2。(其中包含 adapter_config.json)
依次执行下面的指令：

1. 打开fastchat系统级的 http 服务，有心跳和模型维护接口

python3 -m fastchat.serve.controller

2. 运行LLM模型服务，这里的model-names 的设置决定了第4步中的 llm_model_dict 设置

PEFT_SHARE_BASE_WEIGHTS=true python3 -m fastchat.serve.multi_model_worker \
    --model-path /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_chatglm2 \
    --model-names peft_lora_chatglm2 \
    --num-gpus 1

3. 启动openai形式的接口

python3 -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8000

4. 关键一步，设置 configs/model_config.py后，运行 python server/api.py。设置如下：

llm_model_dict = {
    "peft_lora_chatglm2": {
        "local_model_path": "/home/ubuntu/LLaMA-Efficient-Tuning/new_model/chatgml2",
        "api_base_url": "http://localhost:8000/v1",
        "api_key": "EMPTY"
    },
}

LLM_MODEL = "peft_lora_chatglm2"

这里面核心还是高清了FastChat的逻辑，被坑主要的原因是没有执行 第三步 ，没有启动 openai 风格的接口，而FastChat里，使用的都是openai风格的API，然后就 http retry了

jackaihfia2334 · 2023-08-20T14:30:27Z

感谢解答！成功运行了。现在还存在两个问题
1.lora微调合并后的模型webui部署似乎还存在问题
2.希望LLaMA-Efficient-Tuning能够兼容fastchat

XiaHuGXB · 2023-08-21T03:52:12Z

可以尝试在LLaMA-Efficient-Tuning/ChatGLM-Efficient-Tuning中启动llm的api的服务，试了一下这样可以绕过fastchat

jackaihfia2334 · 2023-08-21T08:46:49Z

可以尝试在LLaMA-Efficient-Tuning/ChatGLM-Efficient-Tuning中启动llm的api的服务，试了一下这样可以绕过fastchat

这样操作是可行的。现在就是好奇为什么fastchat加载微调合并后的模型无法得到预期效果。

dijkstra-mose · 2023-08-22T06:47:30Z

也碰到同样的问题。调试发现是因为gradio_web_server.py里存在bug：

class State:
    def __init__(self, model_name):
        self.conv = get_conversation_template(model_name)

这时会调用PeftModelAdapter.get_default_conv_template()
但由于PeftModelAdapter是动态读取conv_template的，这时读取conv是错的。
正确的逻辑应该改为调用model_worker的api动态读取conv:

class State:
    def __init__(self, model_name):
        ret = requests.post(
            controller_url + "/get_worker_address", json={"model": model_name}
        )
        worker_addr = ret.json()["address"]
        ret = requests.post(worker_addr + "/worker_get_conv_template")
        conv = ret.json()["conv"]
        self.conv = Conversation(
            name=conv["name"],
            system_template=conv["system_template"],
            system_message=conv["system_message"],
            roles=conv["roles"],
            messages=conv["messages"],
            offset=conv["offset"],
            sep_style=conv["sep_style"],
            sep=conv["sep"],
            sep2=conv["sep2"],
            stop_str=conv["stop_str"],
            stop_token_ids=conv["stop_token_ids"],
        )
        logger.info(f"model_name: {model_name}, worker_addr: {worker_addr}, worker_get_conv_template")

这样可以读取到正确的conv了，应该已经解决lora checkpoint部署的问题。

MyGitHubPigStar · 2023-08-22T08:22:56Z

感谢解答！成功运行了。现在还存在两个问题 1.lora能够调音后的模型webui配置似乎还存在问题 2.希望LLaMA-Efficient-Tuning兼容fastchat

截至目前，使用lora合并后的模型，能用api的方式启动吗

jackaihfia2334 · 2023-08-22T08:28:54Z

感谢解答！成功运行了。现在还存在两个问题 1.lora能够调音后的模型webui配置似乎还存在问题 2.希望LLaMA-Efficient-Tuning兼容fastchat

截至目前，使用lora合并后的模型，能用api的方式启动吗

可以启动，但回答没有得到微调的效果，用chatglm-efficient-tuning方式启动的api可以达到效果。
应该是fastchat本身存在问题

MyGitHubPigStar · 2023-08-22T08:34:35Z

感谢解答！成功运行了。现在还存在两个问题 1.lora能够调音后的模型webui配置似乎还存在问题 2.希望LLaMA-Efficient-Tuning兼容fastchat

截至目前，使用lora合并后的模型，能用api的方式启动吗

可以启动，但回答没有得到微调的效果，用chatglm-efficient-tuning方式启动的api可以达到效果。应该是fastchat本身存在问题

感谢！很奇怪。我使用了8月初生成的模型（chatglm-efficient-tuning），通过最新版的langchain是可以加载成功。但再次训练模型放入langchain就无法正确加载。

BC-0521 · 2023-08-22T12:00:33Z

@jackaihfia2334 问题解决了 😸 。操作如下：

采用老版本的代码进行训练，使用的是 https://github.com/hiyouga/ChatGLM-Efficient-Tuning 进行 lora 训练。

训练完的checkpoint 目录，需要带有 peft，我的是 peft_chatglm2。(其中包含 adapter_config.json)

依次执行下面的指令：

1. 打开fastchat系统级的 http 服务，有心跳和模型维护接口
python3 -m fastchat.serve.controller
2. 运行LLM模型服务，这里的model-names 的设置决定了第4步中的 llm_model_dict 设置
PEFT_SHARE_BASE_WEIGHTS=true python3 -m fastchat.serve.multi_model_worker \
    --model-path /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_chatglm2 \
    --model-names peft_lora_chatglm2 \
    --num-gpus 1
3. 启动openai形式的接口
python3 -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8000
4. 关键一步，设置 configs/model_config.py后，运行 python server/api.py。设置如下：
llm_model_dict = {
    "peft_lora_chatglm2": {
        "local_model_path": "/home/ubuntu/LLaMA-Efficient-Tuning/new_model/chatgml2",
        "api_base_url": "http://localhost:8000/v1",
        "api_key": "EMPTY"
    },
}

LLM_MODEL = "peft_lora_chatglm2"
这里面核心还是高清了FastChat的逻辑，被坑主要的原因是没有执行 第三步 ，没有启动 openai 风格的接口，而FastChat里，使用的都是openai风格的API，然后就 http retry了

大佬，https://github.com/hiyouga/ChatGLM-Efficient-Tuning这个链接里有5个版本，用哪个版本训练呢，我们运行python3 -m fastchat.serve.controller，第一步就报错了，

怎么解决啊

hzg0601 · 2023-08-22T16:04:30Z

已有答案，参考 #1130 (comment)

Gzj369 · 2023-09-08T03:16:08Z

@jackaihfia2334 问题解决了 😸 。操作如下：

采用老版本的代码进行训练，使用的是 https://github.com/hiyouga/ChatGLM-Efficient-Tuning 进行 lora 训练。

训练完的checkpoint 目录，需要带有 peft，我的是 peft_chatglm2。(其中包含 adapter_config.json)

依次执行下面的指令：

1. 打开fastchat系统级的 http 服务，有心跳和模型维护接口
python3 -m fastchat.serve.controller
2. 运行LLM模型服务，这里的model-names 的设置决定了第4步中的 llm_model_dict 设置
PEFT_SHARE_BASE_WEIGHTS=true python3 -m fastchat.serve.multi_model_worker \
    --model-path /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_chatglm2 \
    --model-names peft_lora_chatglm2 \
    --num-gpus 1
3. 启动openai形式的接口
python3 -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8000
4. 关键一步，设置 configs/model_config.py后，运行 python server/api.py。设置如下：
llm_model_dict = {
    "peft_lora_chatglm2": {
        "local_model_path": "/home/ubuntu/LLaMA-Efficient-Tuning/new_model/chatgml2",
        "api_base_url": "http://localhost:8000/v1",
        "api_key": "EMPTY"
    },
}

LLM_MODEL = "peft_lora_chatglm2"
这里面核心还是高清了FastChat的逻辑，被坑主要的原因是没有执行 第三步 ，没有启动 openai 风格的接口，而FastChat里，使用的都是openai风格的API，然后就 http retry了

@chenkaiC4

@jackaihfia2334 问题解决了 😸 。操作如下：

采用老版本的代码进行训练，使用的是 https://github.com/hiyouga/ChatGLM-Efficient-Tuning 进行 lora 训练。

训练完的checkpoint 目录，需要带有 peft，我的是 peft_chatglm2。(其中包含 adapter_config.json)

依次执行下面的指令：

1. 打开fastchat系统级的 http 服务，有心跳和模型维护接口
python3 -m fastchat.serve.controller
2. 运行LLM模型服务，这里的model-names 的设置决定了第4步中的 llm_model_dict 设置
PEFT_SHARE_BASE_WEIGHTS=true python3 -m fastchat.serve.multi_model_worker \
    --model-path /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_chatglm2 \
    --model-names peft_lora_chatglm2 \
    --num-gpus 1
3. 启动openai形式的接口
python3 -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8000
4. 关键一步，设置 configs/model_config.py后，运行 python server/api.py。设置如下：
llm_model_dict = {
    "peft_lora_chatglm2": {
        "local_model_path": "/home/ubuntu/LLaMA-Efficient-Tuning/new_model/chatgml2",
        "api_base_url": "http://localhost:8000/v1",
        "api_key": "EMPTY"
    },
}

LLM_MODEL = "peft_lora_chatglm2"
这里面核心还是高清了FastChat的逻辑，被坑主要的原因是没有执行 第三步 ，没有启动 openai 风格的接口，而FastChat里，使用的都是openai风格的API，然后就 http retry了
大佬，https://github.com/hiyouga/ChatGLM-Efficient-Tuning这个链接里有5个版本，用哪个版本训练呢，我们运行python3 -m fastchat.serve.controller，第一步就报错了，怎么解决啊

@BC-0521 这个应该可以通过运行python时指定 --host 127.0.0.1 解决，或者看看是不是还有其他正在执行的进程导致端口占用

Gzj369 · 2023-09-08T03:20:06Z

@jackaihfia2334 问题解决了 😸 。操作如下：

采用老版本的代码进行训练，使用的是 https://github.com/hiyouga/ChatGLM-Efficient-Tuning 进行 lora 训练。

训练完的checkpoint 目录，需要带有 peft，我的是 peft_chatglm2。(其中包含 adapter_config.json)

依次执行下面的指令：

1. 打开fastchat系统级的 http 服务，有心跳和模型维护接口
python3 -m fastchat.serve.controller
2. 运行LLM模型服务，这里的model-names 的设置决定了第4步中的 llm_model_dict 设置
PEFT_SHARE_BASE_WEIGHTS=true python3 -m fastchat.serve.multi_model_worker \
    --model-path /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_chatglm2 \
    --model-names peft_lora_chatglm2 \
    --num-gpus 1
3. 启动openai形式的接口
python3 -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8000
4. 关键一步，设置 configs/model_config.py后，运行 python server/api.py。设置如下：
llm_model_dict = {
    "peft_lora_chatglm2": {
        "local_model_path": "/home/ubuntu/LLaMA-Efficient-Tuning/new_model/chatgml2",
        "api_base_url": "http://localhost:8000/v1",
        "api_key": "EMPTY"
    },
}

LLM_MODEL = "peft_lora_chatglm2"
这里面核心还是高清了FastChat的逻辑，被坑主要的原因是没有执行 第三步 ，没有启动 openai 风格的接口，而FastChat里，使用的都是openai风格的API，然后就 http retry了

@chenkaiC4 @jackaihfia2334 2位好，我应该是3个步骤+model_config.py 都是按照你说的设置的，现在启动web_ui提问，还是有问题
CUDA_VISIBLE_DEVICES=0 python -m fastchat.serve.controller --host 127.0.0.1 --port 21001

CUDA_VISIBLE_DEVICES=0 PEFT_SHARE_BASE_WEIGHTS=true python -m fastchat.serve.multi_model_worker --model-path /home/Baichuan2-13B-Chat/lora_checkpoint_60_baichuan2/peft_checkpoint-2216 --model-names peft_lora_baichuan2_13b_chat --num-gpus 1 --host 127.0.0.1 --port 21002

CUDA_VISIBLE_DEVICES=0 python -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8888

CUDA_VISIBLE_DEVICES=0 python server/api.py

CUDA_VISIBLE_DEVICES=0 streamlit run webui.py --server.port 8081

web_ui提问，查看后端，提示如下错误

麻烦帮忙看看，非常感谢

Gzj369 · 2023-09-11T01:28:18Z

@jackaihfia2334 问题解决了 😸 。操作如下：

采用老版本的代码进行训练，使用的是 https://github.com/hiyouga/ChatGLM-Efficient-Tuning 进行 lora 训练。

训练完的checkpoint 目录，需要带有 peft，我的是 peft_chatglm2。(其中包含 adapter_config.json)

依次执行下面的指令：

1. 打开fastchat系统级的 http 服务，有心跳和模型维护接口
python3 -m fastchat.serve.controller
2. 运行LLM模型服务，这里的model-names 的设置决定了第4步中的 llm_model_dict 设置
PEFT_SHARE_BASE_WEIGHTS=true python3 -m fastchat.serve.multi_model_worker \
    --model-path /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_chatglm2 \
    --model-names peft_lora_chatglm2 \
    --num-gpus 1
3. 启动openai形式的接口
python3 -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8000
4. 关键一步，设置 configs/model_config.py后，运行 python server/api.py。设置如下：
llm_model_dict = {
    "peft_lora_chatglm2": {
        "local_model_path": "/home/ubuntu/LLaMA-Efficient-Tuning/new_model/chatgml2",
        "api_base_url": "http://localhost:8000/v1",
        "api_key": "EMPTY"
    },
}

LLM_MODEL = "peft_lora_chatglm2"
这里面核心还是高清了FastChat的逻辑，被坑主要的原因是没有执行 第三步 ，没有启动 openai 风格的接口，而FastChat里，使用的都是openai风格的API，然后就 http retry了
@chenkaiC4 @jackaihfia2334 2位好，我应该是3个步骤+model_config.py 都是按照你说的设置的，现在启动web_ui提问，还是有问题 CUDA_VISIBLE_DEVICES=0 python -m fastchat.serve.controller --host 127.0.0.1 --port 21001

CUDA_VISIBLE_DEVICES=0 PEFT_SHARE_BASE_WEIGHTS=true python -m fastchat.serve.multi_model_worker --model-path /home/Baichuan2-13B-Chat/lora_checkpoint_60_baichuan2/peft_checkpoint-2216 --model-names peft_lora_baichuan2_13b_chat --num-gpus 1 --host 127.0.0.1 --port 21002

CUDA_VISIBLE_DEVICES=0 python -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8888

CUDA_VISIBLE_DEVICES=0 python server/api.py

CUDA_VISIBLE_DEVICES=0 streamlit run webui.py --server.port 8081

web_ui提问，查看后端，提示如下错误

麻烦帮忙看看，非常感谢

通过重新执行如下2个命令，web_ui.py已经可以正常访问了，但是测试发现针对提问容易输出重复的答案
CUDA_VISIBLE_DEVICES=0 python server/api.py

CUDA_VISIBLE_DEVICES=0 streamlit run webui.py --server.port 8081

pursure-Hy · 2023-09-21T08:08:20Z

@jackaihfia2334 问题解决了 😸 。操作如下：

采用老版本的代码进行训练，使用的是 https://github.com/hiyouga/ChatGLM-Efficient-Tuning 进行 lora 训练。

训练完的checkpoint 目录，需要带有 peft，我的是 peft_chatglm2。(其中包含 adapter_config.json)

依次执行下面的指令：

1. 打开fastchat系统级的 http 服务，有心跳和模型维护接口
python3 -m fastchat.serve.controller
2. 运行LLM模型服务，这里的model-names 的设置决定了第4步中的 llm_model_dict 设置
PEFT_SHARE_BASE_WEIGHTS=true python3 -m fastchat.serve.multi_model_worker \
    --model-path /home/ubuntu/LLaMA-Efficient-Tuning/output/peft_chatglm2 \
    --model-names peft_lora_chatglm2 \
    --num-gpus 1
3. 启动openai形式的接口
python3 -m fastchat.serve.openai_api_server --host 127.0.0.1 --port 8000
4. 关键一步，设置 configs/model_config.py后，运行 python server/api.py。设置如下：
llm_model_dict = {
    "peft_lora_chatglm2": {
        "local_model_path": "/home/ubuntu/LLaMA-Efficient-Tuning/new_model/chatgml2",
        "api_base_url": "http://localhost:8000/v1",
        "api_key": "EMPTY"
    },
}

LLM_MODEL = "peft_lora_chatglm2"
这里面核心还是高清了FastChat的逻辑，被坑主要的原因是没有执行 第三步 ，没有启动 openai 风格的接口，而FastChat里，使用的都是openai风格的API，然后就 http retry了

但是我看您的路径还是LLAMA-Efficient的呀，这不是新版本的吗？

nailuonice · 2023-12-21T01:10:52Z

感谢解答！成功运行了。现在还存在两个问题 1.lora能够调音后的模型webui配置似乎还存在问题 2.希望LLaMA-Efficient-Tuning兼容fastchat

截至目前，使用lora合并后的模型，能用api的方式启动吗

lora合并后的模型实操是可以的就死合并这个动作本身很麻烦导出又导入的。感觉chatchat项目本身支持lora加载并且有效才是王道，我现在的问题就是 lora不合并，按照大佬们的操作步骤能运行，但就是推理的时候发现没效果呀！

Gzj369 · 2023-12-21T01:11:13Z

这是来自QQ邮箱的自动回复邮件。您好，邮件我已经收到。看到后我一定会在第一时间内阅读并回复您。

nailuonice · 2023-12-21T08:17:19Z

这里介绍lora加载的方式！！！

model_config.py相关

检查是否有llm_model_dict字段

7777端口跟 server_confgi.py对得上就行随便改

  lora相关：对应adapter_config.json 里面base_model_name_or_path写对即可

jackaihfia2334 added the bug Something isn't working label Aug 16, 2023

imClumsyPanda assigned hzg0601 Aug 17, 2023

chenkaiC4 mentioned this issue Aug 19, 2023

[BUG] Lora方式启动llm服务失败 #1110

Closed

hzg0601 closed this as completed Aug 22, 2023

MyGitHubPigStar mentioned this issue Aug 23, 2023

LLM服务无法正确加载LoRA合并后的模型，报错：EOFError: EOF when reading a line[BUG] #1193

Closed

Fraudsterrrr mentioned this issue Nov 14, 2023

相同底座的lora模型切换时不需要重新加载底座 #2042

Closed

[BUG] 加载lora微调后的模型失效 #1130

[BUG] 加载lora微调后的模型失效 #1130

Comments

jackaihfia2334 commented Aug 16, 2023 • edited Loading

hzg0601 commented Aug 17, 2023

jackaihfia2334 commented Aug 17, 2023 • edited Loading

hzg0601 commented Aug 17, 2023

jackaihfia2334 commented Aug 17, 2023

hzg0601 commented Aug 17, 2023

jackaihfia2334 commented Aug 17, 2023 via email

jackaihfia2334 commented Aug 17, 2023 via email

hzg0601 commented Aug 17, 2023

jackaihfia2334 commented Aug 18, 2023 • edited Loading

jackaihfia2334 commented Aug 18, 2023 • edited Loading

wu-xiaohua commented Aug 18, 2023

chenkaiC4 commented Aug 19, 2023

jackaihfia2334 commented Aug 19, 2023 • edited Loading

chenkaiC4 commented Aug 19, 2023

jackaihfia2334 commented Aug 19, 2023

chenkaiC4 commented Aug 19, 2023

jackaihfia2334 commented Aug 19, 2023 • edited Loading

chenkaiC4 commented Aug 19, 2023

jackaihfia2334 commented Aug 19, 2023

chenkaiC4 commented Aug 19, 2023

jackaihfia2334 commented Aug 19, 2023

jackaihfia2334 commented Aug 19, 2023

chenkaiC4 commented Aug 19, 2023 • edited Loading

chenkaiC4 commented Aug 19, 2023

chenkaiC4 commented Aug 20, 2023 • edited Loading

chenkaiC4 commented Aug 20, 2023 • edited Loading

jackaihfia2334 commented Aug 20, 2023

XiaHuGXB commented Aug 21, 2023

jackaihfia2334 commented Aug 21, 2023

dijkstra-mose commented Aug 22, 2023 • edited Loading

MyGitHubPigStar commented Aug 22, 2023

jackaihfia2334 commented Aug 22, 2023

MyGitHubPigStar commented Aug 22, 2023 • edited Loading

BC-0521 commented Aug 22, 2023

hzg0601 commented Aug 22, 2023

Gzj369 commented Sep 8, 2023 • edited Loading

Gzj369 commented Sep 8, 2023 • edited Loading

Gzj369 commented Sep 11, 2023

pursure-Hy commented Sep 21, 2023

nailuonice commented Dec 21, 2023

Gzj369 commented Dec 21, 2023 via email

nailuonice commented Dec 21, 2023 • edited Loading

jackaihfia2334 commented Aug 16, 2023 •

edited

Loading

jackaihfia2334 commented Aug 17, 2023 •

edited

Loading

jackaihfia2334 commented Aug 18, 2023 •

edited

Loading

jackaihfia2334 commented Aug 18, 2023 •

edited

Loading

jackaihfia2334 commented Aug 19, 2023 •

edited

Loading

jackaihfia2334 commented Aug 19, 2023 •

edited

Loading

chenkaiC4 commented Aug 19, 2023 •

edited

Loading

chenkaiC4 commented Aug 20, 2023 •

edited

Loading

chenkaiC4 commented Aug 20, 2023 •

edited

Loading

dijkstra-mose commented Aug 22, 2023 •

edited

Loading

MyGitHubPigStar commented Aug 22, 2023 •

edited

Loading

Gzj369 commented Sep 8, 2023 •

edited

Loading

Gzj369 commented Sep 8, 2023 •

edited

Loading

nailuonice commented Dec 21, 2023 •

edited

Loading