TGI latest cpu version doesn't work with some models #625

Open
yongfengdu opened this issue Aug 19, 2024 · 4 comments

@yongfengdu
Collaborator

After updating the TGI image to
ghcr.io/huggingface/text-generation-inference:latest-intel-cpu
the CodeGen test fails with the following two models:
ise-uiuc/Magicoder-S-DS-6.7B
m-a-p/OpenCodeInterpreter-DS-6.7B

The latter is mentioned in the CodeGen README:
https://github.com/opea-project/GenAIExamples/tree/main/CodeGen

The default model (meta-llama/CodeLlama-7b-hf) specified in docker-compose runs fine.

@srinarayan-srikanthan
Collaborator

What issue are you facing? Could you please post the error log from Docker here?

@yongfengdu
Collaborator Author

I'm testing with helm install, using this chart:
https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/common/tgi
with a command like this:
helm install tgi tgi --set LLM_MODEL_ID=ise-uiuc/Magicoder-S-DS-6.7B

@yongfengdu
Collaborator Author

Error message/pod logs:

{"timestamp":"2024-08-19T05:38:39.361300Z","level":"INFO","fields":{"message":"Args {\n model_id: "ise-uiuc/Magicoder-S-DS-6.7B",\n revision: None,\n validation_workers: 2,\n sharded: None,\n num_shard: None,\n quantize: None,\n speculate: None,\n dtype: None,\n trust_remote_code: false,\n max_concurrent_requests: 128,\n max_best_of: 2,\n max_stop_sequences: 4,\n max_top_n_tokens: 5,\n max_input_tokens: None,\n max_input_length: None,\n max_total_tokens: None,\n waiting_served_ratio: 0.3,\n max_batch_prefill_tokens: None,\n max_batch_total_tokens: None,\n max_waiting_tokens: 20,\n max_batch_size: None,\n cuda_graphs: None,\n hostname: "tgi-874bfcffc-c4wst",\n port: 2080,\n shard_uds_path: "/tmp/text-generation-server",\n master_addr: "localhost",\n master_port: 29500,\n huggingface_hub_cache: Some(\n "/data",\n ),\n weights_cache_override: None,\n disable_custom_kernels: false,\n cuda_memory_fraction: 1.0,\n rope_scaling: None,\n rope_factor: None,\n json_output: true,\n otlp_endpoint: None,\n otlp_service_name: "text-generation-inference.router",\n cors_allow_origin: [],\n api_key: None,\n watermark_gamma: None,\n watermark_delta: None,\n ngrok: false,\n ngrok_authtoken: None,\n ngrok_edge: None,\n tokenizer_config_path: None,\n disable_grammar_support: false,\n env: false,\n max_client_batch_size: 4,\n lora_adapters: None,\n usage_stats: On,\n}"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:39.361458Z","level":"INFO","fields":{"message":"Token file not found "/tmp/.cache/huggingface/token"","log.target":"hf_hub","log.module_path":"hf_hub","log.file":"/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs","log.line":55},"target":"hf_hub"}
{"timestamp":"2024-08-19T05:38:39.361623Z","level":"INFO","fields":{"message":"Model supports up to 16384 but tgi will now set its default to 4096 instead. This is to save VRAM by refusing large prompts in order to allow more users on the same hardware. You can increase that size using --max-batch-prefill-tokens=16434 --max-total-tokens=16384 --max-input-tokens=16383."},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:39.361636Z","level":"INFO","fields":{"message":"Default max_input_tokens to 4095"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:39.361640Z","level":"INFO","fields":{"message":"Default max_total_tokens to 4096"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:39.361643Z","level":"INFO","fields":{"message":"Default max_batch_prefill_tokens to 4145"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:39.361648Z","level":"INFO","fields":{"message":"Using default cuda graphs [1, 2, 4, 8, 16, 32]"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:39.361854Z","level":"INFO","fields":{"message":"Starting check and download process for ise-uiuc/Magicoder-S-DS-6.7B"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2024-08-19T05:38:42.469115Z","level":"INFO","fields":{"message":"Files are already present on the host. Skipping download."},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:43.169166Z","level":"INFO","fields":{"message":"Successfully downloaded weights for ise-uiuc/Magicoder-S-DS-6.7B"},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2024-08-19T05:38:43.169575Z","level":"INFO","fields":{"message":"Starting shard"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2024-08-19T05:38:46.051416Z","level":"WARN","fields":{"message":"FBGEMM fp8 kernels are not installed."},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:46.070139Z","level":"INFO","fields":{"message":"Using Attention = False"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:46.070193Z","level":"INFO","fields":{"message":"Using Attention = paged"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:46.123324Z","level":"WARN","fields":{"message":"Could not import Mamba: No module named 'mamba_ssm'"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:46.294082Z","level":"INFO","fields":{"message":"affinity={0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47}, membind = {0}"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:46.662238Z","level":"ERROR","fields":{"message":"Error when initializing model\nTraceback (most recent call last):\n File "/opt/conda/bin/text-generation-server", line 8, in \n sys.exit(app())\n File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in call\n return get_command(self)(*args, **kwargs)\n File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in call\n return self.main(*args, **kwargs)\n File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main\n return _main(\n File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main\n rv = self.invoke(ctx)\n File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke\n return _process_result(sub_ctx.command.invoke(sub_ctx))\n File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke\n return __callback(*args, **kwargs)\n File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper\n return callback(**use_params) # type: ignore\n File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 109, in serve\n server.serve(\n File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 274, in serve\n asyncio.run(\n File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run\n return loop.run_until_complete(main)\n File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete\n self.run_forever()\n File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever\n self._run_once()\n File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once\n handle._run()\n File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run\n self._context.run(self._callback, *self._args)\n> File 
"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 229, in serve_inner\n model = get_model_with_lora_adapters(\n File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/init.py", line 1198, in get_model_with_lora_adapters\n model = get_model(\n File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/init.py", line 769, in get_model\n return FlashCausalLM(\n File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 909, in init\n config = config_class.from_pretrained(\n File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 996, in from_pretrained\n return config_class.from_dict(config_dict, **unused_kwargs)\n File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 772, in from_dict\n config = cls(**config_dict)\n File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 192, in init\n rope_config_validation(self)\n File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_rope_utils.py", line 546, in rope_config_validation\n validation_fn(config)\n File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_rope_utils.py", line 379, in _validate_linear_scaling_rope_parameters\n rope_type = rope_scaling["rope_type"]\nKeyError: 'rope_type'"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:48.289176Z","level":"ERROR","fields":{"message":"Shard complete standard error output:\n\n/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.\n warnings.warn(\n2024-08-19 05:38:45.737 | INFO | text_generation_server.utils.import_utils::75 - Detected system ipex\n/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.\n warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")\n╭───────────────────── Traceback (most recent call last) ──────────────────────╮\n│ /opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py:109 in │\n│ serve │\n│ │\n│ 106 │ │ raise RuntimeError( │\n│ 107 │ │ │ "Only 1 can be set between dtype and quantize, as they │\n│ 108 │ │ ) │\n│ ❱ 109 │ server.serve( │\n│ 110 │ │ model_id, │\n│ 111 │ │ lora_adapters, │\n│ 112 │ │ revision, │\n│ │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │ dtype = None │ │\n│ │ json_output = True │ │\n│ │ logger_level = 'INFO' │ │\n│ │ lora_adapters = [] │ │\n│ │ max_input_tokens = 4095 │ │\n│ │ model_id = 'ise-uiuc/Magicoder-S-DS-6.7B' │ │\n│ │ otlp_endpoint = None │ │\n│ │ otlp_service_name = 'text-generation-inference.router' │ │\n│ │ quantize = None │ │\n│ │ revision = None │ │\n│ │ server = <module 'text_generation_server.server' from │ │\n│ │ '/opt/conda/lib/python3.10/site-packages/text_gener… │ │\n│ │ setup_tracing = <function setup_tracing at 0x7f9f4843c9d0> │ │\n│ │ sharded = False │ │\n│ │ speculate = None │ │\n│ │ trust_remote_code = False │ │\n│ │ uds_path = PosixPath('/tmp/text-generation-server') │ │\n│ ╰──────────────────────────────────────────────────────────────────────────╯ │\n│ │\n│ 
/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py:274 │\n│ in serve │\n│ │\n│ 271 │ │ while signal_handler.KEEP_PROCESSING: │\n│ 272 │ │ │ await asyncio.sleep(0.5) │\n│ 273 │ │\n│ ❱ 274 │ asyncio.run( │\n│ 275 │ │ serve_inner( │\n│ 276 │ │ │ model_id, │\n│ 277 │ │ │ lora_adapters, │\n│ │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │ dtype = None │ │\n│ │ lora_adapters = [] │ │\n│ │ max_input_tokens = 4095 │ │\n│ │ model_id = 'ise-uiuc/Magicoder-S-DS-6.7B' │ │\n│ │ quantize = None │ │\n│ │ revision = None │ │\n│ │ serve_inner = <function serve..serve_inner at │ │\n│ │ 0x7f9f49602680> │ │\n│ │ sharded = False │ │\n│ │ speculate = None │ │\n│ │ trust_remote_code = False │ │\n│ │ uds_path = PosixPath('/tmp/text-generation-server') │ │\n│ ╰──────────────────────────────────────────────────────────────────────────╯ │\n│ │\n│ /opt/conda/lib/python3.10/asyncio/runners.py:44 in run │\n│ │\n│ 41 │ │ events.set_event_loop(loop) │\n│ 42 │ │ if debug is not None: │\n│ 43 │ │ │ loop.set_debug(debug) │\n│ ❱ 44 │ │ return loop.run_until_complete(main) │\n│ 45 │ finally: │\n│ 46 │ │ try: │\n│ 47 │ │ │ _cancel_all_tasks(loop) │\n│ │\n│ ╭──────────────────────────────── locals ─────────────────────────────────╮ │\n│ │ debug = None │ │\n│ │ loop = <_UnixSelectorEventLoop running=False closed=True debug=False> │ │\n│ │ main = <coroutine object serve..serve_inner at 0x7f9f483e0740> │ │\n│ ╰─────────────────────────────────────────────────────────────────────────╯ │\n│ │\n│ /opt/conda/lib/python3.10/asyncio/base_events.py:649 in run_until_complete │\n│ │\n│ 646 │ │ if not future.done(): │\n│ 647 │ │ │ raise RuntimeError('Event loop stopped before Future comp │\n│ 648 │ │ │\n│ ❱ 649 │ │ return future.result() │\n│ 650 │ │\n│ 651 │ def stop(self): │\n│ 652 │ │ """Stop running the event loop. 
│\n│ │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │ future = <Task finished name='Task-1' │ │\n│ │ coro=<serve..serve_inner() done, defined at │ │\n│ │ /opt/conda/lib/python3.10/site-packages/text_generation_serv… │ │\n│ │ exception=KeyError('rope_type')> │ │\n│ │ new_task = True │ │\n│ │ self = <_UnixSelectorEventLoop running=False closed=True │ │\n│ │ debug=False> │ │\n│ ╰──────────────────────────────────────────────────────────────────────────╯ │\n│ │\n│ /opt/conda/lib/python3.10/site-packages/text_generation_server/server.py:229 │\n│ in serve_inner │\n│ │\n│ 226 │ │ │ server_urls = [local_url] │\n│ 227 │ │ │\n│ 228 │ │ try: │\n│ ❱ 229 │ │ │ model = get_model_with_lora_adapters( │\n│ 230 │ │ │ │ model_id, │\n│ 231 │ │ │ │ lora_adapters, │\n│ 232 │ │ │ │ revision, │\n│ │\n│ ╭──────────────────────────── locals ─────────────────────────────╮ │\n│ │ adapter_to_index = {} │ │\n│ │ dtype = None │ │\n│ │ local_url = 'unix:///tmp/text-generation-server-0' │ │\n│ │ lora_adapters = [] │ │\n│ │ max_input_tokens = 4095 │ │\n│ │ model_id = 'ise-uiuc/Magicoder-S-DS-6.7B' │ │\n│ │ quantize = None │ │\n│ │ revision = None │ │\n│ │ server_urls = ['unix:///tmp/text-generation-server-0'] │ │\n│ │ sharded = False │ │\n│ │ speculate = None │ │\n│ │ trust_remote_code = False │ │\n│ │ uds_path = PosixPath('/tmp/text-generation-server') │ │\n│ │ unix_socket_template = 'unix://{}-{}' │ │\n│ ╰─────────────────────────────────────────────────────────────────╯ │\n│ │\n│ /opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init │\n│ __.py:1198 in get_model_with_lora_adapters │\n│ │\n│ 1195 │ adapter_to_index: Dict[str, int], │\n│ 1196 ): │\n│ 1197 │ lora_adapter_ids = [adapter.id for adapter in lora_adapters] │\n│ ❱ 1198 │ model = get_model( │\n│ 1199 │ │ model_id, │\n│ 1200 │ │ lora_adapter_ids, │\n│ 1201 │ │ revision, │\n│ │\n│ ╭────────────────────── locals ──────────────────────╮ │\n│ │ adapter_to_index = {} │ │\n│ │ dtype = 
None │ │\n│ │ lora_adapter_ids = [] │ │\n│ │ lora_adapters = [] │ │\n│ │ max_input_tokens = 4095 │ │\n│ │ model_id = 'ise-uiuc/Magicoder-S-DS-6.7B' │ │\n│ │ quantize = None │ │\n│ │ revision = None │ │\n│ │ sharded = False │ │\n│ │ speculate = None │ │\n│ │ trust_remote_code = False │ │\n│ ╰────────────────────────────────────────────────────╯ │\n│ │\n│ /opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init │\n│ __.py:769 in get_model │\n│ │\n│ 766 │ elif model_type == LLAMA or model_type == BAICHUAN or model_type │\n│ 767 │ │ print(f">>> model_type: {model_type}") │\n│ 768 │ │ if FLASH_ATTENTION: │\n│ ❱ 769 │ │ │ return FlashCausalLM( │\n│ 770 │ │ │ │ model_id=model_id, │\n│ 771 │ │ │ │ model_class=FlashLlamaForCausalLM, │\n│ 772 │ │ │ │ revision=revision, │\n│ │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │ _ = {} │ │\n│ │ config_dict = { │ │\n│ │ │ 'name_or_path': │ │\n│ │ 'ise-uiuc/Magicoder-S-DS-6.7B', │ │\n│ │ │ 'architectures': ['LlamaForCausalLM'], │ │\n│ │ │ 'attention_bias': False, │ │\n│ │ │ 'attention_dropout': 0.0, │ │\n│ │ │ 'bos_token_id': 32013, │ │\n│ │ │ 'eos_token_id': 32014, │ │\n│ │ │ 'hidden_act': 'silu', │ │\n│ │ │ 'hidden_size': 4096, │ │\n│ │ │ 'initializer_range': 0.02, │ │\n│ │ │ 'intermediate_size': 11008, │ │\n│ │ │ ... 
+15 │ │\n│ │ } │ │\n│ │ dtype = None │ │\n│ │ lora_adapter_ids = [] │ │\n│ │ max_input_tokens = 4095 │ │\n│ │ method = 'n-gram' │ │\n│ │ model_id = 'ise-uiuc/Magicoder-S-DS-6.7B' │ │\n│ │ model_type = 'llama' │ │\n│ │ quantization_config = None │ │\n│ │ quantize = None │ │\n│ │ revision = None │ │\n│ │ sharded = False │ │\n│ │ should_use_sliding_window = False │ │\n│ │ sliding_window = -1 │ │\n│ │ speculate = 0 │ │\n│ │ speculator = None │ │\n│ │ trust_remote_code = False │ │\n│ ╰──────────────────────────────────────────────────────────────────────────╯ │\n│ │\n│ /opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash │\n│ causal_lm.py:909 in init │\n│ │\n│ 906 │ │ except Exception: │\n│ 907 │ │ │ pass │\n│ 908 │ │ │\n│ ❱ 909 │ │ config = config_class.from_pretrained( │\n│ 910 │ │ │ model_id, revision=revision, trust_remote_code=trust_remo │\n│ 911 │ │ ) │\n│ 912 │ │ config.quantize = quantize │\n│ │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │ aliases = None │ │\n│ │ config_class = <class │ │\n│ │ 'transformers.models.auto.configuration_auto.Auto… │ │\n│ │ default_dtype = torch.float16 │ │\n│ │ device = device(type='cpu') │ │\n│ │ dtype = torch.bfloat16 │ │\n│ │ generation_config = GenerationConfig { │ │\n│ │ "bos_token_id": 32013, │ │\n│ │ "eos_token_id": 32014 │ │\n│ │ } │ │\n│ │ head_size = None │ │\n│ │ lora_adapter_ids = [] │ │\n│ │ model_class = <class │ │\n│ │ 'text_generation_server.models.custom_modeling.fl… │ │\n│ │ model_id = 'ise-uiuc/Magicoder-S-DS-6.7B' │ │\n│ │ num_kv_heads = None │ │\n│ │ quantize = None │ │\n│ │ rank = 0 │ │\n│ │ revision = None │ │\n│ │ self = <text_generation_server.models.flash_causal_lm.Fl… │ │\n│ │ object at 0x7f9f48412a10> │ │\n│ │ skip_special_tokens = True │ │\n│ │ speculator = None │ │\n│ │ tokenizer = LlamaTokenizerFast(name_or_path='ise-uiuc/Magicod… │ │\n│ │ vocab_size=32000, model_max_length=16384, │ │\n│ │ is_fast=True, padding_side='left', │ │\n│ │ 
truncation_side='left', │ │\n│ │ special_tokens={'bos_token': │ │\n│ │ '<|begin▁of▁sentence|>', 'eos_token': │ │\n│ │ '<|end▁of▁sentence|>', 'pad_token': │ │\n│ │ '<|end▁of▁sentence|>'}, │ │\n│ │ clean_up_tokenization_spaces=False), │ │\n│ │ added_tokens_decoder={ │ │\n│ │ │ │ 32000: AddedToken("õ", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32001: AddedToken("÷", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32002: AddedToken("Á", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32003: AddedToken("ý", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32004: AddedToken("À", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32005: AddedToken("ÿ", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32006: AddedToken("ø", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32007: AddedToken("ú", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32008: AddedToken("þ", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32009: AddedToken("ü", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32010: AddedToken("ù", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32011: AddedToken("ö", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32012: AddedToken("û", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ 
special=False), │ │\n│ │ │ │ 32013: │ │\n│ │ AddedToken("<|begin▁of▁sentence|>", │ │\n│ │ rstrip=False, lstrip=False, single_word=False, │ │\n│ │ normalized=True, special=True), │ │\n│ │ │ │ 32014: AddedToken("<|end▁of▁sentence|>"… │ │\n│ │ rstrip=False, lstrip=False, single_word=False, │ │\n│ │ normalized=True, special=True), │ │\n│ │ │ │ 32015: AddedToken("<|fim▁hole|>", │ │\n│ │ rstrip=False, lstrip=False, single_word=False, │ │\n│ │ normalized=True, special=False), │ │\n│ │ │ │ 32016: AddedToken("<|fim▁begin|>", │ │\n│ │ rstrip=False, lstrip=False, single_word=False, │ │\n│ │ normalized=True, special=False), │ │\n│ │ │ │ 32017: AddedToken("<|fim▁end|>", │ │\n│ │ rstrip=False, lstrip=False, single_word=False, │ │\n│ │ normalized=True, special=False), │ │\n│ │ │ │ 32018: AddedToken("", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ │ │ 32019: AddedToken("<|User|>", │ │\n│ │ rstrip=False, lstrip=False, single_word=False, │ │\n│ │ normalized=True, special=False), │ │\n│ │ │ │ 32020: AddedToken("<|Assistant|>", │ │\n│ │ rstrip=False, lstrip=False, single_word=False, │ │\n│ │ normalized=True, special=False), │ │\n│ │ │ │ 32021: AddedToken("<|EOT|>", rstrip=False, │ │\n│ │ lstrip=False, single_word=False, normalized=True, │ │\n│ │ special=False), │ │\n│ │ } │ │\n│ │ tokenizer_class = <class │ │\n│ │ 'transformers.models.auto.tokenization_auto.AutoT… │ │\n│ │ trust_remote_code = False │ │\n│ │ world_size = 1 │ │\n│ ╰──────────────────────────────────────────────────────────────────────────╯ │\n│ │\n│ /opt/conda/lib/python3.10/site-packages/transformers/models/auto/configurati │\n│ on_auto.py:996 in from_pretrained │\n│ │\n│ 993 │ │ │ │ │ "but Transformers does not recognize this archite │\n│ 994 │ │ │ │ │ "issue with the checkpoint, or because your versi │\n│ 995 │ │ │ │ ) │\n│ ❱ 996 │ │ │ return config_class.from_dict(config_dict, **unused_kwarg │\n│ 997 │ │ else: │\n│ 998 │ │ │ # Fallback: use pattern 
matching on the string. │\n│ 999 │ │ │ # We go from longer names to shorter names to catch rober │\n│ │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │ cls = <class │ │\n│ │ 'transformers.models.auto.configuration… │ │\n│ │ code_revision = None │ │\n│ │ config_class = <class │ │\n│ │ 'transformers.models.llama.configuratio… │ │\n│ │ config_dict = { │ │\n│ │ │ '_name_or_path': │ │\n│ │ 'ise-uiuc/Magicoder-S-DS-6.7B', │ │\n│ │ │ 'architectures': [ │ │\n│ │ │ │ 'LlamaForCausalLM' │ │\n│ │ │ ], │ │\n│ │ │ 'attention_bias': False, │ │\n│ │ │ 'attention_dropout': 0.0, │ │\n│ │ │ 'bos_token_id': 32013, │ │\n│ │ │ 'eos_token_id': 32014, │ │\n│ │ │ 'hidden_act': 'silu', │ │\n│ │ │ 'hidden_size': 4096, │ │\n│ │ │ 'initializer_range': 0.02, │ │\n│ │ │ 'intermediate_size': 11008, │ │\n│ │ │ ... +16 │ │\n│ │ } │ │\n│ │ has_local_code = True │ │\n│ │ has_remote_code = False │ │\n│ │ kwargs = { │ │\n│ │ │ 'revision': None, │ │\n│ │ │ 'from_auto': True, │ │\n│ │ │ 'name_or_path': │ │\n│ │ 'ise-uiuc/Magicoder-S-DS-6.7B' │ │\n│ │ } │ │\n│ │ pretrained_model_name_or_path = 'ise-uiuc/Magicoder-S-DS-6.7B' │ │\n│ │ trust_remote_code = False │ │\n│ │ unused_kwargs = { │ │\n│ │ │ 'name_or_path': │ │\n│ │ 'ise-uiuc/Magicoder-S-DS-6.7B' │ │\n│ │ } │ │\n│ │ use_auth_token = None │ │\n│ ╰──────────────────────────────────────────────────────────────────────────╯ │\n│ │\n│ /opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py: │\n│ 772 in from_dict │\n│ │\n│ 769 │ │ # We remove it from kwargs so that it does not appear in ret │\n│ 770 │ │ config_dict[\"attn_implementation\"] = kwargs.pop(\"attn_impleme │\n│ 771 │ │ │\n│ ❱ 772 │ │ config = cls(**config_dict) │\n│ 773 │ │ │\n│ 774 │ │ if hasattr(config, \"pruned_heads\"): │\n│ 775 │ │ │ config.pruned_heads = {int(key): value for key, value in │\n│ │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │ cls = <class │ │\n│ │ 
'transformers.models.llama.configuration_llama.L… │ │\n│ │ config_dict = { │ │\n│ │ │ '_name_or_path': │ │\n│ │ 'ise-uiuc/Magicoder-S-DS-6.7B', │ │\n│ │ │ 'architectures': ['LlamaForCausalLM'], │ │\n│ │ │ 'attention_bias': False, │ │\n│ │ │ 'attention_dropout': 0.0, │ │\n│ │ │ 'bos_token_id': 32013, │ │\n│ │ │ 'eos_token_id': 32014, │ │\n│ │ │ 'hidden_act': 'silu', │ │\n│ │ │ 'hidden_size': 4096, │ │\n│ │ │ 'initializer_range': 0.02, │ │\n│ │ │ 'intermediate_size': 11008, │ │\n│ │ │ ... +16 │ │\n│ │ } │ │\n│ │ kwargs = {'name_or_path': 'ise-uiuc/Magicoder-S-DS-6.7B'} │ │\n│ │ return_unused_kwargs = False │ │\n│ ╰──────────────────────────────────────────────────────────────────────────╯ │\n│ │\n│ /opt/conda/lib/python3.10/site-packages/transformers/models/llama/configurat │\n│ ion_llama.py:192 in __init__ │\n│ │\n│ 189 │ │ self.mlp_bias = mlp_bias │\n│ 190 │ │ │\n│ 191 │ │ # Validate the correctness of rotary position embeddings param │\n│ ❱ 192 │ │ rope_config_validation(self) │\n│ 193 │ │ │\n│ 194 │ │ super().__init__( │\n│ 195 │ │ │ pad_token_id=pad_token_id, │\n│ │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │ attention_bias = False │ │\n│ │ attention_dropout = 0.0 │ │\n│ │ bos_token_id = 32013 │ │\n│ │ eos_token_id = 32014 │ │\n│ │ hidden_act = 'silu' │ │\n│ │ hidden_size = 4096 │ │\n│ │ initializer_range = 0.02 │ │\n│ │ intermediate_size = 11008 │ │\n│ │ kwargs = { │ │\n│ │ │ '_name_or_path': │ │\n│ │ 'ise-uiuc/Magicoder-S-DS-6.7B', │ │\n│ │ │ 'architectures': ['LlamaForCausalLM'], │ │\n│ │ │ 'model_type': 'llama', │ │\n│ │ │ 'torch_dtype': 'float32', │ │\n│ │ │ 'transformers_version': '4.36.0.dev0', │ │\n│ │ │ '_commit_hash': │ │\n│ │ 'b3ed7cb1578a3643ceaf2ebf996a3d8e85f75d8f', │ │\n│ │ │ 'attn_implementation': None │ │\n│ │ } │ │\n│ │ max_position_embeddings = 16384 │ │\n│ │ mlp_bias = False │ │\n│ │ num_attention_heads = 32 │ │\n│ │ num_hidden_layers = 32 │ │\n│ │ num_key_value_heads = 32 │ │\n│ │ pad_token_id = 
None │ │\n│ │ pretraining_tp = 1 │ │\n│ │ rms_norm_eps = 1e-06 │ │\n│ │ rope_scaling = {'factor': 4.0, 'type': 'linear'} │ │\n│ │ rope_theta = 100000 │ │\n│ │ self = LlamaConfig { │ │\n│ │ \"attention_bias\": false, │ │\n│ │ \"attention_dropout\": 0.0, │ │\n│ │ \"hidden_act\": \"silu\", │ │\n│ │ \"hidden_size\": 4096, │ │\n│ │ \"initializer_range\": 0.02, │ │\n│ │ \"intermediate_size\": 11008, │ │\n│ │ \"max_position_embeddings\": 16384, │ │\n│ │ \"mlp_bias\": false, │ │\n│ │ \"model_type\": \"llama\", │ │\n│ │ \"num_attention_heads\": 32, │ │\n│ │ \"num_hidden_layers\": 32, │ │\n│ │ \"num_key_value_heads\": 32, │ │\n│ │ \"pretraining_tp\": 1, │ │\n│ │ \"rms_norm_eps\": 1e-06, │ │\n│ │ \"rope_scaling\": { │ │\n│ │ │ \"factor\": 4.0, │ │\n│ │ │ \"type\": \"linear\" │ │\n│ │ }, │ │\n│ │ \"rope_theta\": 100000, │ │\n│ │ \"transformers_version\": \"4.43.1\", │ │\n│ │ \"use_cache\": true, │ │\n│ │ \"vocab_size\": 32256 │ │\n│ │ } │ │\n│ │ tie_word_embeddings = False │ │\n│ │ use_cache = True │ │\n│ │ vocab_size = 32256 │ │\n│ ╰──────────────────────────────────────────────────────────────────────────╯ │\n│ │\n│ /opt/conda/lib/python3.10/site-packages/transformers/modeling_rope_utils.py: │\n│ 546 in rope_config_validation │\n│ │\n│ 543 │ │\n│ 544 │ validation_fn = ROPE_VALIDATION_FUNCTIONS.get(rope_type) │\n│ 545 │ if validation_fn is not None: │\n│ ❱ 546 │ │ validation_fn(config) │\n│ 547 │ else: │\n│ 548 │ │ raise ValueError( │\n│ 549 │ │ │ f\"Missing validation function mapping in ROPE_VALIDATION │\n│ │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │ config = LlamaConfig { │ │\n│ │ "attention_bias": false, │ │\n│ │ "attention_dropout": 0.0, │ │\n│ │ "hidden_act": "silu", │ │\n│ │ "hidden_size": 4096, │ │\n│ │ "initializer_range": 0.02, │ │\n│ │ "intermediate_size": 11008, │ │\n│ │ "max_position_embeddings": 16384, │ │\n│ │ "mlp_bias": false, │ │\n│ │ "model_type": "llama", │ │\n│ │ "num_attention_heads": 32, │ │\n│ │ 
"num_hidden_layers": 32, │ │\n│ │ "num_key_value_heads": 32, │ │\n│ │ "pretraining_tp": 1, │ │\n│ │ "rms_norm_eps": 1e-06, │ │\n│ │ "rope_scaling": { │ │\n│ │ │ "factor": 4.0, │ │\n│ │ │ "type": "linear" │ │\n│ │ }, │ │\n│ │ "rope_theta": 100000, │ │\n│ │ "transformers_version": "4.43.1", │ │\n│ │ "use_cache": true, │ │\n│ │ "vocab_size": 32256 │ │\n│ │ } │ │\n│ │ possible_rope_types = { │ │\n│ │ │ 'longrope', │ │\n│ │ │ 'yarn', │ │\n│ │ │ 'default', │ │\n│ │ │ 'llama3', │ │\n│ │ │ 'linear', │ │\n│ │ │ 'dynamic' │ │\n│ │ } │ │\n│ │ rope_scaling = {'factor': 4.0, 'type': 'linear'} │ │\n│ │ rope_type = 'linear' │ │\n│ │ validation_fn = <function _validate_linear_scaling_rope_parameters │ │\n│ │ at 0x7f9f4821f250> │ │\n│ ╰──────────────────────────────────────────────────────────────────────────╯ │\n│ │\n│ /opt/conda/lib/python3.10/site-packages/transformers/modeling_rope_utils.py: │\n│ 379 in _validate_linear_scaling_rope_parameters │\n│ │\n│ 376 │\n│ 377 def _validate_linear_scaling_rope_parameters(config: PretrainedConfig) │\n│ 378 │ rope_scaling = config.rope_scaling │\n│ ❱ 379 │ rope_type = rope_scaling["rope_type"] │\n│ 380 │ required_keys = {"rope_type", "factor"} │\n│ 381 │ received_keys = set(rope_scaling.keys()) │\n│ 382 │ _check_received_keys(rope_type, received_keys, required_keys) │\n│ │\n│ ╭────────────────────── locals ──────────────────────╮ │\n│ │ config = LlamaConfig { │ │\n│ │ "attention_bias": false, │ │\n│ │ "attention_dropout": 0.0, │ │\n│ │ "hidden_act": "silu", │ │\n│ │ "hidden_size": 4096, │ │\n│ │ "initializer_range": 0.02, │ │\n│ │ "intermediate_size": 11008, │ │\n│ │ "max_position_embeddings": 16384, │ │\n│ │ "mlp_bias": false, │ │\n│ │ "model_type": "llama", │ │\n│ │ "num_attention_heads": 32, │ │\n│ │ "num_hidden_layers": 32, │ │\n│ │ "num_key_value_heads": 32, │ │\n│ │ "pretraining_tp": 1, │ │\n│ │ "rms_norm_eps": 1e-06, │ │\n│ │ "rope_scaling": { │ │\n│ │ │ "factor": 4.0, │ │\n│ │ │ "type": "linear" │ │\n│ │ }, │ │\n│ │ "rope_theta": 
100000, │ │\n│ │ "transformers_version": "4.43.1", │ │\n│ │ "use_cache": true, │ │\n│ │ "vocab_size": 32256 │ │\n│ │ } │ │\n│ │ rope_scaling = {'factor': 4.0, 'type': 'linear'} │ │\n│ ╰────────────────────────────────────────────────────╯ │\n╰──────────────────────────────────────────────────────────────────────────────╯\nKeyError: 'rope_type'"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2024-08-19T05:38:48.374957Z","level":"ERROR","fields":{"message":"Shard 0 failed to start"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-19T05:38:48.375003Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
Error: ShardCannotStart
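
Reading the traceback above: rope_config_validation resolves rope_type = 'linear', yet _validate_linear_scaling_rope_parameters then indexes rope_scaling["rope_type"], and this model's config only carries the older "type" key (rope_scaling = {'factor': 4.0, 'type': 'linear'}). The following is a minimal sketch that reproduces that lookup outside TGI and shows one possible way to normalize the dictionary. Both helper names are hypothetical and are not part of transformers or TGI, and whether patching the config this way is an acceptable workaround is an assumption, not something confirmed in this thread.

```python
import copy


def normalize_rope_scaling(rope_scaling):
    """Hypothetical helper: mirror the legacy "type" key into "rope_type".

    Returns a new dict and leaves the input untouched.
    """
    if rope_scaling is None:
        return None
    normalized = copy.deepcopy(rope_scaling)
    if "rope_type" not in normalized and "type" in normalized:
        normalized["rope_type"] = normalized["type"]
    return normalized


def read_rope_type(rope_scaling):
    """Hypothetical stand-in for the lookup shown at modeling_rope_utils.py:379."""
    return rope_scaling["rope_type"]


# Value copied from the traceback locals for ise-uiuc/Magicoder-S-DS-6.7B.
legacy = {"factor": 4.0, "type": "linear"}

try:
    read_rope_type(legacy)
except KeyError as err:
    # Same failure mode as in the pod log: the key is simply missing.
    print("legacy config fails:", err)

fixed = normalize_rope_scaling(legacy)
print(read_rope_type(fixed))
```

With the legacy dictionary the direct lookup raises KeyError: 'rope_type', matching the pod log; after normalization the same lookup returns the scaling type.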

@eero-t
Contributor

eero-t commented Oct 21, 2024

> After updated tgi version to ghcr.io/huggingface/text-generation-inference:latest-intel-cpu The codegen test failed with the following 2 MODELs: ise-uiuc/Magicoder-S-DS-6.7B m-a-p/OpenCodeInterpreter-DS-6.7B
>
> The later one is mentioned in the readme file of CodeGen: https://github.com/opea-project/GenAIExamples/tree/main/CodeGen

latest-intel-cpu is now mentioned only in a GitHub workflow script:

GenAIExamples$ git grep latest-intel-cpu
.github/workflows/scripts/update_images_tag.sh:dict["ghcr.io/huggingface/text-generation-inference"]="docker://ghcr.io/huggingface/text-generation-inference:latest-intel-cpu"

=> can this be closed?

wangkl2 pushed a commit to wangkl2/GenAIExamples that referenced this issue Dec 11, 2024