I am running the Llama-2-7b-chat-hf model on Hugging Face.
When I set temperature=0.0 or temperature=0, I get ValueError: `temperature` has to be a strictly positive float, but is 0.0.
Until a week ago, it was working with the same code and environment.
My code and error message:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-7b-chat-hf"
device = "cuda" if torch.cuda.is_available() else "cpu"  # `device` was not defined in the posted snippet

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model_4bit.config.use_cache = False
model = model_4bit
tokenizer = AutoTokenizer.from_pretrained(model_name)

def generate(text):
    prompt = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Summarize following sentence in three lines.
### Input:
{text}
### Response:"""
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    input_ids.to(device)
    with torch.no_grad():
        outputs = model.generate(inputs=input_ids,
                                 temperature=0.0,
                                 max_new_tokens=500)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

text = """FC Barcelona's Spanish defender Jordi Alba and Turkish midfielder Arda Turan have returned to full training, according to the Spanish newspaper Marca on March 28. J. Alba returned to full training after suffering an injury in the Copa del Rey match against Athletic Bilbao on March 17. Arda, who missed the match against Atletico Madrid on March 27 due to a high fever, has also returned to the squad and is now in good shape for the match against Atletico Madrid."""
generate(text)
>> ValueError                                Traceback (most recent call last)
Cell In[12], line 5
      2 input_ids.to(device)
      3 with torch.no_grad():
----> 5     outputs = model.generate(inputs=input_ids,
      6                              temperature=0.0,
      7                              max_new_tokens=500)
      8 print(tokenizer.decode(outputs[0], skip_special_tokens=True))

File ~/anaconda3/envs/llama2/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/anaconda3/envs/llama2/lib/python3.9/site-packages/transformers/generation/utils.py:1604, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, **kwargs)
   1586     return self.contrastive_search(
   1587         input_ids,
   1588         top_k=generation_config.top_k,
   (...)
   1599         **model_kwargs,
   1600     )
   1602 elif is_sample_gen_mode:
   1603     # 11. prepare logits warper
-> 1604     logits_warper = self._get_logits_warper(generation_config)
   1606     # 12. expand input_ids with `num_return_sequences` additional sequences per batch
   1607     input_ids, model_kwargs = self._expand_inputs_for_generation(
   1608         input_ids=input_ids,
   1609         expand_size=generation_config.num_return_sequences,
   1610         is_encoder_decoder=self.config.is_encoder_decoder,
   1611         **model_kwargs,
   1612     )

File ~/anaconda3/envs/llama2/lib/python3.9/site-packages/transformers/generation/utils.py:809, in GenerationMixin._get_logits_warper(self, generation_config)
    806 # the following idea is largely copied from this PR: https://github.com/huggingface/transformers/pull/5420/files
    807 # all samplers can be found in `generation_utils_samplers.py`
    808 if generation_config.temperature is not None and generation_config.temperature != 1.0:
--> 809     warpers.append(TemperatureLogitsWarper(generation_config.temperature))
    810 min_tokens_to_keep = 2 if generation_config.num_beams > 1 else 1
    811 if generation_config.top_k is not None and generation_config.top_k != 0:

File ~/anaconda3/envs/llama2/lib/python3.9/site-packages/transformers/generation/logits_process.py:231, in TemperatureLogitsWarper.__init__(self, temperature)
    229 def __init__(self, temperature: float):
    230     if not isinstance(temperature, float) or not (temperature > 0):
--> 231         raise ValueError(f"`temperature` has to be a strictly positive float, but is {temperature}")
    233     self.temperature = temperature

ValueError: `temperature` has to be a strictly positive float, but is 0.0
We've been adding validation to .generate: exceptions for operations that break generation and warnings for other incorrect (but output-wise harmless) settings.
Setting temperature=0.0 means a division by zero will occur, which opens a Pandora's box of problems :) I'm assuming you want to run greedy decoding, in which case the correct flag is do_sample=False.
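As a minimal sketch of that fix (reusing the model, tokenizer, and input_ids from the snippet above), drop the temperature argument entirely and request greedy decoding explicitly:

    import torch

    with torch.no_grad():
        outputs = model.generate(
            inputs=input_ids,
            do_sample=False,   # greedy decoding: no logits warpers, so no temperature check
            max_new_tokens=500,
        )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))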
Two actions from this issue:
Short term: the exception message will be improved to nudge users towards do_sample=False.
Long term: we were already considering triggering greedy decoding when temperature=0.0; this issue further reinforces that plan.
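For context on the division-by-zero point above, here is a toy illustration (plain PyTorch, not the transformers internals) of what temperature scaling effectively does to the logits before sampling; with temperature=0.0 the division no longer produces a valid distribution:

    import torch

    logits = torch.tensor([2.0, 1.0, 0.5])
    for temperature in (1.0, 0.7, 0.0):
        scaled = logits / temperature  # roughly what a temperature warper does
        print(temperature, torch.softmax(scaled, dim=-1))
        # temperature=0.0 turns the logits into inf and the softmax into nan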