Skip to content

Commit

Permalink
Deprecate default chat templates (#30346)
Browse files Browse the repository at this point in the history
* initial commit, remove warnings on default chat templates

* stash commit

* Raise a much sterner warning for default chat templates, and prepare for depreciation

* Update the docs
  • Loading branch information
Rocketknight1 authored Apr 19, 2024
1 parent e67ccf0 commit 0927bfd
Show file tree
Hide file tree
Showing 20 changed files with 102 additions and 79 deletions.
10 changes: 7 additions & 3 deletions docs/source/en/chat_templating.md
Original file line number Diff line number Diff line change
Expand Up @@ -362,7 +362,11 @@ template for your tokenizer is by checking the `tokenizer.default_chat_template`
This is something we do purely for backward compatibility reasons, to avoid breaking any existing workflows. Even when
the class template is appropriate for your model, we strongly recommend overriding the default template by
setting the `chat_template` attribute explicitly to make it clear to users that your model has been correctly configured
for chat, and to future-proof in case the default templates are ever altered or deprecated.
for chat.

Now that actual chat templates have been adopted more widely, default templates have been deprecated and will be
removed in a future release. We strongly recommend setting the `chat_template` attribute for any tokenizers that
still depend on them!

### What template should I use?

Expand All @@ -374,8 +378,8 @@ best performance for inference or fine-tuning when you precisely match the token

If you're training a model from scratch, or fine-tuning a base language model for chat, on the other hand,
you have a lot of freedom to choose an appropriate template! LLMs are smart enough to learn to handle lots of different
input formats. Our default template for models that don't have a class-specific template follows the
`ChatML` format, and this is a good, flexible choice for many use-cases. It looks like this:
input formats. One popular choice is the `ChatML` format, and this is a good, flexible choice for many use-cases.
It looks like this:

```
{% for message in messages %}
Expand Down
9 changes: 5 additions & 4 deletions src/transformers/models/blenderbot/tokenization_blenderbot.py
Original file line number Diff line number Diff line change
Expand Up @@ -412,10 +412,11 @@ def default_chat_template(self):
A very simple chat template that just adds whitespace between messages.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -294,10 +294,11 @@ def default_chat_template(self):
A very simple chat template that just adds whitespace between messages.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -225,10 +225,11 @@ def default_chat_template(self):
A very simple chat template that just adds whitespace between messages.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -105,10 +105,11 @@ def default_chat_template(self):
A very simple chat template that just adds whitespace between messages.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"
Expand Down
9 changes: 5 additions & 4 deletions src/transformers/models/bloom/tokenization_bloom_fast.py
Original file line number Diff line number Diff line change
Expand Up @@ -156,9 +156,10 @@ def default_chat_template(self):
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"
9 changes: 5 additions & 4 deletions src/transformers/models/code_llama/tokenization_code_llama.py
Original file line number Diff line number Diff line change
Expand Up @@ -457,10 +457,11 @@ def default_chat_template(self):
in the original repository.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
template = (
"{% if messages[0]['role'] == 'system' %}"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -370,10 +370,11 @@ def default_chat_template(self):
in the original repository.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
template = (
"{% if messages[0]['role'] == 'system' %}"
Expand Down
9 changes: 5 additions & 4 deletions src/transformers/models/cohere/tokenization_cohere_fast.py
Original file line number Diff line number Diff line change
Expand Up @@ -248,10 +248,11 @@ def default_chat_template(self):
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
default_template = (
"{{ bos_token }}"
Expand Down
9 changes: 5 additions & 4 deletions src/transformers/models/gpt2/tokenization_gpt2.py
Original file line number Diff line number Diff line change
Expand Up @@ -337,9 +337,10 @@ def default_chat_template(self):
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"
9 changes: 5 additions & 4 deletions src/transformers/models/gpt2/tokenization_gpt2_fast.py
Original file line number Diff line number Diff line change
Expand Up @@ -148,9 +148,10 @@ def default_chat_template(self):
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"
Original file line number Diff line number Diff line change
Expand Up @@ -235,9 +235,10 @@ def default_chat_template(self):
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"
Original file line number Diff line number Diff line change
Expand Up @@ -166,10 +166,11 @@ def default_chat_template(self):
A simple chat template that just adds BOS/EOS tokens around messages while discarding role information.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"
Expand Down
9 changes: 5 additions & 4 deletions src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
Original file line number Diff line number Diff line change
Expand Up @@ -302,10 +302,11 @@ def default_chat_template(self):
preceding messages. BOS tokens are added between all messages.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{{ eos_token }}{{ bos_token }}"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -247,10 +247,11 @@ def default_chat_template(self):
information.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"
Expand Down
9 changes: 5 additions & 4 deletions src/transformers/models/llama/tokenization_llama.py
Original file line number Diff line number Diff line change
Expand Up @@ -430,10 +430,11 @@ def default_chat_template(self):
in the original repository.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
template = (
"{% if messages[0]['role'] == 'system' %}"
Expand Down
Loading

0 comments on commit 0927bfd

Please sign in to comment.