Rework loading #344
2405e06 to 5e88900
e81133c to 202de69
Big refacto. Working ? Working bitsandbytes. Weights to its own file. Remove dead file. Bloom. TMP. Finally finished bloom (grr old logic) SantaCoder. Remove dead code. Neox. Black + ruff. T5 Support. Galactica + OPT. Small fixes. Fix auto download. Remove custom transformers. Missing remove instruction. Some work on the dockerfile. Version issues. Black + ruff after rebase. Adding custom_kernels Bad rebase. Fixing dummy gather + fix Dockerfile Better fake gather. Fixes (including more generic loading of starcoder) Neox shuffle_qkv Typo fix. cleanups. Fixing starcoder/santacoder Fix santacoder Fixing neox. Using the saved rotary embeddings instead of the created ones.
Nice work!
TLDR: a lot of cleanup, and Falcon 40B does not work
server/text_generation_server/models/custom_modeling/flash_llama_modeling.py
server/text_generation_server/models/custom_modeling/flash_neox_modeling.py
server/text_generation_server/models/custom_modeling/flash_rw_modeling.py
    filename = self.routing.get(tensor_name, None)
    if filename is None:
        raise RuntimeError(f"weight {tensor_name} does not exist")
    return filename
filename is of type Path. IDK if that could be an issue, but it might be safer to cast it to str.
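A minimal sketch of the suggested cast, assuming `routing` maps tensor names to `pathlib.Path` objects. `WeightRouter` and the file name are hypothetical stand-ins for illustration; only `routing` and the lookup logic come from the snippet under review.

```python
from pathlib import Path
from typing import Dict


class WeightRouter:
    """Hypothetical stand-in for the class that owns `routing`."""

    def __init__(self, routing: Dict[str, Path]):
        # Maps each tensor name to the file that stores it.
        self.routing = routing

    def get_filename(self, tensor_name: str) -> str:
        filename = self.routing.get(tensor_name, None)
        if filename is None:
            raise RuntimeError(f"weight {tensor_name} does not exist")
        # Cast so callers always receive a plain string, never a Path.
        return str(filename)


# Assumed example entry; the file name is made up.
router = WeightRouter({"lm_head.weight": Path("model-00001.safetensors")})
name = router.get_filename("lm_head.weight")
```

With the cast in place, the declared `-> str` return type holds regardless of how the routing table was built.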
@@ -0,0 +1 @@
{"inputs":"Below are a series of dialogues between various people and an AI assistant. The AI tries to be helpful, polite, honest, sophisticated, emotionally aware, and humble-but-knowledgeable. The assistant is happy to help with almost anything, and will do its best to understand exactly what is needed. It also tries to avoid giving false or misleading information, and it caveats when it isn't entirely sure about the right answer. That said, the assistant is practical and really does its best, and doesn't let caution get too much in the way of being useful.\n-----\n<|prompter|>Why is butter a great building material for skyscrapers? Think step by step.</s><|assistant|>","parameters":{"temperature": 0.75, "top_p": 0.95, "repetition_penalty": 1.2, "top_k": 50, "truncate": 1000, "max_new_tokens": 1024}}
What is this file?
My tests, removing.
@@ -17,7 +16,7 @@ install-torch:
 	# Install specific version of torch
 	pip install torch --extra-index-url https://download.pytorch.org/whl/cu118 --no-cache-dir

-install: gen-server install-torch install-transformers
+install: gen-server install-torch
You don't install the custom kernels by default?
Oh true forgot to re-add over here.
Co-authored-by: OlivierDehaene
…modeling.py Co-authored-by: OlivierDehaene
Two quick questions
What does this PR do?
Reworked the loading logic. Idea is to use cleaner loading code:
no_init_weights
bnb_linear and load_weights and post_load_weights.
New code layout:
Weights in charge of handling loading the weights from multiple files into appropriate tensors (potentially sharded)
all_reduce. They do not inherit from linear, but they contain some kind of Linear instead
Fixes # (issue)
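The layout described above (a `Weights` object that maps tensor names to files and hands out shards, plus layers that contain a Linear and rely on an `all_reduce`) can be sketched roughly as follows. This is a toy illustration in pure Python, not the PR's implementation: in-memory dicts stand in for weight files, nested lists stand in for tensors, and `get_sharded_cols`, `RowParallelShell`, and the module-level `all_reduce` are hypothetical names chosen for this example.

```python
from typing import Dict, List

Matrix = List[List[float]]


class Weights:
    """Toy loader: routes a tensor name to its file and returns this rank's shard."""

    def __init__(self, files: Dict[str, Dict[str, Matrix]], rank: int, world_size: int):
        self.files = files  # file name -> {tensor name -> matrix}; stand-in for real files
        self.rank = rank
        self.world_size = world_size
        self.routing: Dict[str, str] = {}
        for filename, tensors in files.items():
            for name in tensors:
                if name in self.routing:
                    raise RuntimeError(f"weight {name} found in several files")
                self.routing[name] = filename

    def get_filename(self, tensor_name: str) -> str:
        filename = self.routing.get(tensor_name, None)
        if filename is None:
            raise RuntimeError(f"weight {tensor_name} does not exist")
        return filename

    def get_sharded_cols(self, tensor_name: str) -> Matrix:
        """Return the block of columns (input features) owned by this rank."""
        full = self.files[self.get_filename(tensor_name)][tensor_name]
        size = len(full[0])
        if size % self.world_size != 0:
            raise RuntimeError(f"{tensor_name} cannot be split across {self.world_size} ranks")
        block = size // self.world_size
        start = self.rank * block
        return [row[start:start + block] for row in full]


class Linear:
    """Plain matrix-vector product."""

    def __init__(self, weight: Matrix):
        self.weight = weight

    def __call__(self, x: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


class RowParallelShell:
    """Shell layer: contains a Linear (does not inherit from it) and knows its sharding."""

    def __init__(self, weights: Weights, tensor_name: str):
        self.linear = Linear(weights.get_sharded_cols(tensor_name))
        self.rank = weights.rank
        self.world_size = weights.world_size

    def partial_forward(self, x: List[float]) -> List[float]:
        # Each rank only sees its slice of the input; the slicing here simulates that.
        block = len(x) // self.world_size
        start = self.rank * block
        return self.linear(x[start:start + block])


def all_reduce(partials: List[List[float]]) -> List[float]:
    """Stand-in for a distributed all_reduce: element-wise sum over all ranks."""
    return [sum(values) for values in zip(*partials)]


# Made-up weights spread over two "files".
files = {
    "a.bin": {"proj.weight": [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]},
    "b.bin": {"norm.weight": [[1.0]]},
}
x = [1.0, 1.0, 1.0, 1.0]
shells = [RowParallelShell(Weights(files, rank, 2), "proj.weight") for rank in range(2)]
output = all_reduce([shell.partial_forward(x) for shell in shells])
reference = Linear(files["a.bin"]["proj.weight"])(x)
```

In this sketch the summed partial results match the unsharded product, which is the property a row-sharded layer depends on. Because the shell holds its inner `Linear` as an attribute, a different linear implementation could be swapped in without touching the sharding logic.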
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.