
Rework loading #344

Merged
merged 27 commits into from
Jun 8, 2023
Conversation

Narsil
Collaborator

@Narsil Narsil commented May 19, 2023

What does this PR do?

Reworks the loading logic. The idea is to make the loading code cleaner:

  • Remove need for no_init_weights
  • Remove all weird bnb_linear and load_weights and post_load_weights.

New code layout:

  • New class Weights in charge of loading the weights from multiple files into the appropriate tensors (potentially sharded)
  • TP layers are now "shells": they contain the code that decides what kind of sharding is needed, plus the eventual all_reduce. They do not inherit from Linear; instead they contain some kind of Linear
  • The contained linear can be FastLinear, BnbLinear, or (coming next) GPTQ Linear
  • All modeling code is explicitly written for sharding; the process group is just a no-op for non-sharded code (removes a lot of test cases)
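The layout described above can be sketched roughly as follows. This is not the repository's actual code: beyond the `Weights` name, the method names, the sharding scheme, and `TensorParallelColumnLinear` are illustrative assumptions, and plain Python lists stand in for tensors.

```python
# Rough sketch of the new layout (NOT the actual repo code).
# `Weights` is named in the PR description; everything else is an
# illustrative assumption, with lists standing in for tensors.

class Weights:
    """Maps tensor names to the checkpoint file holding them, then loads
    (optionally sharded) tensors on demand."""

    def __init__(self, routing):
        self.routing = routing  # {tensor_name: filename}

    def get_filename(self, tensor_name):
        filename = self.routing.get(tensor_name, None)
        if filename is None:
            raise RuntimeError(f"weight {tensor_name} does not exist")
        return filename

    def _load(self, tensor_name):
        # The real code would open the file returned here (e.g. a
        # safetensors shard); this sketch returns placeholder data.
        self.get_filename(tensor_name)
        return list(range(8))

    def get_sharded(self, tensor_name, dim, rank, world_size):
        # Slice out this rank's shard (only dim 0 in this sketch).
        tensor = self._load(tensor_name)
        block = len(tensor) // world_size
        return tensor[rank * block:(rank + 1) * block]


class TensorParallelColumnLinear:
    """A 'shell' layer: it owns the sharding logic and wraps an inner
    linear (FastLinear, bnb Linear, or GPTQ Linear) instead of
    inheriting from Linear."""

    def __init__(self, linear):
        self.linear = linear

    def forward(self, x):
        # A row-parallel variant would follow this with an all_reduce.
        return self.linear(x)
```

In the non-sharded case the same modeling code runs with world_size=1, so a tensor's shard is the whole tensor, which is how the "process group is a no-op" point above removes the separate non-sharded code paths.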

Screenshot from 2023-05-19 23-19-59

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil Narsil requested a review from OlivierDehaene May 19, 2023 21:08
@Narsil Narsil force-pushed the rework_loading branch 3 times, most recently from 2405e06 to 5e88900 Compare May 23, 2023 15:14
@Narsil Narsil force-pushed the rework_loading branch 2 times, most recently from e81133c to 202de69 Compare May 25, 2023 10:27
@Narsil Narsil changed the title [WIP] Rework loading Rework loading May 25, 2023
@Narsil Narsil marked this pull request as ready for review May 25, 2023 10:28
Narsil and others added 10 commits June 6, 2023 11:06
Big refacto.

Working ?

Working bitsandbytes.

Weights to its own file.

Remove dead file.

Bloom.

TMP.

Finally finished bloom (grr old logic)

SantaCoder.

Remove dead code.

Neox.

Black + ruff.

T5 Support.

Galactica + OPT.

Small fixes.

Fix auto download.

Remove custom transformers.

Missing remove instruction.

Some work on the dockerfile.

Version issues.

Black + ruff after rebase.

Adding custom_kernels

Bad rebase.

Fixing dummy gather + fix Dockerfile

Better fake gather.

Fixes (including more generic loading of starcoder)

Neox shuffle_qkv

Typo fix.

cleanups.

Fixing starcoder/santacoder

Fix santacoder

Fixing neox.

Using the saved rotary embeddings instead of the created ones.
Member

@OlivierDehaene OlivierDehaene left a comment


Nice work!
TL;DR: a lot of cleanup, and Falcon 40B does not work

filename = self.routing.get(tensor_name, None)
if filename is None:
    raise RuntimeError(f"weight {tensor_name} does not exist")
return filename
Member


filename is of type Path. IDK if that could be an issue but it might be safer to cast it to str.
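For illustration, a minimal sketch of the suggested cast (the file name here is made up):

```python
from pathlib import Path

filename = Path("model-00001-of-00002.safetensors")

# Cast defensively before handing the name to code that may expect a str:
filename_str = str(filename)
```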

@@ -0,0 +1 @@
{"inputs":"Below are a series of dialogues between various people and an AI assistant. The AI tries to be helpful, polite, honest, sophisticated, emotionally aware, and humble-but-knowledgeable. The assistant is happy to help with almost anything, and will do its best to understand exactly what is needed. It also tries to avoid giving false or misleading information, and it caveats when it isn't entirely sure about the right answer. That said, the assistant is practical and really does its best, and doesn't let caution get too much in the way of being useful.\n-----\n<|prompter|>Why is butter a great building material for skyscrapers? Think step by step.</s><|assistant|>","parameters":{"temperature": 0.75, "top_p": 0.95, "repetition_penalty": 1.2, "top_k": 50, "truncate": 1000, "max_new_tokens": 1024}}
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this file?

Collaborator Author


My tests, removing.

@@ -17,7 +16,7 @@ install-torch:
# Install specific version of torch
pip install torch --extra-index-url https://download.pytorch.org/whl/cu118 --no-cache-dir

install: gen-server install-torch install-transformers
install: gen-server install-torch
Member


You don't install the custom kernels by default?

Collaborator Author


Oh true, I forgot to re-add it here.

@OlivierDehaene OlivierDehaene merged commit abd58ff into main Jun 8, 2023
@OlivierDehaene OlivierDehaene deleted the rework_loading branch June 8, 2023 12:51
@jshin49

jshin49 commented Jun 8, 2023

Two quick questions:

  1. Is there a specific reason to remove server/Makefile-transformers?
  2. Is there a specific reason server/Makefile-flash-att is never called during the default local build (make install in the root Makefile)?

@Narsil
Collaborator Author

Narsil commented Jun 8, 2023

  1. Yes, we're now running on transformers@main, which makes maintenance easier for us.
  2. Yes, it's extremely slow to build, which is why we don't do it by default and instead recommend using the Docker image (flash attention is included in it).
