feat: neuron rotation #50
Draft
ljleb wants to merge 44 commits into s1dlx:dev from ljleb:oft
Changes from 36 of 44 commits
Commits (all by ljleb):
- 18bff99: oft
- 47bdac9: device
- 0681458: rename
- 1fe0882: fix black
- 2810d89: cayley interpolation for alpha
- 09bed88: refact
- f11c054: add method to __all__
- 1f497e9: include 1D 'rotation'
- 36fccaa: ignore alpha for now
- f18208d: refact
- 1dafe83: implement fractional rotations
- 149ab16: fix transform direction
- 1f71391: fix eye
- b464fd3: rewrite with out=
- e1dc59c: it works; opt now
- cbb6a06: optimize: 45m -> 7m
- ce62946: rm print
- 8172927: fix precision issues
- 19fcc0a: fix precision issues
- f954270: black
- e94e252: dont change
- 1f380c8: imps
- ea95b66: beta is deformation
- 1751f59: simplify
- c69bb95: @
- 0d5160b: backup
- 1920496: deal with conv attention shape, rotate centroids
- 5a1c776: black
- f61d6aa: wip
- 6ac82d6: refact
- 7fff089: backup
- 6ddc503: remove approx
- a6742b3: dont edit
- d84b776: fix fp16 and fp32 merges
- d812ea8: reduced svd
- 38d4db6: black
- 3c90395: dont ellipsis
- f506831: print more info for debug
- 1b46056: dont merge sdxl kek
- aeb8c99: black
- de71102: revert utils.py
- a01e016: cache impl
- 003017e: cache eigen inv
- 81515bd: Update merge.py
This is actually a lot more complex than meets the eye. We should be determining the SVD driver based on the size of the matrix: different drivers perform faster on smaller/bigger matrices, and in some instances the CPU will outperform the GPU. What exactly is our average matrix size when we call svd?
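For reference, a minimal sketch of what size-based driver selection could look like with torch.linalg.svd. The size threshold and the driver choices below are placeholder assumptions, not measured values, and the driver argument is only honored on CUDA inputs:

```python
import torch

def svd_with_driver(a: torch.Tensor):
    # The `driver` argument of torch.linalg.svd is only supported for CUDA inputs.
    if a.is_cuda:
        # Placeholder heuristic: the Jacobi-based gesvdj driver tends to do well
        # on small/medium matrices; fall back to gesvd for very large ones.
        driver = "gesvdj" if max(a.shape[-2:]) <= 4096 else "gesvd"
        return torch.linalg.svd(a, full_matrices=False, driver=driver)
    # On CPU, torch.linalg.svd does not take a driver argument.
    return torch.linalg.svd(a, full_matrices=False)
```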
If we include all keys, it goes from $320^2$ to ~$20K^2$. As this upper bound isn't really practical, if we exclude all conv layers (which have the largest neurons), the upper bound is ~$5K^2$. I can list all sizes in a bit; they are all square matrices.
I've never done this before at all, this is all new to me. Appreciate the help. IIUC, this only matters on CUDA devices?
All matrix sizes that currently go through SVD are listed below:
I did some benchmarking between jax's svd functions jitted through XLA and pytorch's different drivers on a Colab using a V100 (a 3080 is about equal to this in PyTorch performance), and these were the results.
Basically, unless you need full accuracy, gesvdj is going to be faster, even with full_matrices set to true. However, the speed you gain comes at the cost of some accuracy, and gesvdj may not always converge, in which case you need to fall back to gesvd.
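A minimal sketch of that gesvdj-first strategy, retrying with the slower but more robust gesvd driver when the Jacobi solver fails. The assumption here is that a non-convergence surfaces as a RuntimeError (torch.linalg.LinAlgError is a subclass of it):

```python
import torch

def robust_cuda_svd(a: torch.Tensor):
    """Try the faster (less accurate) gesvdj driver on a CUDA tensor first,
    then fall back to gesvd if the Jacobi iteration does not converge."""
    try:
        return torch.linalg.svd(a, full_matrices=False, driver="gesvdj")
    except RuntimeError:
        # torch.linalg.LinAlgError derives from RuntimeError, so a
        # convergence failure lands here as well.
        return torch.linalg.svd(a, full_matrices=False, driver="gesvd")
```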
By the way, full_matrices=False doesn't produce a reduced SVD when $m = n$ ($m$ and $n$ being the width and height of the SVD input). That's why it didn't seem to affect generation speed. We might want to remove it as it doesn't really change anything, since the input to the SVD is always a square covariance matrix here.
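To illustrate why the flag is a no-op for square inputs, a tiny self-contained check (illustrative only, using an arbitrary 320x320 matrix):

```python
import torch

a = torch.randn(320, 320)  # square, m == n, like the covariance matrices here
u_full, s_full, vh_full = torch.linalg.svd(a, full_matrices=True)
u_red, s_red, vh_red = torch.linalg.svd(a, full_matrices=False)

# Reduced and full factorizations have identical shapes when m == n.
assert u_full.shape == u_red.shape == (320, 320)
assert vh_full.shape == vh_red.shape == (320, 320)
```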
So I did complete a full merge on CUDA and didn't receive the error. I think it has something to do with trying to move models between the CPU and GPU while the WebUI keeps models loaded in memory. Is there sanity checking when the models are loaded, to ensure that they have been moved to the CPU if the work_device is set to CPU?
Before merging, when assembling the merge args, the weights are sent to the requested device:
meh/sd_meh/merge.py, lines 465 to 466 in 2780321
Note that if work_device is None, it takes the value of device:
meh/sd_meh/merge.py, lines 371 to 372 in 2780321
So IIUC, it shouldn't be a device issue.
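A condensed sketch of the behavior described above (not the exact code in merge.py): the work device defaults to the main device, and the merge inputs are moved there before the merge method runs. Function names here are illustrative:

```python
from typing import Optional

import torch

def resolve_work_device(device: str, work_device: Optional[str]) -> str:
    # If no separate work device was requested, fall back to the main device.
    return device if work_device is None else work_device

def send_to_work_device(thetas: dict, work_device: str) -> dict:
    # Move every weight participating in the merge to the requested device.
    return {key: theta.to(work_device) for key, theta in thetas.items()}
```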
I think I found the culprit.
It seems that on CPU there sometimes isn't enough precision, which leads to $U$ or $V^T$ having a determinant of 0. This is not what SVD should output: $U$ and $V^T$ should always be orthogonal transforms, which implies $|\det U| = |\det V^T| = 1$.
When the determinant of $U$ or $V^T$ is 0, this line divides by 0:
So the last column of u is sometimes filled with infinities. Then, when trying to compute the eigenvalues of the matrix, an error is raised.
As noted below, while this prevents the entire merge from raising an error, rotations with invalid determinants still result in a broken merge. I went the other direction and raised an error instead.
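A minimal sketch of that guard, assuming the rotation is built from an SVD of a covariance-like matrix. The function name, tolerance, and error message are illustrative, not the actual code in this PR:

```python
import torch

def checked_orthogonal_factors(cov: torch.Tensor):
    u, _, v_t = torch.linalg.svd(cov)
    # U and V^T should be orthogonal, so |det| must be numerically close to 1.
    # On CPU, limited precision can collapse the determinant to 0, which would
    # later turn a division by the determinant into a column of infinities.
    for name, m in (("U", u), ("V^T", v_t)):
        det = torch.linalg.det(m.float())
        if not torch.isfinite(det) or det.abs() < 1e-6:
            raise ValueError(
                f"SVD produced a non-orthogonal {name} (det={det.item():.3e}); "
                "try running this merge in higher precision"
            )
    return u, v_t
```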