
feat: neuron rotation #50

Draft · ljleb wants to merge 44 commits into base: dev
Conversation

ljleb
Collaborator

@ljleb ljleb commented Nov 10, 2023

For each key:

  1. extract an orthogonal matrix, where model $A$ is the base and model $B$ is the target orientation
  2. apply the orthogonal transform immediately

Note: this is pretty slow. On my RTX 3080 it takes me ~3 minutes to merge two models.
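Step 1 above is an orthogonal Procrustes problem, which is solved with an SVD of the cross-covariance. A minimal numpy sketch (the PR itself uses torch; the function name is illustrative):

```python
import numpy as np

def extract_rotation(a, b):
    """Orthogonal Procrustes: find orthogonal Q minimizing ||Q @ a - b||_F.

    a, b: (n, m) weight matrices with n-dimensional neurons as columns.
    The minimizer is Q = U @ Vt, where U, S, Vt = svd(b @ a.T).
    """
    u, _, v_t = np.linalg.svd(b @ a.T)
    return u @ v_t
```

For example, if `b` is an exactly rotated copy of `a`, the extracted `Q` recovers that rotation.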

@ljleb ljleb changed the title OFT extract + apply feat: OFT extract + apply Nov 10, 2023
@ljleb ljleb changed the title feat: OFT extract + apply feat: OFT extract + apply, aka rotate Nov 10, 2023
@ljleb
Collaborator Author

ljleb commented Nov 10, 2023

A good idea would be to change the rotation rate of the orthogonal matrix $Q$ using $Q^{\alpha}$, i.e.:
$\alpha = 0$ : $I$
$\alpha = 1$ : $Q$
$\alpha = 2$ : $Q^2$
$\alpha = 0.5$ : $Q^{0.5}$

I have not tested this, but I believe we will run into the problem that it will be even slower. According to GPT, we have to compute a matrix log and then a matrix power. I'll run some tests to see how well that works.

@ljleb
Collaborator Author

ljleb commented Nov 10, 2023

GPT-4 found something called the Cayley transform, which seems to do what we want.

2D with an arbitrary matrix $A$ and a rotation power $t$:
[attached image]

3D with an arbitrary matrix $A$ and a rotation power $t$:
[attached image]

link to discussion with GPT4: https://chat.openai.com/share/96a9b2ae-3a5f-47ce-8b22-bb07e5f6d1a9
reference: https://en.wikipedia.org/wiki/Cayley_transform#Matrix_map
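A minimal numpy sketch of the Cayley map (hedged: scaling the skew-symmetric generator by $t$ gives a smooth orthogonal path with $t = 0 \mapsto I$, which is not in general identical to $Q^t$ of the $t = 1$ matrix):

```python
import numpy as np

def cayley_rotation(a, t):
    # Cayley map: for a skew-symmetric matrix S, (I + S) @ inv(I - S) is
    # orthogonal; I - S is always invertible since S has purely imaginary
    # eigenvalues. Scaling S by t interpolates from the identity (t = 0).
    n = a.shape[0]
    s = t * (a - a.T) / 2.0  # skew-symmetric part of an arbitrary input
    i = np.eye(n)
    return (i + s) @ np.linalg.inv(i - s)
```

The result is orthogonal for every `t`, so no re-orthogonalization step is needed along the path.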

@ljleb
Collaborator Author

ljleb commented Nov 10, 2023

> I have not tested this, but I believe we will run into the problem that it will be even slower. According to GPT, we have to compute a matrix log and then a matrix power. I'll run some tests to see how well that works.

It turns out that it is not that much slower. I'm hitting ~6 minutes when merging using --device cuda. It goes down to ~3 minutes if we exclude the cases where a layer has 1 dimension.

@ljleb
Collaborator Author

ljleb commented Nov 10, 2023

I ran more tests and Cayley seems to break down in the 1D case. I need to spend more time on alpha to make it work.

@ljleb ljleb changed the title feat: OFT extract + apply, aka rotate feat: neuron rotation Nov 12, 2023
@ljleb
Collaborator Author

ljleb commented Nov 12, 2023

I found that you can apply a fractional power to the eigenvalues of a matrix to implement a fractional matrix exponent. This is fairly slow and requires double precision to work, otherwise the output gets an imaginary component because of precision errors.

With a non-integer alpha, it takes ~15 minutes to merge with --work_device cuda and ~12 minutes with --device cuda; with an integer alpha it is still ~5 minutes.
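The fractional power via eigenvalues can be sketched as follows (numpy for illustration; the double-precision cast reflects the precision issue described above):

```python
import numpy as np

def fractional_matrix_power(q, alpha):
    # Q**alpha via eigendecomposition: Q = V diag(w) V^-1, so
    # Q**alpha = V diag(w**alpha) V^-1. An orthogonal Q has complex
    # eigenvalues on the unit circle; double precision keeps the residual
    # imaginary component of the recombined result near zero.
    vals, vecs = np.linalg.eig(q.astype(np.float64))
    powered = vecs @ np.diag(vals ** alpha) @ np.linalg.inv(vecs)
    return powered.real
```

For instance, the half power of a 90° rotation is a 45° rotation, so applying it twice reproduces the original matrix.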

@ljleb ljleb added the 🔥 New feature or request label Nov 17, 2023
@ljleb
Collaborator Author

ljleb commented Nov 17, 2023

Notes on a couple of trade-offs I had to look into:

  • $\alpha$ is used for two different purposes:
    1. rotating two sets of weights about their centroids in an $n$-dimensional space with $\frac{n(n-1)}{2}$ rotation planes
    2. rotating the centroid of $A$ towards the centroid of $B$ with respect to the origin of the vector space, along an ellipse
    • this implies that $\alpha \equiv 0 \pmod 4$ positions the weights about the centroid of $A$. The purpose is a smooth transition for $0 \leq \alpha \leq 1$; as a result, the range $4k + 1 < \alpha < 4k + 4$ doesn't make much sense. To fix this, the method would need a third parameter $\gamma$ to control these two settings separately
  • $\beta$ is used to morph the neurons of $A$ into the shape of $B$, independently of the rotation of the neurons and of the position of their centroid: 0 = shape of $A$, 1 = shape of $B$
  • the neurons of some of the conv layers have > 20k floats. To solve the Procrustes problem for these cases, the algorithm has to compute the SVD of a 20k x 20k matrix, which is not practical. For this reason, I excluded the conv layers from the keys to be rotated. In the merge I tested this against, fully excluding the conv layers gave more aesthetic results than splitting the conv layer neurons into smaller chunks and running the SVD on those.
    • As a result, merging 1.5 models now only takes ~3 minutes on an RTX 3080
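The chunking alternative mentioned in the last bullet could be sketched like this (hypothetical helper, not what the PR ships, since it excludes conv layers instead):

```python
import numpy as np

def split_neurons(w, max_dim=4096):
    # Hypothetical chunking: split an (n, m) weight along the neuron axis
    # into pieces of at most max_dim columns, so each Procrustes SVD runs
    # on a covariance of size at most max_dim x max_dim instead of m x m.
    return [w[:, i:i + max_dim] for i in range(0, w.shape[1], max_dim)]
```

Concatenating the pieces back along the same axis recovers the original weight, so the split is lossless; the quality trade-off comes from rotating each chunk independently.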

@ljleb ljleb requested a review from s1dlx November 17, 2023 01:53
@ljleb ljleb marked this pull request as draft November 17, 2023 18:23
@ljleb
Collaborator Author

ljleb commented Nov 17, 2023

The models I tested against were not completely different; in particular, the text encoder was the same. This skewed my small benchmarks for expected merge times. It seems to take 9 minutes to merge two v1 models with all keys different. Turning this into a draft until we get the merge time lower or determine that the merge method is valuable enough to outweigh the 9 minutes.

if len(a.shape) == 1 or len(a.shape) == 2 and a.shape[0] == 1:
    return new_centroid.reshape_as(a)

svd_driver = "gesvd" if a.is_cuda else None
@mariaWitch mariaWitch Dec 7, 2023

This is actually a lot more complex than meets the eye. We should be determining the SVD driver based on the size of the matrix: different drivers perform faster on smaller/bigger matrices, and in some instances the CPU will outperform the GPU. What exactly is our average matrix size when we call svd?

Collaborator Author

@ljleb ljleb Dec 7, 2023

If we include all keys, it goes from $320^2$ to ~ $20K^2$. As this upper bound isn't really practical, if we exclude all conv layers (which have the largest neurons), the upper bound is ~ $5K^2$. I can list all sizes in a bit; they are all square matrices.

I've never done this before at all, this is all new to me. Appreciate the help. IIUC, this only matters on cuda devices?

Collaborator Author

@ljleb ljleb Dec 7, 2023

All matrix sizes that currently go through SVD are listed below:

  • 320x320: 47 keys
  • 640x640: 48 keys
  • 768x768: 94 keys
  • 960x960: 2 keys
  • 1280x1280: 83 keys
  • 2560x2560: 10 keys
  • 3072x3072: 12 keys
  • 5120x5120: 6 keys


I did some benchmarking between jax's svd functions jitted through XLA and pytorch's different drivers on a Colab using a V100 (a 3080 is about equal to this in PyTorch performance), and these were the results.
[attached image]
Basically, unless you need full accuracy, even with full_matrices set to True, gesvdj is going to be faster. However, the speed comes at the cost of some accuracy, and gesvdj may not always converge, requiring a fallback to gesvd.
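The convergence caveat suggests a fallback pattern like the following (pure-Python sketch; `fast_svd` and `slow_svd` stand in for `torch.linalg.svd` calls with `driver="gesvdj"` and `driver="gesvd"` respectively):

```python
def svd_with_fallback(matrix, fast_svd, slow_svd):
    # Try the faster Jacobi driver first; if it fails to converge (torch
    # raises RuntimeError), retry with the slower, more robust QR-based
    # driver on the same input.
    try:
        return fast_svd(matrix)
    except RuntimeError:
        return slow_svd(matrix)
```

This keeps the common case fast while guaranteeing every key still gets a valid decomposition.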


Collaborator Author

By the way, full_matrices=False doesn't produce a reduced SVD when $m = n$ ($m$ and $n$ being the width and height of the SVD input). That's why it didn't seem to affect generation speed. We might want to remove it, as it doesn't change anything: the input to the SVD is always a square covariance matrix here.
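This is easy to check directly (numpy shown for illustration; `torch.linalg.svd` behaves the same way for this flag on square inputs):

```python
import numpy as np

# For a square input, the "reduced" factors already have the full shapes,
# so full_matrices=False is a no-op.
a = np.random.default_rng(0).standard_normal((4, 4))
u_full, s_full, vt_full = np.linalg.svd(a, full_matrices=True)
u_red, s_red, vt_red = np.linalg.svd(a, full_matrices=False)
assert u_full.shape == u_red.shape == (4, 4)
assert vt_full.shape == vt_red.shape == (4, 4)
assert np.allclose(s_full, s_red)
```

The flag only trims factors when the input is rectangular, i.e. $m \neq n$.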


So I did complete a full merge on CUDA and didn't receive the error. I think it has something to do with moving models between the CPU and GPU while the WebUI keeps models loaded in memory. Is there sanity checking when the models are loaded, to ensure they have been moved to the CPU when work_device is set to cpu?

Collaborator Author

@ljleb ljleb Dec 12, 2023

Before merging, when assembling the merge args, the weights are sent to the requested device:

meh/sd_meh/merge.py

Lines 465 to 466 in 2780321

"a": thetas["model_a"][key].to(work_device),
"b": thetas["model_b"][key].to(work_device),

note that if work_device is None, it takes the value of device:

meh/sd_meh/merge.py

Lines 371 to 372 in 2780321

if work_device is None:
work_device = device

So IIUC, it shouldn't be a device issue.

Collaborator Author

@ljleb ljleb Dec 18, 2023

I think I found the culprit.

It seems that on CPU there sometimes isn't enough precision, which leads to $U$ or $V^T$ having a determinant of 0. This is not what SVD should output: $U$ and $V^T$ should always be orthogonal transforms, which implies $|\det U| = |\det V^T| = 1$.

When the determinant of $U$ or $V^T$ is 0, then this line divides by 0:

        u[:, -1] /= torch.det(u) * torch.det(v_t)

So the last column of u sometimes gets filled with infinities. Then, when trying to compute the eigenvalues of the matrix, an error is raised.
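A guard for this failure mode might look like this (numpy sketch with a hypothetical helper name; the real code operates on torch tensors):

```python
import numpy as np

def fix_reflection(u, v_t, eps=1e-6):
    # The sign fix divides u's last column by det(u) * det(v_t), which is
    # +/-1 for exact orthogonal factors. A numerically-zero determinant
    # means the SVD output is degenerate, so raise instead of emitting infs
    # that would later crash the eigendecomposition.
    det = np.linalg.det(u) * np.linalg.det(v_t)
    if abs(det) < eps:
        raise ValueError("SVD factor has near-zero determinant")
    u = u.copy()
    u[:, -1] /= det
    return u
```

When the determinant product is $-1$ this flips the last column, turning an improper orthogonal transform into a proper rotation.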

Collaborator Author

As noted below, while this prevents the entire merge from raising an error, rotations with invalid determinants still result in a broken merge. I went the other direction and raised an error instead.

@Enferlain

Enferlain commented Dec 17, 2023

Wanted to try with the bayesian merger extension. Added the changed parts to merge_methods.py

At the end of stage 1 got this error

stage 1:  96%|████████████████████████████████████████████████████████████████▏  | 1608/1680 [1:03:11<02:49,  2.36s/it]
*** API error: POST: http://127.0.0.1:7860/bbwm/merge-models {'error': 'RuntimeError', 'detail': '', 'body': '', 'errors': 'torch.linalg.eig: input tensor should not contain infs or NaNs.'}
    Traceback (most recent call last):
      File "D:\stable-diffusion-webui\venv\lib\site-packages\anyio\streams\memory.py", line 98, in receive
        return self.receive_nowait()
      File "D:\stable-diffusion-webui\venv\lib\site-packages\anyio\streams\memory.py", line 93, in receive_nowait
        raise WouldBlock
    anyio.WouldBlock

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\base.py", line 78, in call_next
        message = await recv_stream.receive()
      File "D:\stable-diffusion-webui\venv\lib\site-packages\anyio\streams\memory.py", line 118, in receive
        raise EndOfStream
    anyio.EndOfStream

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "D:\stable-diffusion-webui\modules\api\api.py", line 186, in exception_handling
        return await call_next(request)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\base.py", line 84, in call_next
        raise app_exc
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\base.py", line 70, in coro
        await self.app(scope, receive_or_disconnect, send_no_error)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\base.py", line 108, in __call__
        response = await self.dispatch_func(request, call_next)
      File "D:\stable-diffusion-webui\modules\api\api.py", line 150, in log_and_time
        res: Response = await call_next(req)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\base.py", line 84, in call_next
        raise app_exc
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\base.py", line 70, in coro
        await self.app(scope, receive_or_disconnect, send_no_error)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\cors.py", line 84, in __call__
        await self.app(scope, receive, send)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\gzip.py", line 24, in __call__
        await responder(scope, receive, send)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\gzip.py", line 44, in __call__
        await self.app(scope, receive, self.send_with_gzip)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
        raise exc
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
        await self.app(scope, receive, sender)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
        raise e
      File "D:\stable-diffusion-webui\venv\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
        await self.app(scope, receive, send)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\routing.py", line 718, in __call__
        await route.handle(scope, receive, send)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\routing.py", line 276, in handle
        await self.app(scope, receive, send)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\starlette\routing.py", line 66, in app
        response = await func(request)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\fastapi\routing.py", line 237, in app
        raw_response = await run_endpoint_function(
      File "D:\stable-diffusion-webui\venv\lib\site-packages\fastapi\routing.py", line 163, in run_endpoint_function
        return await dependant.call(**values)
      File "D:\stable-diffusion-webui\extensions\sd-webui-bayesian-merger\scripts\api.py", line 78, in merge_models_api
        merged = merge_models(
      File "D:\stable-diffusion-webui\venv\lib\site-packages\sd_meh\merge.py", line 176, in merge_models
        merged = simple_merge(
      File "D:\stable-diffusion-webui\venv\lib\site-packages\sd_meh\merge.py", line 262, in simple_merge
        res.result()
      File "C:\Users\Imi\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 451, in result
        return self.__get_result()
      File "C:\Users\Imi\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 403, in __get_result
        raise self._exception
      File "C:\Users\Imi\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\sd_meh\merge.py", line 371, in simple_merge_key
        with merge_key_context(key, thetas, *args, **kwargs) as result:
      File "C:\Users\Imi\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 135, in __enter__
        return next(self.gen)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\sd_meh\merge.py", line 475, in merge_key_context
        result = merge_key(*args, **kwargs)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\sd_meh\merge.py", line 447, in merge_key
        merged_key = merge_method(**merge_args).to(device)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\sd_meh\merge_methods.py", line 259, in rotate
        transform = fractional_matrix_power(transform, alpha)
      File "D:\stable-diffusion-webui\venv\lib\site-packages\sd_meh\merge_methods.py", line 279, in fractional_matrix_power
        eigenvalues, eigenvectors = torch.linalg.eig(matrix)
    RuntimeError: torch.linalg.eig: input tensor should not contain infs or NaNs.

(The fix is to use cuda instead of cpu as the device.)

@ljleb
Collaborator Author

ljleb commented Dec 18, 2023

See the discussion here #50 (comment). This can happen when merging on the CPU with fractional alpha.

Labels
🔥 New feature or request
4 participants