
Maximum load of the first transformer iteration #5

Open
deanAirre opened this issue Aug 27, 2024 · 4 comments

deanAirre commented Aug 27, 2024

Good evening,

I am very interested in research on making Transformers more approachable to the public, especially for communities without good GPUs.

I already solved my previous question, but I want to ask another one. Is the first iteration of transformer training supposed to be heavy in load because it still carries all the unpruned tokens? I know the first training iteration is necessary for determining which tokens to focus on in the pruning stage, but how large is the maximum load of that first iteration compared to the final pruned model with good accuracy?
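To make "load" concrete, here is the rough per-block FLOP count I have in mind as a function of token count; the dimension and keep ratio below are illustrative guesses on my part, not SPViT's actual settings:

```python
# Rough per-block cost of a ViT encoder layer as a function of token count N.
# Attention scales as O(N^2 * d), the MLP as O(N * d^2). Dimensions are
# ViT-Small-like (d = 384, N = 197 for a 224x224 input with 16x16 patches).

def block_flops(n_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    qkv = 3 * n_tokens * dim * dim                 # Q, K, V projections
    attn = 2 * n_tokens * n_tokens * dim           # QK^T and (attn @ V)
    proj = n_tokens * dim * dim                    # attention output projection
    mlp = 2 * n_tokens * dim * (mlp_ratio * dim)   # two MLP matmuls
    return qkv + attn + proj + mlp

full = block_flops(197, 384)    # first iteration: all tokens, nothing pruned
pruned = block_flops(99, 384)   # hypothetical block keeping ~50% of tokens
print(f"pruned / unpruned cost per block: {pruned / full:.2f}")
```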

Thanks in advance,
Regards,
Sean.

deanAirre reopened this Sep 17, 2024
deanAirre changed the title from "Confidence applying this within CPU only learning" to "Maximum load of the first transformer iteration" Sep 17, 2024
ZLKong (Collaborator) commented Oct 10, 2024

Hi Sean,

By "the first iteration of transformer," do you mean the first transformer block, or the first iteration during training?

deanAirre (Author) commented

Dear PeiyanFlying,

Yes, the first iteration during training, before pruning happens. Also, in case it is relevant: have you heard of a method to actually 'infuse' a trained model into the first iteration blocks of the transformer, so it doesn't have to train from scratch?

Thanks in advance, best regards,
Sean.

ZLKong (Collaborator) commented Oct 23, 2024

Hi Sean,

The first iteration of transformer training should be heavy in load because pruning has not started yet, but it should be a similar load to that of the original ViT.

Regarding "infuse," I am not sure about this. I assume it is similar to distillation, or to the lottery ticket method, where you get good initial weights for the layers and then do fine-tuning or training?
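If that is what you mean, a minimal sketch of the warm-start idea could look like the following. The student here is a plain DeiT stand-in (the real one would be your pruning variant with a DeiT-compatible backbone), so the model choices are assumptions, not code from this repo:

```python
# Sketch: warm-start a pruning model from a pretrained DeiT checkpoint instead
# of training from scratch. The "student" below is a plain DeiT stand-in; in
# practice it would be the SPViT variant with a DeiT-compatible backbone.
import timm

pretrained = timm.create_model("deit_small_patch16_224", pretrained=True)
student = timm.create_model("deit_small_patch16_224", pretrained=False)

# strict=False copies every matching backbone weight and would leave any
# pruning-specific parameters (token selectors, gates, ...) randomly initialized.
result = student.load_state_dict(pretrained.state_dict(), strict=False)
print("missing keys (stay random):", result.missing_keys)
print("unexpected keys (ignored):", result.unexpected_keys)
```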

deanAirre (Author) commented Oct 24, 2024

Dear PeiyanFlying,

Yes, the first iteration should be as heavy as the original ViT because no pruning has been done yet, so I was looking for a way to 'infuse' a pretrained model so that it doesn't have to be as heavy as the original transformer. Since it is confirmed that it will be as heavy, I will look for a way, maybe distillation or the lottery ticket method, to make SPViT even lighter.

But then I wonder how your 'adaptive pruning' method will 'see where it is suitable to stop' if it doesn't hold the embedding table from the first iteration of ViT training. Do you think it will still work if I 'distilled' a model into the first SPViT training layer so that it goes straight to pruning?
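For reference, the distillation objective I have in mind is the standard logit-matching loss sketched below; the temperature and weighting are my own illustrative choices, not anything taken from SPViT:

```python
# Sketch of logit distillation: a frozen pretrained ViT teacher guides the
# pruned student, so the student does not have to learn from scratch.
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL between temperature-softened distributions,
    # scaled by T^2 as in Hinton et al.'s formulation.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)  # hard-label term
    return alpha * kd + (1 - alpha) * ce
```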

Thanks in advance, the discussion has been very helpful,
Sean
