accelerate calculation mechanism and accelerate training mechanism #124

Closed
shenhuinuist opened this issue Sep 27, 2016 · 2 comments

@shenhuinuist

According to Paddle's documentation, sparse training is usually used to accelerate computation when the input is sparse, high-dimensional data, and sparse updates are not applicable to dense input.
I see that Paddle speeds up matrix multiplication by calling external math libraries. Is there any mechanism to accelerate computation or training in Paddle, especially when the input is dense? Can you show me more details?
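
For context, a minimal NumPy sketch (not Paddle's implementation) of why sparse updates pay off for sparse, high-dimensional input: only the rows touched by a batch are updated, while a dense update would rewrite the whole parameter matrix.

```python
# A minimal sketch, assuming a plain embedding-style parameter matrix
# (not Paddle's implementation): with sparse, high-dimensional input only
# the rows touched by a batch need a gradient update.
import numpy as np

vocab_size, hidden = 100_000, 128            # high-dimensional sparse input
weight = np.zeros((vocab_size, hidden), dtype=np.float32)

# A batch where each example activates only a handful of feature ids.
batch_ids = np.array([3, 17, 42, 99_999])    # the non-zero input dimensions
grad_rows = np.random.rand(len(batch_ids), hidden).astype(np.float32)

lr = 0.01
# Sparse update: touch 4 rows instead of all 100,000.
weight[batch_ids] -= lr * grad_rows

# A dense update would materialize and subtract a full (vocab_size, hidden)
# gradient even though most of it is zero -- wasteful for sparse input.
```

In the dense case there are no zero rows to skip, so this trick buys nothing; any acceleration has to come from the matrix kernels themselves, as discussed in the reply below.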

@reyoung
Collaborator

reyoung commented Sep 27, 2016

Basically, there is no special optimization for dense matrices.

In detail, some AVX and SSE code has been written for gradient merging. At Baidu, we use MKL for dense matrix computation; it is hard to write code faster than MKL in general.

The open-source version supports MKL too, but you need to buy an MKL license to use it. A student license might be a better fit for academic use.
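
To illustrate the point about external math libraries, here is a minimal sketch (not Paddle-specific code): a dense matmul in NumPy is dispatched to whatever BLAS backend the build links against (MKL, OpenBLAS, ...), so the heavy lifting for dense input is a single highly tuned GEMM call rather than framework-level code.

```python
# A minimal sketch (not Paddle-specific code): a dense matmul in NumPy is
# dispatched to whatever BLAS backend the build links against (MKL, OpenBLAS,
# ...), so the heavy lifting for dense input is a single tuned GEMM call.
import time
import numpy as np

np.show_config()        # shows which BLAS/LAPACK this NumPy build uses

n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

start = time.perf_counter()
c = a @ b               # one call into the backend's sgemm kernel
print(f"{n}x{n} float32 matmul: {time.perf_counter() - start:.3f}s")
```

The same idea applies inside Paddle: dense layers ultimately call into the linked BLAS's GEMM, which is why building against MKL is the main dense-path optimization mentioned above.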

@shenhuinuist
Author

@reyoung Thank you so much!

@reyoung reyoung closed this as completed Sep 28, 2016