
Try to fix MultinomialSampler #102

Merged

Conversation

@reyoung commented on Sep 21, 2016

No description provided.

intervals_[bigPos].thresh -= 1 - intervals_[smallPos].thresh;
smallPos = nextSmallPos(smallPos + 1);
}
bigPos = nextBigPos(bigPos + 1);
reyoung (Collaborator, Author) commented:

Here, bigPos may be shifted even when intervals_[bigPos].thresh > 1, once smallPos >= size, i.e. when no small interval is left to pair with.
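
To make the failure mode concrete, here is a sketch of the pre-fix pairing loop, pieced together from the fragments in this diff. It is a guess at the shape, not the verbatim code; fillIntervals, nextSmallPos, and nextBigPos are the helpers visible above, and fillIntervals is assumed to initialize thresh[i] = prob[i] * size / sum.

// Hedged reconstruction of the pre-fix shape, for illustration only.
fillIntervals();
int smallPos = nextSmallPos(0);  // next interval with thresh < 1
int bigPos = nextBigPos(0);      // next interval with thresh >= 1
while (bigPos < size) {
  while (smallPos < size && intervals_[bigPos].thresh > 1) {
    intervals_[smallPos].otherId = bigPos;
    intervals_[bigPos].thresh -= 1 - intervals_[smallPos].thresh;
    smallPos = nextSmallPos(smallPos + 1);
  }
  // Bug: if smallPos ran past size while thresh was still > 1, bigPos
  // advances here anyway, leaving a big interval only partially consumed.
  bigPos = nextBigPos(bigPos + 1);
}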


fillIntervals();
smallPos = nextSmallPos(0);
reyoung (Collaborator, Author) commented:

And smallPos is always reset, because an interval that was big earlier may have become small after donating probability mass.
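
For context, MultinomialSampler implements Walker's alias method: each of the size intervals keeps a threshold and the id of one other interval, so a sample costs O(1). Below is a self-contained sketch of the construction in the standard two-stack formulation rather than the scanning one used in this file; all names besides thresh and otherId are illustrative, not Paddle's API.

#include <algorithm>
#include <random>
#include <vector>

struct Interval {
  double thresh;  // chance of staying in this bucket, scaled so the mean is 1
  int otherId;    // alias bucket used when the draw exceeds thresh
};

class AliasSampler {
 public:
  explicit AliasSampler(const std::vector<double>& prob) {
    const int n = static_cast<int>(prob.size());
    double sum = 0;
    for (double p : prob) sum += p;
    intervals_.resize(n);
    std::vector<int> small, big;
    for (int i = 0; i < n; ++i) {
      intervals_[i] = {prob[i] * n / sum, i};  // otherId = i: default to self
      (intervals_[i].thresh < 1 ? small : big).push_back(i);
    }
    // Pair each small bucket with a big one; the big bucket donates exactly
    // enough probability mass to top the small one up to 1.
    while (!small.empty() && !big.empty()) {
      int s = small.back();
      small.pop_back();
      int b = big.back();
      intervals_[s].otherId = b;
      intervals_[b].thresh -= 1 - intervals_[s].thresh;
      if (intervals_[b].thresh < 1) {
        // The subtlety from the comment above: after donating, a big
        // bucket may itself have become small.
        big.pop_back();
        small.push_back(b);
      }
    }
  }

  // Draw one index in O(1): pick a bucket uniformly, then stay or jump.
  template <typename Rng>
  int gen(Rng& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const int n = static_cast<int>(intervals_.size());
    int i = std::min(n - 1, static_cast<int>(u(rng) * n));
    return u(rng) < intervals_[i].thresh ? i : intervals_[i].otherId;
  }

 private:
  std::vector<Interval> intervals_;
};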

break;
}
}
}


TEST(MultinomialSampler, larger_then_1) {
std::vector<int> probs = { 1, 100, 100, 1, 1};
reyoung (Collaborator, Author) commented:

The old code fails on this test case.

}

for (size_t i=0; i < probs.size(); ++i) {
CHECK_LE(std::abs(cnt[i] - probs[i]), 1);
reyoung (Collaborator, Author) commented:

The check allows a ±1 error here.
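
Outside Paddle's test harness, a minimal standalone check in the same spirit could look like this. It reuses the hypothetical AliasSampler sketch above and replaces the exact ±1 bound on rescaled counts with a loose statistical tolerance:

#include <cassert>
#include <cmath>

int main() {
  std::vector<double> probs = {1, 100, 100, 1, 1};  // the regression case
  AliasSampler sampler(probs);
  std::mt19937 rng(42);
  const int kSamples = 1000000;
  std::vector<long> cnt(probs.size(), 0);
  for (int i = 0; i < kSamples; ++i) ++cnt[sampler.gen(rng)];
  const double sum = 1 + 100 + 100 + 1 + 1;  // 203
  for (size_t i = 0; i < probs.size(); ++i) {
    double expected = probs[i] / sum * kSamples;
    // Allow roughly five standard deviations of binomial noise.
    assert(std::abs(cnt[i] - expected) < 5 * std::sqrt(expected) + 1);
  }
  return 0;
}

With a mis-built table, where a big interval is abandoned while its thresh is still above 1, the empirical counts can drift far outside any such tolerance.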

@reyoung force-pushed the add_moltinomial_sampler_unittest branch from b694f42 to 02c6cdf on September 21, 2016 14:11

fillIntervals();
if (intervals_[bigPos].thresh < 1) {
bigPos = nextBigPos(0);
emailweixu (Collaborator) commented on Sep 21, 2016:

This will make the complexity O(n^2). The original complexity is O(n): on every step either bigPos or smallPos increases. It should be:

if (intervals_[bigPos].thresh <= 1) {
  bigPos = nextBigPos(bigPos + 1);
}

reyoung (Collaborator, Author) replied:

Done

@reyoung force-pushed the add_moltinomial_sampler_unittest branch 3 times, most recently from 0c43435 to 02c6cdf on September 22, 2016 12:14
* Also refine the unittest to run multiple iterations, so a lucky random sequence cannot mask the bug.
* Remove the previously unused unittest.
@reyoung force-pushed the add_moltinomial_sampler_unittest branch from 47eda41 to f63e641 on September 22, 2016 12:59
@reyoung commented on Sep 22, 2016:

@emailweixu The test now runs many times, so a lucky random sequence cannot hide the bug.
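
A sketch of that repetition, under the same assumptions as the standalone check above: rerun the whole comparison for several seeds, so that no single lucky stream can mask a broken table.

// Hypothetical multi-seed wrapper around the check above.
for (unsigned seed = 0; seed < 10; ++seed) {
  std::mt19937 rng(seed);
  std::vector<long> cnt(probs.size(), 0);
  for (int i = 0; i < kSamples; ++i) ++cnt[sampler.gen(rng)];
  // ... same per-bucket tolerance check as above ...
}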

intervals_[smallPos].otherId = bigPos;
intervals_[bigPos].thresh -= 1 - intervals_[smallPos].thresh;
smallPos = nextSmallPos(smallPos + 1);
}
emailweixu (Collaborator) commented on Sep 22, 2016:

Adding one line here should fix it; no other changes are needed (the "smallPos < size" check at line 53 can then be removed):

if (smallPos >= size) break;

reyoung (Collaborator, Author) replied:

Done. Please use squash & merge to merge it.
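
Putting the fragments in this thread together, the fixed construction loop plausibly reads as follows. This is again a hedged reconstruction, not the verbatim merged code; the outer smallPos < size guard is a defensive addition in this sketch.

fillIntervals();
int smallPos = nextSmallPos(0);
int bigPos = nextBigPos(0);
while (bigPos < size && smallPos < size) {
  while (intervals_[bigPos].thresh > 1) {
    intervals_[smallPos].otherId = bigPos;
    intervals_[bigPos].thresh -= 1 - intervals_[smallPos].thresh;
    smallPos = nextSmallPos(smallPos + 1);
    if (smallPos >= size) break;  // emailweixu's guard: no small interval left
  }
  // the big interval becomes a small interval (or is fully consumed)
  bigPos = nextBigPos(bigPos + 1);
}

Each step advances either smallPos or bigPos, so the construction stays O(n).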

// the big interval becomes a small interval.
bigPos = nextBigPos(bigPos + 1);
}
smallPos = nextSmallPos(0);
A collaborator commented:

This line makes the complexity O(n^2): rescanning from position 0 for every big interval costs O(n) each time.

@emailweixu merged commit 7eb29f2 into PaddlePaddle:master on Sep 23, 2016
@reyoung deleted the add_moltinomial_sampler_unittest branch on September 23, 2016 05:48
zhhsplendid pushed a commit to zhhsplendid/Paddle that referenced this pull request Sep 25, 2019
thisjiang pushed a commit to thisjiang/Paddle that referenced this pull request Oct 28, 2021
wangxicoding pushed a commit to wangxicoding/Paddle that referenced this pull request Dec 9, 2021
zhoutianzi666 pushed a commit to zhoutianzi666/Paddle that referenced this pull request May 23, 2022
Thunderbrook added a commit to Thunderbrook/Paddle that referenced this pull request Sep 8, 2022
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Sep 19, 2022
qingshui referenced this pull request in qingshui/Paddle Nov 14, 2022
zmxdream pushed a commit to zmxdream/Paddle that referenced this pull request Dec 7, 2022
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Dec 6, 2023
laipaang added a commit to laipaang/Paddle that referenced this pull request Jan 16, 2024
lizexu123 pushed a commit to lizexu123/Paddle that referenced this pull request Feb 23, 2024