Try to fix MultinomialSampler #102
Conversation
intervals_[bigPos].thresh -= 1 - intervals_[smallPos].thresh;
smallPos = nextSmallPos(smallPos + 1);
}
bigPos = nextBigPos(bigPos + 1);
Here, bigPos may be shifted while intervals_[bigPos].thresh > 1, when smallPos >= size.
fillIntervals();
smallPos = nextSmallPos(0);
And always reset smallPos, because the previous big interval may have become small.
break;
}
}
}
TEST(MultinomialSampler, larger_then_1) {
  std::vector<int> probs = {1, 100, 100, 1, 1};
The old code fails on this test case.
}
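For intuition (my arithmetic, not part of the thread): with probs = {1, 100, 100, 1, 1}, the sum is 203 and n = 5, so the scaled thresholds are roughly {0.025, 2.463, 2.463, 0.025, 0.025}. After a big interval (2.463) absorbs one small interval it still holds 2.463 - (1 - 0.025) ≈ 1.488 > 1, so a big interval must absorb several small ones before dropping below 1 — exactly the case the old pointer advancement handled incorrectly.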
for (size_t i = 0; i < probs.size(); ++i) {
  CHECK_LE(std::abs(cnt[i] - probs[i]), 1);
There is a ±1 error here.
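For context, a minimal sketch of alias-table sampling (my reconstruction under assumed conventions, not the Paddle implementation; thresh and otherId mirror the Interval fields in the diffs above). One plausible source of the ±1: each interval's boundary is a real-valued threshold, so a sweep of evenly spaced sample points can fall one grid cell to either side of it.

#include <cstddef>
#include <vector>

// Hypothetical alias-table sampling; r is uniform in [0, n). The integer
// part of r selects an interval, the fractional part decides between the
// interval itself and its alias (otherId).
int sampleAlias(const std::vector<double>& thresh,
                const std::vector<int>& otherId,
                double r) {
  std::size_t i = static_cast<std::size_t>(r);
  double frac = r - static_cast<double>(i);
  return frac < thresh[i] ? static_cast<int>(i) : otherId[i];
}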
fillIntervals();
if (intervals_[bigPos].thresh < 1) {
  bigPos = nextBigPos(0);
This will make the complexity O(n²). The original complexity is O(n): at every step either bigPos or smallPos increases.
Should be:
if (intervals_[bigPos].thresh <= 1) {
  bigPos = nextBigPos(bigPos + 1);
}
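To illustrate the O(n) argument, here is a self-contained sketch of alias-table construction using explicit small/big worklists (a common textbook variant, not the code in this PR; Interval mirrors the fields in the diffs). Each iteration retires one small interval for good, and a big interval whose remaining mass drops below 1 is pushed back as a small one — the same invariant the two-pointer version has to preserve.

#include <cstddef>
#include <vector>

struct Interval {
  double thresh;        // scaled probability mass: probs[i] * n / sum
  std::size_t otherId;  // alias: the paired "big" interval
};

std::vector<Interval> buildAliasTable(const std::vector<double>& probs) {
  const std::size_t n = probs.size();
  double sum = 0;
  for (double p : probs) sum += p;

  std::vector<Interval> intervals(n);
  std::vector<std::size_t> small, big;
  for (std::size_t i = 0; i < n; ++i) {
    intervals[i] = {probs[i] * n / sum, i};
    (intervals[i].thresh < 1 ? small : big).push_back(i);
  }

  // Each iteration removes one small interval for good, so this is O(n).
  while (!small.empty() && !big.empty()) {
    std::size_t s = small.back(); small.pop_back();
    std::size_t b = big.back();
    intervals[s].otherId = b;
    intervals[b].thresh -= 1 - intervals[s].thresh;
    if (intervals[b].thresh < 1) {
      // The big interval has become small; move it to the small list.
      big.pop_back();
      small.push_back(b);
    }
  }
  return intervals;
}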
Done
* Also refine the unit test to run multiple iterations, to guard against a lucky random number. * Remove the unused unit test from before.
@emailweixu Tested many times to guard against lucky random numbers.
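A minimal sketch of that idea, assuming only a callable draw(rng) standing in for the sampler's generation method (the real test's structure and tolerance may differ): repeat the whole count-and-check loop under several independent seeds, so a single lucky random stream cannot decide the outcome.

#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical multi-seed check: draw(rng) stands in for the sampler under
// test and returns an index in [0, probs.size()). The 0.01 tolerance and the
// sample count are assumptions, not the values from the actual test.
template <typename Draw>
bool checkManyTimes(Draw draw, const std::vector<int>& probs,
                    int numSeeds = 10, long long numSamples = 1000000) {
  long long total = 0;
  for (int p : probs) total += p;
  for (int seed = 0; seed < numSeeds; ++seed) {
    std::mt19937 rng(seed);
    std::vector<long long> cnt(probs.size(), 0);
    for (long long i = 0; i < numSamples; ++i) ++cnt[draw(rng)];
    for (std::size_t i = 0; i < probs.size(); ++i) {
      double expected = static_cast<double>(probs[i]) / total;
      double observed = static_cast<double>(cnt[i]) / numSamples;
      if (std::abs(observed - expected) > 0.01) return false;
    }
  }
  return true;
}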
intervals_[smallPos].otherId = bigPos;
intervals_[bigPos].thresh -= 1 - intervals_[smallPos].thresh;
smallPos = nextSmallPos(smallPos + 1);
}
Adding one line here should fix it; no other changes are needed. (The "smallPos < size" check at line 53 can be removed.)
if (smallPos >= size) break;
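Piecing together the fragments visible in this thread, the inner pairing loop with that line added would look roughly as follows; the loop condition is my assumption, since only the body and the break appear in the diffs.

while (intervals_[bigPos].thresh >= 1) {  // assumed loop condition
  intervals_[smallPos].otherId = bigPos;
  intervals_[bigPos].thresh -= 1 - intervals_[smallPos].thresh;
  smallPos = nextSmallPos(smallPos + 1);
  if (smallPos >= size) break;  // the suggested fix
}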
Done. Please use squash & merge to merge it.
// the big interval becomes a small interval.
bigPos = nextBigPos(bigPos + 1);
}
smallPos = nextSmallPos(0);
This line makes the complexity O(n²).