Try to fix MultinomialSampler #102
Conversation
intervals_[bigPos].thresh -= 1 - intervals_[smallPos].thresh;
smallPos = nextSmallPos(smallPos + 1);
}
bigPos = nextBigPos(bigPos + 1);
Here, bigPos may be shifted while intervals_[bigPos].thresh > 1, when smallPos >= size.
fillIntervals();
smallPos = nextSmallPos(0);
And always reset smallPos, because the previous big interval may have become small.
break;
}
}
}
TEST(MultinomialSampler, larger_then_1) {
  std::vector<int> probs = {1, 100, 100, 1, 1};
The old code fails on this test case.
}
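For intuition (my arithmetic, not part of the thread): with probs = {1, 100, 100, 1, 1}, the sum is 203 and n = 5, so the scaled thresholds are roughly {0.025, 2.463, 2.463, 0.025, 0.025}. After a big interval (2.463) absorbs one small interval it still holds 2.463 - (1 - 0.025) ≈ 1.488 > 1, so a big interval must absorb several small ones before dropping below 1 — exactly the case the old pointer advancement handled incorrectly.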
for (size_t i = 0; i < probs.size(); ++i) {
  CHECK_LE(std::abs(cnt[i] - probs[i]), 1);
There is a ±1 error here.
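For context, a minimal sketch of alias-table sampling (my reconstruction under assumed conventions, not the Paddle implementation; thresh and otherId mirror the Interval fields in the diffs above). One plausible source of the ±1: each interval's boundary is a real-valued threshold, so a sweep of evenly spaced sample points can fall one grid cell to either side of it.

#include <cstddef>
#include <vector>

// Hypothetical alias-table sampling; r is uniform in [0, n). The integer
// part of r selects an interval, the fractional part decides between the
// interval itself and its alias (otherId).
int sampleAlias(const std::vector<double>& thresh,
                const std::vector<int>& otherId,
                double r) {
  std::size_t i = static_cast<std::size_t>(r);
  double frac = r - static_cast<double>(i);
  return frac < thresh[i] ? static_cast<int>(i) : otherId[i];
}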
fillIntervals();
if (intervals_[bigPos].thresh < 1) {
  bigPos = nextBigPos(0);
This will make the complexity O(n²). The original complexity is O(n): at every step either bigPos or smallPos increases.
Should be:
if (intervals_[bigPos].thresh <= 1) {
  bigPos = nextBigPos(bigPos + 1);
}
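To illustrate the O(n) argument, here is a self-contained sketch of alias-table construction using explicit small/big worklists (a common textbook variant, not the code in this PR; Interval mirrors the fields in the diffs). Each iteration retires one small interval for good, and a big interval whose remaining mass drops below 1 is pushed back as a small one — the same invariant the two-pointer version has to preserve.

#include <cstddef>
#include <vector>

struct Interval {
  double thresh;        // scaled probability mass: probs[i] * n / sum
  std::size_t otherId;  // alias: the paired "big" interval
};

std::vector<Interval> buildAliasTable(const std::vector<double>& probs) {
  const std::size_t n = probs.size();
  double sum = 0;
  for (double p : probs) sum += p;

  std::vector<Interval> intervals(n);
  std::vector<std::size_t> small, big;
  for (std::size_t i = 0; i < n; ++i) {
    intervals[i] = {probs[i] * n / sum, i};
    (intervals[i].thresh < 1 ? small : big).push_back(i);
  }

  // Each iteration removes one small interval for good, so this is O(n).
  while (!small.empty() && !big.empty()) {
    std::size_t s = small.back(); small.pop_back();
    std::size_t b = big.back();
    intervals[s].otherId = b;
    intervals[b].thresh -= 1 - intervals[s].thresh;
    if (intervals[b].thresh < 1) {
      // The big interval has become small; move it to the small list.
      big.pop_back();
      small.push_back(b);
    }
  }
  return intervals;
}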
Done
* Also refine the unit test to run multiple iterations, to guard against a lucky random number. * Remove the unused unit test from before.
@emailweixu Tested many times to guard against lucky random numbers.
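A minimal sketch of that idea, assuming only a callable draw(rng) standing in for the sampler's generation method (the real test's structure and tolerance may differ): repeat the whole count-and-check loop under several independent seeds, so a single lucky random stream cannot decide the outcome.

#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical multi-seed check: draw(rng) stands in for the sampler under
// test and returns an index in [0, probs.size()). The 0.01 tolerance and the
// sample count are assumptions, not the values from the actual test.
template <typename Draw>
bool checkManyTimes(Draw draw, const std::vector<int>& probs,
                    int numSeeds = 10, long long numSamples = 1000000) {
  long long total = 0;
  for (int p : probs) total += p;
  for (int seed = 0; seed < numSeeds; ++seed) {
    std::mt19937 rng(seed);
    std::vector<long long> cnt(probs.size(), 0);
    for (long long i = 0; i < numSamples; ++i) ++cnt[draw(rng)];
    for (std::size_t i = 0; i < probs.size(); ++i) {
      double expected = static_cast<double>(probs[i]) / total;
      double observed = static_cast<double>(cnt[i]) / numSamples;
      if (std::abs(observed - expected) > 0.01) return false;
    }
  }
  return true;
}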
intervals_[smallPos].otherId = bigPos;
intervals_[bigPos].thresh -= 1 - intervals_[smallPos].thresh;
smallPos = nextSmallPos(smallPos + 1);
}
Adding one line here should fix it; no other changes are needed. (The "smallPos < size" check at line 53 can be removed.)
if (smallPos >= size) break;
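Piecing together the fragments visible in this thread, the inner pairing loop with that line added would look roughly as follows; the loop condition is my assumption, since only the body and the break appear in the diffs.

while (intervals_[bigPos].thresh >= 1) {  // assumed loop condition
  intervals_[smallPos].otherId = bigPos;
  intervals_[bigPos].thresh -= 1 - intervals_[smallPos].thresh;
  smallPos = nextSmallPos(smallPos + 1);
  if (smallPos >= size) break;  // the suggested fix
}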
Done. Please use squash & merge to merge it.
// the big interval becomes a small interval.
bigPos = nextBigPos(bigPos + 1);
}
smallPos = nextSmallPos(0);
This line makes the complexity O(n²).