-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GPU Profiler in PaddlePaddle #482
Conversation
gangliao
commented
Nov 15, 2016
•
edited
Loading
edited
- Add GPU profiler object API
- Add Performance tuning docs
- Add unit test
- pre-commit hook check code style
@@ -42,7 +42,7 @@ addons: | |||
before_install: | |||
- | | |||
if [ ${JOB} == "BUILD_AND_TEST" ]; then | |||
if ! git diff --name-only $TRAVIS_COMMIT_RANGE | grep -qvE '(\.md$)' | |||
if ! git diff --name-only $TRAVIS_COMMIT_RANGE | grep -qvE '(\.md$)|(\.rst$)|(\.jpg$)|(\.png$)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If only modify rst, md, Travis ci only build Docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
@@ -95,7 +95,7 @@ As a simple example, consider the following: | |||
```bash | |||
# necessary | |||
sudo apt-get update | |||
sudo apt-get install -y g++ make cmake build-essential libatlas-base-dev python python-pip libpython-dev m4 libprotobuf-dev protobuf-compiler python-protobuf python-numpy git | |||
sudo apt-get install -y g++ make cmake swig build-essential libatlas-base-dev python python-pip libpython-dev m4 libprotobuf-dev protobuf-compiler python-protobuf python-numpy git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add swig to avoid demo failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
@@ -14,3 +14,4 @@ add_simple_unittest(test_perturbation) | |||
add_simple_unittest(test_CpuGpuVector) | |||
add_simple_unittest(test_Allocator) | |||
add_simple_unittest(test_FPException) | |||
add_simple_unittest(test_GpuProfiler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update to the newest develop code, and use the pre commit
hooks. This file has a bad eof
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
TEST(Profiler, BilinearFwdBwd) { | ||
hl_profiler_start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we could write a raii object for gpu profile, just like lock guard or set device. And we should add a global variable for profile refference count for maintaining the reentrant function call.
But it could be not in this PR.
|
||
.. code-block:: c++ | ||
|
||
TEST(Profiler, BilinearFwdBwd) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rst file could reference other files locally. it is not need to copy and paste your codes. Please try to use literal_include(or something similar, i cannot remember the command exactly).
@@ -42,7 +42,7 @@ addons: | |||
before_install: | |||
- | | |||
if [ ${JOB} == "BUILD_AND_TEST" ]; then | |||
if ! git diff --name-only $TRAVIS_COMMIT_RANGE | grep -qvE '(\.md$)' | |||
if ! git diff --name-only $TRAVIS_COMMIT_RANGE | grep -qvE '(\.md$)|(\.rst$)|(\.jpg$)|(\.png$)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
@@ -95,7 +95,7 @@ As a simple example, consider the following: | |||
```bash | |||
# necessary | |||
sudo apt-get update | |||
sudo apt-get install -y g++ make cmake build-essential libatlas-base-dev python python-pip libpython-dev m4 libprotobuf-dev protobuf-compiler python-protobuf python-numpy git | |||
sudo apt-get install -y g++ make cmake swig build-essential libatlas-base-dev python python-pip libpython-dev m4 libprotobuf-dev protobuf-compiler python-protobuf python-numpy git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
因为这些commit中没有merge主干的部分,所以能将commit合并成一个提交么? |
@@ -280,4 +281,23 @@ inline StatSet& registerTimerArg2(uint64_t threshold = -1, | |||
|
|||
#endif // DISABLE_TIMER | |||
|
|||
class GpuProfiler final { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final关键词似乎会让Paddle的GCC版本提升到4.7+。我有印象4.6是不支持final的。。
Explicit virtual overrides
https://gcc.gnu.org/projects/cxx-status.html#cxx11
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
不过我们把gcc升到4.8+也未尝不可。因为百度的toolchain已经是4.8.2了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我也觉得可以升到4.8+
class GpuProfiler final { | ||
public: | ||
GpuProfiler() { hl_profiler_start(); } | ||
~GpuProfiler() { hl_profiler_end(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里有一个问题是,如果有重入的调用就会有问题吧。例如
void funcA() {
GpuProfiler p;
...
}
void funcB() {
GpuProfiler p;
funcA();
...
}
所以,GpuProfiler还需要在全局维护一个引用计数,并且需要是使用 http://en.cppreference.com/w/cpp/thread/recursive_mutex 来保护的。即Gpu测试也只能在一个线程里进行开关控制。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有道理啊
====================== | ||
Since training deep neural network typically take a very long time to get over, performance is gradually becoming | ||
the most important thing in deep learning field. The first step to improve performance is to understand what parts | ||
are slow. No point in improving performance of a region which doesn’t take much time! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No point in improving performance of a region which doesn’t take much time!
没看懂这句话啥意思?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我把 there is 加上吧。
There is no point in improving performance of a region which doesn’t take much time!
这个应该是amdahl's law理念吧。。
The above code snippet includes two methods, you can use any of them to profile the regions of interest. | ||
|
||
1. :code:`REGISTER_TIMER_INFO` is a built-in timer wrapper which can calculate the time overhead of both cpu functions and cuda kernels. | ||
2. :code:`REGISTER_GPU_PROFILER` is a general purpose wrapper object of :code:`cudaProfilerStart` and :code:`cudaProfilerStop` to avoid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add blank line between 1. 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, thanks.
using namespace paddle; // NOLINT | ||
using namespace std; // NOLINT | ||
|
||
void MatrixCheckErr(const Matrix& matrix1, const Matrix& matrix2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MatrixCheckErr和testBilinearFwdBwd在test_MatrixCompare中都有,不用再写一遍了。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
下一个PR吧 把test_matrixCompare.cpp里面几个check函数 转移到test_matrixUtil.h, 有些check函数也重复了。。。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK,不过会和@hedaoyuan #385 的冲突了么
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
不至于冲突,他那个是更高一层abstraction, 有了他那个,确实也不急着改了。。。
…-api-cn-layers 1211 update api cn layers
* Create python-publish.yml + action * Update python-publish.yml
* update chnsenticorp and lcqmc to qianyan format * update md5 check