1.1.0RC2/RC1 Performance degradation #5666
RC2 is actually at commit 8aaabce of the same release branch. Two commits have been made on top of RC2. So it seems like these two commits actually improved the performance?
Another possibility is the build environment. The PyPI wheels are built using CentOS 6 + GCC 5.x. So your custom build may be using native platform-specific optimizations that are not available in the PyPI build.
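(For illustration: one way to see what a locally tuned build can use that a portable wheel cannot is to ask gcc what `-march=native` resolves to. This is only a sketch; the flags actually used for the PyPI wheels may differ.)

```bash
# Compare the ISA extensions enabled by the generic x86-64 baseline
# (roughly what a portable wheel must target) with -march=native.
gcc -march=x86-64 -Q --help=target | grep -E 'm(avx|sse4|fma)'
gcc -march=native -Q --help=target | grep -E 'm(avx|sse4|fma)'
```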
From what I see, the new PRs shouldn't have impacted performance. Thank you!
Yes, we use Docker containers to build the PyPI wheels; the same Docker-based build can be run from the project root.
I think there are techniques for detecting supported instructions at runtime, depending on the platform. On Linux that would simply be parsing /proc/cpuinfo.
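(A minimal sketch of that approach on Linux; the flag names are the ones the kernel reports for x86 CPUs.)

```bash
# Check which of the SIMD extensions relevant to this discussion the CPU advertises.
for isa in sse4_2 avx avx2 avx512f; do
    if grep -m1 '^flags' /proc/cpuinfo | grep -qw "$isa"; then
        echo "$isa: supported"
    else
        echo "$isa: not supported"
    fi
done
```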
@trivialfis, I probably misunderstood you - do you mean that the supported instruction sets depend on the platform where XGBoost was built? I don't think the issue is solved yet; we probably have a non-optimal environment for building XGBoost, or something similar. @ShvetsKS, could you please try to build XGBoost using the appropriate container?
@SmirnovEgorRu The default binary distribution is not optimal, and I don't think it can be: it uses an older gcc (5.x) and a less aggressive optimization level, and I'm not sure LTO is enabled in that case. If we want optimal performance from the default build, we would need to do PGO on the test farm.
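(For context, a gcc PGO build typically follows a generate/run/use cycle. The sketch below passes the flags through CMake; the exact integration with the XGBoost build and CI is an assumption.)

```bash
# Step 1: instrumented build, then run a representative training workload
# so gcc can write .gcda profile data.
cmake .. -DCMAKE_CXX_FLAGS="-fprofile-generate" && make -j"$(nproc)"
# ... run typical training jobs here ...

# Step 2: rebuild using the collected profiles.
cmake .. -DCMAKE_CXX_FLAGS="-fprofile-use -fprofile-correction" && make -j"$(nproc)"
```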
No, I haven't used TensorFlow for a while. But I remember that if you install it from pip, it prints a warning saying that the CPU supports AVX2 but the current binary was not built with that compiler flag enabled.
We have limits on what kind of optimization we can do, since PyPI wheels have to support a wide range of machines. For example, we cannot use AVX-512 instructions. However, I think the more probable cause is the library dependencies. Unfortunately, there's not much we can do about the libraries either, since PyPI requires that all Linux builds use CentOS 6; see https://www.python.org/dev/peps/pep-0571/. If the build environment has a significant impact on performance, we should look into alternative distribution channels, such as Conda.
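(To see the constraint in practice, one can inspect the platform tag of the published wheel; PEP 571 corresponds to the manylinux2010 tag, which is built on a CentOS 6 toolchain. The commands below are only a sketch.)

```bash
# Download the published wheel without installing it and look at its platform tag.
pip download xgboost==1.1.0 --no-deps --only-binary=:all: -d /tmp/xgb-wheels
ls /tmp/xgb-wheels

# conda-forge builds in a different environment and is one alternative channel:
# conda install -c conda-forge xgboost
```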
@hcho3, @trivialfis, I agree that there are things we can't do in public pip releases.
@SmirnovEgorRu Thanks. We'll keep this issue open for now.
I did experiments with gcc 5.5 instead of the 7.5 version. The results are the same, and also if I use Docker for the build option ... but if I use ... If we summarize the current problem, we can see:
That's weird. Let me try a bit more.
Interesting. Could it be because of code bloat?
I hope that the modifications of ... that I needed to get ... were deleted.
I released 1.1.0 today, as scheduled. If we happen to find a way to speed up the PyPI wheels without compromising usability, we could publish a patch release (1.1.1).
Seems the root cause was found, and #5720 was prepared as an attempt to fix it.
@hcho3, the issue is solved in master. Can we release 1.1.1?
@SmirnovEgorRu @ShvetsKS I filed #5732 to release 1.1.1.
1.1.1 is now out.
@hcho3, I appreciate this, thank you!
Original issue description:
A slowdown was discovered when comparing the training time of the release packages 1.1.0rc2 / 1.1.0rc1 (https://pypi.org/project/xgboost/#history) against a custom build of the head (c42f533) of the release branch (https://github.com/dmlc/xgboost/commits/release_1.1.0). The results:
Roughly a 25% slowdown on average (e.g. 21.33 s with the rc1 wheel vs. 16.25 s with the custom build on higgs1m).
The custom build was obtained with the default build instructions:
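(The exact commands were not captured above; the standard from-source build at the time was roughly the following sketch, assuming a default CMake configuration with no extra optimization flags.)

```bash
# Default from-source build of the release branch head mentioned in the report.
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost && git checkout c42f533
mkdir build && cd build
cmake .. && make -j"$(nproc)"
cd ../python-package && python setup.py install   # install the freshly built Python package
```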
gcc version:
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
higgs1m is a public benchmark (https://github.com/dmlc/xgboost-bench/tree/master/hist_method).
Note that there is a big run-to-run variance in the measurements on the rc1/rc2 builds:
higgs1m:
rc1: 21.3283 sec ( [19.96872500400059, 20.516250001004664, 21.93073065700446, 22.897386002005078, 32.79936764600279] )
vs
custom: 16.2467 sec ( [15.885650180993252, 15.897585067999898, 16.517098495001846, 16.686414559000696, 18.001827495994803] )