Fix #3523: Fix CustomGlobalRandomEngine for R #3781
Hi, are the other std random methods safe to use? |
@trivialfis No idea. One thing I learned is that random functions in |
Thanks for the link. The current tests use random numbers to generate the DMatrix, see: Line 109 in 1db28b8
It may be helpful to make tests reproducible. |
@trivialfis Sure, we can look into using a random generation library. Boost.Random and PCG come to mind. The former increases build complexity greatly, whereas the latter doesn't compile well with MSVC currently. Any other suggestion? |
No, this is very new to me. I am currently trying to add unittests for gpu-hist, for now I might just hard code the generated matrix. :( |
You can do that, but it doesn't look elegant to me. Let me look into Boost.Random. It looks like we don't have to include entire Boost, as Boost.Random comes in a separate repository: https://github.com/boostorg/random |
Great! Thanks. |
Suggested by @RAMitchell, I implemented a very simple |
@trivialfis I gave Boost.Random a shot, but Boost modules are so tightly coupled that Boost.Random ended up requiring 18 Boost modules :( Yes, a simple implementation of a linear congruential generator is the way to go, since we can have reproducible tests without pulling in an insane number of dependencies. |
I should also mention that the 18 Boost modules amount to 443K lines of code, almost all of them headers. This could increase compilation time a lot. Therefore, I move against including Boost as a dependency. |
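For illustration, a linear congruential generator of the kind discussed above could look roughly like the sketch below. The class name `SimpleLCG` and the helper `GenerateValues` are hypothetical, and the constants are the well-known Numerical Recipes ones; this is not necessarily the generator that ended up in the XGBoost tests.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical minimal LCG; the name and constants are assumptions for
// illustration, not the class that was added to the XGBoost test suite.
class SimpleLCG {
 public:
  using result_type = std::uint32_t;

  explicit SimpleLCG(result_type seed = 1u) : state_(seed) {}

  // state <- a * state + c, with unsigned arithmetic wrapping modulo 2^32.
  result_type operator()() {
    state_ = 1664525u * state_ + 1013904223u;
    return state_;
  }

  static constexpr result_type min() { return 0u; }
  static constexpr result_type max() {
    return std::numeric_limits<result_type>::max();
  }

 private:
  result_type state_;
};

// Hypothetical helper: fills a vector with n floats in [0, 1).
// The same seed always yields the same values on every platform.
std::vector<float> GenerateValues(std::size_t n, std::uint32_t seed) {
  SimpleLCG rng(seed);
  std::vector<float> out(n);
  for (auto& v : out) {
    // Keep the top 24 bits so the quotient is exactly representable in float.
    v = static_cast<float>(rng() >> 8) / 16777216.0f;
  }
  return out;
}
```

Because the arithmetic is fully specified, two test runs with the same seed produce the same matrix contents, which is the reproducibility property asked for earlier in the thread.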
Great. I should push the update tomorrow. :) |
I wonder whether the root cause could be that the custom random engine produces 64-bit random integers while Apple Clang's shuffle somehow expects them to be 32-bit? |
It does seem very weird that even such a minimal example wouldn't work, which basically means that What I see in the code for Have you tried initializing the random engine like:
std::random_device rd;
std::mt19937 g(rd());
std::shuffle(v.begin(), v.end(), g);
I would be very surprised if this didn't work on all platforms. That said, the result might still differ between platforms because it's implementation specific, so if we want consistency then PCG seems like the best option. In general these things are easy to get wrong, I can definitely recommend Stephan Lavavej's talk on the subject: https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful
@khotilov I think |
@thvasilo
The random generator is wrapped in thread local storage: https://github.com/dmlc/xgboost/blob/d594b11f3590981c9c9aa3eaa9233027be312c6a/src/common/common.cc |
@khotilov This is a possibility. Let me try changing |
@hcho3 OK, so the problem lies in the |
Let us at least try to look closer; it could very well be due to the reason @khotilov suggested, in which case we just have to fix the CustomRandomEngine
The |
It looks like R has only 32-bit random generators. E.g., here's the code for its default Mersenne Twister generator: https://github.com/wch/r-source/blob/bc124ff19cca95328e114f0fc2f068150ee2aa61/src/main/RNG.c#L633 |
**Symptom** Apple Clang's implementation of `std::shuffle` doesn't work correctly when it is run with the random bit generator for the R package:
```cpp
CustomGlobalRandomEngine::result_type
CustomGlobalRandomEngine::operator()() {
  return static_cast<result_type>(
      std::floor(unif_rand() * CustomGlobalRandomEngine::max()));
}
```
Minimal reproduction of failure (compile using Apple Clang 10.0):
```cpp
std::vector<int> feature_set(100);
std::iota(feature_set.begin(), feature_set.end(), 0);
    // initialize with 0, 1, 2, 3, ..., 99
std::shuffle(feature_set.begin(), feature_set.end(), common::GlobalRandom());
    // This returns 0, 1, 2, ..., 99, so content didn't get shuffled at all!!!
```
Note that this bug is platform-dependent; it does not appear when GCC or upstream LLVM Clang is used.
**Diagnosis** Apple Clang's `std::shuffle` expects 32-bit integer inputs, whereas `CustomGlobalRandomEngine::operator()` produces 64-bit integers.
**Fix** Have `CustomGlobalRandomEngine::operator()` produce 32-bit integers.
Closes dmlc#3523.
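The fix described in the commit message can be sketched as an engine whose `result_type` is 32 bits wide. The sketch below is an assumption-laden illustration, not the actual XGBoost patch: the class name `UnitIntervalEngine32` is hypothetical, and the source of uniform doubles is passed in as a callable so the example does not depend on R's `unif_rand()`.

```cpp
#include <cmath>
#include <cstdint>
#include <functional>
#include <limits>
#include <utility>

// Hypothetical engine that maps a uniform double in the unit interval to a
// 32-bit unsigned integer. In an R package the callable would wrap
// unif_rand(); here it is injected so the sketch stays self-contained.
class UnitIntervalEngine32 {
 public:
  using result_type = std::uint32_t;  // 32-bit output, per the proposed fix

  explicit UnitIntervalEngine32(std::function<double()> source)
      : source_(std::move(source)) {}

  static constexpr result_type min() { return 0u; }
  static constexpr result_type max() {
    return std::numeric_limits<result_type>::max();
  }

  // Scale the double to [0, max()) and truncate, mirroring the expression
  // in the original operator() but with a 32-bit result_type.
  result_type operator()() {
    return static_cast<result_type>(std::floor(source_() * max()));
  }

 private:
  std::function<double()> source_;
};
```

An object of this type exposes `result_type`, `min()`, `max()`, and `operator()`, so it can be passed to `std::shuffle` in place of a standard engine.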
**Diagnosis** Apple Clang's implementation of `std::shuffle` doesn't work correctly when it is run with the random bit generator for the R package:
Minimal reproduction of failure (compile using Apple Clang 10.0):
Note that this bug is platform-dependent; it does not appear when GCC or upstream LLVM Clang is used.
**Fix** Use a platform-independent implementation of `std::shuffle`. A header-only library called PCG (Apache license) is included for this purpose.
Closes #3523.
TODO: add a regression test, replace other occurrences of `std::shuffle` in XGBoost codebase
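A regression test along the lines of the TODO could check that a shuffle with a given engine yields a permutation that differs from the input order. The helper below is a hypothetical sketch (the name `ShuffleIsNontrivialPermutation` is not from the XGBoost codebase); in the failing configuration reported above, where the output stayed 0, 1, ..., 99, a check like this would be expected to return false.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical regression check: shuffles 0..n-1 with the supplied engine and
// verifies that (a) the order changed and (b) no element was lost or
// duplicated. For n = 100 an unchanged order is overwhelmingly unlikely
// unless the shuffle is broken.
template <typename Rng>
bool ShuffleIsNontrivialPermutation(Rng&& rng, int n = 100) {
  std::vector<int> v(n);
  std::iota(v.begin(), v.end(), 0);
  const std::vector<int> original = v;

  std::shuffle(v.begin(), v.end(), rng);
  const bool changed = (v != original);

  std::sort(v.begin(), v.end());
  return changed && v == original;
}
```

In a real test this would be called with the engine under test, for example the R package's global random engine, instead of a standard one.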