[TRACKER][BUG] Integer-based indexing causing failures when data grows very large. #2459
Logistic regression seems to throw an error before the algorithm even executes, and it seems to come from CuPy:

    import numpy as np
    from cuml.linear_model import LogisticRegression

    a = np.random.random((1000000, 2100))
    model = LogisticRegression()
    model.fit(a, a[:, 0])
|
I executed the Failed:
Did not fail:
|
I believe the solution for most of these should be to convert any and all variables representing number of array elements (e.g., n_rows * n_cols) to |
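To illustrate why the width of the integer holding `n_rows * n_cols` matters, here is a minimal sketch in NumPy. It is not cuML code: the helper name `n_elements` is hypothetical, and NumPy's fixed-width arrays are used only to mimic what a 32-bit counter would do. In C and C++, signed overflow is formally undefined behavior; the wrap-around shown here is what typically happens in practice.

```python
import numpy as np

def n_elements(n_rows, n_cols, dtype):
    """Compute n_rows * n_cols using fixed-width integer arithmetic of `dtype`.

    Hypothetical helper for illustration only.
    """
    rows = np.array([n_rows], dtype=dtype)
    cols = np.array([n_cols], dtype=dtype)
    # NumPy array arithmetic wraps silently when the result does not fit.
    return int((rows * cols)[0])

# Shapes taken from this issue: 1M rows by 2100, 2500, and 5000 columns.
for n_cols in (2100, 2500, 5000):
    narrow = n_elements(1_000_000, n_cols, np.int32)
    wide = n_elements(1_000_000, n_cols, np.int64)
    print(f"1Mx{n_cols}: int32 -> {narrow:>14,d}   int64 -> {wide:>14,d}")
```

With a 32-bit counter, the 1Mx2500 count wraps to a negative number, while the 1Mx5000 count wraps to a positive but wrong value, so a simple "is the size negative" check would not catch every case.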
Testing these on 21.08 w/ 1Mx2.1k (float32):
|
Addresses #2459 (likely not all of it)
Authors:
- Corey J. Nolet (https://github.com/cjnolet)
Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)
URL: #3983
I really thought PCA/TSVD had been fixed in 21.08 but it does appear the issue still lingers:
Definitely still seems to be an overflow somewhere. I'm guessing in a 32-bit int. |
Related: #4105 |
The following script results in an illegal memory access:
```python
import cupy as cp
from cuml.ensemble import RandomForestClassifier

X = cp.random.random((1000000, 2500), dtype=cp.float32)
assert X.size > 2**31
y = cp.zeros(X.shape[0])
y[::2] = 1.0
model = RandomForestClassifier()
model.fit(X, y)
pred = model.predict(X)
```
Fixed by some targeted casting of integers to 64 bit. See #2459.
Authors:
- Rory Mitchell (https://github.com/RAMitchell)
Approvers:
- Venkat (https://github.com/venkywonka)
- Corey J. Nolet (https://github.com/cjnolet)
URL: #4563
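The "targeted casting" pattern mentioned above can be sketched as follows. This is not the actual change from #4563: the function names are hypothetical, and NumPy arrays stand in for 32-bit device-side integers. The point is only that widening one operand before the multiplication keeps a row-major offset (`row * n_cols + col`) from wrapping.

```python
import numpy as np

def flat_index_int32(row, col, n_cols):
    """Row-major offset computed entirely in 32-bit arithmetic (hypothetical helper)."""
    r = np.array([row], dtype=np.int32)
    c = np.array([col], dtype=np.int32)
    n = np.array([n_cols], dtype=np.int32)
    # The product r * n is evaluated in int32 and wraps if it exceeds 2**31 - 1.
    return int((r * n + c)[0])

def flat_index_widened(row, col, n_cols):
    """Same offset, with one operand cast to 64 bit before multiplying."""
    r = np.array([row], dtype=np.int32)
    c = np.array([col], dtype=np.int32)
    n = np.array([n_cols], dtype=np.int32)
    # Casting r first promotes the whole expression to int64.
    return int((r.astype(np.int64) * n + c)[0])

# Last element of the 1,000,000 x 2500 matrix from the script above.
row, col, n_cols = 999_999, 2_499, 2_500
print("int32 offset:  ", flat_index_int32(row, col, n_cols))
print("widened offset:", flat_index_widened(row, col, n_cols))
```

The 32-bit version yields a negative offset for this element, which would point outside the buffer if used as an index, and that is at least consistent with the illegal memory access reported here. Casting after the multiplication would not help, since the wrap has already happened by then.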
Below is a reproducible example on the 0.15 nightly of cuML w/ CUDA 10.2.

There are several estimators that fail with either `illegal memory access was encountered` or `invalid configuration argument`, both of which seem to indicate an integer overflow might be occurring when representing the size of the underlying array.

Here's the exception

In the above example, I used a size of `1Mx5000`, which is >> `2^31` elements. I also tried with `1Mx2500`, which is also > `2^31`.

Just to rule out the possibility that this error only occurs in the case of oversubscribing the GPU memory, I tried with `1Mx2100` (which is < `2^31` but still requires > `32gb` of GPU memory to train):

As a result of the behavior outlined above, I have a very strong suspicion that there are integers being used to represent array sizes that should be promoted to `size_t`. This should also affect `TruncatedSVD`.

Keeping a list to track the places where this occurs so far:
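As a quick check of the arithmetic in the description above, the following sketch compares each tested shape against the signed 32-bit limit and reports the raw storage needed for the input alone. The helper `describe_shape` is hypothetical, and the byte counts cover only the input matrix, not any workspace an estimator allocates during training.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

def describe_shape(n_rows, n_cols):
    """Summarize element count and raw input size for a dense matrix (hypothetical helper)."""
    n_elements = n_rows * n_cols  # Python ints are arbitrary precision, so no overflow here
    return {
        "elements": n_elements,
        "fits_int32": n_elements <= INT32_MAX,
        "float32_gib": n_elements * 4 / 2**30,
        "float64_gib": n_elements * 8 / 2**30,
    }

for n_cols in (2100, 2500, 5000):
    info = describe_shape(1_000_000, n_cols)
    print(
        f"1Mx{n_cols}: {info['elements']:,d} elements, "
        f"fits in int32: {info['fits_int32']}, "
        f"float32: {info['float32_gib']:.2f} GiB, "
        f"float64: {info['float64_gib']:.2f} GiB"
    )
```

Note that `1Mx2100` sits only about 47 million elements below the limit, so any intermediate buffer that is even slightly larger than the input (for example, one with a few extra columns) could still exceed it.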