
[BUG] Sporadic OLS pytest fail in test_linear_regression_model_default #1739

Closed · Fixed by #3433

dantegd opened this issue Feb 24, 2020 · 4 comments

Labels
bug (Something isn't working) · ? - Needs Triage (Need team to review and classify)

Comments

dantegd (Member) commented Feb 24, 2020

Describe the bug
I have only seen it twice so far, both times in CI environments as far as I remember, but the single-GPU OLS test test_linear_regression_model_default has had a very odd sporadic failure on CUDA 10.2.

Steps/Code to reproduce bug
Run the OLS pytests (python/cuml/test/test_linear_model.py). They may need to be run multiple times, potentially in Docker containers that reproduce the CI environment. The error looks like this (a standalone reproduction sketch follows the traceback):

datatype = <class 'numpy.float64'>

    @pytest.mark.parametrize('datatype', [np.float32, np.float64])
    def test_linear_regression_model_default(datatype):
    
        X_train, X_test, y_train, y_test = small_regression_dataset(datatype)
    
        # Initialization of cuML's linear regression model
        cuols = cuLinearRegression()
    
        # fit and predict cuml linear regression model
        cuols.fit(X_train, y_train)
        cuols_predict = cuols.predict(X_test).to_array()
    
        # sklearn linear regression model initialization and fit
        skols = skLinearRegression()
        skols.fit(X_train, y_train)
    
        skols_predict = skols.predict(X_test)
    
>       assert array_equal(skols_predict, cuols_predict,
                           1e-1, with_sign=True)
E       assert False
E        +  where False = array_equal(array([  74.52165384,   86.95202056, -174.44437402, -266.12312205,\n         98.6368422 , -298.8942179 , -119.92719853,...78734, -266.81005479,   38.99922689,  -84.58865571,\n         27.61662986,   -3.16178172,  -90.74953231, -240.09593383]), array([  78.40532309,   92.59378147, -165.29200792, -293.69589019,\n         53.59525341, -261.96500642, -135.10476893,...18215, -245.64010199,   59.21660234,  -59.3483221 ,\n         32.61762818,   16.48199754,  -66.87532649, -223.41115976]), 0.1, with_sign=True)

/rapids/cuml/python/cuml/test/test_linear_model.py:111: AssertionError
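
For chasing the flake outside pytest, a standalone loop over random seeds can help. The sketch below is only an approximation: small_regression_dataset is stood in for with sklearn's make_regression plus train_test_split (the fixture's real sizes and parameters are assumptions), max_prediction_gap is a hypothetical helper, and the test's array_equal(..., 1e-1, with_sign=True) check is simplified to a max-absolute-difference threshold.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression as skLinearRegression
from cuml.linear_model import LinearRegression as cuLinearRegression

def max_prediction_gap(datatype, seed):
    # Assumed stand-in for the test's small_regression_dataset helper.
    X, y = make_regression(n_samples=500, n_features=20, n_informative=10,
                           random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X.astype(datatype), y.astype(datatype), train_size=0.8, random_state=seed)

    cuols = cuLinearRegression()
    cuols.fit(X_train, y_train)
    cu_pred = cuols.predict(X_test)
    # Older cuML returns a cuDF Series here (hence .to_array() in the test);
    # recent versions return a NumPy array for NumPy inputs.
    cu_pred = cu_pred.to_array() if hasattr(cu_pred, "to_array") else np.asarray(cu_pred)

    skols = skLinearRegression()
    skols.fit(X_train, y_train)
    sk_pred = skols.predict(X_test)

    # Simplification of array_equal(sk_pred, cu_pred, 1e-1, with_sign=True):
    # report the largest element-wise disagreement instead.
    return float(np.max(np.abs(cu_pred - sk_pred)))

if __name__ == "__main__":
    for seed in range(50):
        for dt in (np.float32, np.float64):
            gap = max_prediction_gap(dt, seed)
            if gap > 1e-1:
                print(f"mismatch: dtype={np.dtype(dt).name} seed={seed} "
                      f"max abs diff={gap:.4f}")

Running this for many seeds on the affected hardware should surface the occasional large disagreement reported above without going through the CI Docker setup.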

Environment details (please complete the following information):

  • Environment location: Docker
  • Linux Distro/Architecture: CentOS 7 at least
  • GPU Model/Driver: V100-32GB
  • CUDA: 10.2
  • Method of cuDF & cuML install: source in CI

Additional context
Link to example log: https://gpuci.gpuopenanalytics.com/job/docker/job/tests/job/docker-test-cuml/283/CUDA_VERSION=10.2,LINUX_VERSION=centos7,PYTHON_VERSION=3.7/testReport/junit/cuml.test/test_linear_model/test_linear_regression_model_default_float64_/

@dantegd added the bug and ? - Needs Triage labels on Feb 24, 2020
@JohnZed closed this as completed on Mar 19, 2020
JohnZed (Contributor) commented Mar 19, 2020

Have not seen this in ages; believed to be resolved, so closing it out.

@wphicks reopened this on Feb 4, 2021
@wphicks changed the title from "[BUG] Sporadic OLS pytest fail in CUDA 10.2" to "[BUG] Sporadic OLS pytest fail" on Feb 4, 2021
wphicks (Contributor) commented Feb 4, 2021

Observed on both CUDA 10.1 and 10.2 at least. @dantegd believes this may be a Volta/Pascal issue that does not affect Turing/Ampere.
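
Since the failure is suspected to be architecture-dependent, it may help to log the GPU's compute capability alongside any reproduction attempt. A minimal sketch using numba.cuda (which cuML already depends on); the helper name is just for illustration:

from numba import cuda

def gpu_arch_string():
    # Report the active device's name and SM version, e.g. "Tesla V100 (sm_70)".
    dev = cuda.get_current_device()
    name = dev.name.decode() if isinstance(dev.name, bytes) else dev.name
    major, minor = dev.compute_capability
    return f"{name} (sm_{major}{minor})"

print(gpu_arch_string())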

@wphicks changed the title from "[BUG] Sporadic OLS pytest fail" to "[BUG] Sporadic OLS pytest fail in test_linear_regression_model_default" on Feb 4, 2021
wphicks (Contributor) commented Feb 4, 2021

Now reproduced on Volta / CUDA 11, with both float32 and float64.

rapids-bot pushed a commit that referenced this issue on Feb 10, 2021:
Closes #1739 

Addresses most items of #3224

Authors:
  - Dante Gama Dessavre (@dantegd)

Approvers:
  - John Zedlewski (@JohnZed)

URL: #3433