[BUG] [MG] coo2csr failure on 64GB input #405

Closed
afender opened this issue Jul 22, 2019 · 1 comment

afender commented Jul 22, 2019

Describe the bug

mem free: 3.15815e+10  mem total: 3.40583e+10  mem used: 2.4768e+09
  mem free: 3.10876e+10  mem total: 3.40583e+10  mem used: 2.97068e+09
  mem free: 3.10876e+10  mem total: 3.40583e+10  mem used: 2.97068e+09
  mem free: 3.10876e+10  mem total: 3.40583e+10  mem used: 2.97068e+09
  mem free: 3.10876e+10  mem total: 3.40583e+10  mem used: 2.97068e+09
  mem free: 3.10876e+10  mem total: 3.40583e+10  mem used: 2.97068e+09
  mem free: 3.10876e+10  mem total: 3.40583e+10  mem used: 2.97068e+09
  mem free: 3.10247e+10  mem total: 3.40583e+10  mem used: 3.0336e+09
  mem free: 3.09492e+10  mem total: 3.40583e+10  mem used: 3.10909e+09
  mem free: 3.10247e+10  mem total: 3.40583e+10  mem used: 3.0336e+09
  mem free: 3.10876e+10  mem total: 3.40583e+10  mem used: 2.97068e+09
  mem free: 3.10876e+10  mem total: 3.40583e+10  mem used: 2.97068e+09
  mem free: 3.10247e+10  mem total: 3.40583e+10  mem used: 3.0336e+09
  mem free: 3.10876e+10  mem total: 3.40583e+10  mem used: 2.97068e+09
  mem free: 3.05308e+10  mem total: 3.40583e+10  mem used: 3.52748e+09
  mem free: 3.09618e+10  mem total: 3.40583e+10  mem used: 3.09651e+09
myRowCount: 0
myRowCount: 247029006
myRowCount: 247028939
myRowCount: 247028972
myRowCount: -1976231961
myRowCount: 0
myRowCount: -342503438
myRowCount: 247028973
myRowCount: 247029038
myRowCount: 0
myRowCount: 0
myRowCount: 247029005
myRowCount: 0
myRowCount: 247029019
myRowCount: 0
myRowCount: 247029009
terminate called after throwing an instance of 'std::runtime_error'
terminate called recursively
  what():  ERROR: RMM runtime call  RMM_ALLOC((&cooRowNew), (sizeof(idx_t) * myRowCount), (nullptr))out of memory
/home/nfs/iroy/anaconda3/envs/cugraph_env/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
  len(cache))
distributed.nanny - WARNING - Worker process 70692 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
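
Note on the log above: two workers report a negative myRowCount (-1976231961 and -342503438), which is what a row count looks like after it wraps around a 32-bit signed integer. The issue does not state the root cause, so the following is only a sketch under assumptions: it assumes idx_t is a 32-bit signed integer, a 64-bit size_t, and that the count is narrowed without a range check. The helper names (requested_bytes, wrap_to_idx, fits_in_idx) are hypothetical and are not part of cuGraph or RMM. It shows why `sizeof(idx_t) * myRowCount` with a negative count would ask the allocator for far more than the ~3.4e10 bytes each GPU reports, which is consistent with the "out of memory" error.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumption: the index type in the log is a 32-bit signed integer.
using idx_t = std::int32_t;

// Byte count an allocator would receive for sizeof(idx_t) * count.
// A negative count is converted to an unsigned std::size_t before the
// multiply, so it becomes an enormous value instead of an error.
inline std::size_t requested_bytes(idx_t count) {
    return sizeof(idx_t) * static_cast<std::size_t>(count);
}

// Narrow a 64-bit count to idx_t the way an unchecked cast behaves on
// mainstream compilers (modular wraparound; guaranteed since C++20).
inline idx_t wrap_to_idx(std::int64_t count) {
    return static_cast<idx_t>(static_cast<std::uint32_t>(count));
}

// Checked alternative: true only if the count is representable in idx_t.
inline bool fits_in_idx(std::int64_t count) {
    return count >= 0 && count <= std::numeric_limits<idx_t>::max();
}
```

As an illustration only, a count of 2318735335 does not fit in 32 bits and wraps to -1976231961, the value printed by one worker. Whether that was the actual count on that worker is not known from the log. A range check such as fits_in_idx before the allocation, or a 64-bit count type, would turn the failure into an explicit error rather than an oversized allocation request.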

To Reproduce
Run the HiBench test on bigdatax2 on a DGX in the lab (with the Nx comparison disabled).

afender added the "bug" and "? - Needs Triage" labels Jul 22, 2019
afender added this to the 0.9.0 milestone Jul 22, 2019
afender removed the "? - Needs Triage" label Jul 22, 2019
afender changed the title from "[BUG] coo2csr failure on 64GB input" to "[BUG] [MG] coo2csr failure on 64GB input" Jul 22, 2019

afender commented Jul 24, 2019

Closed by #410.

afender closed this as completed Jul 24, 2019