Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes devices vector alloc to fix seg fault, removes unused RAFT code in PLC, re-enables full CI testing #3167

Merged

Conversation

rlratzel
Copy link
Contributor

@rlratzel rlratzel commented Jan 22, 2023

closes #3124

  • Adds check to avoid allocating and copying zero-length device vectors. This prevents the seg fault shown below.
  • Removes the special case to ignore seg faults in CI scripts
  • Adds a test to reproduce seg fault locally (see output below).

This PR addresses the problem shown below:

================================= test session starts =================================
platform linux -- Python 3.8.15, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/envs/test/bin/python3.8
cachedir: .pytest_cache
rapids_pytest_benchmark: 0.0.14
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /cugraph/python/pylibcugraph, configfile: pytest.ini
plugins: cov-4.0.0, rapids-pytest-benchmark-0.0.14, benchmark-3.2.3
collected 9 items / 8 deselected / 1 selected

python/pylibcugraph/pylibcugraph/tests/test_graph_sg.py::test_SGGraph_create_from_cudf
get edgelist...edgelist =     src  dst  wgt
0    0    1  0.0
1    1    2  0.1
2    2    4  0.2
done
create Graph...done
created SGGraph plc_graph=<pylibcugraph.graphs.SGGraph object at 0x7fb7e35f30f0>
PASSED

=========================== 1 passed, 8 deselected in 1.69s ===========================
Segmentation fault (core dumped)

@cjnolet found a work-around for us, so this should pass CI and can be merged after rapidsai/raft#1224

@rlratzel rlratzel added bug Something isn't working non-breaking Non-breaking change labels Jan 22, 2023
@rlratzel rlratzel added this to the 23.02 milestone Jan 22, 2023
@rlratzel rlratzel changed the title Removes special case to ignore seg faults in CI scripts, adds test to reproduce seg fault locally. Does not alloc and copy zero-length device vectors, removes special case to ignore seg faults in CI scripts, adds test to reproduce seg fault locally. Jan 25, 2023
@rlratzel rlratzel marked this pull request as ready for review January 25, 2023 18:52
@rlratzel rlratzel requested review from a team as code owners January 25, 2023 18:52
ajschmidt8
ajschmidt8 approved these changes Jan 25, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jan 26, 2023

Codecov Report

Base: 56.26% // Head: 56.26% // No change to project coverage 👍

Coverage data is based on head (3bba486) compared to base (64dabc3).
Patch has no changes to coverable lines.

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-23.02    #3167   +/-   ##
=============================================
  Coverage         56.26%   56.26%           
=============================================
  Files               153      153           
  Lines              9658     9658           
=============================================
  Hits               5434     5434           
  Misses             4224     4224           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@cjnolet cjnolet requested a review from a team as a code owner February 2, 2023 00:06
@rlratzel rlratzel added the DO NOT MERGE Hold off on merging; see PR for details label Feb 2, 2023
@rlratzel
Copy link
Contributor Author

rlratzel commented Feb 2, 2023

I can't request changes on my own PRs, so I added the DO NOT MERGE label so this isn't accidentally merged before commenting on if removing RAFT static linking still needs to/can be done: #3167 (comment)

…y linked again) for wheel builds. The change was done while debugging a crash, and is no longer needed.
@rlratzel rlratzel removed the DO NOT MERGE Hold off on merging; see PR for details label Feb 2, 2023
@rlratzel rlratzel changed the title Does not alloc and copy zero-length device vectors, removes special case to ignore seg faults in CI scripts, adds test to reproduce seg fault locally. Fixes devices vector alloc and removes RAFT code to fix seg fault, re-enables full CI testing Feb 2, 2023
@rlratzel rlratzel added the DO NOT MERGE Hold off on merging; see PR for details label Feb 2, 2023
@rlratzel
Copy link
Contributor Author

rlratzel commented Feb 2, 2023

Added DO NOT MERGE back until I've finished verifying it's safe to remove the RAFT sources from PLC, in the off chance CI passes even if they're needed.

@rlratzel rlratzel removed the DO NOT MERGE Hold off on merging; see PR for details label Feb 2, 2023
@rlratzel rlratzel changed the title Fixes devices vector alloc and removes RAFT code to fix seg fault, re-enables full CI testing Fixes devices vector alloc to fix seg fault, removes unused RAFT code in PLC, re-enables full CI testing Feb 2, 2023
@rlratzel
Copy link
Contributor Author

rlratzel commented Feb 2, 2023

/merge

@ChuckHastings
Copy link
Collaborator

Cancelled CI run. We need rapidsai/raft#1224 to be pushed into nightlies so that the change can be picked up by CI.

@ChuckHastings
Copy link
Collaborator

/merge

@rapids-bot rapids-bot bot merged commit f780add into rapidsai:branch-23.02 Feb 3, 2023
@rlratzel rlratzel deleted the branch-23.02-py_cleanup_crash branch September 28, 2023 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI crashes on pytest teardown
6 participants