Update UCX to 1.13.1 in CI and sets UCX_TLS=^posix #6573

Merged

@abellina abellina commented Sep 19, 2022

Fixes: #6572

This sets UCX_TLS=^posix, as recommended by the UCX team, so that UCX uses the SysV API for shared memory allocations instead of POSIX. In the smoke tests we have seen that /dev/shm may be too small on some of the VMs we are given for testing, and this is the recommended workaround (in the future, UCX is going to fall back to SysV automatically if POSIX fails).
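
As a rough illustration of the workaround described above, the sketch below builds an environment for a test process and adds UCX_TLS=^posix when /dev/shm looks too small. The helper name `ucx_env`, the `MIN_SHM_BYTES` threshold, and the conditional check are assumptions made for illustration; the PR itself simply sets the variable in CI, and it does not state how much shared memory UCX needs.

```python
import os
import shutil

# Hypothetical threshold: the PR does not say how much /dev/shm UCX requires.
MIN_SHM_BYTES = 1 << 30


def ucx_env(base_env=None, shm_path="/dev/shm", min_free=MIN_SHM_BYTES):
    """Return a copy of the environment, excluding the UCX posix transport
    when the shared memory mount has less than min_free bytes available."""
    env = dict(os.environ if base_env is None else base_env)
    try:
        free = shutil.disk_usage(shm_path).free
    except OSError:
        # Treat a missing or unreadable mount as having no space.
        free = 0
    # Respect an explicit UCX_TLS setting if the caller already provided one.
    if free < min_free and "UCX_TLS" not in env:
        env["UCX_TLS"] = "^posix"
    return env
```

The returned dictionary could then be passed as the `env` argument of `subprocess.run` when launching the smoke tests. Setting the variable unconditionally, as this PR does, is simpler and avoids guessing a threshold.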

@abellina abellina requested a review from pxLi September 19, 2022 15:42
@abellina abellina changed the title Updates UCX to 1.13.1 in Dockerfile-blossom.ubuntu and sets UCX_TLS=^… Update UCX to 1.13.1 in CI and sets UCX_TLS=^posix Sep 19, 2022
@abellina
Collaborator Author

Note, for this PR to be mergeable [databricks] should get added to the title and the build should pass.

@abellina abellina force-pushed the shuffle/ucx_update_tls_for_smoke_tests branch from 1dd7d08 to 9ae20f1 Compare September 19, 2022 15:44
@sameerz sameerz added the bug Something isn't working label Sep 19, 2022
@abellina
Collaborator Author

build

@abellina
Collaborator Author

Note that we need to revert this change if #6579 is merged.

@abellina
Collaborator Author

> Note, for this PR to be mergeable [databricks] should get added to the title and the build should pass.

This is not an issue on Databricks; see #6572 (comment).

@abellina abellina marked this pull request as ready for review September 20, 2022 16:23
@abellina
Collaborator Author

build

@abellina abellina merged commit 998c76e into NVIDIA:branch-22.10 Sep 20, 2022
@abellina abellina deleted the shuffle/ucx_update_tls_for_smoke_tests branch September 20, 2022 19:08
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

[BUG] UCX smoke tests can fail with OOM when initializing UCX
3 participants