-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Java APIs to fetch CUDA runtime info [skip ci] #8465
Conversation
Signed-off-by: sperlingxx <[email protected]>
Signed-off-by: sperlingxx <[email protected]>
public enum CudaComputeMode { | ||
cudaComputeModeDefault(0), | ||
cudaComputeModeExclusive(1), | ||
cudaComputeModeProhibited(2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thee enums do not match the java naming convention, which is UPPER_CASE_WITH_UNDERSCORES. Also because all of them are under the CudaComputeMode
class we don't need to prefix them all with cudaComputeMode
Could you please change them to
public enum CudaComputeMode {
DEFAULT(0),
EXCLUSIVE(1),
PROHIBITED(2),
EXCLUSIVE_PROCESS(3);
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied the UPPER_CASE_WITH_UNDERSCORES style.
*/ | ||
public static CudaComputeMode getComputeMode() { | ||
int nativeMode = Cuda.getNativeComputeMode(); | ||
switch (nativeMode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally for other enums we put the mapping from native in the enum class itself. That way all of the info is in a single file and is consistent. We also can make the code a lot smaller if we don't worry about performance. Because this is not something that is going to be called in a tight loop where performance matters, I would much rather have smaller code with guaranteed consistency. You can use the BinaryOp.fromNative
as an example.
static BinaryOp fromNative(int nativeId) {
for (BinaryOp type : OPS) {
if (type.nativeId == nativeId) {
return type;
}
}
throw new IllegalArgumentException("Could not translate " + nativeId + " into a BinaryOp");
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the way of native mapping. For now, it follows above style.
@@ -399,6 +399,14 @@ public void testPoolLimitNonPoolMode() { | |||
() -> Rmm.initialize(RmmAllocationMode.CUDA_DEFAULT, false, 1024, 2048)); | |||
} | |||
|
|||
@Test | |||
public void testGetCudaRuntimeInfo() { | |||
Rmm.initialize(RmmAllocationMode.POOL, false, 1024); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we run these APIs without initializing the pool? Because in the common case I suspect that is how they are going to be used.
Also why are the tests a part of RMM. They should have nothing to do with RMM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this initialization is used to set the GPU device for the later operations, per the current JNI implementation. But even so, moving these tests to Cuda would be better.
And I am wondering whether all the calls in CudaJni need to set the device first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this case to a newly-created test class CudaTest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And there is no necessary to initialize RMM before running these APIs.
@@ -399,6 +399,14 @@ public void testPoolLimitNonPoolMode() { | |||
() -> Rmm.initialize(RmmAllocationMode.CUDA_DEFAULT, false, 1024, 2048)); | |||
} | |||
|
|||
@Test | |||
public void testGetCudaRuntimeInfo() { | |||
Rmm.initialize(RmmAllocationMode.POOL, false, 1024); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this initialization is used to set the GPU device for the later operations, per the current JNI implementation. But even so, moving these tests to Cuda would be better.
And I am wondering whether all the calls in CudaJni need to set the device first.
Signed-off-by: sperlingxx <[email protected]>
Signed-off-by: sperlingxx <[email protected]>
/** | ||
* Compute-exclusive-thread mode | ||
* Only one thread in one process will be able to use cudaSetDevice() with this device. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mode is deprecated. Better to add the same comment here.
(base) liangcail@liangcail-ubuntu18:~/work/projects/on_github/spark-rapids$ nvidia-smi -c 1
Warning: Exclusive_Thread was deprecated! Setting Exclusive_Process instead.
Unable to set the compute mode for GPU 00000000:01:00.0: Insufficient Permissions
Terminating early due to previous errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added warning message for the deprecation.
Signed-off-by: sperlingxx <[email protected]>
@gpucibot merge |
Closes #8084 #6363
Signed-off-by: sperlingxx [email protected]