-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add JNI methods for detecting and purging non-empty nulls from LIST and STRUCT #12742
Add JNI methods for detecting and purging non-empty nulls from LIST and STRUCT #12742
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case for this? We are working on the cudf side, trying to totally eliminate the non-empty null issue.
This is needed because we need to support HashAggregate[List] which cannot be done unless we support sorting with non-empty nulls, so as a work around cudf has these functions which we can use in the interim. For reference checkout NVIDIA/spark-rapids#6679, NVIDIA/spark-rapids#5430 and perhaps follow the trail from NVIDIA/spark-rapids#6680. How far is the team in figuring out the solution for this and when is the fix in cudf expected to go in? An alternative, and my preference, would be to let this go in for now and revert it when cudf is ready. |
I see the issue with In addition, the PR description doesn't have any context/meaningful description. Please update it. |
ColumnView in Java just calls into the cpp code, if cpp guarantees there will be no non-empty nulls then we won't need to call this from Java. As you said if we are working on a proper fix this wouldn't be needed hopefully. |
I am having difficulty running pre-commit hooks and thus having to push multiple updates due to formatting errors |
You should run |
/test |
re-run tests |
CI Java failed due to:
|
Yes, I can see that thanks. The reason why I was confused about the CI failure was that the tests are passing locally for me, so I was wondering if the failure was because of something going haywire in the CI because the "update branch" was enabled. |
The CI job is failing but when I try to run the exact CI jobs using @ajschmidt8 help and by following steps listed here. The tests pass for me locally and inside rapidsai ci docker. We are still looking into the cause of this |
List<Integer> list1 = Arrays.asList(4, 5, null); | ||
List<Integer> list2 = Arrays.asList(7, 8, 9); | ||
List<Integer> list3 = null; | ||
ColumnVector input = makeListsColumn(DType.INT32, list0, list1, list2, list3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI failed because there is memory leak in Java tests. Sorry I overlooked the Java tests. Please fix them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ColumnVector
as well as HostColumnVector
must be wrapped in try
blocks. Probably the buffer vars need to be wrapped too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... makes me want to have this failure on my local so we can avoid this. It will still be good to get to the bottom of this discrepancy.
In the meanwhile, I will push a fix. Thanks for pointing us to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, the discrepancy isn't ideal.
Could it be caused by the types of GPUs that are used in CI vs. on our local machines?
I'm wondering if that or the amount of memory on a given GPU would contribute to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there is a variable that controls the test behavior, and that var is set by the local environment. @revans2 Do you know about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a memory leak happens and java's GC is fast, then the leak is cleared up before RMM is shut down and there is no error. If GC is slow, then the leak is not cleared and you get the error. We have a flag to print out messages to help debug leaks, but it is off by default because it makes the tests take a lot longer. Just set -Dai.rapids.refcount.debug=true
on the maven commend line when running the tests. Even if they pass you can then look at the logs to see if any leaks were detected and use the stack traces to track down what is leaked and why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in this case aren't the vectors off the heap? Did you mean the MemoryCleaner instead of GC?
@@ -6693,6 +6693,9 @@ void testApplyBooleanMaskFromListOfStructure() { | |||
} | |||
} | |||
|
|||
/** | |||
* The caller needs to make sure to close the returned ColumnView |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the returned ColumnView is not enough. There are more local variables/columns need to be closed.
* The caller needs to make sure to close the returned ColumnView | |
* The caller needs to make sure to call this function in a `try` block as local variables need to be closed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, I am rushing through this and context-switching isn't helping either. I think it will make sense for me to close all the local allocations and leave only the returned vector to be closed by the caller.
ColumnView colWithNonEmptyNulls = new ColumnView(input.type, input.rows, Optional.of(2L), dmb, | ||
input.getDeviceBufferFor(BufferType.OFFSET), input.getChildColumnViews()); | ||
assertEquals(2, colWithNonEmptyNulls.nullCount); | ||
return colWithNonEmptyNulls; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this returned vector should be in this try block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not the problem, the problem is that the input
column will be closed while the view won't. Unfortunately my build env is completely ruined and I will have to spend time to fix it before I can proceed. Thanks for your quick reviews
/merge |
Description
This PR adds methods for detecting and purging non-empty nulls.
Checklist