-
Notifications
You must be signed in to change notification settings - Fork 540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement predict_per_tree() in FIL #5303
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! I left some inline feedback on details, but in general it looks quite good.
The one larger thing I'd like to see us clean up is how we're handling the shared memory to global memory fallback. Rather than having custom logic at every point we need to touch that, it would be nice to encapsulate all of that in shared_memory_buffer
. Specifically, I would love to see the signature for fill
change to
fill(index_type element_count, T value=T{}, T* fallback_buffer=nullptr)
Then, we don't have to have any special logic for when we are or are not using the fallback. Before we launch the kernel, we allocate global memory if we detect that we don't have enough shared memory. Otherwise, we have an empty buffer. We then pass that pointer all the way through the layers without any other processing down to shared_memory_buffer
. If it sees that the fill
is going to fail, it fills the fallback buffer and returns that pointer. That keeps the logic for all different prediction types the same, and it encapsulates any special handling in the shared_memory_buffer
object.
Other than that, I think we're looking pretty great!
FYI, I renamed the enum throughout the codebase so that it's not confused with the array type.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful! Everything about this is great. I had one small request for where we put the output_t
logic, but it should not hold up a merge unnecessarily.
I'm testing for perf regressions now, and we can merge as soon as that gets cleared. If we get that output_t
change in before that's done, great, but otherwise I'm fine with merging as is.
/merge |
This reverts commit ecd4d02 to avoid a race condition.
No description provided.