cuda: update interface to take 64-bit M #411
338b5e2
to
b4e1c4e
Compare
This breaks the cufinufft interface, so is the plan to go into 2.2.1? 2.3?
Right. The plan was to get this in before the 2.2.0 release (see discussion in #255). But perhaps that is too late now? In terms of breaking the interface, it's quite a benign breakage (so we may not need to wait until version 3.0). From what I understand, it will break the ABI (since we have
Thanks. Ah, I just read your response - hoping you're in the USA, not Sweden :).
Heh. In Sweden actually… This is prime work time!
Done.
Yeah, I'm thinking we wait on this. Would like to get @blackwer's eyes on this before I do anything.
91412f7
to
37dbab2
Compare
I don't see how this could cause any problems, so it looks good to me. It's not a pointer type and we're increasing the size, so it won't cause overflows. I wrongly said in another post that this had to do with
It is technically an API change, so semantic versioning would probably tell us to wait until 2.3 for this if it doesn't make it into 2.2. That said, given the automatic type casting, I'm inclined to ignore that, since it really shouldn't ever break anything (unless someone is linking 2.2 while using 2.2.1 headers or decided to make their own bindings). I personally vote we just go ahead and merge it, but I understand the reticence there.
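The point about automatic type casting can be illustrated with a small host-side sketch. The function name `set_num_points` is hypothetical and is not the cufinufft API; the sketch only shows that a caller still passing `int` compiles unchanged against an `int64_t` parameter, because the conversion is widening and value-preserving. It does not address the ABI concern (a binary built against the old headers and linked to the new library).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an interface function whose point-count
// parameter was widened from int to int64_t (not the real cufinufft API).
int64_t set_num_points(int64_t M) {
    return M;  // a real library would store M in its plan structure
}

// Existing caller code that still uses a 32-bit int for M.
int64_t legacy_caller() {
    int M = 1000000;
    // int -> int64_t is an implicit widening conversion: no cast is needed
    // and the value is preserved exactly.
    return set_num_points(M);
}
```

Recompiling `legacy_caller` against the widened declaration requires no source change, which is why the breakage is described as benign at the API level.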
Right. That was my thinking too. Still, I'm a bit concerned about how this interacts with finufft/src/cuda/2d/spread2d_wrapper.cu Lines 125 to 135 in 37dbab2
From what I understand,
Would tend to agree, but we can discuss later. As I said elsewhere, no need to rush this.
Can you add a 1d test with M>2.2e9, which becomes part of the CI for cufinufft? (At 1e9 NU pts/sec, it should take only 3 secs!) Thanks! Alex (Tip: keep the cuFFT size < int32.)
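I've been working on testing this yesterday and today. Some notes:
- The code as written doesn't work due to the (silent) overflows when assigning int64_t to int in a few places. This can be fixed, but requires more work.
- The set of possible problem sizes is actually quite limited on current hardware due to the limit on GPU memory. So far, I'm able to run 4e9 points in single precision without sorting. On the GPU, this takes up approximately 4e9 * 4 bytes for the locations, 4e9 * 8 bytes for the strengths and 4e9 * 4 bytes for idxnupts, which comes out to 60 GB. (Actually, the last one should be 4e9 * 8 so it can properly index the points, so the total comes out to 75 GB.)
- Running the tests is actually very slow (more than 3 seconds). The vast majority of the time is not taken up by the cufinufft call, but generating the data and checking the accuracy of the output (on the CPU). This can of course be sped up using multithreading.
So I guess the question here is whether it's worth the effort (do we foresee GPU memory going into triple or quadruple digits anytime soon?) to finish the conversion. It is doable, of course, but would require some time and increase complexity of the code.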
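As a rough check of the memory figures in this thread (4e9 points in 1D single precision, with 4 bytes per location, 8 bytes per complex strength, and a 4- or 8-byte index per point), the arithmetic can be sketched as follows. The function names are illustrative only and do not come from the cufinufft source.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed on the GPU for M nonuniform points in 1D, single precision.
// idx_bytes is 4 for a 32-bit index array and 8 for a 64-bit one.
int64_t point_storage_bytes(int64_t M, int64_t idx_bytes) {
    const int64_t loc_bytes = 4;       // one float coordinate per point
    const int64_t strength_bytes = 8;  // one complex<float> per point
    return M * (loc_bytes + strength_bytes + idx_bytes);
}

// Convert a byte count to GiB (2^30 bytes).
double to_gib(int64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

With `M = 4000000000`, this gives 64e9 bytes (about 59.6 GiB) for a 4-byte index and 80e9 bytes (about 74.5 GiB) for an 8-byte index, which is consistent with the quoted 60 and 75 if those figures are read as GiB.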
Interesting.
Actually int (signed) overflows at 2^31, so 2.2e9 exceeds it (you can halve
your RAM total).
But, I would be fine having the GPU code be int-only. Since idxnupts is 4
rather than 8 bytes, it allows the user more RAM for other stuff.
Best, Alex
…On Thu, May 9, 2024 at 11:50 AM Joakim Andén ***@***.***> wrote:
I've been working on testing this yesterday and today. Some notes:
- The code as written doesn't work due to the (silent) overflows when
assigning int64_t to int in a few places. This can be fixed, but
requires more work.
- The set of possible problem sizes is actually quite limited on
current hardware due to the limit on GPU memory. So far, I'm able to run
4e9 points in single precision without sorting. On the GPU, this takes up
approximately 4e9 * 4 bytes for the locations, 4e9 * 8 bytes for the
strengths and 4e9 * 4 bytes for idxnupts, which comes out to 60 GB.
(Actually, the last one should be 4e9 * 8 so it can properly index the
points, so the total comes out to 75 GB.)
- Running the tests is actually very slow (more than 3 seconds). The
vast majority of the time is not taken up by the cufinufft call, but
generating the data and checking the accuracy of the output (on the CPU).
This can of course be sped up using multithreading.
So I guess the question here is whether it's worth the effort (do we
foresee GPU memory going into triple or quadruple digits anytime soon?) to
finish the conversion. It is doable, of course, but would require some time
and increase complexity of the code.
Following discussion on Tuesday, we're scrapping the effort to transition the internals to handle 64-bit integers for the number of points. This is such a limited use case that it's not worth it at this point (may be relevant once we start having GPUs with more than 80 GB of memory). As a result, we're going to do the same thing as for the number of modes. That is, we allow 64-bit integers, but check whether these will fit into 32-bit integers and if so cast (otherwise, we error).
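A minimal sketch of the check-then-cast approach described here, assuming a 32-bit `int` for the internals. The helper name `narrow_to_int` and its error code are hypothetical and are not taken from the cufinufft source.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: accept a 64-bit count at the interface, but only
// proceed if it fits in the int used by the internals.
// Returns 0 on success and a nonzero error code otherwise.
int narrow_to_int(int64_t value, int *out) {
    if (value < 0 || value > std::numeric_limits<int>::max()) {
        return 1;  // negative, or too large for the int-based internals
    }
    *out = static_cast<int>(value);  // safe: the range was checked above
    return 0;
}
```

A request for 2.2e9 points is rejected with an error rather than silently wrapping, while any count up to 2^31 - 1 passes through unchanged. Rejecting negative values is an extra assumption of this sketch.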
While the interface changes, we still won't allow more than 2e9 points (for now), since this would complicate the code quite a bit with little tangible benefit (transforms of this size are currently out of range for most GPUs in terms of memory consumption).
Had to update Jenkins slightly to install dependencies without cache (
Not sure why this worked (a fresh Docker image should have no cache, correct?), but it got rid of the error message.
@janden do you need reviews from people on this before we bring it in? You'll have to remind them of the final decision:
Yes, I'll tag @blackwer.
It looks good to me after the typo gets fixed
For compatibility with FINUFFT.
This is quite an invasive patch and I'm not sure it's doing the right thing in several places (in particular, I'm not certain that CUDA is happy taking int64_t for block dimensions, etc.). That being said, it is compiling and tests are passing, so that's something.
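On the concern about block dimensions: CUDA's `dim3` stores each dimension as an `unsigned int`, so an `int64_t` passed there is implicitly narrowed. Below is a minimal host-side sketch of guarding that conversion, written in plain C++ with a hypothetical `Dim3` stand-in so that it compiles without the CUDA toolkit. The x-dimension grid limit of 2^31 - 1 is treated as an assumption here and should be checked against the device properties in real code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3, whose members are unsigned int.
struct Dim3 {
    unsigned int x, y, z;
};

// Assumed maximum grid size in x (2^31 - 1); query the device in real code.
constexpr int64_t kMaxGridX = 2147483647;

// Compute a 1D grid covering M items with `threads` threads per block.
// Returns false if the inputs are invalid or the block count does not fit.
bool make_grid_1d(int64_t M, int64_t threads, Dim3 *grid) {
    if (M <= 0 || threads <= 0) return false;
    const int64_t blocks = (M + threads - 1) / threads;  // ceiling division in 64 bits
    if (blocks > kMaxGridX) return false;
    *grid = Dim3{static_cast<unsigned int>(blocks), 1u, 1u};
    return true;
}
```

Doing the ceiling division in 64 bits and narrowing only after the range check avoids the silent truncation that an implicit `int64_t` to `unsigned int` conversion would allow.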