Enable serialization and shared memory #410
Conversation
Thanks for the feedback @jhp-lanl! I think it should all be addressed now, modulo (a) CI issues and (b) the EOSPAC question we discussed. I will resolve the latter in a separate MR after doing some testing in a self-contained reproducer.
Nope, it was actually unsanitized input in …
Wow! Thanks for the find and fix!
// extract each individual EOS and do something with it
std::vector<EOS> eos_host_vec = deserializer.eos_objects;
// get on device if you want
for (auto &eos : eos_host_vec) {
  auto eos_device = eos.GetOnDevice();
  // ... use eos_device in device kernels ...
}
This could be a can of worms at some point when we want to have 1 copy per GPU rather than 1 copy per node. Some host codes may want to hand us GPU memory, so we may need a facility to handle that case eventually.
What this will currently produce is 1 copy per MPI rank on device and 1 copy per node on host, because GetOnDevice will do its own allocation. The case where we want device shared memory is one I'm aware of but haven't thought deeply about. My suggestion is we punt on this, but I agree there's an engineering problem to solve here.
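For illustration, a minimal sketch of that behavior, assuming GetOnDevice returns a per-rank device-side copy (the return convention is my assumption, not confirmed in this thread):

// Host object backed by the node-shared window: one copy per node.
EOS eos_host = eos_host_vec[i];
// GetOnDevice does its own allocation, so every MPI rank that calls it
// gets an independent device copy: one copy per rank on device.
auto eos_device = eos_host.GetOnDevice();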
@@ -32,11 +32,21 @@ constexpr unsigned long all_values = (1 << 7) - 1;
 } // namespace thermalqs

 constexpr size_t MAX_NUM_LAMBDAS = 3;
-enum class DataStatus { Deallocated = 0, OnDevice = 1, OnHost = 2 };
+enum class DataStatus { Deallocated = 0, OnDevice = 1, OnHost = 2, UnManaged = 3 };
Do we have sufficient coverage in our testing to make sure we are catching the potential issues? I recall it took many of @Yurlungur's cycles to get this working correctly back in the day.
I believe so. I think our tests should catch this if there's an issue. But I can't promise that. The UnManaged tag just makes Finalize a no-op.
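For context, here is a minimal sketch (not the actual singularity-eos implementation) of how an UnManaged status can make Finalize a no-op; the Table struct and its members are hypothetical:

// Mirrors the enum added in this PR.
enum class DataStatus { Deallocated = 0, OnDevice = 1, OnHost = 2, UnManaged = 3 };

// Hypothetical table-like object illustrating the ownership rule only.
struct Table {
  double *data = nullptr;
  DataStatus status = DataStatus::Deallocated;

  void Finalize() {
    // UnManaged memory belongs to someone else (e.g., an MPI shared-memory
    // window handed to us by the host code), so we must not free it.
    if (status == DataStatus::UnManaged) return;
    if (status != DataStatus::Deallocated) {
      delete[] data;
      data = nullptr;
      status = DataStatus::Deallocated;
    }
  }
};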
}

rhoT_orig.Finalize();
rhoSie_orig.Finalize();
Maybe I knew this once and forgot, but I'm realizing that this will likely require substantial modifications to the Fortran interface to account for the new ways of initialization and finalization.
We will need to expose a serialization/deserialization interface on the Fortran side, yes. That will need to be on the tasking for next FY.
Good point! I'll start thinking about this when the work packages are more finalized.
Tests pass on re-git (thanks @rbberger for fixing the CI!) and this has been integration tested in riot downstream (thanks @AstroBarker!). So I'm calling this good and pulling the trigger.
PR Summary
This PR enables both serialization and shared memory for all equations of state. This is for use with MPI-window shared memory on, e.g., Rocinante and other limited environments. Note there is still an outstanding issue with the EOSPAC backend (see #408), which I will need to resolve in a future MR. I'm still thinking about how best to resolve that particular issue, and insight is welcome.
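To make the intended workflow concrete, here is a hedged sketch of how a host code might pair the serialization machinery with an MPI shared-memory window so the host-side tables live once per node. The MPI calls are standard MPI-3; the serialization helpers (SerializedSizeOfAllEos, SerializeAllEos, DeserializeAllEos) are hypothetical stand-ins for this PR's API, not its confirmed interface.

#include <cstddef>
#include <mpi.h>

// Hypothetical stand-ins for the serialization API introduced in this PR.
std::size_t SerializedSizeOfAllEos();    // assumption
void SerializeAllEos(char *dst);         // assumption
void DeserializeAllEos(char *src);       // assumption

void ShareEosAcrossNode() {
  // One communicator per shared-memory domain (i.e., per node).
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,
                      &node_comm);
  int node_rank;
  MPI_Comm_rank(node_comm, &node_rank);

  // Rank 0 on each node computes the serialized size; broadcast it.
  unsigned long size = (node_rank == 0) ? SerializedSizeOfAllEos() : 0;
  MPI_Bcast(&size, 1, MPI_UNSIGNED_LONG, 0, node_comm);

  // Allocate one shared window per node; other ranks map rank 0's segment.
  char *base = nullptr;
  MPI_Win win;
  MPI_Win_allocate_shared(node_rank == 0 ? (MPI_Aint)size : 0, 1,
                          MPI_INFO_NULL, node_comm, &base, &win);
  if (node_rank != 0) {
    MPI_Aint seg_size;
    int disp_unit;
    MPI_Win_shared_query(win, 0, &seg_size, &disp_unit, &base);
  }

  // Rank 0 serializes into the window; every rank then deserializes in
  // place, so the host-side copy exists exactly once per node.
  if (node_rank == 0) SerializeAllEos(base);
  MPI_Barrier(node_comm);
  DeserializeAllEos(base);
  // NOTE: the window must outlive the deserialized EOS objects; free it
  // (MPI_Win_free) only after they are finalized.
}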
Another outstanding issue is that the CI does not currently test against EOSPAC with shared memory enabled. When possible, we should update the CI and thread this through the spackage. @rbberger, maybe you can handle that in a separate PR at some point? See #409.
When shared memory is not enabled in EOSPAC, all the machinery still works: an EOSPAC EOS will report that it cannot utilize any shared memory, but the serialization and de-serialization routines will still work. The CI on Darwin tests this, and I've also tested it by hand.
In the meantime, I plan to test the shared memory case by hand after I get a more up-to-date EOSPAC environment set up through Spack.
PR Checklist
- Code is formatted: run the make format command after configuring with cmake.

If preparing for a new release, in addition please check the following:
- when='@main' dependencies are updated to the release version in the package.py