Enable serialization and shared memory #410

Merged
merged 56 commits into from
Sep 6, 2024
56 commits
53bc81a
temporarily switch to spiner serialization branch
jonahm-LANL Aug 14, 2024
bc4b50a
It works for analytic EOSs and modifiers and the variant
jonahm-LANL Aug 15, 2024
3e9e9f7
Merge branch 'jmm/cxx17' into jmm/serialization
jonahm-LANL Aug 15, 2024
073dbe5
add CheckParams as official part of API
jonahm-LANL Aug 20, 2024
c9ac614
CheckParams docs
jonahm-LANL Aug 20, 2024
b11a8de
got a little paranoid about CRTP
jonahm-LANL Aug 20, 2024
8424317
serialization routines implemented in stellar collapse, with some nic…
jonahm-LANL Aug 20, 2024
f84ed1d
stellar collapse EOS has serialization now
jonahm-LANL Aug 20, 2024
36d9211
SpinerEOSDependsRhoT check params
jonahm-LANL Aug 20, 2024
e6289d8
add GetDataBoxPointers_ to SpinerEOSDependsRhoT
jonahm-LANL Aug 20, 2024
44822df
copy paste feels bad
jonahm-LANL Aug 20, 2024
d438ce2
spiner EOS compiles now
jonahm-LANL Aug 20, 2024
f7bb9fc
cleaned up base class and modifiers a little bit
jonahm-LANL Aug 21, 2024
8dab338
table bounds -> table utils. Jeff was right.
jonahm-LANL Aug 21, 2024
2abb8b8
thread SpinerTricks through a few models
jonahm-LANL Aug 21, 2024
d8adfa3
add test for spiner tricks
jonahm-LANL Aug 21, 2024
3699b42
Add some sanity checks to base class. Fix bug in base class. Add test…
jonahm-LANL Aug 21, 2024
1505a33
add helmholtz tests
jonahm-LANL Aug 21, 2024
0f0c5bb
eospac serialization. Still need to test. Time for Darwin.
jonahm-LANL Aug 22, 2024
93f302c
Redesign with shared mem settings in plac
jonahm-LANL Aug 22, 2024
1fbf448
completed testing shared memory routines for stellar collapse
jonahm-LANL Aug 22, 2024
0e302d8
change spinereosdependsrhosie to work like dependsrhot
jonahm-LANL Aug 23, 2024
528294c
add test for tabulated EOS... now to make sure it works
jonahm-LANL Aug 23, 2024
2774533
the code finally compiles on darwin
jonahm-LANL Aug 23, 2024
dab9759
formatting
jonahm-LANL Aug 23, 2024
ba09841
tweaks
jonahm-LANL Aug 23, 2024
8e1be78
Merge branch 'jmm/serialization' of github.com:lanl/singularity-eos i…
jonahm-LANL Aug 23, 2024
2159368
added hidden static size
jonahm-LANL Aug 23, 2024
bfc8ed5
trying to debug eospac
jonahm-LANL Aug 23, 2024
d89aec3
Give up and expose shared memory and dynamic memory as different conc…
jonahm-LANL Aug 24, 2024
50dc6f3
document
jonahm-LANL Aug 24, 2024
2942ac1
tests pass with eospac
jonahm-LANL Aug 24, 2024
1d93b72
Merge branch 'jmm/serialization' of github.com:lanl/singularity-eos i…
jonahm-LANL Aug 24, 2024
a73245d
formatting
jonahm-LANL Aug 24, 2024
263bea6
CC, CHANGELOG, and some minor typos
jonahm-LANL Aug 24, 2024
7e5cf15
Merge branch 'main' into jmm/serialization
Yurlungur Aug 24, 2024
9d2b76c
Add BulkSerializer
jonahm-LANL Aug 26, 2024
d8f92c9
typos Brandon caught
jonahm-LANL Aug 26, 2024
ade9791
Merge branch 'jmm/serialization' of github.com:lanl/singularity-eos i…
jonahm-LANL Aug 26, 2024
7856aa5
typo
jonahm-LANL Aug 27, 2024
9f24343
Update doc/sphinx/src/using-eos.rst
Yurlungur Aug 27, 2024
1214961
jhp suggested doc changes
jonahm-LANL Aug 27, 2024
da45bb2
Merge branch 'jmm/serialization' of github.com:lanl/singularity-eos i…
jonahm-LANL Aug 27, 2024
f190325
jhp comments round 2
jonahm-LANL Aug 27, 2024
44650e6
Update singularity-eos/eos/eos_gruneisen.hpp
Yurlungur Aug 27, 2024
85d5119
Merge branch 'jmm/serialization' of github.com:lanl/singularity-eos i…
jonahm-LANL Aug 27, 2024
b16e9f5
formatting
jonahm-LANL Aug 27, 2024
ee7b62e
extended MPI example a bit
jonahm-LANL Aug 27, 2024
b7cd6cd
update docs with MPI info
jonahm-LANL Aug 27, 2024
a0cc47a
oops scoping
jonahm-LANL Aug 27, 2024
5d7cb38
actually we can't use views they don't have built in iterators... onl…
jonahm-LANL Aug 27, 2024
8375015
Default Resizer is MemberResizer.
jonahm-LANL Aug 27, 2024
b403596
doc typo thanks Brandon for the catch
jonahm-LANL Aug 29, 2024
00a191b
dholladay comments
jonahm-LANL Sep 3, 2024
a8c43bd
bug in deserialize based on when SharedMemorySizeInBytes is available
jonahm-LANL Sep 5, 2024
6e3ca63
Merge branch 'main' into jmm/serialization
jonahm-LANL Sep 5, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
Expand Up @@ -3,6 +3,7 @@
## Current develop

### Added (new features/APIs/variables/...)
- [[PR410]](https://github.com/lanl/singularity-eos/pull/410) Enable serialization and de-serialization
- [[PR330]](https://github.com/lanl/singularity-eos/pull/330) Piecewise grids for Spiner EOS.

### Fixed (Repair bugs, etc)
6 changes: 6 additions & 0 deletions CMakeLists.txt
Expand Up @@ -47,6 +47,9 @@ cmake_dependent_option(
option(SINGULARITY_USE_FORTRAN "Enable fortran bindings" ON)
option(SINGULARITY_USE_KOKKOS "Use Kokkos for portability" OFF)
option(SINGULARITY_USE_EOSPAC "Enable eospac backend" OFF)
option(SINGULARITY_EOSPAC_ENABLE_SHMEM
Review comment (PR author): I know SHMEM is something else, but hopefully next to EOSPAC it's clear what this means.

"Support shared memory with EOSPAC backend. Requires EOSPAC version 6.5.7."
OFF)

# TODO This should be dependent option (or fortran option)
option(SINGULARITY_BUILD_CLOSURE "Mixed Cell Closure" ON)
Expand Down Expand Up @@ -271,6 +274,9 @@ endif()
if(SINGULARITY_USE_SPINER_WITH_HDF5)
target_compile_definitions(singularity-eos_Interface INTERFACE SINGULARITY_USE_SPINER_WITH_HDF5)
endif()
if (SINGULARITY_USE_EOSPAC AND SINGULARITY_EOSPAC_ENABLE_SHMEM)
target_compile_definitions(singularity-eos_Interface INTERFACE SINGULARITY_EOSPAC_ENABLE_SHARED_MEMORY)
endif()

# ------------------------------------------------------------------------------#
# Handle dependencies
1 change: 1 addition & 0 deletions doc/sphinx/src/building.rst
Expand Up @@ -105,6 +105,7 @@ The main CMake options to configure building are in the following table:
``SINGULARITY_USE_FORTRAN`` ON Enable Fortran API for equation of state.
``SINGULARITY_USE_KOKKOS`` OFF Uses Kokkos as the portability backend. Currently only Kokkos is supported for GPUs.
``SINGULARITY_USE_EOSPAC`` OFF Link against EOSPAC. Needed for sesame2spiner and some tests.
``SINGULARITY_EOSPAC_ENABLE_SHMEM`` OFF Enable shared memory support in EOSPAC backend.
``SINGULARITY_BUILD_CLOSURE`` OFF Build the mixed cell closure models
``SINGULARITY_BUILD_TESTS`` OFF Build test infrastructure.
``SINGULARITY_BUILD_PYTHON`` OFF Build Python bindings.
294 changes: 286 additions & 8 deletions doc/sphinx/src/using-eos.rst
Expand Up @@ -19,13 +19,14 @@ The parallelism model
----------------------

For the most part, ``singularity-eos`` tries to be agnostic to how you
parallelize your code on-node. (It knows nothing at all about
distributed memory parallelism.) An ``EOS`` object can be copied into
any parallel code block by value (see below) and scalar calls do not
attempt any internal multi-threading, meaning ``EOS`` objects are not
thread-safe, but are compatible with thread safety, assuming the user
calls them appropriately. The main complication is ``lambda`` arrays,
which are discussed below.
parallelize your code on-node. It knows nothing at all about
distributed memory parallelism, with one exception, discussed
below. An ``EOS`` object can be copied into any parallel code block by
value (see below) and scalar calls do not attempt any internal
multi-threading, meaning ``EOS`` objects are not thread-safe, but are
compatible with thread safety, assuming the user calls them
appropriately. The main complication is ``lambda`` arrays, which are
discussed below.

The vector ``EOS`` method overloads are a bit different. These are
thread-parallel operations launched by ``singularity-EOS``. They run
Expand All @@ -39,6 +40,271 @@ A more generic version of the vector calls exists in the ``Evaluate``
method, which allows the user to specify arbitrary parallel dispatch
models by writing their own loops. See the relevant section below.

Serialization and shared memory
--------------------------------

While ``singularity-eos`` makes a best effort to be agnostic to
parallelism, it exposes several methods that are useful in a
distributed memory environment. In particular, there are two use-cases
the library seeks to support:

#. To avoid stressing a filesystem, it may be desirable to load a table from one thread (e.g., MPI rank) and broadcast this data to all other ranks.
#. To save memory it may be desirable to place tabulated data, which is read-only after it has been loaded from file, into shared memory on a given node, even if all other data is thread local in a distributed-memory environment. This is possible via, e.g., `MPI Windows`_.

Therefore ``singularity-eos`` exposes several methods that can be used
in this context. The function

.. cpp:function:: std::size_t EOS::SerializedSizeInBytes() const;

returns the amount of memory required in bytes to hold a serialized
EOS object. The return value will depend on the underlying equation of
state model currently contained in the object. The function

.. cpp:function:: std::size_t EOS::SharedMemorySizeInBytes() const;

returns the amount of data (in bytes) that a given object can place into shared memory. Again, the return value depends on the model the object currently represents.

.. note::

Many models may not be able to utilize shared memory at all. This
holds for most analytic models, for example. The ``EOSPAC`` backend
will only utilize shared memory if the ``EOSPAC`` version is sufficiently recent
to support it and if ``singularity-eos`` is built with shared memory
support for ``EOSPAC`` (enabled with
``-DSINGULARITY_EOSPAC_ENABLE_SHMEM=ON``).

The function

.. cpp:function:: std::size_t EOS::Serialize(char *dst);

fills the ``dst`` pointer with the memory required for serialization
and returns the number of bytes written to ``dst``. The function

.. cpp:function:: std::pair<std::size_t, char*> EOS::Serialize();

allocates a ``char*`` pointer to contain serialized data and fills
it.

.. warning::

Serialization and de-serialization may only be performed on objects
that live in host memory, before you have called
``eos.GetOnDevice()``. Attempting to serialize device-initialized
objects is undefined behavior, but will likely result in a
segmentation fault.

The returned pair contains the size of the allocation in bytes and the pointer to it. The function

.. code-block:: cpp

std::size_t EOS::DeSerialize(char *src,
const SharedMemSettings &stngs = DEFAULT_SHMEM_STNGS)

sets an EOS object based on the serialized representation contained in
``src``. It returns the number of bytes read from ``src``. Optionally,
``DeSerialize`` may also write the data that can be shared to a
pointer contained in ``SharedMemSettings``. If you do this, you must
pass this pointer in, but designate only one thread per shared memory
domain (frequently a node or socket) to actually write to this
data. ``SharedMemSettings`` is a struct containing a ``data`` pointer
and an ``is_domain_root`` boolean:

.. code-block:: cpp

struct SharedMemSettings {
SharedMemSettings();
SharedMemSettings(char *data_, bool is_domain_root_)
: data(data_), is_domain_root(is_domain_root_) {}
char *data = nullptr; // defaults
bool is_domain_root = false;
};

The ``data`` pointer should point to a shared memory allocation. The
``is_domain_root`` boolean should be true for exactly one thread per
shared memory domain.
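
The roles of ``data`` and ``is_domain_root`` can be sketched with a toy helper. The struct below mirrors the one this PR adds to ``constants.hpp``; ``PlaceTable`` is a hypothetical function invented for this illustration and is not part of ``singularity-eos``. It only mimics the behavior described above: without a shared pointer the table aliases ``src``, and with one, only the domain root copies while every caller ends up reading from the shared buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Mirrors the struct added to constants.hpp in this PR.
struct SharedMemSettings {
  SharedMemSettings() = default;
  SharedMemSettings(char *data_, bool is_domain_root_)
      : data(data_), is_domain_root(is_domain_root_) {}
  bool CopyNeeded() const { return (data != nullptr) && is_domain_root; }
  char *data = nullptr;
  bool is_domain_root = false;
};

// Hypothetical helper (an assumption for illustration only): decide where
// a caller's table pointer should point after de-serialization.
inline const char *PlaceTable(const char *src, std::size_t nbytes,
                              const SharedMemSettings &stngs) {
  // No shared memory requested: keep pointing into the serialized buffer.
  if (stngs.data == nullptr) return src;
  // Exactly one caller per shared memory domain performs the copy.
  if (stngs.CopyNeeded()) std::memcpy(stngs.data, src, nbytes);
  // Every caller in the domain reads from the shared allocation.
  return stngs.data;
}
```

With default settings the returned pointer is ``src`` itself, which is why the serialized buffer must outlive the object in that case.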

For example you might call ``DeSerialize`` as

.. code-block:: cpp

std::size_t read_size = eos.DeSerialize(packed_data,
singularity::SharedMemSettings(shared_data,
my_rank % NTHREADS == 0));
assert(read_size == write_size); // for safety

.. warning::

Note that for equation of state models that have dynamically
allocated memory, ``singularity-eos`` reserves the right to point
directly at data in ``src``, so it **cannot** be freed until you
would call ``eos.Finalize()``. If the ``SharedMemSettings`` are
utilized to request data be written to a shared memory pointer,
however, you can free the ``src`` pointer, so long as you don't free
the shared memory pointer.
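
As a rough illustration of the size/serialize/de-serialize contract described above, the following sketch uses a hypothetical ``ToyIdealGas`` type standing in for an analytic model with no dynamically allocated data. It is not the ``singularity-eos`` implementation; ``RoundTrips`` is an illustrative helper, and the use of ``malloc``/``free`` for the buffer is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <utility>

// Hypothetical stand-in for an analytic EOS: plain data, nothing to share.
struct ToyIdealGas {
  double gm1 = 0.0; // adiabatic index minus one
  double Cv = 0.0;  // specific heat

  std::size_t SerializedSizeInBytes() const { return sizeof(ToyIdealGas); }
  std::size_t SharedMemorySizeInBytes() const { return 0; }

  // Fill a caller-provided buffer; return the number of bytes written.
  std::size_t Serialize(char *dst) const {
    std::memcpy(dst, this, sizeof(ToyIdealGas));
    return sizeof(ToyIdealGas);
  }
  // Allocate a buffer, fill it, and return (size, pointer).
  std::pair<std::size_t, char *> Serialize() const {
    char *dst = static_cast<char *>(std::malloc(SerializedSizeInBytes()));
    std::size_t written = Serialize(dst);
    return std::make_pair(written, dst);
  }
  // Rebuild the object from a buffer; return the number of bytes read.
  std::size_t DeSerialize(const char *src) {
    std::memcpy(this, src, sizeof(ToyIdealGas));
    return sizeof(ToyIdealGas);
  }
};

// True if a serialize/de-serialize cycle reproduces the model and the
// byte counts agree, which is the check the assert in the text performs.
inline bool RoundTrips(const ToyIdealGas &in) {
  std::pair<std::size_t, char *> packed = in.Serialize();
  ToyIdealGas out;
  std::size_t read = out.DeSerialize(packed.second);
  std::free(packed.second); // safe here: the toy model copies, never aliases
  return (read == packed.first) && (out.gm1 == in.gm1) && (out.Cv == in.Cv);
}
```

Freeing the buffer immediately is only safe because this toy copies everything out of it; a model that points into ``src`` would need the buffer kept alive, as the warning above states.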

Putting everything together, a full sequence with MPI might look like this:

.. code-block:: cpp

singularity::EOS eos;
std::size_t packed_size, shared_size;
char *packed_data;
if (rank == 0) { // load eos object
eos = singularity::StellarCollapse(filename);
packed_size = eos.SerializedSizeInBytes();
shared_size = eos.SharedMemorySizeInBytes();
}

// Send sizes
MPI_Bcast(&packed_size, 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
MPI_Bcast(&shared_size, 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);

// Allocate data needed
packed_data = (char*)malloc(packed_size);
if (rank == 0) {
eos.Serialize(packed_data);
eos.Finalize(); // Clean up this EOS object so it can be reused.
}
MPI_Bcast(packed_data, packed_size, MPI_BYTE, 0, MPI_COMM_WORLD);

// the default doesn't do shared memory.
// we will change it below if shared memory is enabled.
singularity::SharedMemSettings settings = singularity::DEFAULT_SHMEM_STNGS;

char *shared_data;
char *mpi_base_pointer;
int mpi_unit;
MPI_Aint query_size;
MPI_Win window;
MPI_Comm shared_memory_comm;
int island_rank, island_size; // rank in, size of shared memory region
if (use_mpi_shared_memory) {
// Generate shared memory comms
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &shared_memory_comm);
// rank on a region that shares memory
MPI_Comm_rank(shared_memory_comm, &island_rank);
// size on a region that shares memory
MPI_Comm_size(shared_memory_comm, &island_size);

// Create the MPI shared memory object and get a pointer to shared data
// this allocation is a collective and must be called on every rank.
// the total size of the allocation is the sum over ranks in the shared memory comm
// of requested memory. So it's valid to request all you want on rank 0 and nothing
// on the remaining ranks.
MPI_Win_allocate_shared((island_rank == 0) ? shared_size : 0,
1, MPI_INFO_NULL, shared_memory_comm, &mpi_base_pointer,
&window);
// This gets a pointer to the shared memory allocation valid in local address space
// on every rank
MPI_Win_shared_query(window, MPI_PROC_NULL, &query_size, &mpi_unit, &shared_data);
// Mutex for MPI window. Writing to shared memory currently allowed.
MPI_Win_lock_all(MPI_MODE_NOCHECK, window);
// Set SharedMemSettings
settings.data = shared_data;
settings.is_domain_root = (island_rank == 0);
}
eos.DeSerialize(packed_data, settings);
if (use_mpi_shared_memory) {
MPI_Win_unlock_all(window); // Writing to shared memory disabled.
MPI_Barrier(shared_memory_comm);
free(packed_data);
}

In the case where many EOS objects may be active at once, you can
combine serialization and comm steps. You may wish to, for example,
have a single pointer containing all serialized EOSs. The same goes for the
shared memory. ``singularity-eos`` provides machinery to do this in
the ``singularity-eos/base/serialization_utils.hpp`` header. This
provides a helper struct, ``BulkSerializer``:

.. code-block:: cpp

template<typename Container_t, typename Resizer_t = MemberResizer>
singularity::BulkSerializer

which may be initialized by a collection of ``EOS`` objects or by
simply assigning (or constructing) its member field, ``eos_objects``
appropriately. An example ``Container_t`` might be
``std::vector<EOS>``. A specialization for ``vector`` is provided as
``VectorSerializer``. The ``Resizer_t`` is a functor that knows how to
resize a collection. For example, the ``MemberResizer`` functor used
for ``std::vector`` is

.. code-block:: cpp

struct MemberResizer {
template<typename Collection_t>
void operator()(Collection_t &collection, std::size_t count) {
collection.resize(count);
}
};

which will work for any ``stl`` container with a ``resize`` method.
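
To make the resizer idea concrete, here is a small sketch of how a custom functor could serve a container that has no ``resize`` member. ``MemberResizer`` is copied from the text above; ``FixedBuffer``, ``FixedBufferResizer``, and ``ResizeAndCount`` are hypothetical names invented for this example and do not exist in ``singularity-eos``.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// As shown above: works for anything with a resize() member.
struct MemberResizer {
  template <typename Collection_t>
  void operator()(Collection_t &collection, std::size_t count) {
    collection.resize(count);
  }
};

// Hypothetical fixed-capacity container with size() and iterators
// but no resize() member.
struct FixedBuffer {
  static constexpr std::size_t capacity = 8;
  int data[capacity] = {};
  std::size_t count = 0;
  std::size_t size() const { return count; }
  int *begin() { return data; }
  int *end() { return data + count; }
};

// Hypothetical custom resizer for FixedBuffer.
struct FixedBufferResizer {
  void operator()(FixedBuffer &buf, std::size_t count) {
    assert(count <= FixedBuffer::capacity);
    buf.count = count;
  }
};

// Generic code only needs something callable as resizer(collection, n).
template <typename Collection_t, typename Resizer_t = MemberResizer>
std::size_t ResizeAndCount(Collection_t &collection, std::size_t n) {
  Resizer_t resizer;
  resizer(collection, n);
  return collection.size();
}
```

The default template argument keeps the common ``std::vector`` case free of boilerplate, while unusual containers supply their own functor.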

The ``BulkSerializer`` provides all the above-described serialization
functions for ``EOS`` objects: ``SerializedSizeInBytes``,
``SharedMemorySizeInBytes``, ``Serialize``, and ``DeSerialize``, but
it operates on all ``EOS`` objects contained in the container it
wraps, not just one. Example usage might look like this:

.. code-block:: cpp

std::size_t packed_size, shared_size;
singularity::VectorSerializer<EOS> serializer;
if (rank == 0) { // load eos object
// Code to initialize a bunch of EOS objects into a std::vector<EOS>
/*
Initialization code goes here
*/
serializer = singularity::VectorSerializer<EOS>(eos_vec);
packed_size = serializer.SerializedSizeInBytes();
shared_size = serializer.SharedMemorySizeInBytes();
}

// Send sizes
MPI_Bcast(&packed_size, 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
MPI_Bcast(&shared_size, 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);

// Allocate data needed
packed_data = (char*)malloc(packed_size);
if (rank == 0) {
serializer.Serialize(packed_data);
serializer.Finalize(); // Clean up all EOSs owned by the serializer
}
MPI_Bcast(packed_data, packed_size, MPI_BYTE, 0, MPI_COMM_WORLD);

singularity::SharedMemSettings settings = singularity::DEFAULT_SHMEM_STNGS;
// same MPI declarations as above
if (use_mpi_shared_memory) {
// same MPI code as above including setting the settings
settings.data = shared_data;
settings.is_domain_root = (island_rank == 0);
}
singularity::VectorSerializer<EOS> deserializer;
deserializer.DeSerialize(packed_data, settings);
if (use_mpi_shared_memory) {
// same MPI code as above
}
// extract each individual EOS and do something with it
std::vector<EOS> eos_host_vec = deserializer.eos_objects;
// get on device if you want
for (auto eos : eos_host_vec) {
Review comment (collaborator): This could be a can of worms at some point when we want to have 1 copy per GPU rather than 1 copy per node. Some host codes may want to hand us GPU memory so we may need to have a facility to handle that case at some point.

Reply (PR author): What this will currently produce is 1 copy per MPI rank on device and 1 copy per node on host, because GetOnDevice will do its own allocation. The case where we want device shared memory is one I'm aware of but haven't thought deeply about. My suggestion is we punt on this but agree there's an engineering problem to solve here.

EOS eos_device = eos.GetOnDevice();
// ...
}

It is also possible to (with care) mix serializers... i.e., you might
serialize with a ``VectorSerializer`` and de-serialize with a
different container, as all that is required is that a container have
a ``size``, provide iterators, and be capable of being resized.
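
The container requirements listed above (a ``size``, iterators, and resizability) can be illustrated with a toy bulk serializer. Everything in this sketch is hypothetical: ``ToyEOS`` stands in for a real model, and the byte layout (a leading element count followed by each object's bytes) is an assumption made for the example, not the layout ``BulkSerializer`` actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <vector>

// Hypothetical stand-in for an EOS with one parameter.
struct ToyEOS {
  double param = 0.0;
  std::size_t SerializedSizeInBytes() const { return sizeof(double); }
  std::size_t Serialize(char *dst) const {
    std::memcpy(dst, &param, sizeof(double));
    return sizeof(double);
  }
  std::size_t DeSerialize(const char *src) {
    std::memcpy(&param, src, sizeof(double));
    return sizeof(double);
  }
};

// Total bytes: element count header plus each object's payload.
template <typename Container_t>
std::size_t BulkSize(const Container_t &objs) {
  std::size_t total = sizeof(std::size_t);
  for (const auto &o : objs) total += o.SerializedSizeInBytes();
  return total;
}

// Needs only size() and iteration on the source container.
template <typename Container_t>
std::size_t BulkSerialize(const Container_t &objs, char *dst) {
  std::size_t count = objs.size();
  std::memcpy(dst, &count, sizeof(count));
  std::size_t offset = sizeof(count);
  for (const auto &o : objs) offset += o.Serialize(dst + offset);
  return offset;
}

// Needs only resize() and iteration on the destination container.
template <typename Container_t>
std::size_t BulkDeSerialize(Container_t &objs, const char *src) {
  std::size_t count = 0;
  std::memcpy(&count, src, sizeof(count));
  objs.resize(count);
  std::size_t offset = sizeof(count);
  for (auto &o : objs) offset += o.DeSerialize(src + offset);
  return offset;
}
```

Because the buffer records only a count and per-object bytes, packing from a ``std::vector`` and unpacking into a ``std::deque`` works in this toy, which is the kind of mixing the paragraph above describes.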

.. warning::

Since EOSPAC is a library, DeSerialization is destructive for EOSPAC
and may have side-effects.

.. _`MPI Windows`: https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report/node311.htm

.. _variant section:

Variants
Expand Down Expand Up @@ -444,7 +710,7 @@ unmodified EOS model, call
.. cpp:function:: auto GetUnmodifiedObject();

The return value here will be either the type of the ``EOS`` variant
type or the unmodified model (for example ``IdealGas``) or, depending
type or the unmodified model (for example ``IdealGas``), depending
on whether this method was called within a variant or on a standalone
model outside a variant.

Expand Down Expand Up @@ -552,6 +818,18 @@ might look something like this:

.. _eos methods reference section:

CheckParams
------------

You may check whether or not an equation of state object is
constructed self-consistently and ready for use by calling

.. cpp:function:: void CheckParams() const;

which raises an error and/or prints an equation of state specific error
message if something has gone wrong. Most EOS constructors and ways of
building an EOS call ``CheckParams`` by default.
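
A minimal sketch of the pattern, assuming a hypothetical ``ToyGas`` model: the constructor calls ``CheckParams`` so that an inconsistent object is reported as soon as it is built. Throwing ``std::invalid_argument`` is a stand-in chosen for this example; how ``singularity-eos`` actually reports the error is not shown in this PR excerpt, and ``IsConstructible`` is an illustrative helper.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model used only to illustrate the CheckParams pattern.
struct ToyGas {
  ToyGas(double gm1, double Cv) : gm1_(gm1), Cv_(Cv) { CheckParams(); }

  // Report physically meaningless parameters. Written as !(x > 0) so
  // that NaN values are also rejected.
  void CheckParams() const {
    if (!(gm1_ > 0.0)) throw std::invalid_argument("gm1 must be positive");
    if (!(Cv_ > 0.0)) throw std::invalid_argument("Cv must be positive");
  }

  double gm1_;
  double Cv_;
};

// True if the given parameters pass the constructor's self-check.
inline bool IsConstructible(double gm1, double Cv) {
  try {
    ToyGas eos(gm1, Cv);
    (void)eos;
    return true;
  } catch (const std::invalid_argument &) {
    return false;
  }
}
```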

Equation of State Methods Reference
------------------------------------

2 changes: 1 addition & 1 deletion sesame2spiner/io_eospac.hpp
Expand Up @@ -27,7 +27,7 @@

#include <ports-of-call/portability.hpp>
#include <singularity-eos/base/fast-math/logs.hpp>
#include <singularity-eos/base/spiner_table_bounds.hpp>
#include <singularity-eos/base/spiner_table_utils.hpp>
#include <spiner/databox.hpp>

#include <eospac-wrapper/eospac_wrapper.hpp>
3 changes: 2 additions & 1 deletion singularity-eos/CMakeLists.txt
Expand Up @@ -27,7 +27,8 @@ register_headers(
base/fast-math/logs.hpp
base/robust_utils.hpp
base/root-finding-1d/root_finding.hpp
base/spiner_table_bounds.hpp
base/serialization_utils.hpp
base/spiner_table_utils.hpp
base/variadic_utils.hpp
base/math_utils.hpp
base/constants.hpp
12 changes: 11 additions & 1 deletion singularity-eos/base/constants.hpp
Expand Up @@ -32,11 +32,21 @@ constexpr unsigned long all_values = (1 << 7) - 1;
} // namespace thermalqs

constexpr size_t MAX_NUM_LAMBDAS = 3;
enum class DataStatus { Deallocated = 0, OnDevice = 1, OnHost = 2 };
enum class DataStatus { Deallocated = 0, OnDevice = 1, OnHost = 2, UnManaged = 3 };
Review comment (collaborator): Do we have sufficient coverage in our testing to make sure we are catching the potential issues? I recall this took up many of @Yurlungur's cycles to get this to work correctly back in the day.

Reply (PR author): I believe so. I think our tests should catch this if there's an issue. But I can't promise that. The UnManaged tag just makes Finalize a no-op.

enum class TableStatus { OnTable = 0, OffBottom = 1, OffTop = 2 };
constexpr Real ROOM_TEMPERATURE = 293; // K
constexpr Real ATMOSPHERIC_PRESSURE = 1e6;

struct SharedMemSettings {
SharedMemSettings() = default;
SharedMemSettings(char *data_, bool is_domain_root_)
: data(data_), is_domain_root(is_domain_root_) {}
bool CopyNeeded() const { return (data != nullptr) && is_domain_root; }
char *data = nullptr;
bool is_domain_root = false; // default true or false?
};
const SharedMemSettings DEFAULT_SHMEM_STNGS = SharedMemSettings();

} // namespace singularity

#endif // SINGULARITY_EOS_BASE_CONSTANTS_HPP_
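
The review reply above says the ``UnManaged`` tag makes ``Finalize`` a no-op. A toy sketch of that idea follows, assuming a hypothetical ``ToyTable`` holder; the enum is copied from the diff, but ``Allocate``, ``Alias``, and the body of ``Finalize`` are illustrative guesses rather than the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

enum class DataStatus { Deallocated = 0, OnDevice = 1, OnHost = 2, UnManaged = 3 };

// Hypothetical table holder; not a singularity-eos type.
struct ToyTable {
  char *data = nullptr;
  DataStatus status = DataStatus::Deallocated;

  // Own a fresh host allocation.
  void Allocate(std::size_t nbytes) {
    data = static_cast<char *>(std::malloc(nbytes));
    status = DataStatus::OnHost;
  }
  // Point at memory owned elsewhere, e.g. a serialized buffer
  // or a shared memory window.
  void Alias(char *external) {
    data = external;
    status = DataStatus::UnManaged;
  }
  // Release only memory this object owns. For UnManaged data this
  // does nothing, since the caller remains responsible for it.
  void Finalize() {
    if (status == DataStatus::OnHost) {
      std::free(data);
      data = nullptr;
      status = DataStatus::Deallocated;
    }
  }
};
```

This is why a de-serialized object must not outlive the buffer it aliases: finalizing it never frees that buffer, and nothing keeps the buffer alive on its behalf.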