diff --git a/docs/user_guide/source/components/anatomy.rst b/docs/user_guide/source/components/anatomy.rst new file mode 100644 index 0000000000..0832b6dc3c --- /dev/null +++ b/docs/user_guide/source/components/anatomy.rst @@ -0,0 +1,136 @@ +.. _sec:basics_interface_components_anatomy: + +*************************** +Anatomy of an ADIOS Program +*************************** + +Anatomy of an ADIOS Output +-------------------------- + +.. code:: C++ + + ADIOS adios("config.xml", MPI_COMM_WORLD); + | + | IO io = adios.DeclareIO(...); + | | + | | Variable<...> var = io.DefineVariable<...>(...) + | | Attribute<...> attr = io.DefineAttribute<...>(...) + | | Engine e = io.Open("OutputFileName.bp", adios2::Mode::Write); + | | | + | | | e.BeginStep() + | | | | + | | | | e.Put(var, datapointer); + | | | | + | | | e.EndStep() + | | | + | | e.Close(); + | | + | |--> IO goes out of scope + | + |--> ADIOS goes out of scope or adios2_finalize() + + +The pseudo code above depicts the basic structure of performing output. The ``ADIOS`` object is necessary to hold all +other objects. It is initialized with an MPI communicator in a parallel program or without in a serial program. +Additionally, a config file (XML or YAML format) can be specified here to load runtime configuration. Only one ADIOS +object is needed throughout the entire application but you can create as many as you want (e.g. if you need to separate +IO objects using the same name in a program that reads similar input from an ensemble of multiple applications). + +The ``IO`` object is required to hold the variable and attribute definitions, and runtime options for a particular input +or output stream. The IO object has a name, which is used only to refer to runtime options in the configuration file. +One IO object can only be used in one output or input stream. 
The only exception where an IO object can be used twice is +one input stream plus one output stream where the output reuses the variable definitions loaded during input. + +``Variable`` and ``Attribute`` definitions belong to one IO object, which means they can only be used in one output. +You need to define new ones for other outputs. A Variable does not appear in the output just because it is defined; +an associated Put() call must provide the content. + +A stream is opened and closed once. The ``Engine`` object implements the data movement for the stream. The +runtime options of the IO object determine what type of engine is created in the Open() call. One output step is denoted +by a BeginStep..EndStep pair. + +An output step consists of variables and attributes. Variables are just definitions without content, so one must call a +Put() function to provide the application data pointer that contains the data content one wants to write out. Attributes +have their content in their definitions so there is no need for an extra call. + +Some rules: + +* Variables can be defined any time before the corresponding Put() call +* Attributes can be defined any time before EndStep +* The following functions must be treated as Collective operations + + * ADIOS + * Open + * BeginStep + * EndStep + * Close + +.. note:: + + If there is only one output step, and we only want to write it to a file on disk and never stream it to another + application, then BeginStep and EndStep are not required, but calling them makes no difference. + +Anatomy of an ADIOS Input +------------------------- + +.. code:: C++ + + ADIOS adios("config.xml", MPI_COMM_WORLD); + | + | IO io = adios.DeclareIO(...); + | | + | | Engine e = io.Open("InputFileName.bp", adios2::Mode::Read); + | | | + | | | e.BeginStep() + | | | | + | | | | varlist = io.AvailableVariables(...) + | | | | Variable var = io.InquireVariable(...) 
+ | | | | Attribute attr = io.InquireAttribute(...) + | | | | | + | | | | | e.Get(var, datapointer); + | | | | | + | | | | + | | | e.EndStep() + | | | + | | e.Close(); + | | + | |--> IO goes out of scope + | + |--> ADIOS goes out of scope or adios2_finalize() + +The difference between input and output is that while we have to define the variables and attributes for an output, we +have to retrieve the available variables and attributes in an input first as definitions (Variable and Attribute objects). + +If we know the particular variable (name and type) in the input stream, we can get the definition using +InquireVariable(). Generic tools that process any input must use other functions to retrieve the list of variable names +and their types first and then get the individual Variable objects. The same is true for Attributes. + +Anatomy of an ADIOS File-only Input +----------------------------------- + +Previously we explored how to read using the input mode ``adios2::Mode::Read``. ADIOS also has another input mode +named ``adios2::Mode::ReadRandomAccess``. ``adios2::Mode::Read`` allows data access only timestep by timestep using +``BeginStep/EndStep``, but it is generally more memory efficient because ADIOS is only required to load metadata for the +current timestep. ``ReadRandomAccess`` can only be used with file engines and involves loading all the file metadata at +once. It can therefore be more memory intensive than ``adios2::Mode::Read``, but it allows reading data from any timestep using +``SetStepSelection()``. If you read multiple steps at once in ``adios2::Mode::ReadRandomAccess`` mode, be sure to allocate +enough memory to hold all selected steps of the variable content. + +.. code:: C++ + + ADIOS adios("config.xml", MPI_COMM_WORLD); + | + | IO io = adios.DeclareIO(...); + | | + | | Engine e = io.Open("InputFileName.bp", adios2::Mode::ReadRandomAccess); + | | | + | | | Variable var = io.InquireVariable(...) 
+ | | | | var.SetStepSelection() + | | | | e.Get(var, datapointer); + | | | | + | | | + | | e.Close(); + | | + | |--> IO goes out of scope + | + |--> ADIOS goes out of scope or adios2_finalize() diff --git a/docs/user_guide/source/components/components.rst b/docs/user_guide/source/components/components.rst index 97b4f0b277..791a149a0f 100644 --- a/docs/user_guide/source/components/components.rst +++ b/docs/user_guide/source/components/components.rst @@ -10,3 +10,4 @@ Interface Components .. include:: engine.rst .. include:: operator.rst .. include:: runtime.rst +.. include:: anatomy.rst diff --git a/docs/user_guide/source/components/variable.rst b/docs/user_guide/source/components/variable.rst index baf071defb..69adbc884d 100644 --- a/docs/user_guide/source/components/variable.rst +++ b/docs/user_guide/source/components/variable.rst @@ -6,15 +6,17 @@ An ``adios2::Variable`` is the link between a piece of data coming from an appli This component handles all application variables classified by data type and shape. Each ``IO`` holds a set of Variables, and each ``Variable`` is identified with a unique name. -They are created using the reference from ``IO::DefineVariable`` or retrieved using the pointer from ``IO::InquireVariable`` functions in :ref:`IO`. +They are created using the reference from ``IO::DefineVariable`` or retrieved using the pointer from +``IO::InquireVariable`` functions in :ref:`IO`. Data Types --------------------- +---------- Only primitive types are supported in ADIOS2. -Fixed-width types from `<cinttypes> and <cstdint> <https://en.cppreference.com/w/cpp/types/integer>`_ should be preferred when writing portable code. -ADIOS2 maps primitive types to equivalent fixed-width types (e.g. ``int`` -> ``int32_t``). -In C++, acceptable types ``T`` in ``Variable<T>`` along with their preferred fix-width equivalent in 64-bit platforms are given below: +Fixed-width types from `<cinttypes> and <cstdint> <https://en.cppreference.com/w/cpp/types/integer>`_ should be +preferred when writing portable code. ADIOS2 maps primitive types to equivalent fixed-width types +(e.g. ``int`` -> ``int32_t``). 
In C++, acceptable types ``T`` in ``Variable<T>`` along with their preferred fixed-width +equivalent on 64-bit platforms are given below: .. code-block:: c++ @@ -52,19 +54,19 @@ In C++, acceptable types ``T`` in ``Variable<T>`` along with their preferred fix Python APIs: use the equivalent fixed-width types from numpy. If ``dtype`` is not specified, ADIOS2 handles numpy defaults just fine as long as primitive types are passed. - Shapes ---------------------- +------ ADIOS2 is designed for MPI applications. Thus different application data shapes must be supported depending on their scope within a particular MPI communicator. -The shape is defined at creation from the ``IO`` object by providing the dimensions: shape, start, count in the ``IO::DefineVariable``. -The supported shapes are described below. +The shape is defined at creation from the ``IO`` object by providing the dimensions: shape, start, count in the +``IO::DefineVariable``. The supported shapes are described below. 1. **Global Single Value**: Only a name is required for their definition. -These variables are helpful for storing global information, preferably managed by only one MPI process, that may or may not change over steps: *e.g.* total number of particles, collective norm, number of nodes/cells, etc. +These variables are helpful for storing global information, preferably managed by only one MPI process, that may or may +not change over steps: *e.g.* total number of particles, collective norm, number of nodes/cells, etc. .. code-block:: c++ @@ -157,8 +159,80 @@ be applicable to it. JoinedArrays are currently only supported by BP4 and BP5 engines, as well as the SST engine with BP5 marshalling. - +Global Array Capabilities and Limitations +----------------------------------------- + +ADIOS2 focuses on writing and reading N-dimensional, distributed, global arrays of primitive types. 
The basic idea +is that, usually, a simulation has such a data structure in memory (distributed across multiple processes) and wants to +dump its content regularly as it progresses. ADIOS2 was designed to: + +1. do this writing and reading as fast as possible +2. enable reading any subsection of the array + +.. image:: https://imgur.com/6nX67yq.png + :width: 400 + +The figure above shows a parallel application of 12 processes producing a 2D array. Each process has a 2D array locally +and the output is created by placing them into a 4x3 pattern. An individual process of a reading application can then read +any subsection of the entire global array. In the figure, a 6-process application decomposes the array in a 3x2 pattern +and each process reads a 2D array whose content comes from multiple producer processes. + +The figure hopefully helps to understand the basic concept, but it can also be misleading if it suggests limitations that +are not there. A Global Array is simply a boundary in N-dimensional space where processes can place their blocks of data. +In the global space: + +1. one process can place multiple blocks + + .. image:: https://imgur.com/Pb1s03h.png + :width: 400 + +2. the space does NOT need to be fully covered by the blocks + + .. image:: https://imgur.com/qJBXYcQ.png + :width: 400 + + * when reading, positions not covered by any block leave the reader's allocated memory unchanged + +3. blocks can overlap + + .. image:: https://imgur.com/GA59lZ2.png + :width: 300 + + * the reader will get values in an overlapping position from one of the blocks, but there is no control over which + block + +4. each process can put a different size of block, or put multiple blocks of different sizes + +5. some processes may not contribute anything to the global array + +Over multiple output steps + +1. the processes CAN change the size (and number) of blocks in the array + + * E.g. atom table: global size is fixed but atoms wander around processes, so their block size is changing + + .. 
image:: https://imgur.com/DorjG2q.png + :width: 400 + +2. the global dimensions CAN change over output steps + + * but then you cannot read multiple steps at once + * E.g. particle table size changes due to particles disappearing or appearing + + .. image:: https://imgur.com/nkuHeVX.png + :width: 400 + + +Limitations of the ADIOS global array concept + +1. Indexing starts from 0 +2. Cyclic data patterns are not supported; only blocks can be written or read +3. If some blocks fully or partially fall outside of the global boundary, the reader will not be able to read those + parts + +.. note:: + Technically, the content of the individual blocks is kept in the BP format (but not in HDF5 format) and in staging. + If you need to retrieve all the blocks, you need to handle this array as a Local Array and read the + blocks one by one.
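The regular decomposition in the first figure (12 processes in a 4x3 pattern) can be sketched without any ADIOS2 calls. The example below is only an illustration under stated assumptions: ``Block`` and ``blockForRank`` are hypothetical names that are not part of ADIOS2, ranks are assumed to be laid out in row-major order, and the global array is assumed to be split as evenly as possible. The resulting values correspond to the start and count dimensions that the text says are given to ``IO::DefineVariable`` together with the global shape.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not an ADIOS2 type): the offset and the size of
// one process's block inside a 2D global array.
struct Block
{
    std::array<std::size_t, 2> start; // offset of the block in the global array
    std::array<std::size_t, 2> count; // local size of the block
};

// Hypothetical helper (not an ADIOS2 function): computes the block of `rank`
// when the processes form a procRows x procCols grid in row-major rank order.
// Each dimension is split as evenly as possible; any remainder is handed out
// one element at a time to the first positions along that dimension.
Block blockForRank(std::size_t rank, std::array<std::size_t, 2> shape,
                   std::size_t procRows, std::size_t procCols)
{
    assert(procRows > 0 && procCols > 0 && rank < procRows * procCols);

    const std::array<std::size_t, 2> grid{procRows, procCols};
    const std::array<std::size_t, 2> pos{rank / procCols, rank % procCols};

    Block b{};
    for (std::size_t d = 0; d < 2; ++d)
    {
        const std::size_t base = shape[d] / grid[d]; // minimum block size
        const std::size_t rem = shape[d] % grid[d];  // leftover elements
        b.count[d] = base + (pos[d] < rem ? 1 : 0);
        b.start[d] = pos[d] * base + std::min(pos[d], rem);
    }
    return b;
}
```

For example, with an assumed 8x9 global array and the 4x3 process pattern, ``blockForRank(0, {8, 9}, 4, 3)`` gives start ``{0, 0}`` and count ``{2, 3}``. As the list above points out, this regular layout is only one possibility: blocks may differ in size, overlap, or leave parts of the global space uncovered.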