Sphinx Doc: overtake Sergei's and Jan's suggestions
SimeonEhrig committed Jul 29, 2020
1 parent b0b17f7 commit 11ef7ff
Showing 7 changed files with 81 additions and 72 deletions.
2 changes: 1 addition & 1 deletion docs/source/advanced/rationale.rst
@@ -272,7 +272,7 @@ The constant memory is a fast, cached, read-only memory that is beneficial when
In this case it is as fast as a read from a register.


- Access to Accelerator Dependent Functionality
+ Access to Accelerator-Dependent Functionality
+++++++++++++++++++++++++++++++++++++++++++++

There are two possible ways to implement access to accelerator dependent functionality inside a kernel:
4 changes: 2 additions & 2 deletions docs/source/basic/abstraction.rst
@@ -5,11 +5,11 @@ Abstraction

Objective of the abstraction is to separate the parallelization strategy from the algorithm itself.
Algorithm code written by users should not depend on any parallelization library or specific strategy.
- This would allow to exchange the parallelization back-end without any changes to the algorithm itself.
+ This would enable exchanging the parallelization back-end without any changes to the algorithm itself.
Besides allowing to test different parallelization strategies this also makes it possible to port algorithms to new, yet unsupported, platforms.
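
A minimal sketch of what this separation looks like in user code, assuming alpaka's usual pattern of a kernel written as a function object templated on the accelerator type. The kernel name and element type are illustrative, and helper names such as ``ALPAKA_FN_ACC`` and ``alpaka::idx::getIdx`` may differ between alpaka versions:

    #include <alpaka/alpaka.hpp>
    #include <cstddef>

    struct VectorAddKernel  // illustrative name
    {
        // The accelerator is only a template parameter: the algorithm below
        // names no concrete back-end and can be compiled for any of them.
        template<typename TAcc>
        ALPAKA_FN_ACC void operator()(
            TAcc const & acc,
            float const * a,
            float const * b,
            float * c,
            std::size_t n) const
        {
            // How threads are created and mapped to the hardware is decided by
            // the back-end chosen at the call site, not inside the algorithm.
            auto const i = alpaka::idx::getIdx<alpaka::Grid, alpaka::Threads>(acc)[0u];
            if(i < n)
                c[i] = a[i] + b[i];
        }
    };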

Parallelism and memory hierarchies at all levels need to be exploited in order to achieve performance portability across various types of accelerators.
- Within this chapter an abstraction will be derivated that tries to provide a maximum of parallelism while simultaneously considering implementability and applicability in hardware.
+ Within this chapter an abstraction will be derive that tries to provide a maximum of parallelism while simultaneously considering implementability and applicability in hardware.

Looking at the current HPC hardware landscape, we often see nodes with multiple sockets/processors extended by accelerators like GPUs or Intel Xeon Phi, each with their own processing units.
Within a CPU or a Intel Xeon Phi there are cores with hyper-threads, vector units and a large caching infrastructure.
36 changes: 18 additions & 18 deletions docs/source/basic/intro.rst
@@ -3,7 +3,7 @@ Introduction

The *alpaka* library defines and implements an abstract interface for the *hierarchical redundant parallelism* model.
This model exploits task- and data-parallelism as well as memory hierarchies at all levels of current multi-core architectures.
- This allows to achieve portability of performant codes across various types of accelerators by ignoring specific unsupported levels and utilizing only the ones supported on a specific accelerator.
+ This allows to achieve performance portability across various types of accelerators by ignoring specific unsupported levels and utilizing only the ones supported on a specific accelerator.
All hardware types (multi- and many-core CPUs, GPUs and other accelerators) are treated and can be programmed in the same way.
The *alpaka* library provides back-ends for *CUDA*, *OpenMP*, *Boost.Fiber* and other methods.
The policy-based C++ template interface provided allows for straightforward user-defined extension of the library to support other accelerators.
@@ -38,26 +38,26 @@ If you do not install alpaka in a default path such as ``/usr/local/`` you have

The cmake configuration decides which alpaka accelerators are available during compiling. For example, if you configure your ``cmake`` build with the CUDA back-end (``-DALPAKA_ACC_GPU_CUDA_ENABLE=ON``), ``cmake`` checks, if the CUDA SDK is available and if it found, the C++ template ``alpaka::acc::AccGpuCudaRt`` is available during compiling.
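
A sketch of how that choice shows up on the C++ side, assuming the common alpaka convention that an enabled back-end also defines a matching preprocessor macro (here ``ALPAKA_ACC_GPU_CUDA_ENABLED``) and that a CPU accelerator such as ``alpaka::acc::AccCpuSerial`` is available as a fallback; both of these names are assumptions and may vary between alpaka versions:

    #include <alpaka/alpaka.hpp>
    #include <cstddef>

    using Dim = alpaka::dim::DimInt<1u>;
    using Idx = std::size_t;

    // alpaka::acc::AccGpuCudaRt only exists if the build was configured with
    // -DALPAKA_ACC_GPU_CUDA_ENABLE=ON and the CUDA SDK was found by cmake.
    #if defined(ALPAKA_ACC_GPU_CUDA_ENABLED)         // assumed macro set by the CUDA back-end
    using Acc = alpaka::acc::AccGpuCudaRt<Dim, Idx>;
    #else
    using Acc = alpaka::acc::AccCpuSerial<Dim, Idx>; // assumed CPU fallback accelerator
    #endif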

- What is alpaka
- --------------
+ About alpaka
+ ------------

alpaka is ...
~~~~~~~~~~~~~

- An Abstract Interface
- It describing parallel execution on multiple hierarchy levels. It allows to implement a mapping to various hardware architectures but is no optimal mapping itself.
+ Abstract
+ It describes parallel execution on multiple hierarchy levels. It allows to implement a mapping to various hardware architectures but is no optimal mapping itself.

- Sustainably
- *alpaka* decouple the application from the availability of different accelerator frameworks in different versions, such as OpenMP, CUDA, HIP, etc. (50% on the way to reach full performance portability).
+ Sustainable
+ *alpaka* decouples the application from the availability of different accelerator frameworks in different versions, such as OpenMP, CUDA, HIP, etc. (50% on the way to reach full performance portability).

- Heterogeneity
+ Heterogeneous
An identical algorithm / kernel can be executed on heterogeneous parallel systems by selecting the target device. This allows the best performance for each algorithm and/or a good utilization of the system without major code changes.

- Maintainability
+ Maintainable
*alpaka* allows to provide a single version of the algorithm / kernel that can be used by all back-ends. There is no need for "copy and paste" kernels with different API calls for different accelerators. All the accelerator dependent implementation details are hidden within the *alpaka* library.

- Testability
- Due to the easy back-end switch, no special hardware is required for testing the kernels. Even if the simulation itself will always use the *CUDA* back-end, the tests can completely run on a CPU. As long as the *alpaka* library is thoroughly tested for compatibility between the acceleration back-ends, the user simulation code is guaranteed to generate identical results (ignoring rounding errors / non-determinism) and is portable without any changes.
+ Testable
+ Due to the easy back-end switch, no special hardware is required for testing the kernels. Even if the simulation itself always uses the *CUDA* back-end, the tests can completely run on a CPU. As long as the *alpaka* library is thoroughly tested for compatibility between the acceleration back-ends, the user simulation code is guaranteed to generate identical results (ignoring rounding errors / non-determinism) and is portable without any changes.

Optimizable
Everything in *alpaka* can be replaced by user code to optimize for special use-cases.
@@ -68,19 +68,19 @@ Extensible
Data Structure Agnostic
The user can use and define arbitrary data structures.

- alpaka is not ...
- ~~~~~~~~~~~~~~~~~
+ alpaka does not ...
+ ~~~~~~~~~~~~~~~~~~~

- An automatically optimal mapping of algorithms / kernels to various acceleration platforms
- Except in trivial examples an optimal execution always depends on suitable selected data structure. An adaptive selection of data structures is a separate topic that has to be implemented in a distinct library.
+ Automatically provide an optimal mapping of kernels to various acceleration platforms
+ Except in trivial examples an optimal execution always depends on suitable selected data structures. An adaptive selection of data structures is a separate topic that has to be implemented in a distinct library.

- Automatically optimizing concurrent data accesses
+ Automatically optimize concurrent data access
*alpaka* does not provide feature to create optimized memory layouts.

- Handling or hiding differences in arithmetic operations
+ Handle differences in arithmetic operations
For example, due to **different rounding** or different implementations of floating point operations, results can differ slightly between accelerators.

- Guaranteeing any determinism of results
+ Guarantee determinism of results
Due to the freedom of the library to reorder or repartition the threads within the tasks it is not possible or even desired to preserve deterministic results. For example, the non-associativity of floating point operations give non-deterministic results within and across accelerators.
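
A standalone illustration of that non-associativity (plain C++, nothing alpaka-specific): summing the same values in a different order, as a different thread partitioning would, changes the result.

    #include <cstdio>

    int main()
    {
        float const a = 1.0e8f;
        float const b = -1.0e8f;
        float const c = 1.0f;

        // Same three summands, different association:
        std::printf("%f\n", (a + b) + c); // prints 1.000000
        std::printf("%f\n", a + (b + c)); // prints 0.000000 -- c is lost when added to b,
                                          // because 1.0f is below float resolution near 1.0e8
        return 0;
    }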

The *alpaka* library is aimed at parallelization on shared memory, i.e. within nodes of a cluster.