forked from Sandia-OpenSHMEM/SOS
-
Notifications
You must be signed in to change notification settings - Fork 0
/
NEWS
438 lines (407 loc) · 21.9 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
-*- text -*-
Sandia OpenSHMEM NEWS -- history of user-visible changes.
v1.5.3rc1
---------
- Added several enhancements to better support SOS as a backend for Intel® SHMEM.
- Added extension support for GPU RDMA and external heap creation.
- Added support for multi-NIC configurations via libfabric. The feature is
enabled by default. It can be disabled with the environment variable,
SHMEM_OFI_DISABLE_MULTIRAIL=1.
- Added initial support for multi-NIC topology optimizations via hwloc.
Detection of hwloc is enabled by default. It can be disabled with the
configuration flag, --without-hwloc.
- Moved the "tests-sos" package of unit tests and performance benchmarks to a
new Git submodule hosted at https://github.com/openshmem-org/tests-sos.
- Added shmemx_ibput and shmemx_ibget as extension APIs.
- Added shmemx_signal_add and shmemx_signal_set as extension APIs.
- Added several configuration flags to optimize for the CXI libfabric provider:
--enable-ofi-inject and --enable-nonfetch-amo, which are enabled by default.
- Manpage generation is now disabled by default to shorten build times. It can
be re-enabled during configuration with the --enable-manpages flag.
- Included multiple bugfixes, including in teams configuration,
remote-virtual-addressing checks, buffer argument overlap checks, and more.
v1.5.2
------
- This full release includes the changes listed below for v1.5.2rc1 and the
following additional changes.
- Added configuration support for building Sandia OpenSHMEM with oneAPI
compilers.
- Resolved a performance issue with fetching atomic operations when using the
CXI libfabric provider.
- In the OFI transport, added the FI_RECV capability to contexts (for driving
progress) resolving an issue with the Omni-Path Express (opx) provider.
v1.5.2rc1
---------
- Added support for the CXI libfabric provider, enabling SOS on the Slingshot
interconnect. Setup instructions are on SOS's Github wiki.
- Added support for shmem_team_ptr routine.
- Added support for negatives strides in the OpenSHMEM teams APIs.
- Added checks for incorrect buffer overlap when error-checking is enabled.
- Added a configure option to enable deprecated tests,
--enable-deprecated-tests, which is disabled by default.
- Added a configure option to enable libfabric manual progress,
--enable-ofi-manual-progress, which is disabled by default.
- Added experimental support for a hint to shmem_malloc_with_hints,
SHMEMX_MALLOC_NO_BARRIER, which removes the barrier at exit from the routine.
Users must synchronize appropriately after such an operation.
- Initialized the OFI tx/rx capabilities appropriately to limit what is enabled
by providers. This resolves an issue with the Omni-Path Express (opx) provider.
- Patched the symmetric data segment initialization to better support MacOS.
v1.5.1
------
- Added a Dockerfile so users can create containers that provide a configurable
Ubuntu sandbox for running SOS applications and benchmarks.
v1.5.1rc1
---------
- Added deprecation warnings for shmem_barrier, active-set reductions, and
other deprecated routines, such as the 32/64-bit collectives.
- Removed usage of deprecated APIs in shmem_perf_suite, apps, and spec-examples
- Added missing types for the OpenSHMEM v1.5 reductions (e.g., fixed-width
integers, ptrdiff_t, size_t, and schar).
- Fixed an incorrect function signature for shmem_team_get_config().
- Resolved some critical issues with the UCX transport.
- Updated upstream configuration scripts from OpenMPI for UCX, PMI, etc.
- Fixed issues with the OpenSHMEM teams API (e.g. team-based broadcasts now
update the destination object on all PEs including the root).
- Corrected the return value for shmem_test_all and shmem_test_all_vector.
- Added support for shmem_signal_wait_until.
- Resolved a critical build issue on Mac OSX.
- Fixed several issues in the SOS unit tests and updated them to use the
OpenSHMEM v1.5 APIs.
- Migrated SOS continuous integration from Travis CI to Github
Actions and Workflows.
v1.5.0
------
- This full release includes the changes listed below for v1.5.0rc1 and the
following additional changes.
- Added a multiplier for scaling the number of trial iterations in performance
test suite benchmarks, configurable by environment variable,
SHMEM_PERF_SUITE_TRIALS_MULTIPLIER.
- Added {SOS base}/examples directory with simple OpenSHMEM example programs
and instructions on how to build and run those.
- With the recent fixes in libfabric 1.11.x, RXM/Verbs provider support does
not require setting the FI_MR_CACHE_MAX_COUNT environment variable to 0
when threading is used.
- Additional bugfixes, including profiling symbols for some team-based
collectives, support for C99 compilation, and unit test failures
with single PE.
v1.5.0rc1
---------
- Support for OpenSHMEM v1.5 specification
- New features include: a teams API and teams-based collectives,
put-with-signal routines, nonblocking atomic routines, multiple-element
wait/test vector comparison routines, shmem_malloc_with_hints, and a
profiling interface. See OpenSHMEM 1.5 Specification, Annex G for details.
- Deprecations include: active-set-based library constants and collective
routines, shmem_barrier, and short/unsigned short variants for
shmem_wait_until and shmem_test. See OpenSHMEM 1.5 Specification, Annex F
for details.
- Added support for UCX transport
- Added shmem_malloc_with_hints. Currently, no hint values are supported.
- Added shmem_signal_fetch
- Multiple bugfixes, including in teams resource management, put completion
logic, and configure issues when multiple transports are detected.
v1.4.5
------
- Added a complete OpenSHMEM teams API to the shmemx interfaces.
- Added the OpenSHMEM wait/test vector API to the shmemx interfaces.
- Added support for shared memory atomic operations with the appropriate
acquire/release memory ordering where required. This feature requires
the --enable-shr-transport build flag.
- Improved experimental support for the RXM/Verbs provider stack with
libfabric 1.8 and newer.
- Updated SOS to use the new OFI memory registration mode flags instead of
the deprecated FI_MR_SCALABLE/FI_MR_BASIC modes.
- Updated the wait/test any/all/some "status" array semantics to reflect
recent changes to the OpenSHMEM specification.
- Added a memory sync before returning from barriers to ensure shared memory
updates and remote updates cached in the NIC are visible in memory.
- Updated the utility atomics (spinlocks and counters) to use __atomic built-in
functions rather than the deprecated __sync builtins.
- Removed redundant shmem_barrier_all from the GUPS example application.
- Moved unit tests derived from OpenSHMEM specification examples to the
test/spec-example directory.
v1.4.4
------
- Experimental support for RXM/Verbs provider stack with libfabric 1.8.
Requires --enable-hard-polling and --enable-ofi-mr=basic build flags. The
FI_MR_CACHE_MAX_COUNT environment variable should be set to 0 when threading
is used. Depending on the libfabric build, setting
SHMEM_OFI_PROVIDER="verbs;ofi_rxm" may be necessary to select the provider.
- Rewrite of node-level detection/management to enable improvements in on-node
communication and resource management.
- Added bandwidth optimized ring reduction algorithm, selectable by setting the
SHMEM_REDUCE_ALGORITHM environment variable to "ring". See README for
details.
- Added the SHMEM_COLL_SIZE_CROSSOVER environment variable to control the
message size at which collective communication transitions between latency
and bandwidth optimization.
- Added SHMEM_OFI_STX_AUTO environment variable to enable automatic
partitioning of STX resources on node. See README for details.
- Updated unit tests and performance test suite to properly handle failed
context creation.
- Updated OFI transport to support providers that do not support the
FI_SHARED_CONTEXT TX attribute (STX) on endpoints.
- Updated OFI transport to bind a CQ to the target EP (required by some
providers) and poll the target CQ when driving manual progress. As a result,
manual progress no longer requries a target-side counter and can be used in
conjunction with hard polling.
- Added support for decimal values in the SHMEM_SYMMETRIC_SIZE environment
variable (e.g., 1.5G).
- Updated shmem_init_thread routine to return an error when library
initialization fails (e.g., due to invalid SHMEM_SYMMETRIC_SIZE value).
- Fixed argument/context handling in Mandelbrot example program.
- Improved handling of context creation failure.
- Added missing pshmem_global_exit symbol to profiling interfaces.
- Updated shmem_ptr to return a pointer for the local process, even when
inter-PE shared memory is not enabled.
- Updated fence, quiet, and context destroy to perform no operation for
SHMEMX_CTX_INVALID.
- Additional bugfixes and improvement (see git log for details).
v1.4.3
------
- Added NBI atomics support to performance suite benchmarks.
- Fixed shmem_ctx_t type compatibility with C++ and older C compilers.
v1.4.3rc1
---------
- Added proposed OpenSHMEM put-with-signal routines to shmemx interaces.
- Added proposed OpenSHMEM wait/test-any/some/all to shmemx interfaces.
- Added proposed OpenSHMEM nonblocking fetching atomic operations to shmemx
interfaces.
- Added proposed OpenSHMEM SHMEM_CTX_INVALID constant to shmemx interfaces.
- Added put-with-signal optimization that uses the FI_FENCE capability (see
--enable-ofi-fence for details).
- Added support for backtrace printing when SOS aborts (see SHMEM_BACKTRACE
environment variable for details).
- OFI transport merged cntr and cq endpoints, improving resource utilization.
- Fixed issue with missing pshmem symbols in proposed profiling interfaces.
- Improved support for SLURM using PMI-1.
- Unit tests for the C++ generic interfaces (currently an SOS extension) were
moved to the "test/shmemx" directory and context creation failure handling
was improved in all tests to improve portability of the SOS test suite.
- Constant and type declarations were moved from shmem.h to shmem-def.h to
eliminate dependence of pshmem.h and shmemx.h on shmem.h
v1.4.2
------
- This full release includes the changes listed below for v1.4.2rc1 and the
following additional change.
- Improved performance when using OFI thread completion model, by releasing the
context lock when blocked in quiet/fence operations.
v1.4.2rc1
---------
- Added support for using MPI as the process manager, via --enable-pmi-mpi.
- Added option to flush stdout/stderr during barrier operations, enabled via
SHMEM_BARRIERS_FLUSH environment variable.
- Added PE affinity information to debugging output, enabled via SHMEM_DEBUG
environment variable.
- Fixed issue with SHMEM_OFI_PROVIDER environment variable being unrecognized.
- Added experimental performance counters API extension (see shmemx.h)
- Added --enable-ofi-mr={basic,scalable,rma-event} and removed --with-ofi-mr,
--disable-mr-scalable, and --enable-mr-rma-event.
- Renamed --with-total-data-ordering configuration flag to
--enable-total-data-ordering.
- Added multithreaded lock unit test to demonstrate the use of hierarchical
locking to enable multithreaded usage of OpenSHMEM locks.
- Updated perf. suite tests to improve bandwidth reporting and improve coverage
over atomic operations.
- Updated manual progress to automatically enable completion polling.
- Updated target counter binding in OFI to comply with libfabric requirements.
- Fixed invalid locking in context destroy debug message when bounce buffering
is not enabled on the context.
- Simplified the OFI random STX allocator.
v1.4.1
------
- This full release includes the changes listed below for v1.4.1rc1 and the
following additional changes.
- Improved the fidelity of throughput measurements in the performance testing
suite.
- Added check to performance suite to verify that source and target processes
are located on different nodes.
- Fixed threading synchronization race in context destruction.
- Added a multi-threaded lock unit test with an accompanying library for
thread-safe usage of OpenSHMEM locks.
- Additional minor code cleanups and bugfixes (see git log for details).
v1.4.1rc1
---------
- Added multi-threaded implementations of the shmem_perf_suite throughput
benchmarks, which make use of OpenSHMEM contexts and OpenMP.
- Modified the test suite to enable building and running external to the SOS
repository (see https://github.com/openshmem-org/tests-sos).
- Provided a pkg-config file for Sandia OpenSHMEM named 'sos'.
- Added the SHMEM_OFI_STX_DISABLE_PRIVATE mode which disables STX
privatization, potentially improving load balance across transmit resources.
- SHMEM_OFI_STX_MAX default value reduced to 1 to avoid resource exhaustion
with some providers.
- Improved the portability of SOS's error-reporting utilities.
- Introduced support for the shmem_query_thread routine.
- Added an optional probe routine to assist OFI transport progress (see
--enable-manual-progress), which may improve the performance of pending
communication operations for some providers (notably, PSM2).
- Added the shmemx_register_gettid function, which allows users to pass a
function pointer for setting custom thread IDs.
- Add check for ASLR when remote virtual addressing optimization is enabled.
- Added the --disable-aslr-check options to disable the check for address space
layout randomization.
- Added a unit test to validate the memory barriers in various SOS
synchronization routines.
- Multiple bugfixes in the unit tests, performance suite, Portals finalization,
OFI STX management, the SOS simple build script, the default poll limit, and
more.
- Fix incorrect SHMEM_MINOR_VERSION value.
v1.4.0
------
- Support for OpenSHMEM 1.4 specification.
- New features include: Thread safety, communication management API (contexts),
test routines, sync routines, calloc for symmetric objects, bitwise atomic
operations, and updated C11 generic selection bindings. See OpenSHMEM 1.4
Specification, Annex G for details.
- Deprecations include: Fortran API, shmem_wait routines, mpp header directory,
and SMA-prefixed environment variables. See OpenSHMEM 1.4 Specification,
Annex F for details.
- Added support for OFI FI_THREAD_COMPLETION thread safety model, needed to use
the SHMEM_THREAD_MULTIPLE mode with the PSM2 provider and others (see
--enable-thread-completion). For providers that support FI_THREAD_SAFE, this
mode may also provider better performance for private and serialized
contexts.
- Added manpages and unit tests for OpenSHMEM 1.4 API routines.
- Optimized performance of private and serialized contexts.
- Improved performance of shmem_fence operation.
- Added memory barriers needed to support shared memory interactions between
threads and PEs.
- Updated unit tests to remove use of deprecated API routines.
- Added support to OFI transport for sharing STX resources across contexts.
- Added several environment variables that can be used to control STX resource
management in the OFI transport: SHMEM_OFI_STX_MAX, SHMEM_OFI_STX_ALLOCATOR,
and SHMEM_OFI_STX_THRESHOLD. See README for details.
- Updated data segment exposure method to improve compatibility with tools.
- Updated thread safety support in Portals transport, eliminating several
possible races in communication tracking/completion code.
v1.4.0rc2
---------
- Added support to OFI transport for sharing STX resources across contexts.
- Added several environment variables that can be used to control STX resource
management in the OFI transport: SHMEM_OFI_STX_MAX, SHMEM_OFI_STX_ALLOCATOR,
and SHMEM_OFI_STX_THRESHOLD. See README for details.
- Added support for returning gracefully when context creation is unsuccessful.
- Updated data segment exposure method to improve compatibility with tools.
- Updated thread safety support in Portals, eliminating several possible races
in communication tracking/completion code.
v1.4.0rc1
---------
- Added support for OFI FI_THREAD_COMPLETION thread safety model, needed to use
the SHMEM_THREAD_MULTIPLE mode with the PSM2 provider and others (see
--enable-thread-completion). For providers that support FI_THREAD_SAFE, this
mode may also provider better performance for private and serialized
contexts.
- Added manpages for OpenSHMEM 1.4 API routines.
- Improved performance of private and serialized contexts.
- Improved performance of shmem_fence operation.
- Added memory barriers needed to support shared memory interactions between
threads and PEs.
- Updated unit tests to remove use of deprecated API routines.
v1.4.0a1
--------
- Support for OpenSHMEM 1.4 specification.
- New features include: Thread safety, communication management API (contexts),
test routines, sync routines, calloc for symmetric objects, bitwise atomic
operations, and updated C11 generic selection bindings.
- Deprecations include: Fortran API, shmem_wait routines, mpp header directory,
and SMA-prefixed environment variables.
- This is an alpha release; full API support is provided, but the
implementation may contain bugs and is not yet optimized for performance.
v1.3.4
------
- Added manpages for OpenSHMEM API routines.
- Added polling limit control variables SMA_OFI_TX_POLL_LIMIT and
SMA_OFI_RX_POLL_LIMIT (see README for details) to allow control over quiet
and wait polling limits, respectively.
- Rewrite of environment variable management code and improved info output.
- Added support for processor atomics in thread-safe build.
- Added support for local communication optimization (--enable-memcpy).
- Added support for libfabric FI_MR_RMA_EVENT memory registration mode.
- Added experimental support for the PMIx process management interface
(--with-pmix).
- Fixed completion race in multithreaded mode.
- Fixed bugs in support for zero-length point-to-point and collective
communication.
- Multiple bugfixes to test suite, including GUPS benchmark
v1.3.3
------
- Fixed dissemination barrier race that could lead to incorrect synchronization.
- Improved collect and all-to-all collectives performance.
- Improved error reporting and debugging output.
- Added support for proposed bitwise atomics (see shmemx.h).
- Added shmemx_pcontrol function to profiling interface extension (see shmemx.h).
- Added symbol hiding support in SHMEM library.
- Fixed symmetric heap corruption bug in GUPS benchmark (see test/apps/gups.c).
- Added target-side measurement and asymmetric configuration support to
performance suite benchmarks.
v1.3.2
------
- Improved support for proposed multithreading extension (see shmemx.h),
enabled at compile time via --enable-threads.
- Enabled single-process, direct execution of SOS binaries with simple PMI.
- Added argument error checking for all SHMEM routines, enabled at compile time
via --enable-error-checking.
- Multiple build system improvements, including support for VPATH builds.
- Added new C and Fortran bindings generator that generates all headers and
bindings, including profiling interfaces.
- Added support for Fortran complex reductions API.
- Updated Fortran bindings to use short (OpenSHMEM style) header by default.
- Updated SHMEM_DEBUG output to include detailed build information.
- Added --enable-completion-polling build option to poll in quiet/fence
operations rather than waiting. This can improve performance for libfabric
providers that require software-generated progress.
- Added --disable-bounce-buffers build option to disable bounce buffering by
default (i.e. set SHMEM_BOUNCE_SIZE to 0 by default).
- Improved library path propagation (rpath) in compiler wrappers.
- Improved PMI simple build and fixed integration of libpmi_simple library.
- Update symmetric heap allocator to dlmalloc v2.8.6.
- Update PMI-1 client library from MPICH.
- Improved bandwidth efficiency and fixed bug in collect routines.
- Fixed several bugs in tree-based collectives when using PE active sets.
- Fixed several bugs in recursive-doubling reduction routine when using PE
active sets and when source and target buffers overlap.
- Fixed synchronization bug in memory management routines.
1.3.1
-----
- Support for allocating the symmetric heap using Linux huge pages.
- Improved support for Cray XC platforms.
- Improved support for variations in atomics support across OFI providers.
- Initial support for thread safety in the OFI transport.
- Fix several issues with C/C++ bindings and profiling interface.
- Added new benchmarks, unit tests, and examples.
1.3.0
-----
- Support for OpenSHMEM 1.3 specification, including nonblocking communication,
fetch/set atomics, all-to-all collectives, C11 generic bindings, and addition
of const/volatile to C API.
- Improvements to error checking of OpenSHMEM calls and internal debugging
enabled through --enable-error-checking configure option.
- Improvements to C++ compatibility, including support for type-generic
bindings in C++.
- Support for fabric and domain selection in the OFI transport, see
SMA_OFI_DOMAIN/FABRIC environment variables in the README file for details.
- Support for systems using the Aries network via the OFI GNI provider.
- Enabled XPMEM support in combination with the OFI transport.
- Support for PMI2-compliant process managers.
- The OFI transport automatically falls back to software reductions when the
provider does not support the particular combination of atomic operation and
datatype for a given OpenSHMEM reduction.
- Added multiple communication performance benchmarks to the test suite.
- Multiple bug fixes and improvements to stability and performance.
1.2.0
-----
- Support for OpenSHMEM 1.2 specification
- Support for libfabric
- Integrated support for PMI-1 compliant launcher (e.g. MPICH Hydra)
- Build now generates 'oshrun' launcher script for launching OpenSHMEM applications
- Multiple bug fixes and improvements to stability and performance
- Add shmemx.h and shmemx.fh header files for Sandia OpenSHMEM extensions
- Add support for short, SGI-style Fortran header (configure option
--disable-long-fortran-header) required by UH test suite
- Removed stale copies of UH tests, upstreamed changes, and moved to external
testing model. Extensive bug fixes and improvements to remaining test suite.
1.0a8
-----
- Initial public release