mercury 2.2.0rc1

Pre-release
github-actions released this 04 May 02:47

Summary

This version brings bug fixes and updates to our v2.0.0 release.

New features

  • [NA OFI]
    • Choose addr format dynamically based on user preferences
    • Add support for IPv6
    • Add support for FI_SOCKADDR_IB
    • Add support for HPE cxi provider; the init info format for cxi is:
      • NIC:PID (both or only one may be passed), where NIC is cxi[0-9] and PID is in [0-510]
    • Use hwloc to select interface to use if NIC information is available
      (only supported by cxi at the moment)
    • Support device memory types and FI_HMEM for verbs and cxi providers
    • Update min required version to libfabric 1.9
    • Improve debug output to print verbose FI info of selected provider
  • [NA UCX]
    • Use active messaging (UCP_FEATURE_AM) for unexpected messages only; this
      allows for the removal of address resolution and of the retry on the
      first message to exchange connection IDs
    • Turn on mempool by default
    • Support device memory types
  • [NA PSM]
    • Add mercury NA plugin for the QLogic/Intel PSM interface
      • Also support PSM2 (Intel OmniPath) through the PSM NA plugin
  • [NA]
    • Add na_addr_format init info
    • Update NA_Mem_register() API call to support memory types (e.g., CUDA, ROCm, ZE) and device IDs
    • Add na_loc module for hwloc detection
    • Remove na_uint, na_int, na_bool_t and na_size_t types
    • Use separate versioning for library and update to v3.0.0
  • [NA IP]
    • Refactor na_ip_check_interface() to only use getaddrinfo() and getifaddrs()
    • Add family argument to force detection of IPv4/IPv6 addresses
    • Add ip debug log
  • [HG util]
    • Add mercury_byteswap.h for bswap macros
    • Add mercury_inet.h for htonll and ntohll routines
    • Add mercury_param.h to use sys/param.h or MIN/MAX macros, etc.
    • Use separate versioning for library and update to v3.0.0
  • [HG bulk]
    • Add support for memory attributes through a new HG_Bulk_create_attr() routine (supports CUDA, ROCm, ZE)
  • [HG]
    • Remove MERCURY_ENABLE_STATS CMake option and use 'diag' log subsys instead
      • Modify behavior of stats field to turn on diagnostics
      • Refactor existing counters (used only if debug is on)
    • Add checksum levels that can be manually controlled at runtime (disabled by default, HG_CHECKSUM_NONE level)
    • Update to mchecksum v2.0
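
The [HG util] entry above mentions htonll and ntohll routines in mercury_inet.h. As a rough illustration of what such 64-bit byte-order conversions do, here is a minimal sketch in portable C. The names example_htonll, example_ntohll and example_check_roundtrip are hypothetical and are not the definitions shipped in mercury_inet.h, whose actual implementation may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not taken from mercury_inet.h. */

/* Convert a 64-bit value from host byte order to network (big-endian) order. */
static uint64_t example_htonll(uint64_t host)
{
    unsigned char bytes[8];
    uint64_t net;
    int i;

    /* Emit the most significant byte first, whatever the host endianness. */
    for (i = 0; i < 8; i++)
        bytes[i] = (unsigned char) (host >> (56 - 8 * i));
    memcpy(&net, bytes, sizeof(net));

    return net;
}

/* Convert a 64-bit value from network (big-endian) order to host byte order. */
static uint64_t example_ntohll(uint64_t net)
{
    unsigned char bytes[8];
    uint64_t host = 0;
    int i;

    /* Read the bytes in memory order and rebuild the value MSB first. */
    memcpy(bytes, &net, sizeof(bytes));
    for (i = 0; i < 8; i++)
        host = (host << 8) | bytes[i];

    return host;
}

/* Check that the wire form starts with the most significant byte and that
 * the conversion round-trips. */
static void example_check_roundtrip(void)
{
    const uint64_t value = UINT64_C(0x0102030405060708);
    uint64_t wire = example_htonll(value);
    unsigned char first;

    memcpy(&first, &wire, 1);
    assert(first == 0x01);
    assert(example_ntohll(wire) == value);
}
```

Working byte by byte avoids any compile-time endianness test, at the cost of not using a native bswap instruction.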

Bug fixes

  • [NA OFI]
    • Switch tcp provider to FI_PROGRESS_MANUAL
    • Prevent empty authorization keys from being passed
    • New implementation of address management
      • Fix duplicate addresses on multithreaded lookups
      • Redefine address keys and raw addresses to prevent allocations
      • Use FI addr map to lookup by FI addr
      • Improve serialization and deserialization of addresses
    • Fix provider table and use EP proto
    • Refactor and clean up plugin initialization
      • Clean up ip and domain checking
      • Ensure interface name is not used as domain name for verbs, etc.
      • Use NA IP module and add missing NA_OFI_VERIFY_PROV_DOM for tcp provider
      • Rework handling of fi_info to open fabric/domain/endpoint
      • Separate fabric from domain and keep single domain per NA class
      • Refactor handling of scalable vs standard endpoints
    • Improve handling of retries after FI_EAGAIN return code
      • Abort retried ops after default 90s timeout
      • Abort ops to a target being retried after first NA_HOSTUNREACH error in CQ
  • [NA UCX]
    • Fix potential error not returned correctly on conn_insert()
  • [HG util]
    • Round up the ms time conversion so that small timeouts do not result in
      a busy spin
    • Fix 'none' log level not being recognized
    • Let mercury log print counters on exit when debug outlet is on
  • [HG proc]
    • Prevent call to save_ptr()/restore_ptr() during HG_FREE
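
The [HG util] fix above about rounding up the ms time conversion can be illustrated as follows. This is a minimal sketch that assumes the timeout starts out in nanoseconds; the source unit and the names example_ns_to_ms_round_up and example_check_rounding are assumptions made for illustration, not mercury's code. The point is that a truncating division turns any timeout shorter than 1 ms into 0, which a polling call would typically treat as "return immediately", so the caller ends up spinning.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_NS_PER_MS UINT64_C(1000000)

/* Hypothetical helper: convert nanoseconds to milliseconds, rounding up so
 * that any non-zero timeout maps to at least 1 ms. */
static uint64_t example_ns_to_ms_round_up(uint64_t ns)
{
    /* Divide first and then add one for a non-zero remainder; this avoids
     * the overflow that (ns + 999999) / 1000000 could hit for huge inputs. */
    return ns / EXAMPLE_NS_PER_MS + ((ns % EXAMPLE_NS_PER_MS != 0) ? 1 : 0);
}

/* Expected values for a few representative timeouts. */
static void example_check_rounding(void)
{
    assert(example_ns_to_ms_round_up(0) == 0);       /* zero stays zero */
    assert(example_ns_to_ms_round_up(1) == 1);       /* sub-ms becomes 1 ms */
    assert(example_ns_to_ms_round_up(1000000) == 1); /* exact ms unchanged */
    assert(example_ns_to_ms_round_up(1000001) == 2); /* remainder rounds up */
}
```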

⚠️ Known Issues

  • [NA OFI]
    • [tcp/verbs;ofi_rxm] Using more than 256 peers requires FI_UNIVERSE_SIZE to be set.
  • [NA UCX]
    • NA_Addr_to_string() cannot be used on non-listening processes to convert a self-address to a string.
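
For the FI_UNIVERSE_SIZE issue listed under [NA OFI], one possible workaround is to set the variable from the application itself. The sketch below assumes this happens before any libfabric-based transport is initialized. The helper name example_set_universe_size and the value 1024 in the usage note are illustrative assumptions; the appropriate size depends on the number of peers in the job.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() under strict C modes */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: export FI_UNIVERSE_SIZE unless the user already set
 * it in the environment. Returns 0 on success, -1 on failure. */
static int example_set_universe_size(unsigned int expected_peers)
{
    char value[16]; /* large enough for any 32-bit unsigned value */

    assert(expected_peers > 0);

    /* Respect a value that was exported by the user or the job launcher. */
    if (getenv("FI_UNIVERSE_SIZE") != NULL)
        return 0;

    snprintf(value, sizeof(value), "%u", expected_peers);

    /* Last argument 0: do not overwrite an existing value. */
    return setenv("FI_UNIVERSE_SIZE", value, 0);
}
```

A caller would invoke, for example, example_set_universe_size(1024) early in main(). Exporting the variable in the job script instead has the same effect and needs no code change.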