Yossi's notes
shamisp edited this page Oct 28, 2014
·
11 revisions
WIP:
- Tag matching API for UCP
- Implement RMA on UCP
- Create PD independently, use it to create iface (needed: uGNI PoC)
- Performance tests for UCP
- SIDR connection establishment
- UCP bootstrap - use one transport to bootstrap others.
- Add worker API
- Implement UCT AM callback which holds reference to the message.
- When cannot initiate the operation, UCT would return either NO_EP_RESOURCES or NO_IFACE_RESOURCES.
- Add more allocators for TL buffers (huge pages, mmap, ...)
- Rename uct_lkey_t to uct_mem_region_t
API features:
- Flags for communication: solicited event, completion,...
- Advertise required alignment for operations and best-performance alignment for operations
- Make sure communication can be initiated from callbacks.
- Pass configuration to UCP_CONTEXT
- Add timers support for async API
IB features:
- RoCE
- RRoCE (GID index)
- Path Query (RDMA CM / IB CM)
- LMC
- Non-default P_Key index
- SL
Design improvements:
- Extract more IB common code
- Move 'stats' library under 'tools'
- Inconsistency with atomic/get bcopy API: if the transport completes the operation immediately (e.g. mmap), it should still call the callback. This means callbacks are called from communication functions, which in turn means communication functions cannot be called from callbacks.
- const correctness
- Move 'perf' library under 'tools'
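One possible way around the callback inconsistency noted above is to never run the callback inline: an operation that completes immediately is queued and its callback runs later from the progress function, so callbacks are free to start new operations. The sketch below only illustrates that idea. It is not the UCT API, and every name in it (`toy_iface_t`, `toy_get_bcopy`, `toy_progress`, `toy_copy_cb`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; this is not the UCT API. */
typedef void (*toy_completion_cb_t)(void *arg, const void *data, size_t length);

typedef struct {
    toy_completion_cb_t cb;
    void               *arg;
    const void         *data;    /* e.g. a pointer into an mmap'ed segment */
    size_t              length;
} toy_pending_t;

#define TOY_MAX_PENDING 16

typedef struct {
    toy_pending_t pending[TOY_MAX_PENDING]; /* ring of deferred completions */
    size_t        head;                     /* next completion to dispatch */
    size_t        count;                    /* number of queued completions */
} toy_iface_t;

/* The data is available immediately, but the callback is deferred.
 * Returns 0 on success, -1 if the completion queue is full. */
int toy_get_bcopy(toy_iface_t *iface, const void *remote, size_t length,
                  toy_completion_cb_t cb, void *arg)
{
    toy_pending_t *p;

    assert(iface != NULL && cb != NULL);
    if (iface->count == TOY_MAX_PENDING) {
        return -1;
    }
    p = &iface->pending[(iface->head + iface->count) % TOY_MAX_PENDING];
    p->cb     = cb;
    p->arg    = arg;
    p->data   = remote;
    p->length = length;
    iface->count++;
    return 0;
}

/* Dispatch the completions that were queued when progress was entered.
 * A callback may call toy_get_bcopy() again; that new completion is
 * dispatched on the next call. Returns the number of callbacks invoked. */
unsigned toy_progress(toy_iface_t *iface)
{
    unsigned dispatched = 0;
    size_t   n          = iface->count;

    while (n-- > 0) {
        toy_pending_t p = iface->pending[iface->head]; /* copy, slot may be reused */
        iface->head = (iface->head + 1) % TOY_MAX_PENDING;
        iface->count--;
        p.cb(p.arg, p.data, p.length);
        dispatched++;
    }
    return dispatched;
}

/* Example callback: copies the fetched bytes into a user-side buffer. */
typedef struct {
    char     buf[64];
    unsigned calls;
} toy_result_t;

void toy_copy_cb(void *arg, const void *data, size_t length)
{
    toy_result_t *r = arg;

    memcpy(r->buf, data, length < sizeof(r->buf) ? length : sizeof(r->buf));
    r->calls++;
}
```

The cost of this approach is one extra progress call of latency for transports that could have completed inline. Whether that trade-off is acceptable is a design decision these notes leave open.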
Usability/debug improvements:
- In debug mode - check that EP is connected before sending
- Log by categories/objects
- All configuration variables should begin with UCX_
- Support custom env prefix
- Dump statistics to shared memory / unix socket.
- Check for constant_tsc bit, and take CPU frequency from sysfs instead of procinfo.
- Add doxygen
- In ucx_perftest, use PMI/librte instead of MPI
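The constant_tsc item above could look roughly like the following on Linux. This is a sketch under assumptions: the helper names are hypothetical, and the caller is assumed to pass `/proc/cpuinfo` as the flags source and a sysfs file such as `/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq` (which reports kHz) as the frequency source. That sysfs file is not present on every system, so a fallback is still needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `flag` appears in `line` as a whole
 * whitespace-delimited word, 0 otherwise. */
int cpu_flags_contain(const char *line, const char *flag)
{
    size_t      len;
    const char *p = line;

    assert(line != NULL && flag != NULL);
    len = strlen(flag);
    if (len == 0) {
        return 0;
    }
    while ((p = strstr(p, flag)) != NULL) {
        int  starts = (p == line) || (p[-1] == ' ') || (p[-1] == '\t') ||
                      (p[-1] == ':');
        char end    = p[len];
        int  ends   = (end == '\0') || (end == ' ') || (end == '\t') ||
                      (end == '\n');
        if (starts && ends) {
            return 1;
        }
        p += len;
    }
    return 0;
}

/* Scans a cpuinfo-style file for the first "flags" line.
 * Returns 1 if constant_tsc is listed, 0 if not, -1 if the file cannot be
 * opened. Assumes the flags line fits into the 4096-byte buffer. */
int cpu_has_constant_tsc(const char *cpuinfo_path)
{
    char  line[4096];
    int   found = 0;
    FILE *f     = fopen(cpuinfo_path, "r");

    if (f == NULL) {
        return -1;
    }
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            found = cpu_flags_contain(line, "constant_tsc");
            break;
        }
    }
    fclose(f);
    return found;
}

/* Reads a frequency in kHz from a sysfs-style file and returns it in Hz,
 * or a negative value on failure. */
double cpu_freq_hz_from_sysfs(const char *path)
{
    unsigned long khz;
    double        hz = -1.0;
    FILE         *f  = fopen(path, "r");

    if (f == NULL) {
        return -1.0;
    }
    if (fscanf(f, "%lu", &khz) == 1) {
        hz = khz * 1000.0;
    }
    fclose(f);
    return hz;
}
```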
Performance improvements:
- Separate rx/tx progress
- likely/unlikely
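For the likely/unlikely item, a common approach with GCC and Clang is to wrap `__builtin_expect` in a pair of macros and mark the rare branches on the fast path. The macro and type names below (`toy_likely`, `toy_unlikely`, `toy_txq_t`) are placeholders for illustration, not names taken from the code base.

```c
#include <assert.h>
#include <stddef.h>

/* Branch-prediction hints; fall back to a plain expression elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define toy_likely(x)   __builtin_expect(!!(x), 1)
#  define toy_unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define toy_likely(x)   (x)
#  define toy_unlikely(x) (x)
#endif

/* Hypothetical send queue with a fixed number of free slots. */
typedef struct {
    unsigned available; /* free send slots */
    unsigned posted;    /* sends posted so far */
} toy_txq_t;

/* Fast path: posting succeeds. Running out of resources is the rare case,
 * so that branch is marked unlikely. Returns 0 on success, -1 otherwise. */
int toy_post_send(toy_txq_t *txq)
{
    assert(txq != NULL);
    if (toy_unlikely(txq->available == 0)) {
        return -1;
    }
    --txq->available;
    ++txq->posted;
    return 0;
}
```

The hints only affect code layout and static prediction, so any gain should be confirmed with the performance tests rather than assumed.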
Tests:
- Bidirectional tests
- Performance tests with multiple nodes (e.g. pairs, all2all)
- Performance test should take expected performance from resource capabilities.
- Count warnings during gtest, and fail the test if they happen
- Print warning from perftest if not running with optimal performance:
  - not in release mode
- RTE support in gtest - maybe not needed; uGNI supports loopback.
- AM message rate/bandwidth
- Check capability flags in tests
- In p2p_test, define sender_entity and receiver_entity
RC:
- Don't use descriptor in atomic add - pass a global /dev/null buffer.
- Use scatter-to-CQ for atomic/get replies
- Handle SRQ watermark event.
- Remove RC EPs from the hash table when they are removed (refcount)
- Get rid of RC iface counters; instead, keep an array with all EPs which have pending sends. This should make the flush operation faster.
- Update callbacks API
- Statistics for RC.
- Configure all RC QP parameters.
- Parameter checks in debug mode.
- Log data packets in RC.
- Allocate and fill descriptor only after making sure there are send resources.
- Update bw/latency for transports.
- Construct WQE with SSE.
- Performance tests for GET and Atomics.
- Inline sends with >1 WQEBB.
- Use NOP for flush
- Handle async events in IB and print full information
- Separate parameter for send CQ size.
- Check for send CQ resources.
- Normalize transport names
- GET support
- Scatter-to-CQ for 64 bytes
- Atomic operations
- Avoid queuing 2 callbacks for some flows (atomic add, am zcopy)
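The idea in the RC list of replacing iface counters with an array of EPs that have pending sends might be sketched as follows. An EP joins the array when its first send is posted and leaves it (swap with the last entry) when its last send completes, so "is the iface flushed?" becomes a check of the array length. All types and function names here are hypothetical and do not reflect the actual RC transport structures.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RC_MAX_EPS 64

/* Hypothetical endpoint: tracks its own outstanding sends. */
typedef struct toy_rc_ep {
    unsigned outstanding; /* sends posted but not yet completed */
    size_t   pending_idx; /* slot in iface->pending[] while outstanding > 0 */
} toy_rc_ep_t;

/* Hypothetical interface: dense array of EPs with pending sends. */
typedef struct {
    toy_rc_ep_t *pending[TOY_RC_MAX_EPS];
    size_t       num_pending;
} toy_rc_iface_t;

/* Call when a send is posted on `ep`. Returns 0, or -1 if the array is full. */
int toy_rc_ep_send_posted(toy_rc_iface_t *iface, toy_rc_ep_t *ep)
{
    assert(iface != NULL && ep != NULL);
    if (ep->outstanding == 0) {
        if (iface->num_pending == TOY_RC_MAX_EPS) {
            return -1;
        }
        ep->pending_idx                      = iface->num_pending;
        iface->pending[iface->num_pending++] = ep;
    }
    ++ep->outstanding;
    return 0;
}

/* Call when a send on `ep` completes. Removal is O(1): the last array
 * entry is moved into the slot vacated by `ep`. */
void toy_rc_ep_send_completed(toy_rc_iface_t *iface, toy_rc_ep_t *ep)
{
    toy_rc_ep_t *last;

    assert(iface != NULL && ep != NULL && ep->outstanding > 0);
    if (--ep->outstanding == 0) {
        last                            = iface->pending[--iface->num_pending];
        iface->pending[ep->pending_idx] = last;
        last->pending_idx               = ep->pending_idx;
    }
}

/* The iface is flushed when no EP has pending sends. */
int toy_rc_iface_is_flushed(const toy_rc_iface_t *iface)
{
    return iface->num_pending == 0;
}
```

A flush that has to wait can then walk only `pending[0..num_pending)` instead of every EP on the interface.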
Autoconf:
- -libverbs is added to LIBS (global)
- -libcm is added to LIBS
- HAVE_TL_xx in automake can be on, even if HAVE_IB is off