Yossi's notes
Yossi edited this page Jan 3, 2016 · 11 revisions
Design improvements:
- add protection flags to mem_reg/mem_dereg, so we would be able to send from read-only memory
- uct_completion_t should be returned from UCT and not passed to UCT, same way we are doing for UCP. Because the user would have to allocate it in advance anyway.
- modify am callback signature to accept only the data, and obtain the descriptor by calling another function
- refactor MM PD
- remove sockaddr structs
- RTE API - ucp_ep_create(worker, cb, arg) -> the callback will retrieve the address on-demand.
- event-based API
- Extract more IB common code
- Move 'stats' library under 'tools'
- Inconsistency with atomic/get bcopy API: in case the transport completes the operation immediately (e.g. mmap), it should still call the callback. This means callbacks are called from communication functions, which in turn means communication functions cannot be called from callbacks.
- const correctness
- Move 'perf' library under 'tools'
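The callback inconsistency noted above can be illustrated with a small sketch. It shows one possible way around it, not necessarily what UCT does: an operation that completes immediately still reports completion through the callback, but the callback is queued and only run from a progress function, so it is never invoked from inside a communication function. All names here (`sketch_iface_t`, `sketch_get_bcopy`, `sketch_progress`, `sketch_count_cb`) are hypothetical and are not part of the UCT API.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_QUEUE_LEN 16

/* Hypothetical completion callback type (not the UCT signature). */
typedef void (*sketch_cb_t)(void *arg);

typedef struct {
    sketch_cb_t cb;
    void       *arg;
} sketch_pending_t;

/* Hypothetical interface holding a ring of deferred callbacks. */
typedef struct {
    sketch_pending_t queue[SKETCH_QUEUE_LEN];
    size_t           head; /* next callback to run */
    size_t           tail; /* next free slot */
} sketch_iface_t;

/* A "get" that completes immediately, as an mmap-like transport would.
 * The data is copied right away, but the callback is queued instead of
 * being called inline. Returns 0 on success, -1 if the queue is full. */
int sketch_get_bcopy(sketch_iface_t *iface, void *dst, const void *src,
                     size_t len, sketch_cb_t cb, void *arg)
{
    if (iface->tail - iface->head == SKETCH_QUEUE_LEN) {
        return -1;
    }
    memcpy(dst, src, len);
    iface->queue[iface->tail % SKETCH_QUEUE_LEN].cb  = cb;
    iface->queue[iface->tail % SKETCH_QUEUE_LEN].arg = arg;
    iface->tail++;
    return 0;
}

/* Run the callbacks that were queued before this call. A callback may
 * call sketch_get_bcopy() itself, because it runs outside of any
 * communication function. Returns the number of callbacks invoked. */
unsigned sketch_progress(sketch_iface_t *iface)
{
    size_t   end   = iface->tail;
    unsigned count = 0;

    while (iface->head != end) {
        sketch_pending_t p = iface->queue[iface->head % SKETCH_QUEUE_LEN];
        iface->head++; /* release the slot before running user code */
        p.cb(p.arg);
        ++count;
    }
    return count;
}

/* Example callback: counts how many completions were reported. */
void sketch_count_cb(void *arg)
{
    ++*(int *)arg;
}
```

With this shape the caller sees the same behavior for immediate and delayed completion: the callback always fires, and always from `sketch_progress()`. The cost is one extra queue operation on the immediate path.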
UCM:
- Support mprotect()
- Save the mmap-ed pointers in page table structure, rather than in a list
Problems:
- Update CodeStyle with:
- include file order
- local variable names structure
- number for each rule
- add checkpatch.pl
- space lines
- Need protocol sync before destroying RC QP - solved by ignoring errors
- Fail if there were warnings during test
- Post receives only if there is active message handler (improve time with valgrind) - can't be done because we have control messages
Zero copy E2E:
- memory hooks
- page table
- organize files in uct/base
- rename pd to md
- registration cache
- expose memory registration performance
- zero copy protocols
- rndv protocols
UD:
- Base AV
- More efficient TX moderation
- Common progress for verbs/accel
- Scheduling
- Reliability
- Ring of control SKBs
WIP:
- Tag matching API for UCP
- Implement RMA on UCP
- Create PD independently, use it to create iface (needed: uGNI PoC)
- Performance tests for UCP
- SIDR connection establishment
- UCP bootstrap - use one transport to bootstrap others.
- Add worker API
- Implement UCT AM callback which holds reference to the message.
- When cannot initiate the operation, UCT would return either NO_EP_RESOURCES or NO_IFACE_RESOURCES.
- Add more allocators for TL buffers (huge pages, mmap, ...)
- Rename uct_lkey_t to uct_mem_region_t
API features:
- Flags for communication: solicited event, completion,...
- Advertise required alignment for operations and best-performance alignment for operations
- Make sure communication can be initiated from callbacks.
- Pass configuration to UCP_CONTEXT
- Add timers support for async API
IB features:
- RoCE
- RRoCE (GID index)
- Path Query (RDMA CM / IB CM)
- LMC
- Non-default P_Key index
- SL
Usability/debug improvements:
- In debug mode - check that EP is connected before sending
- Log by categories/objects
- Support custom env prefix
- Dump statistics to shared memory / unix socket.
- All configuration variables should begin with UCX_
- Check for constant_tsc bit, and take CPU frequency from sysfs instead of procinfo.
- Add doxygen
- In ucx_perftest, use PMI/librte instead of MPI
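The constant_tsc item above could look roughly like the following on Linux. This is a minimal sketch under stated assumptions: the `sketch_*` function names are made up for illustration, the flag is looked up on the `flags` line of `/proc/cpuinfo` (x86 layout), and the frequency is read from the cpufreq file `cpuinfo_max_freq` of cpu0, which reports kHz and may be absent when no cpufreq driver is loaded.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-separated token
 * in `flags`, 0 otherwise. */
int sketch_has_cpu_flag(const char *flags, const char *flag)
{
    size_t      len = strlen(flag);
    const char *p   = flags;

    if (len == 0) {
        return 0;
    }
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || (p[-1] == ' ') || (p[-1] == '\t');
        int ends   = (p[len] == '\0') || (p[len] == ' ') || (p[len] == '\n');
        if (starts && ends) {
            return 1;
        }
        p += len;
    }
    return 0;
}

/* Scan /proc/cpuinfo for a "flags" line containing constant_tsc.
 * Returns 1 if found, 0 if not found, -1 if the file cannot be opened. */
int sketch_cpu_has_constant_tsc(void)
{
    char  line[8192];
    int   found = 0;
    FILE *f     = fopen("/proc/cpuinfo", "r");

    if (f == NULL) {
        return -1;
    }
    while (fgets(line, sizeof(line), f) != NULL) {
        if ((strncmp(line, "flags", 5) == 0) &&
            sketch_has_cpu_flag(line, "constant_tsc")) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Read the maximum CPU frequency from sysfs and convert kHz to Hz.
 * Returns 0.0 if the file is missing or unreadable. */
double sketch_cpu_max_freq_hz(void)
{
    unsigned long khz = 0;
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq",
                    "r");

    if (f == NULL) {
        return 0.0;
    }
    if (fscanf(f, "%lu", &khz) != 1) {
        khz = 0;
    }
    fclose(f);
    return (double)khz * 1e3;
}
```

The whole-token match matters because other flags contain `tsc` as a substring (for example `nonstop_tsc`). A caller would typically fall back to a measured clock frequency when either function reports failure.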
Performance improvements:
- Separate rx/tx progress
- likely/unlikely
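For the likely/unlikely item, a minimal sketch of branch-prediction hint macros is shown here. It assumes a GCC or Clang compiler, where `__builtin_expect` is available; the `sketch_` names and the example function are illustrative only and are not taken from the UCX sources.

```c
#include <stddef.h>

/* Branch-prediction hints. __builtin_expect is a GCC/Clang builtin;
 * other compilers fall back to the plain condition. */
#if defined(__GNUC__) || defined(__clang__)
#  define sketch_likely(x)   __builtin_expect(!!(x), 1)
#  define sketch_unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define sketch_likely(x)   (!!(x))
#  define sketch_unlikely(x) (!!(x))
#endif

/* Hypothetical fast path: sum a buffer, treating a NULL buffer as the
 * rare error case so the compiler lays out the common path first. */
long sketch_sum(const int *buf, size_t count)
{
    long   sum = 0;
    size_t i;

    if (sketch_unlikely(buf == NULL)) {
        return -1; /* rare error path */
    }
    for (i = 0; i < count; ++i) {
        sum += buf[i];
    }
    return sum;
}
```

The hints only affect code layout and static prediction, so they are worth adding on hot paths where one outcome clearly dominates, such as error checks in send and progress functions.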
Tests:
- Bidirectional tests
- Performance tests with multiple nodes (e.g. pairs, all2all)
- Performance test should take expected performance from resource capabilities.
- Count warnings during gtest, and fail the test if they happen
- Print warning from perftest if not running with optimal performance:
  - not in release mode
- RTE support in gtest - maybe not needed; uGNI supports loopback.
- AM message rate/bandwidth
- Check capability flags in tests
- in p2p_test, define sender_entity and receiver_entity
RC:
- Don't use descriptor in atomic add - pass a global /dev/null buffer.
- Use scatter-to-CQ for atomic/get replies
- Handle SRQ watermark event.
- Remove RC EP's from the hash table when they are removed (refcount)
- Get rid of RC iface counters. Instead, have an array with all EPs which have pending sends. This should make flush operation faster.
- Avoid RX descriptor allocation when calling AM callback and it returns UCS_OK
- Update callbacks API
- Statistics for RC.
- Configure all RC QP parameters.
- Parameter checks in debug mode.
- Log data packets in RC.
- Allocate and fill descriptor only after making sure there are send resources.
- Update bw/latency for transports.
- Construct WQE with SSE.
- Performance tests for GET and Atomics.
- Inline sends with >1 WQEBB.
- Use NOP for flush
- Handle async events in IB and print full information
- Separate parameter for send CQ size.
- Check for send CQ resources.
- Normalize transport names
- GET support
- Scatter-to-CQ for 64 bytes
- Atomic operations
- Avoid queuing 2 callbacks for some flows (atomic add, am zcopy)
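The idea of replacing the RC iface counters with an array of EPs that have pending sends can be sketched as follows. This is only an illustration under assumptions: every type and function name is hypothetical, the array has a fixed capacity, and an EP is removed with a swap-with-last so that both insertion and removal are O(1). Flush then only needs to look at the array length.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_EPS 64

/* Hypothetical endpoint: tracks its own outstanding sends and its slot
 * in the interface's pending array (-1 when not in the array). */
typedef struct sketch_ep {
    unsigned outstanding;
    int      pending_index;
} sketch_ep_t;

/* Hypothetical interface: compact array of EPs with pending sends. */
typedef struct {
    sketch_ep_t *pending[SKETCH_MAX_EPS];
    size_t       num_pending;
} sketch_rc_iface_t;

void sketch_ep_init(sketch_ep_t *ep)
{
    ep->outstanding   = 0;
    ep->pending_index = -1;
}

/* Called when a send is posted: the EP joins the array on 0 -> 1. */
void sketch_ep_send_posted(sketch_rc_iface_t *iface, sketch_ep_t *ep)
{
    if (ep->outstanding++ == 0) {
        assert(iface->num_pending < SKETCH_MAX_EPS);
        ep->pending_index = (int)iface->num_pending;
        iface->pending[iface->num_pending++] = ep;
    }
}

/* Called when a send completes: the EP leaves the array on 1 -> 0.
 * The last element is moved into the freed slot to keep it compact. */
void sketch_ep_send_completed(sketch_rc_iface_t *iface, sketch_ep_t *ep)
{
    sketch_ep_t *last;

    assert(ep->outstanding > 0);
    if (--ep->outstanding == 0) {
        last = iface->pending[--iface->num_pending];
        iface->pending[ep->pending_index] = last;
        last->pending_index               = ep->pending_index;
        ep->pending_index                 = -1;
    }
}

/* Flush check: nothing is outstanding when the array is empty. */
int sketch_iface_is_flushed(const sketch_rc_iface_t *iface)
{
    return iface->num_pending == 0;
}
```

Compared with per-iface counters, the array also tells a flush which EPs still need progress, so it can wait on just those instead of scanning every EP.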
Autoconf:
- HAVE_TL_xx in automake can be on, even if HAVE_IB is off
- -libverbs is added to LIBS (global)
- -libcm is added to LIBS