Management/Communication of Not Supported operations. #6

jjhursey · 2018-03-23T12:58:02Z

The PMIx standard allows any implementation to return PMIX_ERR_NOT_SUPPORTED for any request. Users of the API must be prepared to handle this case.

The central questions are:

When is PMIX_ERR_NOT_SUPPORTED supposed to be returned? In a blocking operation, it should be at the call site. In a non-blocking operation, that might not be possible without blocking until the request is processed by the RM. Should it be returned in the callback then?
Should we have a PMIx_Query_support function where a user can ask:
- Is this function supported?
- If the function is supported, is this attribute or combination of attributes supported by this function?
- If the function and attribute set are supported, are there any limitations on requests (e.g., memory limits on job allocation requests) that can be made?
Do we need a broader range of "not supported" error codes to distinguish between these cases (e.g., NOT_SUPPORTED, NOT_SUPPORTED_ATTRIBUTE, NOT_SUPPORTED_ATTRIBUTE_SET, NOT_SUPPORTED_REQUEST)?
Is a PMIx_Query_support function even useful for an application, or is the error code at the call site (or callback) enough?
Do we need a perror-like interface for the PMIx implementation to return a string explaining the reason why it returned "not supported"?
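
To make the layered questions concrete, here is a minimal sketch of what a graded support check could look like. Everything prefixed `demo_` or `DEMO_` is hypothetical and not part of PMIx: the support table, the attribute names, and the numeric values of the finer-grained codes are invented for illustration. Only the value of PMIX_ERR_NOT_SUPPORTED is taken from this issue.

```c
#include <stddef.h>
#include <string.h>

/* Value as listed in this issue. */
#define PMIX_ERR_NOT_SUPPORTED (-47)

/* HYPOTHETICAL finer-grained codes: names follow the question above,
 * values are invented for this sketch only. */
#define DEMO_SUCCESS                          0
#define DEMO_ERR_NOT_SUPPORTED_ATTRIBUTE      (-1001)
#define DEMO_ERR_NOT_SUPPORTED_ATTRIBUTE_SET  (-1002)

/* HYPOTHETICAL support table entry for one function. */
typedef struct {
    const char *function;      /* function name */
    const char *attrs[4];      /* attributes it understands (NULL-padded) */
    const char *exclusive[2];  /* one pair of attributes that cannot be combined */
} demo_support_t;

static const demo_support_t demo_table[] = {
    { "demo_alloc",
      { "demo.mem", "demo.nodes", "demo.time", NULL },
      { "demo.mem", "demo.nodes" } },
};

/* Return 1 if key appears among the first n entries of list, else 0. */
static int demo_has(const char *const *list, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i] != NULL && strcmp(list[i], key) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Answer the three questions in order: function, attribute, attribute set. */
int demo_query_support(const char *function,
                       const char *const *attrs, size_t nattrs)
{
    const demo_support_t *entry = NULL;
    for (size_t i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
        if (strcmp(demo_table[i].function, function) == 0) {
            entry = &demo_table[i];
        }
    }
    if (entry == NULL) {
        return PMIX_ERR_NOT_SUPPORTED;               /* function unknown */
    }
    for (size_t i = 0; i < nattrs; i++) {
        if (!demo_has(entry->attrs, 4, attrs[i])) {
            return DEMO_ERR_NOT_SUPPORTED_ATTRIBUTE; /* single attribute unknown */
        }
    }
    if (demo_has(attrs, nattrs, entry->exclusive[0]) &&
        demo_has(attrs, nattrs, entry->exclusive[1])) {
        return DEMO_ERR_NOT_SUPPORTED_ATTRIBUTE_SET; /* combination rejected */
    }
    return DEMO_SUCCESS;
}
```

The third question (limits on request values) is left out here because it would need per-attribute value ranges, which this issue does not define.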

Below are some error codes that could be used/extended.

#define PMIX_ERR_SERVER_FAILED_REQUEST              -10
#define PMIX_ERR_BAD_PARAM                          -27
#define PMIX_ERR_NOT_SUPPORTED                      -47
#define PMIX_ERR_NOT_IMPLEMENTED                    -48

Note that any info key can be marked as required with PMIX_INFO_REQUIRED(m), which means that the PMIx implementation must return an error if it does not support that attribute, instead of possibly ignoring it.
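
The required-versus-optional rule can be sketched as follows. This is a self-contained mock, not the PMIx data structures: `demo_info_t`, its `required` field (a stand-in for whatever PMIX_INFO_REQUIRED(m) marks), and the key names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PMIX_SUCCESS            0
#define PMIX_ERR_NOT_SUPPORTED  (-47)   /* value as listed in this issue */

/* HYPOTHETICAL stand-in for an info entry. */
typedef struct {
    const char *key;
    bool required;   /* stand-in for the mark set by PMIX_INFO_REQUIRED(m) */
} demo_info_t;

/* HYPOTHETICAL: this mock implementation understands exactly one key. */
static bool demo_key_supported(const char *key)
{
    return strcmp(key, "demo.timeout") == 0;
}

/* Return PMIX_ERR_NOT_SUPPORTED as soon as an unsupported key is marked
 * required. Unsupported optional keys are skipped and counted in *ignored. */
int demo_check_info(const demo_info_t *info, size_t ninfo, size_t *ignored)
{
    *ignored = 0;
    for (size_t i = 0; i < ninfo; i++) {
        if (demo_key_supported(info[i].key)) {
            continue;
        }
        if (info[i].required) {
            return PMIX_ERR_NOT_SUPPORTED;
        }
        (*ignored)++;
    }
    return PMIX_SUCCESS;
}
```

The same unsupported key therefore yields success when optional and an error when required, which is the distinction a caller has to be prepared for.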


jjhursey · 2018-03-23T14:51:23Z

(From discussion at March 23, 2018 face-to-face)

Non-blocking operations
- Can return "not supported" at the call site if the PMIx library cannot even initiate the request. Otherwise it will likely return "success", because the request was initiated.
- The callback may return "not supported" if, during the processing of the request, something not supported was encountered.
- This restriction is necessary because the RM might be making the decision of what can be supported (maybe even "at this time") or not, which is not known at the time of the nonblocking function call unless we forced the nonblocking function to block - which is not what we want.
If a "not supported" operation is encountered then a PMIx event is generated that the application can catch and inspect.
- This is always generated regardless of when it was determined to be unsupported. Only one event will be generated for a given call.
When should 'supported' be checked? (Note that 'client-side' and 'server-side' could be different versions of the PMIx reference library (RL), so implementations need to think about this.)
- The client-side process will always get an event in the situation of an unsupported operation.
- Client-side RL - If the attribute is something that the client side should process, then it performs the check and, if the attribute is unsupported, returns the error and generates an event. Otherwise it forwards the request to the server side.
- Server-side RL - Checks whether the request is its own to handle; otherwise forwards it to the Resource Manager.
- Resource Manager - Returns the error-management information in the void *. The server-side RL would inspect that void * and issue the event to the client-side process.
  - New API - Need macro support to determine whether this "status information" is set in the void *. Need a macro for the RM to set this string, and one for the server-side RL to read it.
  - Reference library - We need to be careful in introducing a new macro. Some 'v2.0' compliant RMs won't know to set the macro, so the server-side RL needs to handle that case. The server-side RL can optionally see this additional information. If it's not set then an event is still triggered, but without any additional information. The PMIx Reference Library will add this as a release note in the v2.1.x series when it adds this macro.
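
The forwarding chain above can be sketched as a self-contained mock. All `demo_` names are hypothetical. The sketch is synchronous for brevity (a real non-blocking call would deliver the result later), and a plain string stands in for the "status information" that the proposed macros would set and read in the void *.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PMIX_SUCCESS            0
#define PMIX_ERR_NOT_SUPPORTED  (-47)   /* value as listed in this issue */

typedef enum { DEMO_LAYER_CLIENT, DEMO_LAYER_SERVER, DEMO_LAYER_RM } demo_layer_t;

/* HYPOTHETICAL request: which layer owns the attribute and whether it is supported. */
typedef struct {
    demo_layer_t owner;
    int supported;
    const char *rm_reason;  /* optional RM explanation; NULL models a 'v2.0' RM */
} demo_request_t;

/* HYPOTHETICAL record of what the client-side process observes. */
typedef struct {
    int fired;              /* number of events delivered for this call */
    int status;
    demo_layer_t source;    /* layer that detected the problem */
    char detail[64];        /* empty when no status information was set */
} demo_event_t;

static void demo_emit(demo_event_t *ev, demo_layer_t src, const char *detail)
{
    ev->fired++;
    ev->status = PMIX_ERR_NOT_SUPPORTED;
    ev->source = src;
    snprintf(ev->detail, sizeof(ev->detail), "%s", detail ? detail : "");
}

/* Walk client -> server -> RM. A layer that does not own the request
 * forwards it; the owning layer either accepts it or triggers exactly
 * one "not supported" event for the client. */
int demo_submit(const demo_request_t *req, demo_event_t *ev)
{
    memset(ev, 0, sizeof(*ev));
    for (int layer = DEMO_LAYER_CLIENT; layer <= DEMO_LAYER_RM; layer++) {
        if ((int)req->owner != layer) {
            continue;                       /* not ours: forward */
        }
        if (req->supported) {
            return PMIX_SUCCESS;
        }
        /* Only the RM can attach an explanation, and it may not have one. */
        demo_emit(ev, (demo_layer_t)layer,
                  layer == DEMO_LAYER_RM ? req->rm_reason : NULL);
        return PMIX_ERR_NOT_SUPPORTED;
    }
    return PMIX_SUCCESS;
}
```

An RM that never sets the reason still produces one event, just with an empty `detail`, which matches the backward-compatibility case described for 'v2.0' compliant RMs.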

jjhursey · 2018-03-23T15:09:30Z

@rhc54 had an example roughly like this. The client registers different callback data (potentially) at every call site providing context about that call site. The event handler chain and user's callback function manages the processing of the event handler. The main calling thread can wait for that event to be processed to control what to do next.

PMIx_Register_event_handler(PMIX_ERR_NOT_SUPPORTED, my_callback, cbdata { mutex/cond, FILE, ...} )
ret = PMIx_Get_nb() --> this generates a PMIX_ERR_NOT_SUPPORTED
if( success == ret ) {
   PMIx_Deregister_event_handler()
} else {
   // wait for my_callback to give me more information
}

my_callback(cbdata)
{
  // event handlers flow in a chain
  // unpack the synchronization information you encoded in the cbdata
  // Process the error, and stop the chain
}
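
For reference, a compilable mock of the same flow. It does not call PMIx: the one-slot handler registry, `demo_get_nb`, and `demo_progress` are hypothetical stand-ins, and a polled `done` flag replaces the mutex/cond pair so the sketch stays single-threaded.

```c
#include <stdbool.h>
#include <stddef.h>

#define PMIX_SUCCESS            0
#define PMIX_ERR_NOT_SUPPORTED  (-47)   /* value as listed in this issue */

/* Per-call-site context handed over at registration time. */
typedef struct {
    const char *call_site;  /* label for diagnostics */
    bool done;              /* set by the handler once the event is processed */
    int status;             /* status the handler observed */
} demo_cbdata_t;

typedef void (*demo_handler_fn)(int status, void *cbdata);

/* HYPOTHETICAL one-slot registry and event queue. */
static demo_handler_fn demo_handler;
static void *demo_handler_cbdata;
static bool demo_event_pending;

void demo_register(demo_handler_fn fn, void *cbdata)
{
    demo_handler = fn;
    demo_handler_cbdata = cbdata;
}

void demo_deregister(void)
{
    demo_handler = NULL;
    demo_handler_cbdata = NULL;
}

/* Mock non-blocking get: an unsupported request queues one event. */
int demo_get_nb(bool supported)
{
    if (supported) {
        return PMIX_SUCCESS;
    }
    demo_event_pending = true;
    return PMIX_ERR_NOT_SUPPORTED;
}

/* Mock progress engine: deliver the pending event, if any. */
void demo_progress(void)
{
    if (demo_event_pending && demo_handler != NULL) {
        demo_event_pending = false;
        demo_handler(PMIX_ERR_NOT_SUPPORTED, demo_handler_cbdata);
    }
}

/* Handler: unpack the context, record the error, release the waiter. */
void demo_my_callback(int status, void *cbdata)
{
    demo_cbdata_t *ctx = cbdata;
    ctx->status = status;
    ctx->done = true;
}

/* One call site: register, call, wait for the handler only on failure. */
int demo_call_site(const char *label, bool supported)
{
    demo_cbdata_t ctx = { label, false, PMIX_SUCCESS };
    demo_register(demo_my_callback, &ctx);
    int ret = demo_get_nb(supported);
    if (ret != PMIX_SUCCESS) {
        while (!ctx.done) {
            demo_progress();
        }
    }
    demo_deregister();
    return ctx.status;
}
```

In a threaded library the `while (!ctx.done)` loop would become a condition-variable wait, with the handler signalling it.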

jjhursey · 2018-03-23T15:31:21Z

Counter example:

Process:

Register(foo, cbdataA)
PMIx_Get_nb() // this is successful, eventually
Register(foo, cbdataB)
PMIx_Get_nb() // this is not supported

Race in the event chain

foo(cbdataA)
  |
  v
foo(cbdataB)
  |
  v
...

How does foo(cbdataA) know whether the "unsupported" was for its call or for foo(cbdataB)'s?

Can we use the info array passed back to the event callback? The info passed to the callback is the info passed to PMIx_Get_nb, so the user can add call-site-specific information to the event callback. That gives information about the point of the API call rather than the point of callback registration. We need a new attribute through which PMIx_Get_nb can pass an "add-this-info-to-callback's-info-array" payload (void * or info array) that will be tacked on to the end of the info array provided to the event callback (after the info set that was given at the point of function registration).

We are already adding entries to the 'info' array of the "not supported" event to provide context.
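
A minimal sketch of how a call-site tag in the event's info array would resolve the race. It is a self-contained mock: the handler chain, `demo_info_t`, and the key `demo.callsite` are hypothetical, since the attribute proposed above does not exist yet.

```c
#include <stddef.h>
#include <string.h>

#define PMIX_ERR_NOT_SUPPORTED  (-47)   /* value as listed in this issue */
#define DEMO_MAX_HANDLERS       4
#define DEMO_CALL_SITE_KEY      "demo.callsite"  /* HYPOTHETICAL attribute name */

/* HYPOTHETICAL key/value stand-in for an info entry. */
typedef struct {
    const char *key;
    const char *value;
} demo_info_t;

/* Context given at registration: which call site this handler serves. */
typedef struct {
    const char *tag;
    int claimed;      /* number of events this handler accepted */
} demo_cbdata_t;

typedef void (*demo_handler_fn)(int status, const demo_info_t *info,
                                size_t ninfo, void *cbdata);

/* HYPOTHETICAL event handler chain. */
static struct {
    demo_handler_fn fn;
    void *cbdata;
} demo_chain[DEMO_MAX_HANDLERS];
static size_t demo_nhandlers;

int demo_register(demo_handler_fn fn, void *cbdata)
{
    if (demo_nhandlers == DEMO_MAX_HANDLERS) {
        return -1;
    }
    demo_chain[demo_nhandlers].fn = fn;
    demo_chain[demo_nhandlers].cbdata = cbdata;
    demo_nhandlers++;
    return 0;
}

/* Deliver one event to every handler in the chain, in registration order. */
void demo_notify(int status, const demo_info_t *info, size_t ninfo)
{
    for (size_t i = 0; i < demo_nhandlers; i++) {
        demo_chain[i].fn(status, info, ninfo, demo_chain[i].cbdata);
    }
}

/* Handler: claim the event only if its call-site tag matches our own. */
void demo_foo(int status, const demo_info_t *info, size_t ninfo, void *cbdata)
{
    demo_cbdata_t *ctx = cbdata;
    if (status != PMIX_ERR_NOT_SUPPORTED) {
        return;
    }
    for (size_t i = 0; i < ninfo; i++) {
        if (strcmp(info[i].key, DEMO_CALL_SITE_KEY) == 0 &&
            strcmp(info[i].value, ctx->tag) == 0) {
            ctx->claimed++;
        }
    }
}
```

With the tag travelling in the event's info array, the handler registered with cbdataA sees the event for the second call but does not claim it, while the handler registered with cbdataB does.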

jjhursey · 2018-05-17T19:40:26Z

Per teleconf - Ralph has some text to clarify this by harnessing the event functionality (the PRL needs some work to fully support this).

rhc54 · 2018-08-10T15:37:35Z

Sigh - this came up again just yesterday via another use-case. I'll raise its priority on my list, but it (obviously) won't be for v2 of the standard.

rhc54 · 2018-08-10T15:39:58Z

Changed the milestone - I don't think we can do this for v3 as v3.0.0 has already been released, so I moved it out to v4

jjhursey added the Question label Mar 23, 2018

jjhursey added this to the PMIx v2 Standard milestone Mar 23, 2018

jjhursey assigned rhc54 May 17, 2018

rhc54 modified the milestones: PMIx v2 Standard, PMIx v4 Standard Aug 10, 2018

SteVwonder mentioned this issue Apr 4, 2019

Creating PMIx interface "classes" based on stability #179

Closed

SteVwonder mentioned this issue May 2, 2019

Creating PMIx "classes/slices" based on functionality #182

Closed

jjhursey modified the milestones: PMIx v4 Standard, PMIx v4.1 Standard Dec 18, 2020

jjhursey modified the milestones: PMIx v4.1 Standard, PMIx v4.2 Standard Sep 17, 2021

jjhursey modified the milestones: PMIx v4.2 Standard, PMIx v5.1 Standard Jan 12, 2023
