Management/Communication of Not Supported operations. #6

Open
jjhursey opened this issue Mar 23, 2018 · 6 comments

@jjhursey
Member

The PMIx standard allows any implementation to return PMIX_ERR_NOT_SUPPORTED for any request. Users of the API must be prepared to handle this case.

The central questions are:

  • When is PMIX_ERR_NOT_SUPPORTED supposed to be returned? In a blocking operation, it should be returned at the call site. In a non-blocking operation, that might not be possible without blocking until the RM has processed the request. Should it be returned in the callback instead?
  • Should we have a PMIx_Query_support function where a user can ask:
    • Is this function supported?
    • If the function is supported, is this attribute or combination of attributes supported by it?
    • If the function and attribute set are supported, are there any limitations on the requests that can be made (e.g., memory limits on job allocation requests)?
  • Do we need a broader range of "not supported" error codes to distinguish between these cases (e.g., NOT_SUPPORTED, NOT_SUPPORTED_ATTRIBUTE, NOT_SUPPORTED_ATTRIBUTE_SET, NOT_SUPPORTED_REQUEST)?
  • Is a PMIx_Query_support function even useful for an application, or is the error code at the call site (or in the callback) enough?
  • Do we need a perror-like interface through which the PMIx implementation can return a string explaining why it returned "not supported"?
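To make the PMIx_Query_support and finer-grained error-code questions concrete, here is a self-contained sketch of how such a query could answer the three sub-questions in order. Everything except the value of PMIX_ERR_NOT_SUPPORTED is hypothetical: the function `query_support`, the capability table, the `alloc_request` entry, the attribute names, and the numeric values of the three proposed codes are invented for illustration and are not part of any PMIx release.

```c
#include <stddef.h>
#include <string.h>

typedef int status_t;
#define PMIX_SUCCESS                     0
#define PMIX_ERR_NOT_SUPPORTED         -47  /* value listed in this issue */
/* Hypothetical finer-grained codes proposed above; values are invented. */
#define NOT_SUPPORTED_ATTRIBUTE      -1001
#define NOT_SUPPORTED_ATTRIBUTE_SET  -1002
#define NOT_SUPPORTED_REQUEST        -1003

/* One row of a hypothetical capability table kept by the implementation/RM. */
typedef struct {
    const char *function;
    const char *attrs[5];        /* supported attributes, NULL-terminated */
    const char *conflict[2];     /* two attributes that cannot be combined */
    unsigned long max_mem_mb;    /* per-request memory limit; 0 means none */
} capability_t;

static const capability_t caps[] = {
    { "alloc_request", { "mem", "nodes", "exclusive", "shared", NULL },
      { "exclusive", "shared" }, 4096 },
};

static int contains(const char *const *list, const char *s) {
    for (; *list != NULL; list++)
        if (strcmp(*list, s) == 0) return 1;
    return 0;
}

/* Answers the three questions in order: function, attributes, limits. */
status_t query_support(const char *fn, const char *const attrs[], size_t n,
                       unsigned long mem_mb) {
    const capability_t *cap = NULL;
    for (size_t i = 0; i < sizeof caps / sizeof caps[0]; i++)
        if (strcmp(caps[i].function, fn) == 0) cap = &caps[i];
    if (cap == NULL)
        return PMIX_ERR_NOT_SUPPORTED;            /* function unknown */

    int has_first = 0, has_second = 0;
    for (size_t i = 0; i < n; i++) {
        if (!contains(cap->attrs, attrs[i]))
            return NOT_SUPPORTED_ATTRIBUTE;       /* single attribute */
        has_first  |= strcmp(attrs[i], cap->conflict[0]) == 0;
        has_second |= strcmp(attrs[i], cap->conflict[1]) == 0;
    }
    if (has_first && has_second)
        return NOT_SUPPORTED_ATTRIBUTE_SET;       /* combination */

    if (cap->max_mem_mb != 0 && mem_mb > cap->max_mem_mb)
        return NOT_SUPPORTED_REQUEST;             /* request exceeds a limit */
    return PMIX_SUCCESS;
}
```

A caller could run such a query before issuing a non-blocking request, although, as discussed later in this thread, the RM may only be able to decide "at this time" when the request is actually processed, so a query result could become stale.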

Below are some error codes that could be used or extended:

#define PMIX_ERR_SERVER_FAILED_REQUEST              -10
#define PMIX_ERR_BAD_PARAM                          -27
#define PMIX_ERR_NOT_SUPPORTED                      -47
#define PMIX_ERR_NOT_IMPLEMENTED                    -48

Note that any info key can be marked with PMIX_INFO_REQUIRED(m), which means the PMIx implementation must return an error if it does not support that attribute, instead of possibly ignoring it.
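The required/optional distinction can be illustrated with a small self-contained sketch: unknown optional attributes are skipped, while an unknown attribute marked required fails the whole request. The `info_t` struct and its `required` field are stand-ins assumed for this example; a real `pmix_info_t` carries this as a flag set by PMIX_INFO_REQUIRED(m), and the list of known keys is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int status_t;
#define PMIX_SUCCESS             0
#define PMIX_ERR_NOT_SUPPORTED -47   /* value listed in this issue */

/* Stand-in for an info entry; 'required' plays the role of the flag that
 * PMIX_INFO_REQUIRED(m) sets on a real pmix_info_t. */
typedef struct {
    const char *key;
    bool required;
} info_t;

/* Hypothetical set of attributes this implementation understands. */
static const char *const known_keys[] = { "timeout", "immediate", NULL };

static bool is_known(const char *key) {
    for (size_t i = 0; known_keys[i] != NULL; i++)
        if (strcmp(known_keys[i], key) == 0) return true;
    return false;
}

/* Unknown optional attributes are ignored (and counted in *ignored);
 * an unknown attribute marked required makes the request fail. */
status_t process_info(const info_t info[], size_t ninfo, size_t *ignored) {
    *ignored = 0;
    for (size_t i = 0; i < ninfo; i++) {
        if (is_known(info[i].key)) continue;
        if (info[i].required) return PMIX_ERR_NOT_SUPPORTED;
        (*ignored)++;
    }
    return PMIX_SUCCESS;
}
```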

@jjhursey jjhursey added this to the PMIx v2 Standard milestone Mar 23, 2018
@jjhursey
Member Author

(From discussion at March 23, 2018 face-to-face)

  • Non-blocking operations
    • Can return "not supported" at the call site if the PMIx library cannot initiate the request. Otherwise it will likely return "success", because the request was initiated.
    • The callback may report "not supported" if something unsupported is encountered while the request is being processed.
    • This split is necessary because the RM may be the one deciding what can be supported (possibly only "at this time"), which is not known when the non-blocking function is called unless we forced that function to block, which is not what we want.
  • If an unsupported operation is encountered, a PMIx event is generated that the application can catch and inspect.
    • The event is always generated, regardless of when the operation was determined to be unsupported. Only one event is generated for a given call.
  • When should support be checked? Note that the client side and the server side could be running different versions of the PMIx reference library (RL), so implementations need to account for this.
    • The client-side process will always get an event in the case of an unsupported operation.
    • Client-side RL: if the attribute is one the client side should process, it checks support itself and, if unsupported, returns the error and generates an event. Otherwise it forwards the request to the server side.
    • Server-side RL: checks whether the attribute is its to handle; otherwise it forwards the request to the Resource Manager (RM).
    • Resource Manager: returns error-management information in the void * passed back to the server-side RL. The server-side RL inspects that void * and issues the event to the client-side process.
      • New API: we need a macro to determine whether this "status information" is set in the void *, a macro for the RM to set this string, and a macro for the server-side RL to read it.
      • Reference library: we need to be careful when introducing a new macro. Some v2.0-compliant RMs won't know to set it, so the server-side RL must handle that case; reading this additional information is optional. If it is not set, the event is still triggered, just without the additional information. The PMIx reference library will add a release note about this in the v2.1.x series when it adds the macro.
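The decisions above (fail at the call site only if the request cannot be initiated, otherwise report through the callback, always raise exactly one event, and tolerate an RM that supplies no reason string) can be modeled in a single-threaded sketch. All names here (`request_nb`, `outcome_t`, the layer enum, the reason text) are hypothetical; completion is simulated synchronously, whereas a real library would run the callback later from its progress thread.

```c
#include <stddef.h>

typedef int status_t;
#define PMIX_SUCCESS             0
#define PMIX_ERR_NOT_SUPPORTED -47   /* value listed in this issue */

/* Which layer is responsible for the attribute (hypothetical routing). */
typedef enum { LAYER_CLIENT, LAYER_SERVER, LAYER_RM } layer_t;

/* What the client-side process observes for one non-blocking request. */
typedef struct {
    int events;          /* "not supported" events raised for this call */
    const char *reason;  /* optional explanation carried by the event */
    int cb_called;       /* whether the request callback ran */
    status_t cb_status;  /* status the callback received */
} outcome_t;

static void raise_event(outcome_t *out, const char *reason) {
    out->events++;
    out->reason = reason;
}

/* Stand-in RM that rejects the request. A newer RM fills in a reason;
 * an older v2.0 RM knows nothing about it and leaves it NULL. */
static status_t rm_reject(int rm_sets_reason, const char **reason) {
    *reason = rm_sets_reason ? "request exceeds site policy" : NULL;
    return PMIX_ERR_NOT_SUPPORTED;
}

/* Returns the call-site status; the final status goes to out->cb_status. */
status_t request_nb(layer_t owner, int supported, int rm_sets_reason,
                    outcome_t *out) {
    *out = (outcome_t){ 0, NULL, 0, PMIX_SUCCESS };

    /* Client-side RL cannot even initiate the request: fail at call site. */
    if (owner == LAYER_CLIENT && !supported) {
        raise_event(out, NULL);
        return PMIX_ERR_NOT_SUPPORTED;
    }

    /* Request initiated; any later failure is reported via the callback. */
    status_t st = PMIX_SUCCESS;
    const char *reason = NULL;
    if (owner == LAYER_SERVER && !supported)
        st = PMIX_ERR_NOT_SUPPORTED;               /* server-side RL rejects */
    else if (owner == LAYER_RM && !supported)
        st = rm_reject(rm_sets_reason, &reason);   /* forwarded to the RM */

    if (st == PMIX_ERR_NOT_SUPPORTED)
        raise_event(out, reason);                  /* exactly one event */
    out->cb_called = 1;
    out->cb_status = st;
    return PMIX_SUCCESS;
}
```

Because the reason pointer may be NULL, the event path works unchanged with an RM that predates the proposed macro; the event simply carries less information.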

@jjhursey
Member Author

@rhc54 had an example roughly like this. The client registers (potentially different) callback data at every call site, providing context about that call site. The event handler chain and the user's callback function manage the processing of the event. The main calling thread can wait for that event to be processed before deciding what to do next.

PMIx_Register_event_handler(PMIX_ERR_NOT_SUPPORTED, my_callback, cbdata { mutex/cond, FILE, ... })
ret = PMIx_Get_nb(...) --> suppose this generates a PMIX_ERR_NOT_SUPPORTED
if( PMIX_SUCCESS == ret ) {
   PMIx_Deregister_event_handler(...)
} else {
   // wait for my_callback to give me more information
}
my_callback(cbdata)
{
  // event handlers are invoked in a chain
  // unpack the synchronization information you encoded in the cbdata
  // process the error, and stop the chain
}
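The pattern above can be exercised without a PMIx installation using a minimal, single-threaded model of an event handler chain. The handler type, `register_handler`, and `notify_event` are hypothetical stand-ins (in the real API, a handler signals "stop the chain" through the completion callback it is handed rather than through a return value), and a threaded program would keep a mutex/condition variable in the call-site struct instead of a plain flag.

```c
#include <stddef.h>

typedef int status_t;
#define PMIX_SUCCESS             0
#define PMIX_ERR_NOT_SUPPORTED -47   /* value listed in this issue */

/* Per-call-site cbdata. A threaded program would keep a mutex/condition
 * variable here; this single-threaded sketch just sets a flag. */
typedef struct {
    const char *file;
    int line;
    int done;
    status_t status;
} callsite_t;

/* Hypothetical handler type: return 1 to stop the chain, 0 to pass it on. */
typedef int (*handler_fn)(status_t code, void *cbdata);

#define MAX_HANDLERS 8
static struct { handler_fn fn; void *cbdata; } chain[MAX_HANDLERS];
static size_t nhandlers;

int register_handler(handler_fn fn, void *cbdata) {
    if (nhandlers == MAX_HANDLERS) return -1;
    chain[nhandlers].fn = fn;
    chain[nhandlers].cbdata = cbdata;
    nhandlers++;
    return 0;
}

/* Walk the chain in registration order until a handler stops it. */
void notify_event(status_t code) {
    for (size_t i = 0; i < nhandlers; i++)
        if (chain[i].fn(code, chain[i].cbdata)) break;
}

/* my_callback: unpack the call-site context, record the error, stop the chain. */
int my_callback(status_t code, void *cbdata) {
    callsite_t *cs = cbdata;
    cs->status = code;
    cs->done = 1;
    return 1;
}
```

With two call sites registered this way, a single "not supported" event is claimed by whichever handler comes first in the chain, even if it was caused by the second call; this is the ambiguity raised in the counterexample that follows.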

@jjhursey
Member Author

Counterexample:

Process:

Register(my_fn, cbdataA)
PMIx_Get_nb() // this eventually succeeds
Register(my_fn, cbdataB)
PMIx_Get_nb() // this is not supported

Race in the event chain:

my_fn(cbdataA)
  |
  v
my_fn(cbdataB)
  |
  v
...

How does my_fn(cbdataA) know whether the "unsupported" event was for its own call or for the call associated with cbdataB?

Can we use the info array passed back to the event callback? If the info passed to the callback includes the info passed to PMIx_Get_nb, the user can add call-site-specific information for the event callback. That provides information about the point of the API call rather than the point of callback registration. We need a new attribute that can be passed to PMIx_Get_nb, an "add-this-info-to-the-callback's-info-array" attribute, carrying some additional information (a void * or an info array) that will be appended to the info array provided to the event callback (after the info set given at the point of handler registration).

We are already adding to the event's 'info' array for "not supported" to provide context.
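The proposal can be sketched as follows: the library appends the caller-supplied call-site context to the event's info array, and each handler claims the event only if that context is its own, otherwise passing it down the chain. The key `example.callsite`, the `info_t` pair, `fail_request`, and the chain helpers are all hypothetical names for this illustration, not existing PMIx attributes or functions.

```c
#include <stddef.h>
#include <string.h>

typedef int status_t;
#define PMIX_ERR_NOT_SUPPORTED -47   /* value listed in this issue */

/* Minimal key/value pair standing in for an info array entry. */
typedef struct { const char *key; const void *value; } info_t;

/* Hypothetical attribute carrying the context passed to the failing call. */
#define CALLSITE_ATTR "example.callsite"

typedef struct { const char *label; int done; status_t status; } callsite_t;

typedef int (*handler_fn)(status_t, const info_t *, size_t, void *);

#define MAX_HANDLERS 8
static struct { handler_fn fn; void *cbdata; } chain[MAX_HANDLERS];
static size_t nhandlers;

int register_handler(handler_fn fn, void *cbdata) {
    if (nhandlers == MAX_HANDLERS) return -1;
    chain[nhandlers].fn = fn;
    chain[nhandlers].cbdata = cbdata;
    nhandlers++;
    return 0;
}

/* Walk the chain in registration order until a handler claims the event. */
void notify_event(status_t code, const info_t info[], size_t ninfo) {
    for (size_t i = 0; i < nhandlers; i++)
        if (chain[i].fn(code, info, ninfo, chain[i].cbdata)) return;
}

/* Library side: one event per failed call, with the caller's context
 * appended after the other info entries. */
void fail_request(status_t code, callsite_t *ctx) {
    info_t info[] = { { "example.reason", "attribute not supported" },
                      { CALLSITE_ATTR, ctx } };
    notify_event(code, info, sizeof info / sizeof info[0]);
}

/* Handler: claim the event only if it names this handler's call site. */
int my_fn(status_t code, const info_t info[], size_t ninfo, void *cbdata) {
    callsite_t *mine = cbdata;
    for (size_t i = 0; i < ninfo; i++) {
        if (strcmp(info[i].key, CALLSITE_ATTR) == 0 && info[i].value == mine) {
            mine->status = code;
            mine->done = 1;
            return 1;              /* ours: stop the chain */
        }
    }
    return 0;                      /* not ours: pass it on */
}
```

Matching on the pointer keeps the check cheap; an implementation could equally compare a caller-chosen integer or string tag if the context must survive a process boundary.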

@jjhursey
Member Author

Per teleconf: Ralph has some text clarifying how this harnesses the event functionality (the PRL needs some work to fully support it).

@rhc54
Member

rhc54 commented Aug 10, 2018

Sigh, this came up again just yesterday via another use case. I'll raise its priority on my list, but it (obviously) won't be for v2 of the standard.

@rhc54
Member

rhc54 commented Aug 10, 2018

Changed the milestone: I don't think we can do this for v3, as v3.0.0 has already been released, so I moved it out to v4.
