20221102 meeting
Coalesce the discussion about cancelling allocation requests that has been going on in multiple venues (e.g., the MPI Session WG).
- We discussed the current allocation interface to reacquaint ourselves with the difference between PMIX_ALLOC_REQ_ID (user supplied) and PMIX_ALLOC_ID (PMIx produced).
- Expected behavior when cancelled: the party that cancels gets PMIX_SUCCESS (or the equivalent if asynchronous), and the party that requested the allocation gets PMIX_ERR_CANCELLED.
Multiple possible options were considered:
- The PMIX_ALLOC_ID is used to cancel. This is not workable, because that ID is only returned once the allocation is complete; cancellation must therefore use the PMIX_ALLOC_REQ_ID.
- The cancellation is 'cached' and will cancel anything that matches the PMIX_ALLOC_REQ_ID in the future. We don't like that, because it is global shared state that would itself need to be cancelled somehow. Ugly and problematic.
- The cancellation is ignored, because it doesn't match any current allocation request. This is simple: the user can add their own synchronization if the use case is important to them, but we believe that in general the cancellation will be issued at the same location as the allocation, which makes this scenario less relevant. Overall, this looks like the adequate solution.
The expected outcome is that the allocation request (if any is later posted) will succeed, and the cancellation will return PMIX_ERR_NOT_FOUND (or the appropriate error code).
We had a similar discussion for the case where the cancellation comes after the allocation has completed (and has returned, or will return when probed, PMIX_SUCCESS). We think the same reasoning applies, and again the cancel should return PMIX_ERR_NOT_FOUND.
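To make the three cases above concrete, here is a minimal toy model of the semantics we converged on. It is only a sketch: ToyAllocator and its methods are hypothetical names invented for illustration and are not part of PMIx, and the status values are placeholders that merely reuse the PMIx constant names from these notes.

```python
from enum import Enum


class Status(Enum):
    # Names mirror the PMIx status constants mentioned in the notes;
    # the values are placeholders, not the real PMIx codes.
    PMIX_SUCCESS = "PMIX_SUCCESS"
    PMIX_ERR_CANCELLED = "PMIX_ERR_CANCELLED"
    PMIX_ERR_NOT_FOUND = "PMIX_ERR_NOT_FOUND"


class ToyAllocator:
    """Hypothetical stand-in for a resource manager; not a PMIx API."""

    def __init__(self):
        self._pending = set()    # user-supplied request IDs still in flight
        self._outcome = {}       # request ID -> status seen by the requester
        self._next_alloc_id = 0  # allocation IDs exist only after completion

    def request(self, req_id):
        """Post an allocation request under a user-supplied request ID."""
        self._pending.add(req_id)

    def complete(self, req_id):
        """Finish a pending request and hand out an allocation ID."""
        self._pending.remove(req_id)
        self._outcome[req_id] = Status.PMIX_SUCCESS
        self._next_alloc_id += 1
        return f"alloc-{self._next_alloc_id}"

    def cancel(self, req_id):
        """Cancel by request ID; unmatched cancels are ignored, not cached."""
        if req_id not in self._pending:
            return Status.PMIX_ERR_NOT_FOUND
        self._pending.remove(req_id)
        self._outcome[req_id] = Status.PMIX_ERR_CANCELLED
        return Status.PMIX_SUCCESS

    def outcome(self, req_id):
        """Status the requesting party observes, or None if none yet."""
        return self._outcome.get(req_id)
```

In this model, cancelling a pending request returns PMIX_SUCCESS to the canceller and leaves PMIX_ERR_CANCELLED for the requester. A cancel issued before the request is posted, or after it has completed, returns PMIX_ERR_NOT_FOUND and changes nothing, so a request posted later is unaffected.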
We briefly discussed whether the API should take the form of a new function or of new attribute keys for the existing job-control operations. This has not been fully resolved yet.
This is too late for v5; it may make it into v5.1.
I believe we said Dec 7th. Please correct if wrong.