-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Orphan mitigation could be initiated by end users #598
Orphan mitigation could be initiated by end users #598
Conversation
Hey mattmcneeney! Thanks for submitting this pull request! I'm here to inform the recipients of the pull request that you and the commit authors have already signed the CLA. |
cc @duglin my AI from today's call |
spec.md
Outdated
@@ -790,9 +790,11 @@ For success responses, the following fields are defined: | |||
} | |||
``` | |||
|
|||
If the response contains `"state": "failed"` then the Platform MUST send a | |||
If the response contains `"state": "failed"` then the Platform SHOULD send a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattmcneeney the GET /v2/service_instances/:instance_id/last_operation
being common to all async requests, this sentence would then be applied on failed async updates, where it should not, per #508
Hence the suggestion in #591 (comment) to only keep this sentence in the orphan section of the specs and to specialize it w.r.t. the associated async operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point. I can't work out how to word this to cover this scenario. Do you have any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think remove the section here entirely, and then add it to the end of the orphan mitigation section, something like
If a GET /v2/serviceinstances/:service_id/last operation response contains
"state": "failed"then the Platform SHOULD send a deprovision request to the Service Broker to prevent an orphan being created on the Service Broker if appropriate. The Platform MAY allow end users to determine when this deprovision request is sent to the Service Broker in order to allow them to investigate why the failure occurred. However, while the Platform will attempt to send a deprovision request, Service Brokers MAY automatically delete
@mattmcneeney what's the status of this one? |
a7dd9ad
to
78ba1c6
Compare
@duglin I have just reworked this PR! Let me know what you think... All of the orphan mitigation text is now centralised in the one section, with a few links to it from elsewhere. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @mattmcneeney
I've suggested some changes w.r.t. rfc2119 semantics, as well as service instance updates as in the use case reported in #508
spec.md
Outdated
If the response contains `"state": "failed"` then the Platform MUST send an | ||
unbind request to the Service Broker to prevent an orphan being created on | ||
the Service Broker. | ||
If the response contains `"state": "failed"`, then the Platform MAY need to send |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the response contains `"state": "failed"`, then the Platform MAY need to send | |
If the response contains `"state": "failed"`, then the Platform may need to send |
MAY in uppercase refers to https://tools.ietf.org/html/rfc2119
an item that is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lowercase MAYs will fail @duglin's checker. I think it is OK to use the RFC2119 definition here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that it is a platform decision on whether users trigger orphan mitigation or not, then indeed it seems ok to use RFC2119 definition
@duglin it seems however excessive for the checker to always require use of the RFC2119 definition, whereas in some cases below, plain english may
lower case is what makes most sense in the specs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we don't want an RFC keyword then we could use "might" instead. In this case, why are we removing the MUST?
spec.md
Outdated
following scenarios: | ||
* The Service Broker does not return a response before a request from the | ||
Platform times out (typically 60 seconds). The Service Broker might eventually | ||
succeed in creating a resource, however the Platform MAY have already considered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
succeed in creating a resource, however the Platform MAY have already considered | |
succeed in creating a resource, however the Platform may have already considered |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/may/might/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 'might' is better because I don't think the sentence (and the MAY) is meant to be using RFC2119 keywords - the sentence is non-normative - it's more just talking about a situation that might happen - it's not saying that the platform should (or should not) actually do something per the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you really want to keep the MAY here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, fixed
spec.md
Outdated
request times out, the Service Broker might eventually succeed in provisioning | ||
a Service Instance after the Platform considers the request a failure. This | ||
results in an orphan Service Instance on the Service Broker's side. | ||
Orphaned Service Instances and Service Bindings MAY be created by in any of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Orphaned Service Instances and Service Bindings MAY be created by in any of the | |
Orphaned Service Instances and Service Bindings may be created by in any of the |
MAY in uppercase refers to https://tools.ietf.org/html/rfc2119
an item that is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ MAY/ might/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why might?
spec.md
Outdated
to send an unbind request, Service Brokers MAY automatically delete | ||
any resources associated with the failed bind request on their own. | ||
Responses with any other status code MUST be interpreted as a failure. If a | ||
failure occurs, then the Platform MAY need to send an unbind request to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failure occurs, then the Platform MAY need to send an unbind request to the | |
failure occurs, then the Platform may need to send an unbind request to the |
spec.md
Outdated
to send a deprovision request, Service Brokers MAY automatically delete | ||
any resources associated with the failed provisioning request on their own. | ||
Responses with any other status code MUST be interpreted as a failure. If a | ||
failure occurs, then the Platform MAY need to send a deprovision request to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failure occurs, then the Platform MAY need to send a deprovision request to the | |
failure occurs, then the Platform may need to send a deprovision request to the |
spec.md
Outdated
to send a deprovision request, Service Brokers MAY automatically delete | ||
any resources associated with the failed provisioning request on their own. | ||
For asynchronous provision operations, if the response contains | ||
`"state": "failed"`, then the Platform MAY need to send a deprovision request to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`"state": "failed"`, then the Platform MAY need to send a deprovision request to | |
`"state": "failed"`, then the Platform may need to send a deprovision request to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we remove the MUST?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while, but I think because we are making this the user's decision, and they may not ever do it (though they should).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the sync case, if the provision fails with a 500 we require orphan mitigation - I believe. I'm not sure why we would do something different for async. Granted, a state==failed doesn't say it's the same thing as a 500, but I think #570 will fix that. Perhaps we should look to combine the two PRs. At this point, I'm more concerned about the inconsistency between sync and async.
To address what I think is the point of this PR, orphan mitigation might be delayed or done by the end-user, I think we could solve that with additional text in the orphan mitigation section instead. Perhaps something that talks about how orphan mitigation just means that at some point the Platform needs to do some kind of clean-up to ensure both sides are in-sync. Whether that happens immediately or is delayed, giving the user a chance to talk to the broker to debug what's going on, is an implementation detail. But the overall requirement is that there must be some kind of guarantee that both side's view of the world is consistent at some point - eventual consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem is throughout the spec we don't always just refer to the Orphan Mitigation section; we specifically talk about sending deprovision/unbind requests. This PR is getting a bit big and tricky now (especially after having not looked at it for a while), but fundamentally I think we agree; Orphan Mitigation MAY not happen immediately. I'm up for merging PRs or whatever as long as we can make that clearer.
spec.md
Outdated
success. | ||
success. Service Brokers MAY automatically delete any resources they believe to | ||
be orphaned on their own. Note that the Platform MAY allow end users to | ||
determine when orphan mitigation occurs, in order to allow end users to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
determine when orphan mitigation occurs, in order to allow end users to | |
decide to sometimes postpone orphan mitigation, in order to allow end users to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to allow orphan mitigation to be skipped. We just want to allow it to be triggered by a user (so it may not happen immediately).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean a user can postpone an orphan migration in order to inspect a failing service instance on the broker side ?
I've updated the suggested change above if this might help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so.
Technically, in CF, orphan mitigation will never occur until the developer has run cf delete-service
after a failed provision. So it could never occur, but in reality, the user is in charge of determining when it will occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok makes sense. Thanks Matt.
spec.md
Outdated
|
||
Platforms SHOULD initiate orphan mitigation in the following scenarios: | ||
|
||
| Status Code Of Service Broker Response | Platform Interpretation Of Response | Platform Initiates Orphan Mitigation? | | ||
| --- | --- | --- | | ||
| 200 | Success | No | | ||
| 200 with malformed response | Failure | No | | ||
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances](#polling-last-operation-for-service-instances) and [Polling Last Operation for Service Bindings](#polling-last-operation-for-service-bindings) endpoints) | Failure | Yes | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances](#polling-last-operation-for-service-instances) and [Polling Last Operation for Service Bindings](#polling-last-operation-for-service-bindings) endpoints) | Failure | Yes | | |
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances provisionning/deprovisionning](#polling-last-operation-for-service-instances) and [Polling Last Operation for Service binding/unbinding](#polling-last-operation-for-service-bindings) endpoints) | Failure | Yes | | |
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances update](#polling-last-operation-for-service-instances) | Update failure | No | |
Adding a line to explicitly cover failed updates which repeatedly required clarifications, such as in the use case reported in #508
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point. Fixing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear... this one will be fixed in #591, not here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused! Is the text as it is in this PR now OK? Or does that overlap with #591?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm the one who is confused :-) But didn't @gberche-orange, in his comment above, suggest an edit to 1667 and a new 1668 line? If so, you said "Ah good point. Fixing" so I assumed you were going to pick up those edits - which would then solve #591, I think. But, if you want to leave the current PR 'as is' and have me do his suggested edits in #591 I'm ok with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused! Is the text as it is in this PR now OK?
I believe #598 still misses clarification that failed updates to service instance and service binding should not lead to orphan mitigation. See my suggested change from Oct 18th at https://github.com/openservicebrokerapi/servicebroker/pull/598/files#r226194972
spec.md
Outdated
|
||
Platforms SHOULD initiate orphan mitigation in the following scenarios: | ||
|
||
| Status Code Of Service Broker Response | Platform Interpretation Of Response | Platform Initiates Orphan Mitigation? | | ||
| --- | --- | --- | | ||
| 200 | Success | No | | ||
| 200 with malformed response | Failure | No | | ||
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances](#polling-last-operation-for-service-instances) and [Polling Last Operation for Service Bindings](#polling-last-operation-for-service-bindings) endpoints) | Failure | Yes | | ||
| 201 | Success | No | | ||
| 201 with malformed response | Failure | Yes | | ||
| All other 2xx | Failure | Yes | | ||
| 408 | Client timeout failure (request not received at the server) | No | | ||
| All other 4xx | Request rejected | No | | ||
| 5xx | Service Broker error | Yes | | ||
| Timeout | Failure | Yes | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Timeout | Failure | Yes | | |
| Timeout | Failure | Yes | | |
| Timeout (applicable to [Service Instance Update](#updating-a-service-instance) or [Polling Last Operation for Service Instances update](#polling-last-operation-for-service-instances) | Update Failure | No | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot. Fixed.
a9a08f6
to
6901158
Compare
I believe this is ready for review again! |
@mattmcneeney there are comment in there - did you want to address those? |
@duglin replied! |
Reviews please @openservicebrokerapi/osbapi-pmc |
6901158
to
382a951
Compare
spec.md
Outdated
Note that per the [Orphans](#orphans) section, this error response does not | ||
cause orphan mitigation to be initiated. Therefore, Platforms receiving this | ||
error response SHOULD resend the request at a later time. | ||
Upon reciving this error response, Platforms MUST NOT perform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/reciving/receiving/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
spec.md
Outdated
to send a deprovision request, Service Brokers MAY automatically delete | ||
any resources associated with the failed provisioning request on their own. | ||
For asynchronous provision operations, if the response contains | ||
`"state": "failed"` then the Platform MAY need to perform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure the MAY here is appropriate because MAY is used to denote an optional thing that the Platform may or may not choose to do - it's up to it. However, in this case it's not up to the Platform, it's actually up to the orphan mitigation section. So I think saying something more like Orphan Mitigation might be necessary, see the [Orphan Mitigation section](#orphan-migitiation) for more details.
might be better. Same for the cases below too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes complete sense. Thanks, changed!
spec.md
Outdated
request times out, the Service Broker might eventually succeed in provisioning | ||
a Service Instance after the Platform considers the request a failure. This | ||
results in an orphan Service Instance on the Service Broker's side. | ||
Orphaned Service Instances and Service Bindings MAY be created by in any of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/MAY be/might have been/
Not a proper use of RFC2119
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
382a951
to
58b7fbf
Compare
Thanks for the comments @duglin Ready for review again |
58b7fbf
to
ef99445
Compare
spec.md
Outdated
|
||
Platforms SHOULD initiate orphan mitigation in the following scenarios: | ||
The following table aims to describe when Orphan Mitigation should be initiated: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/should/is to/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks!
ef99445
to
1b6a3aa
Compare
spec.md
Outdated
|
||
Platforms SHOULD initiate orphan mitigation in the following scenarios: | ||
|
||
| Status Code Of Service Broker Response | Platform Interpretation Of Response | Platform Initiates Orphan Mitigation? | | ||
| --- | --- | --- | | ||
| 200 | Success | No | | ||
| 200 with malformed response | Failure | No | | ||
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances](#polling-last-operation-for-service-instances) and [Polling Last Operation for Service Bindings](#polling-last-operation-for-service-bindings) endpoints) | Failure | Yes | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused! Is the text as it is in this PR now OK?
I believe #598 still misses clarification that failed updates to service instance and service binding should not lead to orphan mitigation. See my suggested change from Oct 18th at https://github.com/openservicebrokerapi/servicebroker/pull/598/files#r226194972
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarification that failed updates to service instance and service binding should not lead to orphan mitigation would be moved to a distinct PR to cover use case reported in #508
https://github.com/openservicebrokerapi/servicebroker/pull/598/files#r226194972
5c997a5
to
43be1ae
Compare
43be1ae
to
a740358
Compare
spec.md
Outdated
|
||
The following table aims to describe when Orphan Mitigation is to be initiated: | ||
|
||
| Request | Service Broker Response Status Code | Platform Interpretation Of Response | Orphan Mitigation MUST be performed for Service Instances | Orphan Mitigation MUST be performed for Service Bindings | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...by the Platform
|
||
To mitigate orphan Service Instances and Service Bindings, the Platform SHOULD | ||
To mitigate orphaned Service Instances and Service Bindings, the Platform SHOULD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use SHOULD here, but MUST down in the table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I can't introduce a MUST here.
a740358
to
6583346
Compare
Ready for re-review @openservicebrokerapi/osbapi-pmc |
4980cd2
to
d36b136
Compare
d36b136
to
5a5e2c1
Compare
@jberkhahn @fmui I had to fix a travis error - need reviews again! |
✅ Hey mattmcneeney! The commit authors and yourself have already signed the CLA. |
Attempt to fix #573