Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Orphan mitigation could be initiated by end users #598

Conversation

mattmcneeney
Copy link
Contributor

Attempt to fix #573

@cfdreddbot
Copy link

Hey mattmcneeney!

Thanks for submitting this pull request! I'm here to inform the recipients of the pull request that you and the commit authors have already signed the CLA.

@mattmcneeney
Copy link
Contributor Author

cc @duglin my AI from today's call

spec.md Outdated
@@ -790,9 +790,11 @@ For success responses, the following fields are defined:
}
```

If the response contains `"state": "failed"` then the Platform MUST send a
If the response contains `"state": "failed"` then the Platform SHOULD send a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattmcneeney the GET /v2/service_instances/:instance_id/last_operation being common to all async requests, this sentence would then be applied on failed async updates, where it should not, per #508

Hence the suggestion in #591 (comment) to only keep this sentence in the orphan section of the specs and to specialize it w.r.t. the associated async operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. I can't work out how to word this to cover this scenario. Do you have any suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think remove the section here entirely, and then add it to the end of the orphan mitigation section, something like
If a GET /v2/serviceinstances/:service_id/last operation response contains"state": "failed"then the Platform SHOULD send a deprovision request to the Service Broker to prevent an orphan being created on the Service Broker if appropriate. The Platform MAY allow end users to determine when this deprovision request is sent to the Service Broker in order to allow them to investigate why the failure occurred. However, while the Platform will attempt to send a deprovision request, Service Brokers MAY automatically delete

@duglin
Copy link
Contributor

duglin commented Oct 16, 2018

@mattmcneeney what's the status of this one?

@mattmcneeney mattmcneeney force-pushed the orphan-mitigation-could-be-end-user-controlled branch 2 times, most recently from a7dd9ad to 78ba1c6 Compare October 17, 2018 11:21
@mattmcneeney
Copy link
Contributor Author

@duglin I have just reworked this PR! Let me know what you think... All of the orphan mitigation text is now centralised in the one section, with a few links to it from elsewhere.

cc @jberkhahn @gberche-orange

Copy link
Contributor

@gberche-orange gberche-orange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @mattmcneeney

I've suggested some changes w.r.t. rfc2119 semantics, as well as service instance updates as in the use case reported in #508

spec.md Outdated
If the response contains `"state": "failed"` then the Platform MUST send an
unbind request to the Service Broker to prevent an orphan being created on
the Service Broker.
If the response contains `"state": "failed"`, then the Platform MAY need to send
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If the response contains `"state": "failed"`, then the Platform MAY need to send
If the response contains `"state": "failed"`, then the Platform may need to send

MAY in uppercase refers to https://tools.ietf.org/html/rfc2119

an item that is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lowercase MAYs will fail @duglin's checker. I think it is OK to use the RFC2119 definition here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that it is a platform decision on whether users trigger orphan mitigation or not, then indeed it seems ok to use RFC2119 definition

@duglin it seems however excessive for the checker to always require use of the RFC2119 definition, whereas in some cases below, plain english may lower case is what makes most sense in the specs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't want an RFC keyword then we could use "might" instead. In this case, why are we removing the MUST?

spec.md Outdated
following scenarios:
* The Service Broker does not return a response before a request from the
Platform times out (typically 60 seconds). The Service Broker might eventually
succeed in creating a resource, however the Platform MAY have already considered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
succeed in creating a resource, however the Platform MAY have already considered
succeed in creating a resource, however the Platform may have already considered

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/may/might/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 'might' is better because I don't think the sentence (and the MAY) is meant to be using RFC2119 keywords - the sentence is non-normative - it's more just talking about a situation that might happen - it's not saying that the platform should (or should not) actually do something per the spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you really want to keep the MAY here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, fixed

spec.md Outdated
request times out, the Service Broker might eventually succeed in provisioning
a Service Instance after the Platform considers the request a failure. This
results in an orphan Service Instance on the Service Broker's side.
Orphaned Service Instances and Service Bindings MAY be created by in any of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Orphaned Service Instances and Service Bindings MAY be created by in any of the
Orphaned Service Instances and Service Bindings may be created by in any of the

MAY in uppercase refers to https://tools.ietf.org/html/rfc2119

an item that is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ MAY/ might/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why might?

spec.md Outdated
to send an unbind request, Service Brokers MAY automatically delete
any resources associated with the failed bind request on their own.
Responses with any other status code MUST be interpreted as a failure. If a
failure occurs, then the Platform MAY need to send an unbind request to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
failure occurs, then the Platform MAY need to send an unbind request to the
failure occurs, then the Platform may need to send an unbind request to the

spec.md Outdated
to send a deprovision request, Service Brokers MAY automatically delete
any resources associated with the failed provisioning request on their own.
Responses with any other status code MUST be interpreted as a failure. If a
failure occurs, then the Platform MAY need to send a deprovision request to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
failure occurs, then the Platform MAY need to send a deprovision request to the
failure occurs, then the Platform may need to send a deprovision request to the

spec.md Outdated
to send a deprovision request, Service Brokers MAY automatically delete
any resources associated with the failed provisioning request on their own.
For asynchronous provision operations, if the response contains
`"state": "failed"`, then the Platform MAY need to send a deprovision request to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`"state": "failed"`, then the Platform MAY need to send a deprovision request to
`"state": "failed"`, then the Platform may need to send a deprovision request to

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we remove the MUST?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while, but I think because we are making this the user's decision, and they may not ever do it (though they should).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the sync case, if the provision fails with a 500 we require orphan mitigation - I believe. I'm not sure why we would do something different for async. Granted, a state==failed doesn't say it's the same thing as a 500, but I think #570 will fix that. Perhaps we should look to combine the two PRs. At this point, I'm more concerned about the inconsistency between sync and async.

To address what I think is the point of this PR, orphan mitigation might be delayed or done by the end-user, I think we could solve that with additional text in the orphan mitigation section instead. Perhaps something that talks about how orphan mitigation just means that at some point the Platform needs to do some kind of clean-up to ensure both sides are in-sync. Whether that happens immediately or is delayed, giving the user a chance to talk to the broker to debug what's going on, is an implementation detail. But the overall requirement is that there must be some kind of guarantee that both side's view of the world is consistent at some point - eventual consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is throughout the spec we don't always just refer to the Orphan Mitigation section; we specifically talk about sending deprovision/unbind requests. This PR is getting a bit big and tricky now (especially after having not looked at it for a while), but fundamentally I think we agree; Orphan Mitigation MAY not happen immediately. I'm up for merging PRs or whatever as long as we can make that clearer.

spec.md Outdated
success.
success. Service Brokers MAY automatically delete any resources they believe to
be orphaned on their own. Note that the Platform MAY allow end users to
determine when orphan mitigation occurs, in order to allow end users to
Copy link
Contributor

@gberche-orange gberche-orange Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
determine when orphan mitigation occurs, in order to allow end users to
decide to sometimes postpone orphan mitigation, in order to allow end users to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to allow orphan mitigation to be skipped. We just want to allow it to be triggered by a user (so it may not happen immediately).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean a user can postpone an orphan migration in order to inspect a failing service instance on the broker side ?

I've updated the suggested change above if this might help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so.

Technically, in CF, orphan mitigation will never occur until the developer has run cf delete-service after a failed provision. So it could never occur, but in reality, the user is in charge of determining when it will occur.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok makes sense. Thanks Matt.

spec.md Outdated

Platforms SHOULD initiate orphan mitigation in the following scenarios:

| Status Code Of Service Broker Response | Platform Interpretation Of Response | Platform Initiates Orphan Mitigation? |
| --- | --- | --- |
| 200 | Success | No |
| 200 with malformed response | Failure | No |
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances](#polling-last-operation-for-service-instances) and [Polling Last Operation for Service Bindings](#polling-last-operation-for-service-bindings) endpoints) | Failure | Yes |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances](#polling-last-operation-for-service-instances) and [Polling Last Operation for Service Bindings](#polling-last-operation-for-service-bindings) endpoints) | Failure | Yes |
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances provisionning/deprovisionning](#polling-last-operation-for-service-instances) and [Polling Last Operation for Service binding/unbinding](#polling-last-operation-for-service-bindings) endpoints) | Failure | Yes |
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances update](#polling-last-operation-for-service-instances) | Update failure | No |

Adding a line to explicitly cover failed updates which repeatedly required clarifications, such as in the use case reported in #508

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point. Fixing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear... this one will be fixed in #591, not here, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused! Is the text as it is in this PR now OK? Or does that overlap with #591?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm the one who is confused :-) But didn't @gberche-orange, in his comment above, suggest an edit to 1667 and a new 1668 line? If so, you said "Ah good point. Fixing" so I assumed you were going to pick up those edits - which would then solve #591, I think. But, if you want to leave the current PR 'as is' and have me do his suggested edits in #591 I'm ok with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattmcneeney

I'm confused! Is the text as it is in this PR now OK?

I believe #598 still misses clarification that failed updates to service instance and service binding should not lead to orphan mitigation. See my suggested change from Oct 18th at https://github.com/openservicebrokerapi/servicebroker/pull/598/files#r226194972

spec.md Outdated

Platforms SHOULD initiate orphan mitigation in the following scenarios:

| Status Code Of Service Broker Response | Platform Interpretation Of Response | Platform Initiates Orphan Mitigation? |
| --- | --- | --- |
| 200 | Success | No |
| 200 with malformed response | Failure | No |
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances](#polling-last-operation-for-service-instances) and [Polling Last Operation for Service Bindings](#polling-last-operation-for-service-bindings) endpoints) | Failure | Yes |
| 201 | Success | No |
| 201 with malformed response | Failure | Yes |
| All other 2xx | Failure | Yes |
| 408 | Client timeout failure (request not received at the server) | No |
| All other 4xx | Request rejected | No |
| 5xx | Service Broker error | Yes |
| Timeout | Failure | Yes |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| Timeout | Failure | Yes |
| Timeout | Failure | Yes |
| Timeout (applicable to [Service Instance Update](#updating-a-service-instance) or [Polling Last Operation for Service Instances update](#polling-last-operation-for-service-instances) | Update Failure | No |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot. Fixed.

@mattmcneeney mattmcneeney force-pushed the orphan-mitigation-could-be-end-user-controlled branch 2 times, most recently from a9a08f6 to 6901158 Compare October 18, 2018 09:42
@mattmcneeney
Copy link
Contributor Author

I believe this is ready for review again!

@duglin
Copy link
Contributor

duglin commented Nov 27, 2018

@mattmcneeney there are comment in there - did you want to address those?

@mattmcneeney
Copy link
Contributor Author

@duglin replied!

@mattmcneeney
Copy link
Contributor Author

Reviews please @openservicebrokerapi/osbapi-pmc

@mattmcneeney mattmcneeney force-pushed the orphan-mitigation-could-be-end-user-controlled branch from 6901158 to 382a951 Compare December 6, 2018 10:43
spec.md Outdated
Note that per the [Orphans](#orphans) section, this error response does not
cause orphan mitigation to be initiated. Therefore, Platforms receiving this
error response SHOULD resend the request at a later time.
Upon reciving this error response, Platforms MUST NOT perform
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/reciving/receiving/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

spec.md Outdated
to send a deprovision request, Service Brokers MAY automatically delete
any resources associated with the failed provisioning request on their own.
For asynchronous provision operations, if the response contains
`"state": "failed"` then the Platform MAY need to perform
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the MAY here is appropriate because MAY is used to denote an optional thing that the Platform may or may not choose to do - it's up to it. However, in this case it's not up to the Platform, it's actually up to the orphan mitigation section. So I think saying something more like Orphan Mitigation might be necessary, see the [Orphan Mitigation section](#orphan-migitiation) for more details. might be better. Same for the cases below too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes complete sense. Thanks, changed!

spec.md Outdated
request times out, the Service Broker might eventually succeed in provisioning
a Service Instance after the Platform considers the request a failure. This
results in an orphan Service Instance on the Service Broker's side.
Orphaned Service Instances and Service Bindings MAY be created by in any of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/MAY be/might have been/
Not a proper use of RFC2119

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mattmcneeney mattmcneeney force-pushed the orphan-mitigation-could-be-end-user-controlled branch from 382a951 to 58b7fbf Compare December 19, 2018 09:43
@mattmcneeney
Copy link
Contributor Author

Thanks for the comments @duglin

Ready for review again

@mattmcneeney mattmcneeney force-pushed the orphan-mitigation-could-be-end-user-controlled branch from 58b7fbf to ef99445 Compare December 19, 2018 12:37
spec.md Outdated

Platforms SHOULD initiate orphan mitigation in the following scenarios:
The following table aims to describe when Orphan Mitigation should be initiated:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/should/is to/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks!

@mattmcneeney mattmcneeney force-pushed the orphan-mitigation-could-be-end-user-controlled branch from ef99445 to 1b6a3aa Compare December 19, 2018 13:04
spec.md Outdated

Platforms SHOULD initiate orphan mitigation in the following scenarios:

| Status Code Of Service Broker Response | Platform Interpretation Of Response | Platform Initiates Orphan Mitigation? |
| --- | --- | --- |
| 200 | Success | No |
| 200 with malformed response | Failure | No |
| 200 with `"state": "failed"` (applicable to [Polling Last Operation for Service Instances](#polling-last-operation-for-service-instances) and [Polling Last Operation for Service Bindings](#polling-last-operation-for-service-bindings) endpoints) | Failure | Yes |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattmcneeney

I'm confused! Is the text as it is in this PR now OK?

I believe #598 still misses clarification that failed updates to service instance and service binding should not lead to orphan mitigation. See my suggested change from Oct 18th at https://github.com/openservicebrokerapi/servicebroker/pull/598/files#r226194972

gberche-orange
gberche-orange previously approved these changes Dec 20, 2018
Copy link
Contributor

@gberche-orange gberche-orange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarification that failed updates to service instance and service binding should not lead to orphan mitigation would be moved to a distinct PR to cover use case reported in #508
https://github.com/openservicebrokerapi/servicebroker/pull/598/files#r226194972

@mattmcneeney mattmcneeney force-pushed the orphan-mitigation-could-be-end-user-controlled branch 2 times, most recently from 5c997a5 to 43be1ae Compare December 20, 2018 09:22
@mattmcneeney mattmcneeney force-pushed the orphan-mitigation-could-be-end-user-controlled branch from 43be1ae to a740358 Compare January 15, 2019 14:03
spec.md Outdated

The following table aims to describe when Orphan Mitigation is to be initiated:

| Request | Service Broker Response Status Code | Platform Interpretation Of Response | Orphan Mitigation MUST be performed for Service Instances | Orphan Mitigation MUST be performed for Service Bindings |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...by the Platform


To mitigate orphan Service Instances and Service Bindings, the Platform SHOULD
To mitigate orphaned Service Instances and Service Bindings, the Platform SHOULD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use SHOULD here, but MUST down in the table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I can't introduce a MUST here.

@mattmcneeney mattmcneeney force-pushed the orphan-mitigation-could-be-end-user-controlled branch from a740358 to 6583346 Compare January 17, 2019 09:50
@mattmcneeney
Copy link
Contributor Author

Ready for re-review @openservicebrokerapi/osbapi-pmc

jberkhahn
jberkhahn previously approved these changes Jan 22, 2019
@mattmcneeney mattmcneeney force-pushed the orphan-mitigation-could-be-end-user-controlled branch 2 times, most recently from 4980cd2 to d36b136 Compare January 22, 2019 15:18
@mattmcneeney mattmcneeney force-pushed the orphan-mitigation-could-be-end-user-controlled branch from d36b136 to 5a5e2c1 Compare January 22, 2019 15:20
jberkhahn
jberkhahn previously approved these changes Jan 22, 2019
fmui
fmui previously approved these changes Jan 22, 2019
@mattmcneeney mattmcneeney dismissed stale reviews from fmui and jberkhahn via 0bbd601 January 23, 2019 11:57
@mattmcneeney
Copy link
Contributor Author

@jberkhahn @fmui I had to fix a travis error - need reviews again!

@cfdreddbot
Copy link

✅ Hey mattmcneeney! The commit authors and yourself have already signed the CLA.

@mattmcneeney mattmcneeney added this to the 2.15 milestone Jan 23, 2019
@mattmcneeney mattmcneeney merged commit f1fb0aa into openservicebrokerapi:master Feb 5, 2019
@mattmcneeney mattmcneeney modified the milestone: 2.15 Jun 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should the platform clean up resources when an async provision/bind returns state failed?
7 participants