Race Conditions for Requests #359
There are a couple of requirements in SDPi that could be relevant to this.
Do these requirements address the race condition you raised? Some consumers are using the mdib version to check message ordering (R1005), though I think there are some problems with that. The solution proposed in that issue doesn't anticipate multiple connections, though. Nonetheless, could it help resolve the race condition you raised while allowing improved performance?
Thank you very much for pointing these out! We think R1001 is meant to apply to reports only, as indicated by its leading text, the chapter it appears in, and the fact that it cannot be fulfilled for services where the provider acts as the host. R1003 is not sufficient, as one may still set up multiple HTTP/1.1 connections.
The proposed solution of introducing a mechanism for "transmitted reports counting" for partial subscriptions does not solve the issue, as it applies only to reports, to which the issue does not apply, given R1001.
There's tricky language in all this, but I think you are right about R1001 being for outbound connections only.
Strictly, once a connection has been established for notifications, I think this requirement, as written now, says that one connection should be used both for sending notifications to a consumer and for receiving operation commands from that consumer. I wonder if that's the intent? I suspect it isn't, but if it is, it would create a bunch of new questions, e.g.:
I agree that R1003 isn't sufficient to avoid race conditions if the consumer is permitted to set up multiple connections to a single provider, and I can't see anything that bans that. However, this may be the same race condition that happens when multiple consumers each set up a single connection to the provider and send conflicting commands. I'm only just starting to look at invoked operations now, so I'm not sure how that's resolved. There are a bunch of requirements in 10207, §7.4.2 that may be relevant, though.
Problem:
SDC messages are subject to race conditions, and their order is not guaranteed. If implemented without proper safeguards, this may lead to hazardous situations with unexpected behavior.
There is no guarantee that messages are transmitted sequentially over the network.
At the same time, there is no requirement that SDC participants wait for a request to finish before starting the next one.
Example error-case scenario
A standard-compliant consumer may issue a series of set operations on a metric without waiting for a confirmation before issuing the next set operation. Intuitively, the consumer may assume that the metric will end up at the value of the last set operation. However, due to the lack of guaranteed message ordering, this assumption might not hold true.
Notably, when using the regular HTTP request/response pattern over a single socket (as most implementations presumably do), this issue does not arise. Requests are buffered until a response is received, and only then can the next request be processed. However, problems may occur when communication is established over multiple sockets, for example, to improve performance. Similarly, this issue is relevant when using gRPC instead of MDPWS.
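To make the failure mode concrete, here is a minimal sketch (plain Python, not SDC/MDPWS code): two set requests are issued back to back over what stands in for two independent connections, and the hypothetical send_set_request() applies whichever value arrives last at the provider.

```python
# Minimal sketch of the race: two set requests issued back to back over
# independent connections. send_set_request() is a stand-in for a real
# transport; the random delay models each connection's own network path.
import random
import threading
import time

provider_value = None          # last value the simulated provider applied
lock = threading.Lock()

def send_set_request(value: float) -> None:
    """Simulate one set request travelling over its own connection."""
    time.sleep(random.uniform(0.0, 0.1))   # independent, unordered network delay
    global provider_value
    with lock:
        provider_value = value             # provider applies whatever arrives last

# The consumer issues 1.0 first, then 2.0, without waiting for responses.
threads = [threading.Thread(target=send_set_request, args=(v,)) for v in (1.0, 2.0)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(provider_value)   # sometimes 2.0 (intended), sometimes 1.0 (the race)
```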
Additionally, this error case cannot be detected from the OperationInvoked reports. These reports provide information about the metric, operation handles, and invocation state, but not the value being set.
Likewise, setXXXResponse messages do not carry sufficient information to determine execution order, as the reported mdibVersion represents the version before applying the set request.
Solutions:
1) Add a requirement stating that the request/response pattern is to be followed at all times.
Most manufacturers likely already implement this pattern. It is a simple, safe, and backward-compatible solution that does not require significant changes.
On the other hand, this introduces performance limitations:
The queries per second (QPS) are limited to 1 / round-trip time, with the round-trip time being at least twice the latency. With a latency of 100 ms, this limits the rate to 5 queries per second. Depending on the application, this could be a critical safety issue.
In general, it doubles the distributed system's sensitivity to latency compared to solution 2).
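A minimal consumer-side sketch of this pattern, assuming a hypothetical blocking transport object (not an actual SDC API): all invocations are serialized on one lock so that only one request is ever outstanding, which directly yields the 1 / round-trip-time bound above.

```python
# Sketch of solution 1: allow only one outstanding request at a time by
# serializing all invocations on a single lock. `transport` is a hypothetical
# object with a blocking invoke(); it is not part of any SDC library.
import threading

class SerializedInvoker:
    def __init__(self, transport):
        self._transport = transport
        self._lock = threading.Lock()

    def invoke(self, request):
        with self._lock:                 # next request starts only after the response
            return self._transport.invoke(request)

# Throughput bound from the text: QPS <= 1 / RTT, with RTT >= 2 * latency.
latency_s = 0.100
max_qps = 1 / (2 * latency_s)            # = 5 queries per second at 100 ms latency
print(max_qps)
```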
2) Add a message counter per connection so that the recipient can order the incoming messages.
This requires introducing additional fields.
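A sketch of what such a counter could enable on the receiving side (the seq field and the class below are illustrative assumptions, not part of the SDC schema): messages are released strictly in counter order, and anything that arrives early is buffered.

```python
# Sketch of solution 2: the sender stamps each message with a per-connection
# counter; the receiver releases messages strictly in counter order and
# buffers anything that arrives early. Field names are illustrative only.
from dataclasses import dataclass

@dataclass
class Message:
    seq: int          # hypothetical per-connection message counter
    payload: str

class OrderingReceiver:
    def __init__(self) -> None:
        self._expected = 0
        self._pending: dict[int, Message] = {}

    def on_message(self, msg: Message) -> list[Message]:
        """Return the messages that may now be processed, in order."""
        self._pending[msg.seq] = msg
        released = []
        while self._expected in self._pending:
            released.append(self._pending.pop(self._expected))
            self._expected += 1
        return released

rx = OrderingReceiver()
print(rx.on_message(Message(1, "set 2.0")))   # [] - buffered until seq 0 arrives
print(rx.on_message(Message(0, "set 1.0")))   # both messages, now in order
```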
3) Adapt the OperationInvoked report so that out-of-order executions can be detected.
By also reporting the new mdibVersion, the order of execution could be determined.
Alternatively, the actual value could be included in the report.
This solves the issue only for set requests, not in general. It may still arise with other services, even though the already existing ones appear not to be affected by it (needs confirmation).
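A sketch of how a consumer could use such an extended report (the new_mdib_version field below is the proposed addition, not part of the current OperationInvoked report): sorting the reports by the MDIB version produced by each operation recovers the actual execution order.

```python
# Sketch of solution 3: if each OperationInvoked report also carried the MDIB
# version produced by the operation, the consumer could reconstruct the order
# in which the provider actually applied the requests. Proposed field only.
from dataclasses import dataclass

@dataclass
class OperationInvokedReport:
    transaction_id: int
    new_mdib_version: int      # proposed field: version *after* applying the request

def execution_order(reports: list[OperationInvokedReport]) -> list[int]:
    """Return transaction ids in the order the provider applied them."""
    return [r.transaction_id for r in sorted(reports, key=lambda r: r.new_mdib_version)]

reports = [OperationInvokedReport(transaction_id=42, new_mdib_version=101),
           OperationInvokedReport(transaction_id=41, new_mdib_version=102)]
print(execution_order(reports))   # [42, 41] -> request 41 was applied last
```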
4) Add an optional requirement to make manufacturers aware of this issue while maintaining maximal flexibility in implementation.
As mentioned in 1), not waiting for a response improves overall system responsiveness. As such, it may be desired behavior.
The requirement would make manufacturers aware of this issue/opportunity, reducing its risk.
Open question: If some manufacturers adhere to the requirement and some don't, is the overall system still safe? Does this need to be communicated during the handshake?