The repository for the DS project.
Authors:
- Gradle
- Java
- Clone the repository:
git clone https://github.com/LeonardoDeFaveri/DistributedSystemsProject
- Enter the folder with the terminal:
cd DistributedSystemsProject
- Run the application with gradle (if gradle is in the environment variables):
gradle run
- Both replicas and clients hold a list of all the replicas in the system:
- Replicas need to know which are the other replicas
- Clients pick the replica to contact from that list. Each client has a favorite replica and keeps contacting it until it crashes. When that happens, it picks another one
- Initially the coordinator is the first replica created
- Replicas keep track of the last time they heard anythig from the coordinator. HearbetMsg and any other kind of message received from the coordinator makes them reset this time of last contact
- Start the system with
gradle run
Replicas and clients are created - Press
ENTER
to send a start signal to all hosts:- Clients begin sending requests to replicas
- Replicas begin expecting heartbeat messages (or ay kind of message) from the coordinator
- Coordinator begin sending heartbeat messages to replicas
- Press
ENTER
again to send a stop signal to clients forcing them to stop producing more requests - Press
ENTER
a third time to terminate the system
Crash detection works by means of periodic checks of arrival of messages.
-
Read requests:
ReadMsg
Each clients identifies univocally a read request by means of an incrementing index that's placed withing the request itself. Upon sending the request to a replica, the client registers the sent request into
readMsgs
(maps read indexes to replicas) and starts a timeout (READOK_TIMEOUT
). When the timeout expires aReadOkReceivedMsg
is sent by the client to itself.A client expects to receive a
ReadOkMsg
from the contacted replica and that message should contain the read value and the Id of the served read request. On arrival, the client removes fromreadMsgs
the request with the ID found in theReadOkMsg
. AReadOkReceivedMsg
also contains the ID of the associated request and the handler checks ifreadMsgs
holds a value. If that's the case, then it means that noReadOkMsg
has been received so the replica found in the map associated to that ID is presumed to be crashed and removed from the list of active replicas. Otherwise, if the map has no value it means that the request has been served and the replica was fine up to that point. -
Update requests:
UpdateRequestMsg
An update request is identified by the
ActorRef
of the client and another incrementing ID. Such pair is handled by theUpdateRequestId
class. As for read requests, when a client sends un update to a replica, it registers the request and the destination replica in a map (writeMsgs
) using the ID as key and starts a timeout (UpdateRequestOkReceivedMsg
).The replica, once the request has been served, responds with an
UpdateRequestOkMsg
holding the index of the served request. On arrival, the index found is used to remove the associated value fromwriteMsgs
and on arrival of the correspondingUpdateRequestOkMsg
the same check done for read requests is performed. So, a no-value means that the request has been served and otherwise that the contacted replica has taken too long to answer and is identified as crashed.On replica side, when an update request is received the ID (pair
<client, index>
) is saved in the setupdateRequests
. This ID is also put into all associatedWriteMsg
s andWriteOkMsg
s. When the update protocol terminates and a replica receives aWriteOkMSg
, it takes the update request ID carried by the message and checks if it is present intoupdateRequests
. If that's the case, it means that this replica is the one that has received the request from the client and thus it has to send back an ACK, namely anUpdateRequestOkMsg
to that client. This message holds the local update request index of that client.
-
On update requests:
UpdateRequestMsg
When a replica that's not the coordinator receives an update request, forwards it to the coordinator and expects to recive back a
WriteMsg
. So, upon forwarding the request the replica start a timeout (WRITEMSG_TIMEOUT
) and registers the ID of the request into thependingUpdateRequests
set.When the coordinator sends a
WriteMsg
it embeds in it theupdateRequestId
of the update request that's being served. Upon reception of this message by the coordinator, a replica removes theupdateRequestId
frompendingUpdateRequests
and adds theWriteId
of the message intowriteRequests
to register the serving of that write request.When
WRITEMSG_TIMEOUT
expires aWriteMsgReceivedMsg
is sent by the replica to itself and if the expectedWriteMsg
has not been received, so ifpendingUpdateRequests
hasupdateRequestId
in it, the coordinator is considered to be crashed. Alive, otherwise.On coordinator side, when an
UpdateRequestMsg
is received, the associationWriteId->UpdateRequestId
,WriteId
being a class representing the pair<epoch, write_index>
, is saved intowritesToUpates
maps and is later used to build theWriteOkMsg
s. -
On write message:
WriteMsg
When a replica receives a
WriteMsg
it sends an ACK back to the coordinator and expects to receive aWriteOkMsg
in response. So, again a timer (WRITEOK_TIMEOUT
) is set and aWriteOkReceivedMSg
is sent by a replica to itself at expiration. On receipt of aWriteOk
, the associatedWriteId
is removed fromwriteRequests
and on receipt ofWriteOkRecivedMsg
the same kind of cheks done for the othersReceivedMsg
s is done. So, ifwriteRequests
still holdsWriteId
, the request has not been served yet, and the coordinator is set to crashed.On coordinator side, when enough ACKs are received for a request and the corresponing
WriteOkMsg
can be sent, theWriteId
associated by current values forepoch
andcurrentWriteToAck
is used to retrive fromwritesToUpdates
toUpdateRequestId
of originating request and such ID is put into theWriteOk
.
-
On ACK message:
WriteAckMsg
Thanks to the assumption on the minimum available number of a replicas, no crash check needs to be perfomed by coordinator side on recipt of ACK messages.