Replication subsystem enhancements #314
Before describing the Current issues of Eventuate's event replication subsystem, a brief description of the underlying System model is given. This sets the context for the solution proposals in section Extensions.
Introduction
Eventuate applications store events in local event logs that can be connected to each other to form a replicated event log. The owner of a local event log is called a location, and event replication occurs across locations that share a replicated event log. Event replication is reliable and preserves the causal ordering of events, which is tracked with vector clocks. From a consumer perspective, a replicated event log provides a causal reliable broadcast (CRB) of events across a group of locations.
Event-sourced components may consume events from and produce events to a local event log. Events produced at one location can be consumed at other locations if these locations share a replicated event log. Event-sourced components that consume from a local event log receive events in an order that is consistent with causal order, i.e. they will never see an effect before its cause. Event-sourced components that collaborate over a shared replicated event log can therefore provide causal consistency for replicated mutable state. A special example is operation-based CRDTs that use causality metadata for global state convergence.
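To make this concrete, here is a minimal sketch of an event-sourced component using Eventuate's `EventsourcedActor`; the command and event types (`Append`, `Appended`) and the actor name are illustrative only, not part of the library:

```scala
import akka.actor.ActorRef
import com.rbmhtechnology.eventuate.EventsourcedActor

import scala.util.{Failure, Success}

// Illustrative command and event types (not part of Eventuate).
case class Append(entry: String)
case class Appended(entry: String)

// An event-sourced component that produces events to and consumes
// events from a local event log. Events written at other locations
// of the same replicated event log are consumed here too, in an
// order that is consistent with causal order.
class ExampleActor(val id: String, val eventLog: ActorRef) extends EventsourcedActor {
  private var entries: Vector[String] = Vector.empty

  def onCommand = {
    case Append(entry) => persist(Appended(entry)) {
      case Success(_)   => sender() ! "ok"
      case Failure(err) => sender() ! s"failed: ${err.getMessage}"
    }
  }

  def onEvent = {
    case Appended(entry) => entries = entries :+ entry
  }
}
```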
System model
In the following, it is assumed that a location has only a single local event log that is connected to the local event logs of other locations to form a replicated event log (see also Replication networks). Of course, multiple local event logs per location are supported too but this is not relevant in the context of this discussion.
Replication networks
Two locations, `A` and `B`, can be connected by a uni-directional replication connection, `A -> B` or `A <- B`, where the arrow indicates the event replication direction. Bi-directional event replication, `A <-> B` (or `A - B`), is realized by two independent, uni-directional replication connections in opposite directions. With replication connections, locations can be connected to a replication network which may also contain cycles.

Strictly speaking, a bi-directional replication connection between two locations is also a cycle but this is not relevant in the context of the system model. When discussing replication network cycles, bi-directional replication connections need not be considered. Also, if there is a uni- or bi-directional replication connection between two locations, these two locations are said to be directly connected.
Connecting locations to a replication network means connecting their local event logs to a replicated event log. Hence, the terms replication network and replicated event log are used interchangeably here, assuming there is only one local event log per location. Also, the terms location and local event log are used interchangeably.
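The following standalone sketch (not Eventuate API) models a replication network as a directed graph over location identifiers, including the directly-connected relation:

```scala
// Model of a replication network as a directed graph over location
// identifiers (illustrative, not Eventuate API).
case class ReplicationNetwork(connections: Set[(String, String)]) {

  // A -> B: uni-directional replication connection from A to B.
  def connect(from: String, to: String): ReplicationNetwork =
    copy(connections = connections + (from -> to))

  // A - B: realized as two independent uni-directional connections.
  def connectBidirectional(a: String, b: String): ReplicationNetwork =
    connect(a, b).connect(b, a)

  // Two locations are directly connected if there is a uni- or
  // bi-directional replication connection between them.
  def directlyConnected(a: String, b: String): Boolean =
    connections.contains(a -> b) || connections.contains(b -> a)
}

object ReplicationNetworkDemo {
  def main(args: Array[String]): Unit = {
    val network = ReplicationNetwork(Set.empty)
      .connectBidirectional("A", "B") // A - B
      .connect("B", "C")              // B -> C
    println(network.directlyConnected("A", "B")) // true
    println(network.directlyConnected("A", "C")) // false
  }
}
```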
Potential causality
Each local event log maintains a vector clock. The size of the clock scales with the number of local event logs in the replication network. An event that is written to a local event log is assigned a vector timestamp taken from the current time of the vector clock of that event log. A vector clock generates a partial order of events, the happened-before relation `→`, or potential causality of events. Vector timestamps can be used to determine whether any two events have a potential causal relationship or are concurrent: `e1 → e2` means that `e1` causally precedes `e2` whereas `e1 ↔ e2` means that `e1` and `e2` are concurrent and don't have a causal relationship.
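Eventuate tracks these timestamps with a `VectorTime` type; the following standalone sketch shows the relevant comparisons (a simplified illustration, not the library implementation):

```scala
// Simplified vector timestamp (illustrative; Eventuate has its own
// VectorTime implementation with equivalent semantics).
case class VectorTime(entries: Map[String, Long]) {
  private def value(location: String): Long =
    entries.getOrElse(location, 0L)

  // this <= that: every entry of this timestamp is less than or
  // equal to the corresponding entry of the other timestamp.
  def <=(that: VectorTime): Boolean =
    (entries.keySet ++ that.entries.keySet).forall(l => value(l) <= that.value(l))

  // Happened-before (potential causality): e1 → e2.
  def <(that: VectorTime): Boolean =
    this <= that && !(that <= this)

  // Concurrency: neither timestamp causally precedes the other.
  def conc(that: VectorTime): Boolean =
    !(this <= that) && !(that <= this)

  // Least upper bound (pairwise maximum); used below for CVVs and DVVs.
  def merge(that: VectorTime): VectorTime =
    VectorTime((entries.keySet ++ that.entries.keySet)
      .map(l => l -> math.max(value(l), that.value(l))).toMap)
}
```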
Storage order
The storage order of events in a local event log is consistent with the potential causality of events:

- If `e1 → e2` then the sequence number of `e2` is greater than the sequence number of `e1` in all local event logs of a replicated event log.
- If `e1 ↔ e2` then the relative order of `e1` and `e2` in a local event log is not defined: `e1` may have a lower sequence number than `e2` in one local event log but a higher sequence number than `e2` in another local event log.

More formally, a given local event log is one of several possible linear extensions of the partial order of events. To preserve the linear extension invariant during replication, replicated events that are in the causal past of a target event log are excluded from being written to that event log.
The causal past of an event log is determined by its current version vector (CVV) which is the least upper bound of the vector timestamps of all events stored in that log. If the vector timestamp of a replicated event causally precedes or is equal to the CVV of the target event log, it is excluded from being written. This mechanism is referred to as causality filter.
In order to reduce network bandwidth usage, CVVs of target event logs are also transferred to source event logs during replication so that most events can already be filtered there.
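Combined with the `VectorTime` sketch above, the causality filter can be expressed roughly as follows (`DurableEvent` is reduced to payload plus vector timestamp for illustration):

```scala
// Reduced event record: payload plus causality metadata (illustration).
case class DurableEvent(payload: Any, vectorTimestamp: VectorTime)

// Causality filter: a replicated event is written to the target log
// only if its vector timestamp neither causally precedes nor equals
// the target's current version vector (CVV), i.e. only if it is not
// already in the target's causal past.
def causalityFilter(events: Seq[DurableEvent], targetCvv: VectorTime): Seq[DurableEvent] =
  events.filterNot(_.vectorTimestamp <= targetCvv)
```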
Replication progress
The progress of event replication from a source log `A` to a target log `B` is recorded at `B` as an (`A`, `snr`) tuple where `snr` is the sequence number of the last event that has been read from `A`. The progress is stored in a separate transaction, after the events from `A` have been written to `B`.

When `B` crashes after the events have been written but before the progress has been stored, and `B` later recovers, it will retry replication from a previously stored progress. Events from this retry, however, will be detected as duplicates by the causality filter and dropped. This makes event replication reliable and idempotent.
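A single replication step can then be sketched as follows, combining stored progress with the causality filter (a standalone sketch reusing the types above; Eventuate's actual replicator differs in detail):

```scala
// Standalone sketch of one replication step from a source log to a
// target log. The target stores, per source log id, the sequence
// number of the last event read from that source.
case class TargetLog(
    events: Vector[DurableEvent],
    cvv: VectorTime,
    progress: Map[String, Long])

def replicateOnce(
    sourceId: String,
    source: Vector[DurableEvent], // source log; sequence numbers are 1-based positions
    target: TargetLog,
    batchSize: Int = 100): TargetLog = {

  val from  = target.progress.getOrElse(sourceId, 0L).toInt
  val batch = source.slice(from, from + batchSize)

  // Causality filter: drop events that are already in the target's
  // causal past. After a crash between the event write and the
  // progress write, the retried batch is fully dropped here, which
  // makes replication idempotent.
  val accepted = batch.filterNot(_.vectorTimestamp <= target.cvv)

  TargetLog(
    events   = target.events ++ accepted,
    cvv      = accepted.foldLeft(target.cvv)((vt, e) => vt.merge(e.vectorTimestamp)),
    progress = target.progress.updated(sourceId, (from + batch.size).toLong))
}
```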
Dynamic changes
Causality filters allow for dynamic changes in a replication network. A location may connect to different other locations over time without breaking the linear extension invariant. Please note that this feature is not yet available in the public API but the technical basis (= causality filters) already exists.
Current issues
Eventuate provides some features that are not yet compatible with the above system model: replication filters, event deletion and disaster recovery.
The Appendix gives some examples of these issues. The next section describes the extensions and restrictions to the system model that are needed to make it compatible with these features.
Extensions
Replication network history
When discussing replication network topologies, not only the current topology of a replication network must be considered but also the complete history of its changes. For example, an acyclic replication network with locations `A`, `B` and `C` may have the topology `A - B - C` at time `t1` and the topology `B - A - C` at time `t2`. Neither topology contains a cycle on its own, but projecting the network's history, i.e. taking the union of both topologies, yields the cycle `ABC`.
For reconstructing a replication network topology from its history, replication connection changes must be recorded at the involved locations. Replication network examples in the following sections always assume topologies that include the complete history of changes.
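A sketch of this history projection, together with a cycle check on the projected graph (bi-directional connections are modeled as single undirected edges, so they don't count as cycles, as noted in Replication networks; illustrative, not Eventuate API):

```scala
// Projects a history of topology snapshots onto a single undirected
// graph and checks whether that graph contains a cycle.
object TopologyHistory {
  type Edge = Set[String] // unordered pair of location ids

  def project(history: Seq[Set[Edge]]): Set[Edge] =
    history.foldLeft(Set.empty[Edge])(_ union _)

  // An undirected graph has a cycle iff, while building a spanning
  // forest with union-find, some edge connects two already-joined nodes.
  def hasCycle(edges: Set[Edge]): Boolean = {
    var parent = Map.empty[String, String]
    def find(x: String): String =
      parent.get(x) match {
        case Some(p) if p != x => val r = find(p); parent += (x -> r); r
        case _                 => parent += (x -> x); x
      }
    edges.exists { e =>
      val Seq(a, b) = e.toSeq
      val (ra, rb) = (find(a), find(b))
      if (ra == rb) true else { parent += (ra -> rb); false }
    }
  }

  def main(args: Array[String]): Unit = {
    val t1: Set[Edge] = Set(Set("A", "B"), Set("B", "C")) // A - B - C
    val t2: Set[Edge] = Set(Set("B", "A"), Set("A", "C")) // B - A - C
    println(hasCycle(project(Seq(t1, t2))))               // true: cycle ABC
  }
}
```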
Replication filters
Replication filters are not allowed on replication network cycles (an exception being redundant filtered connections, described below). For example, given a cycle `ABC`, a location `D` attached to `C` and another location `E` attached to `B`, replication filters are only allowed between locations `C` and `D` as well as between `B` and `E`. Now, consider a replacement of location `E` with a cyclic sub replication network `XYZ` that is attached to `B` via `X`. In this case, replication filters are still allowed between `B` and `X` because they are not part of a cycle. Removing the replication connections between `Y` and `Z` would additionally allow replication filters between `X` and `Y` as well as between `X` and `Z`.

In addition to replication network changes, the history of replication filters must be recorded too. A replication connection that has been filtered in the past will still be considered as filtered in the future, even if the current version of the replication connection has no replication filter.
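The rule that a (currently or historically) filtered connection must not lie on a cycle can be checked on the projected graph: such a connection must be a bridge, i.e. its endpoints must become disconnected when the edge is removed. A sketch, reusing the `Edge` type above:

```scala
// Validates the replication filter rule on a projected topology.
object FilterRules {
  import TopologyHistory.Edge

  // Connectivity check via simple BFS over undirected edges.
  private def connected(edges: Set[Edge], from: String, to: String): Boolean = {
    var visited  = Set(from)
    var frontier = List(from)
    while (frontier.nonEmpty) {
      val next = frontier
        .flatMap(n => edges.collect { case e if e.contains(n) => (e - n).headOption.getOrElse(n) })
        .filterNot(visited)
      visited ++= next
      frontier = next
    }
    visited.contains(to)
  }

  // everFiltered: all connections that have a filter now or had one in
  // the past ("once filtered, always considered filtered").
  def filtersAllowed(projected: Set[Edge], everFiltered: Set[Edge]): Boolean =
    everFiltered.forall { e =>
      val Seq(a, b) = e.toSeq
      !connected(projected - e, a, b) // the filtered edge must be a bridge
    }
}
```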
Event deletion
Event deletion distinguishes logical deletion from physical deletion as explained in Deleting events. Event deletion is always performed up to a given position in a local event log. For physical deletion, a deletion version vector (`DVV`) must be introduced. It is the least upper bound of the vector timestamps of all physically deleted events. There is one `DVV` per local event log. The `DVV` of an event log `A` always causally precedes or is equal to that event log's current version vector (`CVV`, see also Storage order), i.e. `DVV-A → CVV-A` or `DVV-A = CVV-A`.

Events may be deleted from a local event log `A` only if for all local event logs `i` that are directly connected to `A` the following condition holds after deletion: `DVV-A → CVV-i` or `DVV-A = CVV-i`. In other words, all events that should be physically deleted from a log `A` must have already been replicated to all directly connected logs `i`. This not only ensures that all events are replicated to directly connected event logs but also prevents the replication anomaly described in Cyclic replication networks and event deletion (see Appendix).
Disaster recovery
A disaster is defined as total or partial event loss at a given location. In contrast to event deletion, event loss always starts from a given position in a local event log (see also section Disaster recovery in the user documentation). The goal of disaster recovery is to recover local event log state from directly connected locations in the replication network. Disaster recovery has a mandatory metadata recovery phase and an optional event recovery phase.
Metadata recovery
With a disaster, not only events but also the local time of the local vector clock is lost. Without properly recovering the clock, it may start running again in the past, i.e. at a time before the disaster happened. This may lead to local time values that are perceived as duplicates in the replication network, which must be prevented because it breaks causality and interferes with causality filters.
Therefore, during metadata recovery, the local time of the local vector clock must be set to the maximum of the values seen at all other directly connected locations. Furthermore, the replication progress recorded at directly connected locations must be reset to the sequence number of the last event in the local event log, or to zero if the local event log is completely lost.
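A sketch of these two steps (illustrative names; `localTimesOfR` is the recovered location's entry in each neighbor's vector clock, `lastLocalSnr` the sequence number of the last surviving local event):

```scala
// Metadata recovery for a location R (illustrative sketch).
def recoverMetadata(localTimesOfR: Seq[Long], lastLocalSnr: Long): (Long, Long) = {
  // R's local clock time must never run in the past again:
  val recoveredLocalTime = (0L +: localTimesOfR).max
  // Neighbors must re-read everything R is missing, so their stored
  // replication progress for R is reset to the last surviving event
  // (0 if R's local event log is completely lost):
  val resetProgress = lastLocalSnr
  (recoveredLocalTime, resetProgress)
}
```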
Event recovery
Event recovery means replicating events back from locations that are directly connected to the location to be recovered. These directly connected locations must have unfiltered bi-directional replication connections with the location to be recovered. For example, a location `A` can only be recovered from a location `B` if both replication connections `A -> B` and `A <- B` are unfiltered.

Event recovery over filtered connections is only supported for terminal locations, i.e. locations that are connected to only one other location. When replicating events to the terminal location during recovery, application-specific replication filters must be OR-ed with a filter that accepts events that have been initially emitted at that terminal location.

Restrictions also apply to event recovery from locations with deleted events. Assuming that location `A` should be recovered from location `B`, event recovery is only allowed if the deletion version vector of `B` causally precedes or is equal to the current version vector of `A`, i.e. `DVV-B → CVV-A` or `DVV-B = CVV-A`.

Assuming that location `A` can be recovered from locations `B` and `C` but only `B` meets this condition, a first round of event recovery must be attempted from `B`. After recovery from `B` has completed, `CVV-A` has an updated value and the condition must be evaluated for `C` again. If it is met, event recovery must be attempted from `C` in a second round.

Hint: Applications that want to delete events older than n days from their local event logs should consider local event log backups at intervals of m days with m < n in order to meet the conditions for event recovery from locations with deleted events.
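The multi-round procedure can be sketched as a fixpoint iteration (illustrative, reusing the `VectorTime` sketch): recover from every source whose `DVV` is covered by the current `CVV`, merge the recovered causality information, and repeat until no further source qualifies:

```scala
// Sketch of multi-round event recovery: recover from sources whose
// DVV causally precedes or equals the current CVV of the recovered
// log; each completed round may raise that CVV and unlock further
// sources, so iterate until no remaining source qualifies.
case class RecoverySource(id: String, dvv: VectorTime, cvv: VectorTime)

def recoverEvents(initialCvv: VectorTime, sources: List[RecoverySource]): VectorTime = {
  var cvv       = initialCvv
  var remaining = sources
  var progress  = true
  while (progress && remaining.nonEmpty) {
    val (eligible, blocked) = remaining.partition(_.dvv <= cvv)
    progress  = eligible.nonEmpty
    remaining = blocked
    // Recovering from a source replays its events; afterwards the
    // recovered log's CVV is (at least) merged with the source's CVV.
    eligible.foreach(s => cvv = cvv.merge(s.cvv))
  }
  cvv
}
```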
Location addition
Location addition requires additional rules as a new location may introduce a new cycle which may conflict with ongoing event deletion, as outlined in Cyclic replication networks and event deletion (see Appendix). For example, consider an acyclic replication network `ABC`: `A` emits events `e1`, `e2` and `e3` with `e1 → e2 → e3` which are replicated to `B` but not yet to `C`. Then `A` deletes events `e1` and `e2`, which is allowed by the rules established so far. Now, location `D` is added and two bi-directional replication connections, `A - D` and `C - D`, are established. This may lead to a situation where `e3` is replicated from `A` to `D` and then to `C`. As a consequence, the causality filter at `C` will later reject events `e1` and `e2` from `B`.

To prevent this anomaly, further rules must be defined for event replication over new replication connections. This doesn't only apply to connections to and from new locations but also to new replication connections between existing locations: event replication over a new replication connection from location `X` to location `Y` may only start if the deletion version vector of `X` (`DVV-X`) causally precedes or is equal to the current version vector of `Y` (`CVV-Y`), i.e. `DVV-X → CVV-Y` or `DVV-X = CVV-Y`.

Applying this rule to the above example, replication between `C` and `D` would start immediately in both directions and between `A` and `D` only in direction from `D` to `A`. Replication from `A` to `D` only starts after `D` has received events `e1` and `e2` from `C`, preventing the anomaly that `C` rejects events `e1` and `e2`.
For the special case that location `D` is not even interested in getting all past events from `A` and `C` but only wants to receive new events, it must first set both its `CVV` and its `DVV` to the least upper bound of the `CVV`s of locations `A` and `C`. More generally, a new location `X` that only wants to receive new events from all new directly connected locations `Y1` - `Yn` must additionally set its `CVV-X` and `DVV-X` to `LUB(CVV-Y1, ..., CVV-Yn)` before applying the previous rule.
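A sketch of both rules (illustrative, reusing the `VectorTime` sketch; the second function assumes at least one directly connected location):

```scala
// New-connection rule: replication over a new connection X -> Y may
// only start once DVV-X causally precedes or equals CVV-Y. Otherwise
// Y could receive an event whose physically deleted causal
// predecessors it can never obtain anymore.
def mayStartReplication(dvvSource: VectorTime, cvvTarget: VectorTime): Boolean =
  dvvSource <= cvvTarget

// Only-new-events initialization: a new location that is not
// interested in past events sets its CVV and DVV to the LUB of the
// CVVs of its directly connected locations before the rule above is
// applied (assumes a non-empty neighbor list).
def initForNewEventsOnly(neighborCvvs: Seq[VectorTime]): VectorTime =
  neighborCvvs.reduce(_ merge _)
```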
Location retirement
Locations can be permanently removed from a replication network. After having been permanently removed, they are referred to as retired locations. Retired locations do not contribute to a replication network topology even if they have been part of its history. The identifiers of retired locations can also be removed from vector clocks.
Fail-over
An additional requirement, not yet covered by the above system model and its extensions, is the possibility to switch to another replication partner over a filtered replication connection. A typical use case is fail-over to another replication partner if the current replication partner remains unavailable for too long. Connections that have been established under these conditions over time don't need to be projected onto a current replication network view and therefore don't introduce a cycle.
Redundant filtered connections
A generalization of fail-over is having redundant filtered replication connections at the same time. Although there is now a cycle, a further generalization is possible: assuming, for example, that two of the connected applications are deployed in the same data center, a network partition between data center 1 and data center 2 still allows these two applications to communicate within their data center.
Global topology view
In order to enforce the constraints of the extended system model, each location must have a global view of the replication network topology including its full history. Assuming that each location emits system events with information about topology changes, replication filter changes and location retirements, each location can construct such a view with eventual consistency.
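A sketch of such a view, folded from assumed system events (the event type names are illustrative, not Eventuate API). Since the system events are disseminated to all locations, every location eventually converges on the same view:

```scala
// System events describing changes that are relevant to the view
// (illustrative names).
sealed trait SystemEvent
case class ConnectionAdded(a: String, b: String, filtered: Boolean) extends SystemEvent
case class ConnectionRemoved(a: String, b: String)                  extends SystemEvent
case class LocationRetired(id: String)                              extends SystemEvent

case class TopologyView(
    historicEdges: Set[Set[String]], // all edges ever seen (history projection)
    everFiltered: Set[Set[String]],  // edges that were filtered at least once
    retired: Set[String]) {

  def update(e: SystemEvent): TopologyView = e match {
    case ConnectionAdded(a, b, f) =>
      copy(
        historicEdges = historicEdges + Set(a, b),
        everFiltered  = if (f) everFiltered + Set(a, b) else everFiltered)
    case ConnectionRemoved(_, _) =>
      this // removed connections remain part of the projected history
    case LocationRetired(id) =>
      // retired locations no longer contribute to the topology
      copy(
        historicEdges = historicEdges.filterNot(_.contains(id)),
        everFiltered  = everFiltered.filterNot(_.contains(id)),
        retired       = retired + id)
  }
}
```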
Appendix
The following subsections give some examples of the replication anomalies described in section Current issues.
Cyclic replication networks with filtered connections
Context:

- Locations `A`, `B` and `C` form a cyclic replication network.
- The connection `A -> C` is filtered so that only event `e3` is replicated.

Scenario:

- `A` emits events `e1`, `e2` and `e3` with `e1 → e2 → e3`.
- `e3` is replicated from `A` to `C`.
- `e3` is replicated from `C` to `B`.
- Replication from `A` to `B` starts.

Problem: `B` will never store `e1` and `e2` in its event log because they causally precede `e3`: once `e3` is written at `B`, `B`'s CVV covers the timestamps of `e1` and `e2`, so the causality filter drops them.
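This anomaly can be reproduced with the sketches from the system model section (all three events are dropped at `B`: `e3` as a harmless duplicate, `e1` and `e2` permanently):

```scala
object FilteredCycleAnomalyDemo {
  def main(args: Array[String]): Unit = {
    def vt(a: Long) = VectorTime(Map("A" -> a))
    val e1 = DurableEvent("e1", vt(1))
    val e2 = DurableEvent("e2", vt(2))
    val e3 = DurableEvent("e3", vt(3))

    // B already received e3 via the filtered path A -> C -> B:
    val cvvB = vt(0).merge(e3.vectorTimestamp)

    // Unfiltered replication from A to B starts. The causality filter
    // drops e1 and e2 because e3, which they causally precede, is
    // already stored at B:
    println(causalityFilter(Seq(e1, e2, e3), cvvB)) // List()
  }
}
```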
Cyclic replication networks and event deletion
Context:

- Locations `A`, `B` and `C` form a cyclic replication network.

Scenario:

- `C` emits events `e1`, `e2` and `e3` with `e1 → e2 → e3`.
- `e1`, `e2` and `e3` are replicated to `B`.
- `B` deletes `e1` and `e2`.
- `e3` is replicated from `B` to `A`.
- Replication from `C` to `A` starts.

Problem: `A` will never store `e1` and `e2` in its event log because they causally precede `e3`.
Disaster recovery over filtered connections
Context:

- Locations `A`, `B` and `C` with bi-directional replication connections between `A` and `B` as well as between `A` and `C`.
- `A -> B` is filtered so that only events `e1` and `e2` are replicated.
- `A -> C` is filtered so that only event `e3` is replicated.

Scenario:

- `A` emits events `e1`, `e2` and `e3` with `e1 → e2 → e3`.
- `e1` and `e2` are replicated to `B`.
- `e3` is replicated to `C`.
- `A` loses all events (disaster).
- `A` is recovered by first replicating `e3` from `C`.
- Then, replication of `e1` and `e2` from `B` is attempted.

Problem: `A` will never store `e1` and `e2` in its local event log because they causally precede `e3`.