\section{System Architecture}
\label{sec:architecture}
In this section, we briefly outline an architecture for providing
consistency prediction, monitoring, and verification based on our
experience implementing them in two production NoSQL
stores---Cassandra and Voldemort. The techniques used to implement PBS
are applicable to any system whose replication protocol we can model,
and they require only lightweight latency profiling in the underlying
data store.
\begin{figure}
\centering
\includegraphics[width=\columnwidth]{figs/cluster-arch.pdf}
\caption{Architecture for integrating PBS metrics with an existing
data store. The PBS prediction and consistency monitoring modules
provide metrics used by applications like dynamic
query tuning, monitoring, and diagnostic tools.}\vspace{-1em}
\label{fig:pbs-sys-arch}
\end{figure}
\subsection{Data Store Architecture}
\label{sec:dbarch}
The lower two layers of Figure~\ref{fig:pbs-sys-arch} show the changes
that are required to integrate PBS-style metrics into an existing
distributed data store. In each replica, or storage node, we perform
lightweight latency profiling by piggybacking timestamps on messages
exchanged between servers. Two separate modules providing consistency
prediction and monitoring subsequently process this data. The
modules are cleanly separable from the main data store implementation
and can be adapted to different systems.
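For concreteness, the following simplified Java sketch shows one way
to maintain such a profiling window; the class and member names are
illustrative rather than either system's actual code, and it assumes
loosely synchronized clocks across servers.
\begin{verbatim}
import java.util.ArrayDeque;
import java.util.Deque;

// Bounded window of one-way message delays, fed by piggybacked
// send timestamps and consumed by the prediction and monitoring
// modules. Illustrative sketch, not production code.
class LatencyWindow {
    private final int capacity;
    private final Deque<Long> samplesMs = new ArrayDeque<>();

    LatencyWindow(int capacity) { this.capacity = capacity; }

    // Called on message receipt; sentAtMs was piggybacked on the
    // message. Assumes loosely synchronized clocks.
    synchronized void record(long sentAtMs, long receivedAtMs) {
        if (samplesMs.size() == capacity) samplesMs.removeFirst();
        samplesMs.addLast(receivedAtMs - sentAtMs);
    }

    synchronized long[] snapshot() {
        return samplesMs.stream().mapToLong(Long::longValue).toArray();
    }
}
\end{verbatim}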
\noindent\textbf{PBS Prediction:} The PBS prediction module is
responsible for calculating the $t$-visibility and $k$-staleness of
the data store. This module records the latencies of the messages sent
during replication (in Dynamo-style systems, the respective latencies
of writes, acknowledgments, reads, and responses---the PBS
\textit{WARS} model). The prediction module tracks a moving window of
recent latencies across servers, and, when an end-user (or
higher-level component) requests a consistency prediction, the module
uses these latencies to produce one via Monte Carlo analysis. The
module simulates the interactions between thousands of read and write
requests that behave stochastically according to the observed
distributions. This allows us to estimate $t$-visibility,
$k$-staleness, and operation latency under a range of parameter
choices: replication factors, anti-entropy rates, and node failures.
In our Cassandra implementation, we found that thousands of
prediction trials typically complete within a second.
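To make the Monte Carlo step concrete, the sketch below estimates
$t$-visibility for a single ($N$, $R$, $W$) configuration under the
WARS model. It is deliberately simplified: exponential latencies with
hypothetical 1~ms means stand in for the profiled distributions, and
read repair and failures are ignored.
\begin{verbatim}
import java.util.Arrays;
import java.util.Random;

// Simplified Monte Carlo estimate of t-visibility under the WARS
// model; the real predictor replays latencies drawn from the
// profiling window instead of the exponential stand-ins used here.
public class PbsPredictorSketch {
    static final Random RNG = new Random(42);

    // Placeholder latency sampler (mean in ms).
    static double lat(double mean) {
        return -mean * Math.log(1 - RNG.nextDouble());
    }

    // Probability that a read issued t ms after a write commits
    // returns that write.
    static double tVisibility(int n, int r, int w, double t,
                              int trials) {
        int consistent = 0;
        for (int trial = 0; trial < trials; trial++) {
            double[] wArrive = new double[n], ack = new double[n];
            for (int i = 0; i < n; i++) {
                wArrive[i] = lat(1.0);          // W: write reaches replica
                ack[i] = wArrive[i] + lat(1.0); // A: ack reaches client
            }
            double[] acks = ack.clone();
            Arrays.sort(acks);
            double commit = acks[w - 1];        // w-th ack commits

            // Read starts t ms after commit; record, per replica,
            // its response time and whether it already had the write.
            double[][] resp = new double[n][2];
            for (int i = 0; i < n; i++) {
                double reqAt = commit + t + lat(1.0); // R: read arrives
                resp[i][0] = reqAt + lat(1.0);        // S: response
                resp[i][1] = wArrive[i] <= reqAt ? 1 : 0;
            }
            Arrays.sort(resp, (a, b) -> Double.compare(a[0], b[0]));
            // The read returns the newest value among the first r
            // responses, so any fresh replica among them suffices.
            for (int i = 0; i < r; i++) {
                if (resp[i][1] == 1) { consistent++; break; }
            }
        }
        return (double) consistent / trials;
    }

    public static void main(String[] args) {
        System.out.printf("P(consistent) = %.3f%n",
                          tVisibility(3, 1, 1, 10.0, 10000));
    }
}
\end{verbatim}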
\noindent\textbf{Consistency Monitoring:} While predictions are useful, it is
also beneficial to know what consistency a data store is actually providing.
Whereas PBS gives a probability distribution function (PDF) over the time
after which reads become consistent, consistency monitoring amounts to
integrating over this PDF: what percentage of reads are actually consistent?
Black-box monitoring is somewhat more difficult than prediction because
determining whether a read returned the latest write effectively requires
consensus on the value of that write. This requires substantially more
coordination than prediction but is the subject of considerable ongoing
research~\cite{hotdep}.
On the other hand, our verifier for Cassandra uses \textit{white box}
techniques similar to the PBS predictor. Even though an operation
might complete with a reply from a single replica (say $R$=$1$), the
monitoring module asynchronously collects data from all replicas. The
monitoring modules periodically exchange timing metadata for writes,
which is passed to an online analysis algorithm that detects
consistency violations.
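A minimal sketch of such an online check follows; the interface is our
own illustration, not the actual Cassandra module's. Given the
exchanged write metadata, it flags a read as a $t$-visibility
violation if the read returned a version older than a write that
committed more than $t$ before the read began.
\begin{verbatim}
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Online staleness check over exchanged write metadata.
// Illustrative sketch with hypothetical names.
class StalenessChecker {
    // Per key: committed version -> commit time (ms).
    private final Map<String, NavigableMap<Long, Long>> commits =
        new HashMap<>();

    void recordCommit(String key, long version, long commitMs) {
        commits.computeIfAbsent(key, k -> new TreeMap<>())
               .put(version, commitMs);
    }

    // True if some version newer than the one read committed more
    // than tMs before the read started.
    boolean violatesTVisibility(String key, long readVersion,
                                long readStartMs, long tMs) {
        NavigableMap<Long, Long> versions =
            commits.getOrDefault(key, new TreeMap<>());
        for (long commitMs :
                 versions.tailMap(readVersion, false).values()) {
            if (commitMs + tMs <= readStartMs) return true;
        }
        return false;
    }
}
\end{verbatim}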
\subsection{Userspace Tools}
As we described in Section~\ref{sec:dynamic}, end-users can leverage
consistency prediction to build tools to inform and enforce
consistency SLAs. A simple, standalone tool can track the consistency
and latency profile of queries over time by periodically invoking the
PBS predictor. Database administrators can specify desired SLAs in
terms of acceptable values for a combination of $t$-visibility,
$k$-staleness, and operation latency. Based on the predictions, the
enforcement tool can adjust replication parameters to ensure that the
SLA is met while minimizing observed latency. For typical replication
factors, the number of possible configurations is limited, meaning the
tool can likely perform a complete state-space search.
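The following sketch illustrates such a search for a fixed
replication factor; \texttt{predict()} is a hypothetical stand-in for
an invocation of the PBS prediction module.
\begin{verbatim}
// Exhaustive search over (R, W) for the lowest-latency
// configuration meeting an SLA. predict() is a stand-in for a
// call into the PBS prediction module.
public class SlaEnforcerSketch {
    record Prediction(double pStale, double latencyMs) {}

    // Stand-in; a real tool would run the Monte Carlo predictor.
    static Prediction predict(int n, int r, int w) {
        return new Prediction(1.0 / (r + w), 5.0 * (r + w));
    }

    // Returns {n, r, w} meeting the SLA with minimal predicted
    // latency, or null if no configuration qualifies.
    static int[] cheapestConfig(int n, double maxPStale,
                                double maxLatencyMs) {
        int[] best = null;
        double bestLatency = Double.MAX_VALUE;
        for (int r = 1; r <= n; r++) {
            for (int w = 1; w <= n; w++) {
                Prediction p = predict(n, r, w);
                if (p.pStale() <= maxPStale
                        && p.latencyMs() <= maxLatencyMs
                        && p.latencyMs() < bestLatency) {
                    bestLatency = p.latencyMs();
                    best = new int[] {n, r, w};
                }
            }
        }
        return best;
    }
}
\end{verbatim}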
Database administrators can also leverage PBS predictions for greater
insight into the latency-consistency trade-offs for their
deployment. In our Cassandra implementation, we have added a new
command (\texttt{predictconsistency}) to \texttt{nodetool}, a widely
used administration interface. The \texttt{predictconsistency}
command provides a flexible interface for performing PBS
prediction. Data store administrators specify a hypothetical
replication configuration ($N$, $R$, $W$) along with a write
visibility time and subsequently receive a staleness prediction. This
allows administrators to perform ``what-if'' analysis and project the
impact on latency and consistency as their workload changes. We also
export the consistency metrics over a Java MBean interface, meaning
any tool that can interface with MBeans can receive prediction data.
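For example, a JMX client can poll the exported metrics as sketched
below; the MBean object name and attribute name here are hypothetical
placeholders for the ones our implementation registers.
\begin{verbatim}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Reads a prediction attribute over JMX. The object name and
// attribute below are hypothetical placeholders.
public class PbsJmxClientSketch {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn =
                jmxc.getMBeanServerConnection();
            ObjectName name = new ObjectName(
                "org.apache.cassandra.db:type=PBSPredictor");
            Object t = conn.getAttribute(name, "TVisibility");
            System.out.println("Predicted t-visibility: " + t);
        }
    }
}
\end{verbatim}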
Finally, administrators can use PBS to monitor the consistency
provided by their data store. In contrast to offline consistency
verification schemes~\cite{podc-hpl}, PBS provides a lightweight
mechanism that can be used for (prediction-based) monitoring. Existing
monitoring solutions like Ganglia or DataStax's OpsCenter can
periodically issue PBS prediction requests and plot the values of
$t$-visibility and $k$-staleness as a time series. These metrics can
then be integrated with rule-based alerting systems to page an
operator if $t$-visibility exceeds a particular threshold.
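A rule of this kind might be wired up as in the sketch below, where
\texttt{fetchTVisibility()} is a hypothetical probe (e.g., backed by
the JMX interface above).
\begin{verbatim}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Periodically samples a prediction, emits it as a time-series
// point, and alerts when it crosses a threshold. Illustrative
// sketch; fetchTVisibility() is a hypothetical probe.
public class PbsAlerterSketch {
    static double fetchTVisibility() { return 0.0; } // stand-in

    public static void main(String[] args) {
        final double thresholdMs = 200.0;
        ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            double t = fetchTVisibility();
            System.out.printf("pbs.t_visibility %f %d%n",
                              t, System.currentTimeMillis());
            if (t > thresholdMs) {
                System.err.println("ALERT: t-visibility threshold"
                                   + " exceeded");
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}
\end{verbatim}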