Frank Schröder edited this page Oct 30, 2017 · 2 revisions

Metrics

This document describes the fabio metrics layer and documents the transition from the go-metrics based layer to a more flexible approach. Once that transition has been completed the documentation for the transition will be removed.

Why change?

Fabio metrics started out as a layer on top of the go-metrics library, mostly for Graphite, since that's what we were using at eCG. It became somewhat more flexible over time, but the design doesn't make it easy to add providers like DogStatsD, Prometheus and others which support tagged metrics.

Also, the go-metrics library aggregates histograms internally, which does not work well with providers like statsd and Circonus that aggregate histograms on the server. Fabio does not support multiple metrics providers simultaneously, which makes migration between metrics systems difficult. And last but not least, the go-metrics library hasn't seen significant updates in over a year. The last commit is from 28 Nov 2016.

Fabio Metrics <= 1.5.x

Fabio currently supports Graphite, statsd, Circonus and stdout for debugging.

The metric names fall into two groups: service metrics and internal metrics.

Service metric names are generated from the template defined in metrics.names, which defaults to <service>.<domain>.<path>.<host:port>.

Internal metric names are hard-coded, like http.status.code.200 or notfound.

All metric names can have a prefix, which is configured through a template defined in metrics.prefix and defaults to <hostname>.<exec name>.
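The way a prefix template and a name template combine into one flat metric name can be sketched as follows. This is a minimal illustration, not fabio's actual implementation: the helpers `clean`, `expand` and `metricName` are hypothetical, and the rule that dots, colons and slashes in values become underscores is an assumption made so that each value stays a single segment of the dot-separated name.

```go
package main

import "strings"

// clean replaces characters that would split a value into several
// name segments. The exact sanitizing rule is an assumption.
func clean(s string) string {
	return strings.NewReplacer(".", "_", ":", "_", "/", "_").Replace(s)
}

// expand substitutes every <key> placeholder in tmpl with the cleaned
// value stored under key in vars. Unknown placeholders are left as-is.
func expand(tmpl string, vars map[string]string) string {
	pairs := make([]string, 0, 2*len(vars))
	for k, v := range vars {
		pairs = append(pairs, "<"+k+">", clean(v))
	}
	return strings.NewReplacer(pairs...).Replace(tmpl)
}

// metricName joins the expanded prefix and the expanded name with a dot.
func metricName(prefixTmpl, nameTmpl string, vars map[string]string) string {
	return expand(prefixTmpl, vars) + "." + expand(nameTmpl, vars)
}
```

With the default templates from above and the illustrative values hostname `web1`, exec name `fabio`, service `svc-a`, domain `example.com`, path `/foo` and host:port `10.0.0.1:8080`, `metricName` returns `web1.fabio.svc-a.example_com._foo.10_0_0_1_8080`.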

The Graphite and statsd providers provide aggregated histograms whereas the Circonus provider sends events to the server.

There are several issues open for additional providers:

Fabio currently provides the following metrics:

Depending on the metrics provider, timer aggregation happens either in the metrics library (go-metrics: statsd, Graphite) or in the system of the metrics provider (Circonus).

| Name | Type | Description |
| --- | --- | --- |
| http.status.code.${status_code} | timer | aggregation over all http requests per status code |
| notfound | counter | counts all http route lookup failures |
| requests | timer | aggregation of all http requests |
| ws.conn | gauge | current number of open web socket connections |
| tcp.conn | counter | counts the number of successful TCP proxy connections |
| tcp.connfail | counter | counts the number of failed TCP connections |
| tcp.noroute | counter | counts the number of TCP route lookup failures |
| tcp_sni.conn | counter | counts the number of successful TCP+SNI proxy connections |
| tcp_sni.connfail | counter | counts the number of failed TCP+SNI connections |
| tcp_sni.noroute | counter | counts the number of TCP+SNI route lookup failures |
| {{ metrics.name }} | timer | |
| {{ metrics.name }}.rx | counter | |
| {{ metrics.name }}.tx | counter | |

  • timer - counts events and provides an average throughput and latency number
  • counter - counts events and provides a monotonically increasing value
  • gauge - current value
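The difference between aggregating a timer in process and leaving aggregation to the provider's server can be sketched with two implementations of one interface. Everything here is hypothetical and only illustrates the idea: `Timer`, `aggTimer` and `rawTimer` are not fabio types, the aggregating variant keeps only a count and a sum rather than a full histogram, and the forwarding variant uses the statsd timing line format (`name:value|ms`) purely as an example of a raw event. Neither type is safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// Timer records the duration of a single event.
type Timer interface {
	Update(d time.Duration)
}

// aggTimer aggregates in process: it keeps a running count and sum,
// so only the summary ever needs to be reported.
type aggTimer struct {
	count int64
	sum   time.Duration
}

func (t *aggTimer) Update(d time.Duration) {
	t.count++
	t.sum += d
}

// Mean returns the average latency, or 0 if nothing was recorded.
func (t *aggTimer) Mean() time.Duration {
	if t.count == 0 {
		return 0
	}
	return t.sum / time.Duration(t.count)
}

// rawTimer forwards every event through send and leaves all
// aggregation to the receiving server.
type rawTimer struct {
	name string
	send func(line string)
}

func (t *rawTimer) Update(d time.Duration) {
	t.send(fmt.Sprintf("%s:%d|ms", t.name, d.Milliseconds()))
}

// Compile-time checks that both variants satisfy Timer.
var (
	_ Timer = (*aggTimer)(nil)
	_ Timer = (*rawTimer)(nil)
)
```

Code that records a request duration only sees `Timer`, so the choice between the two strategies can stay inside the provider.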

New approach

A new metrics layer must be flexible enough to support aggregation in process or on the server. It needs to support both flat namespaces and tags, and it needs to be compatible with existing fabio installations.

These metrics libraries are in use by other projects:

armon/go-metrics supports circonus, graphite, statsd, statsite, datadog and prometheus.

go-kit/kit/metrics supports cloudwatch, dogstatsd, expvar, graphite, influx, pcp, prometheus and statsd. Circonus was supported but later removed because of flaky tests.

go-kit/kit/metrics is the best fit for what fabio provides today and what users want. Existing go-metrics implementations could be written as legacy drivers, if necessary.

The problem that go-kit does not solve, however, is name generation for the different metrics providers. Providers like Graphite and statsd, which do not support tags, need a flat namespace with the tag values encoded into the name of the metric. Tagged providers can use more generic names and carry the additional attributes as tags. On top of that, we also need to support the existing legacy metric names.

Fabio could make these names configurable with sensible defaults for each provider. However, this would add quite a number of config options which would almost never be changed. Also, we need to decide which attributes should be tagged and which should be part of the name and whether those attributes should be configurable at all or even for each provider.

Metric names could be evaluated at runtime, e.g. through the Go template engine. However, we would need to determine the allocation overhead of this evaluation since this code is in the hot path and is executed a lot.
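One way to get a first number for that overhead is to render a name with `text/template` and count allocations with `testing.AllocsPerRun` from the standard library. This is only a sketch under assumptions: the `labels` struct, its field names and the template string are made up for illustration and do not describe an existing fabio configuration format.

```go
package main

import (
	"strings"
	"testing"
	"text/template"
)

// labels holds the values a metric name can be built from.
// The field names are hypothetical.
type labels struct {
	Service, Host, Path, Addr string
}

// nameTmpl is parsed once at startup; only Execute runs per request.
var nameTmpl = template.Must(
	template.New("name").Parse("{{.Service}}.{{.Host}}.{{.Path}}.{{.Addr}}"))

// render evaluates the template for one set of labels.
func render(l labels) (string, error) {
	var b strings.Builder
	if err := nameTmpl.Execute(&b, l); err != nil {
		return "", err
	}
	return b.String(), nil
}

// allocsPerRender reports the average number of heap allocations
// made by a single call to render.
func allocsPerRender(l labels) float64 {
	return testing.AllocsPerRun(1000, func() {
		_, _ = render(l)
	})
}
```

If the measured cost turns out to be too high for the hot path, the rendered names could be cached per route so that the template only runs when the routing table changes.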

Since providers are either tagged or not tagged, we could define two names for each metric and pick one or the other depending on which provider is in use.

| Legacy name | Flat name | Tagged name |
| --- | --- | --- |
| http.status.code.${status_code} | http.status.code.${status_code} | http.status code:${status_code} |
| notfound | http.noroute | http.noroute |
| requests | http.requests | http.requests |
| ws.conn | ws.conn | ws.conn |
| tcp.conn | tcp.conn | tcp.conn |
| tcp.connfail | tcp.connfail | tcp.connfail |
| tcp.noroute | tcp.noroute | tcp.noroute |
| tcp_sni.conn | tcp_sni.conn | tcp_sni.conn |
| tcp_sni.connfail | tcp_sni.connfail | tcp_sni.connfail |
| tcp_sni.noroute | tcp_sni.noroute | tcp_sni.noroute |
| {{ metrics.name }} | {{ metrics.name }} | {{ metrics.tagged_name }} service:<svc> host:<host:port> |
| {{ metrics.name }}.rx | {{ metrics.name }}.rx | {{ metrics.tagged_name }}.rx service:<svc> host:<host:port> |
| {{ metrics.name }}.tx | {{ metrics.name }}.tx | {{ metrics.tagged_name }}.tx service:<svc> host:<host:port> |
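
The flat/tagged split in the table above could be modeled by keeping both spellings of a metric together and letting the provider type decide which one is used. The sketch below is an assumption about how this might look, not an agreed design: `Name`, `Resolve` and `statusName` are hypothetical, and tags are represented as plain `key:value` strings for brevity.

```go
package main

import "strconv"

// Name holds both spellings of one metric.
type Name struct {
	Flat   string   // full name for providers without tag support
	Tagged string   // generic name for providers with tag support
	Tags   []string // "key:value" pairs, used only together with Tagged
}

// Resolve returns the name and tags to report for a provider.
// Providers without tag support get the flat name and no tags.
func (n Name) Resolve(tagged bool) (string, []string) {
	if tagged {
		return n.Tagged, n.Tags
	}
	return n.Flat, nil
}

// statusName builds both names for the per-status-code HTTP timer,
// following the http.status.code row of the table.
func statusName(code int) Name {
	c := strconv.Itoa(code)
	return Name{
		Flat:   "http.status.code." + c,
		Tagged: "http.status",
		Tags:   []string{"code:" + c},
	}
}
```

For status code 200, `Resolve(false)` yields `http.status.code.200` with no tags, while `Resolve(true)` yields `http.status` with the tag `code:200`.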