INSTALL: Add telemetry server installation documentation
Add also the dashboard's JSON files for an easy Grafana installation.

Signed-off-by: Yaarit Hatuka <[email protected]>
yaarith committed Mar 17, 2022
1 parent 9593c94 commit e9b09b7
Showing 34 changed files with 19,002 additions and 3 deletions.
126 changes: 126 additions & 0 deletions INSTALL.md
@@ -0,0 +1,126 @@
# Ceph Telemetry Installation

## Minimum requirements
- RHEL 8 based OS
- PostgreSQL version 10.0 and up. Tested up to 14.2
- Grafana open source 8.1 and up
- Apache HTTP Server 2.4 and up
- 16 GB RAM
- 4-core processor for 2500 reporting clusters
- Disk space: on average 430 KB per cluster per day
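
As a rough sizing aid, the per-cluster figure above can be turned into a capacity estimate. The sketch below assumes decimal units (1 GB = 1,000,000 KB) and ignores compression and database overhead; `estimate_storage_kb` is a hypothetical helper, not part of the repository.

```bash
#!/bin/bash
# Hypothetical sizing helper: not part of ceph-telemetry.
# Assumes the documented average of 430 KB per cluster per day.
KB_PER_CLUSTER_PER_DAY=430

# estimate_storage_kb CLUSTERS DAYS -> prints the estimated size in KB
estimate_storage_kb() {
    local clusters=$1 days=$2
    echo $(( KB_PER_CLUSTER_PER_DAY * clusters * days ))
}

# Example: 2500 reporting clusters retained for one year
kb=$(estimate_storage_kb 2500 365)
echo "about $(( kb / 1000000 )) GB per year"
```

With these assumptions, 2500 clusters come to roughly 1 GB per day, so the retention period is the main factor when choosing a disk.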

## Clone the Telemetry git repository
We will clone the repository into the user's home directory; later steps reference this path.
```bash
cd ~
git clone https://github.com/ceph/ceph-telemetry.git
```

## Install PostgreSQL and Grafana
You can install PostgreSQL and Grafana from RPMs or as containers. The steps below install them as containers.

1. Create the directories for the Grafana container's persistent storage and populate them. This example uses the base directory `/opt/telemetry_grafana`.
```bash
sudo mkdir /opt/telemetry_grafana
sudo mkdir -p /opt/telemetry_grafana/var/lib/grafana
sudo chmod a+rwx /opt/telemetry_grafana/var/lib/grafana
sudo mkdir -p /opt/telemetry_grafana/etc/grafana/provisioning/dashboards
sudo mkdir -p /opt/telemetry_grafana/etc/grafana_dashboards

sudo cp ~/ceph-telemetry/install/grafana_dashboards_ini.yml /opt/telemetry_grafana/etc/grafana/provisioning/dashboards
sudo cp -a ~/ceph-telemetry/dashboard/private/* /opt/telemetry_grafana/etc/grafana_dashboards
sudo find /opt/telemetry_grafana/etc/grafana_dashboards/ -name "*.json" -exec sed -i "s/\${DS_POSTGRESQL}/PostgreSQL/g" {} \;
```

2. Edit the docker-compose template `install/docker-compose.yml`. Change the following:
- `<postgres_password>`: a password for the database `postgres` user, which is the PostgreSQL superuser
- `<postgres_host_storage_path>`: persistent storage for the database files
- `<telemetry_server_FQDN>`: FQDN of the server on which Grafana is running
- `/opt/telemetry_grafana`: base directory of the persistent storage for the Grafana database files
3. Run `cd install; docker-compose up -d`
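
The `sed` command in step 1 rewrites the `${DS_POSTGRESQL}` placeholder in every dashboard file to the literal data source name `PostgreSQL`. A minimal sketch of that substitution on a scratch file follows; the temporary directory and `sample.json` are made up for illustration, and GNU `sed` is assumed.

```bash
#!/bin/bash
# Illustration only: runs the same find/sed command on a throwaway file.
workdir=$(mktemp -d)
printf '%s\n' '{ "datasource": "${DS_POSTGRESQL}" }' > "$workdir/sample.json"

find "$workdir" -name "*.json" -exec sed -i "s/\${DS_POSTGRESQL}/PostgreSQL/g" {} \;

cat "$workdir/sample.json"   # the placeholder is now the literal name

# Count files that still contain the placeholder (-F: match it literally)
remaining=$(grep -rlF '${DS_POSTGRESQL}' "$workdir" | wc -l)
echo "files still containing the placeholder: $remaining"
```

The same `grep` check could be pointed at `/opt/telemetry_grafana/etc/grafana_dashboards` after step 1 to confirm that no dashboard was missed.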

### Provision database
#### Create roles, passwords, and data source names (DSN)
```bash
sudo mkdir -p /opt/telemetry # Stores database passwords and DSNs (connection strings)

# Create passwords for the various database users
uuidgen -r | sudo tee /opt/telemetry/pg_pass_telemetry
uuidgen -r | sudo tee /opt/telemetry/pg_pass_grafana
uuidgen -r | sudo tee /opt/telemetry/pg_pass_grafana_ro
uuidgen -r | sudo tee /opt/telemetry/pg_pass_dashboard
echo "host=127.0.0.1 dbname=telemetry user=grafana password=$(cat /opt/telemetry/pg_pass_grafana)" | sudo tee /opt/telemetry/grafana.dsn
```
Run the following in `psql`, replacing each `$PG_PASS_*` placeholder with the corresponding password generated above:
```SQL
CREATE USER telemetry WITH PASSWORD '$PG_PASS_TELEMETRY';
CREATE USER grafana WITH PASSWORD '$PG_PASS_GRAFANA';
CREATE USER grafana_ro WITH PASSWORD '$PG_PASS_GRAFANA_RO';
CREATE USER dashboard WITH PASSWORD '$PG_PASS_DASHBOARD' NOINHERIT;
CREATE DATABASE telemetry OWNER telemetry;
```
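
Substituting four passwords by hand is easy to get wrong. One possible alternative, sketched below, generates the same statements from the password files. `render_roles_sql` is a hypothetical helper (not part of the repository); it assumes the `pg_pass_*` files created above and assumes the passwords contain no single quotes, which holds for `uuidgen` output.

```bash
#!/bin/bash
# Hypothetical helper: prints the CREATE statements with passwords filled in.
# Usage: render_roles_sql DIR   (DIR holds the pg_pass_* files)
render_roles_sql() {
    local dir=$1 role pass opts
    for role in telemetry grafana grafana_ro dashboard; do
        pass=$(cat "$dir/pg_pass_$role") || return 1
        opts=""
        # Only the dashboard role is created with NOINHERIT
        if [ "$role" = "dashboard" ]; then opts=" NOINHERIT"; fi
        printf "CREATE USER %s WITH PASSWORD '%s'%s;\n" "$role" "$pass" "$opts"
    done
    echo "CREATE DATABASE telemetry OWNER telemetry;"
}

# Demo against throwaway files; on the server DIR would be /opt/telemetry
demo=$(mktemp -d)
for r in telemetry grafana grafana_ro dashboard; do
    echo "secret-$r" > "$demo/pg_pass_$r"
done
render_roles_sql "$demo"
```

On the real host the output could be piped into `psql` as the `postgres` user instead of being pasted; reading the password files may require `sudo`.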

#### Import DDLs
```bash
cd ~/ceph-telemetry
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U telemetry telemetry < tables.txt
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U postgres telemetry < db_create_cluster.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U telemetry telemetry < db_create_device.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U postgres telemetry < db_create_roles.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U grafana telemetry < db_create_dashboard.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U grafana telemetry < db_create_dashboard_device.sql
```
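
The six imports above follow one pattern: a schema file paired with the role that must run it. A sketch of the same sequence as a loop is shown here in dry-run form, so it only prints the commands; `print_ddl_imports` is a hypothetical name, not something the repository provides.

```bash
#!/bin/bash
# Hypothetical dry-run helper: prints each import command instead of running it.
print_ddl_imports() {
    local entry user file
    # "role:file" pairs, in the order listed in this section
    for entry in \
        telemetry:tables.txt \
        postgres:db_create_cluster.sql \
        telemetry:db_create_device.sql \
        postgres:db_create_roles.sql \
        grafana:db_create_dashboard.sql \
        grafana:db_create_dashboard_device.sql
    do
        user=${entry%%:*}   # text before the first colon
        file=${entry#*:}    # text after the first colon
        echo "psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U $user telemetry < $file"
    done
}

print_ddl_imports
```

Because `ON_ERROR_STOP=1` makes `psql` exit with a non-zero status when a statement fails, a loop that actually ran these commands could stop at the first failing file rather than continuing with the rest.
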
### Configure Grafana
1. Log in to Grafana via a browser (port 3000) with the default username 'admin' and password 'admin'.
2. Configure a data source for the PostgreSQL server:
   1. Use `grafana_ro` as the database user, with the password saved in `/opt/telemetry/pg_pass_grafana_ro`
   2. You may need to use the host's IP address (not localhost)

## Install Apache HTTP server
1. Run:
```bash
sudo dnf install -y httpd python3-mod_wsgi mod_ssl mod_evasive openssl python3-requests python3-flask python3-flask-restful python3-psycopg2 lz4
sudo cp ~/ceph-telemetry/install/telemetry-ssl.conf /etc/httpd/conf.d/
```
2. Generate web server certificates for the telemetry server's public FQDN. The instructions below generate a self-signed certificate, which should not be used in production:
```bash
sudo mkdir -p /etc/telemetry/ssl
sudo openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/telemetry/ssl/telemetry.key -out /etc/telemetry/ssl/telemetry.crt
```
3. Edit `/etc/httpd/conf.d/telemetry-ssl.conf` and change the following to match your environment:
- ServerName
- SSLCertificateFile, SSLCertificateKeyFile
4. You may need to configure SELinux to allow httpd to access the telemetry WSGI application, for example with `semanage permissive -a httpd_t`
5. Run:
```bash
sudo systemctl enable --now httpd
```

## Install the Telemetry server
Run:
```bash
cd ~/ceph-telemetry
sudo cp -a server /opt/telemetry/
sudo cp import_crashes.py import_clusters.py import_devices.py dbhelper.py compress_raw_reports_telemetry.sh /opt/telemetry/
cd /opt/telemetry
sudo ln -s pg_pass_telemetry pg_pass.txt
sudo mkdir log
sudo chown apache log
sudo mkdir raw
sudo chmod a+rwx raw
```
#### Add Telemetry importers to cron
Create a "telemetry" Unix user account, then install its crontab from the cloned repository:
```bash
sudo crontab -u telemetry ~/ceph-telemetry/install/crontab_telemetry
```

# Backing up the Telemetry server
Back up the server by dumping the PostgreSQL database:
```bash
PGPASSWORD='<postgres_password>' pg_dumpall -h 127.0.0.1 -U postgres | lz4 -c > telemetry_server.sql.lz4
```
Then copy the resulting file off the telemetry server host.
Note that `<postgres_password>` is the password set in `docker-compose.yml`.
16 changes: 16 additions & 0 deletions compress_raw_reports_telemetry.sh
@@ -0,0 +1,16 @@
#! /bin/bash

# Compress every raw report that has not been gzipped yet.
# Files are compressed in place: gzip replaces each file with FILE.gz.
# Globbing (instead of parsing `ls` output) copes with spaces in
# filenames, so IFS does not need to be changed.
shopt -s nullglob   # an empty directory yields zero iterations

path="/opt/telemetry/raw"

for f in "$path"/*; do
    # Skip non-regular files and anything already compressed
    [[ -f $f && $f != *.gz ]] || continue
    echo "${f##*/}"
    gzip -- "$f"
done
202 changes: 202 additions & 0 deletions dashboard/private/cluster/cluster_all_reports_by_cluster_id.json
@@ -0,0 +1,202 @@
{
"__inputs": [
{
"name": "DS_POSTGRESQL",
"label": "PostgreSQL",
"description": "",
"type": "datasource",
"pluginId": "postgres",
"pluginName": "PostgreSQL"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "8.1.2"
},
{
"type": "datasource",
"id": "postgres",
"name": "PostgreSQL",
"version": "1.0.0"
},
{
"type": "panel",
"id": "table",
"name": "Table",
"version": ""
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"iteration": 1646868391941,
"links": [],
"panels": [
{
"datasource": "${DS_POSTGRESQL}",
"fieldConfig": {
"defaults": {
"custom": {
"align": "left",
"displayMode": "auto"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "id"
},
"properties": [
{
"id": "links",
"value": [
{
"title": "See report",
"url": "/d/hyqCQ97Mk/raw-cluster-report?orgId=1&var-id=${__data.fields[id]}"
}
]
}
]
}
]
},
"gridPos": {
"h": 11,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"showHeader": true,
"sortBy": [
{
"desc": true,
"displayName": "id"
}
]
},
"pluginVersion": "8.1.2",
"targets": [
{
"format": "table",
"group": [],
"metricColumn": "none",
"rawQuery": true,
"rawSql": "select\nreport_stamp::TEXT, id\nfrom public.report\nwhere \ncluster_id = '$id';",
"refId": "A",
"select": [
[
{
"params": [
"value"
],
"type": "column"
}
]
],
"timeColumn": "time",
"where": [
{
"name": "$__timeFilter",
"params": [],
"type": "macro"
}
]
}
],
"timeFrom": null,
"timeShift": null,
"title": "All reports of cluster_id: $id",
"type": "table"
}
],
"schemaVersion": 30,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "ffffffff-ffff-ffff-ffff-ffffffffffff",
"value": "ffffffff-ffff-ffff-ffff-ffffffffffff"
},
"description": null,
"error": null,
"hide": 0,
"label": "id",
"name": "id",
"options": [
{
"selected": true,
"text": "ffffffff-ffff-ffff-ffff-ffffffffffff",
"value": "ffffffff-ffff-ffff-ffff-ffffffffffff"
}
],
"query": "ffffffff-ffff-ffff-ffff-ffffffffffff",
"skipUrlSync": false,
"type": "textbox"
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
]
},
"timezone": "",
"title": "All reports by cluster id",
"uid": "GJkuC3nMk",
"version": 7
}