INSTALL: Add telemetry server installation documentation #23

Merged · 1 commit · Mar 23, 2022
126 changes: 126 additions & 0 deletions INSTALL.md
@@ -0,0 +1,126 @@
# Ceph Telemetry Installation

## Minimum requirements
- RHEL 8-based OS
Member: How hard a requirement is this? Will, say, CentOS Stream work?

Contributor Author: Currently mod_evasive is not available in CentOS Stream, which is a problem.

- PostgreSQL 10.0 or later (tested up to 14.2)
- Grafana open source 8.1 or later
- Apache HTTP Server 2.4 or later
- 16 GB RAM
- 4 CPU cores for 2,500 reporting clusters
- Disk space: on average, 430 KB per cluster per day

## Clone the Telemetry git repository
We will clone the repository into the user's home directory and reference this path later in the installation.
```bash
cd ~
git clone https://github.com/ceph/ceph-telemetry.git
```

## Install PostgreSQL and Grafana
You can install Postgres and Grafana from RPMs or as containers. Below is how to install them as containers.

1. Create directories for the Grafana container's persistent storage and populate them. In this example we'll use the base directory `/opt/telemetry_grafana`.
```bash
sudo mkdir /opt/telemetry_grafana
sudo mkdir -p /opt/telemetry_grafana/var/lib/grafana
sudo chmod a+rwx /opt/telemetry_grafana/var/lib/grafana
sudo mkdir -p /opt/telemetry_grafana/etc/grafana/provisioning/dashboards
sudo mkdir -p /opt/telemetry_grafana/etc/grafana_dashboards

sudo cp ~/ceph-telemetry/install/grafana_dashboards_ini.yml /opt/telemetry_grafana/etc/grafana/provisioning/dashboards
sudo cp -a ~/ceph-telemetry/dashboard/private/* /opt/telemetry_grafana/etc/grafana_dashboards
sudo find /opt/telemetry_grafana/etc/grafana_dashboards/ -name "*.json" -exec sed -i "s/\${DS_POSTGRESQL}/PostgreSQL/g" {} \;
```

2. Edit the docker-compose template `install/docker-compose.yml` and change the following to match your environment (a scripted sketch follows this list):
   - `<postgres_password>`: choose a password for the database "postgres" user, which is the PostgreSQL superuser
   - `<postgres_host_storage_path>`: persistent storage path for the database files
   - `<telemetry_server_FQDN>`: FQDN of the server on which Grafana is running
   - `/opt/telemetry_grafana`: persistent storage base directory for the Grafana database files
3. Run `cd install; docker-compose up -d`
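For reference, steps 2 and 3 can be scripted roughly as follows. This is only a sketch: the substituted storage path and FQDN are example values, the placeholder names are the ones listed above, and the generated `postgres` password should be kept somewhere safe since it is needed again for backups.
```bash
cd ~/ceph-telemetry/install
PG_SUPER_PASS=$(uuidgen -r)   # superuser password for the "postgres" user; keep a copy
sed -i \
    -e "s|<postgres_password>|${PG_SUPER_PASS}|g" \
    -e "s|<postgres_host_storage_path>|/opt/telemetry_postgres|g" \
    -e "s|<telemetry_server_FQDN>|telemetry.example.com|g" \
    docker-compose.yml            # the storage path and FQDN above are examples only
docker-compose up -d
```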

### Provision database
#### Create roles, passwords, and data source names (DSN)
```bash
sudo mkdir -p /opt/telemetry # Stores database passwords and DSNs (connection strings)

# Create passwords for the various database users
uuidgen -r | sudo tee /opt/telemetry/pg_pass_telemetry
uuidgen -r | sudo tee /opt/telemetry/pg_pass_grafana
uuidgen -r | sudo tee /opt/telemetry/pg_pass_grafana_ro
uuidgen -r | sudo tee /opt/telemetry/pg_pass_dashboard
echo host=127.0.0.1 dbname=telemetry user=grafana password=$(cat /opt/telemetry/pg_pass_grafana) |sudo tee /opt/telemetry/grafana.dsn

```
Run the following in `psql`, replacing the `$PG_PASS*` placeholders with the corresponding passwords generated above (a scripted variant follows the block):
```SQL
CREATE USER telemetry WITH PASSWORD '$PG_PASS_TELEMETRY';
CREATE USER grafana WITH PASSWORD '$PG_PASS_GRAFANA';
CREATE USER grafana_ro WITH PASSWORD '$PG_PASS_GRAFANA_RO';
CREATE USER dashboard WITH PASSWORD '$PG_PASS_DASHBOARD' NOINHERIT;
CREATE DATABASE telemetry OWNER telemetry;
```

#### Import DDLs
```bash
cd ~/ceph-telemetry
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U telemetry telemetry < tables.txt
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U postgres telemetry < db_create_cluster.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U telemetry telemetry < db_create_device.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U postgres telemetry < db_create_roles.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U grafana telemetry < db_create_dashboard.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U grafana telemetry < db_create_dashboard_device.sql
```
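The imports above connect as three different database users and will prompt for each password. Optionally, a `~/.pgpass` file avoids the prompts; this is standard PostgreSQL client behaviour rather than anything shipped in this repository, and the entries below assume the default port 5432.
```bash
# Each entry is host:port:database:username:password; psql reads the file automatically.
{
  echo "127.0.0.1:5432:*:postgres:<postgres_password>"
  echo "127.0.0.1:5432:telemetry:telemetry:$(sudo cat /opt/telemetry/pg_pass_telemetry)"
  echo "127.0.0.1:5432:telemetry:grafana:$(sudo cat /opt/telemetry/pg_pass_grafana)"
} > ~/.pgpass
chmod 0600 ~/.pgpass   # psql ignores the file if it is group- or world-readable
```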
### Configure Grafana
1. Log in to Grafana via a browser (port 3000) with the default username `admin` and password `admin`.
2. Configure a data source for the PostgreSQL server (a scripted sketch follows this list):
   1. Use `grafana_ro` as the database user, with the password saved in `/opt/telemetry/pg_pass_grafana_ro`.
   2. You may need to use the host's IP address (not localhost).
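If you prefer to script the data source creation, Grafana's HTTP API can be used instead of the UI. The sketch below assumes the default `admin:admin` credentials from step 1, Grafana on port 3000 of this host, and the database reachable at `<host_ip>:5432`; the data source is named `PostgreSQL` to match the name the dashboards were rewritten to expect earlier, and `sslmode: disable` is an assumption for a local setup.
```bash
GRAFANA_RO_PASS=$(sudo cat /opt/telemetry/pg_pass_grafana_ro)
curl -s -u admin:admin -H 'Content-Type: application/json' \
  -X POST http://127.0.0.1:3000/api/datasources \
  -d "{
        \"name\": \"PostgreSQL\",
        \"type\": \"postgres\",
        \"access\": \"proxy\",
        \"url\": \"<host_ip>:5432\",
        \"user\": \"grafana_ro\",
        \"database\": \"telemetry\",
        \"secureJsonData\": { \"password\": \"${GRAFANA_RO_PASS}\" },
        \"jsonData\": { \"sslmode\": \"disable\" }
      }"
```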

## Install Apache HTTP server
1. Run:
```bash
sudo dnf install -y httpd python3-mod_wsgi mod_ssl mod_evasive openssl python3-requests python3-flask python3-flask-restful python3-psycopg2 lz4
sudo cp ~/ceph-telemetry/install/telemetry-ssl.conf /etc/httpd/conf.d/
```
2. Generate web server certificates for the telemetry server's public FQDN. The instructions below generate self-signed certificates, which should not be used in production:
```bash
sudo mkdir -p /etc/telemetry/ssl
sudo openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/telemetry/ssl/telemetry.key -out /etc/telemetry/ssl/telemetry.crt
```
3. Edit `/etc/httpd/conf.d/telemetry-ssl.conf` and change the following to match your environment (a scripted sketch follows this list):
   - `ServerName`
   - `SSLCertificateFile`, `SSLCertificateKeyFile`
4. You may need to configure SELinux to allow httpd access to the telemetry WSGI application, e.g. with `semanage permissive -a httpd_t`.
5. Run:
```bash
sudo systemctl enable --now httpd
```
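A scripted version of steps 3 through 5 might look like the following. It is a sketch only: it assumes the `ServerName`, `SSLCertificateFile`, and `SSLCertificateKeyFile` directives already appear in the shipped config, that the self-signed certificate was generated into `/etc/telemetry/ssl` as above, and the FQDN is an example.
```bash
sudo sed -i \
    -e "s|^\(\s*ServerName\s\+\).*|\1telemetry.example.com|" \
    -e "s|^\(\s*SSLCertificateFile\s\+\).*|\1/etc/telemetry/ssl/telemetry.crt|" \
    -e "s|^\(\s*SSLCertificateKeyFile\s\+\).*|\1/etc/telemetry/ssl/telemetry.key|" \
    /etc/httpd/conf.d/telemetry-ssl.conf
sudo semanage permissive -a httpd_t   # SELinux: make the httpd_t domain permissive
sudo apachectl configtest             # sanity-check the config before starting httpd
sudo systemctl enable --now httpd
```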

## Install the Telemetry server
Run:
```bash
cd ~/ceph-telemetry
sudo cp -a server /opt/telemetry/
sudo cp import_crashes.py import_clusters.py import_devices.py compress_raw_reports_telemetry.sh dbhelper.py /opt/telemetry/
cd /opt/telemetry
sudo ln -s pg_pass_telemetry pg_pass.txt
sudo mkdir log
sudo chown apache log
sudo mkdir raw
sudo chmod a+rwx raw
```
### Add Telemetry importers to cron
Create a "telemetry" user unix account and then run:
```bash
sudo crontab -u telemetry install/crontab_telemetry
```
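For completeness, creating the account and installing the crontab might look like this. The `useradd` flags are a suggestion rather than something specified in the repository, and the `cd` assumes the checkout from the beginning of this guide, since the crontab path is relative to it.
```bash
sudo useradd --system --create-home telemetry   # account that will run the importers
cd ~/ceph-telemetry
sudo crontab -u telemetry install/crontab_telemetry
```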

## Backing up the Telemetry server
Backing up the server involves backing up the PostgreSQL database by running:
```bash
PGPASSWORD=<postgres_password> pg_dumpall postgres | lz4 -c > telemetry_server.sql.lz4
```
and then copying the resulting file off the telemetry server host.
Note that `<postgres_password>` is the same password set in the `docker-compose.yml` file.
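Although restores are not covered here, the inverse operation would roughly be the following; the connection details are examples and `<postgres_password>` is again the superuser password from `docker-compose.yml`.
```bash
# Decompress the dump and replay it against a fresh PostgreSQL instance.
lz4 -dc telemetry_server.sql.lz4 | \
    PGPASSWORD='<postgres_password>' psql -v ON_ERROR_STOP=1 -h 127.0.0.1 -U postgres postgres
```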
16 changes: 16 additions & 0 deletions compress_raw_reports_telemetry.sh
@@ -0,0 +1,16 @@
#! /bin/bash

# Deal with spaces in filenames:
# https://www.reeltoreel.nl/wiki/index.php/Dealing_with_spaces_in_filenames
# The default value of IFS is " \t\n" (e.g. <space><tab><newline>)
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

path="/opt/telemetry/raw/"

for f in `ls -1 $path | grep -v ".gz"`; do
@jeffvance (Mar 16, 2022): Do you need to be concerned with directories matching the `ls | grep`? If so, consider using something like `for f in $(find . -type f -not -name "*.gz"); do ...`. A side benefit is that `find` may not need you to set and restore IFS.

Contributor Author: Good idea. This directory is private to the telemetry server and is always flat, with nothing but the raw reports. Maybe the cleanest would be `find . -type f -not -name "*.gz" -exec gzip ...`; this way we don't need to iterate at all.

echo $f
gzip $path/$f
done

IFS=$SAVEIFS
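For reference, the `find`-based variant suggested in the review above might look roughly like this; it is a sketch, not part of the committed script.
```bash
#!/bin/bash
# Compress every raw report that is not already gzipped. find passes each path
# to gzip as a single argument, so no IFS juggling is needed.
find /opt/telemetry/raw/ -type f -not -name "*.gz" -exec gzip {} +
```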
202 changes: 202 additions & 0 deletions dashboard/private/cluster/cluster_all_reports_by_cluster_id.json
@@ -0,0 +1,202 @@
{
"__inputs": [
{
"name": "DS_POSTGRESQL",
"label": "PostgreSQL",
"description": "",
"type": "datasource",
"pluginId": "postgres",
"pluginName": "PostgreSQL"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "8.1.2"
},
{
"type": "datasource",
"id": "postgres",
"name": "PostgreSQL",
"version": "1.0.0"
},
{
"type": "panel",
"id": "table",
"name": "Table",
"version": ""
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"iteration": 1646868391941,
"links": [],
"panels": [
{
"datasource": "${DS_POSTGRESQL}",
"fieldConfig": {
"defaults": {
"custom": {
"align": "left",
"displayMode": "auto"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "id"
},
"properties": [
{
"id": "links",
"value": [
{
"title": "See report",
"url": "/d/hyqCQ97Mk/raw-cluster-report?orgId=1&var-id=${__data.fields[id]}"
}
]
}
]
}
]
},
"gridPos": {
"h": 11,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"showHeader": true,
"sortBy": [
{
"desc": true,
"displayName": "id"
}
]
},
"pluginVersion": "8.1.2",
"targets": [
{
"format": "table",
"group": [],
"metricColumn": "none",
"rawQuery": true,
"rawSql": "select\nreport_stamp::TEXT, id\nfrom public.report\nwhere \ncluster_id = '$id';",
"refId": "A",
"select": [
[
{
"params": [
"value"
],
"type": "column"
}
]
],
"timeColumn": "time",
"where": [
{
"name": "$__timeFilter",
"params": [],
"type": "macro"
}
]
}
],
"timeFrom": null,
"timeShift": null,
"title": "All reports of cluster_id: $id",
"type": "table"
}
],
"schemaVersion": 30,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "ffffffff-ffff-ffff-ffff-ffffffffffff",
"value": "ffffffff-ffff-ffff-ffff-ffffffffffff"
},
"description": null,
"error": null,
"hide": 0,
"label": "id",
"name": "id",
"options": [
{
"selected": true,
"text": "ffffffff-ffff-ffff-ffff-ffffffffffff",
"value": "ffffffff-ffff-ffff-ffff-ffffffffffff"
}
],
"query": "ffffffff-ffff-ffff-ffff-ffffffffffff",
"skipUrlSync": false,
"type": "textbox"
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
]
},
"timezone": "",
"title": "All reports by cluster id",
"uid": "GJkuC3nMk",
"version": 7
}