INSTALL: Add telemetry server installation documentation #23

Merged · 1 commit · Mar 23, 2022
126 changes: 126 additions & 0 deletions INSTALL.md
@@ -0,0 +1,126 @@
# Ceph Telemetry Installation

## Minimum requirements
- RHEL 8-based OS
Member: How hard a requirement is this? Will, say, CentOS Stream work?

Contributor Author: Currently mod_evasive is not available in CentOS Stream, which is a problem.

- PostgreSQL 10.0 or later (tested up to 14.2)
- Grafana open source 8.1 or later
- Apache HTTP Server 2.4 or later
- 16 GB RAM
- 4 CPU cores for 2,500 reporting clusters
- Disk space: on average, 430 KB per cluster per day

## Clone the Telemetry git repository
We will clone the repository into the user's home directory and reference this path later in the installation.
```bash
cd ~
git clone https://github.com/ceph/ceph-telemetry.git
```

## Install PostgreSQL and Grafana
You can install Postgres and Grafana from RPMs or as containers. Below is how to install them as containers.

1. Create directories for the Grafana container's persistent storage and populate them. In this example we'll use the base directory `/opt/telemetry_grafana`.
```bash
sudo mkdir /opt/telemetry_grafana
sudo mkdir -p /opt/telemetry_grafana/var/lib/grafana
sudo chmod a+rwx /opt/telemetry_grafana/var/lib/grafana
sudo mkdir -p /opt/telemetry_grafana/etc/grafana/provisioning/dashboards
sudo mkdir -p /opt/telemetry_grafana/etc/grafana_dashboards

sudo cp ~/ceph-telemetry/install/grafana_dashboards_ini.yml /opt/telemetry_grafana/etc/grafana/provisioning/dashboards
sudo cp -a ~/ceph-telemetry/dashboard/private/* /opt/telemetry_grafana/etc/grafana_dashboards
sudo find /opt/telemetry_grafana/etc/grafana_dashboards/ -name "*.json" -exec sed -i "s/\${DS_POSTGRESQL}/PostgreSQL/g" {} \;
```

2. Edit the docker-compose template `install/docker-compose.yml` and change the following to match your environment (a scripted sketch follows this list):
   - `<postgres_password>`: choose a password for the database "postgres" user, which is the PostgreSQL superuser
   - `<postgres_host_storage_path>`: persistent storage path for the database files
   - `<telemetry_server_FQDN>`: FQDN of the server on which Grafana is running
   - `/opt/telemetry_grafana`: persistent storage base directory for the Grafana database files
3. Run `cd install; docker-compose up -d`
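For reference, steps 2 and 3 can be scripted roughly as follows. This is only a sketch: the substituted storage path and FQDN are example values, the placeholder names are the ones listed above, and the generated `postgres` password should be kept somewhere safe since it is needed again for backups.
```bash
cd ~/ceph-telemetry/install
PG_SUPER_PASS=$(uuidgen -r)   # superuser password for the "postgres" user; keep a copy
sed -i \
    -e "s|<postgres_password>|${PG_SUPER_PASS}|g" \
    -e "s|<postgres_host_storage_path>|/opt/telemetry_postgres|g" \
    -e "s|<telemetry_server_FQDN>|telemetry.example.com|g" \
    docker-compose.yml            # the storage path and FQDN above are examples only
docker-compose up -d
```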

### Provision database
#### Create roles, passwords, and data source names (DSN)
```bash
sudo mkdir -p /opt/telemetry # Stores database passwords and DSNs (connection strings)

# Create passwords for the various database users
uuidgen -r | sudo tee /opt/telemetry/pg_pass_telemetry
uuidgen -r | sudo tee /opt/telemetry/pg_pass_grafana
uuidgen -r | sudo tee /opt/telemetry/pg_pass_grafana_ro
uuidgen -r | sudo tee /opt/telemetry/pg_pass_dashboard
echo host=127.0.0.1 dbname=telemetry user=grafana password=$(cat /opt/telemetry/pg_pass_grafana) |sudo tee /opt/telemetry/grafana.dsn

```
Run the following in `psql`, replacing the `$PG_PASS*` placeholders with the corresponding passwords generated above (a scripted variant follows the block):
```SQL
CREATE USER telemetry WITH PASSWORD '$PG_PASS_TELEMETRY';
CREATE USER grafana WITH PASSWORD '$PG_PASS_GRAFANA';
CREATE USER grafana_ro WITH PASSWORD '$PG_PASS_GRAFANA_RO';
CREATE USER dashboard WITH PASSWORD '$PG_PASS_DASHBOARD' NOINHERIT;
CREATE DATABASE telemetry OWNER telemetry;
```

#### Import DDLs
```bash
cd ~/ceph-telemetry
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U telemetry telemetry < tables.txt
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U postgres telemetry < db_create_cluster.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U telemetry telemetry < db_create_device.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U postgres telemetry < db_create_roles.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U grafana telemetry < db_create_dashboard.sql
psql -v ON_ERROR_STOP=1 -b -h 127.0.0.1 -U grafana telemetry < db_create_dashboard_device.sql
```
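The imports above connect as three different database users and will prompt for each password. Optionally, a `~/.pgpass` file avoids the prompts; this is standard PostgreSQL client behaviour rather than anything shipped in this repository, and the entries below assume the default port 5432.
```bash
# Each entry is host:port:database:username:password; psql reads the file automatically.
{
  echo "127.0.0.1:5432:*:postgres:<postgres_password>"
  echo "127.0.0.1:5432:telemetry:telemetry:$(sudo cat /opt/telemetry/pg_pass_telemetry)"
  echo "127.0.0.1:5432:telemetry:grafana:$(sudo cat /opt/telemetry/pg_pass_grafana)"
} > ~/.pgpass
chmod 0600 ~/.pgpass   # psql ignores the file if it is group- or world-readable
```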
### Configure Grafana
1. Log in to Grafana via a browser (port 3000) with the default username `admin` and password `admin`.
2. Configure a data source for the PostgreSQL server (a scripted sketch follows this list):
   1. Use `grafana_ro` as the database user, with the password saved in `/opt/telemetry/pg_pass_grafana_ro`.
   2. You may need to use the host's IP address (not localhost).
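If you prefer to script the data source creation, Grafana's HTTP API can be used instead of the UI. The sketch below assumes the default `admin:admin` credentials from step 1, Grafana on port 3000 of this host, and the database reachable at `<host_ip>:5432`; the data source is named `PostgreSQL` to match the name the dashboards were rewritten to expect earlier, and `sslmode: disable` is an assumption for a local setup.
```bash
GRAFANA_RO_PASS=$(sudo cat /opt/telemetry/pg_pass_grafana_ro)
curl -s -u admin:admin -H 'Content-Type: application/json' \
  -X POST http://127.0.0.1:3000/api/datasources \
  -d "{
        \"name\": \"PostgreSQL\",
        \"type\": \"postgres\",
        \"access\": \"proxy\",
        \"url\": \"<host_ip>:5432\",
        \"user\": \"grafana_ro\",
        \"database\": \"telemetry\",
        \"secureJsonData\": { \"password\": \"${GRAFANA_RO_PASS}\" },
        \"jsonData\": { \"sslmode\": \"disable\" }
      }"
```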

## Install Apache HTTP server
1. Run:
```bash
sudo dnf install -y httpd python3-mod_wsgi mod_ssl mod_evasive openssl python3-requests python3-flask python3-flask-restful python3-psycopg2 lz4
sudo cp ~/ceph-telemetry/install/telemetry-ssl.conf /etc/httpd/conf.d/
```
2. Generate web server certificates for the telemetry server's public FQDN. The instructions below generate self-signed certificates, which should not be used in production:
```bash
sudo mkdir -p /etc/telemetry/ssl
sudo openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/telemetry/ssl/telemetry.key -out /etc/telemetry/ssl/telemetry.crt
```
3. Edit `/etc/httpd/conf.d/telemetry-ssl.conf` and change the following to match your environment (a scripted sketch follows this list):
   - `ServerName`
   - `SSLCertificateFile`, `SSLCertificateKeyFile`
4. You may need to configure SELinux to allow httpd access to the telemetry WSGI application, e.g. with `semanage permissive -a httpd_t`.
5. Run:
```bash
sudo systemctl enable --now httpd
```
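A scripted version of steps 3 through 5 might look like the following. It is a sketch only: it assumes the `ServerName`, `SSLCertificateFile`, and `SSLCertificateKeyFile` directives already appear in the shipped config, that the self-signed certificate was generated into `/etc/telemetry/ssl` as above, and the FQDN is an example.
```bash
sudo sed -i \
    -e "s|^\(\s*ServerName\s\+\).*|\1telemetry.example.com|" \
    -e "s|^\(\s*SSLCertificateFile\s\+\).*|\1/etc/telemetry/ssl/telemetry.crt|" \
    -e "s|^\(\s*SSLCertificateKeyFile\s\+\).*|\1/etc/telemetry/ssl/telemetry.key|" \
    /etc/httpd/conf.d/telemetry-ssl.conf
sudo semanage permissive -a httpd_t   # SELinux: make the httpd_t domain permissive
sudo apachectl configtest             # sanity-check the config before starting httpd
sudo systemctl enable --now httpd
```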

## Install the Telemetry server
Run:
```bash
cd ~/ceph-telemetry
sudo cp -a server /opt/telemetry/
sudo cp import_crashes.py import_clusters.py import_devices.py compress_raw_reports_telemetry.sh dbhelper.py /opt/telemetry/
cd /opt/telemetry
sudo ln -s pg_pass_telemetry pg_pass.txt
sudo mkdir log
sudo chown apache log
sudo mkdir raw
sudo chmod a+rwx raw
```
### Add Telemetry importers to cron
Create a "telemetry" user unix account and then run:
```bash
sudo crontab -u telemetry install/crontab_telemetry
```
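For completeness, creating the account and installing the crontab might look like this. The `useradd` flags are a suggestion rather than something specified in the repository, and the `cd` assumes the checkout from the beginning of this guide, since the crontab path is relative to it.
```bash
sudo useradd --system --create-home telemetry   # account that will run the importers
cd ~/ceph-telemetry
sudo crontab -u telemetry install/crontab_telemetry
```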

## Backing up the Telemetry server
Backing up the server involves backing up the PostgreSQL database by running:
```bash
PGPASSWORD=<postgres_password> pg_dumpall postgres | lz4 -c > telemetry_server.sql.lz4
```
and then copying the resulting file off the telemetry server host.
Note that `<postgres_password>` is the same password set in the `docker-compose.yml` file.
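Although restores are not covered here, the inverse operation would roughly be the following; the connection details are examples and `<postgres_password>` is again the superuser password from `docker-compose.yml`.
```bash
# Decompress the dump and replay it against a fresh PostgreSQL instance.
lz4 -dc telemetry_server.sql.lz4 | \
    PGPASSWORD='<postgres_password>' psql -v ON_ERROR_STOP=1 -h 127.0.0.1 -U postgres postgres
```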
16 changes: 16 additions & 0 deletions compress_raw_reports_telemetry.sh
@@ -0,0 +1,16 @@
#! /bin/bash

# Deal with spaces in filenames:
# https://www.reeltoreel.nl/wiki/index.php/Dealing_with_spaces_in_filenames
# The default value of IFS is " \t\n" (e.g. <space><tab><newline>)
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

path="/opt/telemetry/raw/"

for f in `ls -1 $path | grep -v ".gz"`; do
@jeffvance (Mar 16, 2022): Do you need to be concerned with directories matching the `ls | grep`? If so, consider using something like `for f in $(find . -type f -not -name "*.gz"); do ...`. A side benefit is that `find` may not need you to set and restore IFS.

Contributor Author: Good idea. This directory is private to the telemetry server and is always flat, with nothing but the raw reports. Maybe the cleanest would be `find . -type f -not -name "*.gz" -exec gzip ...`; this way we don't need to iterate at all.

echo $f
gzip $path/$f
done

IFS=$SAVEIFS
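For reference, the `find`-based variant suggested in the review above might look roughly like this; it is a sketch, not part of the committed script.
```bash
#!/bin/bash
# Compress every raw report that is not already gzipped. find passes each path
# to gzip as a single argument, so no IFS juggling is needed.
find /opt/telemetry/raw/ -type f -not -name "*.gz" -exec gzip {} +
```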
202 changes: 202 additions & 0 deletions dashboard/private/cluster/cluster_all_reports_by_cluster_id.json
@@ -0,0 +1,202 @@
{
"__inputs": [
{
"name": "DS_POSTGRESQL",
"label": "PostgreSQL",
"description": "",
"type": "datasource",
"pluginId": "postgres",
"pluginName": "PostgreSQL"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "8.1.2"
},
{
"type": "datasource",
"id": "postgres",
"name": "PostgreSQL",
"version": "1.0.0"
},
{
"type": "panel",
"id": "table",
"name": "Table",
"version": ""
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"iteration": 1646868391941,
"links": [],
"panels": [
{
"datasource": "${DS_POSTGRESQL}",
"fieldConfig": {
"defaults": {
"custom": {
"align": "left",
"displayMode": "auto"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "id"
},
"properties": [
{
"id": "links",
"value": [
{
"title": "See report",
"url": "/d/hyqCQ97Mk/raw-cluster-report?orgId=1&var-id=${__data.fields[id]}"
}
]
}
]
}
]
},
"gridPos": {
"h": 11,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"showHeader": true,
"sortBy": [
{
"desc": true,
"displayName": "id"
}
]
},
"pluginVersion": "8.1.2",
"targets": [
{
"format": "table",
"group": [],
"metricColumn": "none",
"rawQuery": true,
"rawSql": "select\nreport_stamp::TEXT, id\nfrom public.report\nwhere \ncluster_id = '$id';",
"refId": "A",
"select": [
[
{
"params": [
"value"
],
"type": "column"
}
]
],
"timeColumn": "time",
"where": [
{
"name": "$__timeFilter",
"params": [],
"type": "macro"
}
]
}
],
"timeFrom": null,
"timeShift": null,
"title": "All reports of cluster_id: $id",
"type": "table"
}
],
"schemaVersion": 30,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "ffffffff-ffff-ffff-ffff-ffffffffffff",
"value": "ffffffff-ffff-ffff-ffff-ffffffffffff"
},
"description": null,
"error": null,
"hide": 0,
"label": "id",
"name": "id",
"options": [
{
"selected": true,
"text": "ffffffff-ffff-ffff-ffff-ffffffffffff",
"value": "ffffffff-ffff-ffff-ffff-ffffffffffff"
}
],
"query": "ffffffff-ffff-ffff-ffff-ffffffffffff",
"skipUrlSync": false,
"type": "textbox"
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
]
},
"timezone": "",
"title": "All reports by cluster id",
"uid": "GJkuC3nMk",
"version": 7
}