[feat] Adding Tracing(otel) to csi driver (#309)
- added tracing for the GRPC operations in the controller server
- updated docs on how to opt-in for tracing and how it works
- added make targets for easier installation
prajwalvathreya authored Dec 16, 2024
1 parent 75e2366 commit 2fe6257
Showing 27 changed files with 922 additions and 94 deletions.
3 changes: 2 additions & 1 deletion .golangci.yml
@@ -119,7 +119,8 @@ linters:
- usestdlibvars
- varnamelen
- whitespace

disable:
- spancheck
presets:
- bugs
- unused
4 changes: 4 additions & 0 deletions Makefile
@@ -215,3 +215,7 @@ install-grafana:
.PHONY: setup-dashboard
setup-dashboard:
	KUBECONFIG=test-cluster-kubeconfig.yaml ./hack/setup-dashboard.sh --namespace=monitoring --dashboard-file=observability/metrics/dashboard.json

.PHONY: setup-tracing
setup-tracing:
	KUBECONFIG=test-cluster-kubeconfig.yaml ./hack/setup-tracing.sh
3 changes: 3 additions & 0 deletions README.md
@@ -29,6 +29,9 @@
- [Contributing](docs/contributing.md)
- [Observability](docs/observability.md)
- [Metrics](docs/metrics-documentation.md)
- [How to opt-in for Metrics](docs/observability.md#steps-to-opt-in-for-the-csi-driver-metrics)
- [Tracing](docs/tracing-documentation.md)
- [How to opt-in for Tracing](docs/observability.md#steps-to-opt-in-for-tracing-in-the-csi-driver)
- [License](#license)
- [Disclaimers](#-disclaimers)
- [Community](#-join-us-on-slack)
Binary file added docs/example-images/tracing/create-volume.jpg
Binary file added docs/example-images/tracing/landing-page.jpg
79 changes: 75 additions & 4 deletions docs/observability.md
@@ -1,6 +1,6 @@
# Observability with Grafana Dashboard
# Observability for CSI Driver

This document explains how to use the `grafana-dashboard` make target to install and configure observability tools, including Prometheus and Grafana, on your Kubernetes cluster. The setup uses Helm charts to install Prometheus and Grafana, provides a Prometheus data source, and applies a Grafana dashboard configuration.
This document explains how to use the `grafana-dashboard` and `setup-tracing` make targets to install and configure observability tools.

## Prerequisites

@@ -32,7 +32,7 @@ helm template linode-csi-driver \
helm-chart/csi-driver --namespace kube-system > csi.yaml
```

### 2. Delete the Existing Release of the CSI Driver
### 2. Delete the Existing Release of the CSI Driver (Needed only if the CSI driver is already installed on your cluster)

Before applying the new configuration, you need to delete the current release of the Linode CSI driver. This step is necessary because the default CSI driver installation does not have metrics enabled, and Helm doesn’t handle changes to some components gracefully without a clean reinstall.

@@ -183,4 +183,75 @@ kubectl logs <grafana-pod-name> -n monitoring

This setup provides a quick and easy way to enable observability using Grafana dashboards, ensuring that you have visibility into your Kubernetes cluster and CSI driver operations.

---
---

## Steps to Opt-In for Tracing in the CSI Driver

To enable tracing for the Linode CSI driver, follow the steps below. You will export a new Helm template with tracing enabled, delete the current CSI driver release, and apply the newly generated configuration.

### 1. Export the Helm Template for the CSI Driver with Tracing Enabled

First, generate a new Helm template for the Linode CSI driver with the `enableTracing` flag set to `true`. You must also set `tracingPort` to a port that is not already in use, so that the otel server can run on it. The default port is `4318`.

```bash
helm template linode-csi-driver \
--set apiToken="${LINODE_API_TOKEN}" \
--set region="${REGION}" \
--set enableTracing="true" \
--set tracingPort="4318" \
helm-chart/csi-driver --namespace kube-system > csi.yaml
```

### 2. Delete the Existing Release of the CSI Driver (Needed only if the CSI driver is already installed on your cluster)

Before applying the new configuration, you need to delete the current release of the Linode CSI driver. This step is necessary because the default CSI driver installation does not have tracing enabled, and Helm doesn’t handle changes to some components gracefully without a clean reinstall.

```bash
kubectl delete -f csi.yaml --namespace kube-system
```

### 3. Apply the Newly Generated Template

Once the old CSI driver installation is deleted, you can apply the newly generated template that includes the tracing configuration.

```bash
kubectl apply -f csi.yaml
```

Now that the configuration is in place, install otel and Jaeger to visualize the traces.

## Steps to Install otel and jaeger for visualizing traces

### 1. Run the Tracing setup

The make target `setup-tracing` installs `otel-collector` and `jaeger` for visualizing the traces.

```bash
make setup-tracing
```

### 2. Access the Jaeger Dashboard

Once the setup is complete, you can access the Jaeger dashboard through the configured LoadBalancer service. The setup script prints the external IP of the LoadBalancer when it finishes; open the following URL in your browser to reach Jaeger:

```
http://<LoadBalancer-EXTERNAL-IP>:16686
```

### 3. Development Setup (Optional)

If you want to use Jaeger in a dev environment, run the following port-forward command:

```bash
kubectl port-forward svc/jaeger-collector 16686:16686 -n kube-system
```

You can now access Jaeger by opening the following URL in your browser:

```
http://localhost:16686
```

Note: If you changed the port, use the new port in the port-forward command and in the URL.

---
237 changes: 237 additions & 0 deletions docs/tracing-documentation.md
@@ -0,0 +1,237 @@
# Using the Jaeger Dashboard for Linode CSI Driver

This guide provides a step-by-step explanation of how to use the Jaeger dashboard to analyze traces in the Linode CSI Driver. It includes visual examples for both the **landing page** and an example trace for the `createvolume` operation.

---

## 1. Accessing the Jaeger Dashboard

To access the Jaeger dashboard:
1. Open the Jaeger dashboard in your browser using the external IP (e.g., `http://<external-ip>:16686`).
2. The landing page will appear, providing options to search and analyze traces.

---

## 2. Landing Page Overview

The landing page is the first screen you see upon accessing the Jaeger dashboard. Here's an example:

**Example Landing Page Screenshot**:
![Landing Page](example-images/tracing/landing-page.jpg)

### Key Features of the Landing Page:
- **Search Panel**:
  - **Service**: Select the service you want to analyze (e.g., `linode-csi-driver`).
  - **Operation**: Choose a specific operation to filter traces, such as `createvolume` or `listvolumes`. By default, all operations are shown.
  - **Tags**: Filter traces by tags like `http.status_code=200` or other metadata.
  - **Lookback**: Select a time range for trace results (e.g., "Last Hour").
  - **Max/Min Duration**: Specify duration filters for traces to focus on slow or fast requests.
  - **Limit Results**: Set the maximum number of traces to display.

- **Results Table**:
  - Lists all traces matching the search criteria.
  - Displays the following information:
    - **Service and Operation**: The service (e.g., `linode-csi-driver`) and the operation (e.g., `createvolume` or `listvolumes`).
    - **Duration**: Total time taken by the trace.
    - **Spans**: Number of sub-operations (spans) in the trace.
    - **Timestamp**: The time the trace started.
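
The search panel fields can also be encoded into a link, which is handy for sharing a pre-filtered view. The sketch below is an illustration under assumptions: the query parameter names (`service`, `operation`, `lookback`, `minDuration`, `maxDuration`, `limit`) mirror what the Jaeger UI typically puts in its address bar and may differ between Jaeger versions, and `searchFilter` and `searchURL` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// searchFilter holds the search panel fields described above.
// The struct and its field names are hypothetical.
type searchFilter struct {
	Service     string
	Operation   string
	Lookback    string // e.g. "1h"
	MinDuration string // e.g. "1s"
	MaxDuration string
	Limit       int
}

// searchURL builds a Jaeger UI search link from a filter.
// Empty fields are left out of the query string.
func searchURL(baseURL string, f searchFilter) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	u.Path = "/search"

	q := url.Values{}
	q.Set("service", f.Service)
	if f.Operation != "" {
		q.Set("operation", f.Operation)
	}
	if f.Lookback != "" {
		q.Set("lookback", f.Lookback)
	}
	if f.MinDuration != "" {
		q.Set("minDuration", f.MinDuration)
	}
	if f.MaxDuration != "" {
		q.Set("maxDuration", f.MaxDuration)
	}
	if f.Limit > 0 {
		q.Set("limit", strconv.Itoa(f.Limit))
	}
	u.RawQuery = q.Encode() // Encode sorts keys and escapes values

	return u.String(), nil
}

func main() {
	// Look for slow createvolume calls from the last hour.
	link, err := searchURL("http://localhost:16686", searchFilter{
		Service:     "linode-csi-driver",
		Operation:   "csi.v1.controller/createvolume",
		Lookback:    "1h",
		MinDuration: "1s",
		Limit:       20,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(link)
}
```

Replace `http://localhost:16686` with `http://<external-ip>:16686` when going through the LoadBalancer.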

### Example Analysis:
From the landing page example:
- Two traces are displayed:
  1. **Trace ID: 042abeb**:
     - **Operation**: `csi.v1.controller/createvolume`.
     - **Duration**: `3.37s`.
     - **Spans**: `9`.
  2. **Trace ID: a039cb1**:
     - **Operation**: `csi.v1.controller/listvolumes`.
     - **Duration**: `77.35ms`.
     - **Spans**: `1`.

To analyze a trace in detail, click on its row (e.g., `042abeb` for `createvolume`).

---

## 3. Viewing a Trace for `createvolume`

Clicking on a trace opens a detailed view of all operations (spans) involved in the request. Here's an example trace for `createvolume`:

**Example `createvolume` Trace Screenshot**:
![Create Volume Trace](example-images/tracing/create-volume.jpg)
![Create Volume Trace Continued](example-images/tracing/create-volume-continued.jpg)

### Trace View Key Features:
1. **Trace Timeline**:
   - Visualizes the entire flow of the request as a timeline.
   - Horizontal bars represent spans, showing the relative time and duration of each operation.
   - The black line represents the critical path of the selected operation.
   - Total trace duration is displayed at the top (e.g., `3.37s`).

2. **Service & Operation Breakdown**:
   - Displays a hierarchical list of operations executed during the trace.
   - **Parent Span**: Represents the top-level operation (e.g., `csi.v1.controller/createvolume`).
   - **Child Spans**: Nested operations under the parent span.

### Example Breakdown:
For the `createvolume` trace:
- **Parent Span**:
  - **Operation**: `csi.v1.controller/createvolume`.
  - **Duration**: `3.37s`.
  - Includes the following sub-operations:
    1. **`validatecreatevolumerequest`**:
       - **Duration**: `2µs`.
       - **Purpose**: Validates the incoming request for required parameters.
    2. **`preparevolumeparams`**:
       - **Duration**: `2µs`.
       - **Purpose**: Prepares necessary parameters for volume creation.
    3. **`getcontentsourcevolume`**:
       - **Duration**: `1µs`.
       - **Purpose**: Retrieves existing content sources (if applicable).
    4. **`createandwaitforvolume`**:
       - **Duration**: `3.37s`.
       - **Purpose**: Creates the volume in Linode and waits for the operation to complete.
       - Sub-operations include:
         - **`attemptcreatelinodevolume`**:
           - **Duration**: `232.64ms`.
           - **Purpose**: Checks for existing volumes with the same label and either returns the existing volume or creates a new one, optionally cloning from a source volume.
         - **`createLinodeVolume`**:
           - **Duration**: `155.47ms`.
           - **Purpose**: Creates a new Linode volume with the specified label, size, and tags. It returns the created volume or an error if the creation fails.
    5. **`createvolumecontext`**:
       - **Duration**: `4µs`.
       - **Purpose**: Prepares the context for the created volume and adds necessary attributes.
    6. **`preparecreatevolumeresponse`**:
       - **Duration**: `4µs`.
       - **Purpose**: Prepares the response to return to the caller.

---

# Updating spans to provide additional information

If you want to track additional information in a span, you can utilize the functions `TraceFunctionData` and `SerializeObject` in `pkg/observability/tracker.go` to your advantage.

## 1. `TraceFunctionData`: Tracing Function Calls

The `TraceFunctionData` function simplifies the process of tracing the behavior of your functions. It captures key information about function execution, including parameters, success or error status, and error details (if any).

### **Function Signature**

```go
func TraceFunctionData(span tracer.Span, operationName string, params map[string]string, err error) error
```

### **Key Features**
- **Span Attributes**:
  - Adds key-value pairs from the `params` map as attributes to the span for better trace details.
- **Success or Error Handling**:
  - Sets the span status to `codes.Ok` for successful execution or `codes.Error` for failures.
  - Logs the result (`success` or `error`) along with the `operationName` and `params`.
- **Error Recording**:
  - Captures error details in the span using `span.RecordError`.

### **Example Usage**

You can use `TraceFunctionData` in any function to add tracing with custom parameters:

```go
observability.TraceFunctionData(span, "ValidateCreateVolumeRequest", map[string]string{
	"volume_name": req.GetName(),
	"requestBody": observability.SerializeObject(req),
}, err)
```

Here:
- `span`: The current tracing span.
- `"ValidateCreateVolumeRequest"`: The name of the operation being traced.
- `map[string]string`: A map of custom parameters to include in the trace. Add any details you want to capture, like volume names, request IDs, or serialized objects returned by API calls.
- `err`: The error object (if any) from the function being traced.

---

## 2. `SerializeObject`: Serializing Objects for Tracing

The `SerializeObject` function converts complex objects into JSON strings, making it easier to include them in trace parameters or logs.

### **Function Signature**

```go
func SerializeObject(obj interface{}) string
```

### **Key Features**
- Converts any object (`struct`, `map`, etc.) into a JSON string.
- Handles serialization errors gracefully and logs the issue.
- Useful for including large or complex objects in the trace parameters.
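
A function with this signature can be written in a few lines with `encoding/json`. The version below is a minimal sketch rather than the driver's implementation: `serializeObject` and the `capacityRange` type are hypothetical, and returning an empty string when marshaling fails is an assumption about how the failure is handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// serializeObject is a hypothetical stand-in for observability.SerializeObject.
// It marshals obj to JSON; on failure it logs the problem and returns an
// empty string (the fallback value is an assumption).
func serializeObject(obj interface{}) string {
	data, err := json.Marshal(obj)
	if err != nil {
		log.Printf("serializeObject: %v", err)
		return ""
	}
	return string(data)
}

// capacityRange is an illustrative type, loosely shaped like a CSI capacity range.
type capacityRange struct {
	RequiredBytes int64 `json:"required_bytes"`
	LimitBytes    int64 `json:"limit_bytes"`
}

func main() {
	// A struct serializes into a JSON object string.
	fmt.Println(serializeObject(capacityRange{RequiredBytes: 10 << 30}))

	// Channels cannot be marshaled, so this takes the error path.
	fmt.Println(serializeObject(make(chan int)) == "")
}
```

Because the result is a plain string, it can be placed directly into the `params` map passed to `TraceFunctionData`.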

### **Example Usage**

You can serialize objects like a request body and append them to the `params` map:

```go
observability.TraceFunctionData(span, "CreateVolume", map[string]string{
	"requestBody": observability.SerializeObject(req),
	"volume_type": "block-storage",
}, nil)
```
Here:
- The request object `req` is serialized into a JSON string using `SerializeObject`.
- The serialized string is added to the `params` map as `"requestBody"`.

---

## 3. Adding tracing to a function

To integrate `TraceFunctionData` and `SerializeObject` into your function:

1. Create a Span:
   - Use the `StartFunctionSpan` function from `tracker.go` to create a span at the beginning of your function.
2. Capture Parameters:
   - Use a `map[string]string` to include parameters you want to capture.
   - Serialize objects using `SerializeObject` if needed.
3. Call `TraceFunctionData`:
   - Pass the span, operation name, parameters, and any error to `TraceFunctionData` wherever necessary.

### **Example**

```go
func CreateVolumeRequest(ctx context.Context, req *csi.CreateVolumeRequest) error {
	// Step 1: Create a Span
	_, span := observability.StartFunctionSpan(ctx)
	defer span.End() // Ensure the span ends when the function exits

	// Step 2: Capture Parameters
	// Initialize a map to hold custom trace parameters
	params := map[string]string{
		"volume_name":    req.GetName(),
		"capacity_range": observability.SerializeObject(req.GetCapacityRange()),
		"parameters":     observability.SerializeObject(req.GetParameters()),
	}

	// Simulate parameter validation
	if req.GetName() == "" {
		err := fmt.Errorf("volume name is missing")

		// Step 3: Call TraceFunctionData with error
		observability.TraceFunctionData(span, "ValidateCreateVolumeRequest", params, err)
		return err
	}

	// On success
	// Step 3: Call TraceFunctionData with no error
	observability.TraceFunctionData(span, "CreateVolumeRequest", params, nil)
	return nil
}
```

---

## Benefits of Using This Approach

- **Detailed Traces**:
  - Include all relevant details about function execution, making it easier to debug issues.
- **Error Visibility**:
  - Automatically records errors and logs them with context.
- **Flexibility**:
  - Add or modify parameters dynamically based on your function's needs.
- **Serialization**:
  - Handles complex objects seamlessly without additional manual string conversion.
---