feat(apps): export logs to open telemetry endpoint #1617

Open: 6 commits to merge into base: main
Changes from 3 commits
66 changes: 62 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -126,11 +126,12 @@ These health checks are integrated with Azure Container Apps' health probe syste

## Observability with OpenTelemetry

This project uses OpenTelemetry for distributed tracing and metrics collection. The setup includes:
This project uses OpenTelemetry for distributed tracing, metrics collection, and logging. The setup includes:

### Core Features
- Distributed tracing across services
- Runtime and application metrics
- Log aggregation and correlation
- Integration with Azure Monitor/Application Insights
- Support for both OTLP and Azure Monitor exporters
- Automatic instrumentation for:
@@ -157,15 +158,72 @@ OpenTelemetry is configured through environment variables that are automatically
### Local Development

For local development, the project includes a docker-compose setup with:
- OpenTelemetry Collector
- Grafana
- Other supporting services
- OpenTelemetry Collector (ports 4317/4318 for OTLP receivers)
- Grafana (port 3000)
- Jaeger (port 16686)
- Loki (port 3100)
- Prometheus (port 9090)

To run the local observability stack:
```bash
podman compose -f docker-compose-otel.yml up
```
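
Before opening the tools it can help to wait until the containers actually answer. The following is a minimal sketch, assuming `curl` is installed and the ports listed above are unchanged. The `wait_for` helper and the `PROBE` override are hypothetical names introduced here for illustration; they are not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): poll a URL until it
# answers successfully, or give up after a number of attempts.
# PROBE can be overridden, e.g. to exercise the function without network access.
PROBE="${PROBE:-curl -fsS -o /dev/null}"

wait_for() {
  local url="$1"
  local retries="${2:-10}"
  local delay="${3:-3}"
  local attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # $PROBE is intentionally unquoted so its flags are split into arguments.
    if $PROBE "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "not ready after $retries attempts: $url" >&2
  return 1
}

# Usage (with the stack running):
#   wait_for http://localhost:3100/ready   # Loki readiness endpoint
#   wait_for http://localhost:16686        # Jaeger UI
```

The retry count and delay default to 10 and 3 seconds, mirroring the health check settings in `docker-compose-otel.yml`.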

### Accessing Observability Tools

Once the local stack is running, you can access the following tools:

#### Distributed Tracing with Jaeger
- URL: http://localhost:16686
- Features:
  - View distributed traces across services
  - Search by service, operation, or trace ID
  - Analyze timing and dependencies
  - Debug request flows and errors

#### Metrics with Prometheus
- URL: http://localhost:9090
- Features:
  - Query raw metrics data
  - View metric targets and service discovery
  - Debug metric collection

#### Log Aggregation with Loki
- Direct URL: http://localhost:3100 (HTTP API endpoint)
- Grafana Integration: http://localhost:3000 (preferred interface)
- Features:
  - Search and filter logs across all services
  - Correlate logs with traces using trace IDs
  - Create log-based alerts and dashboards
  - Use LogQL to query logs:
    ```logql
    # Example: Find all error logs
    {container="web-api"} |= "error"

    # Example: Find logs that contain a trace ID (any 32-character hex value)
    {container=~"web-api|graphql"} |~ "trace_id=([a-f0-9]{32})"
    ```
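
The same LogQL queries can also be sent straight to Loki's HTTP API, which is handy for scripting. This is a rough sketch, assuming `curl` is available and Loki listens on port 3100 as above. `trace_query` and `loki_query` are hypothetical helper names, and the `container` label is taken from the examples above; the labels that actually exist depend on how logs reach Loki.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository).
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# Build a LogQL query that matches one specific trace ID.
#   $1: regex of container names, $2: trace ID to look for
trace_query() {
  printf '{container=~"%s"} |= "%s"' "$1" "$2"
}

# Run a LogQL query through Loki's query_range endpoint.
# Without start/end parameters Loki searches the last hour.
#   $1: LogQL query, $2: maximum number of entries (default 20)
loki_query() {
  curl -G -sS "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "limit=${2:-20}"
}

# Usage (with the stack running; the trace ID is a made-up example):
#   loki_query "$(trace_query 'web-api|graphql' 4bf92f3577b34da6a3ce929d0e0e4736)"
```

Matching the literal ID with `|=` narrows the result to a single trace, whereas the regex form above matches every line that carries any trace ID.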

#### Metrics and Dashboards in Grafana
- URL: http://localhost:3000
- Features:
  - Pre-configured dashboards for:
    - Application metrics
    - Runtime metrics
    - HTTP request metrics
  - Data sources:
    - Prometheus (metrics)
    - Loki (logs)
    - Jaeger (traces)
  - Create custom dashboards
  - Set up alerts

#### OpenTelemetry Collector Endpoints
- OTLP gRPC receiver: localhost:4317
- OTLP HTTP receiver: localhost:4318
- Collector's own metrics (Prometheus format): localhost:8888
- Metrics exposed by the Collector's Prometheus exporter: localhost:8889
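
To send telemetry from a locally running process to these receivers, the standard OpenTelemetry SDK environment variables can be used. The sketch below assumes the application honors those standard variables; this project's own configuration keys may differ. The `otlp_endpoint` helper and the service name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): map an OTLP protocol
# name to the matching local Collector receiver.
otlp_endpoint() {
  case "$1" in
    grpc)          echo "http://localhost:4317" ;;
    http/protobuf) echo "http://localhost:4318" ;;
    *)             echo "unknown protocol: $1" >&2; return 1 ;;
  esac
}

# Standard OpenTelemetry SDK variables; gRPC is used unless already set.
export OTEL_EXPORTER_OTLP_PROTOCOL="${OTEL_EXPORTER_OTLP_PROTOCOL:-grpc}"
export OTEL_EXPORTER_OTLP_ENDPOINT="$(otlp_endpoint "$OTEL_EXPORTER_OTLP_PROTOCOL")"
export OTEL_SERVICE_NAME="my-local-service"  # hypothetical service name
```

Run the application from the same shell afterwards so that it inherits the variables.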

### Request Filtering

The telemetry setup includes smart filtering to:
23 changes: 23 additions & 0 deletions docker-compose-otel.yml
@@ -22,6 +22,12 @@ services:
      - "14250:14250" # Model used by collector
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:16686"]
      interval: 3s
      timeout: 3s
      retries: 10
      start_period: 10s

  # Prometheus for metrics
  prometheus:
@@ -31,6 +37,21 @@ services:
    ports:
      - "9090:9090"

  # Loki for log aggregation
  loki:
    image: grafana/loki:3.2.2
    ports:
      - "3100:3100"
    volumes:
      - ./local-otel-configuration/loki-config.yaml:/etc/loki/local-config.yaml
    command: -config.file=/etc/loki/local-config.yaml
    healthcheck:
      test: ["CMD-SHELL", "wget -q --tries=1 -O- http://localhost:3100/ready"]
      interval: 3s
      timeout: 3s
      retries: 10
      start_period: 10s

  # Grafana for metrics visualization
  grafana:
    image: grafana/grafana:11.4.0
@@ -43,3 +64,5 @@ services:
      - ./local-otel-configuration/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
      - ./local-otel-configuration/grafana-dashboards.yml:/etc/grafana/provisioning/dashboards/dashboards.yml
      - ./local-otel-configuration/dashboards:/etc/grafana/provisioning/dashboards
    depends_on:
      - loki
9 changes: 8 additions & 1 deletion local-otel-configuration/grafana-datasources.yml
@@ -5,4 +5,11 @@ datasources:
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    isDefault: true

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      maxLines: 1000
45 changes: 45 additions & 0 deletions local-otel-configuration/loki-config.yaml
@@ -0,0 +1,45 @@
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /tmp/loki

compactor:
  working_directory: /tmp/loki/compactor
  compaction_interval: 10m

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /tmp/loki/tsdb-index
    cache_location: /tmp/loki/tsdb-cache
    cache_ttl: 24h
  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  allow_structured_metadata: true
4 changes: 3 additions & 1 deletion local-otel-configuration/otel-collector-config.yaml
@@ -25,6 +25,8 @@ exporters:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200
  otlphttp:
    endpoint: "http://loki:3100/otlp"

extensions:
  health_check:
@@ -49,4 +51,4 @@ service:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
      exporters: [otlphttp, debug]
@@ -0,0 +1 @@

@@ -0,0 +1,38 @@
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

namespace Digdir.Domain.Dialogporten.WebApi.Common.Middleware;

public class RequestLoggingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogger<RequestLoggingMiddleware> _logger;

    public RequestLoggingMiddleware(RequestDelegate next, ILogger<RequestLoggingMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        try
        {
            await _next(context);
        }
        finally
        {
            _logger.LogInformation(
                "HTTP {RequestMethod} {RequestPath} responded {StatusCode}",
                context.Request.Method,
                context.Request.Path,
                context.Response.StatusCode);
        }
Comment on lines +25 to +30
🛠️ Refactor suggestion

Enhance log data for better observability

The current logging implementation captures basic request details, but for effective monitoring and debugging in a distributed system, consider including additional telemetry data:

  • Request duration
  • Request/Response sizes
  • User Agent
  • Client IP (with appropriate PII handling)
  • Trace context correlation

Here's a suggested implementation (it needs `using System.Diagnostics;` for `Stopwatch` and `Activity`):

 public async Task InvokeAsync(HttpContext context)
 {
+    var sw = Stopwatch.StartNew();
     try
     {
         await _next(context);
     }
     finally
     {
+        sw.Stop();
         _logger.LogInformation(
-            "HTTP {RequestMethod} {RequestPath} responded {StatusCode}",
+            "HTTP {RequestMethod} {RequestPath} responded {StatusCode} in {ElapsedMilliseconds}ms - Size: {RequestLength}/{ResponseLength} - {UserAgent} - {ClientIP} - {TraceId}",
             context.Request.Method,
             context.Request.Path,
-            context.Response.StatusCode);
+            context.Response.StatusCode,
+            sw.ElapsedMilliseconds,
+            context.Request.ContentLength,
+            context.Response.ContentLength,
+            context.Request.Headers.UserAgent,
+            context.Connection.RemoteIpAddress,
+            Activity.Current?.TraceId);
     }
 }

Committable suggestion skipped: line range outside the PR's diff.

    }
}

public static class RequestLoggingMiddlewareExtensions
{
    public static IApplicationBuilder UseRequestLogging(this IApplicationBuilder app)
        => app.UseMiddleware<RequestLoggingMiddleware>();
}
45 changes: 8 additions & 37 deletions src/Digdir.Domain.Dialogporten.WebApi/Program.cs
@@ -24,48 +24,17 @@
using NSwag;
using Serilog;
using Microsoft.Extensions.Options;
using Digdir.Domain.Dialogporten.WebApi.Common.Middleware;

// Using two-stage initialization to catch startup errors.
var telemetryConfiguration = TelemetryConfiguration.CreateDefault();
Log.Logger = new LoggerConfiguration()
.MinimumLevel.Warning()
.Enrich.WithEnvironmentName()
.Enrich.FromLogContext()
.WriteTo.Console(formatProvider: CultureInfo.InvariantCulture)
.WriteTo.ApplicationInsights(telemetryConfiguration, TelemetryConverter.Traces)
.CreateBootstrapLogger();
var builder = WebApplication.CreateBuilder(args);

try
{
BuildAndRun(args, telemetryConfiguration);
}
catch (Exception ex) when (ex is not OperationCanceledException)
{
Log.Fatal(ex, "Application terminated unexpectedly");
throw;
}
finally
{
Log.CloseAndFlush();
}

static void BuildAndRun(string[] args, TelemetryConfiguration telemetryConfiguration)
{
var builder = WebApplication.CreateBuilder(args);

builder.WebHost.ConfigureKestrel(kestrelOptions =>
{
kestrelOptions.Limits.MaxRequestBodySize = Constants.MaxRequestBodySize;
});

builder.Host.UseSerilog((context, services, configuration) => configuration
.MinimumLevel.Warning()
.ReadFrom.Configuration(context.Configuration)
.ReadFrom.Services(services)
.Enrich.WithEnvironmentName()
.Enrich.FromLogContext()
.WriteTo.ApplicationInsights(telemetryConfiguration, TelemetryConverter.Traces));

builder.Configuration
.AddAzureConfiguration(builder.Environment.EnvironmentName)
.AddLocalConfiguration(builder.Environment);
@@ -154,11 +123,8 @@ static void BuildAndRun(string[] args, TelemetryConfiguration telemetryConfigura

var app = builder.Build();

app.MapAspNetHealthChecks()
.MapControllers();

app.UseHttpsRedirection()
.UseSerilogRequestLogging()
.UseRequestLogging()
.UseDefaultExceptionHandler()
.UseJwtSchemeSelector()
.UseAuthentication()
@@ -222,6 +188,11 @@ static void BuildAndRun(string[] args, TelemetryConfiguration telemetryConfigura

app.Run();
}
catch (Exception ex) when (ex is not OperationCanceledException)
{
Console.WriteLine($"Application terminated unexpectedly: {ex}");
throw;
}

static void IgnoreEmptyCollections(JsonTypeInfo typeInfo)
{
10 changes: 10 additions & 0 deletions src/Digdir.Library.Utils.AspNet/AspNetUtilitiesExtensions.cs
@@ -14,6 +14,8 @@
using OpenTelemetry.Exporter;
using System.Diagnostics;
using Azure.Monitor.OpenTelemetry.Exporter;
using OpenTelemetry.Logs;
using Microsoft.Extensions.Logging;

namespace Digdir.Library.Utils.AspNet;

@@ -156,6 +158,14 @@ public static WebApplicationBuilder ConfigureTelemetry(
});
}
});

if (!builder.Environment.IsDevelopment())
{
// Clear existing logging providers. If development, we want to keep the console logging.
builder.Logging.ClearProviders();
}

telemetryBuilder.WithLogging();
}
else
{
@@ -11,6 +11,7 @@
<PackageReference Include="Microsoft.ApplicationInsights.AspNetCore" Version="2.22.0" />
<PackageReference Include="Microsoft.Extensions.Http" Version="9.0.0" />
<PackageReference Include="AspNetCore.HealthChecks.UI.Client" Version="8.0.1" />
<PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.10.0" />
<PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.10.0" />
<PackageReference Include="OpenTelemetry.Instrumentation.EntityFrameworkCore" Version="1.10.0-beta.1" />
<PackageReference Include="System.Text.Json" Version="9.0.0" />