Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrumentation.SqlClient Span has Duration of 0 sec #1736

Open
Mario-Hofstaetter opened this issue Nov 23, 2020 · 9 comments
Open

Instrumentation.SqlClient Span has Duration of 0 sec #1736

Mario-Hofstaetter opened this issue Nov 23, 2020 · 9 comments
Labels
bug Something isn't working comp:instrumentation.sqlclient Things related to OpenTelemetry.Instrumentation.SqlClient

Comments

@Mario-Hofstaetter
Copy link

Mario-Hofstaetter commented Nov 23, 2020

Bug Report

      <PackageReference Include="Microsoft.Extensions.Configuration.Abstractions" Version="3.1.9" />
      <PackageReference Include="Microsoft.Extensions.DependencyInjection.Abstractions" Version="3.1.2" />
      <PackageReference Include="OpenTelemetry.Exporter.Jaeger" Version="1.0.0-rc1.1" />
      <PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.0.0-rc1.1" />
      <PackageReference Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.0.0-rc1.1" />
      <PackageReference Include="OpenTelemetry.Instrumentation.Http" Version="1.0.0-rc1.1" />
      <PackageReference Include="OpenTelemetry.Instrumentation.SqlClient" Version="1.0.0-rc1.1" />

Runtime versionnetcoreapp3.1

Symptom

I tested SqlClient instrumentation with a dummy app and noticed
the Duration of the SqlClient Span is always 0ms or 1ms. (See Screenshot)

My test SQL Statement has a sleep (WAITFOR DELAY) in it, so it takes at least a couple seconds even on localhost.
It looks like the span is created after all sql work is done.

What is the expected behavior?

I expected to observe a span duration that is reflecting the actual execution time of the SQL Statement in the database.
Since the README does not contain any details / example,

I was expecting the Span to start on calling Connection.Open() or command.ExecuteReader() (?)

What is the actual behavior?

The autoinstrumented activity (span) has a duration of 0ms or 1ms on localhost. After closely looking at the reported Start Times one can see the span is started after opening the Connection and before command.ExecuteReader()):

Starttime Event
139.31ms Post Open Connection
139.34ms SqlClient Span Demo, service localhost, Duration 1ms
140.47ms Post ExecuteReader

It looks like the span is instantly stopped?

Reproduce

// Startup.cs

services.AddTransient<IDbConnection>(sqlConnection => new SqlConnection(
                @"Server=localhost\SQL2017DEV;Database=Demo;Trusted_Connection=True;Application Name=""Weather BackendService"""));
                
// Controller

using (var command = _sqlConnection.CreateCommand())
{
    var rowCount = new Random().Next(1500, 70000);
    var tableName = @"DummyData";
    command.CommandText = $@"SELECT TOP({rowCount}) * FROM {tableName}; WAITFOR DELAY '00:00:{rowCount/10000:00}'";
    command.CommandTimeout = 10; // seconds
    
    using var dbActivity = Startup.Source.StartActivity($"SQL DB {_sqlConnection.Database} - Get Table {tableName}");

    try
    {
        Activity.Current?.AddEvent(new ActivityEvent("Pre Open Connection"));
        
        command.Connection.Open();
        
        Activity.Current?.AddEvent(new ActivityEvent("Post Open Connection"));
        using (var sqlReader = command.ExecuteReader())
        {
            Activity.Current?.AddEvent(new ActivityEvent("Post ExecuteReader"));
            while (sqlReader.Read())
            {
                try
                {
                    // do something
                }
                catch (Exception ex)
                {
                    _logger.LogError(ex, "Parsing Sql Table Row Failed");
                }
            }
            Activity.Current?.AddEvent(new ActivityEvent("Post While Read()"));
        }
    }
    finally
    {
        command.Connection.Close();
        Activity.Current?.AddEvent(new ActivityEvent("Post Close Connection"));
        dbActivity?.Stop();
        Activity.Current?.AddEvent(new ActivityEvent("Post Stop Activity"));
    }
}
Activity.Current?.AddEvent(new ActivityEvent("Post using CreateCommand"));

Thread.Sleep(30);
// ...

Result

Sql-Span-Duration
Sql-Span-Logs
Log-Post-Activity

@Mario-Hofstaetter Mario-Hofstaetter added the bug Something isn't working label Nov 23, 2020
@mbakalov
Copy link

Thanks @Mario-Hofstaetter! This looks suspicious, I'll take a look!

The Activity created by the SqlClient instrumentation should start on ExecuteReader() and stop when ExecuteReader() returns.

I just tried a very small example, and I do get span duration in Jaeger as expected:

using (var con =
    new Microsoft.Data.SqlClient.SqlConnection("Data Source=localhost;Initial Catalog=master;Integrated Security=true;"))
{
    con.Open();

    using (var cmd = con.CreateCommand())
    {
        cmd.CommandText = "select database_id from sys.databases; waitfor delay '00:00:05'";

        var sw = new Stopwatch();
        sw.Start();
        using (var reader = cmd.ExecuteReader())
        {
            sw.Stop();

            Console.WriteLine("ExecuteReader took: {0}", sw.ElapsedMilliseconds);

            sw.Restart();
            while (reader.Read())
            {
            }

            sw.Stop();
            Console.WriteLine("reader.Read() took: {0}", sw.ElapsedMilliseconds);
        }
    }
}

Console output:
image

Jaeger:
image

I'll see if I can reproduce with a bigger example more similar to yours.

@Mario-Hofstaetter
Copy link
Author

Mario-Hofstaetter commented Nov 25, 2020

Edit: @mbakalov I've nailed it down.
It has something to do with rowcount and/or datalength received. I guess there is some streaming / paging going on behind the curtains.

Use this query, which I used to populate my demotable in the first place:

SELECT TOP(1000) o1.object_id, o1.name FROM sys.objects o1 CROSS JOIN sys.objects o2 CROSS JOIN sys.objects o3; WAITFOR DELAY '00:00:04'

Your select database_id from sys.databases wasn't enough rows.

I went from 5000 down until 300 Rows in TOP until the Duration was correct.

grafik

grafik

Using this particular query and / or my database setup, the tipping point is exactly 306 Rows. With 305 Rows the Duration is correct.

@mbakalov
Copy link

mbakalov commented Jan 5, 2021

@Mario-Hofstaetter - sorry, missed the edit!

You are right, this is a result of datareader's buffering behavior. It seems that cmd.ExecuteReader() returns as soon as either:

  • Client-side network buffer fills for the first time, or
  • There is no more data for SQL Server to send.

Using code like below I did some measurements on different queries using the StackOverflow DB:

using var con =
    new SqlConnection("Data Source=tcp:localhost;Initial Catalog=StackOverflow2013;Integrated Security=true;");
con.Open();

using var cmd = con.CreateCommand();
cmd.CommandText = "select top 500000 * from Users; waitfor delay '00:00:05'";

var sw = new Stopwatch();
sw.Start();
using var reader = cmd.ExecuteReader();
sw.Stop();

Console.WriteLine("ExecuteReader took: {0}", sw.ElapsedMilliseconds);

sw.Restart();
while (reader.Read())
{
}

sw.Stop();
Console.WriteLine("reader.Read() took: {0}", sw.ElapsedMilliseconds);
Query ExecuteReader (ms) while (reader.Read()) (ms)
select * from Users 49 3158
select * from Users; waitfor delay '00:00:05' 77 9249
waitfor delay '00:00:05'; select * from Users 5060 2457
select top 1 * from Users 45 6
select top 1 * from Users; waitfor delay '00:00:05' 5049 4
select top 500000 * from Users 60 955
select top 500000 * from Users; waitfor delay '00:00:05' 51 5867

So this seems consistent with the buffering behavior mentioned here on StackOverflow and in MS docs: for a simple query like select * from Users, the SQL Server can start returning results right away. As soon as the client buffer fills up, cmd.ExecuteReader() returns, before the SQL Server gets to executing the waitfor part of the batch.

For some additional info I did some network captures to observe this more closely. Here is a capture for select top 500000 * from Users; waitfor delay '00:00:05':
image

We send the query batch to the server in packet 26 and get roughly 8k-sized data packet back (no. 28) almost right away.

And in another example for select top 1 * from Users; waitfor delay '00:00:05':
image

Packet no. 28 is now 5 seconds delayed because the data size is only 1.7k and the SQL server executed the waitfor statement in the batch, since the buffer wasn't full.

@Mario-Hofstaetter
Copy link
Author

Mario-Hofstaetter commented Jan 11, 2021

@mbakalov Thanks for looking into this.

Since I guess this is not something that can be fixed , what do you recommend?

It is very unintuitive for the user that some spans of ExecuteReader will have wrong duration, making it quite useless.
Users would have to always create a custom span until while (reader.Read()) is finished.

IDataReader implements IDisposable, so I guess the span could be stopped when the DataReader is closed / disposed?
This would mean however that the span is not only SQL time but also business logic that is certainly within that block.

PS: Happy new year

@mbakalov
Copy link

Thanks @Mario-Hofstaetter, Happy New Year to you as well :)

I agree, this behavior can sometimes be a bit confusing. Ultimately, I think you want to instrument both:

  • Call to sql server and how long it took to "execute" there,
  • The entire process of "draining" the data reader.

Even regardless of reader's "streaming" semantics, I still found the current instrumentation behavior to be super useful. A lot of (most?) queries I come across typically take a long time to even return the first row. Maybe due to a bad execution plan, or lack of indexes, or lots of I/O and CPU required to JOIN or sort through some intermediate results. The stuff DBAs usually look at. Current instrumentation works very well for that.

There are of course a lot if interesting things happening within while (reader.Read()) too, but like you said, there is usually a lot of project-specific business logic there. It would be hard for a "generic" instrumentation to be useful for everybody, I think. But it is definitely interesting to explore! On a few projects I had some success creating my own decorators - wrappers around IDataReader and DbCommand that create/stop my own Activities. Time permitting, I'll see if I can add something like that to our contrib repo if anyone else finds it useful.

BTW, one exception to the "query taking time to return the first row" I typically see is with queries that return multiple datasets (that you go over with reader.MoveNext()). Sometimes the first dataset is something simple (but large enough), but a subsequent one has a much more complicated query behind it. I suspect in this case our ExecuteReader timing would indeed be misleading and we'd need to make sure the user code is instrumented as well. So this is definitely something to keep in mind!

@cijothomas
Copy link
Member

@martinjt
Copy link
Member

This looks like it's not something we're planning on, or can be, fixed, so I'm closing this as Not Planned.

@Mario-Hofstaetter
Copy link
Author

@martinjt It's not relevant for me anymore :)
but the issue has not been closed?

@reyang reyang transferred this issue from open-telemetry/opentelemetry-dotnet May 13, 2024
@reyang reyang added the comp:instrumentation.sqlclient Things related to OpenTelemetry.Instrumentation.SqlClient label May 13, 2024
@martinjt martinjt closed this as not planned Won't fix, can't repro, duplicate, stale Jun 2, 2024
@cijothomas
Copy link
Member

Since the issue is not fixed, nor documented, reopening this for consideration before stable release of this component.

@cijothomas cijothomas reopened this Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working comp:instrumentation.sqlclient Things related to OpenTelemetry.Instrumentation.SqlClient
Projects
None yet
Development

No branches or pull requests

5 participants