Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[azservicebus] Enable distributed tracing #23860

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

karenychen
Copy link

@karenychen karenychen commented Dec 11, 2024

  • The purpose of this PR is explained in this or a referenced issue.
  • The PR does not update generated files.
  • Tests are included and/or updated for code changes.
  • Updates to module CHANGELOG.md are included.
  • MIT license headers are included in each file.

@github-actions github-actions bot added Community Contribution Community members are working on the issue customer-reported Issues that are reported by GitHub users external to the Azure organization. Service Bus labels Dec 11, 2024
Copy link

Thank you for your contribution @karenychen! We will review the pull request and get back to you soon.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently a couple of other packages have something similar to validate the spans in unit tests. We should move this to azcore/tracing so more libraries can use this fake trace provider in their tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably go in the sdk/internal module so it can be internally shared but not publicly surfaced.

@karenychen
Copy link
Author

Hi @lmolkova ! I had a small question regarding diagnostic-id in https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-end-to-end-tracing?tabs=net-standard-sdk-2

The .NET SDK seems to be hooking them up to a ReceiveMessages trace when the users set the diagnostic-id and linking the list of diagnostic ids from the messages to the Receive trace (code here). I might be misunderstanding how .NET is doing it, but I am wondering what we enable with the diagnostic-ids?

Copy link
Member

@richardpark-msft richardpark-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far, got some questions for the other experts in the crowd, but from an SB perspective it looks great.

@@ -45,14 +46,45 @@ func TestSender_UserFacingError(t *testing.T) {

var asSBError *Error

err = sender.SendMessage(context.Background(), &Message{}, nil)
msgID := "testID"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still want the original test ( err = sender.SendMessage(context.Background(), &Message{}, nil) here, but otherwise loving the addition of tracing to the mix.

sdk/messaging/azservicebus/client_test.go Show resolved Hide resolved
return creds.fullyQualifiedNamespace
}

parts := strings.Split(creds.connectionString, "/")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an actual parser for the connection string, we should definitely use that. Look in internal/conn.

}

func getSpanAttributesForMessage(message *Message) []tracing.Attribute {
attrs := []tracing.Attribute{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
attrs := []tracing.Attribute{}
var attrs []tracing.Attribute

I swear there's some linter that complains if you pre-init the slice and it's not technically needed anyways.

)
defer func() { endSpan(err) }()

err = s.links.Retry(ctx, EventSender, "SendMessageBatch", func(ctx context.Context, lwid *internal.LinksWithID, args *utils.RetryFnArgs) error {
return lwid.Sender.Send(ctx, batch.toAMQPMessage(), nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmolkova, in cases like this where I have retries, should the span reporting be in the retry loop, or outside, like we have here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way it works in our HTTP SDKs is that the method span is above the retry layer, and its child HTTP span is below the retry layer. So you'd have spans like this.

Some method call span
  HTTP span retry 1
  HTTP span retry 2

I presume we'd want to do the same thing here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhendrixMSFT I did a bit of digging -- the semantic conventions for HTTP has examples for how the spans should look like for retries: https://opentelemetry.io/docs/specs/semconv/http/http-spans/#http-client-authorization-retry-examples. However, for messaging systems there is not much information besides this chunk:

image

I am not sure if this implies the messaging spans are 1 per operation?

ctx, endSpan := s.startSpan(ctx, "ScheduleAMQPAnnotatedMessages", tracing.ScheduleOperationName,
tracing.Attribute{Key: tracing.BatchMessageCount, Value: int64(len(messages))},
)
defer func() { endSpan(err) }()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhendrixMSFT, would it be worth building this pattern (via a callback, probably) into the tracing library? It can be internal, but it seems like everyone's going to do the "last error gets passed to endSpan before block ends" pattern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, @karenychen, we can build a helper function - maybe we'd stick it right in the retry function to make things easier since we're passing very similar information to both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it goes in the sdk/internal module?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced with Richard offline -- we are moving this to the Retry() layer :)

Comment on lines 242 to 243
spanName = fmt.Sprintf("Sender.%s", spanName)
attributes := []tracing.Attribute{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should define these all the operation constants as 'Sender.ScheduledMessages' (ie, preformatted) and avoid the formatting operation/concat that we have to do here. That's easily for a future PR, I'm sure there's other spots I do the same thing.

Copy link
Author

@karenychen karenychen Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, did we want to define the span names when we are calling the span, or did we want to have a set of pre-defined span names (similar to the operation type below) where we can directly grab and use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"When calling" is fine, but having predefined 'const''s could be nice from a documentation perspective (ie: these are all the spans that exist).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(leaving it up to you, I'm good with either approach)

@@ -186,10 +187,11 @@ func deleteSubscription(t *testing.T, ac *admin.Client, topicName string, subscr
// and fails tests otherwise.
func peekSingleMessageForTest(t *testing.T, receiver *Receiver) *ReceivedMessage {
var msg *ReceivedMessage
// TODO
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question: is it possible for me to test the traces in outside of the local unit tests too? Are there instructions on how I can run the live tests (and potentially the stress tests)?

ctx, endSpan := s.startSpan(ctx, "ScheduleAMQPAnnotatedMessages", tracing.ScheduleOperationName,
tracing.Attribute{Key: tracing.BatchMessageCount, Value: int64(len(messages))},
)
defer func() { endSpan(err) }()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced with Richard offline -- we are moving this to the Retry() layer :)

)
defer func() { endSpan(err) }()

err = s.links.Retry(ctx, EventSender, "SendMessageBatch", func(ctx context.Context, lwid *internal.LinksWithID, args *utils.RetryFnArgs) error {
return lwid.Sender.Send(ctx, batch.toAMQPMessage(), nil)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jhendrixMSFT I did a bit of digging -- the semantic conventions for HTTP has examples for how the spans should look like for retries: https://opentelemetry.io/docs/specs/semconv/http/http-spans/#http-client-authorization-retry-examples. However, for messaging systems there is not much information besides this chunk:

image

I am not sure if this implies the messaging spans are 1 per operation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community Contribution Community members are working on the issue customer-reported Issues that are reported by GitHub users external to the Azure organization. Service Bus
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants