Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add run job traces #292

Merged
merged 29 commits into from
Sep 3, 2024
Merged

feat: Add run job traces #292

merged 29 commits into from
Sep 3, 2024

Conversation

bgins
Copy link
Contributor

@bgins bgins commented Aug 13, 2024

Review Type Requested (choose one):

  • Logic - thorough check (from everybody doing review)

Summary

This pull request implements the following changes:

  • Add OpenTelemetry apparatus
    • Add tracer provider
      • Add HTTP exporter
      • Add resource attributes, including service name, version, OS, architecture, GPU info, network (e.g. "testnet"), and address
    • Add --disable-telemetry CLI flag
    • Add telemetry_url config
      • Set endpoints for local development and production
    • Add telemetry_token config
    • Add GetOTelServiceName helper function and table test
  • Add Resource Provider run job trace
    • Add span attributes with information about associated deal
    • Add span events for run job steps
    • Log deal ID
    • Debug log trace ID on starting the trace
  • Disable telemetry in integration tests
  • Move Version to system package to make it accessible as a resource attribute
  • Redact private key from logs
  • Debug log public address at initialization (all services)
  • Debug log Job Offer ID in job creator (to correlate job creator run with resource provider run job trace)
  • Upgrade to Go 1.22
  • Add debug logs to test GitHub action workflow

Task/Issue reference

Closes: #293

Details

This pull request adds our initial tracing implementation with a single trace over resource provider job runs.

We collect the following information about resource providers:

  • Operating system
  • Architecture
  • GPU (if any)
  • Wallet address
  • Network (dev, devnet, or testnet)

This information will also be collected for any service where we add tracing.

We collect the following information about jobs:

  • Deal ID
  • Job creator address
  • Resource provider address
  • Job offer CID
  • Resource offer CID
  • Module repo
  • Module hash (the git hash, also maybe a tag)
  • Result ID
  • Transaction hash from adding the result on chain

The public address, job offer ID, deal ID, trace ID logs are identifiers for looking up traces.

How to test this code?

Run our forthcoming observability implementation (https://github.com/Lilypad-Tech/observability/tree/dev-ronelcanaria/feat-add-alloy):

./stack observe-up

Navigate to the Grafana Dashboard at http://localhost:3000. Start the full Lilypad stack, then run a job.

The resource provider should send the trace to the observability stack where it can be viewed in Home > Explore, select "Tempo" at the top, and search. It's possible to filter on run_job for the span name, but shouldn't be needed because we only have one trace.

@bgins bgins changed the title Bgins/feat add run job traces feat: Add run job traces Aug 13, 2024
@bgins bgins force-pushed the bgins/feat-add-run-job-traces branch 5 times, most recently from 9a90017 to 1edaf9b Compare August 14, 2024 17:10
@bgins bgins marked this pull request as ready for review August 14, 2024 19:10
go.mod Show resolved Hide resolved
pkg/jobcreator/run.go Show resolved Hide resolved
pkg/options/configs/dev.toml Show resolved Hide resolved
pkg/options/otel.go Show resolved Hide resolved
pkg/system/gpu.go Show resolved Hide resolved
pkg/system/otel.go Outdated Show resolved Hide resolved
pkg/system/otel.go Show resolved Hide resolved
pkg/web3/sdk.go Show resolved Hide resolved
pkg/web3/sdk.go Show resolved Hide resolved
@bgins bgins force-pushed the bgins/feat-add-run-job-traces branch 18 times, most recently from 17c73f2 to bd89a0d Compare August 15, 2024 22:46
@bgins bgins requested a review from alvin-reyes August 23, 2024 18:50
Copy link
Collaborator

@walkah walkah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and runs locally! :shipit: 🚀

Comment on lines +48 to +46

- name: Display resource provider logs
run: docker logs resource-provider

- name: Display solver logs
run: docker logs solver

- name: Display chain logs
run: docker logs chain
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose / value of having all these logs in the integration tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to help debug when the integration tests fail. While working on this pull request, there was an issue where there were failures in the runner environment only. Had to add logs to get to the bottom of it.

I think there's no harm in printing the logs? Makes it easier to see what may have failed.

"github.com/rs/zerolog/log"
)

func GetGPUInfo() []string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be great to have for cross-referencing 🙏

@bgins bgins force-pushed the bgins/feat-add-run-job-traces branch from ce7096d to 215e802 Compare August 29, 2024 18:52
Copy link

cla-bot bot commented Sep 3, 2024

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have the users @narbs91 on file. In order for us to review and merge your code, please open a new pull request and sign the Contributor License Agreement following the instructions in our contributing guide. Once the pull request is merged, ping a maintainer who will summon me again.

@cla-bot cla-bot bot added the cla-signed label Sep 3, 2024
@narbs91 narbs91 merged commit bff7bcf into main Sep 3, 2024
3 checks passed
@narbs91 narbs91 deleted the bgins/feat-add-run-job-traces branch September 3, 2024 15:32
@bgins bgins self-assigned this Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add resource provider run job trace
3 participants