-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tx-generator MVar deadlock reporting #5850
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
NadiaYvette
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
2 times, most recently
from
May 22, 2024 04:11
b5d6a4c
to
3ab03e2
Compare
NadiaYvette
changed the title
Nadia.chambers/txgen mvar 004
tx-generator MVar deadlock reporting
May 22, 2024
NadiaYvette
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
8 times, most recently
from
May 28, 2024 15:16
cfa09f4
to
e874653
Compare
mgmeier
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
from
May 28, 2024 15:31
e874653
to
8ab6c14
Compare
NadiaYvette
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
from
June 3, 2024 16:57
8ab6c14
to
271a966
Compare
mgmeier
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
from
June 4, 2024 14:58
271a966
to
a20a4cf
Compare
NadiaYvette
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
3 times, most recently
from
June 26, 2024 18:48
7545743
to
0c8a34f
Compare
NadiaYvette
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
3 times, most recently
from
June 27, 2024 22:22
df78323
to
7f111d0
Compare
mgmeier
reviewed
Jun 28, 2024
NadiaYvette
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
2 times, most recently
from
July 1, 2024 08:47
8f010dc
to
3962aa7
Compare
NadiaYvette
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
from
July 1, 2024 10:37
3962aa7
to
bcbb945
Compare
mgmeier
approved these changes
Jul 1, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the huge chunk of work @NadiaYvette
NadiaYvette
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
from
July 1, 2024 15:44
f76820e
to
5644eea
Compare
mgmeier
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
from
July 1, 2024 15:45
5644eea
to
f76820e
Compare
This is to speed up the development cycle esp. for rapid tests.
This adds a keepalive interval field for NixServiceOptions and then handles defaulting it to 30 like now when it's absent from the JSON and defaulted to 10. It also omits it when it's the default when rendering it into JSON.
[email protected] wrote the nix code and the beginnings of the Haskell side. So this commit goes about creating node aliases, represented in Haskell in the NodeDescription structure, and using them in reporting, which happens in the exception handlers. The node aliases contribute to thread labels reported in the exception handlers where possible; older base versions lack accessors for thread labels. handleTxSubmissionClientError is the main exception handler involved. txSubmissionClient, consumeTxsNonBlocking and newTpsThrottle are spawned as threads within walletBenchmark and labelled accordingly, with the txSubmissionClient labels further elaborated upon with the node submitted to. SubmitMode's threadName field for the Benchmark alternative is produced by the compiler and corresponds to a label for a set of threads tracked in an AsyncBenchmarkControl structure as opposed to a label for an individual thread, hence the effort to generate thread labels by other means.
This installs a signal handler in Cardano.Benchmarking.Command.runCommand It borrows heavily from the signal handler from cardano-node.
I. make AsyncBenchmarkControl a record The type alias of a tuple was not very mnemonic or self-explanatory. This replaces it with a record and haddock documents its fields. II. use ABC to cancel threads The AsyncBenchmarkControl that should be initialized by the time a signal is received is fetched from the TVar and unpacked to be used to throw exceptions to the other threads. The other threads can now catch the exceptions in order to carry out orderly shutdowns in the sequel. III. use TVar for Env AsyncBenchmarkControl In order to thread the AsyncBenchmarkControl through the contexts surrounding the creation and destruction of the Async structures and overall container, this stores a TVar (Maybe AsyncBenchmarkControl) as a value in a Map where previously it was just AsyncBenchmarkControl. The idea is to use the reference to it to be able to use it in the context of a signal handler by packaging the reference data with the code pointer in a partial application or monadic context or similar. With that data in hand, it's just a matter of iterating over the threads and reaping them all.
This took a fair amount of rearrangement to broaden the constant environment in order to pass the keepalive interval in the NixServiceOptions around. So a few different things happened: I. create EnvConsts structure encompassing A. AsyncBenchmarkControl TVar (potentially changing to IORef) B. IOManager C. Maybe NixServiceOptions This moves the mutable reference in A. to the Reader environment from the State of the ExceptionT Env.Error (RWST EnvConsts () Env IO) ActionM monad. The reference stays constant though the referenced data changes. II. pass EnvConsts to runScript and runSelftest III. update Env.hs and NixService.hs accessors Some of it represents changing a little of the design of the Env and ActionM once again even after the prior commits, so a fair amount of squashing commits that entirely redo earlier commits' changes and rewriting commit messages will need to be done in the sequel.
I. Make tracers potentially available within signal handlers. This logs the event better. II. killThread weak main TID. Killing the main thread if the signal is received in a secondary thread makes sense as a back-up strategy.
The changelog is new.
NadiaYvette
force-pushed
the
nadia.chambers/txgen-mvar-004
branch
from
July 1, 2024 15:47
f76820e
to
dd32da1
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
In order to provide diagnostics surrounding MVar deadlocks seen during termination of the tx-generator, the following commit series provides some element of instrumentation & deliberate signal handling. This installs a signal handler for SIGINT and SIGTERM which issues trace messages and shuts down the entire process to the best of its ability. Some of the hope is that the explicit signal catching allows for deadlocks during termination of the tx-generator to be counteracted by sending signals and that that will log trace messages of the same kind as are handled by performance team tooling.
Checklist
See Runnings tests for more details
There should not be new tests required.
CHANGELOG.md
for affected packageThis change doesn't appear worthy of changelogging, as it doesn't cause upstream API changes.
.cabal
files are updatedNo incompatibilities are introduced, so it doesn't appear that version bounds in cabal files need updating.
hlint
. See.github/workflows/check-hlint.yml
to get thehlint
versionstylish-haskell
. See.github/workflows/stylish-haskell.yml
to get thestylish-haskell
versionghc-8.10.7
andghc-9.2.7
Note on CI
If your PR is from a fork, the necessary CI jobs won't trigger automatically for security reasons.
You will need to get someone with write privileges. Please contact IOG node developers to do this
for you.