-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1.0.2] SHiP: Fix hang on shutdown #816
Conversation
Note:start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified SHiP
boost::beast::websocket
to use the SHiP thread instead of the main application thread executor strand
I think really what this changes is what I'd call the "default executor" of the websocket (I am not sure what the proper terminology is). My understanding was that the default executor was never used based on how the code is structured because the bound executor for every async call on the websocket was a use_awaitable
, and that will use the coroutine's executor (which is the strand created from ship thread, in this case). The websocket certainly uses the ship thread and strand prior to this change. But if this does fix the problem there must be some other case where the default executor matters. And I do think it's good to make the default executor match up.
@@ -101,7 +101,7 @@ struct state_history_plugin_impl { | |||
void create_listener(const std::string& address) { | |||
const boost::posix_time::milliseconds accept_timeout(200); | |||
// connections set must only be modified by main thread; run listener on main thread to avoid needing another post() | |||
fc::create_listener<Protocol>(app().get_io_service(), _log, accept_timeout, address, "", [this](Protocol::socket&& socket) { | |||
fc::create_listener<Protocol>(thread_pool.get_executor(), _log, accept_timeout, address, "", [this](Protocol::socket&& socket) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will need to be additional changes below here otherwise connections
will be modified in the ship thread which is not allowed.
But that's really still not enough to get it 100% correct if the intent is to make the default executor of the stream to be what we want -- the stream should be using a per-stream strand instead of the thread's executor. The problem is that fc::create_listener
doesn't give access to the variant of async_accept()
that allows creating the new socket with a different default executor than the listening socket. This is a real shortcoming and without refactoring fc::create_listener
I'm not sure off hand how to fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case there is only one SHiP thread and its implicit strand. I agree that it would be better if fc::create_listener
took a strand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah for 1.0.x that's fine but we should really get it right on main -- the code is currently structured with the assumption that increasing ship threads 'just works'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems passing a strand works.
fc::create_listener<Protocol>(strand, _log, accept_timeout, address, "", [this, strand](Protocol::socket&& socket) { | ||
boost::asio::post(app().get_io_service(), [this, strand, socket{std::move(socket)}]() mutable { | ||
catch_and_log([this, &socket, &strand]() { | ||
connections.emplace(new session(std::move(socket), std::move(strand), chain_plug->chain(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes here seem to make only a single strand for all connections. And then this single strand is moved here? Doesn't look right. I think what you had before was better even though it still didn't make the websocket's default executor be a strand.
}, _log)); | ||
// connections set must only be modified by main thread; run listener on ship thread so sockets use default executor of the ship thread | ||
fc::create_listener<Protocol>(thread_pool.get_executor(), _log, accept_timeout, address, "", [this](Protocol::socket&& socket) { | ||
boost::asio::post(app().get_io_service(), [this, socket{std::move(socket)}]() mutable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, I think this creates a (fantastically remote) possibility of the socket being destroyed after its executor is destroyed. This occurs when this callback is post()
ed after the main thread is stopped, and then the plugin is destroyed, and then appbase is destroyed which will destroy pending callbacks. Obviously something we can tolerate for 1.0.x but another good reason to improve create_listener
state_history_plugin
(SHiP) would sometimes hang on shutdown. Thenodeos
process would get stuck trying to lock a mutex of the main applicationio_context
strand while destroying aboost::beast::websocket::stream
.Created a test that reproduces the failure (thanks @taokayan for #813). The failure was not reliably repeatable, but it did fail on every CI/CD run for at least one of the platforms. See for example: https://github.com/AntelopeIO/spring/actions/runs/11017331583/job/30595880305
Modified SHiP
boost::beast::websocket
to use the SHiP thread instead of the main application thread executor strand.Resolves #815