-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decrement thread count on exception. #1929
Conversation
cc @danieldoglas since IIRC you looked at this code this year - #1767 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great investigation and explanation! 👍
SWARN("Caught system_error starting _replicate thread with " << _replicationThreadCount.load() << " threads. e.what()=" << e.what()); | ||
_changeState(SQLiteNodeState::SEARCHING, message.calcU64("NewCount") - 1); | ||
STHROW("Error starting replicate thread so giving up and reconnecting."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens after we throw here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should go synchronizing as if the node was just turned on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that counter existed when I changed this, thanks for finding and fixing!
Details
So, we increment the thread count here, before starting a replication thread:
Bedrock/sqlitecluster/SQLiteNode.cpp
Line 1644 in a30c7e1
Normally, we will decrement this inside the replication thread when it exits, which is handled here:
Bedrock/sqlitecluster/SQLiteNode.cpp
Line 199 in a30c7e1
However, if the thread fails to start and throws
system_error
, we will get to this point:Bedrock/sqlitecluster/SQLiteNode.cpp
Line 1655 in a30c7e1
Without ever having run the thread, and thus, without the thread count ever being decremented. Further, change state will wait for the thread count to be 0 before it completes, which means the above line blocks and we never log the warning or throw the error.
Here's where _changeState blocks on this:
Bedrock/sqlitecluster/SQLiteNode.cpp
Lines 1874 to 1880 in a30c7e1
The fix is to decrement the counter if we hit the exception case. I've also moved logging the warning to above the
_changeState
call so that it will be visible sooner that we've hit this exception.Fixed Issues
Fixes https://github.com/Expensify/Expensify/issues/440475
Tests
Internal Testing Reminder: when changing bedrock, please compile auth against your new changes