Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bookie process still alive when bookie startup is not successful and shutdown #4111

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

TakaHiR07
Copy link
Contributor

Motivation

When bookie startup encounter IOException in BookieImpl#readJournal(), bookie startup is not successful and then trigger shutdown(). However, bookie process is still alive.

The reason is IOException is caught in BookieImpl and trigger shutdown. Bookie actually is not running. But the exception do not throw to startComponent.

So BookieServer.main / server.Main.doMain still wait for the startComponent future to complete.

The relevant code is as following:

try {
readJournal();
} catch (IOException | BookieException ioe) {
LOG.error("Exception while replaying journals, shutting down", ioe);
shutdown(ExitCode.BOOKIE_EXCEPTION);
return;
}

public void start() throws InterruptedException, IOException {
this.bookie.start();
// fail fast, when bookie startup is not successful
if (!this.bookie.isRunning()) {
exitCode = bookie.getExitCode();
this.requestProcessor.close();
return;
}

public static CompletableFuture<Void> startComponent(LifecycleComponent component) {
CompletableFuture<Void> future = new CompletableFuture<>();
final Thread shutdownHookThread = new Thread(
new ComponentShutdownHook(component, future),
"component-shutdown-thread"
);
// register a shutdown hook
Runtime.getRuntime().addShutdownHook(shutdownHookThread);
// register a component exception handler
component.setExceptionHandler((t, e) -> {
log.error("Triggered exceptionHandler of Component: {} because of Exception in Thread: {}",
component.getName(), t, e);
// start the shutdown hook when an uncaught exception happen in the lifecycle component.
shutdownHookThread.start();
});
component.publishInfo(new ComponentInfoPublisher());
log.info("Starting component {}.", component.getName());
component.start();
log.info("Started component {}.", component.getName());
return future;
}

Changes

throw exception to startComponent when bookie start encounter error.

@@ -821,6 +882,8 @@ public void testBookieStartException() throws Exception {
loggerOutput.expect((List<LoggingEvent> logEvents) -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd better assert startFuture and startFuture2 separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have done.

@@ -127,7 +127,7 @@ public void start() throws InterruptedException, IOException {
if (!this.bookie.isRunning()) {
exitCode = bookie.getExitCode();
this.requestProcessor.close();
return;
throw new InterruptedException("fail fast, bookie startup is not successful");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would better define another Exception; the InterruptedException is for the thread case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define it "BookieStartException", extends the BookieException ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have done.

@hangc0276
Copy link
Contributor

@TakaHiR07 Could you take a look at the check style issue? Thanks.

@TakaHiR07
Copy link
Contributor Author

@TakaHiR07 Could you take a look at the check style issue? Thanks.

have fix checkstyle.

@hangc0276 hangc0276 closed this Dec 6, 2023
@hangc0276 hangc0276 reopened this Dec 6, 2023
Copy link
Member

@horizonzy horizonzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@hangc0276 hangc0276 closed this Jan 10, 2024
@hangc0276 hangc0276 reopened this Jan 10, 2024
@hangc0276 hangc0276 closed this Feb 21, 2024
@hangc0276 hangc0276 reopened this Feb 21, 2024
@hangc0276
Copy link
Contributor

@TakaHiR07 Would you please help fix the failed tests? Thanks.

@TakaHiR07 TakaHiR07 force-pushed the fix_bookie_process_alive_when_start_not_success branch from 69c8557 to 108d258 Compare March 6, 2024 08:10
@TakaHiR07
Copy link
Contributor Author

@TakaHiR07 Would you please help fix the failed tests? Thanks.

@hangc0276 I have fix the test since powermock is removed in master branch.

@shoothzj shoothzj force-pushed the fix_bookie_process_alive_when_start_not_success branch from 108d258 to 3e48ecb Compare May 15, 2024 05:56
@shoothzj
Copy link
Member

update the branch

@xiang092689
Copy link
Contributor

Normally, "Main" thread will wait for "component-shutdown-thread" thread to complete the "component start future". Your edition will cause future complete too soon, and main thread exit, which may cause shutdown not finished and process exit?

Copy link
Member

@shoothzj shoothzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TakaHiR07
Copy link
Contributor Author

Normally, "Main" thread will wait for "component-shutdown-thread" thread to complete the "component start future". Your edition will cause future complete too soon, and main thread exit, which may cause shutdown not finished and process exit?

"component start future" would complete only when exception throw. if bookie.start() encounter exception, it may catch some exception and go into shutdown(). only shutdown finish, then bookie.isRunning() become false, then go into the edition code.

@TakaHiR07
Copy link
Contributor Author

I think it's not an APIException, should we define this exception in https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/ExitCode.java ?

ExitCode looks like pointing out the exact reason why bookie start fail. It seems not appropriate to put this exception here, this exception only point out that bookie start fail.

@TakaHiR07
Copy link
Contributor Author

Many failed test shows that in previous design, bookieServer start failed, but not throw exception. I think these test is also unreasonable and should be fixed. Looking forward to more views, then I would fix these test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants