-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible regression in rcl preshutdown callbacks - context invalid? #2547
Comments
I don't know if its related, but also point out this issue as well discovered which points to issues in |
No worries, shit happens. Another good reason Nav2's around using all these bells and whistles as a canary in the coalmine. Only used up like a day on this so not bad at all in the scheme of things. Lovely that there's already a solution! Thanks @fujitatomoya |
#2545 fixes But if the context is gracefully shutdown (e.g deferred signal handler via SIGINT, which is the exactly case here), the context is already shutdown, and that makes the lifecycle node dtor fails to publish the transition (lifecyclenode should be shutdown to avoid unknow state for sensors and devices but fails since context is not valid anymore). #2545 bypass this situation at this moment, shutdown is called only when the context is valid. this is tagged with #2545 CI is running now, once i have green light, i will merge it to rolling. |
BTW: this worked fine those pre-24.04 Rolling (22.04 Rolling didn't exhibit this) so there was something that changed that caused this, it was fully functioning for ~a year prior. I think my understanding of your response is that there's a bit of a catch-22, so mentioning this as a "I think there should be a solution, no?" |
@SteveMacenski i think there are 2 problems here,
I am sure these messages are generated by #2528 as explained #2547 (comment), and #2545 can fix it. note: #2528 IS NOT merged to jazzy. (#2542 staging but merged), so we should not meet these message with jazzy.
these 3 issues are with rmw_fastrtps, right? this looks the same problem with ros2/rmw_fastrtps#761 |
The second comment is definitely the same as this. The first is perhaps? Its also another 'things failing on shutdown' example and you see similar logs with the pre-shutdown (the third is a summarized version of the same). I didn't assume it was a Fast-DDS specific problem, but 🤷
|
#2545 has been merged to rolling, the following waning and error should not be printed even with graceful shutdown.
|
Great! Currently in the middle of other firefighting with getting Nav2 out for Jazzy, but I'll test this out once the nightly docker containers update and I'm between burning buildings. If you're confident that it addresses the issue, feel free to close |
@SteveMacenski after #2545 is merged in rolling, are you guys still having the other issue? [INFO] [1716243365.746525522] [costmap]: Running Nav2 LifecycleNode rcl preshutdown (costmap)
[INFO] [1716243365.746611543] [costmap]: Destroying bond (costmap) to lifecycle manager.
Failed to delete datawriter, at ./src/publisher.cpp:45 during '__function__'
cannot publish data, at ./src/rmw_publish.cpp:66 during '__function__'
Fail in delete datareader, at ./src/rmw_service.cpp:89 during '__function__' |
@fujitatomoya sorry for the delay, the 24.04 & GZ migrations took longer than we expected in Nav2-ville. Is there a docker base container available that contains these updates? If so, I can test this tomorrow and let you know! |
Bug report
Required Info:
Description
I've noticed in the Nav2 transition in both unit tests and now at the system level that there appears to be a regression in the handling of the context and the pre-shutdown callbacks starting with the transition to 24.04 and seen in both Jazzy and current rolling. There has been some extended discussion wrt the unit tests in the comments:
But I'm also now seeing it on shutdown of Nav2 at the system level now that we have Gazebo Harmonic working with Nav2. We run our rcl preshutdown callbacks to transition the nodes that may not have been properly transitioned to shutdown before Control+C was hit. This is executing as expected with logs:
But then also contain traces from rcl which seem related to publishers failing to do their thing because the context is invalid at the stage of pre-shutdown. We see only publisher failures when we have the system properly cycled down, but if we have the Nav2 lifecycle nodes still active so services are called to transition them, then we see the service eq as well.
I clipped the logs to show one server in particular, but this repeats for all instances.
This can be reproduced in Nav2 using the 24.04/Rolling + the simulator in ros-navigation/navigation2#3634. Drive around for a moment, stop, and hit control+C. We previously had a clean shutdown without crash. It can also be more minimally reproduced in the PR https://github.com/ros-navigation/navigation2/pull/4298/files
TEST(WPTest, test_dynamic_parameters)
test changes -- where we manually shut things down & reset the object internally to the test to avoid the need for the rcl pre-shutdown to do work, thereby bypassing itThe text was updated successfully, but these errors were encountered: