Subscriber does not attain/maintain ownership of received loaned messages #679
Comments
Hi @verderog, what you are seeing is the expected behavior when using Data Sharing with loans under the default QoS configuration, in particular KEEP_LAST_HISTORY_QOS or BEST_EFFORT_RELIABILITY_QOS. The DataWriter and DataReader share the exact same history, so if the reading rate is significantly slower than the publication rate, samples may be overwritten. You can read a bit more about the issue, including some ways to prevent it, in this section of the Fast DDS documentation.
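To make the overwriting scenario concrete, here is a toy, single-process model of a fixed-depth history shared by a writer and a reader. It is only a sketch of the idea described above: `SharedHistory`, `publish`, and `reader_sees_overwrite` are hypothetical names for illustration, not Fast DDS or rclcpp API, and real data sharing works across processes through shared memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sample type: a sequence number plus some payload.
struct Sample {
    std::uint64_t sequence = 0;
    int payload = 0;
};

// Toy model of a KEEP_LAST-style history with a fixed number of slots.
// The writer reuses the oldest slot once all slots have been written.
template <std::size_t Depth>
class SharedHistory {
    static_assert(Depth > 0, "history needs at least one slot");

public:
    // Writes into the next slot and returns a pointer to it, which
    // stands in for the reference a reader would receive.
    const Sample* publish(int payload) {
        Sample& slot = slots_[next_ % Depth];
        slot.sequence = ++published_;
        slot.payload = payload;
        ++next_;
        return &slot;
    }

private:
    std::array<Sample, Depth> slots_{};
    std::size_t next_ = 0;
    std::uint64_t published_ = 0;
};

// Returns true if a slow reader, still holding the first sample, sees
// it change after the writer publishes `extra_publications` more samples.
template <std::size_t Depth>
bool reader_sees_overwrite(std::size_t extra_publications) {
    SharedHistory<Depth> history;
    const Sample* held = history.publish(100);
    assert(held != nullptr);
    const std::uint64_t seen = held->sequence;
    for (std::size_t i = 0; i < extra_publications; ++i) {
        history.publish(200 + static_cast<int>(i));
    }
    return held->sequence != seen;
}
```

In this model, with a depth of 4 the held sample survives three further publications and is overwritten by the fourth, which matches the slow-reader case described in the comment.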
@jsantiago-eProsima Thank you for the reply! I've read (and re-read) the section you linked, along with the ROS2 info on QoS. Even so, I still don't quite have a conceptual understanding of how loaned messaging should work. To add to my confusion, I know I'm interfacing with FastRTPS through ROS2 and not invoking the DataWriter/DataReader APIs directly, so I'm one layer removed from the operations described by the FastRTPS documentation. If I describe my line of thinking, hopefully you can clarify how things are really intended to work and where I may be going off the rails.

In my mind, I understand that the middleware creates a pool of loaned messages (i.e. samples). The publisher can come along and execute At this point, things are a little bit more muddled for me. The middleware in some way will notify all subscribers that there is a sample with updated data (i.e. a sample has been published). I doubt ownership is transferred away from the middleware, as you most likely cannot have multiple subscribers "own" the sample. If I understand the FastRTPS documentation correctly, this sample exists in the history of both the middleware and the subscribers as a shared memory location. The callback function for the subscribers is invoked and references this shared memory location. The invocation of the callback is enough to acknowledge that the sample has been processed, which makes that sample fair game for the middleware to reuse. This will happen even if the subscriber hasn't technically completed its processing of the sample. Is this correct?

Is there a way for a subscriber to determine whether a loaned message has been updated before it has completed processing it? My current approach is for each sample to include a sequence number as part of its data. This value is uniquely set by the publisher before publishing. If the subscriber observes that the sequence number has not changed after it is done processing the data from the sample, it assumes the rest of the sample also did not change. This feels like a naive approach, but it seems to work as a first pass.

I don't think it is possible to set up a QoS profile for data sharing that prevents the publisher and multiple subscribers from trying to operate on the data at the same time. Unless I somehow bypass ROS2 and interface with the FastRTPS DataWriter/DataReader directly, I'm not sure how I can guarantee these concurrent access operations won't happen.

Sorry for the length of my reply. I do appreciate your help!
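The sequence-number check described in this comment could be sketched roughly as follows. This is a single-process illustration under stated assumptions: `Frame` and `processed_without_overwrite` are hypothetical names, and the plain reads shown here are not synchronized with a writer in another process, so the check can only detect some overwrites after the fact. It does not prevent them, and it may miss a partial update that happens to leave the sequence number untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical message layout: the publisher sets `sequence` to a
// unique value before publishing each sample.
struct Frame {
    std::uint64_t sequence;
    int value;
};

// Runs `work` on the shared sample, then reports whether the sequence
// number is still the one observed before processing started.
// A false result means the sample was reused while being processed.
template <typename Work>
bool processed_without_overwrite(const Frame& shared, Work&& work) {
    const std::uint64_t before = shared.sequence;
    std::forward<Work>(work)(shared);
    return shared.sequence == before;
}
```

A subscriber callback would wrap its processing in such a check and discard (or redo) its results whenever the function returns false.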
I believe, as far as I can see from the doc, the DataWriter does not even get acknowledged by the DataReaders, which means the DataWriter can reuse that sample whenever necessary.
I would like to ask the same question. Even with QoS KEEP_LAST_HISTORY_QOS or BEST_EFFORT_RELIABILITY_QOS, this would lead the DataReaders to read unknown data that could be corrupted. I can see that we can change the QoS into
As an experiment, I updated my XML with the following changes to the reliability QoS settings. I observed no change in behavior with my sub. With a 15 second delay in the sub's topic callback, it detected shared data changing so it does not appear that the pub/DataWriter was blocked:
According to the previous design, there were 2 policies for handling overflow, and the user could choose between them. But there is no At this commit, the 2 policies were removed. It now adopts the 'overwrite' approach, possibly for reasons of performance and complexity. About QoS We think it is important to let users know when an 'overwrite' occurs, so that they can adjust the size of the shared memory to a suitable value.
@verderog In the case of ROS 2, there are certain QoS policies that can never be configured by XML, and where the settings from rclcpp will always be honored. So to ensure a RELIABLE, KEEP_ALL communication, you need to create the publisher and subscriber this way:
@Barry-Xu-2018 The shared memory transport is completely unrelated to this; we are talking about data-sharing here. Most of the relevant code is here, though some relevant parts are inside
@MiguelCompany Thank you for correcting me.
@MiguelCompany Sorry, that is correct; I missed it. Thanks for pointing that out.
This is currently the expected behavior, right? I do not think we can rely on this behavior, because data would be corrupted without notification? Just checking that my understanding is correct.
This behavior makes sense to me: it guarantees that DataReaders read the delivered message and, if necessary, it blocks the DataWriter from publishing new messages because there are no free samples.
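The blocking behavior this comment describes can be pictured with a toy pool in which a slot only becomes reusable after the reader hands it back. This is a conceptual sketch, not Fast DDS code: `AcknowledgedPool`, `try_publish`, and `acknowledge` are hypothetical names, and whether the real middleware behaves this way under a given QoS is exactly what the thread is trying to establish.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model of a sample pool where the writer may only reuse a slot
// after the reader has explicitly released it.
template <std::size_t Depth>
class AcknowledgedPool {
public:
    // Returns the index of a free slot and marks it as held, or -1 if
    // every slot is still held (a blocking writer would wait here).
    int try_publish() {
        for (std::size_t i = 0; i < Depth; ++i) {
            if (!in_use_[i]) {
                in_use_[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Called by the reader once it has finished processing a slot.
    void acknowledge(int slot) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < Depth);
        in_use_[static_cast<std::size_t>(slot)] = false;
    }

private:
    std::array<bool, Depth> in_use_{};
};
```

With a depth of 2, a third call to `try_publish` fails until one of the first two slots has been acknowledged, so a held sample is never modified underneath the reader.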
@MiguelCompany Thanks for the help! I implemented your recommendation and am faced with a new issue relating to segment size:
I've got an element in the message data that is 8 MB in size (a 1920x1080x4 image buffer) that seems to be the source of this problem. If I reduce this size down to 768 KB, then I'm able to run the publisher without the above error. Is there a method available to adjust the Fast-RTPS segment size from ROS2?
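For reference, the buffer size quoted above works out as follows. The helper names (`image_bytes`, `samples_that_fit`) are hypothetical and only illustrate the arithmetic; they say nothing about how Fast DDS actually sizes or lays out its segments.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a tightly packed image buffer.
constexpr std::size_t image_bytes(std::size_t width,
                                  std::size_t height,
                                  std::size_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// 1920 x 1080 pixels at 4 bytes per pixel.
constexpr std::size_t kImageBytes = image_bytes(1920, 1080, 4);
static_assert(kImageBytes == 8294400, "about 8.29 MB, or 7.91 MiB");

// How many whole payloads of `sample_bytes` fit in a byte budget.
// Ignores any per-sample metadata a real middleware would add.
inline std::size_t samples_that_fit(std::size_t budget_bytes,
                                    std::size_t sample_bytes) {
    assert(sample_bytes > 0);
    return budget_bytes / sample_bytes;
}
```

So a single sample of this message type needs at least 8,294,400 bytes for the image alone, before counting any other fields.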
@MiguelCompany Thanks for the quick response! I've updated my XML configuration with To recap my test configuration to "force" the issue:
I had it in mind that the QoS updates you recommended would force the pub to block at some point due to running out of unallocated samples, but that doesn't seem to be the case. So, either I'm completely misunderstanding something, or the behavior I'm looking for (unchanging data using a shared memory approach) simply isn't possible.
Bug report
Required Info:
ROS_DISTRO=humble
ROS_VERSION=2
Steps to reproduce issue
I'm using a fairly straightforward publisher/subscriber node setup. The publisher publishes to a topic, and the subscriber subscribes to that same topic with a standard callback. The publisher and subscriber nodes are invoked as separate processes.
The issue is that if a long enough delay is induced in the subscriber's callback while it is processing an incoming loaned message, the publisher is not prevented from updating the message contents. It does not seem that the subscriber attains or maintains ownership of incoming loaned messages.
Publisher creation:
Publisher publishing:
Subscriber creation:
Subscriber callback processing:
Env config:
XML config:
Expected behavior
Subscriber operates on message data that does not change.
Actual behavior
Publisher is not prevented from updating message while it is being processed by one or more subscribers.
Additional information
See points 2 & 3 in this unanswered thread: https://answers.ros.org/question/401385/loaned-messages-and-zero-copy/
I'm not sure if I'm missing something in terms of the loaned message life cycle (i.e. "That's just the way it is") or operational behavior, or if there is a mechanism to prevent this sort of issue from occurring.
On a somewhat related note, I've also observed segmentation faults if the publisher node process terminates while a subscriber node process is accessing loaned message data. That might warrant another issue.
If it helps, I can post a simplified test that demonstrates the problem.