
Service discovery not working reliably #792

Open
jmachowinski opened this issue Nov 29, 2024 · 4 comments

@jmachowinski

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04
  • Installation type:
    • Jazzy from source

Steps to reproduce issue

This issue is reproducible by starting our software stack. I did not manage to
get a reproducer that I can share.

Expected behavior

All services are reported as available.

Actual behavior

Some services are discovered, others are not.

Additional information

I have already tried to debug this. My findings so far:
rclcpp::Client::service_is_ready() returns false, because

rmw_ret_t ret = common_context->graph_cache.get_reader_count(
  pub_topic_name, &number_of_request_subscribers);
if (ret != RMW_RET_OK) {
  // error
  return ret;
}

is hit.

I added some debug output and found that entities are added to the rmw_dds_common::GraphCache, but not all of them.
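
As a minimal probe (not part of our stack; the node name, service type, and service name below are placeholders), something like this can run alongside the stack to compare the client's view with what the local graph reports:

#include <rclcpp/rclcpp.hpp>
#include <std_srvs/srv/trigger.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("discovery_probe");
  // Placeholder service type and name; substitute one of the services that goes missing.
  auto client = node->create_client<std_srvs::srv::Trigger>("some_service");

  rclcpp::Rate rate(1.0);
  while (rclcpp::ok()) {
    // service_is_ready() ends up in the rmw layer, which asks the
    // rmw_dds_common::GraphCache for the request/response endpoint counts.
    RCLCPP_INFO(
      node->get_logger(), "service_is_ready: %s",
      client->service_is_ready() ? "true" : "false");

    // Independent view: every service the local graph currently knows about.
    for (const auto & entry : node->get_service_names_and_types()) {
      RCLCPP_INFO(node->get_logger(), "graph knows service: %s", entry.first.c_str());
    }
    rate.sleep();
  }
  rclcpp::shutdown();
  return 0;
}

If a service shows up in the name list while service_is_ready() stays false, that would suggest the cache is missing some of that service's endpoints rather than the whole service.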

As far as I can see, the graph is populated by


But I can't figure out exactly how the discovery works, or where the problem might originate.

I found a potential bug here:

std::unique_ptr<rmw_subscription_t, std::function<void(rmw_subscription_t *)>>

The subscription currently has a depth of 1. Is this depth per publisher? If yes, this should be fine. If not, we have a potential race.

Note that we have a BIG stack with more than 80 nodes.
Also note that everything works as expected if Cyclone DDS is used.

@fujitatomoya
Collaborator

Note that we have a BIG stack with more than 80 nodes.

We have more than 100 nodes running in an embedded system, and we have not seen this issue yet.

The subscription currently has a depth of 1. Is this depth per publisher? If yes, this should be fine. If not, we have a potential race.

For the subscription, the QoS history is set to RMW_QOS_POLICY_HISTORY_KEEP_ALL (depth is only honored if the history is set to RMW_QOS_POLICY_HISTORY_KEEP_LAST), so this should not be a problem.
Besides, rmw_cyclonedds has the same QoS configuration as rmw_fastrtps.
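
For reference, a minimal sketch of that distinction (illustrative values only, not the exact profile built by rmw_fastrtps):

#include <rmw/qos_profiles.h>
#include <rmw/types.h>

rmw_qos_profile_t make_keep_all_qos()
{
  rmw_qos_profile_t qos = rmw_qos_profile_default;

  // With KEEP_ALL the middleware retains every sample (bounded only by
  // resource limits), so whatever value `depth` holds is ignored.
  qos.history = RMW_QOS_POLICY_HISTORY_KEEP_ALL;
  qos.depth = 1;  // has no effect under KEEP_ALL

  // Only under KEEP_LAST would `depth` cap the retained samples, e.g.:
  // qos.history = RMW_QOS_POLICY_HISTORY_KEEP_LAST;
  // qos.depth = 100;

  return qos;
}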

Did you actually try increasing the depth, and does that reduce the reproduction rate?

@jmachowinski
Author

Did you actually try increasing the depth, and does that reduce the reproduction rate?

Yes, I changed the depth to 100; that did not change anything. This reader seems to be used only for node discovery. The readers and writers are discovered by something within Fast DDS.

@fujitatomoya
Collaborator

@MiguelCompany built-in discovery traffic is always done over UDP/TCP by default, is that correct? So this is really a discovery issue without the SHM transport, right?

@jmachowinski
Author

jmachowinski commented Dec 6, 2024

@MiguelCompany How can I narrow this down?
I can see that ParticipantListener is not called for the missing subscribers and publishers.
What is the counterpart in RTPS that announces them? I would like to add trace code there
to make sure that it's not something in the graph code...
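
In RTPS, remote readers and writers are announced by the built-in SEDP endpoints; Fast DDS surfaces those announcements through the DomainParticipantListener discovery callbacks, which is what rmw_fastrtps' ParticipantListener hooks into. A minimal tracing sketch, assuming the Fast DDS 2.x API shipped with Jazzy (exact signatures and accessors can differ between Fast DDS versions, so check against your installed headers):

#include <iostream>

#include <fastdds/dds/domain/DomainParticipantListener.hpp>

class DiscoveryTracer : public eprosima::fastdds::dds::DomainParticipantListener
{
public:
  // Called when SEDP data about a remote reader arrives (discovered, QoS changed, removed).
  void on_subscriber_discovery(
    eprosima::fastdds::dds::DomainParticipant * /*participant*/,
    eprosima::fastrtps::rtps::ReaderDiscoveryInfo && info) override
  {
    std::cout << "[reader] status=" << static_cast<int>(info.status)
              << " topic=" << info.info.topicName().c_str() << std::endl;
  }

  // Called when SEDP data about a remote writer arrives.
  void on_publisher_discovery(
    eprosima::fastdds::dds::DomainParticipant * /*participant*/,
    eprosima::fastrtps::rtps::WriterDiscoveryInfo && info) override
  {
    std::cout << "[writer] status=" << static_cast<int>(info.status)
              << " topic=" << info.info.topicName().c_str() << std::endl;
  }
};

A listener like this could be attached to a standalone DomainParticipant on the same domain, or equivalent trace output could be added directly to rmw_fastrtps' ParticipantListener; if the missing endpoints never show up in these callbacks, the problem would be below the graph code, in discovery itself.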
