CPU spikes of existing nodes when starting new node #741
Comments
With some experimentation I also noticed that the higher the number of existing nodes, the larger the CPU spike when an extra node is added to the network. |
@tonynajjar thanks for creating the issue. we have been seeing a similar situation... a couple of things:
I am not sure if you can use the ROS 2 Fast-DDS Discovery Server, since it changes the discovery architecture; whether or not that is acceptable for you, it would reduce the discovery cost significantly. |
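For reference, a minimal sketch of how the Fast DDS Discovery Server is typically enabled for ROS 2 nodes; the server ID, address, and port below are illustrative defaults rather than values taken from this thread:
# start a discovery server (listens on port 11811 by default)
fastdds discovery --server-id 0
# in every shell that launches nodes, point the middleware at that server
export ROS_DISCOVERY_SERVER=127.0.0.1:11811
ros2 run demo_nodes_cpp talker
With this setup, new participants announce themselves to the server instead of multicasting to every existing participant, which is why it reduces the discovery cost.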
Thanks for your answer @fujitatomoya. |
Your comment reminded me to clarify that all the nodes are running on one machine, so I guess the issue can't be caused by a suboptimal network. |
i think you can create a participant profile like the following:
<participant profile_name="participant_profile_simple_discovery">
    <rtps>
        <builtin>
            <discovery_config>
                <initialAnnouncements>
                    <count>1</count>
                    <period>
                        <nanosec>500000000</nanosec>
                    </period>
                </initialAnnouncements>
            </discovery_config>
        </builtin>
    </rtps>
</participant>
my expectation here is that this mitigates the spike period, so that CPU consumption comes back down more quickly.
anyway, i would like to have the opinion from eProsima. |
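A sketch of one way such a profile is usually loaded, assuming it is saved as fastdds_profile.xml (the path is illustrative); marking the participant profile with is_default_profile="true" is generally needed for rmw_fastrtps to apply it, since the nodes do not request a profile by name:
# point Fast DDS at the XML profile before starting the nodes
export FASTRTPS_DEFAULT_PROFILES_FILE=/path/to/fastdds_profile.xml
ros2 run demo_nodes_cpp talker
Alternatively, Fast DDS also picks up a file named DEFAULT_FASTRTPS_PROFILES.xml from the working directory automatically.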
Thanks @fujitatomoya, this is indeed what I would have suggested to try out as well. Please @tonynajjar do let us know how it goes. |
Thank you for your recommendation. Unfortunately it did not work. All the nodes in my localhost network are running with this configuration:
I still get the CPU spike. |
@tonynajjar i am curious, what command did you use for this verification? e.g |
I just started some custom teleop node. But I think 'ros2 topic echo xxx' would also cause the spike; it has in the past |
Any alternative solutions I could try? Could one of the maintainers try to reproduce this, so that we at least know for sure that it is not a local/configuration issue? If we can confirm it, I think this bug deserves high-priority attention: for applications that already reach the limits of their CPU budget, it would be a deal breaker for using Fast DDS. |
@tonynajjar
i think there is still a spike after the configuration is applied, but i expect the spike period to be mitigated and CPU consumption to come down quicker than before. if you are seeing no difference, maybe the configuration is not applied; make sure that DEFAULT_FASTRTPS_PROFILES.xml is in the running directory where you issue ros2 run xxx. something else i would try is to disable the shared memory transport:
<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <transport_descriptor>
            <transport_id>udp_transport</transport_id>
            <type>UDPv4</type>
        </transport_descriptor>
    </transport_descriptors>
    <participant profile_name="UDPParticipant">
        <rtps>
            <userTransports>
                <transport_id>udp_transport</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>
</profiles>
if anything above does not work, that is out of my league... |
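On recent Fast DDS versions (an assumption about the version in use here), dropping shared memory from the builtin transports can also be tried without XML, via an environment variable:
# keep only the UDPv4 builtin transport, i.e. no shared memory
export FASTDDS_BUILTIN_TRANSPORTS=UDPv4
ros2 run demo_nodes_cpp talker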
Thank you for your answer. I'm pretty sure the configuration was applied; I verified it by introducing a typo and seeing errors when I launched the nodes. Regarding disabling shared memory, I think I tried that already, but I can't remember for sure; I'll give it another shot in the next few days. I'd appreciate it if someone could try reproducing this. I'll try to create a minimal reproducible launch file, e.g. launching 40 talkers and 40 listeners. |
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    # List holding all the nodes to launch
    nodes = []

    # Number of talkers and listeners
    num = 40

    # Create talker nodes
    for i in range(num):
        talker_node = Node(
            package='demo_nodes_cpp',
            executable='talker',
            namespace='talker_' + str(i),  # Use namespaces to avoid conflicts
            name='talker_' + str(i)
        )
        nodes.append(talker_node)

    # Create listener nodes, each remapped to its matching talker's topic
    for i in range(num):
        listener_node = Node(
            package='demo_nodes_cpp',
            executable='listener',
            namespace='listener_' + str(i),  # Use namespaces to avoid conflicts
            name='listener_' + str(i),
            remappings=[
                (f"/listener_{str(i)}/chatter", f"/talker_{str(i)}/chatter"),
            ],
        )
        nodes.append(listener_node)

    # Create the launch description with all the nodes
    return LaunchDescription(nodes)
Here is a launch file for you to reproduce the issue. After this is launched, run |
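A usage sketch, assuming the file above is saved as spike_repro.launch.py; the filename and the follow-up command are illustrative, since the original comment does not spell them out:
# start the 40 talker / 40 listener pairs
ros2 launch ./spike_repro.launch.py
# once CPU settles, start one more node in another terminal
# and watch the existing processes in top/htop
ros2 run demo_nodes_cpp talker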
@fujitatomoya or @EduPonz, were you able to reproduce the issue with the example I provided? It would already be useful to confirm whether this is a bug or a suboptimal configuration on my side. |
@tonynajjar sorry for being late to get back to you. we are aware of this situation; i did not use your example, but having more than 100 nodes generates the CPU spike for a few seconds. as we already know, this is because of participant discovery. i am not sure any other configuration would mitigate this transient CPU load... |
We have the same issue. We use SHM since we have components in a container that exchange large point cloud data, and it seems to perform better and more efficiently that way. However, if we launch other nodes later on, for example debug tools or a UI, we get huge CPU spikes: timings go off, heartbeats die, and the software goes into an error state because of it. It would be good to have some solution for this. |
Hi @tonynajjar, we were wondering if the CPU usage spike could be related to synchronously waiting on the sockets when sending the data buffers. Would it be possible for you to test with the following configuration? Thanks in advance. |
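The configuration referred to above is not reproduced in this thread; as an illustration of the kind of setting that switches rmw_fastrtps from synchronous to asynchronous sending (an assumption about what was meant, not the attached file):
# hand data buffers to a background thread instead of blocking on the socket
export RMW_FASTRTPS_PUBLICATION_MODE=ASYNCHRONOUS
ros2 run demo_nodes_cpp talker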
Bug report
Required Info:
Operating System:
Installation type:
Version or commit hash:
DDS implementation:
Steps to reproduce issue
Expected behavior
No considerable CPU spike for the existing nodes
Actual behavior
The CPU usage of all the existing nodes spikes for a few seconds to about double their normal consumption! I'm guessing it has to do with discovery?
Additional information
I quickly tried with Cyclone DDS and did not witness the CPU spike, but I would like to fix it with Fast DDS if possible (otherwise I will have to switch).
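For comparison, switching the RMW implementation is typically done through an environment variable, assuming rmw_cyclonedds_cpp is installed:
# run the same nodes on Cyclone DDS instead of Fast DDS
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
ros2 run demo_nodes_cpp talker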