Multithreading in Ignition Gazebo #363
Hello!

**This comment is slightly outdated. See the one posted below.**

I have some questions for @diegoferigo regarding multithreading in Ignition Gazebo and #15. I'm trying to do something very similar in order to be able to use the

First, I see in that PR that you included an example for cartpole in C++. However, I see that the cartpole example is missing in the latest version of

Second, I checked out commit ab841a3a3025a503699878090ff4eaf52f7bb67a and was looking around at the code. If I understand it correctly, you do the following in
Am I understanding these steps correctly? Rather than using the Python wrapper for

The way I'm trying to go about the multithreading issue is similar to what you're doing (i.e. I set up the world and I scope the model exactly as you do in the cartpole example). For example, I have an

where

This actually runs without error. The only issue is that when I try to set the joint position targets for the robot (

It does work, however, when everything is running on a single thread (i.e. without the

I guess my final question is: do you have any advice for how I could go about debugging this? Did you ever encounter a similar issue, or other weird problems, when trying to get multithreading working for your application using Ignition Gazebo?

Thank you very much!
**Update:**

However, this has presented another issue. To debug, I tried running just two separate worlds/robots in parallel, with each one executing on its own thread. It seems like when I insert my robot model + the
Which I presume comes from DART? To work around this, I tried two things:
where, here The downside to this approach is that, while it resolves that
I noticed PR #340 in this repo. What is the status of that PR, and what work still needs to be done before PyBullet can be used with Ignition/ScenarIO?

Thanks again and sorry for the long read!

P.S. Unfortunately I am unable to share the entirety of my code. Sorry about that, but I very much appreciate the help/discussion.
Hi @nicholaspalomo,

Thanks for opening this discussion! I remember there is some information scattered across old issues (including #12, #13, #15), but there's no centralized place where this topic is clearly discussed. I'll take advantage of this discussion to clarify the overall situation; it will definitely be useful to other users as well. I'll start with some of the choices developers have for implementing concurrent simulations, then list some of the possible directions worth exploring, and finally provide a brief overview of what we're currently using.

As you might have noticed, the related issues and PRs are quite old; they all date back to the very early stage of this project. There have been no further (public) updates since then.

I want to keep this explanation as short as possible. The first two questions to answer are the following:
As you might already know, or as I hope will be clear soon, these two questions are tightly related.

Let's start with Python. Let's assume we use the Python bindings of ScenarIO, without any gym-ignition resources. Using

Discarding Python, let's try doing things with C++ (as in #15). Working in a lower-level language is much better, as you noticed, and it is indeed possible to fully exploit the hardware. With some basic knowledge of thread synchronization and async computation, it's fairly easy to draft a prototype that runs concurrent simulations, and it works well.

However, problems are just around the corner in C++ too. As mentioned in #12 (comment) and reported in gazebosim/gz-sim#18, DART uses ODE as its default contact detector, and ODE relies on singletons that get corrupted when multiple concurrent instances run within the same process. This is the cause of the ODE assertion you experienced. In the past there have been attempts to use a static mutex (here be dragons), but it never worked reliably. Unfortunately, there's no way out.

Before jumping to new features and possible unexplored directions, I want to clarify a few more things. Users need to be aware that a single instance of Ignition Gazebo spawns multiple threads, roughly proportional to the number of loaded plugins plus a constant number for the basic functionalities. My point is that it's not trivial to fully exploit the hardware: the number of threads used by a single instance is not easy to know in advance, and a concurrent setup with many more threads than the machine can run in parallel would cause massive context switching that kills performance. Tuning the number of concurrent simulations is very important.

In this section I report what I believe could be the directions worth trying. As I described above, a multithreaded setup with ODE as contact detector is not suitable. Similarly, a multi-world simulation (an advanced feature of Ignition Gazebo) is not suitable either, because it is also implemented with multithreading. So:
To conclude this section, you mentioned #340. This works for simple enough simulations; however, for my personal research (legged locomotion) it is not yet ready. The bullet implementation (gazebosim/gz-physics#44) is not yet on par with DART, and some important features are still missing. You could try exploring this direction, but if you need contact-rich scenarios, I suspect that you'll quickly get stuck as well.

Finally, I want to quickly describe our setup. We had to go towards multiprocessing, since the bullet contact detector is a very new feature. After an extensive investigation, we found out that the parallelization of environments is often done by the RL frameworks. Since the aim of ScenarIO and gym-ignition is developing environments, the most important thing was to make them compatible with most of the architectures implemented by the RL frameworks (e.g. MPI, multiprocessing, ...). This is the current status.

We selected ray/rllib for our experiments. The project is very active and has a huge community. They implemented massively parallel architectures that suit all needs, and the selection of algorithms and their configuration options is very complete. It allows very large horizontal scaling if you have enough hardware resources, and it also supports creating clusters of multiple machines (even though in my experience this is not often necessary). This is to say that if you carefully pick a proper RL framework that is compatible with

This became longer than I originally thought, but I think a proper explanation was necessary, also to clarify a topic that is likely paramount for many users of this project.
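To make the multiprocessing idea concrete, here is a minimal sketch in plain Python. It is not the code we run (in our setup the RL framework spawns the workers), and everything in it is illustrative: `worker_count`, `rollout` and `run_parallel` are hypothetical names, and `rollout` is only a stand-in where a real worker would create its own simulator. The point is that each process owns its simulator, so nothing like the ODE singletons is shared, and that the number of processes is capped using an estimate of how many threads one simulator instance spawns, a number you have to measure for your own plugin set.

```python
import multiprocessing as mp
import os
import random
from typing import List, Optional, Tuple


def worker_count(threads_per_instance: int, cpu_count: Optional[int] = None) -> int:
    """Number of simulator processes that fit without oversubscribing the CPU.

    threads_per_instance is an estimate you must measure yourself
    (loaded plugins plus a constant overhead); it is an assumption here.
    """
    if threads_per_instance < 1:
        raise ValueError("threads_per_instance must be >= 1")
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cpus // threads_per_instance)


def rollout(seed: int, steps: int = 100) -> Tuple[int, float]:
    """Hypothetical stand-in for one episode in one simulator instance.

    A real worker would create its own simulator here, inside the process,
    so that no simulator state is shared with the other workers.
    """
    rng = random.Random(seed)
    episode_return = sum(rng.uniform(-1.0, 1.0) for _ in range(steps))
    return seed, episode_return


def run_parallel(seeds: List[int], threads_per_instance: int) -> List[Tuple[int, float]]:
    """Run one rollout per seed, each in a separate worker process."""
    if not seeds:
        return []
    n_procs = min(len(seeds), worker_count(threads_per_instance))
    with mp.Pool(processes=n_procs) as pool:
        # Results come back in the same order as the seeds.
        return pool.map(rollout, seeds)
```

To try it, call for example `run_parallel(list(range(8)), threads_per_instance=4)` from under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that spawn rather than fork worker processes.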