Multithreading in Ignition Gazebo #363

nicholaspalomo · 2021-06-28T21:43:32Z

Hello!

** This comment is slightly outdated. See the one posted below. **

I have some questions for @diegoferigo regarding multithreading in Ignition Gazebo and #15. I'm trying to do something very similar in order to be able to use the VecEnv class available in Gym, i.e. so that I can run multiple robots in parallel for reinforcement learning where the integration of the dynamics for each robot is happening on a separate thread.

First, I see in that PR that you included an example for cartpole in C++. However, I see that the cartpole example is missing in the latest version of master for gym-ignition. Is there a particular reason for this? Is multithreading something still supported in Ignition?

Second, I checked out commit ab841a3a3025a503699878090ff4eaf52f7bb67a and was looking around at the code. If I understand it correctly, you do the following in examples/cpp/LaunchParallelCartPole.cpp:

You set up the Gazebo world,
You set up and spawn the model, scoping the model name by the ID of the environment, i.e. id::model_name becomes the name of each robot added to the simulation.
For each environment, you create a thread on which the environment's run method is processed.

Am I understanding these steps correctly?

Rather than using the Python wrapper for gym-ignition, I'm using the scenario library and I created my own gym setup in C++ based on that.

The way I'm trying to go about the multithreading issue is similar to what you're doing (i.e. I set up the world and I scope the model exactly as you do in the cartpole example). For example, I have an action representing the joint position targets that I send to my robot:

            /* Step environment */
            async_pool<float> pool_step;
            for(int i = 0; i < numEnvs_; i++)
                pool_step.push_back(std::async(std::launch::async, &ChildEnvironment::step, &*environments_[i], action.row(i)));

            /* Get the reward from the environment after all threads have finished executing */
            for(int i = 0; i < numEnvs_; i++)
                reward[i] = pool_step[i].get();

where pool_step is an std::vector<std::future<float>> and environments_[i] is a pointer to the i-th environment.
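For anyone who wants to try the fan-out/collect pattern in isolation, here is a minimal sketch that compiles on its own. ToyEnvironment and stepAll are hypothetical stand-ins (they are not part of ScenarIO or of the code above); only std::async and std::future are real library calls.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <memory>
#include <vector>

// Hypothetical stand-in for an environment; a real one would wrap a simulator.
class ToyEnvironment {
public:
    explicit ToyEnvironment(float gain) : gain_(gain) {}

    // Pretend to integrate the dynamics and return a reward.
    float step(float action) {
        state_ += gain_ * action;
        return -state_ * state_;
    }

private:
    float gain_;
    float state_ = 0.0f;
};

// Step every environment in its own asynchronous task, then gather the rewards in order.
std::vector<float> stepAll(std::vector<std::unique_ptr<ToyEnvironment>>& envs,
                           const std::vector<float>& actions) {
    assert(envs.size() == actions.size());

    std::vector<std::future<float>> pool;
    pool.reserve(envs.size());
    for (std::size_t i = 0; i < envs.size(); ++i) {
        pool.push_back(std::async(std::launch::async, &ToyEnvironment::step,
                                  envs[i].get(), actions[i]));
    }

    std::vector<float> rewards;
    rewards.reserve(pool.size());
    for (auto& f : pool) {
        rewards.push_back(f.get());  // blocks until that task has finished
    }
    return rewards;
}
```

Each task touches only its own environment object, so the sketch itself needs no locking. Whether that holds for a real simulator depends on what the environments share behind the scenes.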

This actually runs without error. The only issue is that when I try to set the joint position targets for the robot (std::shared_ptr<scenario::gazebo::Model>) and step forward the simulation, nothing happens:

            for(int i = 0; i < n_steps; i++) {
                robot_->setJointPositionTargets(target_positions, joint_names);
                gazebo_->run();
            }
            gazebo_->run(/*paused=*/true);

It does work, however, when everything is running on a single thread (i.e. without the std::async or std::future, etc.) and the robot joints go to the correct positions, as expected. I'm also making sure to set the PID gains for the joints correctly first, of course.

I guess my final question is - do you have any advice for how I could go about debugging this, or did you ever encounter a similar issue or other weird problems when trying to get multithreading working for your application using Ignition Gazebo?

Thank you very much!

Answered by diegoferigo

nicholaspalomo · 2021-06-29T05:59:45Z
Author

** Update: **
This afternoon I was able to get multithreading working with ScenarIO + Ignition Gazebo in C++, resulting in a dramatic speed increase.

However, this has presented another issue. To debug, I tried running just two separate worlds/robots in parallel, where their execution happens on separate threads. When I insert my robot model + the ground_plane, as soon as the robot and the ground make contact, I get the following error:

ODE INTERNAL ERROR 1: assertion "dIN_RANGE(index, 0, getMeshTriangleCount())" failed in fetchMeshTriangle() [collision_trimesh_opcode.cpp:553]

Which I presume comes from DART?

To work around this, I tried two things:

I placed thread locks around calls to gazebo_->run() like so:

m_mutex.lock();
// Integrate the simulation for n_steps number of time steps
for(int i = 0; i < n_steps; i++) {
    callback();
    gazebo_->run();
}
pauseGazebo();
m_mutex.unlock();

where, here gazebo_ is a pointer to a scenario::gazebo::GazeboSimulator object.
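A minimal sketch of the same locking idea written with std::lock_guard, so the mutex is released even if the loop body throws. ToySimulator, guardedRun, runAllGuarded and the global counters are hypothetical names made up for this example; they are not ScenarIO APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a simulator whose run() must not overlap across threads.
struct ToySimulator {
    int steps = 0;
    void run() { ++steps; }
};

std::mutex g_runMutex;  // shared by every environment in the process
int g_active = 0;       // stepping loops currently inside the lock
int g_maxActive = 0;    // highest value g_active ever reached

// Step one simulator n_steps times while holding the shared lock.
void guardedRun(ToySimulator& sim, int n_steps) {
    assert(n_steps >= 0);
    std::lock_guard<std::mutex> lock(g_runMutex);  // released automatically at scope exit
    ++g_active;
    g_maxActive = std::max(g_maxActive, g_active);
    for (int i = 0; i < n_steps; ++i) {
        sim.run();
    }
    --g_active;
}

// Run one thread per simulator; the lock serializes the stepping loops.
void runAllGuarded(std::vector<ToySimulator>& sims, int n_steps) {
    std::vector<std::thread> workers;
    workers.reserve(sims.size());
    for (auto& sim : sims) {
        workers.emplace_back(guardedRun, std::ref(sim), n_steps);
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

Because the whole loop sits inside the lock, g_maxActive never exceeds 1: only one environment advances at a time, which is why this kind of locking gives up most of the parallel speedup.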

The downside to this approach is that, while it resolves the ODE internal error I was getting above, the simulation now runs about 5x slower than with the second option, which is:

I removed the ground_plane and ran the simulations in parallel with just the robot models. This is, however, a generally unacceptable solution since I need to detect when my robot strikes the ground. I haven't tried with any other models in the simulation besides just the robot and the ground, but I assume that they would trigger the same/a similar error to that which I mentioned above.

I noticed PR #340 in this repo - what work needs to be done on this/what is the status of this PR before using PyBullet with Ignition/ScenarIO?

Thanks again and sorry for the long read!

P.S. Unfortunately I am unable to share the entirety of my code. Sorry about that, but I very much appreciate the help/discussion.

diegoferigo · 2021-06-29T07:28:52Z
Maintainer

Hi @nicholaspalomo,

Thanks for opening this discussion! I remember that some information is scattered across old issues (including #12, #13, #15), but there's no centralized place where this topic is clearly discussed. I'll take advantage of this discussion to clarify the overall situation, which will definitely be useful to other users as well. I'll start with some of the choices developers have for implementing concurrent simulations, then list some directions worth exploring, and finally give a brief overview of what we're currently using.

As you might have noticed, the related issues and PRs are quite old; they all refer to the very early stage of this project. There have been no (public) updates since then. I want to keep this explanation as short as possible. The first two questions to answer are the following:

Should the concurrency be implemented in C++ or Python?
Should the concurrency be implemented with multiple threads or multiple processes?

As you might already know, or I hope will be clear soon, these two questions are tightly related.

Let's start with Python. Let's assume we use the Python bindings of ScenarIO, without any gym-ignition resources. Using threading, it is very simple to implement a setup that runs multiple concurrent simulations. However, if you implement and benchmark it, you'll notice that the threads don't execute the simulations in parallel. The culprit is the GIL, whose explanation is out of scope here; you just need to know that it is a Python "feature" that prevents CPU-intensive tasks from running in parallel. Our bindings are built with SWIG, which acquires the GIL before executing the code. As with pybind11, there are ways to avoid this, but they are untested, and in any case there are other problems, detailed below.

Discarding Python, let's try doing things in C++ (as in #15). Working in a lower-level language is much better, as you noticed, and it is indeed possible to fully exploit the hardware. With some basic knowledge of thread synchronization and async computation, it's fairly easy to draft a prototype that runs concurrent simulations, and it works well.

However, problems are just around the corner with C++ too. In fact, as mentioned in #12 (comment) and reported in gazebosim/gz-sim#18, DART uses ODE as its default contact detector, and ODE relies on singletons that get corrupted when there are multiple concurrent instances within the same process. This is the cause of the ODE assertion you experienced. In the past there have been attempts to use a static mutex (here be dragons), but it never worked reliably. Unfortunately, there's no way out.
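To make the singleton problem concrete, here is a toy sketch of the failure pattern. This is not ODE or DART code: MeshCache, ToyWorld and worldASurvivesInterleaving are hypothetical names, and the interleaving is written out sequentially so the outcome is deterministic. With real threads the same overwrite could happen at an arbitrary moment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy process-wide singleton, loosely mimicking a library-global cache.
// NOT ODE code; it only illustrates the failure pattern.
class MeshCache {
public:
    static MeshCache& instance() {
        static MeshCache cache;  // one object for the whole process
        return cache;
    }
    void load(std::size_t triangleCount) { triangles_.assign(triangleCount, 0); }
    bool inRange(std::size_t index) const { return index < triangles_.size(); }

private:
    MeshCache() = default;
    std::vector<int> triangles_;
};

// Hypothetical world that behaves as if it owned the cache.
class ToyWorld {
public:
    explicit ToyWorld(std::size_t triangleCount) : count_(triangleCount) {}

    void prepare() const { MeshCache::instance().load(count_); }

    // Checks the last triangle this world expects to exist.
    bool lastTriangleValid() const {
        assert(count_ > 0);
        return MeshCache::instance().inRange(count_ - 1);
    }

private:
    std::size_t count_;
};

// Returns whether world A still sees consistent data after world B touched the cache.
bool worldASurvivesInterleaving() {
    ToyWorld a(12);
    ToyWorld b(4);
    a.prepare();
    b.prepare();  // overwrites the shared cache behind A's back
    return a.lastTriangleValid();
}
```

A single world works fine on its own, but as soon as a second one touches the shared state, the first one's index check fails. That is the same flavor of error as an out-of-range triangle index assertion.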

Before jumping to new features and possible unexplored directions, I want to clarify a few more things. Users need to be aware that a single instance of Ignition Gazebo spawns multiple threads, whose number is roughly proportional to the number of loaded plugins plus a constant for the basic functionalities. My point is that it's not trivial to fully exploit the hardware: the number of threads used by a single instance is hard to know in advance, and a concurrent setup with many more threads than the machine can run in parallel would cause massive context switching that kills performance. Tuning the number of concurrent simulations is very important.

In this section I report the directions I believe are worth trying.

As I described above, a multithreaded setup with ODE as contact detector is not suitable. A multi-world simulation (an advanced feature of Ignition Gazebo) is not suitable either, because it is also implemented with multithreading. So:
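As a rough starting point for that tuning, the number of instances can be estimated from the hardware thread count. In this sketch, threadsPerInstance is an assumption you would have to measure for your own world and plugin set; maxConcurrentSimulations and detectHardwareThreads are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Estimate how many simulator instances fit on the machine without oversubscribing it.
// threadsPerInstance is an assumption: measure it for your own setup.
unsigned maxConcurrentSimulations(unsigned hardwareThreads, unsigned threadsPerInstance) {
    assert(threadsPerInstance > 0);
    return std::max(1u, hardwareThreads / threadsPerInstance);
}

// Number of hardware threads, falling back to 1 when the value is unknown.
unsigned detectHardwareThreads() {
    const unsigned n = std::thread::hardware_concurrency();  // may return 0 if unknown
    return n == 0 ? 1u : n;
}
```

For instance, maxConcurrentSimulations(detectHardwareThreads(), 4) would be a first guess if each instance were observed to use about four threads. Treat the result as a starting point for benchmarking, not as the optimum.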

The most obvious choice, if you want to keep ODE in the stack, is switching to multiple processes. Processes introduce an overhead over threads, but considering the weight of RL computations for robot learning, the overhead can be neglected. Just to be clear, by overhead I mean the process creation overhead and the need to use IPC for synchronization. Python has a great multiprocessing module that makes code look like multithreading. This is a great solution to bypass the GIL while keeping the ease of prototyping in Python.
This month, gazebosim/gz-sim#684 ("Set collision detector and solver from SDF") was merged upstream, targeting Edifice (our current nightly branch). It allows DART to use its bullet collision detector instead of ODE. I suppose that ScenarIO already supports it, since it is a feature that can be enabled in the SDF. Of course, having programmatic APIs would simplify the switch, but for testing purposes SDF is good enough. Note that the choice of detector affects the simulation, and deciding which one is more accurate is not trivial. However, this seems very promising to me for solving your problem if you want to keep using threads.
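For the first option, here is a minimal POSIX-only sketch of the multi-process idea in C++, using fork() for process creation and a pipe as the IPC channel. runToySimulation and runInProcesses are hypothetical names, and the "simulation" is a placeholder loop. A real setup would more likely rely on an existing framework or on Python's multiprocessing module, as mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical stand-in for a full rollout inside one simulator instance.
double runToySimulation(int envId, int steps) {
    double state = 0.0;
    for (int i = 0; i < steps; ++i) {
        state += envId;
    }
    return state;
}

// Launch one child process per environment and read one double back from each.
std::vector<double> runInProcesses(int numEnvs, int steps) {
    assert(numEnvs >= 0);
    std::vector<int> readFds;
    std::vector<pid_t> pids;
    for (int id = 0; id < numEnvs; ++id) {
        int fds[2];
        if (pipe(fds) != 0) break;
        const pid_t pid = fork();
        if (pid < 0) { close(fds[0]); close(fds[1]); break; }
        if (pid == 0) {  // child: private address space, so no shared singletons
            close(fds[0]);
            const double result = runToySimulation(id, steps);
            const ssize_t n = write(fds[1], &result, sizeof(result));
            _exit(n == static_cast<ssize_t>(sizeof(result)) ? 0 : 1);
        }
        close(fds[1]);  // parent keeps only the read end
        readFds.push_back(fds[0]);
        pids.push_back(pid);
    }

    std::vector<double> results;
    for (std::size_t i = 0; i < pids.size(); ++i) {
        double value = 0.0;
        const ssize_t n = read(readFds[i], &value, sizeof(value));
        close(readFds[i]);
        int status = 0;
        waitpid(pids[i], &status, 0);  // reap the child
        if (n == static_cast<ssize_t>(sizeof(value))) results.push_back(value);
    }
    return results;
}
```

Each child has its own copy of every global and static object, which is why process-level isolation sidesteps the singleton issue at the cost of process creation and IPC.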

To conclude this section, you mentioned #340. This works for simple enough simulations; however, for my personal research (legged locomotion) it is not yet ready. The bullet implementation (gazebosim/gz-physics#44) is not yet on par with DART, and some important features are still missing. You could try exploring this direction, but if you need contact-rich scenarios, I suspect that you'll quickly get stuck as well.

Finally, I want to quickly describe our setup. We had to go with multiprocessing since the bullet contact detector is a very new feature. After an extensive investigation, we found that the parallelization of environments is often done by the RL frameworks. Since the aim of ScenarIO and gym-ignition is developing environments, the most important thing was to make them compatible with most of the architectures implemented by the RL frameworks (e.g. MPI, multiprocessing, ...). This is the current status.

We selected ray/rllib for our experiments. The project is very active and has a huge community. They implemented massively parallel architectures that suit all needs. Also, the selection of algorithms and their configuration options is very complete. It allows very large horizontal scaling if you have enough hardware resources, and it also supports creating clusters of multiple machines (even though in my experience this is not often necessary). This is to say that if you carefully pick an RL framework that is compatible with gym.Env environments, you can avoid implementing concurrency on your own.

This became longer than I originally thought, but I think a proper explanation was necessary to clarify this topic, which is likely paramount for many users of this project.

10 replies

nicholaspalomo Jul 1, 2021
Author

(Also, for anyone else reading this, don't forget to make sure your IGN_GAZEBO_SYSTEM_PLUGIN_PATH is set correctly, i.e. it might be helpful to put:

export IGN_GAZEBO_SYSTEM_PLUGIN_PATH=$IGN_GAZEBO_SYSTEM_PLUGIN_PATH:/usr/local/lib/scenario/plugins/

in your .bashrc)

nicholaspalomo Jul 1, 2021
Author

@diegoferigo

Update:

I updated my source installation of Edifice (i.e. cd ~/workspace/src, vcs pull, and then rebuilding with colcon).
Updated the devel branch of my fork of gym-ignition to be level with yours in this repo.
Rebuilt the code for the ScenarIO C++ according to the instructions given in this repo (made sure to first source my colcon workspace where I built/installed Edifice).
Rebuilt my custom gym code which depends on ScenarIO.
Launched my training with 2 environments executed on different threads. For the world file, I used the one from gazebosim/gz-sim#684 ("Set collision detector and solver from SDF"), i.e.:

<?xml version="1.0" ?>
<!--
  Demo using custom physics options
-->
<sdf version="1.8">
  <world name="shapes">
    <physics name="1ms" type="ignored">
      <max_step_size>0.001</max_step_size>
      <real_time_factor>1.0</real_time_factor>
      <dart>
        <collision_detector>bullet</collision_detector>
        <solver>
          <solver_type>pgs</solver_type>
        </solver>
      </dart>
    </physics>

    <light type="directional" name="sun">
      <cast_shadows>true</cast_shadows>
      <pose>0 0 10 0 0 0</pose>
      <diffuse>0.8 0.8 0.8 1</diffuse>
      <specular>0.2 0.2 0.2 1</specular>
      <attenuation>
        <range>1000</range>
        <constant>0.9</constant>
        <linear>0.01</linear>
        <quadratic>0.001</quadratic>
      </attenuation>
      <direction>-0.5 0.1 -0.9</direction>
    </light>
  </world>
</sdf>

Now I have a new error:

double free or corruption (fasttop)
Segmentation fault (core dumped)

and the simulation crashes.
So perhaps we can say a new/different error is an improvement here. :)

diegoferigo Jul 1, 2021
Maintainer

I just tried switching the collision detector to bullet in the panda manipulation example (single world, single thread) and it seems to run fine, even if much slower than with ODE. Then, I developed a new test that simulates two parallel worlds with a bouncing ball, enabling the bullet collision detector (with both the pgs and dantzig solvers), and this setup also seems to run fine (#368).

I'm not sure what could cause the segfault in your case; I would not be surprised if it is unrelated to this feature. On the training setup for our simulations (a big and powerful machine) I remember recently seeing that error when the simulator closes, so maybe there's a memory management error somewhere in the Ignition stack. At which stage of the simulation does that error occur in your setup?

nicholaspalomo Jul 1, 2021
Author

Hi, @diegoferigo,

You're definitely right - I was calling gazebo_->run(/*paused=*/true) in between integrating the environment (stepping the environment X number of steps and then recording the observation, I mean) and policy updates which, I understand now, is unnecessary. That may have been causing my problem. Once I removed that, the seg fault went away and the multithreading using bullet for collision detection and no mutex.lock()/unlock() around calls to gazebo_->run() works for multiple parallel environments (I'm running 8 parallel worlds on my machine at the moment). I also turned on visualizing contacts in the Gazebo GUI and indeed the collision detection is working. Thank you very much again for the discussion. It would seem that for now this resolves my issue. :)

diegoferigo Jul 2, 2021
Maintainer

Awesome! I'm glad it works now 🚀 As said, DART with the bullet collision detector is slower than DART with the ODE collision detector. In your case, though, the switch enables running parallel simulations, and it's definitely better to have slow(er) but concurrent simulations than just one fast(er) instance.

Again, for those that reach this point without reading all the previous details, which are a lot, this is necessary because @nicholaspalomo is working in a single-process multi-threaded setup, in which the ODE collision detector segfaults. A multi-process setup should work with any combination of physics engines and collision detectors.
