Skip to content

A collection of optimized ray cast methods for 2D occupancy grids including the CDDT algorithm. Written in C++ and CUDA with Python wrappers.

License

Notifications You must be signed in to change notification settings

droemer7/range_libc

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

88 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RangeLibc

Foreword

This is a fork of the abandoned range_libc repo that has been converted to a ROS package to better align with my use case. Some corrections to the range_libc code base were necessary to make this work with ROS C++, and some additional refactoring was done to clean up the implementation. Since making these changes, the CDDT implementation has been checked for correctness in C++ and Python 3, and Bresenham's Line and Ray Marching have been similarly checked in Python 3. The Ray Marching GPU implementation has not been tested in any form.

Overview

This library provides for different implementations of 2D raycasting for 2D occupancy grids, including the Compressed Directional Distance Transform (CDDT) algorithm as proposed in this publication. The code is written and optimized in C++, and Python wrappers are also provided.

Building the Code

This fork has been tested on Ubuntu 18.04 and 20.04 using gcc with C++14. The original author has tested it on Ubuntu 14.04, OSX 10.10, and Ubuntu 16.06, compiling with C++11.

C++ code

Again, it is assumed you are using ROS. If not, you will need to pull the headers and modify the CMakeLists.txt to suit your needs.

roscd
cd ../src
git clone https://github.com/droemer7/range_libc

Include range_libc/RangeLib.h in your code to use the raycasters, then invoke catkin_make to build. For optimization and assuming you are not distributing binaries, use -DCMAKE_CXX_FLAGS=-march=native -O3. The original author of range_libc also used -ffast-math and -fno-math-errno but I have not seen any improvement with these.

Python Wrappers

This fork has updated the original repo to compile on systems that do not have python 2 (such as Ubuntu 20.04). If you have both versions and you want to build the wrappers for a specific version of python, replace python below with the version you want to use (e.g. python3).

You will also need to install Cython if you do not already have it on your system.

# clone the repository
git clone https://github.com/droemer7/range_libc
cd range_libc_dist/pywrapper
# for an in place build, do this:
python setup.py build_ext --inplace
# for a system wide install, do this:
python setup.py install
# to compile with the GPU kernels, do this:
WITH_CUDA=ON python setup.py install
# this should take a few seconds to run
python test.py

License

This code is licensed under Apache 2.0. Copyright 2017 Corey H. Walsh.

You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Enjoy!

Cite

This library accompanies the following publication.

@article{walsh17,
    author = {Corey Walsh and 
              Sertac Karaman},
    title  = {CDDT: Fast Approximate 2D Ray Casting for Accelerated Localization},
    volume = {abs/1705.01167},
    url    = {http://arxiv.org/abs/1705.01167},
    year   = {2017}}

Code structure

range_libc_dist/
├── build
│   └── bin          # this is where compiled binaries will be placed
├── CMakeLists.txt   # compilation rules - includes, etc
├── includes
│   ├── lru_cache.h  # implementation of LRU_cache, optionally used
│   ├── RangeLib.h   # main RangeLib source code
│   ├── CudaRangeLib.h # cuda function header file
│   ├── kernels.cu   # cuda kernels for super fast 2D ray casting
│   └── RangeUtils.h # various utility functions
├── license.txt
├── main.cpp         # example c++ usage and simple benchmarks
├── make_plots.py    # turns fine-grained benchmark information into violin plots
├── tmp/             # make this directory for saving fine-grained timing information
├── maps             # example PNG maps
│   └── [various .png files]
├── pywrapper
│   ├── RangeLibc.pyx # wrapper file for using RangeLib from Python
│   ├── setup.py     # compilation rules for Cython
│   └── test.py      # example Python usage
├── README.md
└── vendor           # various dependencies, see in here for licenses
    ├── distance_transform.h # for computing euclidean distance transform
    ├── gflags       # command line flag library from Google
    └── lodepng      # simple PNG loading/saving library

RangeLibc Algorithms Overview

Range Method Performance Comparison

The above benchmarks were performed on an NVIDIA Jetson TX1 with a single thread. For a better treatment, see the paper associated with this library: http://arxiv.org/abs/1705.01167

Bresenham's Line (BL)

Bresenham's line algorithm [1] is one of the most widely used methods for two dimensional ray casting in occupancy grids. The algorithm incrementally determines the set of pixels that approximate the trajectory of a query ray starting from the query point (x,y){query} and progressing in the theta{query} direction one pixel at a time. The algorithm terminates once the nearest occupied pixel is discovered, and the euclidean distance between that occupied pixel and (x,y)_{query} is reported. This algorithm is widely implemented in particle filters due to its simplicity and ability to operate on a dynamic map. The primary disadvantage is that it is slow, potentially requiring hundreds of memory accesses for a single ray cast. While average performance is highly environment dependent, Bresenham's Line algorithm is linear in map size in the worst case.

Ray Marching (RM/RMGPU)

Ray marching [2] is a well known algorithm, frequently used to accelerate fractal or volumetric graphical rendering applications. The basic idea can be understood very intuitively. Imagine that you are in an unknown environment, with a blindfold on. If an oracle tells you the distance to the nearest obstacle, you can surely move in any direction by at most that distance without colliding with any obstacle. By applying this concept recursively, one can step along a particular ray by the minimum distance to the nearest obstacle until colliding with some obstacle. The following figure demonstrates this idea graphically (from [3]).

Ray Marching

In the occupancy grid world, it is possible to precompute the distance to the nearest obstacle for every discrete state in the grid via the euclidean distance transform.

This method is implemented both for the CPU and GPU in RangeLibc. The GPU implementation is the fastest available method for large batches of queries.

Pseudocode:

# compute the distance transform of the map
def precomputation(omap):
	distance_transform = euclidean_dt(omap)

# step along the (x,y,theta) ray until colliding with an obstacle
def calc_range(x,y,theta):
	t = 0.0
	coeff = 0.99
	while t < max_range:
		px, py = x + cos(theta) * t, y + sin(theta) * t

		if px or py out of map bounds:
			return max_range

		dist = distance_transform[px,py]
		if dist == 0.0:
			return sqrt((x - px)^2 + (y - py)^2)

		t += max(dist*coeff, 1.0)

Analysis

Precomputation: O(|theta_discretization|*|edge pixels in occupancy grid|+|theta_discretization|*|occupied pixels|*log(|occupied pixels|)) Pruning: O(|map width|*|map height|*|theta_discretization|*log(min(|occupied pixels|, longest map dimension))) Calc range: O(log(min(|occupied pixels|, longest map dimension))) Memory: O(|theta_discretization|*|edge pixels|) - in practice much smaller, due to pruning

Pros

  • Fast calc_range on average
  • Space efficent
  • Easy to implement, and easy to implement on a GPU
  • Fairly fast to compute distance transform
  • Extends easily to 3D

Cons

  • Poor worst case performance - degenerate case similar to Bresenham's line
  • High degree of control flow divergence (for parallelization)
  • Not great for incrementally changing the map

Compressed Directional Distance Transform (CDDT/PCDDT) (ours)

The Compressed Directional Distance Transform (CDDT) algorithm uses a compressed data structure to represent map geometry in a way which allows for fast queries. An optional pruning step removes unneeded elements in the data structure for slightly faster operation (PCDDT). For a full description of the algorithm, see the associated paper: http://arxiv.org/abs/1705.01167

Pseudocode:

# for the given theta, determine a translation that will ensure the 
# y coordinate of every pixel in the rotated map will be positive
def y_offset(theta):
	pass

# give the range of y coordinates that the pixel overlaps with
def y_bounds(pixel):
	return range(min(pixel.corners.y), max(pixel.corners.y))

# build the CDDT datastructure
def precomputation(omap):
	# prune any unimportant geometry from the map
	edgeMap = morphological_edge_transform(omap)

	# build the empty LUT data structure
	compressed_lut = []
	for each theta in |theta_discretization|:
		projection_lut = []
		for each i in range(lut_widths[theta]):
			projection_lut.append([])
		compressed_lut.append(projection_lut)

	# populate the LUT data structure
	for each theta in |theta_discretization|:
		for each occupied pixel (x,y) in omap:
			pixel.rotate(theta)
			pixel.translate(y_offset(theta))
			lut_indices = y_bounds(pixel)
			
			for each index in lut_indices:
				compressed_lut[theta][index].append(pixel.center.x)

	# sort each LUT bin for faster access via binary search
	for each theta in |theta_discretization|:
		for each i in compressed_lut[theta].size():
			sort(compressed_lut[theta][i])

# (optional) remove unused entries from the LUT to save space
# highly recommended for static maps
def prune():
	# build an empty table of sets to keep track of which
	# indices in the CDDT data structure are used
	collision_table = []
	for theta in range(theta_discretization):
		collision_row = []
		for i in range(compressed_lut[theta].size()):
			collision_row.append(set())
		collision_table.append(collision_row)

	# ray cast from every possible (x,y,theta) state, keeping track
	# of which LUT entries are used
	for x in range(omap.width):
		for y in range(omap.height):
			for theta in range(theta_discretization):
				# keep track of which object in the LUT is found to collide
				# with the following ray cast query
				calc_range(x,y,theta) implies (lut_bin, lut_bin_index)
				collision_table[theta][lut_bin].add(lut_bin_index)

	# remove any element of the LUT that is not in the collision table
	for theta in range(theta_discretization):
		for i in range(compressed_lut[theta].size()):
			new_lut_bin = []
			for obstacle in compressed_lut[theta][i]:
				if obstacle in collision_table:
					new_lut_bin.append(obstacle)
				else: continue
			compressed_lut[theta][i] = new_lut_bin

# compute the distance to the nearest obstacle in the (x,y,theta) direction
def calc_range(x,y,theta):
	angle_index, discrete_angle, flipped_search = discretize_theta(theta)
	lut_x, lut_y = rotate(x, y, discrete_angle)

	if omap.occupied(x,y):
		return 0.0

	lut_bin = compressed_lut[angle_index][(int)lut_y]
	if flipped_search:
		nearest_obstacle_x = lut_bin.next_lesser_element(lut_x)
	else:
		nearest_obstacle_x = lut_bin.next_greater_element(lut_x)
	
	distance = abs(nearest_obstacle_x - lut_x)
	return distance

Analysis

Precomputation: O(|width|*|height|) for 2D grid. In general O(dk) where d is the dimensionality of the grid, and k is the number of grid locations. Calc range: worst case O(|longest map dimension|), on average much faster (close to logarithmic performance in scene size) Memory: O(|width|*|height|) for 2D grid. In general O(k) where k is the number of grid locations.

Pros

  • Fast calc_range, in practice nearly constant time
  • Radial symmetry optimizations can provide additional speed in the right context
  • Potential for online incremental compressed LUT modification for use in SLAM (would need to store additional metadata)
  • Space efficent
  • Fast construction time

Cons

  • Slow pruning time (optional)
  • Approximate due to the discrete theta space
  • Can be difficult to implement well
  • Curse of dimensionality in higher dimensions

Giant Lookup Table (GLT)

Precompute distances for all possible (x,y,theta) states and store the results in a big table for fast lookup.

Pseudocode:

# For every (x,y,theta) in a predefined grid, use Besenham's line or 
# ray maching to build the table
def precomputation(omap):
	giant_LUT[width][height][theta_discretization] = -1
	for x in range(omap.width):
		for y in range(omap.height):
			for theta in range(theta_discretization):
				giant_LUT[x][y][theta] = calc_range(x,y,theta)

# simply read from the table
# note: interpolation between the two closest discrete 
#       thetas would be more accurate but slower
def calc_range(x,y,theta):
	return giant_LUT[int(x), int(y), discrete(theta)]

Analysis

Precomputation: O(|theta_discretization|*|width|*|height|*O(calc_range)) Memory: O(|theta_discretization|*|width|*|height|) Calc range: O(1)

Pros

  • Very fast calc_range
  • Easy to implement

Cons

  • Very slow construction time
  • Approximate due to the discrete theta space
  • Very large memory requirement
  • Curse of dimensionality in higher dimensions

References

  1. J. Bresenham. "Algorithm for Computer Control of a Digital Plotter," IBM Systems Journal, vol. 4, no. 1, pp. 25-30, 1965.
  2. K. Perlin and E. M. Hoffert. "Hypertexture," Computer Graphics, vol 23, no. 3, pp. 297-306, 1989.
  3. M. Pharr, and R. Fernando. "Chapter 8. Per-Pixel Displacement Mapping with Distance Functions" in GPU gems 2: Programming techniques for high-performance graphics and general-purpose computation, 3rd ed. United States: Addison-Wesley Educational Publishers, 2005.

About

A collection of optimized ray cast methods for 2D occupancy grids including the CDDT algorithm. Written in C++ and CUDA with Python wrappers.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 66.3%
  • Cython 15.1%
  • Cuda 9.2%
  • Python 8.7%
  • Other 0.7%