Particle Filter Based Robot Localization
In this assignment, the goal was to implement a particle filter for a lost robot equipped with internal odometry and a laser range finder. The robot was known to be located within a given map (Wean Hall at Carnegie Mellon University). The well-known Monte Carlo Localization approach was implemented. The major components of this approach were:
- The Motion Model
- Ray-Casting
- The Sensor Model
- Resampling after Correction Step
Motion Model
The motion model implemented was drawn from Algorithm 5.6 in Probabilistic Robotics by Sebastian Thrun; the algorithm is shown below. This motion model function takes as a control input a pair of poses, one from time t-1 and the other from time t, expressed in the robot’s internal odometry frame as indicated by the bar above the x and y terms. The second argument to the function is the robot’s estimated pose at time t-1 expressed in the world coordinate system.
The input pair of poses in the robot’s coordinate frame was first decomposed into three motions:
- Rotation to align the robot heading towards the second pose
- Translation to the second pose
- Rotation to the final heading at the second pose
Noise sampled from a zero-mean Gaussian distribution was added to each of these decomposed motions to form the final terms that perform the motion model update from the odometry input. The noise magnitudes were scaled by tunable coefficients so that differing uncertainties in translation and rotation could be captured accurately.
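A minimal sketch of this sampling step, structured after Algorithm 5.6, is shown below; the noise coefficients and function names are illustrative rather than the exact ones used in the assignment.

```python
import numpy as np

# Tunable noise coefficients (illustrative values; they trade off rotational vs. translational uncertainty)
A1, A2, A3, A4 = 0.01, 0.01, 0.05, 0.05

def sample_motion_model_odometry(u, x_prev):
    """Sample a new particle pose given the odometry pair u = (pose_bar_prev, pose_bar_curr)
    in the robot's internal frame and the particle's previous world-frame pose x_prev."""
    (xb, yb, tb), (xb2, yb2, tb2) = u
    x, y, theta = x_prev

    # Decompose the odometry increment into rotation - translation - rotation
    d_rot1 = np.arctan2(yb2 - yb, xb2 - xb) - tb
    d_trans = np.hypot(xb2 - xb, yb2 - yb)
    d_rot2 = tb2 - tb - d_rot1

    # Perturb each component with zero-mean Gaussian noise whose spread grows with the motion
    d_rot1_hat = d_rot1 - np.random.normal(0.0, np.sqrt(A1 * d_rot1**2 + A2 * d_trans**2))
    d_trans_hat = d_trans - np.random.normal(0.0, np.sqrt(A3 * d_trans**2 + A4 * (d_rot1**2 + d_rot2**2)))
    d_rot2_hat = d_rot2 - np.random.normal(0.0, np.sqrt(A1 * d_rot2**2 + A2 * d_trans**2))

    # Apply the noisy motion in the world frame
    return np.array([x + d_trans_hat * np.cos(theta + d_rot1_hat),
                     y + d_trans_hat * np.sin(theta + d_rot1_hat),
                     theta + d_rot1_hat + d_rot2_hat])
```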
Ray-Casting
The ray-casting implementation was based on the Digital Differential Analyzer (DDA) method. It is essentially a line-drawing algorithm used to trace lines through a grid. In a nutshell, the algorithm propagates a straight line originating at the location of the robot and oriented along each of the angular increments specified by the resolution of the laser range finder. The grid cells intersected by each of these straight lines are recorded and the occupancy of each intersected cell is checked. The first cell that reports occupied along a particular ray is used to compute the distance measurement. The propagation within the occupancy grid and the intersections are shown below.
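A simplified sketch of a DDA-style ray cast over an occupancy grid is shown below; the grid indexing convention, occupancy threshold and parameter names are assumptions for illustration.

```python
import numpy as np

def ray_cast_dda(occ_grid, x0, y0, theta, max_range, resolution, occ_thresh=0.5):
    """DDA-style ray cast: advance roughly one grid cell per step along the dominant axis
    and return the distance (world units) to the first occupied cell."""
    dx, dy = np.cos(theta), np.sin(theta)
    scale = max(abs(dx), abs(dy))
    step_x, step_y = dx / scale, dy / scale          # at most one cell per step
    cx, cy = x0 / resolution, y0 / resolution        # current position in cell units
    n_steps = int(max_range * scale / resolution)
    for _ in range(n_steps):
        cx += step_x
        cy += step_y
        col, row = int(cx), int(cy)
        if not (0 <= row < occ_grid.shape[0] and 0 <= col < occ_grid.shape[1]):
            break                                    # ray left the map
        if occ_grid[row, col] > occ_thresh:          # first occupied cell along this ray
            return np.hypot(cx - x0 / resolution, cy - y0 / resolution) * resolution
    return max_range                                 # no hit within the sensor range
```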
The ray-cast function was then tested in a circular room and a square room, and the measured distances matched the actual distances up to the distance-per-pixel resolution of the occupancy grid.
Sensor Model (Likelihood Estimation)
Once the measured distances along each ray were obtained for a single scan, a likelihood estimate was assigned to each measurement. This was done by combining, for each measurement, the likelihoods given by four different distributions:
- Gaussian Distribution – This distribution assigns a high likelihood to a measurement that matches closely with the ray-cast
- Exponential Distribution – This distribution assigns the highest likelihood to near-zero measurements and decays gradually up to the ray-cast distance
- Uniform Distribution Max – This distribution assigns the highest likelihood to a small interval around the maximum possible measurement
- Uniform Distribution Random – This distribution assigns a low uniform likelihood to the entire range of measurements
The actual likelihood is a linear combination of the individual likelihoods computed using the above distributions (a sketch of this mixture is given after the list below). During robot motion, there are many situations where a measurement may not match the ray-cast but still emanate from a valid pose. Some of these cases are:
- The robot may be at the correct estimated location but there may be dynamic obstacles in the way leading to shorter than expected measurements. This scenario is accounted for by the exponential distribution.
- The robot may be at the correct estimated location, but its orientation might be off by a slight margin, so that a beam may, for example, propagate past a wall edge instead of striking it, leading to a larger than expected measurement. Often this produces a reading at the maximum of the sensor’s range. This scenario is accounted for by the uniform distribution for maximum measurements.
- Finally, the “likelihood” that the robot is anywhere on the map is non-zero; any position on the map could technically be the correct one, since all we are doing is obtaining an estimate. This is accounted for by the uniform distribution over the entire range.
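As an illustration, a sketch of the resulting mixture likelihood for a single beam is given below; the mixture weights, spreads and the maximum range value are hypothetical tuning parameters, not the values used in the assignment.

```python
import numpy as np

# Mixture weights and spreads (hypothetical; in practice these are tuned)
Z_HIT, Z_SHORT, Z_MAX, Z_RAND = 0.7, 0.1, 0.1, 0.1
SIGMA_HIT, LAMBDA_SHORT, MAX_RANGE = 50.0, 0.01, 8000.0

def beam_likelihood(z, z_star):
    """Likelihood of one measured range z given the ray-cast range z_star,
    combining the four densities described above (normalizers omitted for brevity)."""
    # Gaussian centered on the expected (ray-cast) range
    p_hit = np.exp(-0.5 * ((z - z_star) / SIGMA_HIT) ** 2) / (SIGMA_HIT * np.sqrt(2 * np.pi))
    # Exponential decay for unexpectedly short readings (e.g. dynamic obstacles)
    p_short = LAMBDA_SHORT * np.exp(-LAMBDA_SHORT * z) if z <= z_star else 0.0
    # Spike at the maximum range (missed returns)
    p_max = 1.0 if z >= MAX_RANGE else 0.0
    # Small uniform probability over the full measurement range
    p_rand = 1.0 / MAX_RANGE
    return Z_HIT * p_hit + Z_SHORT * p_short + Z_MAX * p_max + Z_RAND * p_rand
```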
The net distribution obtained from the densities above leads to something that resembles the image below:
Resampling after Correction | Low Variance Sampling
The final piece of the puzzle was the low-variance sampler, implemented from Table 4.4 of Probabilistic Robotics by Sebastian Thrun and shown below:
The goal of this algorithm is to sample from the existing particles such that, at the end, we are left with a distribution that places more emphasis on those particles that had higher weights in the previous iteration. The algorithm achieves this by repeatedly picking samples with higher weights while rarely picking, or sometimes entirely skipping, particles with lower weights. To see why this works, consider the diagram below, which was created based on an explanation by user Robert Sutton on Robotics Stack Exchange.
To begin with, let all the particles be represented along the perimeter of a circle, with arc lengths corresponding to their weights, so w4 with the lowest weight is the shortest while w2 with the highest weight is the longest. Say we start with r = 0.1 in the above algorithm. In that case, U0 is initialized to 0.1 as well and C0 = w1. Since U0 is less than C0, x1 (the state corresponding to the weight w1) is added to the new sample set, and U0 is then updated to U1. Now U1 is greater than C0, so the counter i and the cumulative weight C are incremented; C0 becomes C1, the inner loop condition breaks again, and x2 (corresponding to w2) is added to the sample set. U1 is then incremented to U2, which is still lower than C1, so x2 is sampled again. This is what we want, since x2 has the highest weight and its count should reflect its importance. U2 is incremented to U3, which is higher than C1, so i and C are incremented again to give C2 and x3 is sampled. U is incremented once more to U4; this time, even after C2 is incremented to C3 and C4, the cumulative weight is still lower than U4. Only at C5 does the cumulative weight exceed U4, which causes x6 to be added to the sample set. Here, the algorithm skipped over x4 and x5, which is exactly what we want since they carry little weight. This continues until U is just shy of 1. The table above the circle shows the change in variables and the corresponding additions to the sample set.
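For reference, a compact sketch of the low-variance sampler described above (following Table 4.4) might look like this:

```python
import numpy as np

def low_variance_resample(particles, weights):
    """Low-variance resampling: draw M equally spaced pointers with a single random
    offset and walk them along the cumulative particle weights."""
    M = len(particles)
    weights = np.asarray(weights) / np.sum(weights)   # normalize the weights
    new_particles = []
    r = np.random.uniform(0.0, 1.0 / M)               # single random start offset
    c = weights[0]                                     # running cumulative weight
    i = 0
    for m in range(M):
        U = r + m / M                                  # m-th equally spaced pointer
        while U > c and i < M - 1:                     # advance until the cumulative
            i += 1                                     # weight covers the pointer
            c += weights[i]
        new_particles.append(particles[i])             # heavy particles get picked repeatedly
    return new_particles
```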
Performance
The ray-casting operation was the most computationally expensive. The DDA algorithm is extremely precise down to the resolution of the pixel map; however, this may be undesirable given its impact on computation. Perhaps a different algorithm, such as Bresenham’s line algorithm, could be used instead despite its lower accuracy.
Some of the trial runs of the algorithm on a dataset captured by a robot in a corridor in Wean Hall are shown below.
The initial particle set was spread randomly across the free space of the corridors. Over time, as additional sensor measurements arrive, the estimate is gradually improved and the low-variance sampler discards poor representations of the state.
Extended Kalman Filter based Simultaneous Localization and Mapping
The goal in this assignment was to implement a simultaneous localization and mapping (SLAM) system using the extended Kalman filter. To this effect, the provided robot had its state represented by a vector pt = [xt, yt, θt] at a time instant t. The control inputs to this model were given by a translation dt and a subsequent rotation αt at time t. The robot was also equipped with a laser sensor that detects landmarks and returns the range and heading angle from the pose of the robot to the landmark. The following relations were derived in order to construct the SLAM system. In particular:
- The motion model that takes the translation and rotation to give the updated state.
- The projection of the noise from the previous state to the next, obtained by linearizing the non-linear motion model and adding control noise.
- Transformation of range and heading laser measurement to global landmark locations including projection of noise terms.
- Inverse transformation from landmark locations to laser range and heading measurements.
- Noise projection Jacobian of previous transformation with respect to robot pose and landmark location.
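For reference, under the stated translate-then-rotate convention the motion model and its pose Jacobian take the standard form below, where R_t denotes the control noise mapped into state space (notation mine):

```latex
p_{t+1} =
\begin{bmatrix} x_t + d_t\cos\theta_t \\ y_t + d_t\sin\theta_t \\ \theta_t + \alpha_t \end{bmatrix},
\qquad
G_t = \frac{\partial p_{t+1}}{\partial p_t} =
\begin{bmatrix} 1 & 0 & -d_t\sin\theta_t \\ 0 & 1 & d_t\cos\theta_t \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\bar{\Sigma}_{t+1} = G_t \Sigma_t G_t^{\top} + R_t
```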
The available input data were a series of range and heading measurements as well as control inputs d and α for a series of time steps. The robot was initialized at the origin with zero covariance. The system state vector included the pose of the robot as well as the estimated locations of all the landmarks. For a given control input, the pose of the robot in the state vector was updated, and the control noise was added to the covariance matrix of the updated pose along with the noise projected from the previous state. This was the prediction step. Following this, each received range and heading measurement was used to compute the measurement Jacobian and the Kalman gain matrix. This gain was then used to update the state vector based on the difference between the expected and actual measurements. The gain matrix and measurement Jacobian were also used to project the noise terms into the next update.
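A sketch of this correction step for a single landmark observation is given below; the state layout, the measurement ordering z = [range, bearing] and the function names are assumptions for illustration.

```python
import numpy as np

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def ekf_update_landmark(mu, Sigma, z, k, Q):
    """EKF correction for one range/bearing measurement z = [r, b] of landmark k,
    with state mu = [x, y, theta, l1x, l1y, ...] and 2x2 measurement noise Q."""
    x, y, theta = mu[0], mu[1], mu[2]
    j = 3 + 2 * k                               # index of landmark k in the state vector
    dx, dy = mu[j] - x, mu[j + 1] - y
    q = dx * dx + dy * dy
    sq = np.sqrt(q)
    z_hat = np.array([sq, wrap(np.arctan2(dy, dx) - theta)])   # expected measurement

    # Measurement Jacobian: only the pose and landmark-k columns are non-zero
    H = np.zeros((2, mu.size))
    H[:, 0:3] = np.array([[-dx / sq, -dy / sq, 0.0],
                          [ dy / q,  -dx / q, -1.0]])
    H[:, j:j + 2] = np.array([[dx / sq, dy / sq],
                              [-dy / q, dx / q]])

    S = H @ Sigma @ H.T + Q                     # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)          # Kalman gain
    innov = z - z_hat
    innov[1] = wrap(innov[1])                   # keep the bearing innovation wrapped
    mu_new = mu + K @ innov
    Sigma_new = (np.eye(mu.size) - K @ H) @ Sigma
    return mu_new, Sigma_new
```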
In the figure above, the blue marker indicates the position of the robot while tracing the trajectory along this hexagonal path. The magenta ellipses indicate the uncertainty in the robot’s position after the prediction step. The blue ellipse indicates the uncertainty in the robot’s position after the update has been applied using the landmark measurements. The final landmark ground truth locations have also been plotted on the trajectory and are shown located in the centers of the red ellipses.
Solver Implementation for 2D SLAM
The goal of this assignment was to implement and test the performance of several different 2D linear SLAM solvers. The dataset that was provided included odometry and landmark measurements for state estimation. These measurements were used to construct a linear system of the form Ax=b, where A was a sparse non-square matrix.
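Schematically, each odometry measurement o_t and each landmark observation m_{t,k} contributes one block row to this system; the symbols below are my own shorthand, and in practice the rows are also whitened by the measurement covariances:

```latex
p_{t+1} - p_t = o_t + \varepsilon_o,
\qquad
l_k - p_t = m_{t,k} + \varepsilon_l,
\qquad
x = \bigl[p_0^\top,\dots,p_T^\top,\; l_1^\top,\dots,l_K^\top\bigr]^\top
\;\Rightarrow\; A x = b
```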
The following types of solvers were considered for implementation:
- Pseudo-Inverse
- LU-Factorization
- QR-Factorization
- LU Factorization with COLAMD Re-ordering
- QR Factorization with COLAMD Re-ordering
Each of these solvers was used to determine the state estimate and characterize the performance. The linear trajectory ground truth and the state estimate solved using the pseudo-inverse are shown below:
The Pseudo-Inverse approach took about 1.9 to 2 seconds to solve on average, because matrix inversion is a computationally intensive operation. The factored matrix visualizations using LU-Factorization and LU-Factorization with COLAMD re-ordering are shown below.
The LU-Factorization approach takes about 0.06 seconds to run, while the solver with COLAMD re-ordering takes about 0.08 seconds. The solution time is closely tied to the gain, if any, from re-ordering: if the time taken for matrix re-ordering is substantial without a large gain in the sparsity of the factors, then the apparent advantage of exploiting sparsity is outweighed by the time spent re-ordering the columns.
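As a rough illustration of how the orderings were compared, a sketch using SciPy is shown below; it assumes the normal-equations route and is not necessarily the exact implementation used.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_normal_equations(A, b, permc_spec="COLAMD"):
    """Solve the least-squares problem min ||Ax - b|| by LU-factorizing the normal
    equations A^T A x = A^T b. permc_spec selects the column ordering
    ('NATURAL' = no re-ordering, 'COLAMD' = approximate minimum degree)."""
    A = sp.csc_matrix(A)
    N = (A.T @ A).tocsc()                    # square, symmetric normal matrix
    rhs = A.T @ b
    lu = spla.splu(N, permc_spec=permc_spec)
    return lu.solve(rhs)

# Usage: x_plain  = solve_normal_equations(A, b, "NATURAL")
#        x_colamd = solve_normal_equations(A, b, "COLAMD")
```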
The factor visualizations using QR-factorization and QR-factorization with COLAMD are shown below:
In the case of the QR solver, the version without re-ordering took about 0.35 seconds to solve, while the version with COLAMD re-ordering took about 0.27 seconds on average. This implies that the time saved during factorization and solution outweighed the time spent re-ordering the matrix columns.
The solved trajectories are identical to the one from the pseudo-inverse solution, so they are not all repeated here. A separate looped trajectory was also solved using the linear system. Both trajectories are shown below:
The same problem was then re-formulated to include a non-linear bearing-angle component, whose measurement Jacobian was computed. For the non-linear case, a linearized system of equations had to be constructed. The linearization point was chosen to be the initial trajectory obtained by simply chaining the odometry and landmark measurements through the state update. The figures below show the solution approach for the non-linear system: on the left is the trajectory after initialization but before optimization, and the figure on the right shows the trajectory after optimization. The solver used was QR factorization with COLAMD re-ordering.
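Writing Δx = l_x − p_x, Δy = l_y − p_y and q = Δx² + Δy², and assuming poses and landmarks are 2D positions as in the linear formulation, the bearing measurement and its Jacobian take the standard form:

```latex
\theta = \operatorname{atan2}(\Delta y, \Delta x),
\qquad
\frac{\partial \theta}{\partial (p_x,\, p_y,\, l_x,\, l_y)}
= \begin{bmatrix} \dfrac{\Delta y}{q} & -\dfrac{\Delta x}{q} & -\dfrac{\Delta y}{q} & \dfrac{\Delta x}{q} \end{bmatrix}
```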
Dense SLAM with Point based Fusion
This section of the assignment involved implementing a dense SLAM system. The system comprised two major components: an Iterative Closest Point (ICP) based algorithm for camera localization and a point-based fusion system for dense mapping. The dense map was stored in a data structure containing the 3D points, normals, colors and weights corresponding to each “pixel” of the input frame. The input source frames were fed from a dataset of RGB-D images.
Iterative Closest Point
This subsystem returns the optimal transformation between two point sets, the source and target point maps, by minimizing a loss function. The algorithm computes the transformation by:
- Projective Point Correspondence – Determining corresponding points between the two maps
- Linear System Setup – Parameterizing the transformation between the two frames
- Solution of the Linear System
Projective Point Correspondence
In order to obtain the transformation between two frames, correspondences need to be established between image coordinates in the two frames that represent the same point in the world. One of the simplest ways to implement this is the projective point correspondence method. In this method, given a known transformation between the source and target frames and the image coordinates from the source frame, we can transform the point (u1, v1) in the source frame into the coordinates (u2, v2) in the target frame. If the transformation is exact, then the image coordinates (u1, v1) and (u2, v2) represent the same 3D world point. However, since projective correspondences are initially obtained with an initial “guess” transformation, usually the identity, and then optimized, there can be many erroneous projections. These correspondences are therefore filtered based on a distance threshold, and the image normals at the corresponding points in the source and target frames are filtered based on an angle threshold. A basic assumption here is that, for a small transformation between frames, the actual image coordinate locations in the next frame should not have moved more than a certain Euclidean distance from the previous image coordinates; moreover, the normals at those points shouldn’t deviate by more than a certain angle either.
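A sketch of this association step is shown below; the camera intrinsics K, the vertex/normal map layouts and the threshold values are assumptions for illustration.

```python
import numpy as np

def projective_correspondences(src_pts, src_normals, T, tgt_vertex_map, tgt_normal_map, K,
                               dist_thresh=0.07, angle_thresh=np.deg2rad(20)):
    """Projective data association: transform source points into the target frame,
    project them with the pinhole model, and keep pairs passing the distance and
    normal-angle tests. src_pts/src_normals are (N, 3); T is the 4x4 guess; K is 3x3."""
    R = T[:3, :3]
    p = (R @ src_pts.T).T + T[:3, 3]          # source points expressed in the target frame
    n_src = (R @ src_normals.T).T             # source normals rotated into the target frame

    h, w = tgt_vertex_map.shape[:2]
    matches = []
    for i in range(len(p)):
        if p[i, 2] <= 0:
            continue                          # behind the target camera
        u = int(round(K[0, 0] * p[i, 0] / p[i, 2] + K[0, 2]))
        v = int(round(K[1, 1] * p[i, 1] / p[i, 2] + K[1, 2]))
        if not (0 <= u < w and 0 <= v < h):
            continue                          # projects outside the target image
        q = tgt_vertex_map[v, u]              # candidate corresponding 3D point
        n_tgt = tgt_normal_map[v, u]
        if np.linalg.norm(q - p[i]) > dist_thresh:
            continue                          # corresponding points too far apart
        if np.dot(n_src[i], n_tgt) < np.cos(angle_thresh):
            continue                          # normals disagree by more than the threshold
        matches.append((i, (v, u)))
    return matches
```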
Linearization of the System
The loss function used for optimization is the point-to-plane error metric. This is a popular choice in which not just the Euclidean distance but also the normal information at the target point is taken into account. The point-to-point error metric is the alternative, where the distance between the two points alone gives the loss. In the case of the point-to-plane metric, the vector between the two points is projected onto the normal at the target point, and this projected distance is what is optimized. The point-to-plane error metric and its geometric interpretation are shown below.
Since it is hard to solve directly for R, we make the small-angle assumption and parametrize the rotation and translation as shown below.
With the given parametrization, the point-to-plane error metric can be re-written as shown below and then re-arranged into a linear system.
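For reference, with ω = (α, β, γ) the standard small-angle derivation of the per-correspondence row is (the notation may differ from the assignment hand-out):

```latex
R \approx I + [\omega]_\times
\;\Rightarrow\;
n_i^{\top}\!\left(Rp_i + t - q_i\right)
\approx
\underbrace{\begin{bmatrix}(p_i \times n_i)^{\top} & n_i^{\top}\end{bmatrix}}_{A_i}
\begin{bmatrix}\omega \\ t\end{bmatrix}
- \underbrace{n_i^{\top}(q_i - p_i)}_{b_i}
```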
System Solution
The above system was solved using a QR formulation. The results of optimization on two nearby but initially misaligned point clouds are shown in the following image.
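A minimal sketch of that solve is shown below, assuming the stacked A has full column rank; the helper name is mine.

```python
import numpy as np

def solve_icp_qr(A, b):
    """Solve the stacked point-to-plane system A x = b (x = [alpha, beta, gamma, t])
    in the least-squares sense via a QR factorization."""
    Q, R = np.linalg.qr(A)                  # A = Q R, R upper triangular (6 x 6)
    x = np.linalg.solve(R, Q.T @ b)         # back-substitution on the triangular factor
    # Re-assemble the small-angle transform; in practice this matrix would be
    # re-orthonormalized (or built via an exponential map) to get a proper rotation.
    alpha, beta, gamma = x[:3]
    T = np.eye(4)
    T[:3, :3] = np.array([[1.0, -gamma, beta],
                          [gamma, 1.0, -alpha],
                          [-beta, alpha, 1.0]])
    T[:3, 3] = x[3:6]
    return T
```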
For the solution to converge, a few preconditions need to be satisfied:
- The distance threshold needs to retain points that are actually corresponding points. If the frames are very different, a small distance threshold will not allow convergence to the correct solution; for very different frames, the distance threshold must be widened to admit more valid correspondences.
- The initial guess needs to be in the vicinity of the optimal solution/transformation. If the frames are not very different, an identity transformation is sufficient for convergence.
Point Based Fusion
The next part of the assignment is the point-based fusion system. The function of this subsystem is the maintenance and propagation of the saved dense map of the environment. The system was implemented using a Map class containing the world point coordinates, normals, color channels and weights of the corresponding points.
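A minimal sketch of such a container (field names are illustrative):

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Map:
    """Dense map: one row per fused map point."""
    points:  np.ndarray = field(default_factory=lambda: np.empty((0, 3)))  # world coordinates
    normals: np.ndarray = field(default_factory=lambda: np.empty((0, 3)))  # unit normals
    colors:  np.ndarray = field(default_factory=lambda: np.empty((0, 3)))  # RGB per point
    weights: np.ndarray = field(default_factory=lambda: np.empty((0,)))    # fusion confidence
```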
Once again, points from the depth images were first filtered by removing points outside the image plane and depth bounds, and then by removing correspondences that exceeded specified distance and angle thresholds with respect to the points already in the map. After filtering, the input frame points having correspondences in the maintained map were merged, while the remaining points were treated as new points and simply added to the map.
In the merge step, higher weights are given to those points closest to the camera, since such points are observed more accurately than those further away. An exponential weighting scheme was chosen, with points at the origin of the camera z-axis weighted 1 and those further away weighted less than 1. The merge step takes a weighted average of the corresponding points, normals and color channels, along with a normalization of the weights. In the add step, new points and their associated normals and color channels are appended to the maintained map.
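A sketch of the merge and add steps, building on the Map sketch above, is given below; the parameter names, the σ value and the exact weighting details are assumptions.

```python
import numpy as np

def fuse(map_, pts_w, normals_w, colors, depths_z, matched_idx, new_mask, sigma=0.6):
    """Merge associated frame points into the map and append unmatched ones.
    pts_w/normals_w/colors: frame data already in world coordinates; depths_z: camera-frame
    depth of each frame point; matched_idx[i]: map index associated with frame point i
    (valid only where new_mask is False)."""
    # Exponential confidence: points near the camera get weight ~1, far points less
    alpha = np.exp(-(depths_z ** 2) / (2 * sigma ** 2))

    # --- merge step: weighted running average for associated points -------------
    m = matched_idx[~new_mask]
    a = alpha[~new_mask][:, None]
    w = map_.weights[m][:, None]
    map_.points[m]  = (w * map_.points[m]  + a * pts_w[~new_mask])     / (w + a)
    map_.normals[m] = (w * map_.normals[m] + a * normals_w[~new_mask]) / (w + a)
    map_.normals[m] /= np.linalg.norm(map_.normals[m], axis=1, keepdims=True)  # re-normalize
    map_.colors[m]  = (w * map_.colors[m]  + a * colors[~new_mask])    / (w + a)
    map_.weights[m] += alpha[~new_mask]

    # --- add step: unmatched frame points become new map points -----------------
    map_.points  = np.vstack([map_.points,  pts_w[new_mask]])
    map_.normals = np.vstack([map_.normals, normals_w[new_mask]])
    map_.colors  = np.vstack([map_.colors,  colors[new_mask]])
    map_.weights = np.concatenate([map_.weights, alpha[new_mask]])
    return map_
```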
The results of the Point Based Fusion system are shown below.
Tracking Results
The results of tracking using this SLAM implementation compared to the provided ground truth are shown below.