Particle Filter Based Robot Localization
In this assignment, the goal was to implement a particle filter for a lost robot equipped with internal odometry and a laser range finder. The robot was known to be located within a given map (Wean Hall at Carnegie Mellon University). The well-known Monte Carlo Localization approach was implemented. The major components of this approach were:
- The Motion Model
- Ray-Casting
- The Sensor Model
- Resampling after Correction Step
Motion Model
The motion model implemented was drawn from Algorithm 5.6 in Probabilistic Robotics by Sebastian Thrun; the algorithm is shown below. This motion model function takes as a control input a pair of poses, one from time t-1 and the other from time t, expressed in the robot’s internal odometry frame as indicated by the bar above the x and y terms. The second argument to the function is the robot’s estimated pose at time t-1 expressed in the world coordinate system.
The input pair of poses in the robot’s coordinate frame was first decomposed into three motions:
- Rotation to align the robot heading towards the second pose
- Translation to the second pose
- Rotation to the final heading at the second pose
Noise sampled from a zero-mean Gaussian distribution was added to each of these decomposed motions to form the final terms that perform the motion model update from the odometry input. The noise magnitudes were scaled by tunable coefficients so that differing uncertainties in translation and rotation could be captured accurately.
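A minimal sketch of this sampling step, structured after Algorithm 5.6, is shown below; the noise coefficients and function names are illustrative rather than the exact ones used in the assignment.

```python
import numpy as np

# Tunable noise coefficients (illustrative values; they trade off rotational vs. translational uncertainty)
A1, A2, A3, A4 = 0.01, 0.01, 0.05, 0.05

def sample_motion_model_odometry(u, x_prev):
    """Sample a new particle pose given the odometry pair u = (pose_bar_prev, pose_bar_curr)
    in the robot's internal frame and the particle's previous world-frame pose x_prev."""
    (xb, yb, tb), (xb2, yb2, tb2) = u
    x, y, theta = x_prev

    # Decompose the odometry increment into rotation - translation - rotation
    d_rot1 = np.arctan2(yb2 - yb, xb2 - xb) - tb
    d_trans = np.hypot(xb2 - xb, yb2 - yb)
    d_rot2 = tb2 - tb - d_rot1

    # Perturb each component with zero-mean Gaussian noise whose spread grows with the motion
    d_rot1_hat = d_rot1 - np.random.normal(0.0, np.sqrt(A1 * d_rot1**2 + A2 * d_trans**2))
    d_trans_hat = d_trans - np.random.normal(0.0, np.sqrt(A3 * d_trans**2 + A4 * (d_rot1**2 + d_rot2**2)))
    d_rot2_hat = d_rot2 - np.random.normal(0.0, np.sqrt(A1 * d_rot2**2 + A2 * d_trans**2))

    # Apply the noisy motion in the world frame
    return np.array([x + d_trans_hat * np.cos(theta + d_rot1_hat),
                     y + d_trans_hat * np.sin(theta + d_rot1_hat),
                     theta + d_rot1_hat + d_rot2_hat])
```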
Ray-Casting
The ray-casting implementation was based on the Digital Differential Analyzer (DDA) method. It is essentially a line-drawing algorithm used to trace lines through a grid. In a nutshell, the algorithm propagates a straight line originating at the location of the robot and oriented along each of the angular increments specified by the resolution of the laser range finder. The grid cells intersected by each of these straight lines are recorded and the occupancy of each intersected cell is checked. The first cell that reports occupied along a particular ray is used to compute the distance measurement. The propagation within the occupancy grid and the intersections are shown below.
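A simplified sketch of a DDA-style ray cast over an occupancy grid is shown below; the grid indexing convention, occupancy threshold and parameter names are assumptions for illustration.

```python
import numpy as np

def ray_cast_dda(occ_grid, x0, y0, theta, max_range, resolution, occ_thresh=0.5):
    """DDA-style ray cast: advance roughly one grid cell per step along the dominant axis
    and return the distance (world units) to the first occupied cell."""
    dx, dy = np.cos(theta), np.sin(theta)
    scale = max(abs(dx), abs(dy))
    step_x, step_y = dx / scale, dy / scale          # at most one cell per step
    cx, cy = x0 / resolution, y0 / resolution        # current position in cell units
    n_steps = int(max_range * scale / resolution)
    for _ in range(n_steps):
        cx += step_x
        cy += step_y
        col, row = int(cx), int(cy)
        if not (0 <= row < occ_grid.shape[0] and 0 <= col < occ_grid.shape[1]):
            break                                    # ray left the map
        if occ_grid[row, col] > occ_thresh:          # first occupied cell along this ray
            return np.hypot(cx - x0 / resolution, cy - y0 / resolution) * resolution
    return max_range                                 # no hit within the sensor range
```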
The ray-cast function was then tested in a circular room and a square room, and the measured distances matched the actual distances up to the distance-per-pixel resolution of the occupancy grid.
Sensor Model (Likelihood Estimation)
Once the measured distances along each ray were obtained for a single scan, a likelihood estimate was assigned to each measurement. This was done by combining, for each measurement, the likelihoods given by four different distributions:
- Gaussian Distribution – This distribution assigns a high likelihood to a measurement that matches closely with the ray-cast
- Exponential Distribution – This distribution assigns the highest likelihood to near-zero measurements and decays gradually up to the ray-cast distance
- Uniform Distribution Max – This distribution assigns the highest likelihood to a small interval around the maximum possible measurement
- Uniform Distribution Random – This distribution assigns a low uniform likelihood to the entire range of measurements
The actual likelihood is a linear combination of the individual likelihoods computed using the above distributions (a sketch of this mixture is given after the list below). During robot motion, there are many situations where a measurement may not match the ray-cast but still emanate from a valid pose. Some of these cases are:
- The robot may be at the correct estimated location but there may be dynamic obstacles in the way leading to shorter than expected measurements. This scenario is accounted for by the exponential distribution.
- The robot may be at the correct estimated location, but its orientation might be off by a slight margin, so that a beam may, for example, propagate past a wall edge instead of striking it, leading to a larger than expected measurement. Often this produces a reading at the maximum of the sensor’s range. This scenario is accounted for by the uniform distribution for maximum measurements.
- Finally, the “likelihood” that the robot is anywhere on the map is non-zero; any position on the map could technically be the correct one, since all we are doing is obtaining an estimate. This is accounted for by the uniform distribution over the entire range.
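As an illustration, a sketch of the resulting mixture likelihood for a single beam is given below; the mixture weights, spreads and the maximum range value are hypothetical tuning parameters, not the values used in the assignment.

```python
import numpy as np

# Mixture weights and spreads (hypothetical; in practice these are tuned)
Z_HIT, Z_SHORT, Z_MAX, Z_RAND = 0.7, 0.1, 0.1, 0.1
SIGMA_HIT, LAMBDA_SHORT, MAX_RANGE = 50.0, 0.01, 8000.0

def beam_likelihood(z, z_star):
    """Likelihood of one measured range z given the ray-cast range z_star,
    combining the four densities described above (normalizers omitted for brevity)."""
    # Gaussian centered on the expected (ray-cast) range
    p_hit = np.exp(-0.5 * ((z - z_star) / SIGMA_HIT) ** 2) / (SIGMA_HIT * np.sqrt(2 * np.pi))
    # Exponential decay for unexpectedly short readings (e.g. dynamic obstacles)
    p_short = LAMBDA_SHORT * np.exp(-LAMBDA_SHORT * z) if z <= z_star else 0.0
    # Spike at the maximum range (missed returns)
    p_max = 1.0 if z >= MAX_RANGE else 0.0
    # Small uniform probability over the full measurement range
    p_rand = 1.0 / MAX_RANGE
    return Z_HIT * p_hit + Z_SHORT * p_short + Z_MAX * p_max + Z_RAND * p_rand
```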
The net distribution obtained from the densities above leads to something that resembles the image below:
Resampling after Correction | Low Variance Sampling
The final piece of the puzzle was the low-variance sampler, implemented from Table 4.4 of Probabilistic Robotics by Sebastian Thrun and shown below:
The goal of this algorithm is to sample from the existing particles such that, at the end, we are left with a distribution that places more emphasis on those particles that had higher weights in the previous iteration. The algorithm achieves this by repeatedly picking samples with higher weights while rarely picking, or sometimes entirely skipping, particles with lower weights. To see why this works, consider the diagram below, which was created based on an explanation by user Robert Sutton on Robotics Stack Exchange.
To begin with, let all the particles be represented along the perimeter of a circle, with arc lengths corresponding to their weights, so w4 with the lowest weight is the shortest while w2 with the highest weight is the longest. Say we start with r = 0.1 in the above algorithm. In that case, U0 is initialized to 0.1 as well and C0 = w1. Since U0 is less than C0, x1 (the state corresponding to the weight w1) is added to the new sample set, and U0 is then updated to U1. Now U1 is greater than C0, so the counter i and the cumulative weight C are incremented; C0 becomes C1, the inner loop condition breaks again, and x2 (corresponding to w2) is added to the sample set. U1 is then incremented to U2, which is still lower than C1, so x2 is sampled again. This is what we want, since x2 has the highest weight and its count should reflect its importance. U2 is incremented to U3, which is higher than C1, so i and C are incremented again to give C2 and x3 is sampled. U is incremented once more to U4; this time, even after C2 is incremented to C3 and C4, the cumulative weight is still lower than U4. Only at C5 does the cumulative weight exceed U4, which causes x6 to be added to the sample set. Here, the algorithm skipped over x4 and x5, which is exactly what we want since they carry little weight. This continues until U is just shy of 1. The table above the circle shows the change in variables and the corresponding additions to the sample set.
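For reference, a compact sketch of the low-variance sampler described above (following Table 4.4) might look like this:

```python
import numpy as np

def low_variance_resample(particles, weights):
    """Low-variance resampling: draw M equally spaced pointers with a single random
    offset and walk them along the cumulative particle weights."""
    M = len(particles)
    weights = np.asarray(weights) / np.sum(weights)   # normalize the weights
    new_particles = []
    r = np.random.uniform(0.0, 1.0 / M)               # single random start offset
    c = weights[0]                                     # running cumulative weight
    i = 0
    for m in range(M):
        U = r + m / M                                  # m-th equally spaced pointer
        while U > c and i < M - 1:                     # advance until the cumulative
            i += 1                                     # weight covers the pointer
            c += weights[i]
        new_particles.append(particles[i])             # heavy particles get picked repeatedly
    return new_particles
```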
Performance
The ray-casting operation was the most computationally expensive. The DDA algorithm is extremely precise down to the resolution of the pixel map; however, this may be undesirable given its impact on computation. Perhaps a different algorithm, such as Bresenham’s line algorithm, could be used instead despite its lower accuracy.
Some of the trial runs of the algorithm on a dataset captured by a robot in a corridor in Wean Hall are shown below.
The initial particle set was spread randomly across the free space of the corridors. Over time, as additional sensor measurements arrive, the estimate is gradually improved and the low-variance sampler discards poor representations of the state.
Extended Kalman Filter based Simultaneous Localization and Mapping
The goal in this assignment was to implement a simultaneous localization and mapping (SLAM) system using the extended Kalman filter. To this effect, the provided robot had its state represented by a vector pt = [xt, yt, θt] at a time instant t. The control inputs to this model were given by a translation dt and a subsequent rotation αt at time t. The robot was also equipped with a laser sensor that detects landmarks and returns the range and heading angle from the pose of the robot to the landmark. The following relations were derived in order to construct the SLAM system. In particular:
- The motion model that takes the translation and rotation to give the updated state.
- The projection of the noise from the previous state to the next, obtained by linearizing the non-linear motion model and adding control noise.
- Transformation of range and heading laser measurement to global landmark locations including projection of noise terms.
- Inverse transformation from landmark locations to laser range and heading measurements.
- Noise projection Jacobian of previous transformation with respect to robot pose and landmark location.
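For reference, under the stated translate-then-rotate convention the motion model and its pose Jacobian take the standard form below, where R_t denotes the control noise mapped into state space (notation mine):

```latex
p_{t+1} =
\begin{bmatrix} x_t + d_t\cos\theta_t \\ y_t + d_t\sin\theta_t \\ \theta_t + \alpha_t \end{bmatrix},
\qquad
G_t = \frac{\partial p_{t+1}}{\partial p_t} =
\begin{bmatrix} 1 & 0 & -d_t\sin\theta_t \\ 0 & 1 & d_t\cos\theta_t \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\bar{\Sigma}_{t+1} = G_t \Sigma_t G_t^{\top} + R_t
```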
The available input data were a series of range and heading measurements as well as control inputs d and α for a series of time steps. The robot was initialized at the origin with zero covariance. The system state vector included the pose of the robot as well as the estimated locations of all the landmarks. For a given control input, the pose of the robot in the state vector was updated, and the control noise was added to the covariance matrix of the updated pose along with the noise projected from the previous state. This was the prediction step. Following this, each received range and heading measurement was used to compute the measurement Jacobian and the Kalman gain matrix. This gain was then used to update the state vector based on the difference between the expected and actual measurements. The gain matrix and measurement Jacobian were also used to project the noise terms into the next update.
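A sketch of this correction step for a single landmark observation is given below; the state layout, the measurement ordering z = [range, bearing] and the function names are assumptions for illustration.

```python
import numpy as np

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def ekf_update_landmark(mu, Sigma, z, k, Q):
    """EKF correction for one range/bearing measurement z = [r, b] of landmark k,
    with state mu = [x, y, theta, l1x, l1y, ...] and 2x2 measurement noise Q."""
    x, y, theta = mu[0], mu[1], mu[2]
    j = 3 + 2 * k                               # index of landmark k in the state vector
    dx, dy = mu[j] - x, mu[j + 1] - y
    q = dx * dx + dy * dy
    sq = np.sqrt(q)
    z_hat = np.array([sq, wrap(np.arctan2(dy, dx) - theta)])   # expected measurement

    # Measurement Jacobian: only the pose and landmark-k columns are non-zero
    H = np.zeros((2, mu.size))
    H[:, 0:3] = np.array([[-dx / sq, -dy / sq, 0.0],
                          [ dy / q,  -dx / q, -1.0]])
    H[:, j:j + 2] = np.array([[dx / sq, dy / sq],
                              [-dy / q, dx / q]])

    S = H @ Sigma @ H.T + Q                     # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)          # Kalman gain
    innov = z - z_hat
    innov[1] = wrap(innov[1])                   # keep the bearing innovation wrapped
    mu_new = mu + K @ innov
    Sigma_new = (np.eye(mu.size) - K @ H) @ Sigma
    return mu_new, Sigma_new
```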
In the figure above, the blue marker indicates the position of the robot while tracing the trajectory along this hexagonal path. The magenta ellipses indicate the uncertainty in the robot’s position after the prediction step. The blue ellipse indicates the uncertainty in the robot’s position after the update has been applied using the landmark measurements. The final landmark ground truth locations have also been plotted on the trajectory and are shown located in the centers of the red ellipses.
Solver Implementation for 2D SLAM
The goal of this assignment was to implement and test the performance of several different 2D linear SLAM solvers. The dataset that was provided included odometry and landmark measurements for state estimation. These measurements were used to construct a linear system of the form Ax=b, where A was a sparse non-square matrix.
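Schematically, each odometry measurement o_t and each landmark observation m_{t,k} contributes one block row to this system; the symbols below are my own shorthand, and in practice the rows are also whitened by the measurement covariances:

```latex
p_{t+1} - p_t = o_t + \varepsilon_o,
\qquad
l_k - p_t = m_{t,k} + \varepsilon_l,
\qquad
x = \bigl[p_0^\top,\dots,p_T^\top,\; l_1^\top,\dots,l_K^\top\bigr]^\top
\;\Rightarrow\; A x = b
```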
The following types of solvers were considered for implementation:
- Pseudo-Inverse
- LU-Factorization
- QR-Factorization
- LU Factorization with COLAMD Re-ordering
- QR Factorization with COLAMD Re-ordering
Each of these solvers was used to determine the state estimate and characterize the performance. The linear trajectory ground truth and the state estimate solved using the pseudo-inverse are shown below:
The Pseudo-Inverse approach took about 1.9 to 2 seconds to solve on average, because matrix inversion is a computationally intensive operation. The factored matrix visualizations using LU-Factorization and LU-Factorization with COLAMD re-ordering are shown below.
The LU-Factorization approach takes about 0.06 seconds to run, while the solver with COLAMD re-ordering takes about 0.08 seconds. The solution time is closely tied to the gain, if any, from re-ordering: if the time taken for matrix re-ordering is substantial without a large gain in the sparsity of the factors, then the apparent advantage of exploiting sparsity is outweighed by the time spent re-ordering the columns.
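As a rough illustration of how the orderings were compared, a sketch using SciPy is shown below; it assumes the normal-equations route and is not necessarily the exact implementation used.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_normal_equations(A, b, permc_spec="COLAMD"):
    """Solve the least-squares problem min ||Ax - b|| by LU-factorizing the normal
    equations A^T A x = A^T b. permc_spec selects the column ordering
    ('NATURAL' = no re-ordering, 'COLAMD' = approximate minimum degree)."""
    A = sp.csc_matrix(A)
    N = (A.T @ A).tocsc()                    # square, symmetric normal matrix
    rhs = A.T @ b
    lu = spla.splu(N, permc_spec=permc_spec)
    return lu.solve(rhs)

# Usage: x_plain  = solve_normal_equations(A, b, "NATURAL")
#        x_colamd = solve_normal_equations(A, b, "COLAMD")
```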
The factor visualizations using QR-factorization and QR-factorization with COLAMD are shown below:
In the case of the QR solver, the version without re-ordering took about 0.35 seconds to solve, while the version with COLAMD re-ordering took about 0.27 seconds on average. This implies that the time saved during factorization and solution outweighed the time spent re-ordering the matrix columns.
The solved trajectories are identical to the one from the pseudo-inverse solution, so they are not all repeated here. A separate looped trajectory was also solved using the linear system. Both trajectories are shown below:
The same problem was then re-formulated to include a non-linear bearing-angle component, whose measurement Jacobian was computed. For the non-linear case, a linearized system of equations had to be constructed. The linearization point was chosen to be the initial trajectory obtained by simply chaining the odometry and landmark measurements through the state update. The figures below show the solution approach for the non-linear system: on the left is the trajectory after initialization but before optimization, and the figure on the right shows the trajectory after optimization. The solver used was QR factorization with COLAMD re-ordering.
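Writing Δx = l_x − p_x, Δy = l_y − p_y and q = Δx² + Δy², and assuming poses and landmarks are 2D positions as in the linear formulation, the bearing measurement and its Jacobian take the standard form:

```latex
\theta = \operatorname{atan2}(\Delta y, \Delta x),
\qquad
\frac{\partial \theta}{\partial (p_x,\, p_y,\, l_x,\, l_y)}
= \begin{bmatrix} \dfrac{\Delta y}{q} & -\dfrac{\Delta x}{q} & -\dfrac{\Delta y}{q} & \dfrac{\Delta x}{q} \end{bmatrix}
```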
Dense SLAM with Point based Fusion
This section of the assignment involved implementing a dense SLAM system. The system comprised two major components: an Iterative Closest Point (ICP) based algorithm for camera localization and a point-based fusion system for dense mapping. The dense map was stored in a data structure containing the 3D points, normals, colors and weights corresponding to each “pixel” of the input frame. The input source frames were fed from a dataset of RGB-D images.
Iterative Closest Point
This subsystem returns the optimal transformation between two point sets, the source and target point maps, by minimizing a loss function. The algorithm computes the transformation by:
- Projective Point Correspondence – Determining corresponding points between the two maps
- Linear System Setup – Parameterizing the transformation between the two frames
- Solution of the Linear System
Projective Point Correspondence
In order to obtain the transformation between two frames, correspondences need to be established between image coordinates in the two frames that represent the same point in the world. One of the simplest ways to implement this is the projective point correspondence method. In this method, given a known transformation between the source and target frames and the image coordinates from the source frame, we can transform the point (u1, v1) in the source frame into the coordinates (u2, v2) in the target frame. If the transformation is exact, then the image coordinates (u1, v1) and (u2, v2) represent the same 3D world point. However, since projective correspondences are initially obtained with an initial “guess” transformation, usually the identity, and then optimized, there can be many erroneous projections. These correspondences are therefore filtered based on a distance threshold, and the image normals at the corresponding points in the source and target frames are filtered based on an angle threshold. A basic assumption here is that, for a small transformation between frames, the actual image coordinate locations in the next frame should not have moved more than a certain Euclidean distance from the previous image coordinates; moreover, the normals at those points shouldn’t deviate by more than a certain angle either.
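A sketch of this association step is shown below; the camera intrinsics K, the vertex/normal map layouts and the threshold values are assumptions for illustration.

```python
import numpy as np

def projective_correspondences(src_pts, src_normals, T, tgt_vertex_map, tgt_normal_map, K,
                               dist_thresh=0.07, angle_thresh=np.deg2rad(20)):
    """Projective data association: transform source points into the target frame,
    project them with the pinhole model, and keep pairs passing the distance and
    normal-angle tests. src_pts/src_normals are (N, 3); T is the 4x4 guess; K is 3x3."""
    R = T[:3, :3]
    p = (R @ src_pts.T).T + T[:3, 3]          # source points expressed in the target frame
    n_src = (R @ src_normals.T).T             # source normals rotated into the target frame

    h, w = tgt_vertex_map.shape[:2]
    matches = []
    for i in range(len(p)):
        if p[i, 2] <= 0:
            continue                          # behind the target camera
        u = int(round(K[0, 0] * p[i, 0] / p[i, 2] + K[0, 2]))
        v = int(round(K[1, 1] * p[i, 1] / p[i, 2] + K[1, 2]))
        if not (0 <= u < w and 0 <= v < h):
            continue                          # projects outside the target image
        q = tgt_vertex_map[v, u]              # candidate corresponding 3D point
        n_tgt = tgt_normal_map[v, u]
        if np.linalg.norm(q - p[i]) > dist_thresh:
            continue                          # corresponding points too far apart
        if np.dot(n_src[i], n_tgt) < np.cos(angle_thresh):
            continue                          # normals disagree by more than the threshold
        matches.append((i, (v, u)))
    return matches
```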
Linearization of the System
The loss function used for optimization is the point-to-plane error metric. This is a popular choice in which not just the Euclidean distance but also the normal information at the target point is taken into account. The point-to-point error metric is the alternative, where the distance between the two points alone gives the loss. In the case of the point-to-plane metric, the vector between the two points is projected onto the normal at the target point, and this projected distance is what is optimized. The point-to-plane error metric and its geometric interpretation are shown below.
Since it is hard to solve directly for R, we make the small-angle assumption and parametrize the rotation and translation as shown below.
With the given parametrization, the point-to-plane error metric can be re-written as shown below and then re-arranged into a linear system.
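For reference, with ω = (α, β, γ) the standard small-angle derivation of the per-correspondence row is (the notation may differ from the assignment hand-out):

```latex
R \approx I + [\omega]_\times
\;\Rightarrow\;
n_i^{\top}\!\left(Rp_i + t - q_i\right)
\approx
\underbrace{\begin{bmatrix}(p_i \times n_i)^{\top} & n_i^{\top}\end{bmatrix}}_{A_i}
\begin{bmatrix}\omega \\ t\end{bmatrix}
- \underbrace{n_i^{\top}(q_i - p_i)}_{b_i}
```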
System Solution
The above system was solved using a QR formulation. The results of optimization on two nearby but initially misaligned point clouds are shown in the following image.
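A minimal sketch of that solve is shown below, assuming the stacked A has full column rank; the helper name is mine.

```python
import numpy as np

def solve_icp_qr(A, b):
    """Solve the stacked point-to-plane system A x = b (x = [alpha, beta, gamma, t])
    in the least-squares sense via a QR factorization."""
    Q, R = np.linalg.qr(A)                  # A = Q R, R upper triangular (6 x 6)
    x = np.linalg.solve(R, Q.T @ b)         # back-substitution on the triangular factor
    # Re-assemble the small-angle transform; in practice this matrix would be
    # re-orthonormalized (or built via an exponential map) to get a proper rotation.
    alpha, beta, gamma = x[:3]
    T = np.eye(4)
    T[:3, :3] = np.array([[1.0, -gamma, beta],
                          [gamma, 1.0, -alpha],
                          [-beta, alpha, 1.0]])
    T[:3, 3] = x[3:6]
    return T
```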
For the solution to converge, a few preconditions need to be satisfied:
- The distance threshold needs to retain points that are actually corresponding points. If the frames are very different, a small distance threshold will not allow convergence to the correct solution; for very different frames, the distance threshold must be widened to admit more valid correspondences.
- The initial guess needs to be in the vicinity of the optimal solution/transformation. If the frames are not very different, an identity transformation is sufficient for convergence.
Point Based Fusion
The next part of the assignment is the point-based fusion system. The function of this subsystem is the maintenance and propagation of the saved dense map of the environment. The system was implemented using a Map class containing the world point coordinates, normals, color channels and weights of the corresponding points.
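A minimal sketch of such a container (field names are illustrative):

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Map:
    """Dense map: one row per fused map point."""
    points:  np.ndarray = field(default_factory=lambda: np.empty((0, 3)))  # world coordinates
    normals: np.ndarray = field(default_factory=lambda: np.empty((0, 3)))  # unit normals
    colors:  np.ndarray = field(default_factory=lambda: np.empty((0, 3)))  # RGB per point
    weights: np.ndarray = field(default_factory=lambda: np.empty((0,)))    # fusion confidence
```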
Once again, points from the depth images were first filtered by removing points outside the image plane and depth bounds, and then by removing correspondences that exceeded specified distance and angle thresholds with respect to the points already in the map. After filtering, the input frame points having correspondences in the maintained map were merged, while the remaining points were treated as new points and simply added to the map.
In the merge step, higher weights are given to those points closest to the camera, since such points are observed more accurately than those further away. An exponential weighting scheme was chosen, with points at the origin of the camera z-axis weighted 1 and those further away weighted less than 1. The merge step takes a weighted average of the corresponding points, normals and color channels, along with a normalization of the weights. In the add step, new points and their associated normals and color channels are appended to the maintained map.
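A sketch of the merge and add steps, building on the Map sketch above, is given below; the parameter names, the σ value and the exact weighting details are assumptions.

```python
import numpy as np

def fuse(map_, pts_w, normals_w, colors, depths_z, matched_idx, new_mask, sigma=0.6):
    """Merge associated frame points into the map and append unmatched ones.
    pts_w/normals_w/colors: frame data already in world coordinates; depths_z: camera-frame
    depth of each frame point; matched_idx[i]: map index associated with frame point i
    (valid only where new_mask is False)."""
    # Exponential confidence: points near the camera get weight ~1, far points less
    alpha = np.exp(-(depths_z ** 2) / (2 * sigma ** 2))

    # --- merge step: weighted running average for associated points -------------
    m = matched_idx[~new_mask]
    a = alpha[~new_mask][:, None]
    w = map_.weights[m][:, None]
    map_.points[m]  = (w * map_.points[m]  + a * pts_w[~new_mask])     / (w + a)
    map_.normals[m] = (w * map_.normals[m] + a * normals_w[~new_mask]) / (w + a)
    map_.normals[m] /= np.linalg.norm(map_.normals[m], axis=1, keepdims=True)  # re-normalize
    map_.colors[m]  = (w * map_.colors[m]  + a * colors[~new_mask])    / (w + a)
    map_.weights[m] += alpha[~new_mask]

    # --- add step: unmatched frame points become new map points -----------------
    map_.points  = np.vstack([map_.points,  pts_w[new_mask]])
    map_.normals = np.vstack([map_.normals, normals_w[new_mask]])
    map_.colors  = np.vstack([map_.colors,  colors[new_mask]])
    map_.weights = np.concatenate([map_.weights, alpha[new_mask]])
    return map_
```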
The results of the Point Based Fusion system are shown below.
Tracking Results
The results of tracking using this SLAM implementation compared to the provided ground truth are shown below.