We use sensor combinations to overcome this restriction. This intuitively seems like the right play; more sensors mean more information. The jump from one camera to two, for instance, unlocks binocular vision, or the ability to see behind as well as in front. Better yet, use three cameras to do both at once. Add in a LiDAR unit, and see farther. Add in active depth, and see with more fidelity. Tying together multiple data streams is so valuable that this act of sensor fusion is a whole discipline in itself.
Yet this boon in information often makes vision-enabled systems harder to build, not easier. Binocular vision relies on stable intrinsic and extrinsic camera properties, which cameras don’t have. Depth sensors lose accuracy with distance. A sensor can fail entirely, like LiDAR on a foggy day.

This means that effective sensor fusion involves constructing vision architecture in a way that minimizes uncertainty in uncertain conditions. Sensors aren’t perfect, and data can be noisy. It’s the job of the engineer to sort this out and derive assurances about what is actually true. This challenge is what makes sensor fusion so difficult: it takes competency in information theory, geometry, optimization, fault tolerance, and a whole mess of other things to get right.
So how do we start?

Instead, what we can do is minimize our uncertainty. Through the beauty of mathematics, we can combine all of this knowledge and actually come out with a more certain idea of our state through time than if we used any one sensor or model.
This 👆 is the magic of Kalman filters.

Warning: Math.

There are two things that we can easily track about our car’s state: its position pₜ and velocity vₜ.
We can speed up our robot by punching the throttle, something we do frequently. We do this by exerting a force f on the RC car’s mass m, resulting in an acceleration a (see Newton’s Second Law of Motion).
With just this information, we can derive a model for how our car will act over a time period Δt using some classical physics:
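As a sketch, and assuming the acceleration a = f/m stays constant over the interval Δt (an assumption made here for illustration), the standard constant-acceleration kinematics give:

```latex
\begin{aligned}
a       &= \frac{f}{m} \\
p_{t+1} &= p_t + v_t \,\Delta t + \tfrac{1}{2}\, a \,\Delta t^{2} \\
v_{t+1} &= v_t + a \,\Delta t
\end{aligned}
```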
We can simplify this for ourselves using some convenient matrix notation. Let’s put the values we can track, position pₜ and velocity vₜ, into a state vector:
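Written out, such a state vector could look like the following. The symbol xₜ matches the name this post uses for the state later on; the column layout is the usual convention and is assumed here.

```latex
x_t = \begin{bmatrix} p_t \\ v_t \end{bmatrix}
```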
…and let’s put our applied forces into a control vector that represents all the outside influences affecting our state:
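One way to write this, assuming the only outside influence is the throttle-induced acceleration a = f/m. The names uₜ, F, and B are conventional Kalman filter notation assumed here, not symbols confirmed by the text:

```latex
u_t = \begin{bmatrix} a \end{bmatrix} = \begin{bmatrix} f/m \end{bmatrix},
\qquad
\begin{bmatrix} p_{t+1} \\ v_{t+1} \end{bmatrix}
=
\underbrace{\begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}}_{F}
\begin{bmatrix} p_t \\ v_t \end{bmatrix}
+
\underbrace{\begin{bmatrix} \tfrac{1}{2}\Delta t^{2} \\ \Delta t \end{bmatrix}}_{B}
u_t
```

Multiplying this out row by row recovers the constant-acceleration update for position and velocity.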
However, we’re not exactly sure whether or not our state values are true to life; there’s uncertainty! Let’s make some assumptions about what this uncertainty might look like in our system:
We’re going to give this uncertainty model a special name: a probability density function (PDF). This represents how probable it is that certain states are the true state. Peaks in our function correspond to the states that have the highest probability of occurrence.
Fig. 1. Our first PDF

Our state vector xₜ represents the mean μ of this PDF. To derive the rest of the function, we can model our state uncertainty using a covariance matrix Pₜ:
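Using the element names that the next paragraphs refer to, a 2×2 covariance matrix for a position and velocity state would look something like this (the exact layout is an assumption based on the usual convention):

```latex
P_t = \begin{bmatrix}
\Sigma_{pp} & \Sigma_{pv} \\
\Sigma_{vp} & \Sigma_{vv}
\end{bmatrix},
\qquad \Sigma_{pv} = \Sigma_{vp}
```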
There are some interesting properties here in Pₜ. The diagonal elements (Σₚₚ, Σᵥᵥ) represent how much these variables deviate from their own mean. We call this variance.
The off-diagonal elements of Pₜ express covariance between state elements. If Σₚᵥ is zero, for instance, then we know that an error in velocity won’t influence an error in position. If it’s any other value, we can safely say that one affects the other in some way. PDFs without covariance terms look like Figure 1 above, with major and minor axes aligned with our world axes. PDFs with covariance are skewed off-axis depending on how extreme the covariance is:
Fig. 2. A PDF with non-zero covariance. Notice the ‘tilt’ in the major and minor axes of the ellipse.
Variance, covariance, and the related correlation of variables are valuable, as they make our PDF more information-dense.
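To make the link between covariance and that ‘tilt’ concrete, here is a minimal Python sketch. It assumes a 2×2 covariance over position and velocity; the function names and sample numbers are hypothetical, chosen only for illustration. It computes the correlation coefficient and the orientation of the ellipse’s major axis, using the standard closed-form result for a symmetric 2×2 matrix.

```python
import math


def correlation(s_pp, s_vv, s_pv):
    """Correlation coefficient from two variances and their covariance."""
    return s_pv / math.sqrt(s_pp * s_vv)


def ellipse_tilt(s_pp, s_vv, s_pv):
    """Angle (radians) of the ellipse's major axis, measured from the position axis."""
    return 0.5 * math.atan2(2.0 * s_pv, s_pp - s_vv)


# No covariance: the ellipse stays aligned with the axes.
aligned = ellipse_tilt(4.0, 1.0, 0.0)

# Equal variances with positive covariance: the ellipse tilts by 45 degrees.
tilted = ellipse_tilt(2.0, 2.0, 1.0)
```

With zero covariance the tilt is zero, which matches the axis-aligned shape in Figure 1; any non-zero covariance rotates the ellipse, as in Figure 2.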
We know how to predict xₜ₊₁, but we also need the predicted covariance Pₜ₊₁ if we’re going to describe our state fully. We can derive it from xₜ₊₁ using some (drastically simplified) linear algebra:

However, we can factor in the effects of noisy control inputs another way: by adding a process noise covariance matrix Qₜ:
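A common way to write the resulting covariance prediction is shown below. It assumes a linear state transition matrix F that maps xₜ to xₜ₊₁; F is a symbol assumed here for illustration.

```latex
P_{t+1} = F \, P_t \, F^{\top} + Q_t
```

The first term comes from the identity Cov(Ax) = A Cov(x) Aᵀ for a linear map A; the second term adds the process noise on top.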
We have now derived the full prediction step:
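As a rough illustration, here is a minimal Python sketch of a prediction step for the position and velocity state. It assumes constant acceleration a = f/m over the time step and a linear transition matrix F = [[1, Δt], [0, 1]]; the function name `predict`, the matrix names, and the sample numbers are all hypothetical, and plain nested lists stand in for a matrix library.

```python
def predict(x, P, a, dt, Q):
    """One prediction step for a [position, velocity] state.

    x: [p, v]; P and Q: 2x2 nested lists; a: acceleration (f / m); dt: time step.
    """
    p, v = x
    # State prediction: constant-acceleration kinematics.
    x_new = [p + v * dt + 0.5 * a * dt ** 2, v + a * dt]

    # Assumed state transition matrix for this model.
    F = [[1.0, dt], [0.0, 1.0]]
    # FP = F @ P
    FP = [[sum(F[i][k] * P[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    # P_new = FP @ F^T + Q
    P_new = [[sum(FP[i][k] * F[j][k] for k in range(2)) + Q[i][j]
              for j in range(2)]
             for i in range(2)]
    return x_new, P_new


# Hypothetical numbers: start at p = 0, v = 1, no throttle, no process noise.
x_new, P_new = predict(
    [0.0, 1.0],
    [[1.0, 0.0], [0.0, 1.0]],
    a=0.0,
    dt=1.0,
    Q=[[0.0, 0.0], [0.0, 0.0]],
)
# x_new == [1.0, 1.0]
# P_new == [[2.0, 1.0], [1.0, 1.0]]
```

Even with zero process noise, the position variance grows from 1 to 2 in this example, because uncertainty in velocity leaks into position over time.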
Fig. 3. Starting state PDF in green, predicted state PDF in red. Notice how the distribution is more spread out. Our state is less certain than before.
There’s a good reason for that: everything up to this point has been a sort of “best guess”. We have our state, and we have a model of how the world works; all that we’re doing is using both to predict what might happen over time. We still need something to support these predictions outside of our model.
Something like sensor measurements, for instance.

We’re getting there! So far, this post has covered: