
A Guide to Kalman Filter

Article by Ivan Smetannikov

December 18th, 2024

In this article, we’ll make a few notes on how the Kalman filter works and show how to overcome the most common problems that appear when using it.

What is the Kalman filter?

The Kalman filter is an algorithm that estimates the state of a dynamic system from a series of noisy measurements. Developed by Rudolf Kalman in the 1960s, it has become a core component of control systems, robotics, navigation, and finance for tracking and prediction. The Kalman filter is particularly useful because it provides an efficient computational means to make real-time predictions and corrections, minimizing the influence of noise.

Key Concepts in the Kalman Filter

State Variables: Represent the underlying true state of the system, such as position and velocity in tracking problems. These variables evolve over time and usually cannot be measured directly.
Prediction and Update: The filter operates in a loop of two main steps:

Prediction Step: Predicts the next state based on the current state and a known model of system dynamics.
Update Step: Adjusts the prediction based on a new measurement, correcting for observed discrepancies.

Noise: The filter assumes noise in both the process (system dynamics) and the measurements, and uses statistical assumptions to model these uncertainties.
Gaussian Assumptions: The filter assumes that all noise is Gaussian (i.e., follows a normal distribution). For a linear system with Gaussian noise, this makes its estimate optimal in the minimum mean-square-error sense.

To use the Kalman filter to estimate the internal state of a process given only a sequence of noisy observations, one must model the process within the following framework. This means specifying the following quantities for each time step $k$ :

$F_k$ , the state-transition model;
$H_k$ , the observation model;
$Q_k$ , the covariance of the process noise;
$R_k$ , the covariance of the observation noise;
$B_k$ , the control-input model, if the system has a control input;
$u_k$ , the control vector, i.e., the control input applied at step $k$ .

Predict step:

We predict how the state evolves under the model and how uncertain that prediction is:

Predicted (a priori) state estimate: $\hat x_{k|k-1} = F_k \hat x_{k-1|k-1} + B_k u_k$ , i.e., the state we expect before seeing the new measurement.
Predicted (a priori) estimate covariance: $P_{k|k-1}=F_k P_{k-1|k-1} F_k^T + Q_k$ , i.e., the uncertainty of the predicted state. This follows from the covariance transformation rule: a linear map $x \mapsto F_k x$ turns a covariance $P$ into $F_k P F_k^T$ , and the independent process noise adds $Q_k$ .

Update step:

Innovation or measurement pre-fit residual: $\widetilde y_k = z_k - H_k \hat x_{k|k-1}$ , where $z_k$ is the measurement
Innovation (or pre-fit residual) covariance: $S_k = H_k P_{k|k-1} H_k^T + R_k$
Optimal Kalman gain: $K_k = P_{k|k-1} H_k^T S_k^{-1}$
Updated (a posteriori) state estimate: $\hat x_{k|k} = \hat x_{k|k-1} + K_k \widetilde y_k$
Updated (a posteriori) estimate covariance: $P_{k|k} = (I - K_k H_k) P_{k|k-1}$
Measurement post-fit residual: $\widetilde y_{k|k} = z_k - H_k \hat x_{k|k}$
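
The predict and update equations above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the article's own listing: the function names `kf_predict` and `kf_update`, the constant-velocity model, and all numeric values are assumptions chosen for the example.

```python
import numpy as np

def kf_predict(x, P, F, Q, B=None, u=None):
    """A priori step: propagate the state estimate and its covariance."""
    x_pred = F @ x
    if B is not None and u is not None:
        x_pred = x_pred + B @ u
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """A posteriori step: correct the prediction with measurement z."""
    y = z - H @ x_pred                       # pre-fit residual (innovation)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x_pred + K @ y
    P = (np.eye(len(x)) - K @ H) @ P_pred
    post = z - H @ x                         # post-fit residual
    return x, P, y, post

# Assumed example: constant-velocity target, position-only measurements.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[4.0]])

rng = np.random.default_rng(0)
x, P = np.zeros(2), 100.0 * np.eye(2)        # rough initial guess
truth = np.array([0.0, 1.0])
for _ in range(50):
    truth = F @ truth
    z = H @ truth + rng.normal(0.0, 2.0, size=1)
    x, P = kf_predict(x, P, F, Q)
    x, P, prefit, postfit = kf_update(x, P, z, H, R)
```

With a single scalar measurement, the post-fit residual equals the pre-fit residual scaled by $R_k / S_k$ , so it is never larger in magnitude than the pre-fit residual.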

Basic Kalman Filter implementation

Several lines of output:

prefit residual: -10.19742879 postfit residual: -7.31806471

prefit residual: -12.33359224 postfit residual: -8.85095993

prefit residual: -8.91856241 postfit residual: -6.40016151

prefit residual: -10.93745076 postfit residual: -7.84887742

Kalman filter without a model

We can still apply the Kalman filter to denoise measurements when we have no process model and only the measurements themselves.
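
One common way to do this, sketched below, is to assume a random-walk model: the state is expected to stay where it was ( $F = 1$ , $H = 1$ ) and any real change is absorbed by the process noise. The function name `denoise`, the parameters `q` and `r`, and the synthetic signal are assumptions made for this illustration.

```python
import numpy as np

def denoise(measurements, q=1e-3, r=1.0):
    """Scalar Kalman filter with identity dynamics (random-walk assumption).

    q: assumed process noise variance, r: assumed measurement noise variance.
    """
    x, p = measurements[0], r       # start from the first sample
    out = []
    for z in measurements:
        p = p + q                   # predict: state unchanged, uncertainty grows
        k = p / (p + r)             # Kalman gain
        x = x + k * (z - x)         # update with the new measurement
        p = (1.0 - k) * p
        out.append(x)
    return np.array(out)

# Synthetic example: a constant level observed through unit-variance noise.
rng = np.random.default_rng(1)
signal = np.full(200, 5.0)
noisy = signal + rng.normal(0.0, 1.0, size=200)
smooth = denoise(noisy, q=1e-4, r=1.0)
```

The ratio `q / r` acts as the smoothing knob here: a smaller ratio trusts the previous estimate more and smooths harder, at the cost of reacting more slowly to real changes.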

Kalman filter with a complex transition model

In cases where the prediction model is complex or nonlinear, and it’s difficult or impossible to represent it with a simple matrix, you can use the predicted value from the model directly rather than relying on matrix calculations in the Kalman filter. This approach is common, especially in nonlinear Kalman filtering techniques like the Extended Kalman Filter (EKF) or the Unscented Kalman Filter (UKF).

Extended Kalman Filter (EKF)

The EKF can be used if you can calculate the Jacobian of the transition function. In the standard (linear) Kalman filter, the state prediction step is:

\mathbf{x}_{n|n-1} = F \mathbf{x}_{n-1}

where $F$ is the state transition matrix. However, if you have a nonlinear or complicated model, you can replace this matrix-based prediction with a custom prediction function $f$ that captures the complex behavior of your model.

Define the Model Function: Let $f(\mathbf{x}_{n-1})$ represent your complex model that predicts the next state directly. Instead of computing $\mathbf{x}_{n|n-1}$ using $F \mathbf{x}_{n-1}$ , compute it as:

\mathbf{x}_{n|n-1} = f(\mathbf{x}_{n-1})

where $f$ could be any function (e.g., a nonlinear equation, simulation output, or a black-box prediction model).

You’ll still need a linear approximation of $f$ to update the covariance matrix $P$ . This is done by computing the Jacobian matrix $F_n$ of $f$ around the current state estimate:

F_n = \frac{\partial f}{\partial \mathbf{x}} \Big|_{\mathbf{x}_{n-1}}

This Jacobian matrix $F_n$ replaces the original state transition matrix $F$ and is used to update the covariance matrix as:

P_{n|n-1} = F_n P_{n-1} F_n^\top + Q

where $Q$ is the process noise covariance.
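
As a sketch of this prediction step, the example below uses a pendulum-like transition function. The model, the constants `DT` and `G_OVER_L`, and the function names are assumptions chosen only to have a concrete nonlinear $f$ with a simple analytic Jacobian.

```python
import numpy as np

DT, G_OVER_L = 0.1, 9.81   # assumed time step and g/l ratio of the pendulum

def f(x):
    """Nonlinear transition for x = [angle, angular velocity]."""
    theta, omega = x
    return np.array([theta + omega * DT,
                     omega - G_OVER_L * np.sin(theta) * DT])

def jacobian_f(x):
    """Analytic Jacobian of f, evaluated at x."""
    theta, _ = x
    return np.array([[1.0, DT],
                     [-G_OVER_L * np.cos(theta) * DT, 1.0]])

def ekf_predict(x, P, Q):
    F_n = jacobian_f(x)              # linearize around the current estimate
    x_pred = f(x)                    # the state goes through f itself
    P_pred = F_n @ P @ F_n.T + Q     # the covariance uses the linearization
    return x_pred, P_pred

x0 = np.array([0.5, 0.0])
P0 = 0.1 * np.eye(2)
Q = 1e-4 * np.eye(2)
x_pred, P_pred = ekf_predict(x0, P0, Q)
```

A hand-derived Jacobian is easy to get wrong, so comparing it against finite differences of `f` is a cheap sanity check before trusting the filter.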

If calculating the Jacobian is difficult or impossible, you can use the Unscented Kalman Filter (UKF).

Unscented Kalman Filter (UKF)

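The UKF doesn’t require an explicit Jacobian because it approximates the distribution of the state by propagating a set of “sigma points” through the nonlinear function $f$ . This makes it well-suited for complex models without linearization. The predicted state is a weighted average of the propagated points:

\mathbf{x}_{n|n-1} = \sum_{i=0}^{2L} W_{i}^{(m)} f(\mathbf{X}_{i})

where the sigma points $\mathbf{X}_i$ and the weights $W_i^{(m)}$ are defined in the steps below.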

Proceed with the Standard Update Step: After obtaining $\mathbf{x}_{n|n-1}$ (either from $f$ in EKF/UKF or through matrix multiplication in the linear Kalman filter), you can proceed with the usual Kalman filter update step, where measurements are incorporated to correct the predicted state.

The Unscented Kalman Filter (UKF) propagates sigma points through the nonlinear model to approximate the state distribution after the prediction step. Instead of linearizing the model (as in the Extended Kalman Filter), the UKF uses a set of carefully chosen sigma points that capture the mean and covariance of the state. These sigma points are propagated through the nonlinear model, and the resulting transformed points are used to approximate the predicted mean and covariance of the state distribution.

Here’s a step-by-step explanation of how sigma-point propagation works in the UKF.

Step 1: Generate Sigma Points

For a state of dimension $L$ with mean $\mathbf{x}_{n-1}$ and covariance $P_{n-1}$ , we create $2L + 1$ sigma points. These sigma points are chosen to capture the spread of the distribution around the mean.

Calculate the square root of the scaled covariance matrix:

\sqrt{(L + \lambda) P_{n-1}}

where $\lambda$ is a scaling parameter, often chosen as $\lambda = \alpha^2 (L + \kappa) - L$ , with tuning parameters $\alpha$ (often set to a small positive value, e.g., $10^{-3}$ ) and $\kappa$ (often set to $0$ or $3 - L$ ).

Generate the sigma points:

\mathbf{X}_{0} = \mathbf{x}_{n-1}

\mathbf{X}_{i} = \mathbf{x}_{n-1} + \left[ \sqrt{(L + \lambda) P_{n-1}} \right]_i, \quad i = 1, \dots, L

\mathbf{X}_{i+L} = \mathbf{x}_{n-1} - \left[ \sqrt{(L + \lambda) P_{n-1}} \right]_i, \quad i = 1, \dots, L

where $\left[ \sqrt{(L + \lambda) P_{n-1}} \right]_i$ is the $i$ -th column of the matrix square root of $(L + \lambda) P_{n-1}$ .

This results in a set of $2L + 1$ sigma points:

\mathbf{X}_{0}, \mathbf{X}_{1}, \dots, \mathbf{X}_{2L}

These points are distributed around the mean $\mathbf{x}_{n-1}$ and represent the original state distribution.
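
A minimal sketch of Step 1 follows. It assumes the Cholesky factor is used as the matrix square root (one common choice among several); the function name `sigma_points` and the example values are illustrative.

```python
import numpy as np

def sigma_points(x, P, alpha=1e-3, kappa=0.0):
    """Return the 2L + 1 sigma points (one per row) and the scaling lambda."""
    L = x.size
    lam = alpha**2 * (L + kappa) - L
    # Cholesky factor S satisfies S @ S.T == (L + lam) * P;
    # its columns are the offsets added to and subtracted from the mean.
    S = np.linalg.cholesky((L + lam) * P)
    X = np.empty((2 * L + 1, L))
    X[0] = x
    for i in range(L):
        X[1 + i] = x + S[:, i]
        X[1 + L + i] = x - S[:, i]
    return X, lam

x = np.array([1.0, 2.0])
P = np.array([[0.5, 0.1],
              [0.1, 0.3]])
X, lam = sigma_points(x, P)
```

Because the points come in symmetric pairs around the mean, their weighted average reproduces the mean and their weighted spread reproduces $P_{n-1}$ .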

Step 2: Propagate Sigma Points through the Nonlinear Function

Each sigma point is then propagated through the nonlinear state transition function $f$ :

\mathbf{X}_{i}^{\text{pred}} = f(\mathbf{X}_i), \quad i = 0, 1, \dots, 2L

This gives a set of transformed sigma points $\mathbf{X}_{i}^{\text{pred}}$ that represent the distribution of the predicted state after applying the nonlinear function.

Step 3: Compute the Predicted Mean and Covariance

Once we have the transformed sigma points, we calculate the predicted mean $\mathbf{x}_{n|n-1}$ and predicted covariance $P_{n|n-1}$ of the state.

Compute the predicted mean:

\mathbf{x}_{n|n-1} = \sum_{i=0}^{2L} W_{i}^{(m)} \mathbf{X}_{i}^{\text{pred}}

where $W_{i}^{(m)}$ are the weights for the mean, which depend on the scaling parameter $\lambda$ . Typically, $W_0^{(m)} = \frac{\lambda}{L + \lambda}$ and $W_i^{(m)} = \frac{1}{2(L + \lambda)}$ for $i = 1, \dots, 2L$ .

Compute the predicted covariance:

P_{n|n-1} = \sum_{i=0}^{2L} W_{i}^{(c)} \left( \mathbf{X}_{i}^{\text{pred}} - \mathbf{x}_{n|n-1} \right) \left( \mathbf{X}_{i}^{\text{pred}} - \mathbf{x}_{n|n-1} \right)^\top + Q

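where $W_{i}^{(c)}$ are the weights for the covariance and $Q$ is the process noise covariance. For $i = 1, \dots, 2L$ they equal $W_{i}^{(m)}$ ; a common choice for the central weight is $W_0^{(c)} = \frac{\lambda}{L + \lambda} + (1 - \alpha^2 + \beta)$ , where $\beta$ is an additional tuning parameter ( $\beta = 2$ is the usual choice for Gaussian distributions).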
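
Steps 1 to 3 can be combined into a single prediction function, sketched below. It assumes additive process noise, a Cholesky square root, and a central covariance weight with an extra tuning parameter `beta` set to 2; the name `unscented_predict` and the linear test model are illustrative assumptions.

```python
import numpy as np

def unscented_predict(x, P, f, Q, alpha=1e-3, beta=2.0, kappa=0.0):
    """UKF prediction step with additive process noise Q."""
    L = x.size
    lam = alpha**2 * (L + kappa) - L
    S = np.linalg.cholesky((L + lam) * P)        # columns are sigma-point offsets
    X = np.vstack([x, x + S.T, x - S.T])         # (2L + 1, L), one point per row

    Wm = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (L + lam)
    Wc[0] = lam / (L + lam) + (1.0 - alpha**2 + beta)

    X_pred = np.array([f(Xi) for Xi in X])       # propagate each point through f
    x_pred = Wm @ X_pred                         # weighted mean
    d = X_pred - x_pred
    P_pred = (Wc[:, None] * d).T @ d + Q         # weighted covariance plus Q
    return x_pred, P_pred, X_pred, Wm, Wc

# Sanity check with a linear model, where the exact answer is known.
F = np.array([[1.0, 0.1], [0.0, 1.0]])
x0 = np.array([1.0, 2.0])
P0 = np.array([[0.5, 0.1], [0.1, 0.3]])
Q = 0.01 * np.eye(2)
x_pred, P_pred, *_ = unscented_predict(x0, P0, lambda s: F @ s, Q)
```

For a linear $f$ the result should match the linear Kalman prediction $F \mathbf{x}_{n-1}$ and $F P_{n-1} F^\top + Q$ , which makes a convenient test before switching to a nonlinear model.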

Step 4: Predict the Measurement and Update (Correction Step)

The prediction is completed at this point. In the UKF, the correction step (updating the state estimate based on the measurement) also uses sigma points.

Predict measurement sigma points: Transform the sigma points using the measurement function $h$ (if nonlinear):

\mathbf{Y}_{i} = h(\mathbf{X}_{i}^{\text{pred}}), \quad i = 0, 1, \dots, 2L

Compute predicted measurement mean and covariance:

Calculate the predicted measurement mean $\mathbf{y}_{n|n-1}$ , the measurement covariance $P_{yy}$ , and the cross-covariance $P_{xy}$ between the state and the measurement. These are weighted sums over the sigma points, analogous to the predicted mean and covariance of the state, with the measurement noise covariance $R$ added to $P_{yy}$ .

Compute Kalman gain:

K = P_{xy} P_{yy}^{-1}

Update state and covariance:

Finally, use the Kalman gain to update the state estimate and covariance based on the measurement $\mathbf{z}_n$ :

\mathbf{x}_{n} = \mathbf{x}_{n|n-1} + K \left( \mathbf{z}_n - \mathbf{y}_{n|n-1} \right)

P_{n} = P_{n|n-1} - K P_{yy} K^\top
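
The correction step can be sketched as follows. The function `ukf_update` and the helper `sigma_points_and_weights` are illustrative names. In the small example, the sigma points are drawn directly from the predicted mean and covariance, which is one common variant and an assumption of this sketch.

```python
import numpy as np

def ukf_update(X_pred, Wm, Wc, x_pred, P_pred, z, h, R):
    """UKF correction step using predicted sigma points (one per row)."""
    Y = np.array([h(Xi) for Xi in X_pred])       # measurement sigma points
    y_pred = Wm @ Y                              # predicted measurement mean
    dY = Y - y_pred
    dX = X_pred - x_pred
    P_yy = (Wc[:, None] * dY).T @ dY + R         # measurement covariance
    P_xy = (Wc[:, None] * dX).T @ dY             # state-measurement cross-covariance
    K = P_xy @ np.linalg.inv(P_yy)               # Kalman gain
    x = x_pred + K @ (z - y_pred)
    P = P_pred - K @ P_yy @ K.T
    return x, P

def sigma_points_and_weights(x, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """Helper for the example: sigma points and weights for a given mean and covariance."""
    L = x.size
    lam = alpha**2 * (L + kappa) - L
    S = np.linalg.cholesky((L + lam) * P)
    X = np.vstack([x, x + S.T, x - S.T])
    Wm = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (L + lam)
    Wc[0] = Wm[0] + (1.0 - alpha**2 + beta)
    return X, Wm, Wc

# Assumed example: position-only measurement of a [position, velocity] state.
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
x_pred = np.array([1.0, 2.0])
P_pred = np.array([[0.5, 0.1], [0.1, 0.3]])
z = np.array([1.4])
X, Wm, Wc = sigma_points_and_weights(x_pred, P_pred)
x_new, P_new = ukf_update(X, Wm, Wc, x_pred, P_pred, z, lambda s: H @ s, R)
```

With a linear measurement function, as in this example, the result should agree with the standard Kalman update, which is a useful check on the weights and shapes.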

How to get the required matrices if you have only data samples

In the Kalman filter, the covariance matrices (particularly $P$ , the error covariance matrix, and $Q$ , the process noise covariance matrix) play essential roles in managing and representing uncertainties, but they are not always directly calculated from scratch. Here’s how each covariance matrix is determined and managed in a typical Kalman filter:

Error Covariance Matrix (P)

The error covariance matrix $P$ represents the uncertainty in the state estimate. It is initialized and then continuously updated by the Kalman filter.

Initialization of P

At the start, $P$ is usually initialized based on an initial guess about the uncertainties of the state estimates. For example, if the initial state variables are position and velocity, the initial $P$ matrix could be set with large values on the diagonal (to indicate high uncertainty if the initial estimate is rough) or small values if the initial state estimate is already quite accurate.

P_0 = \begin{bmatrix} \sigma_{position}^2 & 0 \\ 0 & \sigma_{velocity}^2 \end{bmatrix}

where $\sigma_{position}^2$ and $\sigma_{velocity}^2$ represent initial variances (uncertainties) in position and velocity estimates, respectively.

Update of P during Prediction and Correction

After initialization, $P$ is updated as part of each prediction and correction cycle:

Prediction Step:

During the prediction phase, the filter updates $P$ to account for the passage of time and the possibility of errors or variations in the system model. This is achieved by applying the following equation:

P_{t|t-1} = A \cdot P_{t-1|t-1} \cdot A^T + Q

$A$ : The state transition matrix (written $F_k$ in the equations earlier in this article), which models the dynamics of how each state variable (position, velocity, etc.) affects the next state.
$P_{t|t-1}$ : Predicted error covariance matrix, representing the updated uncertainty after accounting for system dynamics and added process noise.
$Q$ : Process noise covariance matrix, which introduces additional uncertainty to account for model imperfections.

This update typically increases $P$ , reflecting the growing uncertainty as we move forward in time without a new measurement.

Correction Step:

When a new measurement is available, $P$ is updated to reduce uncertainty using the Kalman gain $K$ :

P_{t|t} = (I - K_t \cdot H) \cdot P_{t|t-1}

$K_t$ : Kalman gain, which determines how much the measurement influences the state estimate.
$H$ : Observation matrix, mapping state variables to the measurement space.

This step reduces the uncertainty in $P$ by combining the prediction with the new measurement, allowing the Kalman filter to become more certain about the updated state.
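
The growth and shrinkage of $P$ over a few cycles can be seen in a short numeric sketch. The constant-velocity model and all values below are assumptions chosen for illustration. The trace of $P$ is used as a single-number summary of total uncertainty.

```python
import numpy as np

dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])    # constant-velocity model (assumed)
H = np.array([[1.0, 0.0]])               # position is measured, velocity is not
Q = 0.05 * np.eye(2)
R = np.array([[1.0]])

P = np.diag([10.0, 10.0])                # P_0: a rough initial guess
history = []
for _ in range(5):
    P_pred = A @ P @ A.T + Q                                   # prediction
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)     # Kalman gain
    P = (np.eye(2) - K @ H) @ P_pred                           # correction
    history.append((np.trace(P_pred), np.trace(P)))

for step, (tr_pred, tr_post) in enumerate(history, start=1):
    print(f"step {step}: trace(P_pred) = {tr_pred:.3f}, trace(P_post) = {tr_post:.3f}")
```

In the linear filter this recursion does not depend on the measured values at all, only on $A$ , $H$ , $Q$ , and $R$ , so it can be run ahead of time to see what accuracy a given configuration can reach.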

Process Noise Covariance Matrix (Q)

The process noise covariance matrix $Q$ represents the uncertainty in the system’s dynamics or model. It’s essentially a model of how much unpredicted variation or disturbance (process noise) we expect in each state variable over time.

Setting Q

In most cases, $Q$ is designed rather than calculated based on the following considerations:

Expected Random Disturbances: If we are tracking a moving object, for instance, we might set a higher $Q$ if we expect the object to experience random, unpredictable accelerations (like wind or terrain changes).
Model Imperfections: Higher values in $Q$ are used to account for limitations in the system model.

For example, if we are tracking both position and velocity, we might set $Q$ as:

Q = \begin{bmatrix} \sigma_{process\ position}^2 & 0 \\ 0 & \sigma_{process\ velocity}^2 \end{bmatrix}

where:

$\sigma_{process\ position}^2$ and $\sigma_{process\ velocity}^2$ are the variances representing the expected uncertainty in position and velocity due to unknown disturbances or inaccuracies in the model.

In some cases, domain knowledge or experimentation is used to tune $Q$ to achieve desired performance. There are also advanced methods, such as adaptive filtering, where $Q$ can be adjusted dynamically based on observed errors.
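
To see what tuning $Q$ does in practice, the sketch below iterates the scalar covariance recursion (with transition and observation both equal to 1) for several process noise variances and reports the gain it settles at. The function name `steady_state_gain` and the chosen values are illustrative assumptions.

```python
def steady_state_gain(q, r, iterations=200):
    """Iterate the scalar covariance recursion until the Kalman gain settles."""
    p = 1.0
    k = 0.0
    for _ in range(iterations):
        p = p + q            # prediction adds process noise
        k = p / (p + r)      # Kalman gain
        p = (1.0 - k) * p    # correction shrinks the variance
    return k

for q in (0.01, 0.1, 1.0):
    print(f"q = {q}: steady-state gain = {steady_state_gain(q, r=1.0):.3f}")
```

A larger $Q$ relative to $R$ leads to a larger gain, so the filter follows the measurements more closely; a smaller $Q$ makes it rely more on the model and smooth more aggressively.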

Measurement Noise Covariance Matrix (R)

The measurement noise covariance matrix $R$ represents the uncertainty (noise) associated with the measurements received from sensors. Each element in $R$ quantifies how much noise is expected in each measurement.

Setting R

The values in $R$ are typically derived from:

Sensor Specifications: Often, the manufacturer provides information on the sensor’s accuracy, which can be used to set $R$ . For example, a GPS sensor may have an accuracy of ±3 meters; if that figure is read as one standard deviation, the position variance would be $3^2 = 9\ \text{m}^2$ .
Experimental Data: Sometimes $R$ is determined empirically by taking repeated measurements in a controlled environment to estimate the variance.

For example, if the Kalman filter is using a GPS sensor to measure position, $R$ might be set as:

R = \begin{bmatrix} \sigma_{measurement\ position}^2 \end{bmatrix}

where $\sigma_{measurement\ position}^2$ is the variance associated with the position measurement’s noise.
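
The empirical route can be sketched as follows: record many readings of a quantity that is known not to change and take their sample covariance. The function name `estimate_R` is illustrative, and the readings here are synthetic stand-ins for logged sensor data.

```python
import numpy as np

def estimate_R(samples):
    """Sample covariance of repeated measurements of a fixed quantity.

    samples: array of shape (n_samples, n_measured_variables).
    """
    samples = np.asarray(samples, dtype=float)
    # np.cov uses the unbiased (n - 1) normalization by default.
    return np.atleast_2d(np.cov(samples, rowvar=False))

# Synthetic stand-in for logged data: a stationary receiver with 3 m noise.
rng = np.random.default_rng(42)
true_position = 100.0
readings = true_position + rng.normal(0.0, 3.0, size=(10_000, 1))
R = estimate_R(readings)
```

With enough samples the estimate should land close to the true noise variance (9 in this synthetic case). Any real drift of the measured quantity during the recording would inflate it, which is why the controlled environment matters.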

Summary of matrix extraction from data

Here’s a summary of how each covariance matrix is determined in a Kalman filter:

| Covariance Matrix | Initialization | Update Method |
| --- | --- | --- |
| $P$ (Error Covariance) | Initialized based on initial state uncertainty, often chosen manually. | Updated each cycle based on predictions and new measurements. |
| $Q$ (Process Noise Covariance) | Set based on expected model uncertainties and external disturbances. Often chosen empirically or based on domain knowledge. | Usually remains constant, though it can be adapted based on observed performance. |
| $R$ (Measurement Noise Covariance) | Set based on sensor noise characteristics, from specifications or empirical measurements. | Typically remains constant, but can also be adjusted if the sensor’s noise characteristics change. |

In summary, these covariance matrices are often set based on known characteristics of the system and sensors, and then refined through the Kalman filter’s recursive process. The values in $P$ , $Q$ , and $R$ are crucial for determining how the Kalman filter balances the predicted state and measurement to achieve an optimal estimate.

Conclusion

The Kalman filter is a cornerstone algorithm for estimating the state of dynamic systems in the presence of noise and uncertainty. By leveraging a mathematical model of the system, it predicts future states and continuously refines these estimates using noisy measurements. At the heart of the Kalman filter are key components: the state transition matrix, observation model, and covariance matrices that describe the uncertainties in both the process and measurements. These matrices, derived from system dynamics and data, play a crucial role in balancing prediction accuracy and adaptability to real-world variability.

Understanding and fine-tuning these components based on system knowledge, sensor data, and empirical observations is essential for the filter to perform optimally. Whether tracking motion, filtering noisy signals, or managing uncertainties in complex systems, the Kalman filter provides a versatile framework that evolves as data is processed. This adaptability makes it invaluable across fields such as robotics, navigation, and finance, offering reliable insights even in unpredictable environments.
