Gaussian Processes

A Gaussian Process is a random process in which any finite collection of the random variables has a joint Gaussian distribution. You can also view a Gaussian Process as a Multivariate Gaussian with infinite dimensionality. If you don’t know what a random process is yet, we will cover this shortly. Gaussian processes help us quantify our uncertainty and, in some ways I feel, model real world scenarios more honestly. Gaussians, as you will see, also make the math a lot easier.

Random Processes

A random process is a random function of time. Remember the basic concept of a function, which is the mapping of some input space (the collection of valid inputs) to some output space. To make things easier for us we constrain the inputs to:

  1. valid indices $t$ of
  2. a collection of random variables $\{X_t\}$.

How do we make a function random?

By mapping each input of the function $f(t)$ to a sample (a realization) of the random variable $X_t$ corresponding to the index of the input $t$. For example, suppose we have 100 random variables: $[X_1, \ldots, X_{100}]$.

Then $f(t)$ is defined on the integers (time indices) from 1 to 100, and $f(1) = \mathrm{sample}(X_1) = x_1$, $f(16) = \mathrm{sample}(X_{16}) = x_{16}$, etc.
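To make this concrete, here is a minimal sketch of such a random function in Python. The distribution of each $X_t$ hasn’t been pinned down yet, so the sketch assumes, purely for illustration, that every $X_t$ is an independent standard normal (an assumption we will actually adopt later):

```python
import numpy as np

rng = np.random.default_rng()

def f(t: int) -> float:
    """A random function: each call draws a fresh sample x_t from X_t.

    Assumes t is a valid index in 1..100 and, for illustration,
    that every X_t is an independent standard normal.
    """
    if not 1 <= t <= 100:
        raise ValueError("t must be an integer index between 1 and 100")
    return rng.normal(loc=0.0, scale=1.0)

print(f(1))   # one realization x_1 of X_1
print(f(16))  # one realization x_16 of X_16
```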

If this is still a bit confusing, don’t worry. It took me a while to get, too. Below you can try out the concept interactively.

The grid is our input space for $f(t)$; the squares represent $[X_1, \ldots, X_{20}]$.

Click on any square to get the return value for that input: it gives you $x_t$, the sample from $X_t$. Or you can manually give $f(t)$ the argument $t$. Try clicking on a square twice, or entering the same index more than once.

This is the difference.

If it were an ordinary function, $f(t)$ would always return the same value for a given input. But with a random function, we don’t always get the same output for the same input, even though the mappings are unique.
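In code, the contrast is immediate (using the hypothetical f from the sketch above):

```python
# An ordinary function is deterministic: same input, same output.
def g(t: int) -> int:
    return t * t

assert g(4) == g(4)  # always true

# The random function f is not: two calls with the same index
# draw two independent realizations of X_4.
print(f(4), f(4))  # almost surely two different values
```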

[Interactive widget: enter an index (1-20) or click on a square to sample a random value; the page evaluates $f(t)$ for you.]

Note that $t$ doesn’t have to be discrete; time can take continuous values. But for intuition and visualization I think it’s easier if we deal with discrete values for now. Now, each of these random variables (the squares above) was independent of the others: its own realization didn’t affect, and wasn’t affected by, any other square’s realization.

Mean and Autocovariance Functions

Let’s make an assumption.

We know the mean of each of the random variables and how they depend on one another. This means that we can define a mean function, $\mu(t) = \mathbb{E}[X_t]$, and a covariance function, $c(i, j) = \mathrm{cov}(X_i, X_j)$.
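As a sketch, for the independent standard-normal squares from the grid above, both functions are trivial (the names mu and c here are my own, not a library API):

```python
import numpy as np

def mu(t: int) -> float:
    """Mean function E[X_t]: zero for standard normals."""
    return 0.0

def c(i: int, j: int) -> float:
    """Covariance function cov(X_i, X_j) for *independent*
    standard normals: variance 1 on the diagonal, 0 elsewhere."""
    return 1.0 if i == j else 0.0

# Assemble the covariance matrix for [X_1, ..., X_20]:
K = np.array([[c(i, j) for j in range(1, 21)] for i in range(1, 21)])
print(np.array_equal(K, np.eye(20)))  # True: independence gives the identity
```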

This is where it becomes useful to assume that the random variables are Gaussian. Multivariate Gaussians have a few very useful properties.

  1. if the random variables in $[X_1, \ldots, X_{20}]$ are jointly Gaussian, then their joint distribution is a multivariate Gaussian.
  2. any marginal (slice of $[X_1, \ldots, X_{20}]$) is Gaussian. For example, $[X_6, \ldots, X_{20}]$ or any other subvector must have a Gaussian distribution.
  3. if we know the mean vector and covariance matrix, we can write the full probability density function of $[X_1, \ldots, X_{20}]$.
  4. any conditional distribution, e.g. $[X_1, \ldots, X_5]$ given $[X_6, \ldots, X_{20}]$, is Gaussian too.

You will see the math later, but for now you can imagine that this means that, for any set of unknown points, we can predict the average value and quantify how certain we are about it.
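Property 4 is the one that makes prediction work. Splitting the vector into an unknown block 1 and an observed block 2, with mean blocks $\mu_1, \mu_2$ and covariance blocks $\Sigma_{11}, \Sigma_{12}, \Sigma_{21}, \Sigma_{22}$, the standard conditional formulas are

$$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$

A minimal numpy sketch of these formulas (the function name and argument layout are my own choices):

```python
import numpy as np

def condition(mu1, mu2, S11, S12, S22, x2):
    """Mean and covariance of block 1 given that block 2 equals x2.

    Implements mu_{1|2} = mu1 + S12 S22^{-1} (x2 - mu2)
    and        S_{1|2}  = S11 - S12 S22^{-1} S21.
    """
    w = np.linalg.solve(S22, x2 - mu2)               # S22^{-1} (x2 - mu2)
    mu_cond = mu1 + S12 @ w
    S_cond = S11 - S12 @ np.linalg.solve(S22, S12.T)
    return mu_cond, S_cond
```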

Let’s go back to our random function from before with the assumption that each of the random variables is a standard normal (mean 0, variance 1). Additionally, we assume that the covariance between any two distinct random variables is $\frac{1}{2|i-j|}$, i.e. one over twice the distance between them. So $\mathrm{cov}(X_1, X_2) = \frac{1}{2(1)} = \frac{1}{2}$ and $\mathrm{cov}(X_1, X_5) = \frac{1}{2(4)} = \frac{1}{8}$.
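Here is a sketch of this setup in numpy; it draws one joint sample of all 20 squares at once (grid size and names are carried over from the earlier sketches):

```python
import numpy as np

n = 20
idx = np.arange(1, n + 1)

# Covariance matrix: variance 1 on the diagonal, 1 / (2|i - j|) off it.
dist = np.abs(idx[:, None] - idx[None, :])
K = np.where(dist == 0, 1.0, 1.0 / (2.0 * np.maximum(dist, 1)))

mu = np.zeros(n)  # every X_t has mean 0

rng = np.random.default_rng(0)
sample = rng.multivariate_normal(mu, K)  # one realization of [X_1, ..., X_20]
print(sample.round(2))
```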

Interactive Conditional Gaussian Process

Click the squares in order (1-20) to sample random values, and observe the circles representing the sampled values.
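Under the hood, a widget like this can be implemented by sequential conditional sampling: each new square is drawn from its Gaussian distribution conditioned on every square sampled so far. A sketch, reusing the hypothetical condition helper and the K, mu, n, and rng from the previous blocks:

```python
samples = np.full(n, np.nan)

for t in range(n):
    observed = ~np.isnan(samples)  # squares already clicked
    if not observed.any():
        # First square: no conditioning, just the marginal N(0, 1).
        m, v = 0.0, 1.0
    else:
        obs = np.where(observed)[0]
        m_vec, S = condition(
            mu1=np.zeros(1),
            mu2=np.zeros(obs.size),
            S11=K[[t]][:, [t]],
            S12=K[[t]][:, obs],
            S22=K[np.ix_(obs, obs)],
            x2=samples[obs],
        )
        m, v = m_vec[0], S[0, 0]
    samples[t] = rng.normal(m, np.sqrt(v))

print(samples.round(2))  # one realization, drawn square by square
```

Because nearby squares have higher covariance under $\frac{1}{2|i-j|}$, each new sample is pulled toward the values of its already-sampled neighbors, which is the dependence the circles above are meant to visualize.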