Skewness describes how symmetric a distribution is. It is defined as the third central moment of the distribution after normalization by the standard deviation:

$$\mathrm{Skew}[X] = \mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^3\right] = \frac{\mu_3}{\sigma^3}.$$
Kurtosis is a measure of tailedness: the heavier the tails of a distribution, the larger its kurtosis. Mathematically, it is defined as

$$\mathrm{Kurt}[X] = \mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^4\right] = \frac{\mu_4}{\sigma^4},$$
where $\mu$ is the mean and $\sigma$ is the standard deviation of the distribution of $X$, and $\mu_k = \mathbb{E}[(X - \mu)^k]$ is the $k$-th central moment.
It can be shown that the kurtosis of any Gaussian distribution is $3$. People usually use excess kurtosis, the kurtosis of a distribution in excess of that of the Gaussian distribution. Namely,

$$\mathrm{ExKurt}[X] = \mathrm{Kurt}[X] - 3.$$
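As a quick numerical check, both quantities can be estimated from a sample. The sketch below uses `scipy.stats` on simulated standard-normal data; note that `scipy.stats.kurtosis` returns excess kurtosis by default:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)  # simulated Gaussian sample

s = skew(x)       # sample skewness, ~0 for a Gaussian
ek = kurtosis(x)  # scipy returns *excess* kurtosis by default
k = ek + 3        # plain kurtosis, ~3 for a Gaussian

print(s, ek, k)
```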
We can create a plot with the square of skewness on its x-axis and kurtosis on its y-axis. This plot is called a Cullen and Frey graph.
This graph helps us determine which common distribution our data is closest to.
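For instance, the theoretical $(\mathrm{Skew}^2, \mathrm{Kurt})$ coordinates of a few familiar distributions can be read off with scipy. This is only a sketch of computing the coordinates; an actual Cullen and Frey graph would scatter-plot them together with the point estimated from the data:

```python
from scipy import stats

# dist.stats(moments="sk") returns (skewness, excess kurtosis)
for name, dist in [("normal", stats.norm),
                   ("uniform", stats.uniform),
                   ("exponential", stats.expon)]:
    s, ek = dist.stats(moments="sk")
    print(f"{name:12s} skew^2 = {float(s) ** 2:.2f}, kurtosis = {float(ek) + 3:.2f}")
```

The normal distribution sits at $(0, 3)$, the uniform at $(0, 1.8)$, and the exponential at $(4, 9)$.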
The International Swaps and Derivatives Association (ISDA) regulates variation margin and initial margin (IM), and standardized the latter in a model called the Standard Initial Margin Model (SIMM).
A Gaussian mixture model (GMM) is a probabilistic model for a mixture of several Gaussian distributions with possibly different means and variances.
For example, we can model the 100m race times of all grade 12 students in a high school as two normal distributions: one for female students and one for male students. It is reasonable to expect the two groups to have different means and possibly different variances.
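This two-group setup can be simulated directly. A minimal sketch, where the means, standard deviations (in seconds), and the 50/50 split are hypothetical numbers chosen for illustration, not real data:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Hypothetical parameters: group means/stds in seconds, 50/50 split.
is_male = rng.random(n) < 0.5
times = np.where(is_male,
                 rng.normal(14.0, 1.0, n),   # one group
                 rng.normal(17.0, 1.5, n))   # the other group

print(times.mean(), times.std())  # pooled data mixes the two clusters
```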
When to use a Gaussian mixture model?
1. The data has more than one cluster. In the following picture, the left panel models the data with a single normal distribution; the right panel models the data with two normal distributions, a Gaussian mixture model. The right one clearly describes the data better.
Pictures are from https://brilliant.org/wiki/gaussian-mixture-model/
2. Each cluster is theoretically normally distributed.
Theory of Gaussian Mixture Model
1. Gaussian distribution in 1 dimension. Since there are several Gaussian distributions in the GMM, we assign an index to each Gaussian distribution: $k = 1, \dots, K$, where $K$ is the number of clusters. For a given mean $\mu_k$ and variance $\sigma_k^2$, the probability density function is

$$\mathcal{N}(x \mid \mu_k, \sigma_k^2) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left(-\frac{(x - \mu_k)^2}{2\sigma_k^2}\right).$$
The conditioning bar above is not a mathematical conditional expectation, but a statistical way of saying that we know the true parameters in advance.
2. Gaussian mixture model in 1 dimension. The probability density function of a GMM is the weighted average of several Gaussian densities:

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2),$$

where the weights $\pi_k$ satisfy

$$\sum_{k=1}^{K} \pi_k = 1, \qquad 0 \le \pi_k \le 1.$$

Plugging in the Gaussian density,

$$p(x) = \sum_{k=1}^{K} \pi_k \, \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left(-\frac{(x - \mu_k)^2}{2\sigma_k^2}\right).$$
Note that this is a valid density function because its integral over $\mathbb{R}$ is $1$.
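The weighted-average density and its unit integral can be checked numerically. A sketch with hypothetical weights, means, and standard deviations for a two-component mixture:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Hypothetical 1-D GMM with K = 2 components
pis = [0.3, 0.7]        # weights, sum to 1
mus = [-1.0, 2.0]       # component means
sigmas = [0.5, 1.0]     # component standard deviations

def gmm_pdf(x):
    """Weighted average of Gaussian densities."""
    return sum(pi * norm.pdf(x, mu, s) for pi, mu, s in zip(pis, mus, sigmas))

total, _ = quad(gmm_pdf, -np.inf, np.inf)
print(total)  # integrates to 1, so it is a valid density
```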
3. Gaussian mixture model in $n$ dimensions. Let $X$ be an $n$-dimensional multivariate Gaussian random variable with mean vector $\mu$ and covariance matrix $\Sigma$. Then the probability density function is

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} \, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right).$$
Then, the probability density function of the GMM, which is the weighted average of several multivariate Gaussian densities, is

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),$$

with

$$\sum_{k=1}^{K} \pi_k = 1, \qquad 0 \le \pi_k \le 1.$$
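The multivariate density is available as `scipy.stats.multivariate_normal`, so an $n$-dimensional GMM density is again just a weighted sum. A sketch with hypothetical parameters for a 2-D, two-component mixture:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-D GMM with K = 2 components
pis = [0.4, 0.6]
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]

def gmm_pdf(x):
    """Weighted average of multivariate Gaussian densities."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(pis, mus, covs))

print(gmm_pdf(np.array([1.0, 1.0])))  # density at a single point
```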
Training the Model
Suppose that we know the number of clusters $K$ a priori. (The choice of $K$ relies on the statistician's experience.) Then, we can use the Expectation-Maximization (EM) algorithm to find the parameters $\pi_k$, $\mu_k$, and $\sigma_k^2$ (or $\Sigma_k$ for the multi-dimensional model). Let $K$ be the number of clusters, and $N$ be the number of samples.
Step 1: Initialize
Randomly choose $K$ samples and set them to be the group means. For example, in the case of $K = 2$, choose two distinct samples $x_{i_1}, x_{i_2}$ and set $\mu_1 = x_{i_1}$, $\mu_2 = x_{i_2}$. (Note that this is also valid in the multi-dimensional case.)
Set all variances (resp. covariance matrices) to be the same value: the sample variance (resp. sample covariance matrix). Namely,

$$\sigma_k^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 \quad \text{for } k = 1, \dots, K,$$

where $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$ is the sample mean.
Set all weights equal to $\frac{1}{K}$, i.e., $\pi_k = \frac{1}{K}$ for $k = 1, \dots, K$.
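The initialization step can be sketched as follows for the 1-D model; the data here is simulated only for illustration:

```python
import numpy as np

def init_params(x, K, rng):
    """Step 1: K random samples as means, shared sample variance, equal weights."""
    mus = rng.choice(x, size=K, replace=False)  # random samples as group means
    sigmas2 = np.full(K, x.var())               # same sample variance for every cluster
    pis = np.full(K, 1.0 / K)                   # equal weights
    return mus, sigmas2, pis

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)  # illustrative data
mus, sigmas2, pis = init_params(x, K=2, rng=rng)
print(mus, sigmas2, pis)
```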
Step 2: Expectation
We compute the responsibility $\gamma_{ik}$, the probability that sample $x_i$ belongs to cluster $k$ given the current parameters:

$$\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \sigma_j^2)}.$$
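A sketch of the expectation step in Python; the data and cluster parameters below are assumed for illustration:

```python
import numpy as np
from scipy.stats import norm

def e_step(x, mus, sigmas2, pis):
    """Step 2: responsibilities gamma[i, k] = P(cluster k | x_i)."""
    # Unnormalized: pi_k * N(x_i | mu_k, sigma_k^2), one column per cluster
    dens = np.stack([pi * norm.pdf(x, mu, np.sqrt(s2))
                     for pi, mu, s2 in zip(pis, mus, sigmas2)], axis=1)
    # Normalize each row so responsibilities sum to 1 per sample
    return dens / dens.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
gamma = e_step(x, mus=np.array([-2.0, 2.0]),
               sigmas2=np.array([1.0, 1.0]), pis=np.array([0.5, 0.5]))
print(gamma.shape)  # (200, 2)
```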
Step 3: Maximization
Update the parameters, then go back to Step 2 until convergence:

$$N_k = \sum_{i=1}^{N} \gamma_{ik}, \qquad \pi_k = \frac{N_k}{N}, \qquad \mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \, x_i, \qquad \sigma_k^2 = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \, (x_i - \mu_k)^2.$$
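The expectation and maximization steps together give a minimal 1-D EM loop. The sketch below uses simulated two-cluster data, illustrative initial values, and a fixed iteration cap instead of a convergence tolerance:

```python
import numpy as np
from scipy.stats import norm

def e_step(x, mus, sigmas2, pis):
    """Step 2: responsibilities gamma[i, k] = P(cluster k | x_i)."""
    dens = np.stack([pi * norm.pdf(x, mu, np.sqrt(s2))
                     for pi, mu, s2 in zip(pis, mus, sigmas2)], axis=1)
    return dens / dens.sum(axis=1, keepdims=True)

def m_step(x, gamma):
    """Step 3: re-estimate weights, means, and variances from responsibilities."""
    Nk = gamma.sum(axis=0)                                     # effective cluster sizes
    pis = Nk / len(x)
    mus = (gamma * x[:, None]).sum(axis=0) / Nk
    sigmas2 = (gamma * (x[:, None] - mus) ** 2).sum(axis=0) / Nk
    return mus, sigmas2, pis

# Simulated two-cluster data with true means -3 and 3
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])

# Illustrative initialization: spread-out means, shared variance, equal weights
mus = np.array([-1.0, 1.0])
sigmas2 = np.array([x.var(), x.var()])
pis = np.array([0.5, 0.5])

for _ in range(50):  # fixed iteration cap for simplicity
    gamma = e_step(x, mus, sigmas2, pis)
    mus, sigmas2, pis = m_step(x, gamma)

print(np.sort(mus))  # should be close to the true means [-3, 3]
```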