library(latex2exp)  # use latex expressions
library(actuar)     # pareto distribution
library(cumstats)   # cumulative statistics


Tail Differences

Check these distributions. Which ones correspond to a Gaussian?

The shapes are similar but only the first is Gaussian. The others are the logistic, the Cauchy and a location–scaled Beta. They are all unimodal, symmetric and seem to decrease at similar rates. It seems we could just pick whichever is mathematically most convenient, i.e., the Gaussian, and model stuff with it.
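For concreteness, here is one way four such densities can be drawn (a sketch of my own; the logistic, Cauchy and Beta parameters are eyeballed so that the shapes look alike):

xs <- seq(-5, 5, len=301)
par(mfrow=c(2,2), mar=c(2,2,2,1))
plot(xs, dnorm(xs),                 type='l', main='Gaussian')
plot(xs, dlogis(xs, 0, sqrt(3)/pi), type='l', main='logistic')    # sd matched to 1
plot(xs, dcauchy(xs, 0, 0.8),       type='l', main='Cauchy')      # scale chosen by eye
plot(xs, dbeta((xs+4)/8, 8, 8)/8,   type='l', main='scaled Beta') # Beta(8,8) on [-4,4]
par(mfrow=c(1,1))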

The differences are in the tails, the small parts that run off the plots towards infinity. However, these differences make a lot of difference!

Let’s duplicate each distribution and shift the two copies’ centers to -4 and +4. The next plots show their density products:
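These product plots can be reproduced along the following lines (same eyeballed parameters as in the sketch above; the two Beta copies end up with the disjoint supports \([-8,0]\) and \([0,8]\)):

xs <- seq(-8, 8, len=601)
par(mfrow=c(2,2), mar=c(2,2,2,1))
plot(xs, dnorm(xs, -4) * dnorm(xs, +4), type='l', main='Gaussian')
plot(xs, dlogis(xs, -4, sqrt(3)/pi) * dlogis(xs, +4, sqrt(3)/pi), type='l', main='logistic')
plot(xs, dcauchy(xs, -4, 0.8) * dcauchy(xs, +4, 0.8), type='l', main='Cauchy')
plot(xs, (dbeta((xs+8)/8, 8, 8)/8) * (dbeta(xs/8, 8, 8)/8), type='l', main='scaled Beta')
par(mfrow=c(1,1))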

The results are strikingly different! The products of these similarly shaped distributions do not reflect the superficial similarity of the density curves shown before. If we choose an inadequate distribution to model ‘stuff’, we will produce estimates quite far off from what happens in reality.

We can get some intuition for the previous plots by plotting the original distributions in log scale:

The log of a Gaussian is a parabola, and summing two parabolas gives us another parabola. For the logistic, the flat plateau is the sum of two lines with opposite slopes. For the Cauchy, the tails decrease so slowly that they are still not too small at the other peak, resulting in the bimodal shape. For the Beta, the two supports are disjoint, resulting in a flat zero.

The next plots show each log component in orange and the log sum in blue.
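These log plots can be reconstructed roughly as follows (my sketch; the Beta is omitted since its log density is \(-\infty\) outside its support):

xs <- seq(-10, 10, len=601)
log.sum.plot <- function(log.f, main='') {
  l1 <- log.f(xs + 4)  # component centered at -4
  l2 <- log.f(xs - 4)  # component centered at +4
  plot(xs, l1 + l2, type='l', col='dodgerblue', lwd=2, main=main,
       ylim=range(l1, l2, l1 + l2), xlab='x', ylab='log density')
  lines(xs, l1, col='orange')
  lines(xs, l2, col='orange')
}
par(mfrow=c(1,3))
log.sum.plot(function(x) dnorm(x, log=T), 'Gaussian')
log.sum.plot(function(x) dlogis(x, scale=sqrt(3)/pi, log=T), 'logistic')
log.sum.plot(function(x) dcauchy(x, scale=0.8, log=T), 'Cauchy')
par(mfrow=c(1,1))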

Distances and Distributions

Consider a unimodal symmetric distribution with its mode at \(x=0\). Let’s assume that its density decreases with the distance to zero. The distribution’s pdf will follow

\[p(x) = \frac{c}{f(|x|)} \propto \frac{1}{f(|x|)}\]

where \(c\) must be a value such that the pdf integrates to \(1\) over \(x\in (-\infty,+\infty)\).

What are the possible functions \(f\) defining proper distributions?

Function \(f\) must satisfy a few restrictions: it must be positive, it must be non-decreasing for \(x \geq 0\) (so that the density indeed decreases with the distance), and it must grow fast enough that \(1/f(|x|)\) integrates to a finite value.

The first candidate is linear growth, \(f(x) = 1 + |x|\), however

\[\int_{-\infty}^{+\infty} \frac{1}{1 + |x|}\,dx = +\infty\]

More generally,

\[\int_{-\infty}^{+\infty} \frac{1}{1 + |x|^k}\,dx < +\infty\]

holds only when \(k>1\).
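We can check this numerically with integrate() (a quick sketch; for \(k=1\), integrate() typically reports the integral as probably divergent):

f <- function(k) function(x) 1 / (1 + abs(x)^k)
integrate(f(1.5), -Inf, +Inf)$value  # finite
integrate(f(2),   -Inf, +Inf)$value  # pi, the Cauchy normalizing constant
try(integrate(f(1), -Inf, +Inf))     # flagged as (probably) divergent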

For \(k=2\) we have the Cauchy distribution, a distribution with such heavy tails that no moments exist (not even the mean). The monstrosities for \(k\) between \(1\) and \(2\) have no name, as far as I can tell. The only distribution I know that is heavier-tailed than the Cauchy and has a closed form is the Lévy distribution, but it does not fit this distance expression.

If we know the CDF of a distribution, we can apply the inverse transform to sample from it. Here’s an example for the Cauchy (\(k=2\)):

sample.f <- function(n, k) {
  f <- function(x) 1 / (1 + abs(x)^k)
  c <- integrate(f, -Inf, +Inf)$value   # normalizing constant (pi for k=2), not needed below

  # inverse CDF: this closed form is the Cauchy quantile function, valid only for k=2
  inv.f <- function(u) tan(pi * (u - 0.5))

  inv.f(runif(n, 0, 1))
}

set.seed(201)
xs <- sample.f(1e5, 2)
hist(xs, breaks=5e4, prob=T, xlim=c(-12,12), col='lightblue', border='white', main='')
curve(dcauchy(x), add=T, lwd=2, col='dodgerblue')

However, for most values of \(k\) there is no closed formula for the CDF, and most sampling methods fail since these distributions are unbounded and have infinite variance.
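One workaround is to invert the CDF numerically. The sketch below (my own; sample.f.num is a hypothetical helper, and accuracy in the extreme tails is limited) builds the CDF with integrate() and inverts it with uniroot():

sample.f.num <- function(n, k) {
  f   <- function(x) 1 / (1 + abs(x)^k)
  Z   <- integrate(f, -Inf, +Inf)$value               # normalizing constant
  cdf <- function(x) integrate(f, -Inf, x)$value / Z
  inv <- function(u)                                  # solve cdf(x) = u
    uniroot(function(x) cdf(x) - u, c(-1e3, 1e3), extendInt='upX')$root
  sapply(runif(n), inv)
}

set.seed(201)
xs <- sample.f.num(1e3, 1.5)  # one of the nameless k in (1,2)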

For \(k>2\) the distributions are similar to Student’s t, but Student’s t follows

\[p(x) \propto \frac{1}{\left(1 + \frac{|x|^2}{2k-1}\right)^k}\]

Basically, when the density falls off polynomially with the distance, we get heavy-tailed distributions.

The next step is to consider exponential growth. For

\[p(x) \propto \frac{1}{\exp(|x|)}\]

we get the family of sub-exponential distributions, like the Laplace and the Exponential. The tail falls exponentially fast, but slower than a Gaussian’s. This is the frontier between light and heavy tails.
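To feel the difference, compare the two-sided tail mass beyond \(x=6\) for standard parameters (the Laplace value is computed by hand from its density \(\frac{1}{2}e^{-|x|}\)):

2 * pnorm(6, lower.tail=F)  # Gaussian: ~2e-9
exp(-6)                     # Laplace:  ~2.5e-3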

For

\[p(x) \propto \frac{1}{\exp(|x|^2)}\]

we get the Gaussian, where almost all probability mass is around the center. This is already a thin-tailed distribution.

For

\[p(x) \propto \frac{1}{\exp(|x|^k)}\]

with \(k>2\) we have tails even thinner than the Gaussian.
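A quick sketch of the unnormalized log densities \(-|x|^k\) makes the thinning visible:

xs <- seq(-4, 4, len=401)
plot(xs, -abs(xs),    type='l', lty=1, ylim=c(-30, 0), xlab='x', ylab='unnormalized log density')
lines(xs, -abs(xs)^2, lty=2)  # Gaussian
lines(xs, -abs(xs)^4, lty=3)  # thinner than the Gaussian
legend('bottom', legend=c('k=1', 'k=2', 'k=4'), lty=1:3)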

The slower the density decays, the harder it is to do inference, because more and more information gets trapped in the tails, which are not so easily sampled.

Taxonomy of Heavy Tails