Peaking into the abyss

Here is a weird question: how “zero” can samples from a standard Gaussian actually get? Sure, the mean is zero, but that’s just a statistic. How much mass sits near zero? Gaussian-weighted components are abundant in inference; can they ever truly “switch off”?

Let $x \sim \mathcal{N}(0,1)$ and consider the smallest magnitude among $n$ samples:

$$m_n = \min_{i \le n} |x_i|.$$

Near zero the Gaussian density is essentially flat,

$$p(x) \approx \frac{1}{\sqrt{2\pi}},$$

so the mass in a symmetric window around zero scales linearly with its half-width $\varepsilon$:

$$P(|x| < \varepsilon) \approx \frac{2\varepsilon}{\sqrt{2\pi}} = \sqrt{\frac{2}{\pi}}\,\varepsilon.$$
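To see how good the flat-density approximation is, here is a minimal Python sketch (standard library only) that compares it against the exact mass $P(|x| < \varepsilon) = \operatorname{erf}(\varepsilon/\sqrt{2})$. The helper names `exact_mass`, `linear_mass` and `relative_error` are my own, chosen for illustration.

```python
import math

def exact_mass(eps: float) -> float:
    """Exact P(|x| < eps) for x ~ N(0, 1), via the error function."""
    return math.erf(eps / math.sqrt(2.0))

def linear_mass(eps: float) -> float:
    """Flat-density approximation: sqrt(2/pi) * eps."""
    return math.sqrt(2.0 / math.pi) * eps

def relative_error(eps: float) -> float:
    """Relative error of the linear approximation against the exact mass."""
    exact = exact_mass(eps)
    return abs(linear_mass(eps) - exact) / exact

for eps in (1e-1, 1e-2, 1e-3):
    print(f"eps={eps:.0e}  exact={exact_mass(eps):.6e}  "
          f"linear={linear_mass(eps):.6e}  rel. error={relative_error(eps):.1e}")
```

Expanding the error function suggests the relative error shrinks roughly like $\varepsilon^2/6$, so the linear rule only gets better the deeper you look.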

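Invert this heuristic: the smallest of $n$ samples sits roughly at the window width that holds one sample on average, $n\sqrt{2/\pi}\,\varepsilon \approx 1$. That gives a clean rule:

After $n$ samples, the smallest magnitude you typically see is on the order of $1/n$.

More precisely,

$$\mathbb{E}[m_n] \approx \sqrt{\frac{\pi}{2}}\,\frac{1}{n}.$$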
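One way to derive the constant is the tail formula for a nonnegative random variable, assuming the $x_i$ are independent. The "$\approx$" step swaps the exact mass for its linearization, which is justified because for large $n$ only $t = O(1/n)$ contributes:

$$\mathbb{E}[m_n] = \int_0^\infty P(m_n > t)\,dt = \int_0^\infty \bigl(1 - P(|x| < t)\bigr)^n\,dt \approx \int_0^\infty e^{-n\sqrt{2/\pi}\,t}\,dt = \sqrt{\frac{\pi}{2}}\,\frac{1}{n}.$$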

So:

| samples from $\mathcal{N}(0,1)$ | expected minimum $\mathbb{E}[m_n]$ |
|---|---|
| $10^3$ | $\sim 10^{-3}$ |
| $10^6$ | $\sim 10^{-6}$ |

Every tenfold increase in samples pushes the smallest observed magnitude, what I call “the abyss floor,” one order of magnitude closer to zero.
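A quick Monte Carlo check of the $\sqrt{\pi/2}/n$ rule, as a sketch in plain Python. The function name `mean_min_abs_gaussian`, the trial count of 500 and the seed are arbitrary choices for illustration, not part of any established method.

```python
import math
import random

def mean_min_abs_gaussian(n: int, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of E[min_{i<=n} |x_i|] with x_i ~ N(0, 1) i.i.d."""
    total = 0.0
    for _ in range(trials):
        # Smallest magnitude among n fresh standard-normal draws.
        total += min(abs(rng.gauss(0.0, 1.0)) for _ in range(n))
    return total / trials

rng = random.Random(0)  # fixed seed so the printout is reproducible
for n in (10, 100, 1000):
    simulated = mean_min_abs_gaussian(n, trials=500, rng=rng)
    predicted = math.sqrt(math.pi / 2) / n
    print(f"n={n:5d}  simulated={simulated:.3e}  predicted={predicted:.3e}")
```

Drawing $10^6$ values per trial is slow in pure Python, so the sketch stops at $n = 1000$; the $1/n$ trend should already be visible there.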

Looking at the abyss on a log scale

A convenient way to visualize this is to define

$$y = \log_{10} |x|.$$

Now each unit step left corresponds to another factor of ten toward zero.

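Since $|x|$ has density close to $\sqrt{2/\pi}$ near zero and $d|x| = \ln(10)\,10^y\,dy$, the density becomes

$$p(y) \approx \sqrt{\frac{2}{\pi}}\,\ln(10)\,10^y \propto 10^y$$

for very negative $y$. Each additional decade toward zero therefore holds a factor of ten less probability mass.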
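The factor of ten per decade can be checked without sampling. This sketch (the helper `decade_mass` is a hypothetical name) computes the exact Gaussian mass in each decade $[10^{-(k+1)}, 10^{-k})$ of $|x|$ from the error function and prints the ratio between neighboring decades.

```python
import math

def decade_mass(k: int) -> float:
    """Exact P(10**-(k+1) <= |x| < 10**-k) for x ~ N(0, 1)."""
    upper = math.erf(10.0 ** (-k) / math.sqrt(2.0))
    lower = math.erf(10.0 ** (-(k + 1)) / math.sqrt(2.0))
    return upper - lower

# Decades k = 0..6, i.e. y in [-(k+1), -k).
masses = [decade_mass(k) for k in range(7)]
for k, (this, nxt) in enumerate(zip(masses, masses[1:])):
    print(f"decade {k} -> {k + 1}: mass ratio = {this / nxt:.4f}")
```

The ratio should approach ten as $k$ grows; for the top decade, $[0.1, 1)$, it falls noticeably short, because the density is no longer flat there.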

Surely a sparse prior fixes this?

A common intuition is that Gaussian priors are “dense,” while Laplace priors encourage sparsity. So perhaps a Laplace distribution explores the abyss more eagerly.

Take the maximum entropy distribution with fixed mean $\mathbb{E}[x]=0$ and mean absolute deviation $\mathbb{E}|x|=1$, the standard Laplace distribution:

$$p(x) = \frac{1}{2} e^{-|x|}.$$

Let $r = |x|$. Then

$$r \sim \mathrm{Exp}(1).$$

The minimum of $n$ independent $\mathrm{Exp}(1)$ variables is again exponential, with rate $n$:

$$m_n \sim \mathrm{Exp}(n),$$

and therefore

$$\mathbb{E}[m_n] = \frac{1}{n}.$$
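The "still exponential" step follows from independence of the $r_i$, which I assume here just as in the Gaussian case:

$$P(m_n > t) = \prod_{i=1}^{n} P(r_i > t) = \left(e^{-t}\right)^n = e^{-nt}, \qquad \mathbb{E}[m_n] = \int_0^\infty e^{-nt}\,dt = \frac{1}{n}.$$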

Same scaling as for the Gaussian above!
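A side-by-side simulation makes the comparison concrete. In this sketch, Laplace draws are built as a random sign times an $\mathrm{Exp}(1)$ magnitude, which matches $p(x) = \tfrac12 e^{-|x|}$; the helper names, $n$, trial count and seed are illustrative choices of mine.

```python
import math
import random

rng = random.Random(0)  # fixed seed; any seed works

def gaussian() -> float:
    """One draw from N(0, 1)."""
    return rng.gauss(0.0, 1.0)

def laplace() -> float:
    """One draw from p(x) = 0.5 * exp(-|x|): random sign times an Exp(1) magnitude."""
    magnitude = rng.expovariate(1.0)
    return magnitude if rng.random() < 0.5 else -magnitude

def mean_min_abs(sampler, n: int, trials: int) -> float:
    """Monte Carlo estimate of E[min_{i<=n} |x_i|] for i.i.d. draws from `sampler`."""
    total = 0.0
    for _ in range(trials):
        total += min(abs(sampler()) for _ in range(n))
    return total / trials

n, trials = 1000, 300
gauss_mean = mean_min_abs(gaussian, n, trials)
laplace_mean = mean_min_abs(laplace, n, trials)
print(f"Gaussian: {gauss_mean:.3e}  (predicted {math.sqrt(math.pi / 2) / n:.3e})")
print(f"Laplace:  {laplace_mean:.3e}  (predicted {1 / n:.3e})")
print(f"ratio:    {gauss_mean / laplace_mean:.3f}  (predicted {math.sqrt(math.pi / 2):.3f})")
```

The simulated ratio of the two means should hover around $\sqrt{\pi/2} \approx 1.25$: the same $1/n$ law with a slightly different constant.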

Near zero, Gaussian and Laplace behave the same way, up to a constant factor.

Both satisfy

P(x<ε)ε.P(|x| < \varepsilon) \propto \varepsilon.

So neither prior truly “dives into zero”.

The Laplace prior changes the shape of the peak (it has a cusp), which strongly affects optimization and MAP estimates. But in terms of raw probability mass in microscopic neighborhoods of zero, it differs far less from the Gaussian than intuition suggests.
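To illustrate the MAP side of that claim, here is a minimal scalar sketch under assumptions of my own: one observation $z = x + \text{noise}$ with unit-variance Gaussian noise, and a prior penalty of strength `lam` (a hypothetical parameter). A Gaussian prior gives ridge-style shrinkage $z/(1+\lambda)$, which is never exactly zero for $z \neq 0$, while a Laplace prior gives soft thresholding, which returns exactly zero whenever $|z| \le \lambda$.

```python
import math

def map_gaussian_prior(z: float, lam: float) -> float:
    """argmin_x 0.5*(z - x)**2 + 0.5*lam*x**2: shrinks z, but is nonzero whenever z is."""
    return z / (1.0 + lam)

def map_laplace_prior(z: float, lam: float) -> float:
    """argmin_x 0.5*(z - x)**2 + lam*|x|: soft thresholding, exactly 0 when |z| <= lam."""
    if abs(z) <= lam:
        return 0.0
    return math.copysign(abs(z) - lam, z)

for z in (-2.0, -0.5, 0.3, 1.5):
    print(f"z={z:+.1f}  gaussian MAP={map_gaussian_prior(z, 1.0):+.3f}  "
          f"laplace MAP={map_laplace_prior(z, 1.0):+.3f}")
```

This is the cusp at work: the Laplace MAP estimate switches off exactly, even though its probability mass near zero scales just like the Gaussian's.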
