On basis function selection for sparse Gaussian process regression

Preprint link here.

Sparse Gaussian processes achieve $\mathcal{O}(N)$ inference by replacing the kernel with an appropriate expansion in a fixed basis $\{\phi_j\}$ on the input space. Given a compute budget $M \ll N$, practitioners conventionally truncate the basis to its first $M$ entries.
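
To make the truncated expansion concrete, here is a minimal sketch for the HSGP family in one dimension, assuming a squared-exponential kernel and the standard reduced-rank construction (Laplacian eigenfunctions on an interval, weighted by the kernel's spectral density). The function names, the interval half-width, and the toy data are illustrative assumptions, not taken from the preprint.

```python
import numpy as np

def hsgp_basis(x, num_basis, half_width):
    """First `num_basis` Laplacian eigenfunctions on [-half_width, half_width]."""
    j = np.arange(1, num_basis + 1)
    sqrt_eig = np.pi * j / (2.0 * half_width)  # square roots of the eigenvalues
    phi = np.sin(sqrt_eig * (x[:, None] + half_width)) / np.sqrt(half_width)
    return phi, sqrt_eig

def se_spectral_density(omega, variance, lengthscale):
    """Spectral density of the 1D squared-exponential kernel."""
    return (variance * np.sqrt(2.0 * np.pi) * lengthscale
            * np.exp(-0.5 * (lengthscale * omega) ** 2))

def fit_predict(x_train, y_train, x_test, num_basis, half_width,
                variance, lengthscale, noise_var):
    """Posterior mean at x_test using only an M x M linear solve."""
    phi_train, sqrt_eig = hsgp_basis(x_train, num_basis, half_width)
    phi_test, _ = hsgp_basis(x_test, num_basis, half_width)
    # Scale features by the prior standard deviation of each weight, so the
    # weights have a unit-variance prior and the system stays well conditioned.
    scale = np.sqrt(se_spectral_density(sqrt_eig, variance, lengthscale))
    z_train, z_test = phi_train * scale, phi_test * scale
    # Forming z_train.T @ z_train costs O(N M^2): linear in N for fixed M.
    precision = z_train.T @ z_train / noise_var + np.eye(num_basis)
    weights = np.linalg.solve(precision, z_train.T @ y_train / noise_var)
    return z_test @ weights

# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=500)
y_train = np.sin(2.0 * x_train) + 0.1 * rng.standard_normal(x_train.size)
x_test = np.linspace(-2.0, 2.0, 50)
mean_pred = fit_predict(x_train, y_train, x_test, num_basis=32, half_width=5.0,
                        variance=1.0, lengthscale=1.0, noise_var=0.01)
```

Here `num_basis` plays the role of the budget $M$: the basis is simply cut off after its first $M$ entries, whatever the data look like.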

Nothing in the formalism, however, prevents one from selecting only those $M$ basis functions that matter for the data at hand. This would avoid spending budget on basis functions where there is no signal, but it requires a criterion for ranking the candidates.
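
The difference between the two strategies can be sketched in a few lines, assuming some per-candidate score is available. The score used below (a normalised squared projection of the targets onto each candidate) is a hypothetical placeholder chosen only to make the example run; it is not one of the criteria proposed in the preprint, and the helper names and toy data are likewise assumptions.

```python
import numpy as np

def truncate_indices(budget):
    """Conventional choice: keep the first `budget` basis functions."""
    return np.arange(budget)

def select_indices(scores, budget):
    """Keep the `budget` highest-scoring candidates, returned in index order."""
    top = np.argsort(scores)[::-1][:budget]
    return np.sort(top)

# Toy data: the signal is a single high-frequency component on [-1, 1].
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
num_candidates, budget = 40, 5
j = np.arange(1, num_candidates + 1)
phi = np.sin(np.pi * j * (x[:, None] + 1.0) / 2.0)   # candidate basis functions
y = phi[:, 29] + 0.1 * rng.standard_normal(x.size)   # signal lives in column 29

# Hypothetical placeholder score (NOT a criterion from the preprint):
# normalised squared projection of the targets onto each candidate.
scores = (phi.T @ y) ** 2 / np.sum(phi ** 2, axis=0)

truncated = truncate_indices(budget)        # indices 0..4, misses the signal
selected = select_indices(scores, budget)   # ranked by score, includes column 29
```

With the same budget of five, truncation spends all of it on low-frequency candidates that carry no signal in this toy problem, while ranking recovers the one that does.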

We propose three such criteria derived from an information-theoretic view of the basis-function selection problem. Each criterion matches a different state of knowledge at selection time: (no data), (no prior), and (in-between). We then compare truncation and selection strategies on six UCI regression benchmarks across three basis families: Hilbert-space Gaussian processes (HSGP), variational Fourier features (VFF), and variational inducing spherical harmonics (VISH).

We observe that the (no data) criterion is a safe default, matching or improving on truncation for HSGP, VFF, and VISH, with substantial gains for VISH and improvements over a recently developed selection heuristic for that basis family. The data-aware (no prior) and (in-between) criteria provide substantial gains over truncation specifically for HSGP, which is the most broadly used of the three families in practice.
