F-Localizations

FLocalization

Abstract type representing F-Localizations.

FittedFLocalization

Abstract type representing a fitted F-Localization, i.e., one whose F-localization has already been determined from the data.

DvoretzkyKieferWolfowitz

DvoretzkyKieferWolfowitz(;α = 0.05, max_constraints = 1000) <: FLocalization

The Dvoretzky-Kiefer-Wolfowitz band (based on the Kolmogorov-Smirnov distance) at confidence level 1-α that bounds the distance of the true distribution function to the ECDF \(\widehat{F}_n\) based on \(n\) samples. The constant of the band is the sharp constant derived by Massart:

\[ F \text{ distribution}: \sup_{t \in \mathbb R}\lvert F(t) - \widehat{F}_n(t) \rvert \leq \sqrt{\log(2/\alpha)/(2n)} \]

The supremum above is enforced discretely, on at most max_constraints points.
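As a rough illustration of the band above, the following sketch computes its half-width and checks the constraint on a finite set of points. The names dkw_halfwidth, ecdf_at and in_dkw_band are hypothetical helpers written for this example and are not part of the package API; how the package actually selects its max_constraints points is not shown here.

```julia
# Half-width of the DKW band with Massart's constant (hypothetical helper).
dkw_halfwidth(n; α = 0.05) = sqrt(log(2 / α) / (2n))

# Empirical CDF of the sample xs evaluated at the point t.
ecdf_at(t, xs) = count(<=(t), xs) / length(xs)

# Discrete check of the band: F is a candidate CDF, ts is a finite set of points
# standing in for the supremum over all real t.
in_dkw_band(F, xs, ts; α = 0.05) =
    all(abs(F(t) - ecdf_at(t, xs)) <= dkw_halfwidth(length(xs); α = α) for t in ts)

dkw_halfwidth(1000)  # ≈ 0.0429
```

The half-width shrinks at rate 1/sqrt(n) and does not depend on the data, so only the ECDF itself has to be computed from the sample.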

ChiSquaredFLocalization

ChiSquaredFLocalization(α) <: FLocalization

The \(\chi^2\) F-localization at confidence level \(1-\alpha\) for a discrete random variable taking values in \(\{0,\dotsc, N\}\). It is equal to:

\[ f: \sum_{x=0}^N \frac{(n \hat{f}_n(x) - n f(x))^2}{n f(x)} \leq \chi^2_{N,1-\alpha}, \]

where \(\chi^2_{N,1-\alpha}\) is the \(1-\alpha\) quantile of the Chi-squared distribution with \(N\) degrees of freedom, \(n\) is the sample size, \(\hat{f}_n(x)\) is the proportion of samples equal to \(x\) and \(f(x)\) is the population pmf.
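To make the constraint concrete, here is a minimal sketch that evaluates the left-hand side for a candidate pmf. The function names are hypothetical and not part of the package; the quantile \(\chi^2_{N,1-\alpha}\) is passed in as a plain number (for example from a table or from Distributions.jl) so that the sketch needs no dependencies.

```julia
# Pearson statistic from the display above. counts[x + 1] is the number of
# samples equal to x, i.e. n * f̂_n(x); f[x + 1] is the candidate pmf f(x).
function chisq_statistic(counts, f)
    n = sum(counts)
    return sum((c - n * fx)^2 / (n * fx) for (c, fx) in zip(counts, f))
end

# Membership check; `threshold` stands for the quantile χ²_{N,1-α},
# which the caller must supply.
in_chisq_localization(counts, f, threshold) = chisq_statistic(counts, f) <= threshold

counts = [10, 20, 30, 40]   # N = 3, n = 100
f = fill(0.25, 4)           # candidate pmf: uniform on {0, 1, 2, 3}
chisq_statistic(counts, f)  # 20.0
in_chisq_localization(counts, f, 7.815)  # false; 7.815 ≈ χ²_{3,0.95}
```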

InfinityNormDensityBand

InfinityNormDensityBand(;a_min,
                         a_max,
                         kernel  =  Empirikos.FlatTopKernel(),
                         bootstrap = :Multinomial,
                         nboot = 1000,
                         α = 0.05,
                         rng = Random.MersenneTwister(1)
                    )  <: FLocalization

This struct contains the hyperparameters used to construct a neighborhood of the marginal density. The steps of the method (and the meaning of the corresponding hyperparameters) are as follows:

  • First, a kernel density estimate \(\bar{f}\) with the kernel given by kernel is fit to the data.
  • Second, a bootstrap (options: :Multinomial or :Poisson) with nboot bootstrap replicates is used to estimate \(c_n\), such that:

\[ \liminf_{n \to \infty}\mathbb{P}\left[\sup_{x \in [a_{\text{min}} , a_{\text{max}}]} | \bar{f}(x) - f(x)| \leq c_n\right] \geq 1-\alpha \]

Note that the bound is valid only on the interval from a_min to a_max. α is the nominal level, and rng is the random number generator used to draw the bootstrap samples.
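The two steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the package's implementation: kde_at and bootstrap_cn are hypothetical helpers, the supremum is approximated on a finite grid of ngrid points, the bootstrap distribution of \(\sup|\bar{f}^* - \bar{f}|\) is used as a stand-in for that of \(\sup|\bar{f} - f|\), and only the multinomial variant (resampling with replacement) is shown. The kernel and the bandwidth \(h = \sigma/\sqrt{\log(n)}\) follow the defaults described in this section, with the usual rescaling \(K(x/h)/h\) assumed.

```julia
using Random, Statistics

# Flat-top kernel as defined in this section; the value at 0 is the limit 1.05/π.
flattop(x) = iszero(x) ? 1.05 / π : (sin(1.1x / 2)^2 - sin(x / 2)^2) / (π * x^2 / 20)

# Kernel density estimate at the point x with bandwidth h (hypothetical helper).
kde_at(x, zs, h) = mean(flattop((x - z) / h) for z in zs) / h

# Multinomial bootstrap estimate of c_n (hypothetical helper, an assumption-laden sketch).
function bootstrap_cn(zs; a_min, a_max, σ = 1.0, nboot = 200, α = 0.05,
                      rng = Random.MersenneTwister(1), ngrid = 50)
    n = length(zs)
    h = σ / sqrt(log(n))                        # default bandwidth from this section
    grid = range(a_min, a_max; length = ngrid)  # finite grid approximating the sup
    fbar = [kde_at(x, zs, h) for x in grid]
    sups = map(1:nboot) do _
        zs_star = rand(rng, zs, n)              # resample n points with replacement
        maximum(abs(kde_at(x, zs_star, h) - fx) for (x, fx) in zip(grid, fbar))
    end
    return quantile(sups, 1 - α)                # (1 - α) quantile of the bootstrap sups
end

# Simulated homoskedastic Normal samples: N(0, 1) means plus N(0, 1) noise.
zs = randn(Random.MersenneTwister(2), 200) .* sqrt(2)
c_n = bootstrap_cn(zs; a_min = -2.0, a_max = 2.0)
```

Passing an explicit rng, as the constructor above does by default, makes the bootstrap draw reproducible across calls.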

This F-Localization currently only works for homoskedastic Normal samples with common noise variance \(\sigma^2\). By default the above uses the following kernel, with bandwidth \(h = \sigma/\sqrt{\log(n)}\), where \(n\) is the sample size:

FlatTopKernel

FlatTopKernel(h) <: InfiniteOrderKernel

Implements the FlatTopKernel with bandwidth h to be used for kernel density estimation through the KernelDensity.jl package. The flat-top kernel is defined as follows:

\[ K(x) = \frac{\sin^2(1.1x/2)-\sin^2(x/2)}{\pi x^2/ 20}. \]

Its use case is similar to that of the SincKernel; however, it has the advantage of being integrable (in the Lebesgue sense) and of having bounded total variation. Its Fourier transform is the following:

\[ K^*(t) = \begin{cases} 1, & \text{ if } |t| \leq 1 \\ 0, &\text{ if } |t| \geq 1.1 \\ 11-10|t|,& \text{ if } |t| \in [1,1.1] \end{cases} \]

julia> Empirikos.FlatTopKernel(0.1)
FlatTopKernel | bandwidth = 0.1
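For readers who want to evaluate the formulas directly, the following standalone sketch transcribes the kernel and its Fourier transform. The functions flattop and flattop_ft are hypothetical names for this example and are not the package's internals; the bandwidth rescaling \(K(x/h)/h\) is the usual convention and is assumed here.

```julia
# Flat-top kernel K(x); at x = 0 the formula is 0/0, so the limit 1.05/π is used.
flattop(x) = iszero(x) ? 1.05 / π : (sin(1.1x / 2)^2 - sin(x / 2)^2) / (π * x^2 / 20)

# Kernel rescaled to bandwidth h (assumed convention: K(x / h) / h).
flattop(x, h) = flattop(x / h) / h

# Fourier transform K*(t): equal to 1 on [-1, 1], 0 outside [-1.1, 1.1],
# and linear in |t| in between.
function flattop_ft(t)
    a = abs(t)
    return a <= 1 ? 1.0 : a >= 1.1 ? 0.0 : 11 - 10a
end

flattop(0.0)      # 1.05/π ≈ 0.334
flattop_ft(1.05)  # ≈ 0.5, halfway down the linear ramp
```

The value at zero agrees with the inversion formula: \(K(0) = \frac{1}{2\pi}\int K^*(t)\,dt = 2.1/(2\pi)\), since the trapezoid \(K^*\) has area 2.1.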

SincKernel

SincKernel(h) <: InfiniteOrderKernel

Implements the SincKernel with bandwidth h to be used for kernel density estimation through the KernelDensity.jl package. The sinc kernel is defined as follows:

\[ K_{\text{sinc}}(x) = \frac{\sin(x)}{\pi x} \]

It is not typically used for kernel density estimation, because this kernel is not a density itself. However, it is particularly well suited to deconvolution problems and estimation of very smooth densities because its Fourier transform is the following:

\[ K^*_{\text{sinc}}(t) = \mathbf 1( t \in [-1,1]) \]
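The two formulas can be transcribed as a short standalone sketch. The names sinc_kernel and sinc_ft are hypothetical and not part of the package; Julia's built-in sinc(x) computes \(\sin(\pi x)/(\pi x)\), so the argument is rescaled by \(\pi\), and the bandwidth rescaling \(K(x/h)/h\) is an assumed convention.

```julia
# Sinc kernel sin(x) / (πx), written via Base's normalized sinc so that
# the value at x = 0 is handled without a special case.
sinc_kernel(x) = sinc(x / π) / π

# Kernel rescaled to bandwidth h (assumed convention: K(x / h) / h).
sinc_kernel(x, h) = sinc_kernel(x / h) / h

# Fourier transform: the indicator of [-1, 1].
sinc_ft(t) = abs(t) <= 1 ? 1.0 : 0.0

sinc_kernel(0.0)   # 1/π ≈ 0.318
sinc_kernel(4.0)   # negative, which is why the kernel is not itself a density
```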