F-Localizations

DKW-F-Localization

Empirikos.DvoretzkyKieferWolfowitz (Type)
DvoretzkyKieferWolfowitz(;α = 0.05, max_constraints = 1000) <: FLocalization

The Dvoretzky-Kiefer-Wolfowitz band (based on the Kolmogorov-Smirnov distance) at confidence level $1-\alpha$. It bounds the distance between the true distribution function $F$ and the ECDF $\widehat{F}_n$ based on $n$ samples. The constant of the band is the sharp constant derived by Massart:

\[F \text{ distribution}: \sup_{t \in \mathbb R}\lvert F(t) - \widehat{F}_n(t) \rvert \leq \sqrt{\log(2/\alpha)/(2n)}\]

The supremum above is enforced discretely on at most max_constraints points.
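To make the band concrete, here is a minimal sketch in plain Julia that computes the half-width $\sqrt{\log(2/\alpha)/(2n)}$ and checks a candidate distribution function on a finite grid. It does not call Empirikos; the helper names `dkw_halfwidth`, `ecdf_at`, and `in_dkw_band`, as well as the choice of grid, are hypothetical and only illustrate the definition above.

```julia
# Half-width of the DKW band with Massart's constant: sqrt(log(2/α) / (2n)).
dkw_halfwidth(n; α = 0.05) = sqrt(log(2 / α) / (2n))

# Empirical CDF of the sample `xs` evaluated at a point `t`.
ecdf_at(xs, t) = count(<=(t), xs) / length(xs)

# Check |F(t) - ECDF(t)| <= half-width on a finite grid of points,
# mirroring the discretized supremum (the grid here is a hypothetical choice).
function in_dkw_band(F, xs, grid; α = 0.05)
    c = dkw_halfwidth(length(xs); α = α)
    return all(abs(F(t) - ecdf_at(xs, t)) <= c for t in grid)
end

xs = collect(1:100) ./ 100            # toy sample: 0.01, 0.02, ..., 1.00
grid = range(0, 1; length = 50)       # 50 check points
uniform_cdf(t) = clamp(t, 0, 1)       # CDF of the uniform distribution on [0, 1]

c = dkw_halfwidth(100)                                       # roughly 0.136
inside = in_dkw_band(uniform_cdf, xs, grid)                  # uniform CDF lies in the band
shifted = in_dkw_band(t -> clamp(t - 0.3, 0, 1), xs, grid)   # shifted CDF does not
```

With $n = 100$ and $\alpha = 0.05$ the half-width is about 0.136, so a distribution function that differs from the ECDF by 0.3 at some grid point is excluded.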


$\chi^2$-F-Localization

Empirikos.ChiSquaredFLocalization (Type)
ChiSquaredFLocalization(α) <: FLocalization

The $\chi^2$ F-localization at confidence level $1-\alpha$ for a discrete random variable taking values in $\{0,\dotsc, N\}$. It is equal to:

\[f: \sum_{x=0}^N \frac{(n \hat{f}_n(x) - n f(x))^2}{n f(x)} \leq \chi^2_{N,1-\alpha},\]

where $\chi^2_{N,1-\alpha}$ is the $1-\alpha$ quantile of the Chi-squared distribution with $N$ degrees of freedom, $n$ is the sample size, $\hat{f}_n(x)$ is the proportion of samples equal to $x$, and $f(x)$ is the population pmf.
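The membership condition can be checked by hand for a small example. The sketch below is plain Julia and does not use Empirikos; `chisq_statistic` and `chisq_quantile_2df` are hypothetical helper names. It takes $N = 2$ (values in $\{0, 1, 2\}$) because the Chi-squared quantile with 2 degrees of freedom has the closed form $-2\log(\alpha)$; for general $N$ a quantile routine from a statistics package would be needed.

```julia
# Statistic from the definition: sum over x of (n f̂(x) - n f(x))^2 / (n f(x)).
function chisq_statistic(counts, f)
    n = sum(counts)
    fhat = counts ./ n
    return sum((n .* fhat .- n .* f) .^ 2 ./ (n .* f))
end

# With 2 degrees of freedom the χ² CDF is 1 - exp(-x/2),
# so the 1-α quantile is -2 log(α).
chisq_quantile_2df(α) = -2 * log(α)

counts = [30, 50, 20]             # toy counts for x = 0, 1, 2 (n = 100, N = 2)
f_close = [0.25, 0.50, 0.25]      # candidate pmf near the data
f_far = [0.60, 0.30, 0.10]        # candidate pmf far from the data

threshold = chisq_quantile_2df(0.05)               # about 5.99
stat_close = chisq_statistic(counts, f_close)      # 1 + 0 + 1 = 2
stat_far = chisq_statistic(counts, f_far)          # 15 + 13.33 + 10, about 38.3
in_close = stat_close <= threshold                 # inside the localization
in_far = stat_far <= threshold                     # outside the localization
```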


Gauss-F-Localization

Empirikos.InfinityNormDensityBand (Type)
InfinityNormDensityBand(;a_min,
                         a_max,
                         kernel  =  Empirikos.FlatTopKernel(),
                         bootstrap = :Multinomial,
                         nboot = 1000,
                         α = 0.05,
                         rng = Random.MersenneTwister(1)
                    )  <: FLocalization

This struct contains the hyperparameters used to construct a neighborhood of the marginal density. The steps of the method (and the corresponding hyperparameter meanings) are as follows:

  • First, a kernel density estimate $\bar{f}$ is fit to the data, using the kernel given by the kernel argument.
  • Second, a bootstrap (options: :Multinomial or :Poisson) with nboot bootstrap replicates is used to estimate $c_n$ such that:

\[\liminf_{n \to \infty}\mathbb{P}\left[\sup_{x \in [a_{\text{min}} , a_{\text{max}}]} \lvert \bar{f}(x) - f(x) \rvert \leq c_n\right] \geq 1-\alpha\]

Note that the bound is valid only on the interval from a_min to a_max. α is the nominal level, and rng is the random number generator (and hence the seed) used for the bootstrap samples.
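The two steps can be sketched in plain Julia as follows. This is a simplified illustration under stated assumptions, not the package's implementation: it evaluates the flat-top kernel by hand instead of going through KernelDensity.jl, assumes the bandwidth enters as $K((x - z)/h)/h$, uses resampling with replacement for the multinomial bootstrap, and takes $c_n$ to be the $1-\alpha$ quantile of the bootstrap sup-deviations on a grid. The names `flat_top`, `kde`, and `bootstrap_cn` are hypothetical.

```julia
using Random, Statistics

# Flat-top kernel K(x), with its limit 1.05/π at x = 0.
function flat_top(x)
    x == 0 && return 1.05 / π
    return (sin(1.1x / 2)^2 - sin(x / 2)^2) / (π * x^2 / 20)
end

# Step 1: kernel density estimate at x with bandwidth h
# (assumed convention: K((x - z) / h) / h averaged over the data).
kde(x, data, h) = sum(flat_top((x - z) / h) for z in data) / (length(data) * h)

# Step 2: multinomial bootstrap. Resample the data with replacement, refit,
# and record the sup-norm deviation from the original estimate on the grid.
function bootstrap_cn(data, h, grid; nboot = 200, α = 0.05, rng = MersenneTwister(1))
    n = length(data)
    fbar = [kde(x, data, h) for x in grid]
    sups = map(1:nboot) do _
        resample = data[rand(rng, 1:n, n)]
        maximum(abs(kde(x, resample, h) - fb) for (x, fb) in zip(grid, fbar))
    end
    return quantile(sups, 1 - α)   # 1-α quantile of the sup-deviations
end

rng = MersenneTwister(1)
data = randn(rng, 300) .* sqrt(2)        # toy data with variance 2
h = 1 / sqrt(log(length(data)))          # bandwidth σ / sqrt(log(n)) with σ = 1
grid = range(-3, 3; length = 41)         # grid on [a_min, a_max] = [-3, 3]
c_n = bootstrap_cn(data, h, grid; rng = rng)
```

The resulting band is $\bar{f}(x) \pm c_n$ on the grid. A smaller nboot than the default of 1000 is used here only to keep the toy example fast.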


This F-Localization currently only works for homoskedastic Normal samples with common noise variance $\sigma^2$. By default the above uses the following kernel, with bandwidth $h = \sigma/\sqrt{\log(n)}$, where $n$ is the sample size:

Empirikos.FlatTopKernel (Type)
FlatTopKernel(h) <: InfiniteOrderKernel

Implements the FlatTopKernel with bandwidth h to be used for kernel density estimation through the KernelDensity.jl package. The flat-top kernel is defined as follows:

\[K(x) = \frac{\sin^2(1.1x/2)-\sin^2(x/2)}{\pi x^2/ 20}.\]

Its use case is similar to that of the SincKernel; however, it has the advantage of being integrable (in the Lebesgue sense) and of having bounded total variation. Its Fourier transform is the following:

\[K^*(t) = \begin{cases} 1, & \text{ if } |t| \leq 1 \\ 11-10|t|, & \text{ if } |t| \in [1,1.1] \\ 0, & \text{ if } |t| \geq 1.1 \end{cases}\]

julia> Empirikos.FlatTopKernel(0.1)
FlatTopKernel | bandwidth = 0.1
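As a sanity check on the two formulas, the sketch below evaluates the flat-top kernel and its Fourier transform directly in plain Julia, without Empirikos or KernelDensity.jl. The names `flat_top`, `flat_top_ft`, and `integrate_kernel` are hypothetical, and the bandwidth is fixed at 1. Since $K^*(0) = 1$, the kernel should integrate to approximately 1, which the Riemann sum confirms numerically.

```julia
# Flat-top kernel K(x); the value at x = 0 is the limit 1.05/π.
function flat_top(x)
    x == 0 && return 1.05 / π
    return (sin(1.1x / 2)^2 - sin(x / 2)^2) / (π * x^2 / 20)
end

# Fourier transform K*(t): equal to 1 on [-1, 1], decaying linearly to 0 at |t| = 1.1.
flat_top_ft(t) = clamp(11 - 10 * abs(t), 0, 1)

# Riemann-sum approximation of the integral of K over [-L, L].
function integrate_kernel(K; L = 2000.0, step = 0.01)
    xs = -L:step:L
    return sum(K, xs) * step
end

total = integrate_kernel(flat_top)    # close to K*(0) = 1
ft_flat = flat_top_ft(0.5)            # 1 on the flat part
ft_ramp = flat_top_ft(1.05)           # halfway down the linear ramp
ft_zero = flat_top_ft(1.2)            # 0 beyond |t| = 1.1
```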

Empirikos.SincKernel (Type)
SincKernel(h) <: InfiniteOrderKernel

Implements the SincKernel with bandwidth h to be used for kernel density estimation through the KernelDensity.jl package. The sinc kernel is defined as follows:

\[K_{\text{sinc}}(x) = \frac{\sin(x)}{\pi x}\]

It is not typically used for kernel density estimation, because this kernel is not a density itself. However, it is particularly well suited to deconvolution problems and estimation of very smooth densities because its Fourier transform is the following:

\[K^*_{\text{sinc}}(t) = \mathbf 1( t \in [-1,1])\]
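The following plain-Julia sketch evaluates the sinc kernel and shows why it is not a density: it takes negative values. The names `sinc_kernel` and `sinc_ft` are hypothetical and the bandwidth is fixed at 1. Julia's built-in `sinc(x)` is the normalized version $\sin(\pi x)/(\pi x)$, so the kernel above equals `sinc(x / π) / π`.

```julia
# Sinc kernel K(x) = sin(x) / (π x), with the limit 1/π at x = 0.
sinc_kernel(x) = x == 0 ? 1 / π : sin(x) / (π * x)

# Its Fourier transform is the indicator function of [-1, 1].
sinc_ft(t) = abs(t) <= 1 ? 1.0 : 0.0

value_at_zero = sinc_kernel(0.0)        # 1/π
value_negative = sinc_kernel(1.5π)      # sin(3π/2) / (1.5 π^2), which is negative
via_base = sinc(2.0 / π) / π            # same kernel written with Base's normalized sinc
```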
