F-Localizations
FLocalization
Abstract type representing F-Localizations.
FittedFLocalization
Abstract type representing a fitted F-Localization (i.e., wherein the F-localization has already been determined by data).
DvoretzkyKieferWolfowitz
DvoretzkyKieferWolfowitz(;α = 0.05, max_constraints = 1000) <: FLocalization
The Dvoretzky-Kiefer-Wolfowitz band (based on the Kolmogorov-Smirnov distance) at confidence level 1-α
that bounds the distance of the true distribution function to the ECDF \(\widehat{F}_n\) based on \(n\) samples. The constant of the band is the sharp constant derived by Massart:
\[ F \text{ distribution}: \sup_{t \in \mathbb R}\lvert F(t) - \widehat{F}_n(t) \rvert \leq \sqrt{\log(2/\alpha)/(2n)} \]
The supremum above is enforced discretely on at most `max_constraints` points.
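As a rough illustration of the bound above, the following sketch computes the band half-width \(\sqrt{\log(2/\alpha)/(2n)}\) and the resulting band around the ECDF at a single point. It uses only Base Julia; the names `dkw_halfwidth`, `ecdf_at` and `dkw_band` are hypothetical helpers and are not part of Empirikos.

```julia
# Half-width of the DKW band with Massart's constant: sqrt(log(2/α) / (2n)).
dkw_halfwidth(n::Integer; α::Real = 0.05) = sqrt(log(2 / α) / (2n))

# Empirical CDF evaluated at t (hypothetical helper, Base only).
ecdf_at(xs::AbstractVector{<:Real}, t::Real) = count(x -> x <= t, xs) / length(xs)

# Lower and upper limits of the band at t, clipped to [0, 1].
function dkw_band(xs::AbstractVector{<:Real}, t::Real; α::Real = 0.05)
    c = dkw_halfwidth(length(xs); α = α)
    Fn = ecdf_at(xs, t)
    return (lower = max(Fn - c, 0.0), upper = min(Fn + c, 1.0))
end
```

For example, `dkw_halfwidth(100)` is roughly 0.136, so with 100 samples the band is only informative where the ECDF is well away from 0 and 1.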
ChiSquaredFLocalization
ChiSquaredFLocalization(α) <: FLocalization
The \(\chi^2\) F-localization at confidence level \(1-\alpha\) for a discrete random variable taking values in \(\{0,\dotsc, N\}\). It is equal to:
\[ f: \sum_{x=0}^N \frac{(n \hat{f}_n(x) - n f(x))^2}{n f(x)} \leq \chi^2_{N,1-\alpha}, \]
where \(\chi^2_{N,1-\alpha}\) is the \(1-\alpha\) quantile of the Chi-squared distribution with \(N\) degrees of freedom, \(n\) is the sample size, \(\hat{f}_n(x)\) is the proportion of samples equal to \(x\), and \(f(x)\) is the population pmf.
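The membership condition above can be checked directly for a candidate pmf. The sketch below is Base Julia only and is not the Empirikos implementation; `chisq_statistic` and `in_chisq_localization` are hypothetical names, and the quantile \(\chi^2_{N,1-\alpha}\) is assumed to be supplied by the caller.

```julia
# Pearson chi-squared statistic: sum over x of (n f̂(x) - n f(x))^2 / (n f(x)).
# `counts[i]` is the number of samples equal to i - 1 (values 0, ..., N);
# `f[i]` is the candidate pmf at i - 1. Both argument names are hypothetical.
function chisq_statistic(counts::AbstractVector{<:Integer}, f::AbstractVector{<:Real})
    length(counts) == length(f) || throw(ArgumentError("length mismatch"))
    n = sum(counts)
    return sum((counts .- n .* f) .^ 2 ./ (n .* f))
end

# Membership check; `threshold` stands in for the quantile χ²_{N,1-α},
# which this sketch does not compute.
in_chisq_localization(counts, f, threshold) = chisq_statistic(counts, f) <= threshold
```

The threshold could come from a statistics package, for instance `quantile(Chisq(N), 1 - α)` from Distributions.jl.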
InfinityNormDensityBand
InfinityNormDensityBand(;a_min,
a_max,
kernel = Empirikos.FlatTopKernel(),
bootstrap = :Multinomial,
nboot = 1000,
α = 0.05,
rng = Random.MersenneTwister(1)
) <: FLocalization
This struct contains hyperparameters that will be used for constructing a neighborhood of the marginal density. The steps of the method (and the corresponding hyperparameter meanings) are as follows:
- First, a kernel density estimate \(\bar{f}\) with `kernel` is fit to the data.
- Second, a `bootstrap` (options: `:Multinomial` or `Poisson`) with `nboot` bootstrap replicates is used to estimate \(c_n\), such that:
\[ \liminf_{n \to \infty}\mathbb{P}\left[\sup_{x \in [a_{\text{min}} , a_{\text{max}}]} | \bar{f}(x) - f(x)| \leq c_n\right] \geq 1-\alpha \]
Note that the bound is valid from `a_min` to `a_max`. `α` is the nominal level, and finally `rng` sets the seed for the bootstrap samples.
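The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Empirikos implementation: it takes \(c_n\) to be the empirical \(1-\alpha\) quantile of \(\sup_x |\bar{f}^*(x) - \bar{f}(x)|\) over multinomial resamples, approximates the supremum on a finite grid, and evaluates the kernel density estimate directly instead of going through KernelDensity.jl. The names `flat_top`, `kde`, `sup_norm_bootstrap` and `ngrid` are hypothetical.

```julia
using Random

# Flat-top kernel as defined on this page; the value at 0 is the limit 1.05/π.
flat_top(x) = iszero(x) ? 1.05 / π : (sin(1.1x / 2)^2 - sin(x / 2)^2) / (π * x^2 / 20)

# Kernel density estimate at a point x with bandwidth h.
kde(x, zs, h) = sum(z -> flat_top((x - z) / h), zs) / (length(zs) * h)

# Multinomial bootstrap estimate of c_n over a grid on [a_min, a_max].
function sup_norm_bootstrap(zs; a_min, a_max, h, nboot = 200, α = 0.05,
                            ngrid = 50, rng = Random.MersenneTwister(1))
    grid = range(a_min, a_max; length = ngrid)
    f_bar = [kde(x, zs, h) for x in grid]
    n = length(zs)
    sups = map(1:nboot) do _
        zs_star = zs[rand(rng, 1:n, n)]   # resample with replacement
        maximum(abs(kde(x, zs_star, h) - fb) for (x, fb) in zip(grid, f_bar))
    end
    sort!(sups)
    # Empirical (1 - α) quantile of the bootstrap sup-deviations.
    return sups[clamp(ceil(Int, (1 - α) * nboot), 1, nboot)]
end
```

A call such as `sup_norm_bootstrap(zs; a_min = -2.0, a_max = 2.0, h = 0.5)` returns a single positive number, which plays the role of \(c_n\) in the display above.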
This F-Localization currently only works for homoskedastic Normal samples with common noise variance \(\sigma^2\). By default the above uses the following kernel, with bandwidth \(h = \sigma/\sqrt{\log(n)}\), where \(n\) is the sample size:
FlatTopKernel
FlatTopKernel(h) <: InfiniteOrderKernel
Implements the `FlatTopKernel` with bandwidth `h` to be used for kernel density estimation through the `KernelDensity.jl` package. The flat-top kernel is defined as follows:
\[ K(x) = \frac{\sin^2(1.1x/2)-\sin^2(x/2)}{\pi x^2/ 20}. \]
Its use case is similar to the `SincKernel`; however, it has the advantage of being integrable (in the Lebesgue sense) and having bounded total variation. Its Fourier transform is the following:
\[ K^*(t) = \begin{cases} 1, & \text{ if } |t| \leq 1 \\ 0, &\text{ if } |t| \geq 1.1 \\ 11-10|t|,& \text{ if } |t| \in [1,1.1] \end{cases} \]
julia> Empirikos.FlatTopKernel(0.1)
FlatTopKernel | bandwidth = 0.1
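To make the two formulas above concrete, the following Base-only sketch evaluates the kernel and its Fourier transform directly. The function names are hypothetical and independent of the `FlatTopKernel` type, and the scaling \(K_h(x) = K(x/h)/h\) is the usual kernel density convention, assumed here and not taken from this page.

```julia
# Flat-top kernel K(x); the value at x = 0 is the limit 1.05/π.
flat_top_kernel(x::Real) =
    iszero(x) ? 1.05 / π : (sin(1.1x / 2)^2 - sin(x / 2)^2) / (π * x^2 / 20)

# Its Fourier transform K*(t): 1 on [-1, 1], linear decay to 0 at |t| = 1.1.
function flat_top_ft(t::Real)
    a = abs(t)
    a <= 1 && return 1.0
    a >= 1.1 && return 0.0
    return 11 - 10a
end

# Bandwidth-scaled kernel K_h(x) = K(x / h) / h (assumed convention).
scaled_flat_top(x::Real, h::Real) = flat_top_kernel(x / h) / h
```

The value at zero follows from Fourier inversion: \(K(0) = \frac{1}{2\pi}\int K^*(t)\,dt = 2.1/(2\pi) = 1.05/\pi\).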
SincKernel
SincKernel(h) <: InfiniteOrderKernel
Implements the `SincKernel` with bandwidth `h` to be used for kernel density estimation through the `KernelDensity.jl` package. The sinc kernel is defined as follows:
\[ K_{\text{sinc}}(x) = \frac{\sin(x)}{\pi x} \]
It is not typically used for kernel density estimation, because this kernel is not a density itself. However, it is particularly well suited to deconvolution problems and estimation of very smooth densities because its Fourier transform is the following:
\[ K^*_{\text{sinc}}(t) = \mathbf 1( t \in [-1,1]) \]
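A short Base-only sketch of the sinc kernel and its Fourier transform is given below; the names `sinc_kernel` and `sinc_ft` are hypothetical and separate from the `SincKernel` type. Note that Julia's built-in `sinc(x)` is the normalized version \(\sin(\pi x)/(\pi x)\), so \(K_{\text{sinc}}(x)\) equals `sinc(x / π) / π`.

```julia
# Sinc kernel K_sinc(x) = sin(x) / (πx); the value at x = 0 is the limit 1/π.
sinc_kernel(x::Real) = iszero(x) ? 1 / π : sin(x) / (π * x)

# Its Fourier transform: the indicator of [-1, 1].
sinc_ft(t::Real) = abs(t) <= 1 ? 1.0 : 0.0
```

Evaluating `sinc_kernel(4.0)` gives a negative number, which illustrates why this kernel is not itself a density.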