
I am trying to compute the Kolmogorov distance between the two univariate Gaussian distributions $\mathcal{N}(0,n)$ and $\mathcal{N}(0,2n)$ for large $n$. I have a feeling this should be simple, but nothing I have tried so far works. Could anyone give me some hints?

By the Kolmogorov distance between two distributions $P$ and $Q$ I mean:

$$\max_t \, \bigl|\, \Pr[P>t] - \Pr[Q>t] \,\bigr|$$

Thanks,

2 Answers


Because the Gaussians have the same mean and Gaussians are symmetric about the mean, the expression $\Pr[P>t] - \Pr[Q>t]$ should have a single maximum and a single minimum whose absolute values are the same. So you could just solve $\max_t (\Pr[P>t] - \Pr[Q>t])$. To do that, I would express $\Pr[P>t]$ as $$\int_t^{\infty} \frac{1}{\sqrt{2\pi n}} e^{-x^2/(2n)} dx,$$ and $\Pr[Q>t]$ similarly. Then take the derivative of the difference with respect to $t$ via the Fundamental Theorem of Calculus, set it to $0$, and solve for $t$. Since you don't want a full answer, I'll stop there.
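Filling in, for reference, the stationarity step (a sketch; the resulting value is the one confirmed in the comments below): by the Fundamental Theorem of Calculus the derivative of the difference vanishes exactly where the two densities agree,

```latex
\frac{1}{\sqrt{2\pi n}}\, e^{-t^2/(2n)}
  = \frac{1}{\sqrt{2\pi \cdot 2n}}\, e^{-t^2/(2\cdot 2n)}
\;\Longrightarrow\;
e^{-t^2/(4n)} = \frac{1}{\sqrt{2}}
\;\Longrightarrow\;
t^2 = 2n\log 2 = n\log 4 .
```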

  • Thanks! I got the value of $t$ ($\sqrt{n \log 4}$), but now I have difficulty evaluating the expression $\Pr[P>t]$. I understand there is no closed form known; are there approximations which might be useful in this case? (2010-11-23)
  • @Preyas: Right, there's no closed form. The cdf of the Gaussian can be expressed in terms of the well-known error function $\operatorname{erf}(x)$, though. See, for example, http://en.wikipedia.org/wiki/Normal_distribution#Cumulative_distribution_function (2010-11-23)
  • Thanks again! I had not read about the Taylor expansion of the error function. (2010-11-23)
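To make the erf route concrete, here is a minimal sketch (not from the thread; Python with only the standard library, and `gauss_sf` is a helper name of my own) that evaluates $\Pr[P>t]$ and the resulting distance at the optimal $t=\sqrt{n\log 4}$:

```python
# Minimal sketch: Gaussian tail probabilities via math.erf (stdlib only).
import math

def gauss_sf(t, var):
    """Survival function Pr[X > t] for X ~ N(0, var), written with erf."""
    return 0.5 * (1.0 - math.erf(t / math.sqrt(2.0 * var)))

n = 3  # illustrative choice; the distance itself does not depend on n
t = math.sqrt(n * math.log(4.0))
dist = gauss_sf(t, 2 * n) - gauss_sf(t, n)  # Pr[Q > t] - Pr[P > t]
print(t, dist)
```

Because $t$ scales like $\sqrt{n}$, the standardized arguments $t/\sqrt{n}$ and $t/\sqrt{2n}$ are constants, so the distance is the same for every $n$.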

You can numerically calculate the Kolmogorov distance between ${\cal N}(\mu_1, \sigma_1^2)$ and ${\cal N}(\mu_2, \sigma_2^2)$ with these R functions:

Kdist00 <- function(a,b){
  # Stationary point of pnorm(a*z+b) - pnorm(z): the two densities cross here
  z <- (a * b - (sign(b)+(b==0)) * sqrt(b^2 + 2 * (a^2 - 1) * log(a)))/(1 - a^2)
  out <- pnorm(a*z+b)-pnorm(z)
  attr(out, "where") <- z
  return(out)
}
Kdist0 <- function(mu1,sigma1,mu2,sigma2){
  b <- (mu1-mu2)/sigma2
  a <- sigma1/sigma2
  if(b>=0){
    out <- Kdist00(a,b) 
    attr(out, "where") <- mu1 + sigma1*attr(out, "where")
    return(out)
  }else{
    return(Kdist0(mu2,sigma2,mu1,sigma1))
  }
}
Kdist <- function(mu1,sigma1,mu2,sigma2){
  if(sigma1==sigma2){
    # Equal variances: the extremum is at the midpoint of the means
    # (here in standardized units)
    where <- -(mu1-mu2)/sigma2/2
    out <- abs(pnorm(where)-pnorm(-where))
    attr(out, "where") <- where
    return(out)
  }
  return(Kdist0(mu1,sigma1,mu2,sigma2))
}

These functions come from this blog article, which also gives the derivation.

Your claim that the maximum for $K\bigl({\cal N}(0, n), {\cal N}(0, 2n)\bigr)$ is attained at $t=\pm \sqrt{n\log 4}$ looks right:

> n <- 3
> Kdist(0,sqrt(n),0,sqrt(2*n))
[1] 0.08303204
attr(,"where")
[1] -2.039334
> sqrt(n*log(4))
[1] 2.039334
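As an independent sanity check (not part of the answer above; Python with only the standard library, and `norm_cdf`/`kdist_grid` are helper names of my own), a brute-force grid search reproduces both numbers:

```python
# Crude, assumption-free check: maximize |F1(x) - F2(x)| over a fine grid.
import math

def norm_cdf(x, mu, sigma):
    # Gaussian cdf written via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def kdist_grid(mu1, s1, mu2, s2, lo=-20.0, hi=20.0, steps=200001):
    best_d, best_x = 0.0, lo
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        d = abs(norm_cdf(x, mu1, s1) - norm_cdf(x, mu2, s2))
        if d > best_d:
            best_d, best_x = d, x
    return best_d, best_x

n = 3
print(kdist_grid(0.0, math.sqrt(n), 0.0, math.sqrt(2 * n)))
```

This recovers a distance of about 0.0830 attained near $x \approx -2.04$, matching the `Kdist` output above.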