
Let $X_1,...,X_n$ be some observations, and let $\theta$ be some parameter of the density function we want to estimate.

Then, it is well known that

$l(\theta) = n^{-1} \sum_{i=1}^n \log f(X_i ; \theta)$ is called the average log-likelihood.
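
For concreteness, here is a minimal numerical sketch of this quantity for a Normal($\theta$, 1) model (the sample, the true mean, and the helper name are just illustrative choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)   # observations X_1, ..., X_n

def average_log_likelihood(theta, x):
    """n^{-1} * sum_i log f(X_i; theta) for the Normal(theta, 1) density."""
    return np.mean(norm.logpdf(x, loc=theta, scale=1.0))

print(average_log_likelihood(2.0, x))   # near the true mean: larger value
print(average_log_likelihood(0.0, x))   # far from it: noticeably smaller
```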

What is $\mathbb{E}[\log f(X_1 ; \theta)]$ called? That is, "the expected value of the log-likelihood"? Up to sign it is a cross-entropy, or perhaps it has another name, but it seems to me it should have a name relating it to the likelihood, such as "population likelihood" or something of that sort.

Does anyone know? Wikipedia did not help much here.

  • I know that the second derivative is called [Fisher information](http://en.wikipedia.org/wiki/Fisher_information). I don't recall a special name for the expectation of the log-likelihood itself. (2010-12-30)
  • What is the expectation taken over, and why is there $X_1$ rather than $X$? Otherwise, isn't this expectation just the negative of the entropy? (2010-12-30)

1 Answer


slowsolver makes a very good point: the value of the parameter with respect to which the expectation is taken is not explicit.

One usually considers log-likelihood ratios rather than log-likelihoods, as in the following.

Let

$$K(\theta;\theta_0) = {\mathrm E}_{\theta_0}\log\frac{f(X|\theta)}{f(X|\theta_0)}.$$

This is the negative of the Kullback-Leibler divergence between the pdfs indexed by $\theta_0$ and $\theta$. Assuming $\theta_0$ is the [fixed] true value of the parameter, this K-L quantity differs by a constant from the expected log-likelihood [taken wrt $\theta_0$].

The significance of this K-L quantity is that it is negative if $\theta\ne\theta_0$ [assuming the pdfs indexed by distinct parameter values are distinct, i.e. the model is identifiable] and is zero when $\theta=\theta_0$. [This follows from Jensen's inequality applied to the concave function $\log$.] Thus the K-L quantity, as a function of $\theta$, attains its maximum at $\theta=\theta_0$.
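
As a concrete sketch, take the Normal($\theta$, 1) location family [an arbitrary choice], for which $K(\theta;\theta_0) = -(\theta-\theta_0)^2/2$ in closed form; a short Monte Carlo check [sample size and seed are also arbitrary] shows the quantity is negative away from $\theta_0$ and maximal at $\theta=\theta_0$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta0 = 1.0
x = rng.normal(loc=theta0, scale=1.0, size=200_000)   # draws from f(.|theta0)

def K_mc(theta):
    """Monte Carlo estimate of E_{theta0} log[ f(X|theta) / f(X|theta0) ]."""
    return np.mean(norm.logpdf(x, loc=theta) - norm.logpdf(x, loc=theta0))

for theta in (0.0, 0.5, 1.0, 1.5):
    print(theta, round(float(K_mc(theta)), 4), -(theta - theta0) ** 2 / 2)
# Negative away from theta0, ~0 at theta = theta0, matching the closed form.
```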

Since the average log-likelihood-ratio for the sample converges [for each $\theta$] to $K(\theta;\theta_0)$, one sees heuristically that the MLE of $\theta$ ought to be near $\theta_0$ when $n$ is large. [This is a handwaving version of Wald's proof that the MLE is consistent for the true value of the parameter. A rigorous argument requires, inter alia, continuity of $f$ in $\theta$ and conditions to ensure that the K-L number stays bounded away from zero outside of a compact subset of the parameter space [so that the MLE cannot escape to $\infty$], as well as assorted other niceties.]
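
Here is a small simulation sketch of this heuristic, again for the Normal($\theta$, 1) family [grid, sample sizes, and seed are arbitrary choices]: the maximizer of the average log-likelihood-ratio over a grid of candidate values moves toward $\theta_0$ as $n$ grows.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
theta0 = 1.0
grid = np.linspace(-1.0, 3.0, 401)           # candidate theta values

for n in (10, 100, 10_000):
    x = rng.normal(loc=theta0, size=n)       # sample of size n from f(.|theta0)
    # average log-likelihood-ratio l_n(theta) evaluated over the grid
    llr = np.array([np.mean(norm.logpdf(x, loc=t) - norm.logpdf(x, loc=theta0))
                    for t in grid])
    print(f"n = {n:6d}: grid maximizer of the average LLR = {grid[np.argmax(llr)]:.3f}")
# For this model the exact MLE is the sample mean, so the maximizer tends to theta0.
```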

The above behavior of the K-L number as a function of $\theta$ also underlies arguments that [maximum-]likelihood-ratio tests are consistent.

If the expectation in the 'expected log-likelihood' is taken wrt some fixed pdf $f(x)$, which need not be one of the $f(\cdot|\theta)$ pdfs, then the value of $\theta$ maximizing the K-L quantity identifies the pdf in the model 'most like' $f(\cdot)$ [the one with minimal K-L divergence from $f(\cdot)$]. When $f(\cdot)$ is not in the model [so that the model is incorrect], it is possible that the K-L function attains its maximum at more than one value of $\theta$; the MLE can then oscillate among the various maximizers of the K-L function.
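
A sketch of this misspecified case [with Exponential(1) data and the Normal($\theta$, 1) model, both arbitrary choices]: the sample analogue of ${\mathrm E}_f[\log f(X|\theta)]$ is maximized near the pseudo-true value ${\mathrm E}_f[X] = 1$, which is where the MLE heads.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=100_000)   # draws from the true density f

grid = np.linspace(0.0, 2.0, 401)
# sample analogue of E_f[ log f(X|theta) ] under the Normal(theta, 1) model
expected_ll = np.array([np.mean(norm.logpdf(x, loc=t)) for t in grid])
print("maximizer:", grid[np.argmax(expected_ll)])   # close to E_f[X] = 1
```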

If the expectation in the 'expected log-likelihood' is taken wrt $\theta$ itself, rather than a fixed $\theta_0$ [or some other fixed $f(\cdot)$], I don't know offhand what interest there is in that quantity. [It is equivalent to considering $K(\theta;\theta_0)$ as a function of $\theta_0$ rather than as a function of $\theta$; but usually one wants the behavior of the entire average log-likelihood[-ratio] function under the fixed true value $\theta_0$, not just its behavior at $\theta$ when $\theta$ happens to be the true value.]