What is the difference between convergence in distribution, convergence in probability and almost sure convergence?
Convergence in distribution means pointwise convergence of the CDFs at every continuity point of the limiting CDF. It's not a form of convergence of the random variables themselves, only of their distributions.
Convergence in probability says that for every $\epsilon > 0$, the probability of the sequence of random variables being more than $\epsilon$ from the limiting random variable goes to $0$. Convergence in probability is also called convergence in measure.
Convergence almost everywhere = almost sure convergence = pointwise convergence of the random variables except possibly on a set of measure zero.
The modes form a strict hierarchy: almost sure convergence implies convergence in probability, which implies convergence in distribution; neither converse holds in general.
For $X_1$, $X_2$, $\ldots$ i.i.d. with mean $\mu$ and finite variance $\sigma^2$, the Central Limit Theorem (convergence in distribution) says that if we wait long enough, that is, take a large enough sample, then $$ \frac{\sqrt n(\overline{X}_n-\mu)}{\sigma}$$ will have a probability distribution which is arbitrarily close to a standard normal $N(0,1)$ distribution. Notice the CLT doesn't say anything about the actual behavior of any particular $\overline{X}_n$ for large $n$, only that if we observed a whole bunch of them (all at that large $n$) and made a histogram, it would be approximately bell-shaped.
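A quick Monte Carlo sketch of this (the sample size, repetition count, and choice of a Uniform(0,1) population are my own, purely illustrative): standardize many sample means and check that their spread looks $N(0,1)$.

```python
import random
import statistics

# CLT sketch: standardized means of Uniform(0,1) samples,
# for which mu = 1/2 and sigma^2 = 1/12.
random.seed(0)
n, reps = 1000, 2000
mu, sigma = 0.5, (1 / 12) ** 0.5

z = []
for _ in range(reps):
    xbar = statistics.fmean(random.random() for _ in range(n))
    z.append((n ** 0.5) * (xbar - mu) / sigma)

# If the CLT is kicking in, z should have mean near 0 and
# standard deviation near 1 (a histogram would look bell-shaped).
print(round(statistics.fmean(z), 2), round(statistics.stdev(z), 2))
```

Plotting a histogram of `z` (e.g. with matplotlib) makes the bell shape visible directly.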
The Weak Law of Large Numbers (convergence in probability) says that if we wait long enough (i.e., take a large enough sample), then the chance that $$ \frac{\overline{X}_n-\mu}{\sigma}$$ is near zero can be made arbitrarily high.
The Strong Law of Large Numbers (convergence almost surely) says that $$ \frac{\overline{X}_n-\mu}{\sigma}$$ is 100% sure to be arbitrarily close to 0, provided we wait long enough (take a large enough sample). That is strong, indeed. Again, please note a couple of things:
- Of course, nothing is ever guaranteed in probability; "100% sure" here means probability 1, and we can do no better than that.
- Also, the SLLN doesn't say anything about how long we would have to wait to be within $\epsilon$; for something like that we would need the Law of the Iterated Logarithm.
- Finally, this discussion is about convergence (in distribution/probability/a.s.) to a constant, while the general definitions concern convergence to another random variable (or even convergence of two sequences to each other). But we can regain the intuition by thinking about differences.
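The SLLN statement can be seen in one simulated path (a sketch only, with seed and sample sizes chosen arbitrarily): the running mean of fair-coin flips should settle down near $\mu = 0.5$ and stay there.

```python
import random

# One realization of the running mean of fair-coin flips.
# The SLLN says that, with probability 1, such a path is
# eventually trapped arbitrarily close to mu = 0.5.
random.seed(1)
total = 0
path = {}
for n in range(1, 100_001):
    total += random.randint(0, 1)
    if n in (10, 100, 1_000, 10_000, 100_000):
        path[n] = total / n
        print(n, path[n])
```

A single run is of course not a proof; it just shows the kind of pathwise behavior the theorem describes.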
I don't see anything wrong with the other two answers given for this question, I just thought I'd offer another perspective.
Consider a sequence of random variables $X_1, X_2, \ldots$ and a random variable $X$. Let $F$ be the cumulative distribution function of $X$ and $F_n$ be the cumulative distribution function of $X_n$. Convergence in distribution occurs when $F$ is the pointwise limit of the $F_n$, i.e. $\lim_{n \to \infty} F_{n}(x)=F(x)$ at every point $x$ where $F$ is continuous.
Convergence in probability is defined as follows: for every $\epsilon > 0$, $\lim_{n \to \infty } \Pr(|X_n-X| \ge \epsilon)=0$. It is stronger than convergence in distribution. I don't know a short proof, but I can give the following intuition: for convergence in distribution we only look at the marginal CDFs, so it doesn't matter how (or whether) the variables are jointly defined. For convergence in probability, $X$ must either be a constant or be defined on the same probability space as the $X_n$, so their joint behavior matters.
Almost sure convergence is defined as follows: $\Pr(\lim_{n \to \infty} X_n=X)=1$. On this MathOverflow thread I asked if there was a way to reduce limits to some kind of canonical form, and obtained the following equivalent formulations:

Convergence in probability: for every $\epsilon > 0$, $$\lim_{n \to \infty} \Pr(|X_n - X| \ge \epsilon) = 0.$$

Almost sure convergence: for every $\epsilon > 0$, $$\lim_{n \to \infty} \Pr\Big(\sup_{m \ge n} |X_m - X| \ge \epsilon\Big) = 0.$$
We notice that the probability in the first formula involves a single $|X_n - X|$ at a time, while the second involves all indices $m \ge n$ at once. From these forms it is clear that almost sure convergence implies convergence in probability, since $\Pr(|X_n - X| \ge \epsilon) \le \Pr(\sup_{m \ge n}|X_m - X| \ge \epsilon)$. Conversely, if convergence in probability fails, we can find $\epsilon, \delta > 0$ with $\Pr(|X_n - X| \ge \epsilon) \ge \delta$ for infinitely many $n$, which forces the probability in the second formula to stay at least $\delta$ as well.
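The gap between the two formulas is witnessed by the classic "typewriter" sequence on $[0,1]$ with Lebesgue measure: $X_n$ is the indicator of a sliding interval whose length shrinks to $0$, so $X_n \to 0$ in probability but not almost surely. The construction below is the standard one (my indexing: block $k$ holds $2^k$ intervals of length $2^{-k}$).

```python
from fractions import Fraction

# Typewriter sequence: X_n = indicator of interval(n).
def interval(n):
    k = n.bit_length() - 1          # block index: 2**k <= n < 2**(k+1)
    j = n - 2 ** k                  # position within block k
    length = Fraction(1, 2 ** k)
    return j * length, (j + 1) * length

# P(X_n >= eps) is just the interval length, which -> 0,
# so X_n -> 0 in probability:
for n in (1, 2, 5, 100, 1000):
    a, b = interval(n)
    print(n, b - a)

# But any fixed omega lands in exactly one interval of every
# block, so X_n(omega) = 1 infinitely often: no a.s. convergence.
omega = Fraction(1, 3)
hits = [n for n in range(1, 64) if interval(n)[0] <= omega < interval(n)[1]]
print(hits)   # [1, 2, 5, 10, 21, 42] -- one hit per block
```

Every $\omega$ gets hit once per block forever, yet the probability of being hit at any single large $n$ is tiny; that is exactly the difference between the two formulas above.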
Consider a sequence of random variables $X_1$: 1, 1, 2, 1, 1, 2, ..., and a sequence $X_2$: 2, 1, 1, 2, 1, 1, ....
Assume a classical (uniform) setup for the underlying probability measure.
X1 and X2 converge in distribution (to the same limit), since each outputs the value "1" with probability 2/3 and "2" with probability 1/3.
But X1 will never converge in probability to X2, because no matter how far along we go, the two variables still differ with probability 2/3. They are not getting close to each other, as they would have to under convergence in probability.
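One way to make the example concrete (this formalization is mine, not the answer's): put a uniform outcome $\omega \in \{0, 1, 2\}$ and let each variable read off its repeating pattern at position $\omega$. The distributions match exactly, yet the variables disagree with probability 2/3.

```python
from fractions import Fraction

# Uniform outcome omega in {0, 1, 2}; each variable reads its pattern.
omegas = [0, 1, 2]
x1 = [1, 1, 2]   # X1(omega)
x2 = [2, 1, 1]   # X2(omega)

def dist(values):
    """Probability of each value under the uniform measure."""
    return {v: Fraction(sum(1 for x in values if x == v), len(values))
            for v in sorted(set(values))}

print(dist(x1))   # {1: Fraction(2, 3), 2: Fraction(1, 3)}
print(dist(x2))   # identical distribution

p_diff = Fraction(sum(1 for w in omegas if x1[w] != x2[w]), len(omegas))
print(p_diff)     # Fraction(2, 3): same distribution, but they
                  # disagree with probability 2/3
```

Since $\Pr(|X_1 - X_2| \ge 1) = 2/3$ never shrinks, convergence in probability of one sequence to the other is impossible, even though the distributions agree at every step.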