11
$\begingroup$

Given is a sequence $\langle a_1,a_2,\ldots,a_n\rangle$ over the alphabet $\{1,2,\ldots,m\}$ chosen uniformly at random among the $m^n$ possibilities. What is the expected size of the set $\{a_1,a_2,\ldots,a_n\}$?

If $m=n$ it seems the answer tends to $(1-1/e)n$ as $n\to\infty$, but I don't know why.

I bumped into this while benchmarking some code for hashtables, so I wouldn't be surprised if it is a standard result in the hash world.

2 Answers 2

24

Define the indicator random variable $I_i$ for $1 \leq i \leq m$ as $1$ if alphabet $i$ is present in the set ${a_1,\dots,a_n}$. Then the size of the set is simply $\sum_{i=1}^m I_i$. The expectation of this can be easily computed by linearity of expectation. The probability that $I_i$ equals $1$ is given by $1-\left( \frac{m-1}{m} \right)^n$ and therefore the expected size of the set is $m \left[ 1- \left( 1 - \frac{1}{m} \right)^n \right]$. For $m=n$, the limiting value is indeed as you mentioned in the question.

3

This is dealt with in depth at http://www.math.uah.edu/stat/urn/Birthday.html.

  • 1
    That link is broken for me right now (Feb 2012). Do you mean http://www.math.uah.edu/stat/urn/Birthday.html ?2012-02-28
  • 0
    Yes, the site must have been updated. I fixed the URL.2012-05-24