2
$\begingroup$

Recently I discussed an experiment with a friend. Assume we start a random experiment. At first there is an array with size $100,000$, all set to $0$. We calculate at each round a random number modulo $2$ and select one random position in that array. If the number in the array is $1$, nothing is changed and otherwise the pre-computed value is set. The question is: how many distinct hash values would we have added in $1$%, $5$%, $50$%, $95$%, $99$% of all cases?

Example: $4$ rounds with array of size $10$:

Array                     Position   random number
[0,...,0]                    5              0
[0,...,0]                    7              1
[0,...0,1,0,0,0]             6              1
[0,..0,.1,1,0,0,0]           6              0
[0,..0,.1,1,0,0,0]           2              0

First we considered this a somehow simple problem, but after thinking for some hours, searching the web, and asking some math students, we couldn't find a solution. Do you know a probability distribution for this problem?

Remark: Was also posted on Math Overflow and got its answer there.

  • 2
    FYI, I wouldn't post this on MO. I made the same mistake, thinking it was the StackOverflow for Math. If you read the faq it is not.2010-07-25
  • 0
    "The question is: how many distinct hash values would we have added in 1%, 5%, 50%, 95%, 99% of all cases?" I'm afraid I don't understand the question - there are no hash values involved here. Also, what do you mean by "what would we have added" and "in 1% of cases?" Are you asking the average number of values needed to generate to fill 1% of the array, or the average percent of the array filled after generating x values?2010-07-25
  • 2
    It is not clear to me what your question is asking. It sounds like you want to choose a random position in the array, and if it is a 0, change it to a 1 with 50% probability. Then you want to find the distribution for the number of iterations it takes until the array contains n% 1's? Is that correct?2010-07-26
  • 2
    This question has been answer-accepted on MO. What to do on this side? Someone copy the MO answer or just close it?2010-08-02
  • 1
    From my point of view closing it would be the best alternative.2010-08-02
  • 0
    I've quoted the MO answer with attribution as community wiki. I'd rather not see the only answer to a question be contained entirely in an offsite link. If you want to discuss what should be the correct procedure to deal with questions that get answered offsite, you can raise the question on meta, but I see little downside to this approach.2010-08-03

1 Answers 1

2

As answered by T. on MathOverflow.

This is equivalent to (among other names) the Coupon Collector problem. Your are asking about the distribution of the number of coupons collected after t steps, when the total number of possible coupons is n.

http://en.wikipedia.org/wiki/Coupon_collector%27s_problem

ADDED: this and related distributions are also studied under other names such as Birthday Problem, random mappings, and random hashing. Kolchin-Sevastyanov-Chistyakov Random Allocations, Knuth The Art Of Computer Programming, vol. 2, and Flajolet & Sedgwick Analytic Combinatorics all discuss these problems and may contain the precise asymptotics of the distribution you are looking for.

III.10 in Flajolet and Sedgewick gives the Poisson answer $1−\exp(−t/n)$ when the ratio is held constant, but other asymptotic regimes are also of interest especially in hashing problems. Birthday problem is when $t=O(n^{1/2})$ and one gets statistics of the number of collisions. For t=n^k with 1/2