Say we have a set of $12$ numbers: $(0,1,2,0,3,0,4,0,5,6,0)$. We go through this set one number at a time and choose each number with probability $1/3$; that is, we sample. What is the probability that we end up with $3$ $0$'s out of the $4$ numbers we will likely choose (on average)? Is there a closed-form solution for this?
Probability of getting three zeros when sampling a set
probability
-
That's sampling without replacement. You should read up on the [hypergeometric distribution](http://en.wikipedia.org/wiki/Hypergeometric_distribution). – 2010-11-14
-
@kahen: No, it isn't, since they go through the list one number at a time. – 2010-11-14
-
@probability_noob: Your list is of length 11. – 2010-11-14
-
Also, it's not clear which event's probability you're trying to calculate (Nate and I offer conflicting interpretations). – 2010-11-14
-
Sorry about the confusion, and thanks for answering. To rephrase: – 2010-11-14
-
Consider a stream of values with different frequencies (in the example above, 0 has a higher frequency), which you sample with probability p. I want to know the probability that I sample high-frequency events and the probability that I sample low-frequency events (say the number 1 or 2 or 3, etc.). – 2010-11-14
-
This is still not very clear. Can you write a C function (any other language will do) that returns "true" if your event happens, given a Boolean array of size 12? – 2010-11-14
-
Apologies again. Let me take a more concrete example. Say I'm looking at a stream of web page visits by the entire human population. Some web pages are visited more than others (fb.com, cnn.com, etc.) and some are relatively rare (e.g. stackexchange.com). Now, I do not collect ALL the pages I see; rather, I sample each with probability p. It is a stream. What I want to know is: after I've sampled for some time, what is the probability of having sampled highly popular pages (cnn) vs. rare pages (stackexchange)? Say I know the popularity distribution of the pages. Thanks much again for your time. – 2010-11-14
-
I think it is pretty clear what you want. Nate got it right (imho). – 2010-11-14
-
In my opinion, Ralth got it right, or at least is close to it (my answer is a cumbersome version of Ralth's). – 2010-11-15
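Since the comments disagree about which event is meant, here is a minimal Python sketch of two possible readings; neither is claimed to be the one the asker intended. It assumes the 11 numbers actually listed in the question and that each is kept independently with probability $1/3$. The function names `prob_exact_zeros` and `prob_zeros_given_size` are hypothetical, introduced only for this illustration. The first reading asks for exactly three zeros among whatever is kept (a binomial count over the five zeros); the second conditions on exactly four numbers being kept, in which case every 4-subset is equally likely and the count of zeros is hypergeometric.

```python
from fractions import Fraction
from math import comb

# The 11 numbers listed in the question (5 zeros, 6 non-zeros).
values = [0, 1, 2, 0, 3, 0, 4, 0, 5, 6, 0]
p = Fraction(1, 3)  # assumed: each item is kept independently with this probability


def prob_exact_zeros(values, p, k):
    """Reading 1: P(exactly k zeros are kept), regardless of how many
    non-zero items are kept. The zero count is Binomial(#zeros, p)."""
    n_zero = values.count(0)
    return comb(n_zero, k) * p**k * (1 - p) ** (n_zero - k)


def prob_zeros_given_size(values, k, m):
    """Reading 2: P(exactly k zeros | exactly m items kept in total).
    Given the sample size, all m-subsets are equally likely, so this is
    hypergeometric and does not depend on p."""
    n = len(values)
    n_zero = values.count(0)
    return Fraction(comb(n_zero, k) * comb(n - n_zero, m - k), comb(n, m))


print(prob_exact_zeros(values, p, 3))       # 40/243
print(prob_zeros_given_size(values, 3, 4))  # 2/11
```

Both readings have a closed form: $\binom{5}{3}(1/3)^3(2/3)^2 = 40/243$ for the first and $\binom{5}{3}\binom{6}{1}/\binom{11}{4} = 2/11$ for the second.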