
Say we have a set of $12$ numbers: $(0,1,2,0,3,0,4,0,5,6,0)$. We go through this set one number at a time and choose each number with probability $1/3$; that is, we sample. What is the probability that we end up with $3$ $0$'s among the $4$ numbers we will choose on average? Is there a closed-form solution for this?
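
To make the question concrete, here is a minimal sketch of two possible readings, assuming each entry is kept independently with probability $1/3$ and using the list exactly as written (which has 11 entries, 5 of them zeros). The function names are illustrative only. The first reading asks for exactly 3 zeros in the sample regardless of sample size (a binomial probability); the second conditions on exactly 4 entries being chosen, in which case the chosen subset is uniform over all 4-subsets and the count of zeros is hypergeometric.

```python
from fractions import Fraction
from math import comb

values = (0, 1, 2, 0, 3, 0, 4, 0, 5, 6, 0)  # the list as written (11 entries)
p = Fraction(1, 3)                           # per-entry selection probability


def prob_exact_zeros(values, p, k):
    """P(exactly k zeros are selected), each entry kept independently with prob. p."""
    z = sum(1 for v in values if v == 0)     # number of zeros in the list
    return comb(z, k) * p**k * (1 - p) ** (z - k)


def prob_zeros_given_total(values, k, m):
    """P(exactly k zeros | exactly m entries selected): hypergeometric."""
    n = len(values)
    z = sum(1 for v in values if v == 0)
    return Fraction(comb(z, k) * comb(n - z, m - k), comb(n, m))


print(prob_exact_zeros(values, p, 3))        # 40/243
print(prob_zeros_given_total(values, 3, 4))  # 2/11
```

Both are closed forms; which one applies depends on which event is actually meant.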

  • 0
    That's sampling without replacement. You should read up on the [hypergeometric distribution](http://en.wikipedia.org/wiki/Hypergeometric_distribution). 2010-11-14
  • 0
    @kahen: No, it isn't, since they go through the list one number at a time. 2010-11-14
  • 1
    @probability_noob: Your list is of length 11. 2010-11-14
  • 0
    Also, it's not clear which event's probability you're trying to calculate (Nate and I offer conflicting interpretations). 2010-11-14
  • 0
    Sorry about the confusion, and thanks for answering. To rephrase: 2010-11-14
  • 0
    Consider a stream of values with different frequencies (in the example above, 0 has the highest frequency), which you sample with probability p. I want to know the probability that I sample high-frequency events and the probability that I sample low-frequency events (say the number 1 or 2 or 3, etc.). 2010-11-14
  • 0
    This is still not very clear. Can you write a C function (any other language will do) that returns "true" if your event happens, given a Boolean array of size 12? 2010-11-14
  • 0
    Apologies again. Let me take a more concrete example. Say I'm looking at a stream of web page visits by the entire human population. Some web pages are visited more than others (fb.com, cnn.com, etc.) and some are relatively rare (e.g. stackexchange.com). I do not collect ALL the pages I see; rather, I sample each one with probability p. It is a stream. What I want to know is: after I've sampled for some time, what is the probability of having sampled highly popular pages (cnn) vs. rare pages (stackexchange)? Say I know the popularity distribution of the pages. Thanks again for your time. 2010-11-14
  • 0
    I think it is pretty clear what you want. Nate got it right (imho). 2010-11-14
  • 0
    In my opinion, Ralth got it right, or at least is close to it (my answer is a cumbersome version of Ralth's). 2010-11-15
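
For the stream version of the question, one hedged way to pin it down: assume a page occurs $k$ times in the observed stream and each occurrence is kept independently with probability $p$. Then the number of kept occurrences is Binomial$(k, p)$, its mean is $kp$, and the page shows up at least once in the sample with probability $1-(1-p)^k$. The sketch below only illustrates that reading; the page counts and the value of `p` are made up, and the function names are hypothetical.

```python
def prob_sampled_at_least_once(k, p):
    """P(a page seen k times in the stream is sampled at least once)."""
    return 1.0 - (1.0 - p) ** k


def expected_sampled(k, p):
    """Expected number of sampled occurrences of a page seen k times."""
    return k * p


# Hypothetical occurrence counts, purely for illustration.
stream_counts = {"cnn.com": 1000, "stackexchange.com": 3}
p = 0.01  # assumed per-visit sampling probability

for page, k in stream_counts.items():
    print(page, prob_sampled_at_least_once(k, p), expected_sampled(k, p))
```

Under this reading a popular page is almost surely present in the sample, while a rare page is present with probability close to $kp$ when $kp$ is small.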

4 Answers