
We have a set of items $I = \{i_1, i_2, ..., i_n\}$. Each of these items has what we call a p value, which is some real number. We want to choose a subset of $I$, call it $I'$, of size $m$ (for some $m$ with $1 \le m \le n$) such that the average of the p values of the items in $I'$ falls within some specified range, $[p_l, p_u]$. (For example, we might require an average p value between 0.70 and 0.74.) Moreover, we want to do this efficiently.

We hope to do this in $O(n)$ time, but any polynomial-time algorithm is good enough. We certainly do not want to just try every possible subset of $I$ of size $m$ and then check whether it satisfies the average p-value constraint.

Finally, we will be doing this repeatedly, and we want the chosen subsets to be uniformly distributed over all the possible such subsets.
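For concreteness, the naive approach that does guarantee exact uniformity is rejection sampling. This is only a sketch (the function name and parameters are mine, not part of the question), and it is precisely the kind of method we'd like to avoid, since its expected running time blows up when feasible subsets are rare:

```python
import random

def sample_subset_rejection(p_values, m, p_l, p_u, max_tries=100000):
    """Uniformly sample a size-m subset whose mean p value lies in [p_l, p_u].

    Rejection sampling: draw uniform size-m subsets and keep the first one
    whose average is in range.  The accepted subset is exactly uniform over
    all feasible subsets, but the expected number of tries is the reciprocal
    of the fraction of feasible subsets, which can be exponentially large.
    """
    n = len(p_values)
    for _ in range(max_tries):
        subset = random.sample(range(n), m)
        avg = sum(p_values[i] for i in subset) / m
        if p_l <= avg <= p_u:
            return subset
    return None  # no feasible subset found within the try budget
```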

Is there a way of doing this?

  • Your question might be better suited to stackoverflow.com or cstheory.stackexchange.com. (2010-08-24)
  • A duplicate of this question exists on [cstheory.se](http://cstheory.stackexchange.com/questions/506). (2010-08-25)

2 Answers

1

Assuming $m$ is part of the input, your problem is NP-hard.

Subset sum can be reduced to it.

Given $A = \{x_{1}, \ldots, x_{n}\}$ for which we need to find a subset summing to $S$, we can create $n$ instances of your problem, one for each $m$ with $1 \leq m \leq n$: the item set is $A$ and $p_{l} = p_{u} = \frac{S}{m}$. A size-$m$ subset with average exactly $\frac{S}{m}$ has sum exactly $S$, so $A$ has a subset summing to $S$ if and only if some instance has a solution.
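The reduction can be sketched as follows. The function names are mine, and the exponential-time `brute_force_oracle` merely stands in for the hypothetical efficient selection algorithm; with integer inputs and small $n$, the floating-point comparison against $S/m$ is exact:

```python
from itertools import combinations

def brute_force_oracle(values, m, p_l, p_u):
    """Exponential-time stand-in for the hypothetical selection algorithm:
    return indices of a size-m subset with average in [p_l, p_u], or None."""
    for c in combinations(range(len(values)), m):
        if p_l <= sum(values[i] for i in c) / m <= p_u:
            return list(c)
    return None

def subset_sum_via_oracle(xs, S, oracle=brute_force_oracle):
    """Reduction: a size-m subset with average exactly S/m sums to S,
    so n oracle calls (one per m = 1..n) decide subset sum."""
    for m in range(1, len(xs) + 1):
        subset = oracle(xs, m, S / m, S / m)
        if subset is not None:
            return subset  # indices of a subset of xs summing to S
    return None
```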

If $m$ is fixed, the brute-force algorithm (which you don't want) examines $\binom{n}{m} = O(n^m)$ subsets, so the problem is solvable in polynomial time for constant $m$.
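For fixed $m$, this brute force also satisfies the uniformity requirement directly, since we can enumerate the feasible subsets and pick one at random. A minimal sketch (the function name is mine):

```python
import random
from itertools import combinations

def uniform_feasible_subset(p_values, m, p_l, p_u):
    """Enumerate all C(n, m) size-m subsets -- O(n^m) of them for fixed m --
    keep those whose average lies in [p_l, p_u], and return one uniformly
    at random.  Polynomial time for constant m, and exactly uniform."""
    feasible = [c for c in combinations(range(len(p_values)), m)
                if p_l <= sum(p_values[i] for i in c) / m <= p_u]
    return random.choice(feasible) if feasible else None
```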

0

One solution I was considering was to choose items for $I'$ successively such that the average p value of the items selected so far always falls within some upper and lower bound.

Obviously, we could just use $p_l$ and $p_u$ as our lower and upper bounds, respectively, and this would satisfy the average p-value constraint. But it would also be too restrictive: it would rule out many valid choices of $I'$ in which small p values balance out large ones.

So I was considering letting the upper and lower bounds start out far apart when the first items are selected and then narrow down to $p_l$ and $p_u$ by the time the final, $m$th item is selected. The result would be a cone of ranges that becomes narrower as more items are picked.

More precisely, we would define two functions $u$ and $l$ such that the average of the first $i$ items picked (call this $p_i$) would satisfy $l(i) \le p_i \le u(i)$ for $0 \le i \le m$.

I'm not really sure how to define these two functions, though. I know that $l(m) = p_l$ and $u(m) = p_u$, but I don't know what $l(0)$ or $u(0)$ should be. Also, should the functions be linear?

Comments or suggestions about this possible solution are welcome.
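As one concrete instantiation of the cone idea, here is a sketch that assumes the bounds interpolate linearly between an initial half-width (`width0`, a parameter I made up) and $[p_l, p_u]$, and restarts on dead ends. The function names are mine. Note two caveats: a greedy pass can get stuck even when feasible subsets exist, and this procedure does *not* sample uniformly over all feasible subsets, so it doesn't meet the question's uniformity requirement as stated:

```python
import random

def pick_with_cone(p_values, m, p_l, p_u, width0=1.0, max_restarts=1000):
    """Sketch of the cone idea: pick items one at a time, requiring the
    running average after i picks to stay in [l(i), u(i)], where l and u
    interpolate linearly from a wide initial range (an assumed choice)
    down to exactly [p_l, p_u] at i = m.  Restarts on dead ends.
    Caveat: the resulting distribution is NOT uniform over feasible subsets."""
    n = len(p_values)

    def l(i):  # lower bound on the average after i picks; l(m) = p_l
        return p_l - width0 * (m - i) / m

    def u(i):  # upper bound on the average after i picks; u(m) = p_u
        return p_u + width0 * (m - i) / m

    for _ in range(max_restarts):
        chosen, total = [], 0.0
        order = list(range(n))
        random.shuffle(order)
        for idx in order:
            avg = (total + p_values[idx]) / (len(chosen) + 1)
            if l(len(chosen) + 1) <= avg <= u(len(chosen) + 1):
                chosen.append(idx)
                total += p_values[idx]
                if len(chosen) == m:
                    return chosen
        # dead end: could not extend to size m; restart with a new order
    return None
```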