
We are interested in approximating the individual probabilities of $n$ (possibly dependent) events $\{e_1,\dots, e_n\}$, but we can only estimate the probability of an intersection of any $q$ of them:

$$P(e_{k_1}\cap e_{k_2}\cap \dots \cap e_{k_q})$$

The problem's model gives us the following identity. We have an algebraic function $f:\underset{q \text{ times}}{\underbrace{[0,1]\times \dots \times [0,1]}}\rightarrow [0,1]$ where:

$$f\left(P(e_{k_1}),P(e_{k_2}),\dots, P(e_{k_q})\right) = P(e_{k_1}\cap e_{k_2}\cap \dots \cap e_{k_q}).$$

$q$ is fairly small (it's 9), but $n$ can be on the order of dozens to several thousand.

If we took enough samples, we would have a chance of solving for each $P(e_i)$ individually using algebra alone, but the number of samples and the computation time needed to do this seem huge.
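For concreteness, the algebraic inversion can be cast as a nonlinear least-squares problem. In the sketch below, the model `f` (a plain product, i.e. treating the events as independent) is only a stand-in for the problem's actual algebraic $f$; the subset choice, the small $n$ and $q$, and the noise-free "observations" are all illustrative assumptions:

```python
# Sketch: recover per-event marginals from q-wise intersection estimates
# via nonlinear least squares. f here is a placeholder (independence),
# NOT the question's actual model.
import itertools
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
n, q = 12, 3                        # small sizes for the sketch (question has q = 9)
p_true = rng.uniform(0.2, 0.9, n)   # hypothetical true marginals

subsets = list(itertools.combinations(range(n), q))[:5 * n]  # a few q-subsets

def f(p_sub):
    # Placeholder model: probability of the intersection under independence.
    return np.prod(p_sub)

# "Measured" intersection probabilities (noise-free here).
obs = np.array([f(p_true[list(s)]) for s in subsets])

def residuals(p):
    return np.array([f(p[list(s)]) for s in subsets]) - obs

sol = least_squares(residuals, x0=np.full(n, 0.5), bounds=(0.0, 1.0))
print(np.max(np.abs(sol.x - p_true)))  # worst-case recovery error
```

Even in this toy setting the number of residuals grows with the number of subsets sampled, which hints at why the full problem feels expensive.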

Is there a good way to approach such a problem?

  • Several thousand is small computationally; maybe you need to profile and analyze your code, or consider parallel computing. – pyCthon Sep 28 '12 at 17:13

1 Answer


I'm not clear from your question what type of model you're using, but in linear modelling there is a class of problems known as "large $p$, small $n$". In the variable-selection literature, the best-known solution is probably the Lasso; see Tibshirani's Lasso page.
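As a hedged illustration of how the Lasso behaves in a "large $p$, small $n$" setting, here is a minimal sketch on synthetic data (the sizes, coefficients, and penalty `alpha` are all made-up for the example):

```python
# Sketch: Lasso on a problem with far more features than samples.
# The L1 penalty drives most coefficients exactly to zero,
# performing variable selection as a by-product of fitting.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n_samples, n_features = 40, 200          # "large p, small n"
X = rng.normal(size=(n_samples, n_features))
coef = np.zeros(n_features)
coef[:5] = [3.0, -2.0, 1.5, 4.0, -1.0]   # only 5 truly active features
y = X @ coef + 0.1 * rng.normal(size=n_samples)

model = Lasso(alpha=0.1).fit(X, y)
active = np.flatnonzero(model.coef_)
print(active)  # indices the Lasso kept; most coefficients are zeroed out
```

Whether this maps onto the original question depends on whether the intersection model can be linearized (e.g. the product model becomes linear in the log marginals).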

Sean
  • it appears the OP is doing something along the lines of inverting a function of several variables and is finding that it becomes quite difficult as the number of variables increases. – Macro Jun 30 '12 at 06:36
  • ah... if that is the case my answer is probably a delete... – Sean Jun 30 '12 at 06:44