
I have a continuous random variable $\tau$ and I want to evaluate

$$ E\left(\sum_{i=1}^{\lfloor \tau \rfloor} Y_i\right), $$

where $Y_i$ are known, non-random, and $\lfloor \cdot \rfloor$ is the floor function. If the $Y_i$ were iid, I know I could use Wald's equation, for instance, but that is not the case. I am able to solve this through Monte Carlo, as I can simulate different $\tau$s. However, this will be very time-consuming, since the $Y_i$ can be big and the Monte Carlo samples can be large. It would be significantly easier if I could approximate the expectation above with $$ \sum_{i=1}^{\lfloor E(\tau) \rfloor} Y_i. $$

Is there a theoretical guarantee of this approximation?

Note on support: The vectors $Y_i$ are typically not large in magnitude, but they can be high-dimensional. The support of $\tau$ is fixed to $(1,N)$, where $N$ is known in advance, and the distribution of $\tau$ is unimodal.
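For concreteness, here is a minimal Python sketch of the two quantities being compared: the brute-force Monte Carlo average of $\sum_{i=1}^{\lfloor\tau\rfloor} Y_i$ and the proposed plug-in $\sum_{i=1}^{\lfloor E(\tau)\rfloor} Y_i$. It treats the $Y_i$ as scalars for brevity; the sampler `sample_tau` ($\tau\sim\mathrm{Unif}(1,4)$) and the values $Y=(1,5,100)$ are hypothetical choices for illustration only.

```python
import random


def naive_mc(Y, sample_tau, n_samples, seed=0):
    """Brute-force Monte Carlo estimate of E[Y_1 + ... + Y_floor(tau)].

    Y[i - 1] holds Y_i; each draw costs O(floor(tau)) additions,
    which is what makes this slow when N is large.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        k = int(sample_tau(rng))   # floor(tau), valid since tau > 0
        total += sum(Y[:k])        # S_k = Y_1 + ... + Y_k
    return total / n_samples


def plug_in(Y, mean_tau):
    """The proposed approximation: Y_1 + ... + Y_floor(E(tau))."""
    return sum(Y[:int(mean_tau)])


def sample_tau(rng):
    """Hypothetical sampler: tau ~ Unif(1, 4)."""
    return rng.uniform(1, 4)


Y = [1, 5, 100]   # hypothetical Y_1, Y_2, Y_3
print(naive_mc(Y, sample_tau, 200_000), plug_in(Y, 2.5))
```

With these choices $\lfloor\tau\rfloor$ is uniform on $\{1,2,3\}$, so the exact expectation is $113/3\approx 37.7$, while the plug-in (with $E(\tau)=2.5$) gives $Y_1+Y_2=6$.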

Peter Mortensen
Car Loz
    I don't quite see how the dimensionality of the $Y_i$ comes in; can't we just treat them as scalars, component by component? And if the $Y_i$ are typically not large, then my answer below still makes sense: if $Y_1=0$, then your approximation is $0$, and the correct value is completely determined by the later $Y_i$. Your additional knowledge on $\tau$ is also consistent with my adversarial example. I think you will need to incorporate a lot more information to get a useful approximation of the form you are looking for. – Stephan Kolassa Sep 07 '22 at 18:14
    Let $S_n=\sum_{i=1}^n Y_i$ and let $p_n=\Pr(\tau\lt n+1) - \Pr(\tau\lt n) = \Pr(\lfloor\tau\rfloor = n).$ The usual formula for the expectation applies: $\sum_{n=1}^\infty p_n S_n.$ Absent additional assumptions about these quantities, you are in the most general setting, which applies to any discrete variable. – whuber Sep 07 '22 at 20:30
    Clearly the approximation will be poor if the $Y_j$ for $j > \lfloor E(\tau) \rfloor$ can be arbitrarily larger (or smaller) than the $Y_i$ for $i \le \lfloor E(\tau) \rfloor$ and in other similar cases. – Henry Sep 08 '22 at 21:11
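whuber's formula can be evaluated directly when the cdf of $\tau$ is available in closed form. A minimal sketch, assuming a continuous $\tau$ supported on $(1, n+1)$ with $n$ = `len(Y)`, scalar $Y_i$, and a hypothetical uniform cdf for the example:

```python
def expectation_from_cdf(Y, cdf):
    """E[S_floor(tau)] = sum_n p_n S_n with p_n = F(n+1) - F(n) = P(floor(tau) = n).

    Assumes a continuous tau supported on (1, len(Y) + 1); Y[n - 1] = Y_n.
    """
    total, S = 0.0, 0.0
    for n in range(1, len(Y) + 1):
        S += Y[n - 1]                  # S_n = Y_1 + ... + Y_n
        p_n = cdf(n + 1) - cdf(n)      # P(n <= tau < n + 1)
        total += p_n * S
    return total


def uniform_cdf(a, b):
    """CDF of a hypothetical Unif(a, b) tau, used only for the example."""
    return lambda t: min(max((t - a) / (b - a), 0.0), 1.0)


print(expectation_from_cdf([1, 5, 100], uniform_cdf(1, 4)))
```

For $\tau\sim\mathrm{Unif}(1,4)$ and $Y=(1,5,100)$ every $p_n$ equals $1/3$, so the sum is $113/3$.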

5 Answers


The expectation $$ \mathbb E\left[\sum_{i=1}^{\lfloor\tau\rfloor} Y_i\right]=\mathbb E\left[\sum_{i=1}^{\infty} \mathbb I_{\tau\ge i} Y_i\right]$$ simplifies into $$Y_1\underbrace{\mathbb P(\tau\ge 1)}_{=1}+Y_2\mathbb P(\tau\ge 2)+\cdots+Y_N\underbrace{\mathbb P(\tau\ge N)}_{=0},$$ and since the $Y_i$'s are known, only the cdf of $\tau$ at the integers $2,\ldots,N-1$ needs to be approximated by Monte Carlo, if I understand correctly, resulting in $$Y_1+Y_2\hat{\mathbb P}(\tau\ge 2)+\cdots+Y_{N-1}\hat{\mathbb P}(\tau\ge N-1),$$ where $\hat{\mathbb P}$ denotes the empirical distribution of the simulated $\tau$'s. The magnitude / dimension of the $Y_i$'s thus does not impact the Monte Carlo effort.
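A minimal sketch of this estimator in Python (standard library only; the function name and the representation of each $Y_i$ as a list are assumptions for illustration). Each draw of $\tau$ only increments a counter, and each $Y_i$ is touched once, which is why the dimension of the $Y_i$ does not affect the Monte Carlo cost.

```python
def survival_weighted_estimate(Y, taus):
    """Estimate E[sum_{i <= floor(tau)} Y_i] as sum_i Y_i * P_hat(tau >= i).

    Y is a list of equal-length vectors with Y[i - 1] = Y_i;
    taus are Monte Carlo draws of tau.
    """
    n, d = len(Y), len(Y[0])
    # count[k] = number of draws with floor(tau) == k (floors above n are capped at n)
    count = [0] * (n + 1)
    for t in taus:
        count[min(int(t), n)] += 1
    estimate = [0.0] * d
    tail = 0
    for i in range(n, 0, -1):           # accumulate #{tau >= i} from the top down
        tail += count[i]
        w = tail / len(taus)            # P_hat(tau >= i)
        for j in range(d):
            estimate[j] += w * Y[i - 1][j]
    return estimate


# Toy usage: 2-dimensional Y_i and three draws of tau.
print(survival_weighted_estimate([[1, 0], [5, 1], [100, 2]], [1.5, 2.5, 3.5]))
```

With one draw in each of $[1,2)$, $[2,3)$ and $[3,4)$, the estimates are $\hat{\mathbb P}(\tau\ge 2)=2/3$ and $\hat{\mathbb P}(\tau\ge 3)=1/3$, so the first component is $1+\tfrac23\cdot 5+\tfrac13\cdot 100=113/3$.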

Galen
Xi'an

This approximation will definitely not work in general. Consider a $\tau$ with support on $(1,3)$ and $E(\tau)\in (1,2)$. Then $$ \sum_{i=1}^{\lfloor E(\tau) \rfloor} Y_i = Y_1, $$ but if $Y_2\to\infty$ and $\tau$ has any mass at all on $[2,3)$, $$ E\left(\sum_{i=1}^{\lfloor \tau \rfloor} Y_i\right) = Y_1 + Y_2\,P(\tau\ge 2)\to\infty. $$ You will need to include more information on the distribution of $\lfloor\tau\rfloor$ and the whole vector $Y$.
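A small numerical version of this counterexample, under an assumed mixture: $\tau\sim\mathrm{Unif}(1,2)$ with probability $1-\varepsilon$ and $\tau\sim\mathrm{Unif}(2,3)$ with probability $\varepsilon<1/2$, so that $E(\tau)=1.5+\varepsilon<2$ and $P(\tau\ge 2)=\varepsilon$. The plug-in value stays at $Y_1$, while the exact value $Y_1+\varepsilon Y_2$ grows without bound in $Y_2$.

```python
def counterexample(Y1, Y2, eps):
    """Plug-in vs exact value for the assumed two-piece mixture tau."""
    assert 0 < eps < 0.5
    Y = [Y1, Y2]
    mean_tau = (1 - eps) * 1.5 + eps * 2.5   # = 1.5 + eps, which lies in (1.5, 2)
    plug_in = sum(Y[:int(mean_tau)])         # floor(E(tau)) = 1, so this is Y1
    exact = Y1 * 1.0 + Y2 * eps              # P(tau >= 1) = 1, P(tau >= 2) = eps
    return plug_in, exact


for Y2 in (10, 1_000, 1_000_000):
    print(Y2, counterexample(1.0, Y2, eps=0.01))
```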

utobi
Stephan Kolassa

Answer: Part 1 (Plugging in $E(\tau)$ will not work)

To add to the previous answers, here is a concrete example of why plugging in the expectation will not work.

Consider the following example. Assume:

  1. $\tau$ is uniformly distributed on $\{1,2,3\}$, i.e. $p(\tau=1)=p(\tau=2)=p(\tau=3)=\frac{1}{3}$. [That is, you will have to redefine your continuous distribution as a discrete one.]
  2. $Y_1=1$, $Y_2=5$, $Y_3=100$

Then,

  • $E(\tau)=2$
  • $\sum_{i=1}^{\lfloor E(\tau)\rfloor} Y_i = 1+5 = 6$

However, the real expectation is:

  • $\frac{1}{3}(1) + \frac{1}{3}(1+5) + \frac{1}{3}(1+5+100) = \frac{113}{3} \approx 37.7$
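The arithmetic of this example can be checked exactly with Python's `fractions` module; the exact expectation is $113/3\approx 37.67$, against a plug-in value of $6$. The dictionary `pmf` simply encodes the assumed distribution of $\tau$ from point 1.

```python
import math
from fractions import Fraction

Y = [1, 5, 100]                                                   # Y_1, Y_2, Y_3 from point 2
pmf = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}   # p(tau = k) from point 1

mean_tau = sum(k * p for k, p in pmf.items())          # E(tau)
plug_in = sum(Y[:math.floor(mean_tau)])                # Y_1 + ... + Y_floor(E(tau))
exact = sum(p * sum(Y[:k]) for k, p in pmf.items())    # E(Y_1 + ... + Y_tau)
print(mean_tau, plug_in, exact, float(exact))
```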

Answer: Part 2 (Explicit Formula for the expectation)

Starting with $E(\Sigma_{i=1}^{\lfloor\tau\rfloor}Y_i)$, we have:

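$$E\left(\sum_{i=1}^{\lfloor\tau\rfloor}Y_i\right) = \sum_{i=1}^N p(\tau \geq i)\, Y_i = \sum_{i=1}^N \bigl(1-p(\tau < i)\bigr) Y_i = \sum_{i=1}^N Y_i - \sum_{i=1}^N p(\tau < i)\, Y_i = \sum_{i=1}^N Y_i - \sum_{i=1}^N \mathrm{CDF}_{\tau}(i)\, Y_i,$$

where $\mathrm{CDF}_\tau$ is the cumulative distribution function of $\tau$. The last step uses $p(\tau<i)=p(\tau\le i)=\mathrm{CDF}_\tau(i)$, which holds for a continuous $\tau$; for a discrete $\tau$ such as the one in Part 1, keep $p(\tau<i)$.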
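A short check of this formula on the discrete example from Part 1, again with exact fractions. It also shows why the final equality needs $\tau$ to be continuous: using $p(\tau<i)$ reproduces $113/3$, whereas substituting $p(\tau\le i)$ for this discrete $\tau$ gives $7/3$, which is wrong.

```python
from fractions import Fraction

Y = [1, 5, 100]
pmf = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}   # discrete tau from Part 1
N = len(Y)


def p_less(i):
    """p(tau < i)."""
    return sum((p for k, p in pmf.items() if k < i), Fraction(0))


def p_at_most(i):
    """p(tau <= i) = CDF_tau(i); differs from p(tau < i) at the atoms of tau."""
    return sum((p for k, p in pmf.items() if k <= i), Fraction(0))


tail_form = sum(Y) - sum(p_less(i) * Y[i - 1] for i in range(1, N + 1))
cdf_form = sum(Y) - sum(p_at_most(i) * Y[i - 1] for i in range(1, N + 1))
print(tail_form, cdf_form)
```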


I apologise for any mistakes I have made as this is my first post. Hope this helps.

Kolyan1

Let me change the notation a little (so that we use uppercase for random variables).

We want $E[Z]$ where

$$Z = \sum_{i=0}^{\lfloor T \rfloor} a_i = \sum_{i=0}^\infty a_i h(T-i) \tag 1$$

where $h()$ is the Heaviside step function. We assume $T$ is continuous and non-negative, with density $f_T$ and CDF $F_T$. Let $G(t)=P(T>t)=1-F_T(t)$.

Then

$$E[h(T-i)]=P(T > i) = G(i) \tag 2$$ and

$$E[Z] = \sum_{i=0}^{\infty} a_i G(i) \tag 3$$

We know that $E[T]=\int_0^{\infty} G(t)\, dt$, whereas (3) is a weighted sum of $G$ at the integers. This suggests that your approximation is fair only when the $a_i$ are (almost?) constant and $f_T$ is quite smooth.
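To make this concrete, here is a sketch evaluating (3) against the plug-in value $\sum_{i=0}^{\lfloor E[T]\rfloor} a_i$, assuming (purely for illustration) that $T$ is exponential with mean $\mu=10$, so $G(i)=e^{-i/\mu}$, and truncating the infinite sum. With constant $a_i=1$ the two are close ($\approx 10.51$ versus $11$); with $a_i=i$ they are far apart ($\approx 99.9$ versus $55$).

```python
import math


def expected_Z(a, G, terms):
    """Equation (3): E[Z] = sum_{i>=0} a(i) G(i), truncated after `terms` terms."""
    return sum(a(i) * G(i) for i in range(terms))


def plug_in(a, mean_T):
    """The plug-in value: a(0) + ... + a(floor(E[T]))."""
    return sum(a(i) for i in range(math.floor(mean_T) + 1))


mu = 10.0   # assumed: T ~ Exponential with mean mu


def G(t):
    """Survival function P(T > t) of the assumed exponential T."""
    return math.exp(-t / mu)


for name, a in [("constant", lambda i: 1.0), ("linear", lambda i: float(i))]:
    print(name, expected_Z(a, G, 2000), plug_in(a, mu))
```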

Edit: alternatively, using summation by parts

$$E[Z] = \sum_{i=0}^{\infty} g_i A(i) \tag 4$$

where $g_i = P( i \le T < i+1)$ (the probability mass function of $\lfloor T \rfloor$) and $A(i)=\sum_{k=0}^{i} a_k$ is the partial sum.
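A quick numerical check that (3) and (4) agree, under the same illustrative assumption that $T$ is exponential with mean 10, with arbitrary random coefficients $a_i$ and with $A(i)=a_0+\cdots+a_i$ built as a running sum using `itertools.accumulate`. The infinite sums are truncated at a point where the neglected tail is negligible.

```python
import math
import random
from itertools import accumulate

mu, M = 10.0, 2000                                  # assumed: T ~ Exponential(mean mu); truncate at M
G = [math.exp(-i / mu) for i in range(M + 1)]       # G(i) = P(T > i)
g = [G[i] - G[i + 1] for i in range(M)]             # g_i = P(i <= T < i + 1)

rng = random.Random(1)
a = [rng.uniform(-1.0, 1.0) for _ in range(M)]      # arbitrary coefficients a_i
A = list(accumulate(a))                             # A(i) = a_0 + ... + a_i

eq3 = sum(a[i] * G[i] for i in range(M))            # (3): sum_i a_i G(i)
eq4 = sum(g[i] * A[i] for i in range(M))            # (4): sum_i g_i A(i)
print(eq3, eq4)
```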

leonbloy

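Set $S_t = \sum_{i=1}^t Y_i$. Then the partial expectation $$ E\left[S_{\lfloor\tau\rfloor}\,\mathbb 1\{\lfloor\tau\rfloor \leq T\}\right] = \sum_{t=1}^T S_t\, P(\lfloor\tau\rfloor = t) \triangleq H_T = H_{T-1} + S_T\cdot P(\lfloor\tau\rfloor = T) $$ can be accumulated one term at a time; $H_T$ converges to the full expectation as $T$ grows, and is exact at $T=N-1$ when $\tau\in(1,N)$.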

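import random

# Hypothetical inputs (stand-ins for getY() and simulatep()): Y[t] for
# t = 1..N (index 0 unused) and a Monte Carlo estimate p[t] of P(floor(tau) = t).
N = 10
Y = [0.0] + [random.random() for _ in range(N)]
draws = [random.uniform(1, N) for _ in range(100_000)]
p = [0.0] * (N + 1)
for tau in draws:
    p[int(tau)] += 1 / len(draws)

H = 0.0  # running partial expectation H_T
S = 0.0  # running partial sum S_T
for T in range(1, N + 1):  # tau lies in (1, N), so no convergence test is needed
    S += Y[T]              # S_T = S_{T-1} + Y_T
    H += S * p[T]          # H_T = H_{T-1} + S_T * P(floor(tau) = T)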
Hunaphu