The inter-arrival times of a Poisson process follow an exponential distribution. When modelling this kind of data, I have usually seen the arrival rate (lambda) held constant. Sometimes I have seen a non-homogeneous approach, where the arrival rate lambda changes as a function of time.

In Non-Homogeneous approaches, I have generally seen the arrival rate change as a basic step/staircase function. But I was wondering if it is also possible to have the arrival rate parameter change probabilistically/stochastically according to some process.

For example, perhaps the arrival rate can change according to an autoregressive process - or perhaps the arrival rate can fluctuate probabilistically according to a discrete Markov chain (e.g., two states lambda1 and lambda2, with transition probabilities P(lambda1, lambda1), P(lambda1, lambda2), P(lambda2, lambda1), P(lambda2, lambda2)).

I think this might make the models more flexible and realistic, since it's quite likely that rates hover stochastically around certain points instead of uniformly going up or down ... but I am not sure if this is allowed (i.e., a stochastically changing rate parameter), because it might violate assumptions or complicate the modelling/inference process?

I wrote some R simulations to illustrate what I am talking about:

library(ggplot2)

set.seed(123)

# Case 1: AR(1) process
n <- 500    # number of time periods
phi <- 0.9  # AR(1) coefficient

# Define the constant rate
lambda <- 5

# Simulate the AR(1) process
arrival_rate_ar <- arima.sim(n = n, model = list(ar = phi), sd = sqrt(lambda * (1 - phi^2)))

# Ensure all arrival rates are positive
arrival_rate_ar <- abs(arrival_rate_ar) + lambda

# Simulate the arrival data with AR(1) arrival rate
arrival_data_ar <- rpois(n, lambda = arrival_rate_ar)

# Case 2: Simulate constant arrival rate
arrival_rate_constant <- rep(lambda, n)
arrival_data_constant <- rpois(n, lambda = arrival_rate_constant)

# Case 3: Define the switching rate
lambda1 <- 3
lambda2 <- 8
p <- 0.05
arrival_rate_switch <- rep(lambda1, n)
for (i in 2:n) {
  if (runif(1) < p) {
    arrival_rate_switch[i] <- ifelse(arrival_rate_switch[i - 1] == lambda1, lambda2, lambda1)
  } else {
    arrival_rate_switch[i] <- arrival_rate_switch[i - 1]
  }
}
arrival_data_switch <- rpois(n, lambda = arrival_rate_switch)

# Create a data frame
df <- data.frame(
  Time = rep(1:n, 3),
  ArrivalRate = c(arrival_rate_ar, arrival_rate_constant, arrival_rate_switch),
  ArrivalData = c(arrival_data_ar, arrival_data_constant, arrival_data_switch),
  RateType = rep(c("AR(1)", "Constant", "Switch"), each = n)
)

# Plots
p1 <- ggplot(df[df$RateType == "AR(1)", ], aes(x = Time, y = ArrivalData)) + geom_line() +
  ggtitle("Arrival Data (AR(1) Rate)") + xlab("Time") + ylab("Number of Arrivals") + theme_bw()

p2 <- ggplot(df[df$RateType == "AR(1)", ], aes(x = Time, y = ArrivalRate)) + geom_line() +
  ggtitle("Arrival Rate (AR(1))") + xlab("Time") + ylab("Rate") + theme_bw()

p3 <- ggplot(df[df$RateType == "Constant", ], aes(x = Time, y = ArrivalData)) + geom_line() +
  ggtitle("Arrival Data (Constant Rate)") + xlab("Time") + ylab("Number of Arrivals") + theme_bw()

p4 <- ggplot(df[df$RateType == "Constant", ], aes(x = Time, y = ArrivalRate)) + geom_line() +
  ggtitle("Arrival Rate (Constant)") + xlab("Time") + ylab("Rate") + theme_bw()

p5 <- ggplot(df[df$RateType == "Switch", ], aes(x = Time, y = ArrivalData)) + geom_line() +
  ggtitle("Arrival Data (Switching Rate)") + xlab("Time") + ylab("Number of Arrivals") + theme_bw()

p6 <- ggplot(df[df$RateType == "Switch", ], aes(x = Time, y = ArrivalRate)) + geom_line() +
  ggtitle("Arrival Rate (Switching)") + xlab("Time") + ylab("Rate") + theme_bw()

[Figure: arrival counts and arrival rates over time for the AR(1), constant, and switching cases]
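One quick way to see what a stochastic rate does to the counts is to compare the variance-to-mean ratio of the simulated data. The sketch below reuses the constant and switching settings from the simulation (`lambda`, `lambda1`, `lambda2`, `p`); the longer series length and the helper `dispersion()` are assumptions made only for this illustration. For a plain Poisson count the ratio should be close to 1, while a randomly varying rate adds the variance of the rate on top, so the ratio should come out larger than 1.

```r
# Sketch: variance-to-mean ratio of counts under a constant rate
# versus a two-state switching rate (settings mirror the simulation).
set.seed(123)
n <- 5000  # longer series than in the question, assumed here for stable estimates

# Constant rate: counts are plain Poisson, so variance is about equal to the mean
lambda <- 5
counts_constant <- rpois(n, lambda = lambda)

# Switching rate: flip between lambda1 and lambda2 with probability p at each step
lambda1 <- 3
lambda2 <- 8
p <- 0.05
rate_switch <- numeric(n)
rate_switch[1] <- lambda1
for (i in 2:n) {
  flip <- runif(1) < p
  rate_switch[i] <- if (flip) (lambda1 + lambda2 - rate_switch[i - 1]) else rate_switch[i - 1]
}
counts_switch <- rpois(n, lambda = rate_switch)

# Hypothetical helper: sample variance divided by sample mean
dispersion <- function(x) var(x) / mean(x)

dispersion_constant <- dispersion(counts_constant)
dispersion_switch <- dispersion(counts_switch)
print(c(constant = dispersion_constant, switching = dispersion_switch))
```

A ratio well above 1 in observed data is one informal hint that a constant-rate Poisson model is too restrictive, although it does not by itself say which kind of rate process is responsible.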

  • Is the approach I described mathematically logical?
  • Is this kind of approach popular in statistics (i.e., suppose we observe data and want to fit models based on these approaches to this data)?
  • Do people ever use these kinds of approaches, or is it unnecessarily complicated and mathematically incorrect?
  • Or perhaps (due to the stochastic nature of the models) the approaches I described would result in parameter estimates that have large variances, are biased, are not consistent, or are not asymptotically normal?

Would be interested to hear opinions on this. The closest things I could find to the approach I described were:

  • "Doubly Stochastic Processes"
  • Coxian Process
  • the financial Heston Model (ie Black-Scholes where variance is now a stochastic time parameter)
  • a combination of a Poisson Thinning Process and Compounding Poisson Process?
  • In my experience, the step function approach is not usual. Have you searched for information on nonhomogeneous Poisson processes? – whuber Dec 17 '23 at 18:25
  • In our traffic analysis textbook, the step function approach is the only approach that is shown. Do you think the approaches I described make sense? – Uk rain troll Dec 17 '23 at 18:27
  • I just read some more ... perhaps this is a combination of a Poisson Thinning Process and a Compounding Poisson Process? – Uk rain troll Dec 17 '23 at 19:51
  • See this site search. A general account of thinning is given at https://stats.stackexchange.com/a/621281/919. – whuber Dec 18 '23 at 14:04
  • thank you! is what i have done correct? is it logical? – Uk rain troll Dec 18 '23 at 15:50
  • "Correct" could mean different things. It is a model. The challenges for you include (a) whether it's appropriate for your application and (b) if so, how to estimate the Poisson rate over time. What we lack is a question about a definite, real-world problem you face. If you have one in mind, it would be helpful to describe it here. – whuber Dec 18 '23 at 18:07
  • was just wondering ... I thought that the arrival rate parameter in many systems might evolve dynamically according to an AR process or a discrete time markov chain ... yet I can't find anyone who tried this online .... which makes me wonder: perhaps doing this is mathematically incorrect in principle? – Uk rain troll Dec 19 '23 at 03:22
  • @user123945: no, it is not incorrect in principle. But maybe you need a lot of data to make it useful – kjetil b halvorsen Dec 27 '23 at 05:36
  • As a concrete example of a time varying arrival rate, consider the rate at which orders arrive at the stock market. The arrival rate versus time is widely described as "u-shaped", with more activity in the early morning and late afternoon, and least activity mid-day. – krkeane Jan 05 '24 at 13:45
  • Agreeing with @whuber "the step function approach is not usual" - the Poisson parameter is continuous. If I were modeling something like the stock market order arrival rate, I would consider a (continuous) state space model, where the arrival rate is the state parameter. The approach I'm thinking of would be along the lines of West and Harrison "Bayesian forecasting and dynamic models" 2006 / ch. 4 "the dynamic linear model". – krkeane Jan 05 '24 at 13:58
  • "Self-exciting point processes" are a form of auto-regressive Poisson process that may be used to model phenomena like gunshots, where events are likely to occur in bunches. – krkeane Jan 05 '24 at 14:01

1 Answer

$$ \begin{aligned} y_t &\sim Po(m_t)\\ m_t &= \alpha m_{t-1} + v_t \end{aligned} $$

Since $E[y_t \mid m_t] = m_t$, the mean (a rolling mean, for instance) is a decent estimator of $m_t$. Otherwise, a theoretically better-behaved estimate is easily given by a particle filter. The particle filter can give the smoothing distribution, but a smoothing estimate can also be found by maximizing the log-posterior with an optimization package. Taking $v_t$ to be Gaussian with variance $\sigma^2$ (which the quadratic term below implies), and up to an additive constant,

$$ \ell(m) = \log p(m_{1:T}\mid y_{1:T}) = \text{const} - \frac{1}{2\sigma^2}\sum_{t=1}^T (m_t - \alpha m_{t-1})^2 + \sum_{t=1}^T \left(y_t \log(m_t) - m_t\right) $$

I used the assumption that $\alpha=1$ and a small $\sigma$ to avoid having to deal with any additional parameters.
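As a rough illustration of the two estimators mentioned above, the sketch below simulates counts from a random-walk rate ($\alpha = 1$) and then computes a centred rolling mean and the maximizer of $\ell(m)$ using base R's `optim`. The series length, the value of `sigma`, the window width `k`, the clamping of the simulated rate, and the choice to drop the $t = 1$ term involving $m_0$ (a flat prior on $m_1$) are all assumptions made for this example and are not part of the answer.

```r
# Sketch: rolling-mean and penalised-likelihood (MAP) estimates of a Poisson rate
set.seed(1)
T_len <- 200
sigma <- 0.3  # assumed (small) random-walk standard deviation, alpha = 1

# Simulate a random-walk rate, clamped to stay positive, and Poisson counts
m_true <- pmax(5 + cumsum(rnorm(T_len, sd = sigma)), 0.5)
y <- rpois(T_len, lambda = m_true)

# Estimator 1: centred rolling mean of the counts (NA at both ends)
k <- 11  # assumed window width
m_roll <- as.numeric(stats::filter(y, rep(1 / k, k), sides = 2))

# Estimator 2: minimise the negative of ell(m); the m_0 term is dropped
neg_log_post <- function(m) {
  d <- diff(m)
  sum(d^2) / (2 * sigma^2) - sum(y * log(m) - m)
}

# Analytic gradient of neg_log_post with respect to each m_t
neg_log_post_grad <- function(m) {
  d <- diff(m)
  g_prior <- (c(0, d) - c(d, 0)) / sigma^2
  g_lik <- 1 - y / m
  g_prior + g_lik
}

m_start <- pmax(y, 1)  # start from the raw counts, kept away from zero
fit <- optim(par = m_start, fn = neg_log_post, gr = neg_log_post_grad,
             method = "L-BFGS-B", lower = 1e-3,
             control = list(maxit = 1000))
m_map <- fit$par

# Mean squared error of each estimate against the simulated truth
mse <- c(raw = mean((y - m_true)^2),
         rolling = mean((m_roll - m_true)^2, na.rm = TRUE),
         map = mean((m_map - m_true)^2))
print(round(mse, 3))
```

The lower bound in `optim` keeps every $m_t$ positive so that $\log(m_t)$ is defined. Optimizing over $\log m_t$ instead would be another reasonable choice.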

Finally, assuming that $y_t = m_t + e_t$ with Gaussian $e_t$ gives the common linear Gaussian state-space (HMM) setting, which the Kalman filter solves exactly and which is computationally cheaper than sequential Monte Carlo. A Poisson variable is well approximated by a normal one when its rate is large, so the assumption is not completely disconnected from theory.
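A minimal sketch of that linear Gaussian shortcut is a scalar Kalman filter for the local-level model $y_t = m_t + e_t$, $m_t = m_{t-1} + v_t$. The state-noise variance `q`, the observation variance `r` (set crudely to the sample mean of the counts, since a Poisson variance equals its mean), and the vague initial state are assumptions chosen for illustration only.

```r
# Sketch: scalar Kalman filter treating Poisson counts as Gaussian observations
set.seed(2)
T_len <- 200
q <- 0.3^2  # assumed state-noise variance of the random walk

# Simulate a fairly large rate so the normal approximation is reasonable
m_true <- pmax(20 + cumsum(rnorm(T_len, sd = sqrt(q))), 1)
y <- rpois(T_len, lambda = m_true)

# Assumed observation variance: Po(m) is roughly N(m, m), so use mean(y)
r <- mean(y)

m_filt <- numeric(T_len)  # filtered means
p_filt <- numeric(T_len)  # filtered variances
m_prev <- mean(y)         # vague initial state (assumption)
p_prev <- var(y)

for (t in seq_len(T_len)) {
  # Predict: random-walk state, so the mean carries over and variance grows by q
  m_pred <- m_prev
  p_pred <- p_prev + q
  # Update: blend prediction and observation using the Kalman gain
  gain <- p_pred / (p_pred + r)
  m_filt[t] <- m_pred + gain * (y[t] - m_pred)
  p_filt[t] <- (1 - gain) * p_pred
  m_prev <- m_filt[t]
  p_prev <- p_filt[t]
}

mse <- c(raw = mean((y - m_true)^2), kalman = mean((m_filt - m_true)^2))
print(round(mse, 3))
```

This only produces the filtering estimate; a smoothing estimate would need an additional backward pass, and in practice `q` and `r` would be estimated rather than fixed.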

Hunaphu
  • 2,212