
I am writing my thesis where I am analyzing the effect of tweet sentiment on abnormal stock returns for companies during an event of negative press.

My dataset comprises various negative press events for different companies, each with an event window of 7 days on either side of the event day (t = -7 to t = 7). For each event I measure the abnormal return per day, the cumulative abnormal return (CAR) per day, and the average tweet sentiment of that day.

My regression formula currently looks like this:

abnormal returns ~ Tn * sentiment + Tz * sentiment + Tp * sentiment

where:

Tn = abs(t) if t is negative, otherwise 0
Tz = 1 if t=0 (event day), otherwise 0
Tp = abs(t) if t is positive, otherwise 0
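
A minimal sketch of how the three time variables above could be constructed, in Python. The function name `window_regressors` is hypothetical and chosen only for illustration.

```python
from typing import Tuple


def window_regressors(t: int) -> Tuple[int, int, int]:
    """Return (Tn, Tz, Tp) for an event-time index t.

    Hypothetical helper: it only encodes the three definitions given above.
    """
    tn = abs(t) if t < 0 else 0   # days before the event, counted as a positive number
    tz = 1 if t == 0 else 0       # indicator for the event day itself
    tp = abs(t) if t > 0 else 0   # days after the event
    return tn, tz, tp


# Print the regressors for a few days of the event window.
for t in (-7, -1, 0, 1, 7):
    print(t, window_regressors(t))
```

For any given day at most one of the three variables is non-zero, so each interaction with sentiment only picks up observations from its own part of the window.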

Running this regression gave me no significant coefficients. However, when I used CAR as my dependent variable, I got significant coefficients for Tp and for the sentiment:Tp interaction.

My question is, is it statistically wrong to use CAR as a dependent variable, as the individual observations are not independent from one another, all else equal? Or is it still valid?

1 Answer


The errors of the cumulative variable are not independently distributed: each day's cumulative value contains all the errors of the preceding days. Ordinary least squares (OLS) assumes independent errors, and if that assumption is violated you overestimate the precision of your fitted curve. You should use generalized least squares (GLS) instead, which accounts for the correlation between the error terms.
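
To illustrate the dependence, here is a small simulation sketch in Python. It assumes, purely for illustration, that daily abnormal returns are independent standard normal draws over a 15-day window. The number of simulated events, the seed, and the helper `pearson` are all hypothetical choices, not taken from the question's data.

```python
import random
from itertools import accumulate


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


random.seed(0)                   # fixed seed so the run is reproducible
N_EVENTS, N_DAYS = 5000, 15      # assumed: 5000 simulated events, t = -7..7

# Independent daily abnormal returns (assumption: iid N(0, 1)).
ar = [[random.gauss(0.0, 1.0) for _ in range(N_DAYS)] for _ in range(N_EVENTS)]
# Cumulative abnormal returns: running sum within each event window.
car = [list(accumulate(row)) for row in ar]

# Correlation across events between the last two days of the window.
corr_ar = pearson([row[-2] for row in ar], [row[-1] for row in ar])
corr_car = pearson([row[-2] for row in car], [row[-1] for row in car])

print("daily AR, last two days: ", round(corr_ar, 3))
print("CAR, last two days:      ", round(corr_car, 3))
```

Under these assumptions the daily returns are uncorrelated by construction, while the last two cumulative values share 14 of their 15 terms, so their correlation should be close to sqrt(14/15), roughly 0.97. An OLS fit on CAR treats such strongly dependent observations as if they were independent pieces of information.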

  • Could I also continue with OLS if I use robust standard errors? – Damiano Galante Jun 12 '22 at 20:19
  • @DamianoGalante no, those are working more to reduce the variance of the estimate when the error distribution has long tails or has outliers. – Sextus Empiricus Jun 12 '22 at 20:21
  • See also this question https://stats.stackexchange.com/questions/491794/ and in the answer I made a plot with time series generated according to an Ornstein-Uhlenbeck process (a sort of random walk with an attractive force). You can see that those time series seemingly have some trend or other patterns. But that is due to the correlation between the data points. You already have this effect with your tweets and stock returns. That is data with some autocorrelation. When you use the cumulative values it becomes even stronger. – Sextus Empiricus Jun 12 '22 at 20:28
  • Alright, thanks a lot! I am trying to model it in R using gls(), but I'm not really familiar with the arguments I need to set within the function (like the correlation and weights arguments). Would you be able to explain them to me, or do you have any material that explains them? – Damiano Galante Jun 12 '22 at 23:25
  • @DamianoGalante I will add an example. But don't have too many hopes about it. I believe that fitting the cumulative data (while also thinking about the correlation of the error terms) is not doing anything different from fitting the data directly. – Sextus Empiricus Jun 13 '22 at 08:20
  • @DamianoGalante: Note that, in the cumulative case, what you're really modelling as the response is $y_n = \sum_{i=1}^{n} \epsilon_i$. This is non-stationary in variance (and, as noted by Sextus Empiricus, not independent of the previous response), so that OLS assumption is violated. – mlofton Nov 12 '23 at 19:58
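
The point made in the last comment can be written out explicitly. This is a sketch under an assumption that the thread does not state: the daily errors $\epsilon_i$ are independent with mean zero and common variance $\sigma^2$.

$$
\begin{aligned}
y_n &= \sum_{i=1}^{n} \epsilon_i, \\
\operatorname{Var}(y_n) &= n\,\sigma^2, \\
\operatorname{Cov}(y_m, y_n) &= \min(m, n)\,\sigma^2, \\
\operatorname{Corr}(y_m, y_n) &= \sqrt{m/n} \qquad (m \le n).
\end{aligned}
$$

The variance grows with $n$ (non-stationarity) and neighbouring cumulative values are strongly correlated. Under the same assumption the error covariance of the cumulative series is $\sigma^2 L L^\top$, where $L$ is the lower-triangular matrix of ones. GLS with this covariance amounts to multiplying both sides of the regression by $L^{-1}$, which is first differencing. That turns CAR back into the daily abnormal returns, with first-differenced regressors, and is one way to read the remark above that fitting the cumulative data with correlated errors does nothing different from fitting the data directly.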