
I would be very grateful if anyone with a statistics background could sanity-check whether my approach is correct.

I am recording the prescribing of a particular drug over time. Beyond the steps listed below, an 'interrupt', i.e., the prescription of a second drug, is introduced halfway through the study. I am trying to monitor the pre-interrupt and post-interrupt trends in the data. I am moving towards segmented regression analysis, but before that I need to make sure my initial regression analysis is correct.
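
For reference, a common way to set up segmented (interrupted time-series) regression is the model y_t = b0 + b1*t + b2*post_t + b3*time_since_t, where b1 is the pre-interrupt slope, b2 the immediate level change and b3 the change in slope after the interrupt. The sketch below only builds the design rows. It assumes months are numbered 1 to 24 and that the interrupt takes effect from a hypothetical month 13 onwards; `segmented_design` is an illustrative name, and the same rows can feed either a linear or a Poisson (log-link) fit.

```python
def segmented_design(n_months, interrupt_month):
    """Return rows [intercept, t, post, time_since] for an interrupted time series.

    Assumption: months are numbered 1..n_months and the interrupt applies from
    `interrupt_month` onwards. time_since is 0 in the first post-interrupt month,
    so b2 measures the level change at that month.
    """
    rows = []
    for t in range(1, n_months + 1):
        post = 1 if t >= interrupt_month else 0          # indicator: after the interrupt
        time_since = (t - interrupt_month) if post else 0  # months elapsed since the interrupt
        rows.append([1, t, post, time_since])
    return rows
```

For example, `segmented_design(24, 13)` gives 12 pre-interrupt rows with `post == 0` and 12 post-interrupt rows whose `time_since` runs from 0 to 11.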

I've outlined two scenarios. The first focuses on the number of participants, and the second on a normalised dose average.

NOTE: This is time-series data. I have checked for autocorrelation and found none; I suspect the number of participants at each time point is independent of the previous time point.
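
One way to make the autocorrelation check concrete is to compute the lag-1 autocorrelation and the Durbin-Watson statistic on the residuals of the fitted trend model, rather than on the raw monthly counts (a trend on its own makes the raw series look autocorrelated). A Durbin-Watson value near 2 suggests no first-order autocorrelation; values towards 0 suggest positive and values towards 4 negative autocorrelation. This is a minimal stdlib sketch; the function names are illustrative, and `resid` is assumed to be the list of residuals from whichever model you fitted.

```python
def lag1_autocorrelation(resid):
    """Sample lag-1 autocorrelation of a residual series."""
    n = len(resid)
    mean = sum(resid) / n
    centred = [r - mean for r in resid]
    num = sum(centred[i] * centred[i - 1] for i in range(1, n))
    den = sum(c * c for c in centred)
    return num / den


def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive differences over the sum of squares."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(r * r for r in resid)
    return num / den
```

With only 24 monthly points these statistics are noisy, so a borderline value is weak evidence either way.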

First scenario: The explanatory variable is time, binned per month over a two-year observation period (24 data points), and the response y is the number of participants prescribed drug_X. Initially, I modelled participants ~ time with linear regression. However, this is count data, so surely the relationship should be modelled using Poisson regression?
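
To show what a Poisson regression of counts on time actually estimates, here is a minimal sketch that fits log(E[count]) = b0 + b1*t by iteratively reweighted least squares (IRLS), the standard fitting method for generalised linear models. In practice you would use a statistics package; this stdlib version (with the illustrative name `fit_poisson_time`) is only meant to make the log link visible. It assumes the mean count is positive and that time is coded as small integers such as months 1 to 24.

```python
import math


def fit_poisson_time(times, counts, n_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * t by IRLS and return (b0, b1) on the log scale."""
    b0 = math.log(sum(counts) / len(counts))  # start at the log of the mean count
    b1 = 0.0
    for _ in range(n_iter):
        # Weighted sums for the 2x2 weighted least-squares normal equations.
        s_w = s_wt = s_wtt = s_wz = s_wtz = 0.0
        for t, y in zip(times, counts):
            eta = b0 + b1 * t              # linear predictor (log of the mean)
            mu = math.exp(eta)             # fitted mean count
            z = eta + (y - mu) / mu        # working response
            w = mu                         # working weight (Poisson variance = mean)
            s_w += w
            s_wt += w * t
            s_wtt += w * t * t
            s_wz += w * z
            s_wtz += w * t * z
        det = s_w * s_wtt - s_wt * s_wt
        new_b0 = (s_wtt * s_wz - s_wt * s_wtz) / det
        new_b1 = (s_w * s_wtz - s_wt * s_wz) / det
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1
```

Note that the returned coefficients are on the log scale: a flat series of about 600 per month gives b0 close to log(600), not 600.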

I've compared the results of linear and Poisson regression. Linear regression returns an understandable intercept, e.g., 600, with most of the response data points not too far from this mark. But the Poisson regression returns a much smaller intercept, e.g., 16. I believe Poisson regression relies on a log transformation at some stage, but (a) I'm not sure that's true, and (b) how do I interpret the slope and intercept of a Poisson regression when the actual data points are of a different order of magnitude?
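
On (a) and (b): standard Poisson regression uses a log link, log(E[y]) = b0 + b1*t, so the coefficients live on the log scale. exp(b0) is the expected count at t = 0 (so it depends on how time is coded), and exp(b1) is the multiplicative change in the expected count per one-unit step in time, often called the incidence rate ratio. For example, an intercept near 6.4 corresponds to exp(6.4), roughly 600 expected prescriptions. The helper names below are illustrative.

```python
import math


def poisson_effects(b0, b1):
    """Translate log-link Poisson coefficients back to the count scale."""
    baseline_count = math.exp(b0)        # expected count when time == 0
    rate_ratio = math.exp(b1)            # multiplicative change per one-unit step in time
    pct_change = 100 * (rate_ratio - 1)  # the same change expressed as a percentage
    return baseline_count, rate_ratio, pct_change


def expected_count(b0, b1, t):
    """Expected count at time t under the fitted log-link model."""
    return math.exp(b0 + b1 * t)
```

So a linear slope says "add b1 participants per month", whereas a Poisson slope says "multiply the expected count by exp(b1) per month"; for small b1 the two can look similar over a short window.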

Second scenario: As before, but rather than the number of patients per bin, the response is the summed average dose, normalised by a score function to account for differences in the impact of each drug and dose. For this, am I right in assuming the relationship avg.dose ~ time should be modelled with linear regression? I'm struggling to untangle the idea of "count data" from "any data that at some stage requires counting/summing".
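
What usually makes data "count data" is the response itself being a non-negative integer number of events, not whether counting was involved in computing it; a normalised average dose is continuous. For a continuous response, ordinary least squares is a reasonable starting point, assuming the residuals look roughly normal with constant variance. A minimal stdlib sketch of the simple linear fit (the name `fit_ols_time` is illustrative):

```python
def fit_ols_time(times, values):
    """Ordinary least squares for values = intercept + slope * time."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sxy / sxx                   # change in normalised dose per month
    intercept = y_bar - slope * t_bar   # fitted value at time == 0
    return intercept, slope
```

Unlike the Poisson case, both coefficients are on the scale of the response, so the slope is directly the additive change in normalised dose per month.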

  • Count data will be integers; your dose will be continuous, I guess. I don't know enough about your problem to understand why you would want counts as a function of time, but Poisson regression is the way to go with count data. If you use linear regression, you will almost surely be unable to fulfil the normality assumption on the residuals. From a quick search: Poisson regression applies a log link, and the exponentiated coefficients are incidence rate ratios (the counterpart of odds ratios in logistic regression). – Sapiens Apr 09 '21 at 19:18
  • See link: https://stats.idre.ucla.edu/stata/output/poisson-regression/ – Sapiens Apr 09 '21 at 19:18
  • In practice it may not matter whether you use linear regression or Poisson regression, although with counts (numbers that cannot be divided) I think Poisson is more formally correct. – user54285 Apr 09 '21 at 22:43
  • Thank you both. That's really helped. I needed a quick sanity check. So from my own understanding, I'm looking at scenario #1 as count data (the number of times a patient turns up in a month) thus Poisson, and scenario #2 is continuous (a dose value) so linear regression. – Anthony Nash Apr 10 '21 at 02:06
  • An explanation is at https://stats.stackexchange.com/questions/142338/goodness-of-fit-and-which-model-to-choose-linear-regression-or-poisson/142353#142353. Poisson regression takes into account the increase of variance with mean! – kjetil b halvorsen Aug 16 '21 at 03:01
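
The last comment's point, that Poisson regression ties the variance to the mean, can be checked after fitting. A common diagnostic is the Pearson dispersion statistic: the sum of (y - mu)^2 / mu over all points, divided by the residual degrees of freedom. Values well above 1 suggest overdispersion, in which case a quasi-Poisson or negative binomial model is often used instead. This sketch assumes b0 and b1 are log-scale coefficients from a Poisson fit of counts on time, from whatever software you used; `pearson_dispersion` is an illustrative name.

```python
import math


def pearson_dispersion(times, counts, b0, b1, n_params=2):
    """Pearson chi-square divided by residual degrees of freedom for a log-link Poisson fit."""
    chi2 = 0.0
    for t, y in zip(times, counts):
        mu = math.exp(b0 + b1 * t)   # fitted mean; under Poisson this is also the variance
        chi2 += (y - mu) ** 2 / mu
    return chi2 / (len(counts) - n_params)
```

With monthly counts in the hundreds, some overdispersion is common, and it mainly affects the standard errors rather than the fitted trend.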
