
I have a data set of arsenic (As) concentrations in rice grain, which depend on As in the soil and in the irrigation water. The permissible limit of As in rice grain is 400 ppb, but higher concentrations are widely observed in practice, and a considerable number of observations in my data set exceed that limit.

Because the As concentration in grain depends on that in soil and irrigation water, I right-censored grain As at 400 ppb and regressed it on soil and water As:

GrainAs = Intercept + B1 × IrriAs + B2 × SoilAs

With these coefficients I can state the level of As in irrigation water up to which grain As will not exceed 400 ppb. Because the response is censored, the coefficients are estimated by maximum likelihood (a Tobit model).

Is this approach correct, or should I use ordinary linear regression instead? The data are normally distributed. I evaluated both models (Tobit and linear) on a test set, and the Tobit predictions were better than those of simple linear regression in terms of RMSE, MAE and MAPE.
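For concreteness, here is a minimal Python sketch of the two pieces described above: the log-likelihood that a right-censored (Tobit-type) normal regression maximises, and the inversion of the fitted equation to get an irrigation-water level. The function names, the argument layout and the `sigma` parameter (the residual standard deviation) are assumptions made for illustration; they do not come from any particular package.

```python
import math
from statistics import NormalDist

LIMIT_PPB = 400.0          # censoring point used in the question
_STD_NORMAL = NormalDist() # standard normal, used for the censored tail


def tobit_loglik(params, rows, limit=LIMIT_PPB):
    """Log-likelihood of a normal regression right-censored at `limit`.

    params: (intercept, b_irri, b_soil, sigma)   # hypothetical layout
    rows:   iterable of (irri_as, soil_as, grain_as); grain_as values at
            or above `limit` are treated as censored.
    """
    intercept, b_irri, b_soil, sigma = params
    if sigma <= 0:
        return float("-inf")
    total = 0.0
    for irri_as, soil_as, grain_as in rows:
        mean = intercept + b_irri * irri_as + b_soil * soil_as
        if grain_as >= limit:
            # Censored: we only know the true value is at or above the limit,
            # so the contribution is log P(Y >= limit).
            z = (limit - mean) / sigma
            tail = _STD_NORMAL.cdf(-z)
            total += math.log(max(tail, 1e-300))
        else:
            # Observed exactly: ordinary normal log-density.
            z = (grain_as - mean) / sigma
            total += -0.5 * z * z - 0.5 * math.log(2 * math.pi) - math.log(sigma)
    return total


def max_irrigation_as(intercept, b_irri, b_soil, soil_as, limit=LIMIT_PPB):
    """Irrigation As at which the fitted mean grain As equals `limit`."""
    if b_irri <= 0:
        raise ValueError("requires a positive irrigation coefficient")
    return (limit - intercept - b_soil * soil_as) / b_irri
```

In practice the log-likelihood would be handed to a numerical optimiser, which is not shown here. Note also that `max_irrigation_as` returns the level at which the fitted mean reaches 400 ppb; with symmetric errors about half of individual samples would still exceed the limit at that point.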

Jajati
  • The right censoring loses useful information, especially because the censoring limit equals the threshold in which you are interested. Could you explain why you are doing that? Do you have the option not to? – whuber Feb 02 '21 at 19:19
  • Thanks, whuber! Yes, I can use linear regression. I was just wondering whether this approach is correct. – Jajati Feb 02 '21 at 19:22
  • It will be more powerful than the censored version. Assuming you want accurate predictions of when the rice concentration exceeds the 400 ppb limit, what you want to watch out for is the possibility that extreme values of the As concentration in rice might not be consistent with the fit near 400 ppb. In other words, you want to use diagnostic and goodness-of-fit procedures to make sure the estimates near 400 ppb are as accurate as possible and are not harmed by lack of fit at very low or very high values. – whuber Feb 02 '21 at 19:26
  • Many thanks, whuber. – Jajati Feb 02 '21 at 21:16
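
One simple way to act on the advice in the comments is to compare the residuals of the uncensored linear fit near 400 ppb with the residuals over the whole range. The sketch below is only an illustration: the function name, the returned layout and the window `half_width` of 100 ppb are hypothetical choices, not part of the thread.

```python
import math


def residual_summary(observed, fitted, center=400.0, half_width=100.0):
    """Compare residuals near `center` with residuals over the full range.

    observed, fitted: equal-length sequences of uncensored grain As (ppb).
    Returns {"all": (n, mean_residual, rmse), "near": (n, mean_residual, rmse)},
    where "near" keeps only points whose fitted value lies within
    `half_width` of `center`.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")

    def summarize(resids):
        n = len(resids)
        if n == 0:
            return (0, float("nan"), float("nan"))
        mean = sum(resids) / n
        rmse = math.sqrt(sum(r * r for r in resids) / n)
        return (n, mean, rmse)

    all_resids = [y - f for y, f in zip(observed, fitted)]
    near_resids = [
        y - f for y, f in zip(observed, fitted) if abs(f - center) <= half_width
    ]
    return {"all": summarize(all_resids), "near": summarize(near_resids)}
```

A mean residual near 400 ppb that is clearly different from zero, or an RMSE there that is much larger than elsewhere, would suggest that very low or very high values are distorting the fit in the region of interest.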
