I have a data set of patients who tested either positive or negative for a virus infection. For a negative patient the outcome y is 0; for a positive patient y is a positive value of up to 100. The inputs x are also numeric, and I want to predict y from the x values alone. x consists of more than one variable, and in the group of negative patients at least one of the x variables also contains a lot of zeros.
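To make the zero issue concrete, here is a small sketch of a shifted log transform, log10(x + 1), applied to a hypothetical x column. The function name `log10_plus_one` and the values are made up for illustration. Zeros map to exactly 0 instead of minus infinity, so the transformed column stays finite.

```python
import math


def log10_plus_one(values):
    """Shifted log transform log10(x + 1); zeros map to 0 instead of -inf."""
    return [math.log10(v + 1) for v in values]


# Hypothetical x column with many zeros, as in the negative group.
x_column = [0, 0, 9, 99, 0, 999]
out = log10_plus_one(x_column)
print(out)
```

The same transform would work on y (range 0 to 100, mapped to roughly 0 to 2.004). It does not remove the spike at zero, though. It only moves the spike to 0 on the log scale.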
Can I use methods such as PCR (principal component regression), PLS, Lasso, Ridge or glmnet here (all of them work fine when I analyze only the positive group), or do they break down when the data contain so many zeros? Would it help to log10-transform the data after adding 1 to avoid taking the log of zero, and would that give better results? Do I have to first classify the patients into the two groups and then fit a regression on the positive group, or is there also a one-step approach?
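For concreteness, below is a minimal sketch of the two-step idea (often called a hurdle or two-part model), written in plain NumPy so it has no other dependencies. It is an illustration under stated assumptions, not a recommended implementation. A logistic regression estimates P(y > 0 | x), and a ridge regression is fitted only on the positive cases. Ridge stands in for any of the penalized or projection methods mentioned above. The two parts are combined as E[y | x] = P(y > 0 | x) · E[y | x, y > 0]. The names `HurdleModel`, `fit_logistic` and `fit_ridge` are hypothetical. The data are simulated, and the code assumes x is roughly standardized.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_logistic(X, z, lr=0.1, n_iter=2000, l2=1e-3):
    """Logistic regression for P(y > 0 | x), fitted by plain gradient descent.

    Assumes the columns of X are roughly standardized so a fixed step size converges.
    """
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (p - z) / len(z)
        grad[1:] += l2 * w[1:]  # small L2 penalty; intercept not penalized
        w -= lr * grad
    return w


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    Xb = add_intercept(X)
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


class HurdleModel:
    """Hypothetical two-part model: classifier for y > 0, ridge for y given y > 0."""

    def fit(self, X, y):
        positive = y > 0
        self.w_clf = fit_logistic(X, positive.astype(float))
        self.w_reg = fit_ridge(X[positive], y[positive])  # positives only
        return self

    def prob_positive(self, X):
        return 1.0 / (1.0 + np.exp(-add_intercept(X) @ self.w_clf))

    def amount_if_positive(self, X):
        # Clip to [0, 100], the stated outcome range; a linear model can leave it.
        return np.clip(add_intercept(X) @ self.w_reg, 0.0, 100.0)

    def predict(self, X):
        # E[y | x] = P(y > 0 | x) * E[y | x, y > 0]
        return self.prob_positive(X) * self.amount_if_positive(X)


# Simulated (made-up) data: x0 drives infection status, x1 drives the amount.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
infected = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))
amount = np.clip(50 + 15 * X[:, 1] + rng.normal(scale=5, size=n), 0.1, 100)
y = np.where(infected, amount, 0.0)

model = HurdleModel().fit(X, y)
pred = model.predict(X)
print(pred[:5].round(1))
```

Swapping the ridge step for Lasso, PLS or PCR would only change `fit_ridge`. The classification step and the way the two parts are combined stay the same.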