
I am using R with the party package to fit a prediction model ("classifier") with
"Converted.clicks" as the response variable.
The rest of the variables are used as explanatory variables in the model.
Here is the relevant part of my code:
table(DF$Converted.clicks)

    0     1     2 
31456    39     6 
library(party)

# The "+" must sit at the end of each line; otherwise R treats the first line
# as a complete formula and silently drops the remaining terms
Formula <- Converted.clicks ~ Day.of.week +
  Device +
  Keyword +
  Quality.score +
  Network..with.search.partners. +
  Ad.group +
  Match.type

ct <- ctree(Formula, data = DF)
#######################################
Issue:
The Converted.clicks variable is highly imbalanced: the vast majority of the observations
belong to class "0". As a result, after ctree is applied, all the predictions are "0";
classes "1" and "2" are never predicted.
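For reference, this is roughly how I am checking the predicted classes (using the ct
object fitted above):

# Tabulate the fitted predictions on the training data -- only "0" shows up
table(predict(ct))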
My questions are:
1. Is a classification decision tree an appropriate model for predicting
   as.factor(DF$Converted.clicks)?
2. If so, how can I balance the response variable, i.e. give the two minority classes
   "1" and "2" a chance to be predicted? If weights are the way to do this, I would
   appreciate an example (a sketch of the kind of call I have in mind is below).
3. Is there another model that is appropriate for predicting the number of
   Converted.clicks? I understand that a regression decision tree is meant for a
   continuous response variable, but in my case I have an integer (count) response
   variable. Please advise.
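For question 2, this is the kind of weighted call I have in mind; it is only a sketch,
and the inverse-frequency weighting scheme is my own guess, not something I have
verified. As far as I understand, ctree in party accepts a weights argument of
non-negative integer-valued case weights:

# Assumes Converted.clicks has already been converted with as.factor()
cls <- DF$Converted.clicks
tab <- table(cls)                                 # class counts: 31456, 39, 6
w_per_class <- as.integer(round(max(tab) / tab))  # rarer classes get larger weights
w <- w_per_class[as.integer(cls)]                 # one integer weight per row
ct_w <- ctree(Formula, data = DF, weights = w)

I do not know whether such extreme weights (in the thousands for class "2") are
sensible, which is part of what I am asking.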
Comments:

[...] Ctree package. You are using the party package. 2- Clicks is an integer, so I'm not sure why you are converting it to a categorical variable. 3- I don't think that Cost...converted.click is an independent variable; it is actually a dependent variable of clicks. – David Arenburg, Feb 08 '15 at 11:27

[...] party package. rpart has a simple cost function... as does C5.0. Would be surprised if there isn't a cost function hidden there somewhere, but if you have got lots of data you could always downsample. Some might advocate using probabilities from ctree - not sure if that makes sense (ESL 9.2.5). – charles, Feb 09 '15 at 01:46
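To make the downsampling suggestion in the comments concrete, here is a minimal sketch
of what I understand by it; the 10-to-1 sampling ratio and the seed are arbitrary
choices of mine:

set.seed(1)  # arbitrary seed, only for reproducibility
minority <- DF[DF$Converted.clicks != 0, ]   # the 45 rows with clicks "1" or "2"
majority <- DF[DF$Converted.clicks == 0, ]   # the 31456 rows with "0"
# keep all minority rows plus a random 10-to-1 sample of the majority class
majority_down <- majority[sample(nrow(majority), 10 * nrow(minority)), ]
DF_down <- rbind(minority, majority_down)
DF_down$Converted.clicks <- as.factor(DF_down$Converted.clicks)  # classification response
ct_down <- ctree(Formula, data = DF_down)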