I have a problem. My dataset contains categorical variables and I need to convert them to numeric, because I want to compare the accuracy of Logistic Regression and a Neural Network. I used createDataPartition from caret, but the resulting train and test sets contain factor variables rather than numeric ones. If I partition the data this other way instead:
split1<- sample(c(rep(0, 0.7 * nrow(data_d)), rep(1, 0.3 * nrow(data_d))))
I don't get a representative share of 0s and 1s for the numeric variables in each split. How can I fix this?
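For illustration, here is a minimal sketch of one way a stratified 70/30 split could be done on an already-numeric data frame, in base R only. The small `data_d` built here is a hypothetical stand-in for the real data, and `y.Yes` is assumed to be the 0/1 outcome column. As a side note, with 7032 rows the two `rep()` counts above truncate to 4922 and 2109, so `split1` ends up one element shorter than the data.

```r
set.seed(42)

# Hypothetical stand-in for the one-hot encoded data (assumption, not the real data)
data_d <- data.frame(
  y.Yes  = rbinom(1000, 1, 0.27),
  tenure = rnorm(1000)
)

# Group row positions by outcome class, then draw 70% of the rows within each class
rows_by_class <- split(seq_len(nrow(data_d)), data_d$y.Yes)
train_idx <- unlist(
  lapply(rows_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
  }),
  use.names = FALSE
)

# Index the numeric data frame directly, so every column stays numeric
train <- data_d[train_idx, ]
test  <- data_d[-train_idx, ]

# The share of 1s should be close to the overall share in both parts
mean(data_d$y.Yes); mean(train$y.Yes); mean(test$y.Yes)
```

The same idea should carry over to caret: `createDataPartition(factor(data_d$y.Yes), p = 0.7, list = FALSE)` returns row positions, so the factor is used only for stratification and the positions can then index the numeric data frame.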
DATASET and DATA MANIPULATION:
'data.frame': 7032 obs. of 20 variables:
$ y : chr "No" "No" "Yes" "No" ...
$ gender : chr "Female" "Male" "Male" "Male" ...
$ SeniorCitizen : chr "0" "0" "0" "0" ...
$ Partner : chr "Yes" "No" "No" "No" ...
$ Dependents : chr "No" "No" "No" "No" ...
$ tenure : chr "1" "34" "2" "45" ...
$ PhoneService : chr "No" "Yes" "Yes" "No" ...
$ MultipleLines : chr "No" "No" "No" "No" ...
$ InternetService : chr "DSL" "DSL" "DSL" "DSL" ...
$ OnlineSecurity : chr "No" "Yes" "Yes" "Yes" ...
$ OnlineBackup : chr "Yes" "No" "Yes" "No" ...
$ DeviceProtection: chr "No" "Yes" "No" "Yes" ...
$ TechSupport : chr "No" "No" "No" "Yes" ...
$ StreamingTV : chr "No" "No" "No" "No" ...
$ StreamingMovies : chr "No" "No" "No" "No" ...
$ Contract : chr "Month-to-month" "One year" "Month-to-month" "One year" ...
$ PaperlessBilling: chr "Yes" "No" "Yes" "No" ...
$ PaymentMethod : chr "Electronic check" "Mailed check" "Mailed check" "Bank transfer (automatic)" ...
$ MonthlyCharges : chr "29.85" "56.95" "53.85" "42.3" ...
$ TotalCharges : chr "29.85" "1889.5" "108.15" "1840.75" ...
.....
After one-hot encoding and some further manipulation, I have this:
'data.frame': 7032 obs. of 27 variables:
$ y.Yes : num 0 0 1 0 1 1 0 0 1 0 ...
$ gender.Male : num 0 1 1 1 0 0 1 0 0 1 ...
$ SeniorCitizen.1 : num 0 0 0 0 0 0 0 0 0 0 ...
$ Partner.Yes : num 1 0 0 0 0 0 0 0 1 0 ...
$ Dependents.Yes : num 0 0 0 0 0 0 1 0 0 1 ...
$ tenure : num -1.2802 0.0643 -1.2394 0.5124 -1.2394 ...
$ PhoneService.Yes : num 0 1 1 0 1 1 1 0 1 1 ...
$ MultipleLines.Yes : num 0 0 0 0 0 1 1 0 1 0 ...
$ InternetService.DSL : num 1 1 1 1 0 0 0 1 0 1 ...
$ InternetService.Fiber.optic : num 0 0 0 0 1 1 1 0 1 0 ...
$ InternetService.No : num 0 0 0 0 0 0 0 0 0 0 ...
$ OnlineSecurity.Yes : num 0 1 1 1 0 0 0 1 0 1 ...
$ OnlineBackup.Yes : num 1 0 1 0 0 0 1 0 0 1 ...
$ DeviceProtection.Yes : num 0 1 0 1 0 1 0 0 1 0 ...
$ TechSupport.Yes : num 0 0 0 1 0 0 0 0 1 0 ...
$ StreamingTV.Yes : num 0 0 0 0 0 1 1 0 1 0 ...
$ StreamingMovies.Yes : num 0 0 0 0 0 1 0 0 1 0 ...
$ Contract.Month.to.month : num 1 0 1 0 1 1 1 1 1 0 ...
$ Contract.One.year : num 0 1 0 1 0 0 0 0 0 1 ...
$ Contract.Two.year : num 0 0 0 0 0 0 0 0 0 0 ...
$ PaperlessBilling.Yes : num 1 0 1 0 1 1 1 0 1 0 ...
$ PaymentMethod.Bank.transfer..automatic.: num 0 0 0 1 0 0 0 0 0 1 ...
$ PaymentMethod.Credit.card..automatic. : num 0 0 0 0 0 0 1 0 0 0 ...
$ PaymentMethod.Electronic.check : num 1 0 0 0 1 1 0 0 1 0 ...
$ PaymentMethod.Mailed.check : num 0 1 1 0 0 0 0 1 0 0 ...
$ MonthlyCharges : num -1.162 -0.261 -0.364 -0.748 0.196 ...
$ TotalCharges : num -0.994 -0.174 -0.96 -0.195 -0.94 ...
...
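For readers who want to reproduce a structure like the one above, this is a rough sketch of one possible encoding route in base R. It is not necessarily how the output above was produced: the miniature `raw` data frame is hypothetical, and `model.matrix` names its columns differently (for example `genderMale` rather than `gender.Male`) and drops one reference level per factor.

```r
# Hypothetical miniature of the raw data, with every column read as character
raw <- data.frame(
  gender   = c("Female", "Male", "Male", "Male"),
  Contract = c("Month-to-month", "One year", "Month-to-month", "One year"),
  tenure   = c("1", "34", "2", "45"),
  stringsAsFactors = FALSE
)

# Numbers stored as text: convert to numeric, then standardize (mean 0, sd 1)
raw$tenure <- as.numeric(scale(as.numeric(raw$tenure)))

# Character columns become factors so model.matrix can expand them
cat_cols <- c("gender", "Contract")
raw[cat_cols] <- lapply(raw[cat_cols], factor)

# One 0/1 column per non-reference level; the intercept column is dropped
encoded <- as.data.frame(model.matrix(~ ., data = raw)[, -1, drop = FALSE])
str(encoded)
```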
With dput, the output is thousands of numbers, so I can't post it here.