This is the code I have written:
library(rpart)
library(rpart.plot)
library(caret)

# Load data; make the target a factor so rpart fits a classification tree
data <- read.csv("hotel_bookings.csv")
data$is_canceled <- as.factor(data$is_canceled)

# 90/10 train/test split
set.seed(1122)
pd <- sample(2, nrow(data), replace = TRUE, prob = c(0.9, 0.1))
train <- data[pd == 1, ]
test <- data[pd == 2, ]

# Fit on the training set only (fitting on the full data leaks the test rows)
tree <- rpart(is_canceled ~ hotel + lead_time + stays_in_week_nights + adults +
                market_segment + distribution_channel + previous_cancellations +
                previous_bookings_not_canceled + reserved_room_type +
                assigned_room_type + deposit_type + agent +
                total_of_special_requests + days_in_waiting_list +
                reservation_status_date,
              data = train, method = "class")
rpart.plot(tree)

# Evaluate on the held-out test set (avoid `t` as a name; it masks base::t())
pred <- predict(tree, test, type = "class")
conf <- table(test$is_canceled, pred)
confusionMatrix(table(pred, test$is_canceled))
printcp(tree)
This is the description of the code that I have to implement: A grid search is a hyperparameter optimization method; it chooses a set of optimal hyperparameters for a learning algorithm. For example, if you wanted to tune two parameters, a and b, where a can take values from 1 to 3 and b can take values from 4 to 6, you would do a grid search like so:
for a = 1 to 3 {
    for b = 4 to 6 {
        model = train_ML_model(..., a, b)
        result = predict(model, ...)
        save result
    }
}
At the end of the loops, examine the results (the collection data structure to which you append your results in each iteration) and see which values of a and b lead to the model you consider best. You will conduct a grid search across two Random Forest hyperparameters: ntree (the number of trees in the forest) and mtry (the number of randomly chosen attributes considered at each split). Run the grid search programmatically, i.e., using loops, instead of manually building nine models.
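For the Random Forest part, something along these lines should work. It is a minimal sketch, assuming the randomForest package, the `pd` split and factor-valued is_canceled created above, and an illustrative 3 x 3 grid of ntree = {100, 300, 500} and mtry = {2, 4, 6} (nine models; the actual candidate values are not specified in the assignment and are your choice):

library(randomForest)

# Convert character columns to factors on the full data so train and test
# share the same factor levels (read.csv in R >= 4.0 keeps strings as character)
data_rf <- data
char_cols <- sapply(data_rf, is.character)
data_rf[char_cols] <- lapply(data_rf[char_cols], as.factor)
train_rf <- data_rf[pd == 1, ]
test_rf  <- data_rf[pd == 2, ]

# randomForest cannot handle categorical predictors with more than 53 levels,
# so high-cardinality columns such as agent and reservation_status_date are
# left out of this sketch; it also assumes these predictors contain no NAs
rf_formula <- is_canceled ~ hotel + lead_time + stays_in_week_nights + adults +
  market_segment + distribution_channel + previous_cancellations +
  previous_bookings_not_canceled + reserved_room_type + assigned_room_type +
  deposit_type + total_of_special_requests + days_in_waiting_list

# Illustrative candidate values for the two hyperparameters
ntree_values <- c(100, 300, 500)
mtry_values  <- c(2, 4, 6)

# One row of results per (ntree, mtry) combination
results <- data.frame()

for (nt in ntree_values) {
  for (mt in mtry_values) {
    rf <- randomForest(rf_formula, data = train_rf, ntree = nt, mtry = mt)
    pred_rf <- predict(rf, test_rf)  # default type returns predicted classes
    acc <- mean(pred_rf == test_rf$is_canceled)
    results <- rbind(results,
                     data.frame(ntree = nt, mtry = mt, accuracy = acc))
  }
}

# Examine all nine results and keep the best combination
print(results)
best <- results[which.max(results$accuracy), ]
print(best)

Test-set accuracy on the single held-out split is the simplest selection criterion here; you could just as easily save the full confusionMatrix() output for each combination, or evaluate each (ntree, mtry) pair with cross-validation inside the loop, and pick the best model from that instead.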