
This is the code I have written.

library(rpart)
library(rpart.plot)
library(caret)

data <- read.csv("hotel_bookings.csv")
# Make the target a factor so rpart builds a classification tree
data$is_canceled <- as.factor(data$is_canceled)

set.seed(1122)
# 90/10 train/test split
pd <- sample(2, nrow(data), replace = TRUE, prob = c(0.9, 0.1))
train <- data[pd == 1, ]
test  <- data[pd == 2, ]

# Fit on the training set only (not the full data), as a classification tree
tree <- rpart(is_canceled ~ hotel + lead_time + stays_in_week_nights + adults + market_segment + distribution_channel + previous_cancellations + previous_bookings_not_canceled + reserved_room_type + assigned_room_type + deposit_type + agent + total_of_special_requests + days_in_waiting_list + reservation_status_date,
              data = train, method = "class")
rpart.plot(tree)

pred <- predict(tree, test, type = "class")
confusionMatrix(table(pred, test$is_canceled))
printcp(tree)

This is the description of the code I have to implement: A grid search is a hyperparameter optimization method; it chooses an optimal set of hyperparameters for a learning algorithm. For example, if you wanted to tune two parameters, a and b, where a can take values 1 through 3 and b can take values 4 through 6, you would do a grid search like so:

for a = 1 to 3 {
    for b = 4 to 6 {
        model = train_ML_model(..., a, b)
        result = predict(model, ...)
        save result
    }
}

At the end of the loops, examine the results (the collection data structure to which you append your result in each iteration) and see which values of a and b lead to the model you consider best. You will conduct a grid search across two Random Forest hyperparameters: ntree (number of trees in the forest) and mtry (number of randomly chosen attributes considered at each split). Run the grid search programmatically, i.e., using loops, instead of manually building nine models.
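Here is a minimal sketch of that grid search in R using the randomForest package. The grid values (three each for ntree and mtry, giving the nine models the assignment mentions) are placeholders you would tune, and the synthetic data frame is a stand-in so the sketch runs on its own; in your case you would use the train/test split from hotel_bookings.csv, with is_canceled converted to a factor so randomForest does classification:

```r
library(randomForest)

set.seed(1122)
# Stand-in data so the sketch is self-contained; replace with your own
# train/test split from hotel_bookings.csv.
n <- 500
train <- data.frame(
  lead_time              = rpois(n, 60),
  adults                 = sample(1:4, n, replace = TRUE),
  previous_cancellations = rpois(n, 0.2),
  is_canceled            = factor(sample(0:1, n, replace = TRUE))
)
test <- train[sample(n, 100), ]

ntree_values <- c(100, 300, 500)  # example grid for ntree
mtry_values  <- 1:3               # example grid for mtry (<= number of predictors)

results <- data.frame()           # one row per (ntree, mtry) pair

for (nt in ntree_values) {
  for (mt in mtry_values) {
    rf   <- randomForest(is_canceled ~ ., data = train, ntree = nt, mtry = mt)
    pred <- predict(rf, test)
    acc  <- mean(pred == test$is_canceled)
    results <- rbind(results, data.frame(ntree = nt, mtry = mt, accuracy = acc))
  }
}

# The row with the highest accuracy is the "best" (ntree, mtry) combination
results[which.max(results$accuracy), ]
```

Accuracy is used here as the selection criterion for simplicity; you could instead save each confusionMatrix or another metric (e.g. kappa) in the loop and pick the best model by that.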

Kapil Gund