11.3 Tuning Model’s Parameters
Tuning model parameters is a parameter optimization problem (Dalpiaz 2021). The adjustable parameters can be completely different depending on the model. For example, a decision tree has two adjustable parameters: the complexity parameter (CP) and the tune length (TL). CP tells the algorithm to stop when the chosen measure (generally accuracy) does not improve by at least this factor. TL tells the algorithm how many candidate parameter values to evaluate. SVM models, as another example, also have two adjustable parameters: cost and gamma. The cost parameter controls the trade-off between classifying the training points correctly and keeping a smooth decision boundary; it influences how the model chooses data points as support vectors. If the value of cost is large, the model chooses more data points as support vectors and we get higher variance and lower bias, which may lead to overfitting; if the value of cost is small, the model chooses fewer data points as support vectors and we get lower variance and higher bias. Gamma defines how far the influence of a single training example reaches. If the value of gamma is high, the decision boundary depends only on the points close to it, and nearer points carry much more weight than far-away points, so the decision boundary becomes more wiggly. If the value of gamma is low, far-away points also carry weight, and the decision boundary becomes more like a straight line.
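The effect of gamma can be seen directly from the RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2): the kernel value is the "influence" one training point exerts on another. A small base-R sketch (an illustration, not part of any model fit in this chapter) shows how quickly that influence decays with distance for different gamma values:

```r
# RBF kernel value: the influence a training point exerts on a point at distance d
rbf_influence <- function(d, gamma) exp(-gamma * d^2)

# Influence at distance 2 from a training point
rbf_influence(2, gamma = 0.1)  # ~0.67: far points still matter, boundary stays smooth
rbf_influence(2, gamma = 10)   # ~4e-18: influence vanishes, boundary becomes wiggly
```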
We will continue to use the RF model as an example to demonstrate the parameter tuning process. RF has many parameters that can be adjusted, but the two main tuning parameters are mtry and ntree.
mtry: the number of variables randomly selected as candidates at each split of the decision trees. The default value is sqrt(p), where p is the number of predictors. Increasing mtry generally improves the performance of the model because each node has a higher number of options to consider. However, this is not necessarily true, since it decreases the diversity of the individual trees; it also decreases the speed. Hence, we need to strike the right balance.
ntree: the number of trees to grow. The default value is 500. A higher number of trees gives better performance but makes the code slower. You should choose as high a value as your processor can handle because this makes your predictions stronger and more stable.
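For classification, the randomForest package computes its default mtry as floor(sqrt(p)), where p is the number of predictors. With the 9 predictors used in this chapter, the default works out as follows:

```r
p <- 9                          # number of predictors, as in rf.train.8
mtry_default <- floor(sqrt(p))  # randomForest's default for classification
mtry_default                    # 3
```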
In the rest of the section, we demonstrate the process of using CV to fine-tune the RF model's parameters mtry and ntree. In general, different optimization strategies can be used to find a model's optimal parameters. The two most commonly used methods for RF are random search and grid search.
Random Search. Define a search space as a bounded domain of parameter values and randomly sample points in that domain.
Grid Search. Define a search space as a grid of parameter values and evaluate every position in the grid.
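The difference between the two strategies can be illustrated by the candidate sets they generate for mtry (a base-R sketch of the idea, not caret's internal code):

```r
# Grid search: enumerate every candidate value in the search space
grid_candidates <- expand.grid(mtry = 1:15)
nrow(grid_candidates)  # 15 -- every point in the grid is evaluated

# Random search: draw a handful of candidates from the same space
set.seed(2222)
random_candidates <- sample(1:15, size = 5)
random_candidates      # only these 5 randomly chosen values are evaluated
```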
Let us try them one at a time.
Random Search
Random search, provided by the package caret with the method "rf" (random forest) in the function train, can only tune the parameter mtry2.
Let us continue using what we have found from the previous sections, that is:
- model rf.8 with 9 predictors.
- CV with 3 folds repeated 10 times.
Let us also fix "ntree = 500" and "tuneLength = 15", and use random search to find mtry.
#library(caret)
#library(doSNOW)
# Random Search
set.seed(2222)
# #use the best sampling setting found earlier, that is K=3 and T=10
# cv.3.folds <- createMultiFolds(rf.label, k = 3, times = 10)
#
# # Set up caret's trainControl object.
# ctrl.1 <- trainControl(method = "repeatedcv", number = 3, repeats = 10, index = cv.3.folds, search="random")
#
# # set up cluster for parallel computing
# cl <- makeCluster(6, type = "SOCK")
# registerDoSNOW(cl)
#
# # Set seed for reproducibility and train
# set.seed(34324)
#
# #use rf.train.8 with 9 predictors
#
# #RF_Random <- train(x = rf.train.8, y = rf.label, method = "rf", tuneLength = 15, ntree = 500, trControl = ctrl.1)
# #save(RF_Random, file = "./data/RF_Random_search.rda")
#
# #Shutdown cluster
# stopCluster(cl)
# Check out results
load("./data/RF_Random_search.rda")
print(RF_Random)
## Random Forest
##
## 891 samples
## 9 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (3 fold, repeated 10 times)
## Summary of sample sizes: 594, 594, 594, 594, 594, 594, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.8435466 0.6643066
## 3 0.8453423 0.6690529
## 4 0.8437710 0.6665398
## 5 0.8419753 0.6630091
## 6 0.8397306 0.6586318
## 7 0.8383838 0.6556425
## 8 0.8379349 0.6544327
## 9 0.8353535 0.6495571
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 3.
We can see that the random search for mtry has found that the best value is 3. With mtry = 3, the model reaches an estimated accuracy of 84.53%.
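The selection rule caret applies ("the largest Accuracy") can be reproduced by hand from the resampling table above:

```r
# Resampling results copied from the output above
res <- data.frame(
  mtry     = 2:9,
  Accuracy = c(0.8435466, 0.8453423, 0.8437710, 0.8419753,
               0.8397306, 0.8383838, 0.8379349, 0.8353535)
)

# Pick the mtry with the largest resampled accuracy
best_mtry <- res$mtry[which.max(res$Accuracy)]
best_mtry  # 3
```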
Grid Search
Grid search generally searches over more than one parameter: each axis of the grid is a parameter, and each point in the grid is a specific combination of parameter values. Because caret's train can only tune mtry for the "rf" method, the grid search here reduces to a linear search through a vector of candidate mtry values.
# ctrl.2 <- trainControl(method="repeatedcv", number=3, repeats=10, index = cv.3.folds, search="grid")
#
# set.seed(3333)
# # set Grid search with a vector from 1 to 15.
#
# tunegrid <- expand.grid(.mtry=c(1:15))
#
# # set up cluster for parallel computing
# cl <- makeCluster(6, type = "SOCK")
# registerDoSNOW(cl)
#
#
# #RF_grid_search <- train(y = rf.label, x = rf.train.8, method="rf", metric="Accuracy", trControl = ctrl.2, tuneGrid = tunegrid, tuneLength = 15, ntree = 500)
#
#
# #Shutdown cluster
# stopCluster(cl)
# #save(RF_grid_search, file = "./data/RF_grid_search.rda")
load("./data/RF_grid_search.rda")
print(RF_grid_search)
## Random Forest
##
## 891 samples
## 9 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (3 fold, repeated 10 times)
## Summary of sample sizes: 594, 594, 594, 594, 594, 594, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 1 0.8232323 0.6140400
## 2 0.8439955 0.6652153
## 3 0.8452301 0.6691079
## 4 0.8443322 0.6675864
## 5 0.8428732 0.6645467
## 6 0.8398429 0.6584647
## 7 0.8379349 0.6548634
## 8 0.8390572 0.6571467
## 9 0.8370370 0.6529631
## 10 0.8365881 0.6519263
## 11 0.8359147 0.6504591
## 12 0.8370370 0.6525838
## 13 0.8365881 0.6520535
## 14 0.8356902 0.6502470
## 15 0.8354658 0.6494413
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 3.
The grid search method identified that the best value for mtry is also 3. When mtry = 3, the model's estimated accuracy reaches 84.52%. We can see that both search methods arrive at the same mtry suggestion.
Manual Search
Let us consider another parameter of the RF model, ntree. Since the train method from caret cannot tune ntree, we have to write our own function to search for the best value of ntree. This method is also called manual search. The idea is to write a loop that repeats the same model-fitting process a number of times. Within each iteration of the loop, a different value of the parameter to be tuned is used and the model's results are accumulated. Finally, a manual comparison is made to figure out the best value of the tuned parameter.
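The manual-search pattern looks like this in outline; cv_score below is a hypothetical stand-in for the real cross-validated model fit used in the actual code later in this section:

```r
# Hypothetical scoring function standing in for train(): in the real code this
# fits an RF with the given ntree and returns the cross-validated accuracy
cv_score <- function(n_tree) 0.84 + 0.0001 * log(n_tree)

results <- list()
for (n_tree in c(100, 500, 1000, 1500)) {
  # fit/evaluate the model with this parameter value ...
  score <- cv_score(n_tree)
  # ... and accumulate the result under a readable key
  results[[toString(n_tree)]] <- score
}

# Manually compare the accumulated results
best_ntree <- names(results)[which.max(unlist(results))]
best_ntree  # "1500" with this toy scoring function
```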
To tune the RF model's parameter ntree, we fix mtry = 3 from the above section and try a list of 4 values (100, 500, 1000, 1500)3 to find which one produces the best result.
# Manual search: fix mtry = 3 and loop over candidate ntree values
model_list <- list()
tunegrid <- expand.grid(.mtry = 3)
control <- trainControl(method="repeatedcv", number=3, repeats=10, search="grid")
# # The following code has been commented out to speed up building the book; it takes a long time to run.
# # set up cluster for parallel computing
# cl <- makeCluster(6, type = "SOCK")
# registerDoSNOW(cl)
#
#
# #loop through different settings
#
# for (n_tree in c(100, 500, 1000, 1500)) {
#
# set.seed(3333)
# fit <- train(y = rf.label, x = rf.train.8, method="rf", metric="Accuracy", tuneGrid=tunegrid, trControl= control, ntree=n_tree)
#
# key <- toString(n_tree)
# model_list[[key]] <- fit
# }
#
# #Shutdown cluster
# stopCluster(cl)
# save(model_list, file = "./data/RF_manual_search.rda")
# # The code above was commented out when building the book.
load("./data/RF_manual_search.rda")
# compare results
results <- resamples(model_list)
summary(results)
##
## Call:
## summary.resamples(object = results)
##
## Models: 100, 500, 1000, 1500
## Number of resamples: 30
##
## Accuracy
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 100 0.7979798 0.8249158 0.8367003 0.8383838 0.8535354 0.8855219 0
## 500 0.8013468 0.8324916 0.8451178 0.8418631 0.8518519 0.8821549 0
## 1000 0.8013468 0.8282828 0.8434343 0.8415264 0.8518519 0.8787879 0
## 1500 0.8013468 0.8324916 0.8451178 0.8430976 0.8518519 0.8855219 0
##
## Kappa
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 100 0.5686275 0.6277878 0.6498418 0.6540654 0.6778695 0.7539114 0
## 500 0.5751073 0.6439474 0.6681725 0.6614719 0.6854823 0.7462468 0
## 1000 0.5751073 0.6327199 0.6676526 0.6608493 0.6823330 0.7394356 0
## 1500 0.5751073 0.6409731 0.6714760 0.6640467 0.6857492 0.7539114 0
We can see that with mtry = 3 fixed, the best ntree value is 1500, where the model reaches a mean accuracy of 84.31%.
References
Dalpiaz, David. 2021. Tune Machine Learning Algorithms in R. otexts. https://machinelearningmastery.com/tune-machine-learning-algorithms-in-r/.
Not all machine learning algorithms are available in caret for tuning. The choice of parameters was decided by the developers of the package; only those parameters that have a large effect are available for tuning in caret. For the RF method, only the mtry parameter is available in caret for tuning, because of its effect on the final accuracy and because it must be found empirically for a dataset.↩︎
These are commonly used ntree values. For demonstration purposes we only chose these values; you can try more different values.↩︎