11.3 Tuning Model’s Parameters
Tuning model parameters is a parameter optimization problem (Dalpiaz 2021). The adjustable parameters can be completely different depending on the model. For example, a decision tree has two adjustable parameters: the complexity parameter (CP) and the tune length (TL). CP tells the algorithm to stop when the chosen measure (generally accuracy) does not improve by at least this factor. TL tells the algorithm how many candidate parameter values to evaluate. SVM models, as another example, also have two adjustable parameters: cost and gamma. The cost parameter controls the trade-off between classifying the training points correctly and keeping a smooth decision boundary; it influences how the model chooses data points as support vectors. If the value of cost is large, the model chooses more data points as support vectors and we get higher variance and lower bias, which may lead to overfitting; if the value of cost is small, the model chooses fewer data points as support vectors and we get lower variance and higher bias. Gamma defines how far the influence of a single training example reaches. If the value of gamma is high, the decision boundary depends only on the points close to it, and nearer points carry much more weight than far-away points, so the decision boundary becomes more wiggly. If the value of gamma is low, far-away points also carry weight, and the decision boundary becomes more like a straight line.
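The effect of gamma can be seen directly from the RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2): the kernel value is the "influence" one training point exerts on another. A small base-R sketch (an illustration, not part of any model fit in this chapter) shows how quickly that influence decays with distance for different gamma values:

```r
# RBF kernel value: the influence a training point exerts on a point at distance d
rbf_influence <- function(d, gamma) exp(-gamma * d^2)

# Influence at distance 2 from a training point
rbf_influence(2, gamma = 0.1)  # ~0.67: far points still matter, boundary stays smooth
rbf_influence(2, gamma = 10)   # ~4e-18: influence vanishes, boundary becomes wiggly
```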
We will continue to use the RF model as an example to demonstrate the parameter tuning process. RF has many parameters that can be adjusted, but the two main tuning parameters are mtry and ntree.
mtry: the number of variables randomly selected as candidates at each split of the decision trees. The default value is sqrt(p), where p is the number of predictors. Increasing mtry generally improves the performance of the model because each node has a higher number of options to consider. However, this is not necessarily true, since it decreases the diversity of the individual trees; it also decreases the speed. Hence, we need to strike the right balance.
ntree: the number of trees to grow. The default value is 500. A higher number of trees gives better performance but makes the code slower. You should choose as high a value as your processor can handle because this makes your predictions stronger and more stable.
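For classification, the randomForest package computes its default mtry as floor(sqrt(p)), where p is the number of predictors. With the 9 predictors used in this chapter, the default works out as follows:

```r
p <- 9                          # number of predictors, as in rf.train.8
mtry_default <- floor(sqrt(p))  # randomForest's default for classification
mtry_default                    # 3
```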
In the rest of the section, we demonstrate the process of using CV to fine-tune the RF model's parameters mtry and ntree. In general, different optimization strategies can be used to find a model's optimal parameters. The two most commonly used methods for RF are random search and grid search.
Random Search. Define a search space as a bounded domain of parameter values and randomly sample points in that domain.
Grid Search. Define a search space as a grid of parameter values and evaluate every position in the grid.
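The difference between the two strategies can be illustrated by the candidate sets they generate for mtry (a base-R sketch of the idea, not caret's internal code):

```r
# Grid search: enumerate every candidate value in the search space
grid_candidates <- expand.grid(mtry = 1:15)
nrow(grid_candidates)  # 15 -- every point in the grid is evaluated

# Random search: draw a handful of candidates from the same space
set.seed(2222)
random_candidates <- sample(1:15, size = 5)
random_candidates      # only these 5 randomly chosen values are evaluated
```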
Let us try them one at a time.
Random Search
Random search, provided by the package caret with the method "rf" (random forest) in the function train, can only tune the parameter mtry2.
Let us continue using what we have found from the previous sections, that is:
- model rf.8 with 9 predictors.
- CV with 3 folds repeated 10 times.
Let us also fix "ntree = 500" and "tuneLength = 15", and use random search to find mtry.
#library(caret)
#library(doSNOW)
# Random Search
set.seed(2222)
# #use the best sampling setting found earlier, that is K=3 and T=10
# cv.3.folds <- createMultiFolds(rf.label, k = 3, times = 10)
#
# # Set up caret's trainControl object.
# ctrl.1 <- trainControl(method = "repeatedcv", number = 3, repeats = 10, index = cv.3.folds, search="random")
#
# # set up cluster for parallel computing
# cl <- makeCluster(6, type = "SOCK")
# registerDoSNOW(cl)
#
# # Set seed for reproducibility and train
# set.seed(34324)
#
# #use rf.train.8 with 9 predictors
#
# #RF_Random <- train(x = rf.train.8, y = rf.label, method = "rf", tuneLength = 15, ntree = 500, trControl = ctrl.1)
# #save(RF_Random, file = "./data/RF_Random_search.rda")
#
# #Shutdown cluster
# stopCluster(cl)
# Check out results
load("./data/RF_Random_search.rda")
print(RF_Random)
## Random Forest
##
## 891 samples
## 9 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (3 fold, repeated 10 times)
## Summary of sample sizes: 594, 594, 594, 594, 594, 594, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.8435466 0.6643066
## 3 0.8453423 0.6690529
## 4 0.8437710 0.6665398
## 5 0.8419753 0.6630091
## 6 0.8397306 0.6586318
## 7 0.8383838 0.6556425
## 8 0.8379349 0.6544327
## 9 0.8353535 0.6495571
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 3.
We can see that the random search for mtry has found that the best value is 3. With mtry = 3, the model reaches an estimated accuracy of 84.53%.
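The selection rule caret applies ("the largest Accuracy") can be reproduced by hand from the resampling table above:

```r
# Resampling results copied from the output above
res <- data.frame(
  mtry     = 2:9,
  Accuracy = c(0.8435466, 0.8453423, 0.8437710, 0.8419753,
               0.8397306, 0.8383838, 0.8379349, 0.8353535)
)

# Pick the mtry with the largest resampled accuracy
best_mtry <- res$mtry[which.max(res$Accuracy)]
best_mtry  # 3
```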
Grid Search
Grid search generally searches over more than one parameter: each axis of the grid is a parameter, and each point in the grid is a specific combination of parameter values. Because caret's train can only tune mtry for the "rf" method, the grid search here reduces to a linear search through a vector of candidate mtry values.
# ctrl.2 <- trainControl(method="repeatedcv", number=3, repeats=10, index = cv.3.folds, search="grid")
#
# set.seed(3333)
# # set Grid search with a vector from 1 to 15.
#
# tunegrid <- expand.grid(.mtry=c(1:15))
#
# # set up cluster for parallel computing
# cl <- makeCluster(6, type = "SOCK")
# registerDoSNOW(cl)
#
#
# #RF_grid_search <- train(y = rf.label, x = rf.train.8, method="rf", metric="Accuracy", trControl = ctrl.2, tuneGrid = tunegrid, tuneLength = 15, ntree = 500)
#
#
# #Shutdown cluster
# stopCluster(cl)
# #save(RF_grid_search, file = "./data/RF_grid_search.rda")
load("./data/RF_grid_search.rda")
print(RF_grid_search)
## Random Forest
##
## 891 samples
## 9 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (3 fold, repeated 10 times)
## Summary of sample sizes: 594, 594, 594, 594, 594, 594, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 1 0.8232323 0.6140400
## 2 0.8439955 0.6652153
## 3 0.8452301 0.6691079
## 4 0.8443322 0.6675864
## 5 0.8428732 0.6645467
## 6 0.8398429 0.6584647
## 7 0.8379349 0.6548634
## 8 0.8390572 0.6571467
## 9 0.8370370 0.6529631
## 10 0.8365881 0.6519263
## 11 0.8359147 0.6504591
## 12 0.8370370 0.6525838
## 13 0.8365881 0.6520535
## 14 0.8356902 0.6502470
## 15 0.8354658 0.6494413
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 3.
The grid search method identified that the best value for mtry is also 3. When mtry = 3, the model's estimated accuracy reaches 84.52%. We can see that both search methods arrive at the same mtry suggestion.
Manual Search
Let us consider another parameter of the RF model, ntree. Since the train method from caret cannot tune ntree, we have to write our own function to search for the best value of ntree. This method is also called manual search. The idea is to write a loop that repeats the same model-fitting process a number of times. Within each iteration of the loop, a different value of the parameter to be tuned is used and the model's results are accumulated. Finally, a manual comparison is made to figure out the best value of the tuned parameter.
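The manual-search pattern looks like this in outline; cv_score below is a hypothetical stand-in for the real cross-validated model fit used in the actual code later in this section:

```r
# Hypothetical scoring function standing in for train(): in the real code this
# fits an RF with the given ntree and returns the cross-validated accuracy
cv_score <- function(n_tree) 0.84 + 0.0001 * log(n_tree)

results <- list()
for (n_tree in c(100, 500, 1000, 1500)) {
  # fit/evaluate the model with this parameter value ...
  score <- cv_score(n_tree)
  # ... and accumulate the result under a readable key
  results[[toString(n_tree)]] <- score
}

# Manually compare the accumulated results
best_ntree <- names(results)[which.max(unlist(results))]
best_ntree  # "1500" with this toy scoring function
```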
To tune the RF model's parameter ntree, we fix mtry = 3 from the above section and try a list of 4 values (100, 500, 1000, 1500)3 to find which one produces the best result.
# Manual search: fix mtry = 3 and loop over candidate ntree values
model_list <- list()
tunegrid <- expand.grid(.mtry = 3)
control <- trainControl(method="repeatedcv", number=3, repeats=10, search="grid")
# # The following code has been commented out to speed up building the book; it takes a long time to run.
# # set up cluster for parallel computing
# cl <- makeCluster(6, type = "SOCK")
# registerDoSNOW(cl)
#
#
# #loop through different settings
#
# for (n_tree in c(100, 500, 1000, 1500)) {
#
# set.seed(3333)
# fit <- train(y = rf.label, x = rf.train.8, method="rf", metric="Accuracy", tuneGrid=tunegrid, trControl= control, ntree=n_tree)
#
# key <- toString(n_tree)
# model_list[[key]] <- fit
# }
#
# #Shutdown cluster
# stopCluster(cl)
# save(model_list, file = "./data/RF_manual_search.rda")
# # The code above was commented out when building the book.
load("./data/RF_manual_search.rda")
# compare results
results <- resamples(model_list)
summary(results)
##
## Call:
## summary.resamples(object = results)
##
## Models: 100, 500, 1000, 1500
## Number of resamples: 30
##
## Accuracy
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 100 0.7979798 0.8249158 0.8367003 0.8383838 0.8535354 0.8855219 0
## 500 0.8013468 0.8324916 0.8451178 0.8418631 0.8518519 0.8821549 0
## 1000 0.8013468 0.8282828 0.8434343 0.8415264 0.8518519 0.8787879 0
## 1500 0.8013468 0.8324916 0.8451178 0.8430976 0.8518519 0.8855219 0
##
## Kappa
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 100 0.5686275 0.6277878 0.6498418 0.6540654 0.6778695 0.7539114 0
## 500 0.5751073 0.6439474 0.6681725 0.6614719 0.6854823 0.7462468 0
## 1000 0.5751073 0.6327199 0.6676526 0.6608493 0.6823330 0.7394356 0
## 1500 0.5751073 0.6409731 0.6714760 0.6640467 0.6857492 0.7539114 0
We can see that with mtry = 3 fixed, the best ntree value is 1500, where the model reaches a mean accuracy of 84.31%.
References
Dalpiaz, David. 2021. Tune Machine Learning Algorithms in R. otexts. https://machinelearningmastery.com/tune-machine-learning-algorithms-in-r/.
Not all machine learning algorithms are available in caret for tuning. The choice of parameters was decided by the developers of the package; only those parameters that have a large effect are available for tuning in caret. For the RF method, only the mtry parameter is available in caret for tuning, because of its effect on the final accuracy and because it must be found empirically for a dataset.↩︎
These are commonly used ntree values. For demonstration purposes we only chose these values; you can try more different values.↩︎