Random Forest Trees (RFT) is a machine learning algorithm based on decision trees. A single decision tree is easy to conceptualize but typically suffers from high variance, which makes it uncompetitive in terms of accuracy. In the context of randomization-based ensemble methods, one way to overcome this limitation is to produce many variants of a single decision tree by selecting, each time, a different subset of the same training set (Breiman, 2001). Random Trees (RT) thus belong to a class of machine learning algorithms that perform ensemble classification, where the term ensemble implies a method that makes predictions by averaging over the predictions of several independent base models.

The Random Forest algorithm, hereafter called Random Trees for trademark reasons, was originally conceived by Breiman (2001) "as a method of combining several CART style decision trees using bagging" (Denil et al., 2014). "Since its introduction by Breiman (2001) the random forests framework has been extremely successful as a general-purpose classification and regression method" (Denil et al., 2014). The fundamental principle of ensemble methods based on randomization "is to introduce random perturbations into the learning procedure in order to produce several different models from a single learning set L and then to combine the predictions of those models to form the prediction of the ensemble" (Louppe, 2014). In other words, "significant improvements in classification accuracy have resulted from growing an ensemble of trees and letting them vote for the most popular class. In order to grow these ensembles, often random vectors are generated that govern the growth of each tree in the ensemble" (Breiman, 2001).

"There are three main choices to be made when constructing a random tree. These are (1) the method for splitting the leafs, (2) the type of predictor to use in each leaf, and (3) the method for injecting randomness into the trees" (Denil et al., 2014). A common technique for introducing randomness into a tree "is to build each tree using a bootstrapped or sub-sampled data set. In this way, each tree in the forest is trained on slightly different data, which introduces differences between the trees" (Denil et al., 2014). Randomization can also occur in "the choice of the best split at a given node" (Louppe, 2014); "experiments show however that when noise is important, Bagging usually yield[s] better results" (Louppe, 2014).

When optimizing a Random Trees model, "special care must be taken so that the resulting model is neither too simple nor too complex. In the former case, the model is indeed said to underfit the data, i.e., to be not flexible enough to capture the structure between X and Y. In the latter case, the model is said to overfit the data, i.e., to be too flexible and to capture isolated structures (i.e., noise) that are specific to the learning set" (Louppe, 2014). "These parameters have to be tuned in order to find the right trade-off, they need to be such that they are neither too strict nor too loose for the tree to be neither too shallow nor too deep" (Louppe, 2014). Some of the key features of Random Trees are described in Breiman (2002).
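To make the quoted ingredients concrete, here is a minimal sketch of a voting ensemble of randomized trees, assuming scikit-learn and a synthetic dataset from `make_classification`; the tree count, `max_features="sqrt"`, and the other parameter values are illustrative assumptions, not Breiman's reference implementation.

```python
# Minimal sketch of a randomized tree ensemble (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_trees = 100
trees = []
for _ in range(n_trees):
    # Bagging: each tree is built on a bootstrapped copy of the
    # learning set, so every tree sees slightly different data.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Random split selection: only sqrt(n_features) candidate features
    # are considered at each node, which decorrelates the trees.
    tree = DecisionTreeClassifier(max_features="sqrt")
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Combine the models: let the trees vote for the most popular class
# (labels are binary here, so a majority vote is a mean threshold).
votes = np.stack([t.predict(X_test) for t in trees])
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("ensemble accuracy:", (majority == y_test).mean())
```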
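The shallow-versus-deep trade-off described by Louppe (2014) can likewise be sketched by cross-validating a single tree at several depths; the label-noise level (`flip_y`) and the depth grid below are illustrative assumptions.

```python
# Sketch of the underfit/overfit trade-off for a single decision tree.
# flip_y injects label noise so that an overly deep tree can overfit.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20,
                           flip_y=0.2, random_state=0)

for depth in (1, 3, 5, 10, None):  # None lets the tree grow until pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    score = cross_val_score(tree, X, y, cv=5).mean()
    print(f"max_depth={depth}: mean CV accuracy = {score:.3f}")
```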