
\[
\mathit{twoing}(a_i, d_1, d_2, S) = 0.25 \cdot \frac{\left|\sigma_{a_i \in d_1} S\right|}{|S|} \cdot \frac{\left|\sigma_{a_i \in d_2} S\right|}{|S|} \cdot \left( \sum_{c_i \in dom(y)} \left| \frac{\left|\sigma_{a_i \in d_1 \wedge y = c_i} S\right|}{\left|\sigma_{a_i \in d_1} S\right|} - \frac{\left|\sigma_{a_i \in d_2 \wedge y = c_i} S\right|}{\left|\sigma_{a_i \in d_2} S\right|} \right| \right)^{2}
\]

      When the target attribute is binary, the Gini and twoing criteria are equivalent. For multiclass problems, the twoing criterion prefers attributes with evenly divided splits.
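      As an illustration, the following sketch computes the twoing value of a candidate binary split from the class labels routed to each side. The function name and the label-array interface are assumptions made for this example; it is not taken from any particular decision-tree package.

```python
import numpy as np

def twoing(left_labels, right_labels):
    """Twoing value of a candidate binary split (d1, d2), computed from the
    class labels routed to each side. Illustrative sketch only."""
    left, right = np.asarray(left_labels), np.asarray(right_labels)
    n = left.size + right.size
    p_left, p_right = left.size / n, right.size / n
    # Sum of absolute differences between the two class-probability vectors.
    diff = sum(abs(np.mean(left == c) - np.mean(right == c))
               for c in np.union1d(left, right))
    return 0.25 * p_left * p_right * diff ** 2

# For a binary target this ranks splits in the same order as the Gini criterion.
print(twoing([0, 0, 1], [1, 1, 1, 0]))
```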

      Orthogonality (ort) criterion: This binary criterion is defined as

\[
\mathit{ort}(a_i, d_1, d_2, S) = 1 - \cos\theta\left(P_{y,1}, P_{y,2}\right)
\]

      where θ(P_{y,1}, P_{y,2}) is the angle between the two vectors P_{y,1} and P_{y,2}, which represent the probability distribution of the target attribute in the partitions d1 and d2, respectively.
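      A minimal sketch of the ort criterion follows, assuming the class probability vectors are estimated from the labels routed to each partition; the function and argument names are illustrative only.

```python
import numpy as np

def ort(left_labels, right_labels, classes):
    """Orthogonality criterion: 1 - cos(theta) between the class probability
    vectors P_{y,1} and P_{y,2} of the two partitions. Illustrative sketch."""
    left, right = np.asarray(left_labels), np.asarray(right_labels)
    p1 = np.array([np.mean(left == c) for c in classes])
    p2 = np.array([np.mean(right == c) for c in classes])
    cos_theta = p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2))
    return 1.0 - cos_theta

# A perfectly separating split yields orthogonal vectors and the maximal value 1.
print(ort([0, 0, 0], [1, 1], classes=[0, 1]))   # 1.0
```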

      Kolmogorov–Smirnov (K–S) criterion: Assuming a binary target attribute, namely, dom(y) = {c1, c2}, the criterion is defined as

\[
\mathit{ks}(a_i, d_1, d_2, S) = \left| \frac{\left|\sigma_{a_i \in d_1 \wedge y = c_1} S\right|}{\left|\sigma_{y = c_1} S\right|} - \frac{\left|\sigma_{a_i \in d_1 \wedge y = c_2} S\right|}{\left|\sigma_{y = c_2} S\right|} \right| \tag{2.21}
\]

      It was suggested to extend this measure to handle target attributes with multiple classes and missing data values. The results indicate that the suggested method outperforms the gain ratio criterion.
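      The binary K–S criterion of Eq. (2.21) reduces to a few lines of code. The sketch below assumes a boolean mask marking the instances routed to d1 and an array of target labels; both interfaces are hypothetical choices made for the example.

```python
import numpy as np

def ks_criterion(in_d1, labels, c1, c2):
    """Kolmogorov-Smirnov split criterion for a binary target, Eq. (2.21).
    `in_d1` marks instances routed to partition d1; `labels` holds the
    target values. Illustrative sketch only."""
    in_d1, labels = np.asarray(in_d1), np.asarray(labels)
    frac_c1 = np.sum(in_d1 & (labels == c1)) / np.sum(labels == c1)
    frac_c2 = np.sum(in_d1 & (labels == c2)) / np.sum(labels == c2)
    return abs(frac_c1 - frac_c2)

# d1 captures all of c1 but only a third of c2, so the criterion is 2/3.
print(ks_criterion([True, True, False, False, True],
                   ['c1', 'c1', 'c2', 'c2', 'c2'], 'c1', 'c2'))
```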

      Stopping criteria: The tree splitting (growing phase) continues until a stopping criterion is triggered. The following conditions are common stopping rules: (i) all instances in the training set belong to a single value of y, (ii) the maximum tree depth has been reached, (iii) the number of cases in the terminal node is less than the minimum number of cases for parent nodes, (iv) if the node were split, the number of cases in one or more child nodes would be less than the minimum number of cases for child nodes, and (v) the value of the best splitting criterion is not greater than a certain threshold.
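      The following sketch combines rules (i)–(v) into a single check. Every threshold value and parameter name is an illustrative assumption, not a recommendation from the text.

```python
def should_stop(node_labels, depth, child_sizes=(), best_split_score=0.0,
                max_depth=10, min_parent_cases=20, min_child_cases=7,
                score_threshold=1e-4):
    """Hypothetical stopping-rule check combining rules (i)-(v) above."""
    if len(set(node_labels)) <= 1:               # (i) node is pure
        return True
    if depth >= max_depth:                       # (ii) maximum depth reached
        return True
    if len(node_labels) < min_parent_cases:      # (iii) too few cases to split
        return True
    if child_sizes and min(child_sizes) < min_child_cases:  # (iv) tiny child node
        return True
    if best_split_score <= score_threshold:      # (v) split not worthwhile
        return True
    return False
```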

      Pruning methods: Employing tight stopping criteria tends to create small and underfitted decision trees. On the other hand, using loose stopping criteria tends to generate large decision trees that are overfitted to the training set. Pruning methods were developed to solve this dilemma. There are various techniques for pruning decision trees. Most of them perform top‐down or bottom‐up traversal of the nodes. A node is pruned if this operation improves a certain criterion. Next, we describe the most popular techniques.

      Cost-complexity pruning (pr): This proceeds in two stages. In the first stage, a sequence of trees T0, T1, …, Tk is built on the training data, where T0 is the original tree before pruning and Tk is the root tree. In the second stage, one of these trees is chosen as the pruned tree, based on its generalization error estimation. The tree Ti + 1 is obtained by replacing one or more of the subtrees in the predecessor tree Ti with suitable leaves. The subtrees that are pruned are those that obtain the lowest increase in apparent error rate per pruned leaf (l):

\[
\alpha = \frac{\varepsilon\bigl(\mathit{pr}(T, t), S\bigr) - \varepsilon(T, S)}{\left|l(T)\right| - \left|l\bigl(\mathit{pr}(T, t)\bigr)\right|} \tag{2.22}
\]

      where ε(T, S) indicates the error rate of the tree T over the sample S, and ∣l(T)∣ denotes the number of leaves in T. The notation pr(T, t) denotes the tree obtained by replacing the node t in T with a suitable leaf. In the second phase, the generalization error of each pruned tree T0, T1, …, Tk is estimated. The best pruned tree is then selected. If the given dataset is large enough, it is suggested to break it into a training set and a pruning set. The trees are constructed using the training set and evaluated on the pruning set. On the other hand, if the given dataset is not large enough, the cross‐validation methodology is suggested, despite its computational complexity implications.
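      A sketch of the first stage is shown below: computing α from Eq. (2.22) for a set of candidate nodes and selecting the weakest link. The node identifiers, error rates, and leaf counts are invented for the example.

```python
def cost_complexity_alpha(err_pruned, err_full, n_leaves_full, n_leaves_pruned):
    """Eq. (2.22): increase in apparent error rate per pruned leaf."""
    return (err_pruned - err_full) / (n_leaves_full - n_leaves_pruned)

# Hypothetical candidates: (node id, error rate after pruning it, leaves remaining).
# Assume the unpruned tree has an apparent error rate of 0.10 on S and 12 leaves.
candidates = [("t3", 0.12, 9), ("t7", 0.11, 10)]
weakest = min(candidates,
              key=lambda c: cost_complexity_alpha(c[1], 0.10, 12, c[2]))
print(weakest)   # the subtree with the lowest alpha is pruned first
```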

      Reduced error pruning: While traversing the internal nodes from the bottom to the top, the procedure checks, for each internal node, whether replacing it with the most frequent class reduces the tree’s accuracy over a given pruning set. If not, the node is pruned. The procedure continues until any further pruning would decrease the accuracy. It can be shown that this procedure ends with the smallest, most accurate subtree with respect to the given pruning set.
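      A bottom-up sketch of this procedure follows, assuming a minimal hypothetical Node structure and a held-out pruning set of (x, y) pairs. It is meant only to illustrate the traversal and the accuracy test, not any specific implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    """Hypothetical decision-tree node used only for this sketch."""
    majority_class: int
    split: Optional[Callable] = None          # maps an instance to a child index
    children: List["Node"] = field(default_factory=list)

    def predict(self, x):
        if not self.children:                 # leaf: return the majority class
            return self.majority_class
        return self.children[self.split(x)].predict(x)

def accuracy(tree: Node, pruning_set) -> float:
    return sum(tree.predict(x) == y for x, y in pruning_set) / len(pruning_set)

def reduced_error_prune(root: Node, node: Node, pruning_set) -> None:
    """Bottom-up pass: replace `node` by its majority class unless that
    reduces accuracy on the held-out pruning set."""
    for child in node.children:
        reduced_error_prune(root, child, pruning_set)
    if not node.children:
        return
    before = accuracy(root, pruning_set)
    saved, node.children = node.children, []   # tentatively prune the node
    if accuracy(root, pruning_set) < before:   # accuracy dropped:
        node.children = saved                  # undo the prune
```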

      Minimum error pruning: The expected error rate of an internal node t, if it is pruned and replaced by a leaf, is estimated as

\[
\varepsilon'(t) = 1 - \max_{c_i \in dom(y)} \frac{\left|\sigma_{y = c_i} S_t\right| + l \cdot p_{apr}(y = c_i)}{\left|S_t\right| + l} \tag{2.23}
\]

      where papr(y = ci) is the a priori probability of y taking the value ci, and l denotes the weight given to the a priori probability. A node is pruned if pruning does not increase the m-probability error rate.
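      A sketch of the estimate in Eq. (2.23) is given below, assuming the class labels reaching the node and a dictionary of a priori class probabilities; the default weight l is an arbitrary choice for the example.

```python
import numpy as np

def expected_error_rate(node_labels, priors, l=2.0):
    """m-probability error estimate of Eq. (2.23) for a node pruned to a leaf.
    `priors` maps each class to its a priori probability p_apr(y = c_i);
    the default weight l is illustrative, not a value from the text."""
    labels = np.asarray(node_labels)
    n = labels.size
    best = max((np.sum(labels == c) + l * p) / (n + l) for c, p in priors.items())
    return 1.0 - best

# Example: 8 instances of class 0 and 2 of class 1 reach the node -> 0.25.
print(expected_error_rate([0] * 8 + [1] * 2, priors={0: 0.5, 1: 0.5}))
```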

      Pessimistic pruning: The basic idea is that the error rate estimated using the training set is not reliable enough. Instead, a more realistic measure, known as the continuity correction for the binomial distribution, should be used: ε′(T, S) = ε(T, S) + |l(T)| / (2 · |S|). However, this correction still produces an optimistic error rate. Consequently, it was suggested to prune an internal node t if its error rate is within one standard error of that of a reference tree, namely

\[
\varepsilon'\bigl(\mathit{pr}(T, t), S\bigr) \le \varepsilon'(T, S) + \sqrt{\frac{\varepsilon'(T, S)\,\bigl(1 - \varepsilon'(T, S)\bigr)}{|S|}} \tag{2.24}
\]
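      A sketch of the pessimistic pruning test follows, combining the continuity-corrected error rate with the one-standard-error condition of Eq. (2.24). The function names and argument layout are assumptions made for the example.

```python
from math import sqrt

def corrected_error(err, n_leaves, n_samples):
    """Continuity-corrected training error: eps(T, S) + |l(T)| / (2 * |S|)."""
    return err + n_leaves / (2.0 * n_samples)

def prune_pessimistic(err_pruned, leaves_pruned, err_full, leaves_full, n_samples):
    """Pessimistic pruning test of Eq. (2.24): prune node t if the corrected
    error of pr(T, t) is within one standard error of the corrected error of T.
    All arguments are rates and counts measured on the training set S."""
    e_full = corrected_error(err_full, leaves_full, n_samples)
    e_pruned = corrected_error(err_pruned, leaves_pruned, n_samples)
    se = sqrt(e_full * (1.0 - e_full) / n_samples)
    return e_pruned <= e_full + se
```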