Скачать книгу

Underscript i equals 1 Overscript l Endscripts left-parenthesis alpha Subscript i Baseline minus alpha Subscript i Superscript asterisk Baseline right-parenthesis left pointing angle x Subscript i Baseline comma x right pointing angle plus b period"/>

      This is the so‐called support vector expansion; that is, w can be completely described as a linear combination of the training patterns xi . In a sense, the complexity of a function’s representation by SVs is independent of the dimensionality of the input space X, and depends only on the number of SVs. Moreover, note that the complete algorithm can be described in terms of dot products between the data. Even when evaluating f(x), we need not compute w explicitly. These observations will come in handy for the formulation of a nonlinear extension.

      Computing b : Parameter b can be computed by exploiting the so‐called Karush−Kuhn−Tucker (KKT) conditions stating that at the point of the solution the product between dual variables and constraints has to vanish, giving αi(ε + ξiyi + 〈w, xi〉 + b) = 0, alpha Subscript i Superscript asterisk Baseline left-parenthesis epsilon plus xi Subscript i Superscript asterisk Baseline plus y Subscript i Baseline minus left pointing angle w comma x Subscript i Baseline right pointing angle minus b right-parenthesis equals 0, (Cαi)ξi = 0 and left-parenthesis upper C minus alpha Subscript i Superscript asterisk Baseline right-parenthesis xi Subscript i Superscript asterisk Baseline equals 0 period This allows us to draw several useful conclusions:

      1 Only samples (xi, yi) with corresponding lie outside the ε ‐insensitive tube.

      2  ; that is, there can never be a set of dual variables αi , that are both simultaneously nonzero. This allows us to conclude that(4.51) (4.52)

      In conjunction with an analogous analysis on alpha Subscript i Superscript asterisk, we have for b

      (4.53)StartLayout 1st Row max left-brace negative epsilon plus y Subscript i Baseline minus left pointing angle w comma x Subscript i Baseline right pointing angle bar alpha Subscript i Baseline less-than upper C or alpha Subscript i Superscript asterisk Baseline greater-than 0 right-brace less-than-or-equal-to b 2nd Row less-than-or-equal-to min left-brace negative epsilon plus y Subscript i Baseline minus left pointing angle w comma x Subscript i Baseline right pointing angle bar alpha Subscript i Baseline greater-than 0 or alpha Subscript i Superscript asterisk Baseline less-than upper C right-brace EndLayout

      Kernels: We are interested in making the SV algorithm nonlinear. This, for instance, could be achieved by simply preprocessing the training patterns xi by a map Φ: XF into some feature space F, as already described in Chapter 2, and then applying the standard SV regression algorithm. Let us have a brief look at the example given in Figure 2.8 of Chapter 2. We had (quadratic features in 2) with the map Φ: 23 with Φ left-parenthesis x 1 comma x 2 right-parenthesis equals left-parenthesis x 1 squared comma StartRoot 2 EndRoot x 1 x 2 comma x 2 squared right-parenthesis. It is understood that the subscripts in this case refer to the components of x2. Training a linear SV machine on the preprocessed features would yield a quadratic function as indicated in Figure 2.8. Although this approach seems reasonable in the particular example above, it can easily become computationally infeasible for both polynomial features of higher order and higher dimensionality.

      Implicit mapping via kernels: Clearly this approach is not feasible, and we have to find a computationally cheaper way. The key observation [96] is that for the feature map of the above example we have

      (4.54)left pointing angle left-parenthesis x 1 squared comma StartRoot 2 EndRoot x 1 x 2 comma x 2 squared right-parenthesis comma left-parenthesis x 1 Superscript prime 2 Baseline comma StartRoot 2 EndRoot x prime 1 x prime 2 comma x 2 Superscript prime 2 Baseline right-parenthesis right pointing angle equals left pointing angle x comma x Superscript prime Baseline right pointing angle squared

      As noted in the previous section, the SV algorithm only depends on the dot products between patterns xi . Hence, it suffices to know k(x, x′) ≔ 〈Φ(x), Φ(x′)〉 rather than Φ explicitly, which allows us to restate the SV optimization problem:

      The difference to the linear case is that w is no longer given explicitly. Also, note that in the nonlinear setting, the optimization problem corresponds to finding the flattest function in the feature space, not in the input space.

      4.4.3 Combination of Fuzzy Models and SVR

      Given observation data from an unknown system, data‐driven methods aim to construct a decision function f(x) that can serve as an approximation of the system. As seen from the previous sections, both fuzzy models and SVR are employed to describe the decision function. Fuzzy models characterize the system by a collection of interpretable if‐then rules, and a general fuzzy model that consists of a set of rules with the following structure will be used here:

      (4.57)StartLayout 1st Row upper R Subscript i Baseline colon If x 1 is upper A Subscript i Baseline 1 Baseline and x 2 is upper A Subscript i Baseline 2 Baseline and ellipsis x Subscript d Baseline is upper A Subscript italic i d Baseline comma 2nd Row then y Subscript i Baseline equals g Subscript i Baseline left-parenthesis x comma beta Subscript i Baseline right-parenthesis for i equals 1 comma 2 ellipsis comma c period EndLayout

Скачать книгу