
$\hat{\mathbf{x}} = R(\boldsymbol{\chi})$, such that $\hat{\mathbf{x}} \approx \mathbf{x}$.

      An interesting property of any dimensionality reduction technique is its stability. In this context, a technique is said to be $\varepsilon$-stable if, for any two input data points $\mathbf{x}_1$ and $\mathbf{x}_2$, the following inequality holds [36]:

$(1 - \varepsilon)\,\|\mathbf{x}_1 - \mathbf{x}_2\|_2^2 \;\le\; \|\boldsymbol{\chi}_1 - \boldsymbol{\chi}_2\|_2^2 \;\le\; (1 + \varepsilon)\,\|\mathbf{x}_1 - \mathbf{x}_2\|_2^2.$

Intuitively, this inequality states that Euclidean distances in the original input space are approximately preserved in the output feature space.
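As a concrete illustration (not from the text), $\varepsilon$-stability can be checked numerically. The sketch below assumes a random linear projection as the dimensionality reduction technique and uses hypothetical helper names; it finds the smallest $\varepsilon$ for which the inequality holds over all pairs in a sample set:

```python
import numpy as np

def stability_epsilon(X, R):
    """Smallest eps such that the linear map x -> R @ x is eps-stable
    on every pair of rows of X, i.e. the largest relative distortion
    of pairwise squared Euclidean distances."""
    eps = 0.0
    n = X.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            d_in = np.sum((X[i] - X[j]) ** 2)            # ||x_i - x_j||^2
            d_out = np.sum((R @ X[i] - R @ X[j]) ** 2)   # ||chi_i - chi_j||^2
            eps = max(eps, abs(d_out / d_in - 1.0))
    return eps

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))                   # 50 points in R^100
R = rng.normal(size=(40, 100)) / np.sqrt(40)     # random projection to R^40
eps = stability_epsilon(X, R)
# Pairwise distances are preserved up to factors (1 - eps) and (1 + eps).
```

Random projections of this kind are known (via the Johnson–Lindenstrauss lemma) to be $\varepsilon$-stable with high probability for a sufficiently large output dimension.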

      Methods based on statistics and information theory: This family of methods reduces the input data according to some statistical or information-theoretic criterion. In a sense, the methods based on information theory can be seen as a generalization of the ones based on statistics: they can capture nonlinear relationships between variables, they can handle interval and categorical variables at the same time, and many of them are invariant to monotonic transformations of the input variables.

[Figure: black circles represent the input data; gray squares represent class representatives.]

$l(\mathbf{x}_n \mid \mathbf{x}_\chi, \sigma^2) = \frac{1}{(2\pi)^{M/2}\,\sigma} \exp\!\left(-\frac{1}{2}\,\frac{\|\mathbf{x}_n - \mathbf{x}_\chi\|^2}{\sigma^2}\right).$

      With our previous definition of uχ(x), we can express it as

$l(\mathbf{x}_n \mid \mathbf{x}_\chi, \sigma^2) = \frac{1}{(2\pi)^{M/2}\,\sigma} \exp\!\left(-\frac{1}{2}\,\frac{\sum_{\chi=1}^{K} u_\chi(\mathbf{x}_n)\,\|\mathbf{x}_n - \mathbf{x}_\chi\|^2}{\sigma^2}\right)$
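The per-sample likelihood above can be sketched in code (function and variable names are my own, not from the text). The hard assignments $u_\chi$ select the nearest class representative, as in vector quantization, and the normalization $1/((2\pi)^{M/2}\sigma)$ is reproduced exactly as written in the text:

```python
import numpy as np

def vq_likelihood(x_n, codebook, sigma2):
    """Likelihood l(x_n | x_chi, sigma^2) under hard assignments:
    u_chi(x_n) is 1 for the nearest codebook vector and 0 otherwise."""
    M = x_n.shape[0]
    d2 = np.sum((codebook - x_n) ** 2, axis=1)  # ||x_n - x_chi||^2 for all chi
    u = np.zeros(len(codebook))
    u[np.argmin(d2)] = 1.0                      # hard assignment u_chi(x_n)
    norm = (2 * np.pi) ** (M / 2) * np.sqrt(sigma2)
    return np.exp(-0.5 * np.sum(u * d2) / sigma2) / norm
```

Because only one $u_\chi(\mathbf{x}_n)$ is nonzero, the sum inside the exponential reduces to the squared distance to the nearest representative.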

      The log-likelihood of observing the whole dataset $\{\mathbf{x}_n \mid n = 1, 2, \ldots, N\}$, after removing all constants, is

$L(\boldsymbol{X} \mid \mathbf{x}_\chi) = \sum_{n=1}^{N} \sum_{\chi=1}^{K} u_\chi(\mathbf{x}_n)\,\|\mathbf{x}_n - \mathbf{x}_\chi\|^2.$

We thus see that minimizing the goal function of vector quantization, $J_{VQ}$, produces the maximum-likelihood estimates of the underlying $\mathbf{x}_\chi$ vectors.
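To make the link between $J_{VQ}$ and maximum likelihood concrete, here is a minimal sketch (helper names are assumptions, not from the text): $J_{VQ}$ sums the squared distance of each sample to its nearest codebook vector, and a single Lloyd-style (k-means) codebook update decreases it:

```python
import numpy as np

def j_vq(X, codebook):
    """Vector-quantization goal function J_VQ: the sum over all samples
    of the squared distance to the nearest codebook vector (the hard
    assignment u_chi picks the nearest center)."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    return d2.min(axis=1).sum()

def update_codebook(X, codebook):
    """One Lloyd-style update: assign each sample to its nearest center,
    then move each center to the mean of its assigned samples."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                     else codebook[k] for k in range(len(codebook))])

X = np.array([[0.0], [1.0], [10.0], [11.0]])
cb0 = np.array([[0.0], [5.0]])
cb1 = update_codebook(X, cb0)   # centers move to the cluster means
```

Each such update can only decrease $J_{VQ}$, which is why iterating it converges to a (local) maximum-likelihood estimate of the $\mathbf{x}_\chi$ vectors.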

      Under this generative model, the probability density function of the observations is the convolution of a Gaussian function with a set of delta functions located at the $\mathbf{x}_\chi$ vectors, that is, a set of Gaussians centered at the $\mathbf{x}_\chi$ vectors. Vector quantization is then an attempt to find the centers of the Gaussians forming the probability density function of the input data. This idea has been further pursued by Mixture Models, which generalize vector quantization: instead of looking only for the means of the Gaussians associated with each class, we also allow each class to have a different covariance matrix $\Sigma_\chi$ and a different a priori probability $\pi_\chi$. The algorithm estimates all these parameters by Expectation–Maximization and, at the end, produces for each input observation $\mathbf{x}_n$ the label $\chi$ of the Gaussian that has the largest posterior probability of having generated it.
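A minimal EM sketch for such a mixture model is given below, under simplifying assumptions not made by the text: isotropic covariances (a single variance per class rather than a full $\Sigma_\chi$) and hypothetical function names. It alternates the E-step (computing responsibilities) and the M-step (updating $\pi_\chi$, the means, and the variances), and finally labels each observation with its most responsible Gaussian:

```python
import numpy as np

def em_gmm(X, K, n_iter=50, seed=0):
    """Minimal EM for an isotropic Gaussian mixture (a sketch: full
    Mixture Models would fit a complete covariance Sigma_chi per class).
    Returns means, per-class variances, priors pi_chi, and hard labels."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    mu = X[rng.choice(N, K, replace=False)]   # init means at random samples
    var = np.full(K, X.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibility of each Gaussian for each sample
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)
        logp = np.log(pi) - 0.5 * M * np.log(2 * np.pi * var) - 0.5 * d2 / var
        logp -= logp.max(axis=1, keepdims=True)   # for numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update priors, means, and variances
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = (r * d2).sum(axis=0) / (M * Nk)
    return mu, var, pi, r.argmax(axis=1)   # label = most responsible Gaussian
```

With $K$ fixed variances all tied to the same value and uniform priors, the E-step's hard limit recovers exactly the nearest-center assignments of vector quantization, which is the generalization relationship described above.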
