
In Figure 2.12, we have eight points, and we want to apply k‐means to create clusters for these points. Here is how we can do it:

      1 Choose the number of clusters k.

      2 Select k random points from the data as centroids.

      3 Assign all the points to the closest cluster centroid.

      4 Recompute the centroids of newly formed clusters.

      5 Repeat steps 3 and 4 until a stopping criterion is met.

      Three stopping criteria are commonly used for the k‐means algorithm:

      1 Centroids of newly formed clusters do not change.

      2 Points remain in the same cluster.

      3 The maximum number of iterations is reached.
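      The five steps and three stopping criteria above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation behind the figure: the eight sample points, the function names, and the `max_iter` and `seed` parameters are assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns (centroids, labels)."""
    points = [tuple(p) for p in points]
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):  # stopping criterion 3: iteration cap
        # Step 3: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 4: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[c])
        # Stopping criteria 1 and 2: centroids or assignments unchanged.
        converged = new_labels == labels or new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels


# Eight made-up 2D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (0.5, 1.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

      Because the initial centroids are drawn at random, the numbering of the clusters can differ between seeds even when the grouping itself is the same.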

      2.1.9 Dimensionality Reduction

Schematic illustration of k‐means clustering (k = 3) on a 2D dataset.

      Source: Based on PulkitS01 [3], K‐Means implementation, GitHub, Inc. Available at [53], https://gist.github.com/PulkitS01/97c9920b1c913ba5e7e101d0e9030b0e.

Schematic illustration of concept of data projection.

      Dimensionality reduction [32–34] converts high‐dimensional variables into lower‐dimensional variables while preserving as much of the information they carry as possible. It is often used as a preprocessing step for classification and other tasks.

      Design Example 2.2

      Principal component analysis (PCA)

      The projected position of a point on these lines gives the coordinates in k‐dimensional reduced space.

      Steps in PCA: (i) compute the covariance matrix of the dataset S; (ii) calculate the eigenvalues and eigenvectors of this covariance matrix. The eigenvector with the largest eigenvalue $\lambda_1$ is the first PC. The eigenvector with the kth largest eigenvalue $\lambda_k$ is the kth PC. The ratio $\lambda_k / \sum_i \lambda_i$ is the proportion of variance captured by the kth PC.
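      The two PCA steps can be sketched with NumPy as follows. The function name and the synthetic data are assumptions made for this illustration; `np.cov` and `np.linalg.eigh` are standard NumPy routines, and `eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np


def principal_components(X):
    """Return eigenvalues (descending), matching eigenvectors, variance ratios."""
    X = np.asarray(X, dtype=float)
    # (i) Covariance matrix of the features (columns); np.cov centers the data.
    cov = np.cov(X, rowvar=False)
    # (ii) eigh handles symmetric matrices and returns ascending eigenvalues,
    # so reorder to put the largest eigenvalue (the first PC) first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Proportion of variance captured by each PC: lambda_k / sum_i lambda_i.
    ratios = eigvals / eigvals.sum()
    return eigvals, eigvecs, ratios


# Illustrative data: 200 points scattered closely around the line y = 2x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2.0 * x + 0.1 * rng.normal(size=200)])
eigvals, eigvecs, ratios = principal_components(X)
print(ratios)
```

      For data like this, almost all of the variance falls on the first PC, whose direction is close to that of the line the points were generated around.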

Schematic illustration of successive data projections.

      The full set of PCs forms a new orthogonal basis for the feature space, whose axes are aligned with the directions of maximum variance in the original data. Projecting the original data onto the first k PCs gives a reduced‐dimensionality representation of the data. Transforming this projection back into the original space gives a reduced‐dimensionality reconstruction of the original data. The reconstruction has some error, but the error can be small and is often acceptable given the other benefits of dimensionality reduction. The dimension k is chosen so that $\sum_{i=1}^{k} \lambda_i / \sum_{i=1}^{S} \lambda_i > \beta\,[\%]$, where β is a predetermined threshold.
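      Projection, reconstruction, and the choice of k can be sketched as below. The function name, the synthetic three‐dimensional data, and the default threshold are assumptions for this example; β is passed as a fraction (0.95) rather than a percentage.

```python
import numpy as np


def pca_reduce(X, beta=0.95):
    """Project X onto the fewest leading PCs whose variance share exceeds beta."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # center the data before projecting
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Cumulative share of variance captured by the first 1, 2, ... PCs.
    share = np.cumsum(eigvals) / eigvals.sum()
    # Smallest k whose share is strictly greater than beta (capped at all PCs).
    k = min(int(np.searchsorted(share, beta, side="right")) + 1, len(share))
    W = eigvecs[:, :k]
    Z = Xc @ W              # reduced-dimensionality representation
    X_hat = Z @ W.T + mean  # reconstruction in the original space
    return k, Z, X_hat


# Illustrative data: 3D points lying close to a 2D plane.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
X = latent @ A + 0.05 * rng.normal(size=(300, 3))

k, Z, X_hat = pca_reduce(X, beta=0.95)
error = np.mean((X - X_hat) ** 2)
print(k, error)
```

      The reconstruction error printed here comes from the variance in the discarded PCs, which is why it stays small when the retained PCs capture most of the variance.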

      2.2.1 Logistic Regression

      (2.7) $l = \prod_{j=1}^{N} \prod_{g=1}^{G} \pi_{gj}^{y_{gj}}$

      where $\pi_{gj} = \mathrm{Prob}(Y = g \mid X_j) = e^{X_j B_g} / \left(e^{X_j B_1} + e^{X_j B_2} + \cdots + e^{X_j B_G}\right) = e^{X_j B_g} / \left(\sum_{s=1}^{G} e^{X_j B_s}\right)$ and $y_{gj}$ is one if the jth observation is in outcome g and zero otherwise. Using the fact that $\sum_{g=1}^{G} y_{gj} = 1$, the log likelihood, L, becomes

      (2.8) $L = \ln(l) = \sum_{j=1}^{N} \sum_{g=1}^{G} y_{gj} \ln(\pi_{gj}) = \sum_{j=1}^{N} \ldots$
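      The probabilities in (2.7) and the double‐sum form of the log likelihood in (2.8) can be evaluated with a short plain‐Python sketch. The function names, the toy data, and the convention of storing each observation's outcome as an index (instead of the indicator values $y_{gj}$) are assumptions made for this illustration.

```python
import math


def class_probabilities(x, B):
    """pi_g = exp(x . B_g) / sum_s exp(x . B_s) for one observation x."""
    scores = [sum(xi * bi for xi, bi in zip(x, b_g)) for b_g in B]
    m = max(scores)  # subtracting the max avoids overflow; the ratios are unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(X, y, B):
    """L = sum_j sum_g y_gj ln(pi_gj), with y[j] the observed outcome index."""
    L = 0.0
    for x_j, g in zip(X, y):
        # y_gj is 1 only for the observed outcome g, so a single term
        # of the inner sum is nonzero for each observation j.
        L += math.log(class_probabilities(x_j, B)[g])
    return L


# Toy data: three observations (leading 1 is an intercept term), G = 3 outcomes.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [0, 1, 2]
B_zero = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # one coefficient vector per outcome
L_zero = log_likelihood(X, y, B_zero)
print(L_zero)
```

      With all coefficients set to zero, every outcome has probability 1/G, which gives a convenient sanity check before fitting any coefficients.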