
$$
\begin{aligned}
z_1 &= c_{11}x_1 + c_{12}x_2 + \cdots + c_{1p}x_p \\
z_2 &= c_{21}x_1 + c_{22}x_2 + \cdots + c_{2p}x_p \\
&\;\;\vdots \\
z_q &= c_{q1}x_1 + c_{q2}x_2 + \cdots + c_{qp}x_p
\end{aligned}
$$

$$
\mathbf{z} =
\begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_q \end{pmatrix}
=
\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1p} \\
c_{21} & c_{22} & \cdots & c_{2p} \\
\vdots & \vdots &        & \vdots \\
c_{q1} & c_{q2} & \cdots & c_{qp}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix}
= \mathbf{C}\mathbf{x}.
$$

      The sample mean vector and sample covariance matrix of

$$
\mathbf{z}_i = \mathbf{C}\mathbf{x}_i, \quad i = 1, 2, \ldots, n
$$

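      are given by

$$
\bar{\mathbf{z}} = \mathbf{C}\bar{\mathbf{x}} \tag{2.9}
$$

$$
\mathbf{S}_z = \mathbf{C}\mathbf{S}\mathbf{C}^T, \tag{2.10}
$$

      respectively, where $\bar{\mathbf{x}}$ and $\mathbf{S}$ are the sample mean vector and sample covariance matrix of the $\mathbf{x}_i$.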

      Obviously, (2.9) and (2.10) are generalizations of (2.7) and (2.8), respectively.
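
      The linear-combination results can be checked numerically. The sketch below assumes the standard identities for z = Cx, namely that the sample mean vector of the z_i equals C times the sample mean vector of the x_i, and that their sample covariance matrix equals C S C^T. The data matrix X and the coefficient matrix C are hypothetical values chosen only for illustration; they do not come from the auto.spec data set.

```r
# Hypothetical toy data (not from the book): n = 5 observations on p = 3 variables.
X <- matrix(c(2, 4, 1,
              3, 5, 0,
              5, 9, 2,
              4, 6, 3,
              6, 8, 4), ncol = 3, byrow = TRUE)

# Hypothetical q x p coefficient matrix C defining q = 2 linear combinations.
C <- matrix(c(0.5, 0.5, 0,
              1,  -1,   2), nrow = 2, byrow = TRUE)

# Row i of Z is z_i = C x_i, so the whole transformed data matrix is X C^T.
Z <- X %*% t(C)

x_bar <- colMeans(X)  # sample mean vector of x
S     <- cov(X)       # sample covariance matrix of x

# Computed directly from the transformed observations ...
z_bar_direct <- colMeans(Z)
S_z_direct   <- cov(Z)

# ... and computed from the identities z_bar = C x_bar and S_z = C S C^T.
z_bar_formula <- as.vector(C %*% x_bar)
S_z_formula   <- C %*% S %*% t(C)

# Both comparisons should report TRUE (up to floating-point tolerance).
print(all.equal(z_bar_direct, z_bar_formula))
print(all.equal(S_z_direct, S_z_formula))
```

      Taking C to be a single row c^T reduces both identities to the scalar case of one linear combination.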

      Example 2.5 For the auto.spec data set, the mean() function of R gives sample means of 25.22 and 30.75 for the variables city.mpg and highway.mpg, respectively. Suppose we are interested in the overall MPG of a car, denoted by z, defined as the following weighted average of x1 = city.mpg and x2 = highway.mpg:

$$
z = 0.4x_1 + 0.6x_2 = \mathbf{c}^T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},
$$

      where $\mathbf{c} = (0.4 \;\; 0.6)^T$. Then by (2.7), the sample mean of the overall MPG in the data set is

$$
\bar{z} = \mathbf{c}^T\bar{\mathbf{x}} = (0.4 \;\; 0.6) \begin{pmatrix} 25.22 \\ 30.75 \end{pmatrix} = 28.54.
$$
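
      The arithmetic above can be reproduced in R. This is a minimal sketch that uses the two sample means reported in the text rather than the auto.spec data itself; the names c_vec and x_bar are illustrative choices (c_vec avoids masking the base function c()).

```r
# Weights and sample means as reported in the text.
c_vec <- c(0.4, 0.6)
x_bar <- c(city.mpg = 25.22, highway.mpg = 30.75)

# Sample mean of the weighted average: z_bar = c^T x_bar.
z_bar <- sum(c_vec * x_bar)

print(round(z_bar, 2))  # 28.54
```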

      To find the sample variance of z, first we obtain the sample covariance matrix for city.mpg and highway.mpg using the cov() function of R:

      cov(auto.spec.df[, c("city.mpg", "highway.mpg")])
      cor(auto.spec.df[, c("city.mpg", "highway.mpg")])

      The function cor() calculates the sample correlation matrix. Based on the output of the R code above, we have

$$
\mathbf{S} = \begin{pmatrix} 42.8 & 43.76 \\ 43.76 & 47.42 \end{pmatrix}, \quad
\mathbf{R} = \begin{pmatrix} 1 & 0.971 \\ 0.971 & 1 \end{pmatrix}.
$$

      Then by (2.8), the sample variance of z is

$$
s_z^2 = \mathbf{c}^T\mathbf{S}\mathbf{c} = (0.4 \;\; 0.6) \begin{pmatrix} 42.8 & 43.76 \\ 43.76 & 47.42 \end{pmatrix} (0.4 \;\; 0.6)^T = 44.9.
$$
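
      The quadratic form can be evaluated in R as a check. The sketch below enters the covariance matrix by hand from the values reported in the text, so it does not require the auto.spec data; the names c_vec, S, s2_z, and R_mat are illustrative. It also uses cov2cor() from base R to recover the correlation matrix from S.

```r
# Weights and sample covariance matrix as reported in the text.
c_vec <- c(0.4, 0.6)
vars  <- c("city.mpg", "highway.mpg")
S <- matrix(c(42.80, 43.76,
              43.76, 47.42),
            nrow = 2, byrow = TRUE, dimnames = list(vars, vars))

# Sample variance of z = c^T x: s_z^2 = c^T S c.
s2_z <- as.numeric(t(c_vec) %*% S %*% c_vec)
print(round(s2_z, 1))  # 44.9

# The correlation matrix implied by S; the off-diagonal entry rounds to 0.971.
R_mat <- cov2cor(S)
print(round(R_mat, 3))
```

      Because the two variables are strongly positively correlated, the cross term 2(0.4)(0.6)(43.76) contributes almost half of the total variance.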

      Bibliographic Notes

      Data visualization methods are discussed in books in the data mining area, for example, Shmueli et al. [2017] and Williams [2011]. In this chapter, we mostly use the graphics functions from base R. A popular dedicated graphics package in R is the ggplot2 package by Wickham [2016]. The ggplot2 package provides more flexible and powerful graphics capabilities and can create presentation-quality visualizations. However, it comes with a significant learning curve because of the special technical language it uses. For those who use data visualizations on a regular basis, it is worth the time and effort to learn ggplot2.

      Sample statistics such as the sample mean vector and the sample covariance matrix for multivariate observations are discussed in detail in many multivariate statistics books, for example, Johnson et al. [2002] and Rencher [2003].

      Exercises

      1 Consider the data in the following table with two numerical variables x1 and x2 and two categorical variables x3 and x4.

      x1   x2   x3    x4
      9    1    Yes   On
      5    3    No    Off
      1    2    Yes   Off
      3    4    Yes   On
      6    −1   No    On
      3    3    Yes   On

      (a) Manually sketch the scatter plot for x1 and x2.
      (b) Manually sketch the mosaic plot for x3 and x4.
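
      A hand sketch can be compared against base R graphics afterwards. The following is only a checking aid, not part of the exercise: it enters the six observations from the table above into a data frame (the name ex1 is an illustrative choice) and draws both plots with plot() and mosaicplot().

```r
# The six observations from the table in Exercise 1.
ex1 <- data.frame(
  x1 = c(9, 5, 1, 3, 6, 3),
  x2 = c(1, 3, 2, 4, -1, 3),
  x3 = factor(c("Yes", "No", "Yes", "Yes", "No", "Yes")),
  x4 = factor(c("On", "Off", "Off", "On", "On", "On"))
)

# Scatter plot of the two numerical variables.
plot(ex1$x1, ex1$x2, xlab = "x1", ylab = "x2",
     main = "Scatter plot of x1 and x2")

# Two-way table of counts, which determines the tile areas of the mosaic plot.
tab <- table(x3 = ex1$x3, x4 = ex1$x4)
print(tab)

# Mosaic plot of the two categorical variables.
mosaicplot(tab, main = "Mosaic plot of x3 and x4")
```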

      2 Consider the data set in Exercise 1. Manually calculate the sample mean vector, the sample covariance matrix, and the sample correlation matrix of $\mathbf{x} = (x_1 \;\; x_2)^T$.

      3 Consider the data in the following table with two numerical variables x1 and x2 and two categorical variables x3 and x4.

      x1   x2   x3    x4
      1    0    Yes   Working
      4    6    No    Fail
      2    2    Yes   Fail
      0    3    No    Fail
      3    4    No    Working
      5    7    Yes   Working
