
$$\frac{1}{\sigma_n^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}. \tag{3.29}$$

      The posterior mean given in (3.28) can be understood as a weighted average of the prior mean μ₀ and the sample mean x̄, which is the MLE of μ. When the sample size n is very large, the weight for x̄ is close to 1 and the weight for μ₀ is close to 0, so the posterior mean is very close to the MLE, that is, the sample mean. On the other hand, when n is very small, the posterior mean is very close to the prior mean μ₀. Similarly, if the prior variance σ₀² is very large, the prior distribution is flat and the posterior mean is close to the MLE. Note that because the mode of a normal distribution is equal to its mean, the MAP of μ is exactly μₙ. Consequently, when n is very large, or when the prior is flat, the MAP is close to the MLE.
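      As a rough numerical illustration of this weighting, the sketch below assumes that (3.28) has the standard conjugate form μₙ = (1 − w) μ₀ + w x̄ with w = nσ₀² / (nσ₀² + σ²). The function name `normal_posterior` and all numbers are hypothetical, chosen only to show how the weight on the sample mean grows with n.

```python
def normal_posterior(prior_mean, prior_var, data_var, sample_mean, n):
    """Posterior of a normal mean with known data variance and a normal prior.

    Returns (posterior_mean, posterior_var, weight_on_sample_mean).
    Assumes the standard conjugate normal-normal result.
    """
    # Precisions add: 1/sigma_n^2 = 1/sigma_0^2 + n/sigma^2, as in (3.29).
    post_var = 1.0 / (1.0 / prior_var + n / data_var)
    # Weight placed on the sample mean; the prior mean gets 1 - w.
    w = n * prior_var / (n * prior_var + data_var)
    post_mean = (1.0 - w) * prior_mean + w * sample_mean
    return post_mean, post_var, w


# Hypothetical numbers, for illustration only.
mu0, var0, var, xbar = 0.0, 1.0, 4.0, 2.0
for n in (1, 10, 1000):
    mean_n, var_n, w = normal_posterior(mu0, var0, var, xbar, n)
    print(f"n={n:5d}  weight on sample mean={w:.3f}  posterior mean={mean_n:.3f}")
```

      With these inputs the weight on the sample mean rises from 0.2 at n = 1 toward 1 as n grows. Making `prior_var` very large has the same effect at any fixed n.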

      When the data follow a p-dimensional multivariate normal distribution with unknown mean μ and known covariance matrix Σ, the posterior distribution based on a random sample of independent observations D = {x₁, x₂, …, xₙ} is given by

$$f(\boldsymbol{\mu} \mid D) \propto f(D \mid \boldsymbol{\mu})\, g(\boldsymbol{\mu}) = \prod_{i=1}^{n} f(\mathbf{x}_i \mid \boldsymbol{\mu})\, g(\boldsymbol{\mu}),$$

      where g(μ) is the density of the conjugate prior distribution Nₚ(μ₀, Σ₀). Similar to the univariate case, the posterior distribution of μ can be obtained as

$$\boldsymbol{\mu} \mid D \sim N_p(\boldsymbol{\mu}_n, \boldsymbol{\Sigma}_n),$$

      where μₙ and Σₙ are the posterior mean and covariance matrix given in (3.30) and (3.31), and x̄ is the sample mean of the data, which is the MLE of μ. It is easy to see the similarity between the results for the univariate data in (3.28) and (3.29) and the results for the multivariate data in (3.30) and (3.31). The MAP of μ is exactly μₙ. Similar to the univariate case, when n is large, or when the prior distribution is flat, the MAP is close to the MLE.
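      For a concrete view of the multivariate result, the sketch below assumes the standard conjugate forms, which is what (3.30) and (3.31) are taken to express here:

$$\boldsymbol{\Sigma}_n^{-1} = \boldsymbol{\Sigma}_0^{-1} + n\,\boldsymbol{\Sigma}^{-1}, \qquad \boldsymbol{\mu}_n = \boldsymbol{\Sigma}_n \left( \boldsymbol{\Sigma}_0^{-1} \boldsymbol{\mu}_0 + n\,\boldsymbol{\Sigma}^{-1} \bar{\mathbf{x}} \right).$$

      The function name `mvn_posterior` and the two-dimensional inputs are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def mvn_posterior(prior_mean, prior_cov, data_cov, sample_mean, n):
    """Posterior N_p(mu_n, Sigma_n) of a normal mean with known covariance.

    Assumes the standard conjugate result with an N_p(mu_0, Sigma_0) prior.
    """
    prior_prec = np.linalg.inv(prior_cov)
    data_prec = np.linalg.inv(data_cov)
    # Posterior precision: Sigma_n^{-1} = Sigma_0^{-1} + n * Sigma^{-1}.
    post_cov = np.linalg.inv(prior_prec + n * data_prec)
    # Posterior mean: precision-weighted mix of the prior mean and sample mean.
    post_mean = post_cov @ (prior_prec @ prior_mean + n * data_prec @ sample_mean)
    return post_mean, post_cov


# Hypothetical two-dimensional inputs, for illustration only.
mu0 = np.array([0.0, 0.0])
Sigma0 = np.eye(2)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
xbar = np.array([1.0, -1.0])

for n in (1, 10, 1000):
    mu_n, Sigma_n = mvn_posterior(mu0, Sigma0, Sigma, xbar, n)
    print(n, np.round(mu_n, 3))
```

      As n grows, the printed posterior mean moves from near the prior mean toward the sample mean, mirroring the univariate behavior.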

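      One advantage of Bayesian inference is that prior knowledge can be included naturally. Suppose, for example, that a randomly sampled product turns out to be defective. An MLE of the defective rate based on this single observation would be equal to 1, implying that all products are defective. By contrast, a Bayesian approach with a reasonable prior should give a much less extreme conclusion. In addition, Bayesian inference can be performed very naturally in a sequential manner. To see this, we can write the posterior distribution of μ with the contribution from the last data point xₙ separated out as

$$f(\boldsymbol{\mu} \mid D) \propto \left[ g(\boldsymbol{\mu}) \prod_{i=1}^{n-1} f(\mathbf{x}_i \mid \boldsymbol{\mu}) \right] f(\mathbf{x}_n \mid \boldsymbol{\mu}).$$

      The term in square brackets is, up to a normalizing constant, the posterior distribution after the first n − 1 observations, which then acts as the prior for xₙ.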
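      The sequential updating described above can be checked numerically. The sketch below is a minimal univariate illustration, assuming the standard conjugate normal update with known data variance. The helper `update`, the prior, and the observations are all hypothetical. It feeds the data in one point at a time, reusing each posterior as the next prior, and compares the result with a single batch update.

```python
def update(prior_mean, prior_var, data_var, x):
    """One-observation update of a normal prior on the mean (known data variance)."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / data_var)
    post_mean = post_var * (prior_mean / prior_var + x / data_var)
    return post_mean, post_var


# Hypothetical prior, data variance, and observations.
mu0, var0, var = 0.0, 1.0, 4.0
data = [2.1, 1.7, 2.6, 1.9]

# Sequential: the posterior after each point becomes the prior for the next.
mean_seq, var_seq = mu0, var0
for x in data:
    mean_seq, var_seq = update(mean_seq, var_seq, var, x)

# Batch: all n points at once, using the precision form of the posterior.
n = len(data)
xbar = sum(data) / n
var_batch = 1.0 / (1.0 / var0 + n / var)
mean_batch = var_batch * (mu0 / var0 + n * xbar / var)

print(mean_seq, mean_batch)
print(var_seq, var_batch)
```

      Both routes give the same posterior mean and variance up to floating-point rounding, so earlier observations need not be stored once the current posterior is known.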

      Example 3.3: For the side_temp_defect data set from a hot rolling process, suppose the true covariance matrix of the side temperatures measured at locations 2, 40, and 78 of Stand 5 is known and given by

$$\mathbf{S} = \begin{pmatrix} 2547.4 & -111.0 & 133.7 \\ -111.0 & 533.1 & 300.7 \\ 133.7 & 300.7 & 562.5 \end{pmatrix}.$$

      We use the nominal mean temperatures as given in Example 3.2 as the mean of the prior distribution and a diagonal matrix with variance equal to 100 for each temperature variable as its covariance matrix:
