
it possible to form expectations about the future, including the uncertainty associated with those expectations.

      Let's begin with an example of a simple probabilistic system: rolling a pair of fair, six‐sided dice. If D represents the sum of the dice, then D is a random variable with 11 possible values ranging from 2 to 12. Some of these outcomes are more likely than others. For instance, there are more ways to roll a sum of 7 ([1,6], [2,5], [3,4], [4,3], [5,2], [6,1]) than a sum of 10 ([4,6], [5,5], [6,4]), so there is a higher probability of rolling a 7 than a 10. Because there are 36 possible rolls ([1,1], [1,2], [2,1], etc.) and each is equally likely, symbols make this precise:

$$P(D = 7) = \frac{6}{36} \approx 16.67\% \qquad P(D = 10) = \frac{3}{36} \approx 8.33\%$$
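The counting argument can be checked by brute force. This minimal sketch enumerates all 36 ordered rolls, counts the ways to reach each sum, and turns the counts into exact probabilities with `fractions.Fraction`; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered rolls (first die, second die) of two fair dice.
rolls = list(product(range(1, 7), repeat=2))

# Number of ordered rolls that produce each sum D.
ways = Counter(a + b for a, b in rolls)

# Exact probability of each sum: (number of ways) / 36.
pmf = {d: Fraction(n, len(rolls)) for d, n in sorted(ways.items())}

print(len(rolls))         # 36
print(ways[7], ways[10])  # 6 3
print(pmf[7], pmf[10])    # 1/6 1/12
```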

The distribution of D can be represented elegantly using a histogram. These types of graphs display the frequency of different outcomes, grouped according to defined ranges. When working with measured data, histograms are used to estimate the true underlying probability distribution of a probabilistic system. For this fair dice example, there will be 11 bins, corresponding to the 11 possible outcomes. This histogram is shown below in Figure 1.1, populated with data from 100,000 simulated dice rolls.

Figure 1.1 A histogram for 100,000 simulated rolls for a pair of fair dice. This diagram shows the likelihood of each outcome occurring according to this simulation (e.g., the height of the bin ranging from 6.5 to 7.5 is near 17%, indicating that 7 occurred nearly 17% of the time in the 100,000 trials).
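A histogram like Figure 1.1 can be approximated in a few lines. The sketch below is an assumption about how such a simulation might be done, not the author's code: it uses Python's `random` module with an arbitrary seed, and because the sums are integers, counting each value is equivalent to using unit-width bins centered on each sum (for example, 6.5 to 7.5 for a 7).

```python
import random
from collections import Counter

def simulate_dice_sums(n_rolls, seed=None):
    """Return n_rolls simulated sums of two fair six-sided dice."""
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]

def relative_frequencies(samples):
    """Map each observed outcome to the fraction of samples equal to it."""
    counts = Counter(samples)
    n = len(samples)
    return {value: counts[value] / n for value in sorted(counts)}

sums = simulate_dice_sums(100_000, seed=1)  # seed=1 is an arbitrary choice
freqs = relative_frequencies(sums)

# Text histogram: one '#' per half percentage point.
for value, f in freqs.items():
    print(f"{value:2d}: {f:6.2%} {'#' * round(f * 200)}")
```

With 100,000 rolls, the frequency printed for 7 should land close to 1/6, or about 16.7%.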

      Distributions like the ones shown here can be summarized using quantitative measures called moments.3 The first two moments are mean and variance.

      Mean (first moment): Also known as the average and represented by the Greek letter μ (mu), this value describes the central tendency of a distribution. It is calculated by summing all the observed outcomes $x_i$ and dividing by the number of observations $N$:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{1.5}$$
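Equation (1.5) translates directly into code: add up the observations and divide by how many there are. The function name `sample_mean` and the five observations below are hypothetical, chosen only for illustration.

```python
def sample_mean(observations):
    """Equation (1.5): sum of the observed outcomes divided by their count N."""
    if not observations:
        raise ValueError("need at least one observation")
    return sum(observations) / len(observations)

# Five hypothetical dice-sum observations.
print(sample_mean([7, 4, 9, 7, 11]))  # 7.6
```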

For a sufficiently large number of statistical observations, the mean of the observed outcomes approaches the expected value of the distribution. The expected value of a random variable is the probability‐weighted average of its outcomes, that is, the average outcome anticipated over many future trials. The expected value of a random variable X, denoted $E[X]$, can be estimated from statistical data using Equation (1.5), or, if the unique outcomes ($x_i$) and their respective probabilities ($p_i$) are known, calculated directly with the following formula:

      $$E[X] = \sum_{i} x_i \, p_i \tag{1.6}$$

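      In the dice sum example, represented with random variable D, the possible outcomes (2, 3, 4, …, 12) and the probability of each occurring (2.78%, 5.56%, 8.33%, …, 2.78%) are known, so the expected value can be determined as follows:

$$E[D] = 2\cdot\frac{1}{36} + 3\cdot\frac{2}{36} + 4\cdot\frac{3}{36} + \cdots + 12\cdot\frac{1}{36} = \frac{252}{36} = 7$$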
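The same calculation can be sketched in Python, applying Equation (1.6) to D. Exact fractions are used so that the 1/36 probabilities are not rounded to percentages along the way; the helper name `expected_value` is illustrative rather than taken from the text.

```python
from fractions import Fraction
from itertools import product

def expected_value(pmf):
    """Equation (1.6): sum of each unique outcome times its probability."""
    return sum(x * p for x, p in pmf.items())

# Exact distribution of D: each of the 36 ordered rolls has probability 1/36.
pmf_d = {}
for a, b in product(range(1, 7), repeat=2):
    pmf_d[a + b] = pmf_d.get(a + b, 0) + Fraction(1, 36)

print(expected_value(pmf_d))  # 7
```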

      The theoretical long‐term average sum is seven. Therefore, if this experiment is repeated many times, the mean of the observations calculated using Equation (1.5) should yield an output close to seven.
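The convergence described here can be watched directly. This sketch, assuming Python's `random` module and an arbitrary seed, keeps a running total of simulated sums and records the sample mean at a few checkpoints; the gap to 7 should generally shrink as N grows, although it need not shrink at every single checkpoint.

```python
import random

rng = random.Random(42)  # hypothetical seed, chosen only for reproducibility

checkpoints = {10, 100, 1_000, 10_000, 100_000}
running_means = {}
total = 0
for n in range(1, 100_001):
    total += rng.randint(1, 6) + rng.randint(1, 6)
    if n in checkpoints:
        running_means[n] = total / n  # sample mean after n rolls

for n, m in running_means.items():
    print(f"N = {n:>7,}: mean = {m:.3f}, |mean - 7| = {abs(m - 7):.3f}")
```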

      Variance (second moment): This is a measure of the spread, or variation, of the data points around the mean of the distribution. Standard deviation, represented by the Greek letter σ (sigma), is the square root of variance and is commonly used as a measure of uncertainty (equivalently, risk or volatility). Distributions with more variance are wider and carry more uncertainty about future outcomes. Variance is calculated according to the following:4

      $$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2 \tag{1.7}$$
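A minimal sketch of the variance calculation from observations, assuming the version that averages the squared deviations from the mean over all N observations (the population variance). Some texts divide by N − 1 instead; Python's standard `statistics` module provides both (`pvariance` and `variance`), which makes a convenient cross‐check. The data are hypothetical.

```python
import math
import statistics

def variance(observations):
    """Average squared deviation from the sample mean (divides by N)."""
    n = len(observations)
    mu = sum(observations) / n
    return sum((x - mu) ** 2 for x in observations) / n

data = [7, 4, 9, 7, 11]  # hypothetical observations
var = variance(data)
sigma = math.sqrt(var)   # standard deviation is the square root of variance

print(round(var, 4))                                   # 5.44
print(math.isclose(var, statistics.pvariance(data)))   # True
print(statistics.variance(data))                       # N - 1 version, for comparison
```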

      When a large portion of the data points is dispersed far from the mean, the variance of the entire set is large, and the uncertainty in measurements from that system is significant. The variance of a random variable X, denoted $\mathrm{Var}(X)$, can also be calculated in terms of the expected value, $E[X]$:

      $$\mathrm{Var}(X) = E\left[(X - E[X])^2\right] = E[X^2] - \left(E[X]\right)^2 \tag{1.8}$$

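      For the dice sum random variable, D, the possible outcomes (2, 3, 4, …, 12) and the probability of each occurring (2.78%, 5.56%, 8.33%, …, 2.78%) are known, so the variance of this experiment is as follows:

$$\mathrm{Var}(D) = \sum_{d=2}^{12} (d - 7)^2 \, P(D = d) = \frac{2\,(25\cdot 1 + 16\cdot 2 + 9\cdot 3 + 4\cdot 4 + 1\cdot 5)}{36} = \frac{210}{36} = \frac{35}{6} \approx 5.83$$

This result indicates that the variance of this random variable is approximately 5.83 (exactly 35/6) and that its standard deviation, the uncertainty, is $\sqrt{35/6} \approx 2.4$ (shown in Figure 1.2).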
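Both forms of the variance identity in Equation (1.8) can be verified exactly for D. This sketch, using `fractions.Fraction` to avoid rounding, computes the variance once as the expected squared deviation from the mean and once as $E[D^2] - (E[D])^2$, then takes the square root for the standard deviation.

```python
import math
from fractions import Fraction
from itertools import product

# Exact distribution of D: each of the 36 ordered rolls has probability 1/36.
pmf_d = {}
for a, b in product(range(1, 7), repeat=2):
    pmf_d[a + b] = pmf_d.get(a + b, 0) + Fraction(1, 36)

mean = sum(d * p for d, p in pmf_d.items())  # E[D] = 7

# Definition: expected squared deviation from the mean.
var_def = sum((d - mean) ** 2 * p for d, p in pmf_d.items())

# Shortcut form: E[D^2] - (E[D])^2.
var_shortcut = sum(d ** 2 * p for d, p in pmf_d.items()) - mean ** 2

sigma = math.sqrt(var_def)

print(var_def, var_shortcut)                      # 35/6 35/6
print(round(float(var_def), 4), round(sigma, 4))  # 5.8333 2.4152
```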

      One can compare these theoretical values for the mean and standard deviation of the dice sum experiment with the values measured from statistical data. The first and second moments calculated from the simulated dice rolls are plotted in Figure 1.2 for comparison.
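A rough version of this comparison can be sketched as follows, assuming 100,000 simulated rolls from Python's `random` module with an arbitrary seed. `statistics.fmean` gives the sample mean and `statistics.pstdev` the standard deviation with an N denominator; the simulated values should sit close to the theoretical 7 and $\sqrt{35/6} \approx 2.415$.

```python
import random
import statistics

# Theoretical moments of D from its exact distribution.
THEORETICAL_MEAN = 7.0
THEORETICAL_STD = (35 / 6) ** 0.5  # about 2.415

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
sums = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(100_000)]

sim_mean = statistics.fmean(sums)
sim_std = statistics.pstdev(sums)

print(f"mean: theory {THEORETICAL_MEAN:.3f}, simulated {sim_mean:.3f}")
print(f"std:  theory {THEORETICAL_STD:.3f}, simulated {sim_std:.3f}")
```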

      Obtaining a distribution average
