Computational Statistics in Data Science. Группа авторов. Математика. . Читать онлайн. Литмир. LITMIR.BIZ

Computational Statistics in Data Science. Группа авторов

Читать онлайн.

Информация о произведении:

Название Computational Statistics in Data Science

Год выпуска 0

isbn 9781119561088

Автор произведения Группа авторов

Жанр Математика

Серия

Издательство John Wiley & Sons Limited

Computational Statistics in Data Science - Группа авторов

Скачать книгу

Consider the general coupon collector problem [38] where the goal is to collect distinct objects (e.g., coupons, trading cards, and action figures). Specifically, independent draws of size are made from with replacement, and interest is in the number of draws necessary, say , to draw all objects at least once. The classical case where and all objects are equally likely yields a closed‐form solution (related to random sampling of digits). We consider a variation where and action figures appear in cereal boxes with probabilities in Table 1.

We estimate the expected number of boxes needed to collect all 15 action figures and the probability we needed to buy more than 100 and 200 total boxes. Denote these as , , and , respectively. Additionally, we implement an absolute precision sequential stopping rule to simulate until 95% confidence interval lengths for the three quantities of interest are below 1, 0.01, and 0.01, respectively. Specifically, we set and simulate an additional 100 Monte Carlo sample between checking the stopping rule. The sequential stopping rule terminates at with estimates of . We note that stopping is based on since its 95% confidence interval criteria is the most restrictive. The left panel of Figure 1 provides a histogram of the Monte Carlo samples along with vertical bold lines corresponding to 100 and 200 boxes.

Table 1 Probabilities for each action figure

Figures	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
Probability	0.2	0.1	0.1	0.1	0.1	0.1	0.05	0.05	0.05	0.05	0.02	0.02	0.02	0.02	0.02

Figure 1 Histograms of simulated boxes and mean number of boxes for two Monte Carlo sampling strategies in the collector problem.

A more efficient Monte Carlo experiment is available if we only wish to estimate . Suppose that is the set of all permutations of the set representing the order in which the action figures were collected. Then, for any , we can calculate and notice

normal upper E left-bracket upper W right-bracket equals sigma-summation Underscript z element-of upper Z Endscripts normal upper E left-bracket upper W vertical-bar upper Z equals z right-bracket upper P left-bracket upper Z equals z right-bracket

This calculation is unavailable since there are over 3 trillion partitions in . However, we can simulate equally likely permutations from and estimate with

StartFraction 1 Over n EndFraction sigma-summation Underscript t equals 1 Overscript n Endscripts normal upper E left-bracket upper W vertical-bar upper Z equals upper Z Subscript t Baseline right-bracket

Using this sampler, we simulate until the 95% confidence interval length for normal upper E left-bracket upper W right-bracket is below 1. Again, we set n Superscript asterisk Baseline equals 100 and simulate an additional 100 Monte Carlo sample between checking the stopping rule. Now the sequential stopping rule terminates at n equals 5500 with an estimate of 116.1, which is approximately 10 times more efficient than the naive Monte Carlo sampling. The right panel of Figure 1 provides a histogram of the Monte Carlo simulated means.

7.2 Estimating Risk for Empirical Bayes

Risk of empirical Bayes estimators is often not available in closed form, and Monte Carlo simulation is used to estimate it. Consider Example 3.3 from Robert and Casella [4] where for

Скачать книгу

Новинки

Популярные

Наши рекомендации

ТОП просматриваемых книг сайта: