Probability with R. Jane M. Horgan
Title: Probability with R
ISBN: 9781119536987
Author: Jane M. Horgan
Genre: Mathematics
Publisher: John Wiley & Sons Limited
Often, the data for supervised learning are randomly divided into two parts, one for training and the other for testing. In machine learning, we derive the line of best fit from the training set. The testing set is used to see how well the line actually fits. Usually, an 80:20 breakdown of the data is made: 80% is used for “training,” that is, to obtain the line, and the remaining 20% is used for testing, that is, to decide whether the line really fits the data and to ascertain whether the model is appropriate for future predictions. The model is updated as new data become available.
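The workflow just described can be sketched in R roughly as follows. This is a minimal illustration under stated assumptions, not the book's own code: the data are simulated, and the names `dat`, `train_rows`, `fit`, and `test_rmse`, the seed, and the choice of root mean squared error as the measure of fit are all made up for the example.

```r
# Simulated data for illustration only; these are NOT the book's values.
set.seed(1)                                   # arbitrary seed so the split is reproducible
n   <- 100
x   <- runif(n, min = 0, max = 20)
y   <- 5 + 4 * x + rnorm(n, sd = 8)
dat <- data.frame(x = x, y = y)

# 80:20 breakdown: draw 80% of the row numbers at random for training
train_rows <- sample(n, size = round(0.8 * n))
train <- dat[train_rows, ]
test  <- dat[-train_rows, ]

# Derive the line of best fit from the training set only
fit <- lm(y ~ x, data = train)

# See how well the line fits the held-out testing set
pred      <- predict(fit, newdata = test)
test_rmse <- sqrt(mean((test$y - pred)^2))    # one possible measure of fit (an assumption)
```

A test error close to the residual error seen on the training set would suggest that the line generalizes. A much larger test error would suggest that the model is not appropriate for future predictions.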
Example 3.1
Suppose there are 50 pairs of observations available for obtaining the line that best fits the data in order to predict y from x. The data are randomly divided into the training set and testing set, using 40 observations for training (Table 3.1), and 10 for testing (Table 3.2).
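A random division of this kind might be carried out in R along the following lines. This is only a sketch: the seed and the names `obs_numbers`, `train_obs`, and `test_obs` are assumptions for illustration, and the observations it selects will not match those in the book's tables.

```r
set.seed(3)                                   # arbitrary seed, assumed for reproducibility
obs_numbers <- 1:50                           # labels for the 50 pairs of observations

# Draw 40 observation numbers at random, without replacement, for training
train_obs <- sort(sample(obs_numbers, size = 40))

# The 10 observations not drawn form the testing set
test_obs <- setdiff(obs_numbers, train_obs)

length(train_obs)   # 40
length(test_obs)    # 10
```

Because `sample()` draws without replacement by default, no observation can appear in both sets.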
TABLE 3.1 The Training Set
Observation Numbers | x | y | Observation Numbers | x | y |
1 | 11.8 | 31.3 | 21 | 15.1 | 80.1 |
2 | 10.8 | 59.9 | 22 | 14.7 | 66.9 |
3 | 8.6 | 27.6 | 23 | 10.5 | 42.0 |
4 | 10.3 | 57.7 | 24 | 10.9 | 72.9 |
5 | 8.5 | 50.2 | 25 | 11.6 | 67.8 |
6 | 11.6 | 52.1 | 26 | 9.1 | 45.3 |
7 | 14.4 | 79.1 | 27 | 5.4 | 30.2 |
8 | 8.6 | 32.3 | 28 | 8.8 | 49.6 |
9 | 12.4 | 58.8 | 29 | 11.2 | 44.3 |
10 | 14.9 | 79.5 | 30 | 7.4 | 46.1 |
11 | 8.9 | 57.0 | 31 | 7.9 | 45.1 |
12 | 8.7 | 35.1 | 32 | 12.2 | 46.5 |
13 | 11.7 | 68.2 | 33 | 8.5 | 42.7 |
14 | 11.4 | 60.1 | 34 | 9.3 | 56.3 |
15 | 8.8 | 44.5 | 35 | 10.0 | 27.4 |
16 | 5.9 | 28.9 | 36 | 3.8 | 20.2 |
17 | 13.5 | 75.8 | 37 | 14.9 | 68.5 |
18 | 8.7 | 48.7 | 38 | 12.4 | 72.6 |
19 | 11.0 | 54.7 | 39 |
|