Sampling and Estimation from Finite Populations. Yves Tille
Title: Sampling and Estimation from Finite Populations
ISBN: 9781119071273
Author: Yves Tille
Genre: Mathematics
Publisher: John Wiley & Sons Limited
Some personalities, such as Malthus and Babbage in Great Britain, and Quételet in Belgium, contributed greatly to the development of statistical methodology. On the other hand, the establishment of a statistical apparatus was a necessity in the construction of modern states, and it is probably not a coincidence that these personalities come from the two countries most rapidly affected by the industrial revolution. At that time, the statistician's objective was mainly to make enumerations. The main concern was to inventory the resources of nations. In this context, the use of sampling was unanimously rejected as an inexact and fundamentally unscientific procedure. Throughout the 19th century, the discussions of statisticians focused on how to obtain reliable data and on the presentation, interpretation, and possibly modeling (adjustment) of these data.
1.3 Controversy on the Use of Partial Data
In 1895, the Norwegian Anders Nicolai Kiær, Director of the Central Statistical Office of Norway, presented to the Congress of the International Statistical Institute (ISI) in Bern a work entitled Observations et expériences concernant des dénombrements représentatifs (Observations and experiments on representative enumeration) about a survey conducted in Norway. Kiær (1896) first selected a sample of cities and municipalities. Then, in each of these municipalities, he selected only some individuals using the first letter of their surnames. He applied a two‐stage design, but the choice of the units was not random. Kiær argued for the use of partial data provided it is produced using a “representative method”. According to this method, the sample must be a reduced-size representation of the population. Kiær's concept of representativeness is linked to the quota method. His speech was followed by a heated debate, and the proceedings of the Congress of the ISI reflect a long dispute. Let us take a closer look at the arguments from two opponents of Kiær's method (see ISI General Assembly Minutes, 1896).
Georg von Mayr (Prussia). [...] It is especially dangerous to call for this system of representative investigations within an assembly of statisticians. It is understandable that for legislative or administrative purposes such limited enumeration may be useful – but then it must be remembered that it can never replace complete statistical observation. It is all the more necessary to support this point, that there is among us in these days a current among mathematicians who, in many directions, would rather calculate than observe. But we must remain firm and say: no calculation where observation can be done.⁴

Guillaume Milliet (Switzerland). I believe that it is not right to give a congressional voice to the representative method (which can only be an expedient) an importance that serious statistics will never recognize. No doubt, statistics made with this method, or, as I might call it, statistics, pars pro toto, has given us here and there interesting information; but its principle is so much in contradiction with the demands of the statistical method that as statisticians, we should not grant to imperfect things the same right of bourgeoisie, so to speak, that we accord to the ideal that scientifically we propose to reach.⁵
The content of these reactions can be summarized as follows: since statistics is by definition exhaustive, renouncing complete enumeration denies the very mission of statistical science. The discussion did not concern the method proposed by Kiær, but rather the definition of statistical science. However, Kiær did not let go, and continued to defend the representative method in 1897 at the congress of the ISI at St. Petersburg (see Kiær, 1899), in 1901 in Budapest, and in 1903 in Berlin (see Kiær, 1903, 1905). After this date, the issue was no longer mentioned at the ISI Congress. However, Kiær obtained the support of Arthur Bowley (1869–1957), who then played a decisive role in the development of sampling theory. Bowley (1906) presented an empirical verification of the application of the central limit theorem to sampling. He was the true promoter of random sampling techniques, developed stratified designs with proportional allocations, and used the law of total variance. It was necessary to wait for the end of the First World War and the emergence of a new generation of statisticians for the problem to be discussed again within the ISI. On this subject, we cannot help but quote Max Planck's reflection on the appearance of new scientific truths: “a new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it” (quoted by Kuhn, 1970, p. 151).
In 1924, a commission (composed of Arthur Bowley, Corrado Gini, Adolphe Jensen, Lucien March, Verrijn Stuart, and Frantz Zizek) was created to evaluate the relevance of using the representative method. The results of this commission, entitled “Report on the representative method of statistics”, were presented at the 1925 ISI Congress in Rome. The commission accepted the principle of survey sampling as long as the methodology is respected. Thirty years after Kiær's communication, the idea of sampling was officially accepted. The commission laid the foundation for future research. Two methods are clearly distinguished: “random selection” and “purposive selection”. These two methods correspond to two fundamentally different scientific approaches. On the one hand, the validation of random methods is based on the calculation of probabilities, which allows confidence intervals to be built for certain parameters. On the other hand, the validation of the purposive selection method can only be obtained through experimentation, by comparing the obtained estimates to census results. Therefore, random methods are validated by a strictly mathematical argument while purposive methods are validated by an experimental approach.
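The contrast between the two kinds of validation can be illustrated with a small sketch. Under simple random sampling without replacement, a confidence interval follows from probability calculations alone, with no census needed for comparison. The code below is only an illustration under stated assumptions: the population is synthetic, the function name `srs_mean_ci` is hypothetical, and the interval relies on the usual normal approximation with a finite population correction.

```python
import math
import random
import statistics

def srs_mean_ci(population, n, z=1.96, rng=random):
    """Estimate the population mean from a simple random sample drawn
    without replacement, with a normal-approximation confidence interval."""
    N = len(population)
    sample = rng.sample(population, n)
    mean = statistics.fmean(sample)
    s2 = statistics.variance(sample)      # sample variance (divisor n - 1)
    var_hat = (1 - n / N) * s2 / n        # finite population correction applied
    half = z * math.sqrt(var_hat)
    return mean, mean - half, mean + half

# Synthetic population (invented values, for illustration only).
rng = random.Random(1925)
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# One sample gives one interval.
estimate, lower, upper = srs_mean_ci(population, n=200, rng=rng)

# Repeating the design shows how often the interval covers the true mean.
replications = 500
covered = 0
for _ in range(replications):
    _, lo, hi = srs_mean_ci(population, n=200, rng=rng)
    if lo <= true_mean <= hi:
        covered += 1
coverage = covered / replications
```

In this sketch the coverage rate should be close to the nominal 95%, which is the kind of guarantee that a purposive selection cannot offer without an external check against census figures.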
1.4 Development of a Survey Sampling Theory
The report of the commission presented to the ISI Congress in 1925 marked the official recognition of the use of survey sampling. Most of the basic problems had already been posed, such as the use of random samples and the calculation of the variance of the estimators for simple and stratified designs. The acceptance of the use of partial data, and especially the recommendation to use random designs, led to a rapid mathematization of this theory. At that time, the calculation of probabilities was already known. In addition, statisticians had already developed a theory for experimental statistics. Everything was in place for the rapid progress of a fertile field of research: the construction of a statistical theory of survey sampling.
Jerzy Neyman (1894–1981) developed a large part of the foundations of the probabilistic theory of sampling for simple, stratified, and cluster designs. He also determined the optimal allocation of a stratified design. The optimal allocation method challenges the basic idea of the quota method, which is “representativeness”. Indeed, under the optimal allocation, the sample should not be a miniature of the population, as some strata must be overrepresented. The article published by Neyman (1934) in the Journal of the Royal Statistical Society is currently considered one of the founding texts of sampling theory. Neyman identified the main fields of research and his work was to have a very important impact in later years. We now know that Tschuprow (1923) had already obtained some of the results that were attributed to Neyman, but the latter seems to have found them independently of Tschuprow. It is not surprising that such a discovery was made simultaneously in several places. From the moment that the use of random samples was considered a valid method, the theory would arise directly from the application of the theory of probability.
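To see why an optimally allocated sample is not a miniature of the population, it helps to compare proportional allocation (sample sizes proportional to the stratum sizes N_h) with the allocation usually attributed to Neyman (sample sizes proportional to N_h times the stratum standard deviation S_h, for a fixed total sample size and equal costs). The following is a minimal sketch; the three strata and their standard deviations are invented, the function names are hypothetical, and rounding to integer sample sizes is ignored.

```python
def proportional_allocation(sizes, n):
    """Allocate n units in proportion to the stratum sizes N_h."""
    total = sum(sizes)
    return [n * N_h / total for N_h in sizes]

def neyman_allocation(sizes, std_devs, n):
    """Allocate n units in proportion to N_h * S_h."""
    weights = [N_h * S_h for N_h, S_h in zip(sizes, std_devs)]
    total = sum(weights)
    return [n * w / total for w in weights]

def stratified_mean_variance(sizes, std_devs, allocation):
    """Variance of the stratified estimator of the mean under simple
    random sampling without replacement within each stratum."""
    N = sum(sizes)
    return sum(
        (N_h / N) ** 2 * (1 - n_h / N_h) * S_h ** 2 / n_h
        for N_h, S_h, n_h in zip(sizes, std_devs, allocation)
    )

# Hypothetical population: a large homogeneous stratum and a small,
# very heterogeneous one (all numbers are assumptions).
sizes = [8000, 1500, 500]
std_devs = [2.0, 10.0, 40.0]
n = 1000

prop = proportional_allocation(sizes, n)      # mirrors the population
opt = neyman_allocation(sizes, std_devs, n)   # overrepresents stratum 3

var_prop = stratified_mean_variance(sizes, std_devs, prop)
var_opt = stratified_mean_variance(sizes, std_devs, opt)
```

With these invented figures, the small but highly variable third stratum receives far more than its proportional share under the optimal allocation, and the variance of the estimated mean is lower than under proportional allocation.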
1.5 The US Elections of 1936
During the same period, the implementation of the quota method contributed much more to the development of the use of survey sampling methods than theoretical studies. The 1936 US election marked an important turning point in the handling of questionnaire surveys. The facts can be summarized as follows. The major American newspapers used to publish, before the elections, the results of empirical surveys produced from large samples (two million people polled for the Literary Digest) but without any method to select individuals. While most polls predicted Landon's victory, Roosevelt was elected. Surveys conducted by Crossley, Roper, and Gallup on smaller samples but using the quota method gave a correct prediction. This event helped to confirm the validity of the data provided by opinion polls.
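The lesson of 1936, that a very large sample selected without method can be worse than a small sample with controlled composition, can be mimicked with a toy simulation. Everything in the sketch below is an assumption made for illustration: the two groups, their sizes, and their support rates are invented and are not the historical figures. In addition, the "quota" sample draws at random within each group, which is a simplification of real quota sampling, where the interviewer chooses the respondents.

```python
import random

rng = random.Random(1936)  # fixed seed so the sketch is reproducible

def make_population(size=100_000, share_a=0.3, support_a=0.35, support_b=0.70):
    """Build a hypothetical electorate of (group, votes_for_winner) pairs."""
    population = []
    for _ in range(size):
        group = "A" if rng.random() < share_a else "B"
        p = support_a if group == "A" else support_b
        population.append((group, rng.random() < p))
    return population

def share(sample):
    """Proportion of units in the sample voting for the eventual winner."""
    return sum(vote for _, vote in sample) / len(sample)

population = make_population()
true_share = share(population)

group_a = [unit for unit in population if unit[0] == "A"]
group_b = [unit for unit in population if unit[0] == "B"]

# Very large sample that only reaches group A (no selection method).
big_biased_sample = rng.sample(group_a, 20_000)
biased_estimate = share(big_biased_sample)

# Small sample of 1000 whose group composition matches the population.
n = 1000
n_a = round(n * len(group_a) / len(population))
n_b = n - n_a
quota_sample = rng.sample(group_a, n_a) + rng.sample(group_b, n_b)
quota_estimate = share(quota_sample)
```

In this toy setting the large sample points to the wrong winner because its composition is wrong, and no increase in its size would repair that, while the much smaller sample with the right composition lands close to the true share.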
This