Sampling and Estimation from Finite Populations. Yves Tille
ISBN 9781119071273
Publisher: John Wiley & Sons Limited
1.6 The Statistical Theory of Survey Sampling
The establishment of a new scientific consensus in 1925 and the identification of major lines of research in the following years led to a very rapid development of survey theory. During the Second World War, research continued in the United States. Important contributions are due to Deming & Stephan (1940), Stephan (1942, 1945, 1948) and Deming (1948, 1950, 1960), especially on the question of adjusting statistical tables to census data. Cornfield (1944) proposed using indicator variables for the presence of units in the sample. Cochran (1939, 1942, 1946, 1961) and Hansen & Hurwitz (1943, 1949) showed the value of unequal probability sampling with replacement. Madow (1949) proposed unequal probability systematic sampling (see also Hansen et al., 1953a,b).
It was quickly established that unequal probability sampling with fixed sample size and without replacement is a complex problem. Narain (1951), Horvitz & Thompson (1952), Sen (1953), and Yates & Grundy (1953) presented several methods with unequal probabilities in articles that are certainly among the most cited in this field. Although devoted to the examination of several designs with unequal probabilities, these texts are cited mainly for the general estimator (expansion estimator) of the total, which they also propose and discuss. The expansion estimator is, in fact, a general unbiased estimator applicable to any sampling design without replacement. However, the proposed variance estimator has a flaw. Yates & Grundy (1953) showed that the variance estimator proposed by Horvitz and Thompson can be negative. They proposed a valid variant for samples of fixed size and gave sufficient conditions for it to be positive. As early as the 1950s, the problem of sampling with unequal probabilities attracted considerable interest, which was reflected in the publication of more than 200 articles. Before turning to rank statistics, Hájek (1981) discussed the problem in detail. A book-length synthesis by Brewer & Hanif (1983) was devoted entirely to this subject, which seems far from exhausted, as evidenced by regular publications.
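The behavior of the expansion estimator and of the two variance estimators discussed above can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the population values, the sampling design and all function names are hypothetical and do not come from the cited articles. It enumerates a fixed-size design without replacement on four units, using exact fractions, and computes the expansion estimator of the total, the variance estimator in the Horvitz and Thompson form, and the fixed-size variant attributed above to Yates & Grundy (1953).

```python
from fractions import Fraction as F

# Hypothetical toy population (values invented for illustration).
y = [F(1), F(2), F(4), F(7)]
N = len(y)

# Hypothetical fixed-size design (n = 2): each pair of units has a probability.
design = {
    (0, 1): F(5, 100), (0, 2): F(10, 100), (0, 3): F(15, 100),
    (1, 2): F(15, 100), (1, 3): F(20, 100), (2, 3): F(35, 100),
}
assert sum(design.values()) == 1


def incl(k, l):
    """Joint inclusion probability pi_kl; incl(k, k) is the first-order pi_k."""
    return sum(p for s, p in design.items() if k in s and l in s)


pi = [incl(k, k) for k in range(N)]


def delta(k, l):
    """Covariance of the sample membership indicators of units k and l."""
    return incl(k, l) - pi[k] * pi[l]


def expansion_total(s):
    """Expansion estimator: sampled values weighted by 1 / pi_k."""
    return sum(y[k] / pi[k] for k in s)


def ht_variance_estimator(s):
    """Variance estimator in the Horvitz-Thompson form (needs all pi_kl > 0)."""
    return sum(delta(k, l) / incl(k, l) * y[k] * y[l] / (pi[k] * pi[l])
               for k in s for l in s)


def syg_variance_estimator(s):
    """Fixed-size variant, written in terms of differences of expanded values."""
    return -F(1, 2) * sum(
        delta(k, l) / incl(k, l) * (y[k] / pi[k] - y[l] / pi[l]) ** 2
        for k in s for l in s if k != l)


def expectation(f):
    """Expectation of a sample statistic under the sampling design."""
    return sum(p * f(s) for s, p in design.items())


true_total = sum(y)
true_variance = sum(delta(k, l) * y[k] * y[l] / (pi[k] * pi[l])
                    for k in range(N) for l in range(N))

for s in design:
    print(s, float(expansion_total(s)),
          float(ht_variance_estimator(s)), float(syg_variance_estimator(s)))
```

In this particular design, both variance estimators are unbiased, but the first takes a negative value on the sample (0, 1), whereas the fixed-size variant stays nonnegative on every sample because all the off-diagonal `delta(k, l)` are negative here.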
The theory of survey sampling, which makes abundant use of probability calculus, attracted the attention of university statisticians, who very quickly reviewed every aspect of the theory that is of mathematical interest. A coherent mathematical theory of survey sampling was constructed. The statisticians very quickly came up against a difficult problem: surveys with finite populations. The proposed model postulated the identifiability of the units. This component of the model makes reduction by sufficiency and the maximum likelihood method irrelevant. Godambe (1955) stated that there is no optimal linear estimator. This result is one of many pieces of evidence showing the impossibility of defining optimal estimation procedures for general sampling designs in finite populations. Next, Basu (1969) and Basu & Ghosh (1967) demonstrated that reduction by sufficiency is limited to suppressing the information on the multiplicity of the units, which makes the method of no practical use. Several approaches were examined, including one from decision theory. New properties, such as hyperadmissibility (see Hanurav, 1968), were defined for estimators applicable in finite populations.
A purely theoretical school of survey sampling developed rapidly. This theory attracted the attention of researchers specializing in mathematical statistics, such as Debabrata Basu, who was interested in the specifics of the theory of survey sampling. However, many of the proposed results were theorems on the nonexistence of optimal solutions. Research on the foundations of inference in survey theory became so important that it was the subject of a symposium in Waterloo, Canada, in 1971. At this symposium, the contribution of Calyampudi Radhakrishna Rao (1971, p. 178) began with a very pessimistic statement:
I may mention that in statistical methodology, the existence of uniformly optimum procedures (such as UMV unbiased estimator, uniformly most powerful critical region for testing a hypothesis) is a rare exception rather than a rule. That is the reason why ad hoc criteria are introduced to restrict the class of procedures in which an optimum may be sought. It is not surprising that the same situation is obtained in sampling for a finite situation. However, it presents some further complications which do not seem to exist for sampling from infinite populations.
This introduction announced the direction of current research.
In survey sampling theory, there is no theorem showing the optimality of an estimation procedure for general sampling designs. Optimal estimation methods can only be found by restricting attention to particular classes of procedures. Even if one limits oneself to a particular class of estimators (such as the class of linear or unbiased estimators), it is not possible to obtain interesting results. One possible way out of this impasse is to change the formalization of the problem, for example by assuming that the population itself is random.
1.7 Modeling the Population
The absence of tangible general results concerning certain classes of estimators led to the development of population modeling by means of a so-called “superpopulation” model. In this model‐based approach, it is assumed that the values taken by the variable of interest on the observation units of the population are realizations of random variables. The superpopulation model defines a class of distributions to which these random variables are assumed to belong. The sample then results from a double random experiment: a realization of the model that generates the population, followed by the selection of the sample. The idea of modeling the population was present in Brewer (1963a), but it was developed by Royall (1970b, 1971, 1976b) (see also Valliant et al., 2000; Chambers & Clark, 2012).
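The double random experiment can be made concrete with a short simulation. This is a minimal sketch under assumptions that are not in the text: the superpopulation model is taken to be a ratio-type model in which each value is proportional to a known auxiliary value plus Gaussian noise, the sample is drawn by simple random sampling without replacement, and all names and numbers are hypothetical.

```python
import random


def generate_population(x, beta, sigma, rng):
    """Stage 1: one realization of the (assumed) superpopulation model.

    Each y_k is drawn as beta * x_k plus noise with variance sigma^2 * x_k.
    """
    return [beta * xk + rng.gauss(0.0, sigma * xk ** 0.5) for xk in x]


def select_sample(population_size, n, rng):
    """Stage 2: simple random sampling without replacement of n unit labels."""
    return sorted(rng.sample(range(population_size), n))


rng = random.Random(12345)  # fixed seed so the run is reproducible
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical auxiliary values

y = generate_population(x, beta=2.0, sigma=0.5, rng=rng)  # random population
sample = select_sample(len(y), n=3, rng=rng)              # random sample
observed = [(k, y[k]) for k in sample]                    # what the survey sees
print(observed)
```

Design-based inference treats `y` as fixed and only `sample` as random, whereas a model-based analysis treats `y` as random as well.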
Drawing on the fact that the random sample is an “ancillary” statistic, Royall proposed to work conditionally on it. In other words, he considered that once the sample is selected, the choice of units is no longer random. This new modeling allowed the development of a distinct school of research. The model must express a known and previously accepted relationship. According to Royall, if the superpopulation model “adequately” describes the population, the inference can be conducted with respect to the model alone, conditional on the selected sample. The use of the model then allows us to determine an optimal estimator.
One can object that a model is always an approximate representation of the population. However, the model is not built to be tested against data but to “assist” the estimation. If the model is correct, then Royall's method will provide a powerful estimator. If the model is wrong, the bias may be so large that the confidence intervals built for the parameter are not valid. This is essentially the critique stated by Hansen et al. (1983).
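The effect of a wrong model can be illustrated with a small calculation. The sketch below is an assumption-laden example, not a method taken from the text: it supposes a ratio-type working model in which the model mean of each value is proportional to a known auxiliary value, uses the classical ratio predictor of the total, and conditions on one fixed, hypothetical sample. Because the predictor is linear in the observed values for a fixed sample, its model bias can be computed by plugging in the model means.

```python
def ratio_predictor(y_sample, x_sample, x_total):
    """Ratio predictor of the population total under the working model."""
    return x_total * sum(y_sample) / sum(x_sample)


def model_bias(x, sample, alpha, beta):
    """Model expectation of (predictor - total), conditional on the sample.

    The true model mean of y_k is alpha + beta * x_k; the working model
    behind the ratio predictor corresponds to alpha = 0.
    """
    mean_y = [alpha + beta * xk for xk in x]
    predicted = ratio_predictor(
        [mean_y[k] for k in sample], [x[k] for k in sample], sum(x))
    return predicted - sum(mean_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical auxiliary values
sample = [5, 6, 7]                            # fixed sample: the largest units

bias_if_model_holds = model_bias(x, sample, alpha=0.0, beta=2.0)
bias_if_model_wrong = model_bias(x, sample, alpha=3.0, beta=2.0)
print(bias_if_model_holds, bias_if_model_wrong)
```

When the working model holds, the conditional bias is zero. With an intercept that the working model ignores, the same predictor on the same sample is biased, here by 3 × 8 × (4.5/7 − 1), roughly −8.57, and no amount of design randomization enters the calculation.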
The debate is interesting because the arguments are not in the domain of mathematical statistics. Mathematically, these two theories are obviously correct. The argument concerns how well the formalization matches reality and is therefore necessarily external to the mathematical aspect of statistical development. In addition, the modeling proposed by Royall is quite specific. Above all, it makes it possible to break a theoretical impasse and therefore to provide optimal estimators. However, the relevance of modeling is questionable and will be judged in completely different ways depending on whether one takes the arguments of sociology, demography or econometrics, three disciplines that are intimately related to the methodology of statistics. A comment from Dalenius (see Hansen et al., 1983, p. 800) highlights this problem:
That is not to say that the arguments for or against parametric inference in the usual statistical theory are not of interest in the context of the theory of survey sampling. In our assessment of these arguments, however, we must pay attention to the relevant specifics of the applications.
According to Dalenius, it is therefore in the discipline in which the theory of survey sampling is applied that useful conclusions should be drawn concerning the adequacy of a superpopulation model.
The statistical theory of surveys mainly applies in official statistics institutes. These institutes do not develop a science but have a mission from their states. There is a fairly standard argument by the heads of national