A Poisson model for coverage problems with an application in genomic research
Changxuan Mao and Bruce G. Lindsay
Suppose a population has infinitely many individuals and is partitioned into $N$ disjoint classes. For any $k$, the abundance-$k$ coverage of a random sample from the population is defined as the sum of the proportions of the classes that contribute exactly $k$ individuals to the sample. The sample coverage is the sum of the proportions of the classes that contribute at least one individual to the sample. The asymptotic distribution of the abundance-$k$ coverage is derived under a Poisson model. A new derivation of Turing's well-known estimators is presented; it shows that these estimators are sensible if $N$ is large enough. As an application, a gene classification problem in genomic research is addressed. Since Turing's approach is a method-of-moments estimation, maximum likelihood estimation is presented as an alternative approach to the coverage problem. Finally, we show that any Turing-type estimator is asymptotically fully efficient among a class of estimators satisfying the regularity conditions defined by Tierney and Lambert.
Key Words: Abundance-$k$ coverage; sample coverage; number of species; Poisson process.
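The definitions in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes that class $i$ with proportion $p_i$ contributes an independent Poisson count with mean $t p_i$, and it uses the classical Turing form $(k+1)\,n_{k+1}/n$, where $n_j$ is the number of classes observed exactly $j$ times and $n$ is the observed sample size; the paper's own normalization may differ. The function names, the values of `N` and `t`, and the exponential weights used to generate the proportions are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def abundance_coverage(props, counts, k):
    """True abundance-k coverage: total proportion of the classes seen exactly k times."""
    return sum(p for p, x in zip(props, counts) if x == k)


def turing_estimate(counts, k):
    """Classical Turing-type estimate (k + 1) * n_{k+1} / n of the abundance-k coverage."""
    n = sum(counts)                 # observed sample size
    freq_of_freq = Counter(counts)  # n_j = number of classes seen exactly j times
    return (k + 1) * freq_of_freq[k + 1] / n


rng = random.Random(0)
N, t = 2000, 3000.0  # hypothetical number of classes and expected sample size
weights = [rng.expovariate(1.0) for _ in range(N)]
total = sum(weights)
props = [w / total for w in weights]                # class proportions, summing to 1
counts = [poisson_draw(rng, t * p) for p in props]  # independent Poisson counts

# Compare the true abundance-k coverage with its Turing-type estimate for small k.
for k in range(3):
    print(k, round(abundance_coverage(props, counts, k), 4), round(turing_estimate(counts, k), 4))

# Sample coverage is the complement of the abundance-0 coverage.
sample_coverage = 1.0 - abundance_coverage(props, counts, 0)
print("sample coverage:", round(sample_coverage, 4))
```

In this sketch the true coverage can be computed only because the simulated proportions are known; in practice only `counts` for the observed classes would be available, which is why an estimator built from the frequency counts $n_j$ is of interest.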