Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design
BARRATT BJ., PAYNE F., RANCE HE., NUTLAND S., TODD JA., CLAYTON DG.
Genotyping costs still preclude analysis of a comprehensive SNP map in thousands of individual subjects in the search for disease susceptibility loci. Allele frequency estimation in DNA pools from cases and controls offers a partial solution, but variance in these estimates will result in some loss of statistical power. However, there has been no systematic attempt to quantify the several sources of error in previous studies. We report an analysis of the magnitude of variance components of each experimental stage in DNA pooling studies, and find that a design based on the formation of numerous small pools of approximately 50 individuals is superior to the formation of fewer, larger pools and the replication of any of the experimental stages. We conclude that this approach may retain an effective sample size greater than 68% of the true sample size, whilst offering a 60‐fold reduction in DNA usage and a greater than 30‐fold saving in cost, compared to individual genotyping. The possibility of combining pooling with informed selection of haplotype tag SNPs is also considered. In this way further savings in efficiency may be possible by using pooled allele frequency estimates to infer haplotype frequencies and hence, allele frequencies at untyped markers.