We thank the editor Francisco Samaniego and an associate editor for organizing this stimulating discussion, with a conscientious effort to invite outstanding researchers from diverse backgrounds, which makes the discussion more thought-provoking. We are also very grateful to all discussants for their insightful and stimulating comments, which touch on practical, methodological, and theoretical aspects of microarray designs, experiments, normalization, analysis, and applications and offer original insights and outlooks. Their contributions are very timely and helpful.
The last couple of years have brought an explosion of statistical techniques for the design and analysis of microarray data. They cover the design of microarray experiments (Kerr and Churchill 2001; Yang and Speed 2002), normalization of microarray data (Tseng, Oh, Rohlin, Liao, and Wong 2001; Dudoit et al. 2002; Fan, Tam, Vande Woude, and Ren 2004; Huang, Wang, and Zhang 2003), the expression indices of Affymetrix oligonucleotide arrays (Li and Wong 2001; Irizarry et al. 2003a), significance analysis of gene expression (Tseng et al. 2001; Tusher, Tibshirani, and Chu 2001; Lonnstedt and Speed 2002; Fan et al. 2004), classification and clustering (Tibshirani, Hastie, Narasimhan, and Chu 2003; Zhang, Yu, and Singer 2003), and time-course experiments for the expression pathways (Svrakic, Nesic, Dasu, Herndon, and Perez-Polo 2003), among others. (For an overview of the subject, see Sebastiani, Gussoni, Kohane, and Ramoni 2003; Speed 2003; Parmigiani, Garrett, Irizarry, and Zeger 2003.) They have revived a surge of interest in multiple testing problems (Dudoit, Shaffer, and Boldrick 2003; Storey 2003; Donoho and Jin 2004; Storey, Taylor, and Siegmund 2004; Efron 2004). They exemplify the interactions between statistics and the sciences, tackling problems of high societal impact. All of the discussants call for more statistical understanding of the various procedures in use. We agree wholeheartedly and contributed the article under discussion in the hope that it will stimulate more statisticians to work in this area. The discipline of statistics should grow stronger when it provides methodologies that address issues of the highest societal importance while at the same time offering a foundational understanding of those methodologies that pushes theory, methods, and applications forward.
1. REPLICATIONS OF cDNA GENES
Normalization is a critical step in removing possible systematic biases that arise in microarray experiments. The experimental process is usually complicated, and the biases are hard to quantify. The ideal situation for assessing the systematic biases is to use within-array replications, in which all experimental conditions are the same except for the locations of the replicated genes. Hence the observed differences of expression for two identical clones in the same array are due to random noise and possible biases. The idea behind our approach is to extract the biases from those duplicated pairs of genes.
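The reasoning above can be illustrated with a small simulation. The sketch below is not the estimation procedure of the article under discussion; it only assumes, for illustration, that each observed log-ratio is the sum of a true gene effect, an intensity-dependent bias, and random noise. The function name, the quadratic bias curve, the intensity range, and the noise level are all hypothetical choices. Under that assumption, the true gene effect cancels in the difference between two spots of the same clone, so the difference carries only the bias contrast and noise.

```python
import numpy as np


def assumed_bias(log_intensity):
    """Hypothetical intensity-dependent bias curve (illustrative only)."""
    return 0.05 * (log_intensity - 10.0) ** 2


def simulate_duplicate_differences(n_pairs=200, noise_sd=0.1, seed=0):
    """Simulate duplicated clones printed at two locations on one array.

    Returns the observed difference of log-ratios within each pair and the
    bias contrast that (together with noise) generated it.
    """
    rng = np.random.default_rng(seed)
    # True log-ratio of each replicated clone; identical for both spots.
    alpha = rng.normal(0.0, 1.0, size=n_pairs)
    # Log-intensities observed at the two spot locations (assumed range).
    x1 = rng.uniform(6.0, 14.0, size=n_pairs)
    x2 = rng.uniform(6.0, 14.0, size=n_pairs)
    # Observed log-ratios: true effect + bias + independent noise.
    y1 = alpha + assumed_bias(x1) + rng.normal(0.0, noise_sd, size=n_pairs)
    y2 = alpha + assumed_bias(x2) + rng.normal(0.0, noise_sd, size=n_pairs)
    # The true effect alpha cancels in the within-pair difference.
    observed_diff = y1 - y2
    bias_contrast = assumed_bias(x1) - assumed_bias(x2)
    return observed_diff, bias_contrast


if __name__ == "__main__":
    diff, contrast = simulate_duplicate_differences()
    residual = diff - contrast
    print("corr(observed difference, bias contrast):",
          np.corrcoef(diff, contrast)[0, 1])
    print("sd of residual (pure noise):", residual.std(ddof=1))
```

In this toy setting the within-pair differences track the bias contrast closely, and what remains after subtracting it is noise alone, which is why duplicated spots are informative about systematic bias even when the true expression levels are unknown.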
We are very grateful to Professor Craig for his careful description of the process of fabricating slides and to Professor Sabatti for her convincing arguments for the need for within-slide replications. Sabatti is correct that detected distortion (biases) in cDNA microarrays should be taken more seriously. Greater understanding of the basis of the biases should facilitate technological improvements. The degree of distortion can be better understood when two identical tissues are compared using cDNA microarray experiments. We discuss this issue further in Section 3 of this rejoinder. Sabatti also raised the question of how many replicates are needed at the stage of designing cDNA microarrays. The answer depends on the complexity of the models that statisticians would like to use and on the expense of appropriately replicating some of the genes. She is right that a natural model that takes care of print-tip effects would include one extra parameter per print tip, because of the technological process used in spotting microarrays. Our asymptotic theory continues to apply, and our asymptotic formulas provide useful guidance for choosing the number of replicated genes. Because the number of print tips is usually large, aggregating information from other arrays is needed and makes it possible to obtain reasonable estimates of the print-tip effect. Validation tests in the next section should also be useful for checking whether systematic biases have been successfully removed. Statistical techniques should also be calibrated with biological experiments to achieve better approximations and understanding.
Craig is right that duplicated spots are very helpful for normalizing expressions of multiple arrays and assessing the effectiveness of a normalization procedure. He expresses some concern that duplication is worthwhile only if its benefits outweigh its costs. We agree with such a careful attitude but are far more optimistic about the feasibility of within-array duplications.
First, printing a couple hundred duplicated spots in an array of 20,000 spots does not take up a large percentage of space. For a wide class of biological and biomedical problems, many genes are of little biological interest. Replacing them with duplicated spots enables biologists and statisticians to reduce biases in multiple-array comparisons and to verify certain biological claims. Second, the cost of printing duplicated spots should not be excessive. Once a template is designed, cDNA microarrays can be...