Semilinear high-dimensional model for normalization of microarray data: a theoretical analysis and partial consistency; Comment.

Publication: Journal of the American Statistical Association
Publication Date: 01-SEP-05

Article Excerpt
Normalization is a critical component in microarray data analysis. Its purpose is to remove systematic biases in the observed expression values and to establish baseline intensity ratios across the whole dynamic range. Many researchers have considered this problem (see, e.g., Chen, Dougherty, and Bittner 1997; Kerr, Martin, and Churchill 2000; Yang et al. 2002; Yang, Dudoit, Luu, and Speed 2001; Tseng, Oh, Rohlin, Liao, and Wong 2001; Park et al. 2003). In particular, Fan, Tam, Vande Woude, and Ren (2004) proposed a semilinear in-slide model (SLIM) method that makes use of replications of a subset of genes in an array. In the present interesting and stimulating article, Fan, Peng, and Huang generalized the SLIM method to account for across-array information, resulting in an aggregated SLIM, so that replication within an array is no longer required. A focus of the article is the efficient estimation and calculation of semiparametric information for block effects in the case of fixed numbers of replications and arrays where the gene effects cannot be estimated consistently. This elegant result is a significant contribution to the semiparametric estimation theory, because the existing theory deals mainly with the case where the "nuisance parameters" can be consistently estimated.

We have proposed a two-way semilinear model (TW-SLM) for normalization and analysis of cDNA microarray data (Huang, Kuo, Koroleva, Zhang, and Soares 2003; Huang, Wang, and Zhang 2003; Huang and Zhang 2003). Three main features distinguish the TW-SLM from existing methods such as global and lowess normalization. First, normalization for each array in the TW-SLM is based on pooled information from all of the arrays. Second, the TW-SLM normalization curves and the gene effect parameters are estimated simultaneously in a single regression model. Each TW-SLM normalization curve does not attempt to fit the data from an individual array; rather, it fits the data after gene effects are adjusted for. This is in contrast to the lowess method, which estimates the normalization curves without adjusting for gene effects. Because differentially expressed genes tend to pull the normalization curve toward themselves, the lowess method may incorrectly "normalize" such genes and thereby lose power for detecting them. Third, in the framework of the TW-SLM, the uncertainty due to normalization is taken into account in the estimation of the standard errors of gene effects. The models proposed by Fan et al. (2004) and by Fan, Peng, and Huang and the TW-SLM deal with the same problem with philosophically similar approaches, but our studies focus on different aspects of the problem and present orthogonal theoretical results. Thus we especially appreciate the opportunity to comment on this article. Here we give a brief description of the TW-SLM and some of its extensions, and discuss their relationship to the SLIM and its aggregations.
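The second feature, fitting each normalization curve only after gene effects are adjusted for, can be illustrated with a rough backfitting sketch. This is not the authors' estimator. The excerpt ends before the model is written out, so several things here are assumptions made purely for illustration: the additive form (log-ratio = array-specific curve of intensity + gene effect + noise), the cubic polynomial used as a stand-in smoother, the zero-mean constraint on gene effects, and the hypothetical name `fit_two_way_sketch`.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fit_two_way_sketch(x, y, degree=3, n_iter=50):
    """Alternate between array-wise curves and gene effects.

    x, y : arrays of shape (n_arrays, n_genes); y holds log-intensity
           ratios and x the intensity covariate (assumed layout).
    Assumed model (not taken from the source): y[i, j] = f_i(x[i, j]) + beta[j] + noise.
    """
    n_arrays, n_genes = y.shape
    beta = np.zeros(n_genes)      # gene effects, start at zero
    curves = np.zeros_like(y)     # fitted f_i(x[i, j])

    for _ in range(n_iter):
        # Step 1: for each array, smooth the gene-adjusted data.
        # A cubic polynomial stands in for a proper smoother here.
        for i in range(n_arrays):
            coef = P.polyfit(x[i], y[i] - beta, degree)
            curves[i] = P.polyval(x[i], coef)

        # Step 2: update each gene effect from the curve-adjusted data,
        # pooling information across all arrays.
        beta = (y - curves).mean(axis=0)

        # Assumed identifiability constraint: gene effects average to zero.
        beta -= beta.mean()

    return beta, curves
```

A plain array-by-array fit would skip the `- beta` adjustment in step 1, which is the situation the paragraph above describes: strongly expressed genes then influence the very curve that is used to normalize them.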

1. THE TWO-WAY SEMILINEAR MODEL

Suppose that there are J genes and n arrays in the study and that each gene is spotted once in an array. Let $u_{ij}$ and $v_{ij}$ be the intensity levels of gene j in array i from the type 1 and type 2 samples. Let $y_{ij}$ be the log-intensity ratio of the jth gene in the ith array, and let $x_{ij}$ be...
