...process variables; (ii) estimate functional model parameters from historical data; and (iii) formulate the optimization model to minimize the distance between the predicted and target quality, and solve for the optimal settings of the process variables.
The most important step in this approach is building a good functional model, because optimizing a poorly estimated function may yield a solution far from the best one, as illustrated in Fig. 1 (see Vining and Bohn, 1998; Xu and Albin, 2003). In Fig. 1, the curve on the right represents the true but unknown quality function f(x), and the curve on the left represents the estimated quality function g(x). Let c* be the optimal setting of the process variables and Target the desired value of quality. If we optimize the estimated function g(x), which differs substantially from the true function f(x), we obtain the solution x_f, whose true quality response f(x_f) is significantly higher than the target.
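Steps (i) to (iii) and the failure mode of Fig. 1 can be reproduced with a toy calculation. Everything in it is a hypothetical assumption rather than data from the paper: the true quality function is taken to be f(x) = x², the Target is 4 (so c* = 2), and the historical data are noise-free observations at x = 1, 2, 3, 4. The estimated model g(x) is a straight line fitted by ordinary least squares, and the helper names true_quality and fit_line are illustrative only.

```python
# Hypothetical illustration of steps (i)-(iii) with a misspecified model.


def true_quality(x: float) -> float:
    """Assumed true (in practice unknown) quality function f(x) = x^2."""
    return x * x


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of g(x) = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


target = 4.0
xs = [1.0, 2.0, 3.0, 4.0]              # historical settings (hypothetical)
ys = [true_quality(x) for x in xs]     # observed quality, no noise

a, b = fit_line(xs, ys)                # step (ii): estimate g(x) = a + b*x
# Step (iii): for a linear g, minimizing (g(x) - Target)^2 reduces to solving g(x) = Target.
x_f = (target - a) / b
print(f"g(x) = {a:.2f} + {b:.2f}x")
print(f"x_f = {x_f:.2f}, f(x_f) = {true_quality(x_f):.2f}, target = {target}")
```

The fitted line is g(x) = −5 + 5x, so solving g(x) = Target gives x_f = 1.8, where the true quality is f(1.8) = 3.24 rather than 4. Because the data contain no noise, the entire miss comes from the wrong functional form.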
Unfortunately, in practice it is very hard to discover the true functional form and to estimate its parameters accurately, because the true relationship between quality and process variables may be complex, the process variables may be correlated, or the historical data may contain outliers.
To find optimal settings of the input variables without building a functional model, Friedman and Fisher (1999) suggest an alternative, the Patient Rule Induction Method (PRIM), which searches the historical data directly. PRIM starts with a hypercube, called a box, that contains all observations. It then iteratively peels off a fraction α of the observations from one "side" of the box, choosing the side so that the density of good observations in the reduced box increases. In the context of this paper, good observations are process settings that lead to better quality. Peeling continues until the proportion of all observations remaining in the box falls below a stopping parameter β, say 0.05.
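The peeling loop just described can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Friedman and Fisher's implementation: the function name prim_peel is hypothetical, the objective is the mean response in the box (for 0/1 labels this equals the density of good observations), each candidate peel cuts at the α- or (1 − α)-quantile of one variable, and the pasting step and other refinements of the full method are omitted.

```python
import numpy as np


def prim_peel(X, y, alpha=0.05, beta=0.05):
    """Simplified PRIM-style top-down peeling (illustrative sketch only).

    Each candidate peel removes the in-box observations below the alpha-quantile
    (low side) or above the (1 - alpha)-quantile (high side) of one variable.
    The candidate with the highest mean response is kept. Peeling stops when
    the next box would contain fewer than beta * n observations.
    Returns the sequence of boxes as (lower bounds, upper bounds, mean y).
    """
    n, d = X.shape
    inside = np.ones(n, dtype=bool)
    lower, upper = X.min(axis=0).astype(float), X.max(axis=0).astype(float)
    boxes = [(lower.copy(), upper.copy(), y.mean())]
    while True:
        best = None  # (mean response, keep mask, variable index, side, cut point)
        for j in range(d):
            xj = X[inside, j]
            for side, cut in (("low", np.quantile(xj, alpha)),
                              ("high", np.quantile(xj, 1 - alpha))):
                keep = inside & ((X[:, j] >= cut) if side == "low" else (X[:, j] <= cut))
                if keep.sum() in (0, inside.sum()):  # peel removed nothing or everything
                    continue
                m = y[keep].mean()
                if best is None or m > best[0]:
                    best = (m, keep, j, side, cut)
        if best is None or best[1].sum() < beta * n:
            return boxes
        m, inside, j, side, cut = best
        if side == "low":
            lower[j] = cut
        else:
            upper[j] = cut
        boxes.append((lower.copy(), upper.copy(), m))


# Hypothetical data: observations are "good" (y = 1) only when x1 > 0.7 and x2 < 0.3.
rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))
y = ((X[:, 0] > 0.7) & (X[:, 1] < 0.3)).astype(float)
boxes = prim_peel(X, y)
lo, hi, mean_y = boxes[-1]
print(len(boxes), lo.round(2), hi.round(2), round(float(mean_y), 2))
```

On these simulated data the final box should close in on the region x1 > 0.7, x2 < 0.3, with a density of good observations near one.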
The peeling parameter α, which controls the number of observations removed at each iteration, must be small, typically 0.05 ≤ α ≤ 0.1. Peeling off only a few observations per iteration produces a long sequence of boxes, so each individual peel matters less in determining the final box, and unfortunate peels that remove good observations can be mitigated in later steps. These numerous small peels give the method its "patient" strategy and distinguish it from other rule discovery algorithms such as CN2 (Clark and Niblett, 1989), FOIL (Quinlan, 1990), RIPPER (Cohen, 1995), Data Surveyor (Holsheimer et al., 1996), CART (Breiman et al., 1984), and C4.5 (Quinlan, 1994, 1995). A detailed discussion of PRIM can be found in Friedman and Fisher (1999).
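A back-of-the-envelope count shows why a small α yields a long sequence of boxes. Assuming each peel removes exactly a fraction α of the observations currently in the box, the box retains a fraction (1 − α)^k of the data after k peels. The hypothetical helper peels_needed counts the peels until this fraction drops below β; α = 0.5 is included only as an aggressive contrast.

```python
def peels_needed(alpha: float, beta: float) -> int:
    """Peels until the box holds fewer than a fraction beta of all observations,
    assuming every peel removes exactly a fraction alpha of the current box."""
    frac, k = 1.0, 0
    while frac >= beta:
        frac *= 1.0 - alpha
        k += 1
    return k


for alpha in (0.05, 0.10, 0.50):
    print(alpha, peels_needed(alpha, beta=0.05))
```

With β = 0.05, α = 0.05 gives 59 peels and α = 0.1 gives 29, whereas α = 0.5 reaches the stopping size after only 5, so a single bad peel weighs far more in the aggressive case.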
[FIGURE 1 OMITTED]
PRIM has recently been applied successfully in various areas such as geology (Friedman and Fisher, 1999), marketing (Friedman and Fisher, 1999), finance (Becker and Fahrmeir, 2001), medicine (Kehl and Ulm, 2006), bioinformatics (Cole et al., 2003; Liu et al., 2004), and process optimization (Chong and Jun, 2005a). An embedded assumption in PRIM, whose boxes are aligned with the original variable axes, is that the variables are uncorrelated. Thus, to apply PRIM successfully to process optimization, where the variables are correlated, the method must be modified.
The purpose of this paper is to develop a new PRIM-like method that accounts for the correlation structure of process data and determines the optimal settings of process variables from historical data without an explicit functional model. First, the proposed method creates new variables from the process variables, called latent variables, by applying Partial Least Squares (PLS) to the historical data. These latent variables are uncorrelated, and each is a linear combination of the process variables. Typically, there are far fewer latent variables than process variables. For details of PLS, see Geladi and Kowalski (1986a, b) and Wold et al. (2001). Second, boxes are obtained by applying PRIM to the data projected onto the latent variables. Finally, the optimal settings of the process variables are determined from the optimal box in the latent variable space.
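One standard way to compute the latent variables for a single quality variable is the NIPALS algorithm for PLS1. The text does not state which PLS algorithm is used, so this is an assumed stand-in; the function name pls1_nipals, the simulated data, and the choice of two components are hypothetical. The sketch checks the two properties claimed above: the scores are uncorrelated, and each latent variable is a linear combination of the centered process variables, T = (X − x̄)R.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """PLS1 via NIPALS on mean-centered data (textbook sketch).

    Returns scores T (n x A), weights W, loadings P (d x A), the rotation
    R = W (P^T W)^-1 such that T = (X - x_mean) @ R, and x_mean.
    """
    x_mean = X.mean(axis=0)
    E, f = X - x_mean, y - y.mean()
    n, d = X.shape
    T = np.zeros((n, n_components))
    W = np.zeros((d, n_components))
    P = np.zeros((d, n_components))
    for a in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight: direction of maximal covariance with y
        t = E @ w                       # latent variable (score) for this component
        p = E.T @ t / (t @ t)           # X-loading
        q = f @ t / (t @ t)             # y-loading
        E = E - np.outer(t, p)          # deflate X so later scores are orthogonal to t
        f = f - q * t                   # deflate y
        T[:, a], W[:, a], P[:, a] = t, w, p
    R = W @ np.linalg.inv(P.T @ W)
    return T, W, P, R, x_mean


# Hypothetical process data: four correlated process variables driven by two hidden factors.
rng = np.random.default_rng(1)
z = rng.normal(size=(500, 2))
A = np.array([[1.0, 0.9, 0.0, 0.5],
              [0.0, 0.3, 1.0, -0.8]])
X = z @ A + 0.05 * rng.normal(size=(500, 4))
y = X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=500)

T, W, P, R, x_mean = pls1_nipals(X, y, n_components=2)
print(np.corrcoef(X.T).round(2))        # process variables: strongly correlated
print(np.corrcoef(T.T).round(2))        # latent variables: uncorrelated
print(np.allclose(T, (X - x_mean) @ R))
```

Because T = (X − x̄)R, a box defined on the scores corresponds, in principle, to linear constraints on the original process variables.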
Figure 2 illustrates the concept of the proposed method with one quality variable y and two process variables x_1 and x_2; the actual problems we address have one quality variable and many process variables. The process is assumed to be initially set at x_0 = (0, 0). The coordinates of each circle give the values of the process variables, and the number within the circle gives the quality. For example, when x_1 and x_2 are both zero, y is seven. Because the process variables are correlated and, owing to process variation, their observed values do not always equal x_0, the circles are scattered around x_0 in an ellipse. Let the Target for y be two, so the goal of this example is to find the settings of the process variables that make the quality variable equal to two. The optimal setting, denoted by c*, is assumed to be unknown.
[FIGURE 2 OMITTED]
The proposed method first creates uncorrelated latent variables (t_1, t_2) and then defines a box in the latent variable space (e.g., a ≤ t_1 ≤ b and c ≤ t_2 ≤ d) large enough to include all observations. Four candidate boxes are obtained by peeling from the top, bottom, left, or right of the box, and the one whose average quality is closest to the Target value is selected. This peeling procedure continues until the reduced box contains fewer than a proportion β (say, 0.05) of all observations. The final box is expected to include c*, and the optimal setting is then estimated from the final box.
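The target-seeking peeling in a two-dimensional latent space can be sketched as follows. Several details are assumptions, not taken from the text: the function name peel_to_target is hypothetical, each side peel removes the in-box observations beyond the α- or (1 − α)-quantile of that coordinate, the latent scores are simulated directly instead of being computed by PLS, and the optimal setting is estimated as the centroid of the observations in the final box, since the text does not say how that estimate is formed.

```python
import numpy as np


def peel_to_target(T, y, target, alpha=0.05, beta=0.05):
    """Target-seeking peeling in a 2-D latent space (t1, t2); illustrative sketch.

    Each step builds four candidate boxes by removing the in-box observations below
    the alpha-quantile (left/bottom) or above the (1 - alpha)-quantile (right/top)
    of t1 or t2, picks the candidate whose mean y is closest to `target`, and stops
    when that box would contain fewer than beta * n observations.
    Returns (lower bounds, upper bounds, in-box mask).
    """
    n = len(y)
    lo, hi = T.min(axis=0).astype(float), T.max(axis=0).astype(float)
    inside = np.ones(n, dtype=bool)                 # box initially holds every observation
    while True:
        candidates = []
        for j in (0, 1):                            # j = 0: t1 axis, j = 1: t2 axis
            tj = T[inside, j]
            q_lo, q_hi = np.quantile(tj, alpha), np.quantile(tj, 1 - alpha)
            candidates.append((j, "lo", q_lo, inside & (T[:, j] >= q_lo)))  # peel left/bottom
            candidates.append((j, "hi", q_hi, inside & (T[:, j] <= q_hi)))  # peel right/top
        # Discard peels that remove nothing (possible with tied values).
        candidates = [c for c in candidates if c[3].sum() < inside.sum()]
        if not candidates:
            return lo, hi, inside
        j, side, cut, keep = min(candidates, key=lambda c: abs(y[c[3]].mean() - target))
        if keep.sum() < beta * n:                   # reduced box would be too small: stop
            return lo, hi, inside
        inside = keep
        if side == "lo":
            lo[j] = cut
        else:
            hi[j] = cut


# Hypothetical latent scores and quality values (not the data of Fig. 2).
rng = np.random.default_rng(2)
T = rng.normal(size=(500, 2))
y = 7.0 + 3.0 * T[:, 0] + 2.0 * T[:, 1] + 0.5 * rng.normal(size=500)
lo, hi, inside = peel_to_target(T, y, target=2.0)
print("box:", lo.round(2), hi.round(2), "n =", int(inside.sum()),
      "mean y =", round(float(y[inside].mean()), 2))
# Assumption: estimate the optimal latent setting as the centroid of the final box's observations.
t_hat = T[inside].mean(axis=0)
print("estimated setting (t1, t2):", t_hat.round(2))
```

Choosing the candidate closest to the Target, rather than the one with the highest mean as in the original PRIM, lets the box move toward a quality value in either direction.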
In real situations, process engineers often change the initial setting several times to improve quality, so the historical data can contain clusters around these settings, and c* may not...