Failure event prediction using the Cox proportional hazard model driven by frequent failure signatures.

Publication: IIE Transactions
Publication Date: 01-MAR-07
Author: Li, Zhiguo ; Zhou, Shiyu ; Choubey, Suresh ; Sievenpiper, Crispian

Article Excerpt
1. Introduction

The servicing of equipment (e.g., medical equipment, photocopy machines and computer hardware) is moving from reactive firefighting to preventive (proactive) maintenance. Reactive servicing is expensive and results in equipment downtime, which negatively affects customer satisfaction and customer profitability. Therefore, emphasis is now being placed on predicting machine faults with sufficient lead time before actual failure to allow a preventive repair action to be scheduled.

Error/event logs and system performance data can be used to determine preventive maintenance cycles that allow downtime to be avoided. The prediction of machine failure requires a formal framework to specify causal links between failure modes and failure indicators (failure signatures). These indicators can be generated from the error and event sequences, i.e., a series of events marked with their occurrence times, logged in the system's log files.

For example, a system error/event log file for a Computerized Tomography (CT) machine can consist of several thousand records associated with several hundred different event types and their occurrence times during machine usage. The recorded events can be related to various machine activities and behaviors, system failures, operator/user actions, or the status of a subsystem task. In practice, failure signatures are identified manually from event sequence data within a time frame that is specified by area experts based on experience and the physical operating principles of the system. Clearly, this is a time-consuming and labor-intensive method.

A simple case of such an event sequence is illustrated in Fig. 1. In this figure, A, B, C and K are the different event types that occur at various points along the time line. Hereafter we let K represent the key failure event we are interested in, and in most cases event K occurs recurrently in the event sequence as shown in Fig. 1.

The event sequence contains considerable system information which can be used to monitor and diagnose faults in the process, or to predict the future behavior of the process, say, the occurrence of some event(s) of interest. For instance, by analyzing the log file of a CT imaging system, service engineers can identify a frequently occurring failure signature (event sequence segment) consisting of five events, with the last event representing the scan hardware error. The last event in this signature is a failure event, whereas the first four events contained in the signature are called trigger events. Knowledge of this failure signature allows the root cause of a system failure to be identified, and thus creates the potential for opportunistic maintenance, for example, part replacement. Moreover, if the occurrence of a failure could be predicted based on the trigger events, then preventive maintenance measures could be taken before the system breaks down, reducing the downtime cost.

[FIGURE 1 OMITTED]

In this paper, we are interested in building a statistical failure prediction model for a single event sequence based on failure signatures. Formally, an event sequence $S$ is a triple $(T_s^s, T_e^s, s)$ on a set of event types $E$, where $T_s^s$ and $T_e^s$ are the starting time and ending time, respectively, and $s = \langle (E_1, t_1^s), (E_2, t_2^s), \ldots, (E_m, t_m^s) \rangle$ is an ordered sequence of events such that $E_i \in E$ for all $i = 1, 2, \ldots, m$ and each $t_i^s$ is the occurrence time of the corresponding event, with $T_s^s \le t_1^s \le \cdots \le t_m^s \le T_e^s$ (Mannila et al., 1997). The problem of building a failure prediction model is formulated as follows: given the event sequence $S$ containing failure event K, how do we construct a statistical model that can predict the occurrence of system failure K, i.e., during what time interval and with what probability will the failure event K occur in the system?
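The formal definition above maps naturally onto a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `EventSequence`, its field names, and the example times are assumptions introduced here for illustration. It stores the starting time, the ending time and the ordered (event type, occurrence time) pairs, and checks the ordering constraint on the occurrence times.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EventSequence:
    """Hypothetical container for an event sequence: start, end, ordered events."""
    start: float                            # starting time of the sequence
    end: float                              # ending time of the sequence
    events: Tuple[Tuple[str, float], ...]   # ordered (event type, occurrence time) pairs

    def __post_init__(self):
        # Enforce start <= t_1 <= ... <= t_m <= end.
        bounded = [self.start] + [t for _, t in self.events] + [self.end]
        if any(a > b for a, b in zip(bounded, bounded[1:])):
            raise ValueError("occurrence times must be non-decreasing within [start, end]")

    def occurrences(self, event_type: str) -> List[float]:
        """Return the occurrence times of one event type, in order."""
        return [t for e, t in self.events if e == event_type]


# Made-up example: event types A, B, C and the recurring failure event K.
seq = EventSequence(
    start=0.0,
    end=100.0,
    events=(("A", 5.0), ("B", 9.0), ("K", 12.0), ("C", 40.0),
            ("A", 61.0), ("B", 63.0), ("K", 70.0)),
)
failure_times = seq.occurrences("K")
```

Keeping the events as an immutable, time-ordered tuple makes it cheap to scan the sequence once when searching for signatures that end in K.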

Some techniques to predict failure event(s) based on the analysis of event sequence data already exist. These methods can be roughly classified into design-based methods and data-driven rule-based methods. Design-based methods tend to be applied to logic fault diagnosis in automated manufacturing systems. In a design-based method, the expected event sequence is obtained from the system design and is compared with the observed event sequence. A system logic failure can be identified through this comparison. Sampath et al. (1994) and Chen and Provan (1997) proposed untimed and timed automata models to diagnose faults in automated systems. Untimed and timed Petri net models were developed by Valette et al. (1989) and Srinivasan and Jafari (1993) to represent the behavior of manufacturing systems and determine whether a fault has occurred. Time template models (Holloway and Chand, 1994; Holloway, 1996; Das and Holloway, 1996; Pandalai and Holloway, 2000) make use of timing and sequencing relationships of events, which are generated from either timed automata models (system design) or observations of manufacturing systems, to establish when events are expected to occur. Constructing any of the abovementioned models requires knowledge of the designed or expected event sequences of the system. The major disadvantage of these methods is that, in many cases, events occur randomly, so there is no predefined system design information and hence no knowledge of temporal relationships available.

In contrast with design-based methods, data-driven rule-based methods do not require system logic design information. Instead, they first identify the temporal patterns, i.e., the sequences of events that frequently occur, and then prediction rules are developed based on these patterns. Mannila et al. (1997) analyzed the event sequence data by identifying frequently occurring episodes (temporal patterns) through the "WINEPI" approach, in which computationally efficient algorithms are developed to identify frequent episodes and episode rules. In Klemettinen (1999), a method for recurrent pattern identification in alarm data for a telecommunications network was proposed to recognize episode rules. The technique of sequential pattern detection has also been applied to web log files by Agrawal (1996) and Xiao and Dunham (2001). Once the temporal patterns are identified, the time relationships among events in the pattern can be used to predict the occurrence of a failure event. To reach this goal, prediction rules, such as temporal association rules (Dunham, 2003) and episode rules (Mannila et al., 1997; Klemettinen, 1999), can be generated based on the identified temporal patterns. An example of a prediction rule based on a temporal pattern consisting of events A, B and K is:

IF the events A and B occur in the system

THEN the failure event K will occur

WITH [Time Interval] confidence (c%)

which means that if we observe events A and B occurring in the system, then we can predict that failure event K will occur within the time interval specified by [Time Interval] with a confidence of c%. If we try to predict the occurrence of a failure event, the prediction process begins by searching through the space of prediction rules generated from the identified temporal patterns. The available data-driven rule-based methods do not build rigorous statistical prediction models for event sequence data and thus they only provide heuristic prediction results. We would encounter the following two difficulties when using these rules for prediction.

1. Once temporal patterns are identified, the corresponding prediction rules are fixed along with their parameters, i.e., the values of [Time Interval] and confidence (c%) are fixed in the above prediction rule. If a different time interval is of interest, new temporal patterns need to be identified for the changed parameters. If we need to predict the occurrence of events of interest with varying parameters, the space of prediction rules could be very large for a long event sequence.

2. The prediction becomes more complicated, if not impossible, for the case in which different trigger event sets, say, $T_{r1}$ and $T_{r2}$, occur in the system. Now we have different rules based on different trigger event sets, therefore we will have different prediction results. It is hard for us to combine all the associated prediction rules together to reach a final conclusion.
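To make the IF/THEN/WITH rule and the first difficulty concrete, the sketch below estimates the confidence of such a rule empirically from a time-ordered event list. It is an illustration under stated assumptions, not an algorithm from the cited works: the function names, the `window` parameter (how close together the trigger events must occur) and the example data are all hypothetical, and observations cut off by the end of the log are not handled.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Event = Tuple[str, float]  # (event type, occurrence time)


def trigger_firings(events: Iterable[Event], triggers: Set[str], window: float) -> List[float]:
    """Times at which every trigger type has been seen within the last `window` time units."""
    last_seen: Dict[str, float] = {}
    firings: List[float] = []
    for etype, t in events:  # events are assumed to be sorted by time
        if etype not in triggers:
            continue
        last_seen[etype] = t
        if all(e in last_seen and last_seen[e] >= t - window for e in triggers):
            firings.append(t)
    return firings


def rule_confidence(events: List[Event], triggers: Set[str], failure: str,
                    window: float, interval: float) -> Optional[float]:
    """Fraction of trigger firings followed by `failure` within `interval` time units."""
    firings = trigger_firings(events, triggers, window)
    if not firings:
        return None  # the rule never fired, so no confidence can be estimated
    failure_times = [t for e, t in events if e == failure]
    hits = sum(any(f < t <= f + interval for t in failure_times) for f in firings)
    return hits / len(firings)


# Made-up log: A and B are trigger events, K is the failure event.
log = [("A", 5.0), ("B", 9.0), ("K", 12.0), ("C", 40.0), ("A", 61.0),
       ("B", 63.0), ("K", 70.0), ("A", 80.0), ("B", 85.0)]

conf_10 = rule_confidence(log, {"A", "B"}, "K", window=10.0, interval=10.0)  # 2 of 3 firings
conf_5 = rule_confidence(log, {"A", "B"}, "K", window=10.0, interval=5.0)    # 1 of 3 firings
```

Changing only the time interval from 10 to 5 changes the estimated confidence, so every interval of interest yields a separate rule. This is the parameter dependence described in the first difficulty.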

In this paper, we would like to develop a systematic methodology to construct a rigorous prediction model for failure events based on a single event sequence collected from in-service equipment. At the first step, we will isolate the meaningful failure signatures, which are a special temporal pattern, namely, a set of events that occur together frequently in the event sequence and end with the failure event, and then screen out trigger events which could affect the occurrence of failure events. Next, the Cox proportional hazard model (Klein and Moeschberger, 2003) will be built to provide rigorous statistical predictions for the system failures based on the identified failure signatures. In the procedure, we take advantage of both temporal pattern identification techniques originating from temporal data mining and the Cox PH model that...
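For orientation, the textbook form of the Cox proportional hazard model is given below. The excerpt ends before the authors' own formulation appears, so the notation here is an assumption: $\mathbf{z}$ denotes a covariate vector (which, in this setting, would presumably be derived from the trigger events), $\boldsymbol{\beta}$ the regression coefficients, and $h_0(t)$ an unspecified baseline hazard.

```latex
h(t \mid \mathbf{z}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{z}\right),
\qquad
\frac{h(t \mid \mathbf{z}_1)}{h(t \mid \mathbf{z}_2)}
  = \exp\!\left(\boldsymbol{\beta}^{\top}(\mathbf{z}_1 - \mathbf{z}_2)\right)
```

The second expression shows why the model is called proportional: the hazard ratio between two covariate settings does not depend on time $t$.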

NOTE: All illustrations and photos have been removed from this article.


