A distribution-free tabular CUSUM chart for autocorrelated data.

Publication: IIE Transactions
Publication Date: 01-MAR-07
Author: Kim, Seong-Hee ; Alexopoulos, Christos ; Tsui, Kwok-Leung ; Wilson, James R.

Article Excerpt
1. Introduction

Given a stochastic process to be monitored, a Statistical Process Control (SPC) chart is used to detect any practically significant shift from the in-control status for that process, where the in-control status is defined as maintaining a specified target value for a given parameter of the monitored process--for example, the mean, the variance or a quantile of the marginal distribution of the process. An SPC chart is designed to yield a specified value ARL_0 for the in-control Average Run Length (ARL) of the chart--that is, the expected number of observations sampled from the in-control process before an out-of-control alarm is (incorrectly) raised. Given several alternative SPC charts whose control limits are determined in this way, one would prefer the chart with the smallest out-of-control average run length ARL_1, a performance measure analogous to ARL_0 for the situation in which the monitored process is in a specific out-of-control condition. If the monitored process consists of independent and identically distributed (i.i.d.) random variables from a known distribution, such as the normal distribution, then control limits can be determined analytically for some charts such as the Shewhart and tabular CUSUM charts as detailed in Montgomery (2001).
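As a small illustration of the analytic case mentioned above, the following sketch computes the in-control ARL of a two-sided Shewhart chart for i.i.d. normal observations with known mean and variance. Under those assumptions each in-control observation falls outside the limits with a fixed probability p, the run length is geometric, and the in-control ARL is 1/p. The function name and the choice of limits at k = 3 standard deviations are illustrative assumptions, not taken from the article.

```python
import math


def shewhart_in_control_arl(k: float) -> float:
    """In-control ARL of a two-sided Shewhart chart with limits at +/- k sigma.

    Assumes i.i.d. normal observations with known mean and variance
    (an illustrative textbook setting, not the autocorrelated case).
    """
    # P(|Z| > k) for a standard normal Z equals erfc(k / sqrt(2)).
    false_alarm_prob = math.erfc(k / math.sqrt(2.0))
    # The run length is geometric with success probability p, so its mean is 1 / p.
    return 1.0 / false_alarm_prob


print(round(shewhart_in_control_arl(3.0), 1))  # -> 370.4
```

With autocorrelated observations the alarm events are no longer independent, so this simple geometric argument does not carry over; that is the difficulty taken up in the next paragraph.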

It is more difficult to determine control limits for an SPC chart that is applied to an autocorrelated process; and much of the recent work on this problem has been focused on developing distribution-based (or model-based) SPC charts, which require one of the following properties.

1. The in-control and out-of-control versions of the monitored process must follow specific probability distributions.

2. Certain characteristics of the monitored process--such as the first-order and second-order moments, including the entire autocovariance function--must be known.

Of course, if the underlying assumptions about the probability distributions describing the target process are violated, then these charts may not perform as advertised. Moreover, the control limits for many distribution-based charts can only be determined by trial-and-error experimentation, which can be very inconvenient in practical applications. This is especially true in circumstances that require rapid calibration of the chart and do not allow extensive preliminary experimentation on training data sets to estimate ARL_0 for various trial values of the control limits and other parameters of the chart. We illustrate these disadvantages of distribution-based charts in more detail in the next section, using an example from intrusion detection in information systems.

The limitations of distribution-based procedures can be overcome by distribution-free SPC charts. Runger and Willemain (R & W) (1995) organize the sequence of observations of the monitored process into adjacent nonoverlapping batches of equal size; and their SPC procedure is applied to the corresponding sequence of batch means. They choose a batch size large enough to ensure that the batch means are approximately i.i.d. normal, and then they apply to the batch means one of the classical SPC charts developed for i.i.d. normal data, including the Shewhart and tabular CUSUM charts. In contrast to this approach, Johnson and Bagshaw (J & B) (1974) and Kim, Alexopoulos, Goldsman, and Tsui (2006) present CUSUM-based methods that use raw (unbatched) observations instead of batch means. Computing the control limits for the latter two procedures requires an estimate of the variance parameter of the monitored process--that is, the sum of covariances at all lags. Nevertheless, these CUSUM-based charts are distribution free since one can estimate the variance parameter using a variety of distribution-free techniques that are popular in the simulation literature; see Alexopoulos et al. (2006).
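The batching step and the variance parameter described above can be sketched as follows. The first function forms means of adjacent nonoverlapping batches of equal size. The second uses the classical batch-means estimator from the simulation literature (batch size times the sample variance of the batch means) as one example of a distribution-free estimate of the variance parameter. Both function names are hypothetical, and the excerpt does not say which estimator the authors actually use, so this is an assumption made for illustration only.

```python
from statistics import mean, variance


def batch_means(observations, batch_size):
    """Means of adjacent nonoverlapping batches of equal size.

    A trailing partial batch, if any, is dropped.
    """
    n_batches = len(observations) // batch_size
    return [
        mean(observations[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]


def variance_parameter_estimate(observations, batch_size):
    """Batch-means estimate of the variance parameter.

    Returns batch_size times the sample variance of the batch means.
    The estimate is only reasonable when the batch size is large enough
    for the batch means to be approximately uncorrelated.
    """
    means = batch_means(observations, batch_size)
    if len(means) < 2:
        raise ValueError("need at least two complete batches")
    return batch_size * variance(means)


data = [1, 2, 3, 4, 5, 6]
print(batch_means(data, 2))                  # -> [1.5, 3.5, 5.5]
print(variance_parameter_estimate(data, 2))  # -> 8.0
```

The toy data only checks the arithmetic. In practice the batch size and the number of batches both need to be large for the estimate to be useful.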

For first-order autoregressive processes with a known variance parameter, Kim, Alexopoulos, Goldsman, and Tsui (2006) show that: (i) their model-free CUSUM chart called the MFC chart performs uniformly better than the J & B chart in terms of ARL_1 for a given target value of ARL_0; and (ii) the MFC chart works better than the R & W Shewhart chart for small shifts. On the other hand, Kim, Alexopoulos, Goldsman, and Tsui (2006) find that the R & W Shewhart chart performs better than the MFC chart for large shifts. This is not surprising, given that a Shewhart-type chart is generally more effective than a CUSUM-type chart in detecting large shifts in processes consisting of independent normal observations. However, Kim, Alexopoulos, Goldsman, and Tsui (2006) show that for stationary processes with nonnormal marginals such as first-order exponential autoregressive processes, a large batch size is often required to achieve both independence and normality of the batch means. This large batch size impairs the performance of the R & W Shewhart chart, delaying legitimate out-of-control alarms for processes with a pronounced correlation structure or large shifts; and in practice it is difficult to determine a good choice for the batch size in the R & W Shewhart chart.

Another approach to developing distribution-free SPC charts is taken by Ben-Gal et al. (2003), who introduce a context-based SPC methodology for state-dependent discrete-valued data generated by a finite-memory source. Unfortunately this method is limited to univariate stochastic processes having a finite state space; and the experimental results of Ben-Gal et al. (2003) indicate that relatively large sample sizes are required to calibrate the performance of this procedure for the in-control condition.

In this paper we formulate DFTC, a distribution-free tabular CUSUM chart for monitoring autocorrelated processes. The proposed chart is a generalization of the conventional tabular CUSUM chart that is designed for i.i.d. normal random variables. Moreover, to improve upon the performance of the J & B chart, DFTC incorporates a nonzero reference value into the monitoring statistic. For a reflected Brownian motion process with drift, Bagshaw and Johnson (1975) derive the density and expected value of the first-passage time to a positive threshold; and they mention that this result can be used to approximate the ARL of a CUSUM chart with a nonzero reference value. Combining this approximation with a generalization of the Brownian-motion approximation of Siegmund (1985) for the ARL of a CUSUM-based procedure that requires i.i.d. normal random variables, we designed DFTC so that it can be used with raw correlated data or with batch means based on any batch size.
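For orientation, the conventional two-sided tabular CUSUM that DFTC generalizes can be sketched as below. The recursion subtracts a reference value from each deviation from the target, resets each one-sided sum at zero, and raises an alarm when either sum exceeds a threshold. The parameter names and the numerical values in the usage lines are placeholders chosen for illustration. The excerpt does not give the way DFTC sets its reference value and control limit for autocorrelated data, so no attempt is made to reproduce that calibration here.

```python
def tabular_cusum_first_alarm(observations, target, reference_value, threshold):
    """Index (1-based) of the first alarm of a two-sided tabular CUSUM, or None.

    upper_n = max(0, upper_{n-1} + (x_n - target) - reference_value)
    lower_n = max(0, lower_{n-1} - (x_n - target) - reference_value)
    An alarm is raised when either sum exceeds the threshold.
    """
    upper = 0.0
    lower = 0.0
    for n, x in enumerate(observations, start=1):
        deviation = x - target
        upper = max(0.0, upper + deviation - reference_value)
        lower = max(0.0, lower - deviation - reference_value)
        if upper > threshold or lower > threshold:
            return n
    return None


# Placeholder values: an upward shift of size 1 starting at observation 4.
shifted = [0, 0, 0, 1, 1, 1, 1]
print(tabular_cusum_first_alarm(shifted, target=0.0,
                                reference_value=0.5, threshold=1.2))  # -> 6
```

Setting the reference value to zero turns each one-sided sum into a plain reflected cumulative sum of deviations, which is the zero-reference-value case that the nonzero reference value in DFTC is meant to improve upon.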

The rest of this article is organized as follows. Section 2 contains relevant background information, including a motivating example, notation, and assumptions. Section 3 presents the proposed DFTC chart for autocorrelated processes. Section 4 contains an experimental comparison of the performance of DFTC with that of existing distribution-free procedures based on three test processes whose probabilistic behavior is typical of many practical applications of SPC procedures.

* The first test process is a stationary first-order autoregressive (AR(1)) process with the following values of the autoregressive parameter (and hence also of the lag-one correlation): 0.0, 0.25, 0.5, 0.7, 0.9, 0.95, and 0.99.

* The second test process is the sequence of queue waiting times generated by the M/M/1 queueing system with traffic intensities of 30 and 60%. Thus, for each configuration of this system in steady-state operation, the queue waiting-time process has the following properties: (i) its autocorrelation function decays at an approximately exponential rate; and (ii) its marginal distribution is markedly nonnormal, with an atom at zero and an exponential tail.

* The third test process is a stationary second-order autoregressive (AR(2)) process, where the corresponding autocorrelation function exhibits exponentially damped sinusoidal behavior; and the original AR(2) process also exhibits a kind of "distorted periodicity" with the same period as the autocorrelation function.

Section 5 summarizes the main findings of this work.
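The first two test processes listed above can be generated with a few lines of code, which may help readers who want to experiment with them. The sketch below is a minimal version under stated assumptions: the AR(1) process has normal noise and is started from its stationary marginal distribution, and the M/M/1 queue waiting times come from Lindley's recursion with the first customer arriving at an empty system, so an initial warm-up portion would have to be discarded to approximate steady-state operation. All function names, default parameter values, and seeds are hypothetical and are not taken from the article.

```python
import math
import random


def ar1_path(n, phi, mean=0.0, noise_sd=1.0, seed=None):
    """Stationary AR(1) path: X_t = mean + phi * (X_{t-1} - mean) + e_t.

    The e_t are i.i.d. normal with standard deviation noise_sd, and the
    first value is drawn from the stationary marginal distribution.
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    rng = random.Random(seed)
    marginal_sd = noise_sd / math.sqrt(1.0 - phi * phi)
    x = rng.gauss(mean, marginal_sd)
    path = []
    for _ in range(n):
        path.append(x)
        x = mean + phi * (x - mean) + rng.gauss(0.0, noise_sd)
    return path


def mm1_waiting_times(n, traffic_intensity, service_rate=1.0, seed=None):
    """Queue waiting times of n successive customers in an M/M/1 system.

    Uses Lindley's recursion W_{k+1} = max(0, W_k + S_k - A_{k+1}),
    starting from an empty system (so early values are not in steady state).
    """
    if not 0.0 < traffic_intensity < 1.0:
        raise ValueError("traffic intensity must lie strictly between 0 and 1")
    rng = random.Random(seed)
    arrival_rate = traffic_intensity * service_rate
    wait = 0.0
    waits = []
    for _ in range(n):
        waits.append(wait)
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - interarrival)
    return waits


def lag_one_autocorrelation(series):
    """Sample lag-one autocorrelation of a sequence of numbers."""
    m = sum(series) / len(series)
    numerator = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    denominator = sum((a - m) ** 2 for a in series)
    return numerator / denominator


ar1 = ar1_path(20000, phi=0.9, seed=1)
waits = mm1_waiting_times(20000, traffic_intensity=0.3, seed=1)
print(lag_one_autocorrelation(ar1))                # should be near phi = 0.9
print(sum(w == 0.0 for w in waits) / len(waits))   # near 1 - 0.3 = 0.7 in steady state
```

The second printed quantity reflects the atom at zero in the waiting-time distribution: in steady state an arriving customer finds the M/M/1 system empty with probability one minus the traffic intensity.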

2. Background

In this section we give a motivating example from the area of intrusion detection in information systems to illustrate the emerging need for distribution-free SPC methods. Then we define the notation used in this article, and we state our basic assumptions about the probabilistic behavior of the process to be monitored.

[FIGURE 1 OMITTED]

2.1. Motivating example

The MIT Lincoln Laboratory simulated the environment of a real computer network to provide a test-bed of data sets for comprehensive evaluation of the performance of various intrusion-detection systems. Ye et al. (2001), Ye et al. (2003), and Park (2005) derive event-intensity (arrival-rate) data from log files generated by the Basic Security Module (BSM) of a Sun SPARC 10 workstation running the Solaris operating system and functioning as one of the components of the network simulated by the MIT Lincoln Laboratory. These authors consider a Denial-of-Service (DoS) attack on the Sun workstation that leaves trails in the audit data--in particular, Ye et al. (2001), Ye et al. (2003), and Park (2005) capture the activities on the machine through a continuous stream of audit events whose occurrence times are recorded in the log files.

[FIGURE 2 OMITTED]

Figure 1 shows event-intensity data (that is, the number of events in successive 1-second time intervals) derived from the BSM log files for an observation period of 12000 seconds on a specific day in the data sets from the MIT Lincoln Laboratory. This data set is believed to be intrusion free. Since the Sun system performs a specific routine to create a log file every 60 seconds, the graph in Fig. 1 shows a repeated pattern every 60 seconds. After a careful analysis, Park (2005) separates the graph in Fig. 1 into the cyclic and noise components as shown in Fig. 2.

For the detection of a DoS attack, the noise events must be monitored. One can observe that the noise data are very sparse--in particular, only 60 of the 12000 1-second time intervals contained noise events not related to the generation of a log file so that the estimated probability of occurrence of at least one noise event in a given 1-second time interval is only 0.005. Conventional probability distributions (in particular, the Poisson and normal distributions) cannot provide an adequate fit to the observed noise data set because of its high standard deviation. For the sample of 60 noise-event counts...

NOTE: All illustrations and photos have been removed from this article.


