Home | Business News | Browse by Publication | M | Mathematics of Operations Research

Convergence analysis of sample average approximation methods for a class of stochastic mathematical programs with equality constraints.

Publication: Mathematics of Operations Research
Publication Date: 01-AUG-07
Format: Online
Delivery: Immediate Online Access

Article Excerpt
In this paper we discuss the sample average approximation (SAA) method for a class of stochastic programs with nonsmooth equality constraints. We derive a uniform Strong Law of Large Numbers for random compact set-valued mappings and use it to investigate the convergence of Karush-Kuhn-Tucker points of SAA programs as the sample size increases. We also study the exponential convergence of global minimizers of the SAA problems to their counterparts of the true problem. The convergence analysis is extended to a smoothed SAA program. Finally, we apply the established results to a class of stochastic mathematical programs with complementarity constraints and report some preliminary numerical test results.

Key words: sample average approximations; strong law of large numbers; random set-valued mappings; stationary points

1. Introduction. In this paper, we consider the following stochastic minimization program

min E[f(x, y(x, [xi]:([omega])), [xi]([omega]))] (1)

s.t. x [member of] X.

Here f: [R.sup.m] x [R.sup.n] x [R.sup.k] [right arrow] R is a continuously differentiable real-valued function, [xi] [OMEGA] [right arrow] [XI] [subset] C [R.sup.k] is a random vector defined on probability space ([OMEGA], F, P), E denotes the mathematical expectation, X is the feasible set of decision variate x, which is a nonempty subset of [R.sub.m], and y(x, [xi]([omega])) is a solution of the following system of equations

H(x, y, [xi]([omega])) = 0, for x [member of X], [omega] [member of] [OMEGA] (2)

where H: [R.sup.m] x [R.sup.n] x [R.sup.k] [right arrow] [R.sup.n] is a locally Lipschitz continuous vector-valued function. We restrict our discussion to the case when H is piecewise smooth because it represents a large class of locally Lipschitz continuous functions that cover most practical problems (Scholtes [29]). Throughout this paper, we assume that the probability measure P of our considered space [OMEGA], F, P) is nonatomic and E[f(x, y(x, [xi]([omega])), [xi]([omega])] is well defined for every x [member of] X. To ease the notation, we will write [xi](to) as [xi], and this should be distinguished from [xi] being a deterministic vector of [XI] in a context.

Problem (1) is essentially a here and now stochastic optimization model where a decision has to be made before the realization of uncertainty, and an optimal decision is chosen to minimize the objective on stochastic average. The key characteristic of model (1) is that y(x, [xi]) is an implicit function defined by a system of nonsmooth equations (2). If we can solve (2) and obtain an explicit expression for y(x, [xi]), then (1) reduces to an ordinary stochastic minimization problem that has been extensively investigated (Birge and Louveaux [5], Ruszczyfiski and Shapiro [27]).

In this paper, we assume that y(x, [xi]) may not necessarily be analytically obtainable by solving (2). We also assume that (2) has a unique solution for every (x, [xi]) [member of] X x [XI]. In the case when the equation has multiple solutions, we may employ a regularization scheme to approximate (2) with a regularized equation that has a unique solution. See our follow-up work (Meng and Xu [17]) for details. An interesting application of this abstract formulation is that (2) may be reformulated from a variational inequality or a complementarity problem that describes a system equilibrium parameterized by x and ~, and consequently y(x, [xi]) represents the equilibrium solution of the system. This type of problem is known as stochastic mathematical programs with equilibrium constraints (SMPEC). See Patriksson and Wynter [18], Lin et al. [15], Xu [36, 37], Shapiro [31], and Birbil et al. [4] for recent discussions.

This paper is concerned with numerical methods for solving (1). Throughout the paper, we assume that [xi] has either a continuous or a discrete distribution. We also assume that the explicit expression of the distribution of [xi] is not obtainable, but it can be estimated from past data. In other words, we assume that a sample of [xi] can be obtained from time to time. We tackle (1) with the well-known Monte Carlo simulation-based method. The basic idea of such a method is to generate an independent identically distributed (i.i.d.) sample [[xi].sub.1], ..., [[xi].sup.N] of [xi] and solve

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3)

We refer to (1) as the true problem and (3) as the sample average approximation (SAA) problem. SAA methods have been extensively investigated in stochastic optimization. This type of methods are also called sample path (SP) methods. There has been extensive literature on both SAA and SP. See the recent work of Artstein and Wets [2], King and Wets [14], Plambeck et al. [19], Robinson [22], Shapiro [30], and Shapiro and Homem-de-Mello [32].

The focus of this paper is on the convergence analysis of Karush-Kuhn-Tucker (KKT) points of the SAA problem (3). We use the Clarke generalized implicit function theorem (Clarke [8]) to eliminate variable [y.sup.i], i = 1, ..., N, in (3) and then investigate the convergence of the reduced SAA problem. Because the reduced problem is nonsmooth, we consider a kind of generalized KKT (GKKT) condition that is defined by the sample average of some random set-valued mappings, and we then investigate the convergence of the corresponding stationary points. We achieve this by establishing a uniform strong law of large numbers for random compact set-valued mappings.

We summarize the main results that are established in this paper. First, we obtain a uniform strong law of large numbers for a random compact set-valued mapping. Then we apply the result to convergence analysis of a sequence of GKKT points of SAA problems. We show that, under some moderate conditions, an accumulation point of a sequence of weak GKKT points of the reduced SAA problem is a weak GKKT point of (1) with probability 1 (w.p. 1). We also show that with probability approaching one exponentially fast, a global minimizer of the reduced SAA problem becomes an [member of]-global minimizer of the true problem. We establish similar results when the SAA problem is smoothed. Finally, we apply the results to a class of stochastic mathematical programs with strong monotone complementarity constraints.

The rest of this paper is organized as follows. In [section]2, we investigate the existence and uniqueness of implicit solution y(x, [xi]) of system (2) and its smoothing. We then define the weak GKKT conditions for the true problem. In [section]3, we establish a uniform strong law of large numbers for a random compact set-valued mapping and use it to investigate the convergence of GKKT points of reduced SAA problems and exponential convergence of global minimizers of SAA problems. In [section]4, we present a similar analysis for the smoothed SAA problem. In [section]5, we apply the convergence results to stochastic mathematical programs with strong monotone complementarity constraints. Finally, in [section]6, we present some preliminary numerical test results on an SMPEC problem.

2. Preliminaries. In this section, we present some preliminary discussions about the measurability of a random set-valued mapping, implicit function theorem based on Clarke generalized Jacobians and GKKT conditions.

Throughout this paper, we use the following notation. We use [parallel] x [parallel] to denote the Euclidean norm of a vector, a matrix, and a compact set of matrices. Specifically, if M is a compact set of matrices, then [parallel] M [parallel] := [max.sub.M [member of] M] [parallel] M [parallel]. We use dist(x, D) := [inf.sub.x' [member of]D [parallel] x - x' [parallel] to denote the distance from point x to set D. Here, D may be a subset of [R.sup.k] or a subset of [R.sub.kxk]. Given two compact sets C and D, we use D(C, D) := [sup.sub.x [member of]C dist(x, [D]) to denote the distance from set C to set D, and use H(C, D) to denote the Hausdorff distance between the two sets, that is, H(C, D) := max(D(C, D), D(D, C)). We use S(x, [delta]) to denote the closed ball in [R.sup.m] with radius [delta] and center x; that is, S(x, [delta]) := {x' [member of] [R.sup.m]: [parallel]x' - x[parallel] [less than or equal to] [delta]}. For a vector-valued function g: [R.sup.m] [right arrow] [R.sup.l], we use [nabla] or [gradient] g(x) to denote the classical Jacobian of g(x) if g(x) is Frechet differentiable at x. In the case when 1 = 1, that is, g(x)is a real-valued function, [nabla] or [gradient] g(x) denotes the gradient of g(x), which is a row vector.

For a set-valued mapping A: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], we use [[pi].sub.y]A(y, Z) to denote the set of all n x n matrices M such that, for some n x m matrix N, the n x (n + m) matrix [M N] belongs to A(y, z). We use [bar.lim] to denote the outer limit of a sequence of vectors and set-valued mappings.

2.1. Clarke generalized Jacobians and random sets. Let P: [R.sup.j] [right arrow] [R.sup.l] be a locally Lipschitz continuous vector-valued function. Recall that the Clarke generalized Jacobian (Clarke [8]) of P at x [member of] [R.sup.j] is defined as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [D.sub.P] denotes the set of points at which P is Frechet differentiable, [nabla]P(y) denotes the usual Jacobian of P, which is an l x j matrix, "conv" denotes the convex hull of a set. It is well known that the Clarke generalized Jacobian [partial derivative]P(x) is a convex compact set (Clarke [8]). In the case that j = l, [partial derivative]P(x) consists of square matrices. We say [partial derivative]P(x) is nonsingular if every matrix in set [partial derivative]P(x) is nonsingular, and in this case we use [partial derivative]P[(x).sup.-1] to denote the set of inverse matrices of [partial derivative]P(x).

In later discussions, we will be concerned with the Clarke generalized Jacobians of the random function H(x, y, [xi]), such as [[partial derivative].sub.x]H(X, y, [xi]) and [[partial derivative].sub.y]H(X, y, [xi]). For fixed x and y, these Jacobians are random sets. We need to deal with the expectation of them, which is related to the measurability of random sets. In what follows, we make some preparations for...

View this article FREE - Now for a Limited Time, try Goliath Business News
Free for 3 Days!



More articles from Mathematics of Operations Research
Lagrangian relaxation via ballstep subgradient methods., August 01, 2007
Partially B-regular optimization and equilibrium problems., August 01, 2007
A large deviation principle for join the shortest queue., August 01, 2007
Subgame-perfect equilibria for stochastic games., August 01, 2007
Subsolutions of an Isaacs equation and efficient schemes for importanc..., August 01, 2007

Looking for additional articles?
Search our database of over 3 million articles.

Looking for more in-depth information on this industry?
Search our complete database of Industry & Market reports by text, subject, publication name or publication date.

About Goliath
Whether you're looking for sales prospects, competitive information, company analysis or best practices in managing your organization, Goliath can help you meet your business needs.

Our extensive business information databases empower business professionals with both the breadth and depth of credible, authoritative information they need to support their business goals. Whether it be strategic planning, sales prospecting, company research or defining management best practices - Goliath is your leading source for accurate information.