|
Article Excerpt This paper presents a new dual formulation for quadratically constrained convex programs. The special structure of the derived dual problem allows us to apply the gradient projection algorithm to produce a simple explicit method involving only elementary vector-matrix operations, proven to converge at a linear rate.
Key words: quadratically constrained problems; gradient projection algorithm; convex minimization; rate of convergence analysis
MSC2000 subject classification: Primary: 90C25, 90C30; secondary: 49N15, 65K05 OR/MS subject classification: Primary: programming/nonlinear
1. Introduction. Gradient methods are probably the most basic and fundamental class of optimization algorithms. For constrained problems, these methods, known as gradient projection algorithms (GPA), have been extensively studied in the literature; see for example Goldstein [12], Levitin and Polyak [13], Bertsekas [4], Dunn [10], Calamai and Mort [9], Dunn [11], Luo and Tseng [17], and Wang and Xiu [27]. More details and further references on the GPA can be found in Bertsekas [5]. The main disadvantage of these methods is that without imposing strong assumptions on the problem's data, they exhibit a slow convergence rate, e.g., sublinear, and thus do not seem to be competitive with the modern interior point methods (IPM) which, for convex optimization problems, have attractive polynomial time complexity (Nesterov and Nemirovski [22]). However, the power of IPM has some drawbacks and limits. Indeed, these IPM require sophisticated and heavy computational tasks to be performed at each iteration, e.g., solving Newton's type systems. For very large-scale problems in the decision variables, a single iteration of such a polynomial time algorithm is often too expensive to be of practical use. This has led to a revived interest in the study of simple algorithms, such as gradient-based methods. Despite their apparent lack of efficiency, gradient methods appear to remain legitimate and affordable candidate algorithms for large-scale applications, which in particular do not often require highly accurate solutions. This has been substantiated by some recent theoretical and computational studies on gradient-based algorithms; see, e.g., Auslender and Teboulle [1], Beck and Teboulle [2], Bienstock [7], Ben-Tal et al. [3], and Nemirovski [20]. In particular, we mention the successful computational experiments recently reported by Bienstock [7] for approximating linear programs, which also uses a simple algorithm (that relies on the Frank-Wolfe method applied to an appropriate potential function); the work of Ben-Tal et al. [3] for solving very large-scale image reconstruction problems, which uses a general mirror descent method (see Nemirovski and Yudin [21]), and which has been recently proven by Beck and Teboulle [2] to be equivalent to a specific GPA; and the work of Nemirovski [20] for approximating large instances of the Lovasz capacity of a graph. All the aforementioned works have clearly shown that such simple gradient-type algorithms can be a viable alternative in practical solutions to large-scale problems, especially when high accuracy is not required, and that their theoretical efficiency deserves to be further studied.
The main advantage of GPA is its simplicity, provided that the orthogonal projection on the feasible set and the gradient of the objective function can be easily computed. Indeed, for minimizing an objective function f over a set of constraints S [subset] [[??].sup.n], the basic GPA simply consists of iterating the formula
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (1)
where [P.sub.S] denotes the orthogonal projection map and t > is some appropriately chosen step size. For example, if the set of constraints is an affine space, then the resulting projection map at each iteration of GPA involves only matrix-vector multiplications. One of the main drawbacks of GPA, as just mentioned, is that its convergence rate is in general only sublinear, unless some further and often restrictive assumptions on the problem's data are made (e.g., strong convexity). A natural question is thus to identify classes of problems for which on the one hand, the rate of convergence can be improved, say to linear, under weaker or reasonable assumptions, while, on the other hand, the simplicity of the algorithm of GPA will be preserved, namely, the projections and gradients, can be easily and analytically computed.
In this paper, we prove that for a class of quadratically constrained convex minimization problems (QCQP), both requirements can be achieved. We present a new and simple dual-based projected gradient method, namely, we derive a GPA-based algorithm which involves only simple matrix-vector multiplications producing a simple and provably linearly convergent algorithm. To achieve this goal requires us to find an adequate equivalent formulation of (QCQP) and a new line of analysis for proving its linear rate of convergence. Interestingly, the class (QCQP) enlightens the difficulties encountered in the double task of deriving a simple algorithm with a linear rate of convergence. Indeed, and to the best of our knowledge, none of the known results available in the current literature can be directly applied to our problem to produce our declared tasks; see [section]3.
Our first objective is to construct a new dual problem on which the GPA can be applied, namely, where the projections can be computed explicitly, and with an objective with a Lipschitz continuous gradient that can be easily computed. It turns out that one can construct such a dual problem with the desirable affine constraints, thus rendering the computation of the projections a trivial task. The dual objective function we derive possesses an interesting structure, and is proven to be continuously differentiable with a Lipschitz continuous gradient that is easy to compute. This is developed in [section]2, where we also state the resulting explicit dual gradient projection (DGP) algorithm.
Our second and main contribution will be to prove that the proposed DGP algorithm is linearly convergent. The latter task requires us to develop a specific and novel analysis, which we develop in [section]3. We end the paper with a short appendix that includes some simple technical results that are used throughout this paper.
2. A dual approach for (QCQP). We consider the minimization of a strictly convex quadratic function under strictly convex quadratic inequalities:
(QCQP) minimize{[x.sup.T][Q.sub.0]x + 2[b.sup.T.sub.0] x: [x.sup.T][Q.sub.i]x + 2[b.sup.T.sub.i] x [less than or equal to] [c.sub.i], i = 1, 2, ..., m},
where [Q.sub.0], [Q.sub.1], ..., [Q.sub.m] are n x n positive definite matrices, [b.sub.0], [b.sub.1], ..., [b.sub.m] [member of] [[??].sup.n] and [c.sub.1], ..., [c.sub.m] [member of] [??].
A classical problem that can be cast as (QCQP) is the problem of projection onto the intersection of ellipsoids. For a recent work considering this problem and its applications to nonlinear programming, we refer the reader to the work of Lin and Han [14], and the references therein.
Throughout, we assume that (QCQP) is strictly feasible; as a result, we have [c.sub.i] + [b.sup.T.sub.i][Q.sup.-1.sub.i][b.sub.i] > [for all] i. Because we assume that Slater's condition is satisfied, we thus have that strong duality is satisfied and the optimal value of (QCQP) is equal to the attained optimal value of the dual problem. To solve (QCQP), we would like to derive a dual problem that can be easily solved via the GPA, namely, with an analytical iteration formula.
2.1. Standard dual formulations. A standard dual formulation of (QCQP) can be easily shown to be given by
maximize - [([b.sub.0] + [m.summation over (i=1)] [b.sub.i][[lambda].sub.i]).sup.T] [([Q.sub.0] + [m.summation over (i=1)] [[lambda].sub.i][Q.sub.i]).sup.-1] ([b.sub.0] + [m.summation over (i=1)] [b.sub.i][[lambda].sub.i]) - [m.summation over (i=1)] [[lambda].sub.i][c.sub.i]
s.t. [[lambda].sub.i] [greater than or equal to] 0, i = 1, 2, ..., m.
The trouble with this formulation is that each function value or gradient calculation of the dual objective function consists of inverting a matrix. Thus, for very large-scale problems in the design variables n (even with small m), a gradient-based algorithm would require us to compute the inverse of a huge matrix (and in general with no specific structure) at each iteration, a task that is practically intractable. Our goal is to define an algorithm that consists only of matrix-vector multiplications and does not involve any matrix inversion at each iteration (excluding a one-time preprocessing calculation that might involve matrix inversion). To achieve this task, we use a decomposition approach (see, e.g., Bertsekas and Tsitsiklis [6]), where we duplicate the variables to obtain simpler expressions for the dual problem. An equivalent primal problem to (QCQP) is
minimize [1/m] [m.summation over (i=1)] [x.sup.T.sub.i] [Q.sub.0][x.sub.i] + 2[b.sup.T.sub.0]x
s.t. [x.sup.T.sub.i] [Q.sub.i][x.sub.i] + 2[b.sup.T.sub.i] [x.sub.i] [less than or equal to] [c.sub.i], i = 1, 2, ..., m,
[x.sub.i] = x, i = 1, ..., m.
Assigning a Lagrange multiplier [[lambda].sub.i] [member of] [[??].sup.n] for each linear equality constraint, we obtain the following dual:
maximize {[m.summation over (i=1)][g.sub.i]([[lambda].sub.i]): [m.summation over (i=1)][[lambda].sub.i] = [b.sub.0]},
where for each i, [g.sub.i]([[lambda].sub.i]) = min{(1/m)[x.sup.T.sub.i][Q.sub.0][x.sub.i] + 2[[lambda].sup.T.sub.i] [x.sub.i]: [x.sup.T.sub.i][Q.sub.i][x.sub.i] + 2[b.sup.T.sub.i][x.sub.i] [less than or equal to] [c.sub.i]}. The difficulty here is that the function [g.sub.i] does not have an explicit expression. The only case where it is possible to find an explicit expression for [g.sub.i] is the case where [Q.sub.0] = [alpha][Q.sub.i] for some [alpha] > and for every i. The motivation behind the proposed new formulation given below is, thus, somehow to enforce the situation where [Q.sub.0] is equal to [alpha][Q.sub.i] for some [alpha] > 0. It turns out that this can be done by adding a redundant constraint.
2.2. A new dual formulation for (QCQP). One of the key arguments in establishing the new dual formulation is to guarantee that we can write [Q.sub.0] as a positive linear combination of the matrices [Q.sub.i], i.e., that [Q.sub.0] = [[summation].sup.m.sub.i=1][[beta].sub.i][Q.sub.i] where [[beta].sub.1], [[beta].sub.2], ..., [[beta].sub.m] > 0. Of course, there is no guarantee that there exists such a linear combination. This is the reason why we will add a redundant constraint to the original problem (QCQP), which will enforce the validity of such a linear combination. The following lemma allows us to do that. In the sequel, [[lambda].sub.min](Q) ([[lambda].sub.max](Q))...
|