```
suppressWarnings(library(CVXR, warn.conflicts=FALSE))
library(magrittr)
library(ggplot2)
library(ggforce)
```

A **convex combination** of \(x_1 \ldots x_k\) is any point

\[x = \theta_1 x_1 + \ldots + \theta_k x_k\]

with \(\theta_1 + \ldots + \theta_k = 1, \theta_i \geq 0\)

A **convex hull** of set S is the set of all convex combinations of points in S.
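As a quick illustration (plain base R, not CVXR): `chull()` returns the vertices of the convex hull of a set of 2-D points; every point of the set is then a convex combination of those vertices.

```
# Convex hull of random 2-D points with base R's chull()
set.seed(1)
pts <- matrix(rnorm(20), ncol = 2)   # 10 points in the plane
hull_idx <- chull(pts)               # indices of the hull vertices (clockwise)
hull <- pts[hull_idx, ]              # the vertices themselves
length(hull_idx)                     # number of hull vertices
```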

A **conic combination** of \(x_1, x_2\) is any point

\[x = \theta_1 x_1 + \theta_2 x_2\]

with \(\theta_i \geq 0\)

A **convex cone** of set S is the set of all conic combinations of points in S.

A **hyperplane** is a set of form \(\{ x | a^T x = b\}\), where \(a \in \mathbb{R}^n, a \neq 0, b \in \mathbb{R}\). In 2D (n=2) this is a line, in 3D it’s a plane.

\(a\) can be seen as the normal vector to the hyperplane, and \(b\) its offset from the origin.

A hyperplane divides \(\mathbb{R}^n\) into two **halfspaces**.
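A small numerical sketch (plain R, illustrative values): the sign of \(a^T x - b\) tells which side of the hyperplane a point lies on.

```
# The hyperplane x + 2y = 3 in R^2, with normal a = (1, 2) and offset b = 3
a <- c(1, 2); b <- 3
side <- function(x) sign(sum(a * x) - b)
side(c(3, 0))   # 0  -> on the hyperplane
side(c(0, 0))   # -1 -> in the halfspace a^T x <= b
side(c(4, 4))   # 1  -> in the halfspace a^T x >= b
```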

A **Euclidean ball** in \(\mathbb{R}^n\) with center \(x_c\) and radius \(r\) has the form

\[B(x_c, r) = \{ x | \Vert x-x_c \Vert_2 \leq r \} = \{ x|(x-x_c)^T(x-x_c) \leq r^2 \}\]

where \(\Vert \cdot \Vert_2\) is the Euclidean norm, i.e., \(\Vert u \Vert_2 = (u^T u)^{1/2}\)

An **ellipsoid** is a set of the form:

\[\{ x|(x-x_c)^T P^{-1}(x-x_c) \leq 1 \}\]

where \(P\) is a symmetric and positive-definite matrix, i.e., \(P=P^T \succ 0\). The matrix \(P\) determines how far the ellipsoid extends in each direction from the center \(x_c\). A Euclidean ball is an ellipsoid with \(P=r^2 I\).
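A quick numerical check (plain R, illustrative numbers) that with \(P = r^2 I\) the ellipsoid membership test reduces to the ball membership test:

```
r <- 2; x_c <- c(1, 1)
P <- r^2 * diag(2)                                         # P = r^2 I
x <- c(2, 2)
in_ellipsoid <- drop(t(x - x_c) %*% solve(P) %*% (x - x_c)) <= 1
in_ball      <- sum((x - x_c)^2) <= r^2
c(in_ellipsoid, in_ball)   # both TRUE: x is inside
```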

A **norm ball** for a given norm \(\Vert \cdot \Vert\) is \(\{ x | \Vert x-x_c \Vert \leq r \}\).

Remember that a function is a norm if \(\Vert x \Vert \geq 0\) (with \(\Vert x \Vert = 0\) only when \(x=0\)), \(\Vert t x \Vert = |t| \Vert x \Vert\), and \(\Vert x+y \Vert \leq \Vert x \Vert + \Vert y \Vert\).
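These properties are easy to check numerically for the Euclidean norm (a sanity check, not a proof):

```
nrm <- function(x) sqrt(sum(x^2))          # Euclidean norm
u <- c(3, 4); v <- c(-1, 2); s <- -2.5
nrm(u)                                     # 5, and always >= 0
all.equal(nrm(s * u), abs(s) * nrm(u))     # absolute homogeneity
nrm(u + v) <= nrm(u) + nrm(v)              # triangle inequality
```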

A **norm cone** is a set \(\{ (x,t) | \Vert x \Vert \leq t \}\)

If the norm is the Euclidean norm, then the norm cone is called **second-order cone**.

A **polyhedron** is the solution set of finitely many linear inequalities and equalities

\[Ax \preceq b, \quad Cx = d\]

with \(A \in \mathbb{R}^{m \times n}, C \in \mathbb{R}^{p \times n}\).

A **polyhedron** is thus the intersection of a finite number of halfspaces and hyperplanes.
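A membership test for a polyhedron \(\{x | Ax \preceq b\}\) (here without equality constraints) is a one-liner in plain R; the unit square is a small illustrative example:

```
in_polyhedron <- function(x, A, b) all(A %*% x <= b)
# The unit square [0,1]^2 as the intersection of four halfspaces
A <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))
b <- c(1, 0, 1, 0)
in_polyhedron(c(0.5, 0.5), A, b)  # TRUE
in_polyhedron(c(2, 0), A, b)      # FALSE
```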

An optimization problem in standard form:

`minimize`

\(f_0(x)\)

`subject to`

\(f_i(x) \leq 0, i=1,\dots,m\)

\(h_i(x) = 0, i=1,\dots,p\)

\(x\) is the **optimization variable** (can be a vector or matrix), \(f_0\) is the **objective or cost function**, \(f_i\) are the **inequality constraints** and \(h_i\) the **equality constraints**.

If instead of minimizing, we wish to maximize, \(f_0\) usually is called the utility or profit function.

\(x\) is **feasible** if it satisfies the constraints and belongs to the domain of \(f_0\).

The **optimal value** \(p^*\) is the infimum of \(f_0\) over the feasible set, if it exists. If \(p^* = \infty\) the problem is infeasible (no \(x\) satisfies the constraints). If \(p^* = - \infty\) the problem is unbounded below, which is usually a sign of an incorrect model.

The **set of optimal points** is \(X_{opt} = \{x | f_0(x) = p^* \land x \text{ feasible} \}\)

There are implicit constraints: \(x\) must lie in the domains of the functions \(f_0, f_i, h_i\).

The next example has no explicit constraints.

Say we have some data:

```
set.seed(100)
k <- 3
A <- matrix(rnorm(k*k, 10), k, k)
b <- rnorm(k, 10)
```

and this is our optimization problem:

`minimize`

\(f_0(x) = - \sum_{i=1}^k \log (b_i - a_i^T x)\)

but \(a^T_ix < b_i\) is an implicit constraint.

Using CVXR:

```
x <- Variable(k) # define variable as a vector 3x1
objective <- Minimize( -sum(log(b-A%*%x)) ) # define objective function
problem <- Problem(objective) # create problem instance
## Checking problem solution
solution <- solve(problem)
solution$status # can return optimal, optimal_inaccurate, infeasible or unbounded
## [1] "optimal_inaccurate"
solution$getValue(x) # an optimal x
## [,1]
## [1,] 13352326
## [2,] -10067209
## [3,] -9496189
solution$value # p* = f0(x)
## [1] -51.70338
```

Package CVXR deals with **convex optimization problems**. These are optimization problems where the functions \(f_0, f_i\) are convex and the equality constraints are affine (i.e., linear functions plus a constant):

`minimize`

\(f_0(x)\)

`subject to`

\(f_i(x) \leq 0, i=1,\dots,m\)

\(a_i^T x = b_i, i=1,\dots,p\) or, in matrix form, \(Ax = b\)

An important property is that the feasible set of a convex problem is itself convex.

Another important property is that a *locally optimal point of a convex problem is (globally) optimal*.

One important point is that *convexity is an attribute of the description of the problem*.

Consider the following problem

`minimize`

\(f_0(x) = x_1^2 + x_2^2\)

`subject to`

\(f_1(x) = x_1 / (1+x_2^2) \leq 0\)

\(h_1(x) = (x_1 + x_2)^2 = 0\)

which is definitely not a convex problem (\(h_1\) is not affine, for example).

However, requiring \((x_1 + x_2)^2 = 0\) is the same as requiring \(x_1 + x_2 = 0\).

Also, since \(1+x_2^2\) is always positive, \(x_1 / (1+x_2^2) \leq 0\) is equivalent to \(x_1 \leq 0\).

So the original problem can be transformed into an equivalent problem:

`minimize`

\(f_0(x) = x_1^2 + x_2^2\)

`subject to`

\(f_1(x) = x_1 \leq 0\)

\(h_1(x) = x_1 + x_2 = 0\)

which is a convex problem!

```
x1 <- Variable(1) # a scalar
x2 <- Variable(1) # a scalar
objective <- Minimize( x1^2 + x2^2 )
constraints <- list(x1 <= 0, x1 + x2 == 0)
problem <- Problem(objective, constraints)
## Checking problem solution
solution <- solve(problem)
solution$status
## [1] "optimal"
solution$getValue(x1)
## [1] -9.065648e-06
solution$getValue(x2)
## [1] 9.065651e-06
solution$value
## [1] -1.021385e-09
```

A linear program (LP) has an affine objective and constraint functions.

The feasible set is a polyhedron (the simplex method is an algorithm that moves in this polyhedron to find the optimal solution).

Example: choose cheapest healthy diet \(x\) with quantities \(x_1 \ldots x_n\) of \(n\) foods, where (a) one unit of food \(j\) costs \(c_j\) and contains amount \(a_{ij}\) of nutrient \(i\); (b) healthy diet requires nutrient \(i\) in quantity at least \(b_i\).

`minimize`

\(c^T x\)

`subject to`

\(Ax \geq b\)

\(x \geq 0\)

```
n <- 4 # number of foods
i <- 3 # number nutrients
# nutrient's amounts (column is food, row is nutrient)
A <- matrix(c(10, 40, 0, 20,
5, 20, 15, 11,
20, 10, 6, 0), ncol=n, nrow=i, byrow = T)
# minimal amounts
b <- c(50, 10, 20)
# costs
c <- c(30, 10, 5, 15)
```

The problem specification:

```
x <- Variable(4) # 4x1 vector
objective <- Minimize( t(c) %*% x )
constraints <- list(A%*%x >= b, x >= 0)
problem <- Problem(objective, constraints)
## Checking problem solution
solution <- solve(problem)
solution$status
## [1] "optimal"
round(solution$getValue(x),3)
## [,1]
## [1,] 0.00
## [2,] 1.25
## [3,] 1.25
## [4,] 0.00
solution$value
## [1] 18.75
```

These are piecewise-linear minimization problems, of the form:

`minimize`

\(\text{max}_{i=1\ldots m} a_i^T x + b_i\)

Graphically, the objective is the pointwise maximum of \(m\) affine functions, i.e., a piecewise-linear convex function. Introducing an auxiliary variable \(t\), the problem is equivalent to the LP:

`minimize`

\(t\)

`subject to`

\(a_i^T x + b_i \leq t, i=1\ldots m\)

Example: find the Chebyshev center of a polyhedron.

The Chebyshev center \(x_c\) is the center of the largest inscribed circle (in a sense, the deepest point inside the polyhedron). Let \(r\) be the radius of that circle.

```
n <- 2
A <- matrix(c(2, 1,
2, -1,
-1, 2,
-1, -2), nrow=4, ncol=n, byrow = TRUE)
b <- c(1,1,1,1)
norm_A <- rowSums(A^2)^0.5 # given A, norms are just scalars
```

```
ggplot() +
geom_abline(slope = -A[,1]/A[,2], intercept = b/A[,2]) +
labs(x = "x", y = "y") +
theme_bw() +
xlim(-1, 1) + ylim(-1, 1)
```

This problem can be solved by the following LP (the ball \(B(x_c, r)\) lies in the halfspace \(a_i^T x \leq b_i\) exactly when \(\sup_{\Vert u \Vert_2 \leq 1} a_i^T(x_c + ru) = a_i^T x_c + r \Vert a_i \Vert_2 \leq b_i\)):

`maximize`

\(r\)

`subject to`

\(a_i^T x_c + r \Vert a_i \Vert_2 \leq b_i, i=1\ldots m\)

```
r <- Variable(1, name="radius")
x_c <- Variable(n, name="center")
objective <- Maximize( r )
constraints <- list(A%*%x_c + r*norm_A <= b)
problem <- Problem(objective, constraints)
## Checking problem solution
solution <- solve(problem)
solution$status
## [1] "optimal"
solution$getValue(x_c)
## [,1]
## [1,] -2.643728e-10
## [2,] -5.614499e-17
solution$value
## [1] 0.4472136
```

```
center <- round(as.double(solution$getValue(x_c)),3)
radius <- round(as.double(solution$getValue(r)),3)
ggplot() +
geom_abline(slope = -A[,1]/A[,2], intercept = b/A[,2]) +
geom_circle(mapping = aes(x0 = center[1], y0 = center[2], r = radius), color = "blue") +
geom_point(mapping = aes(x = center[1], y = center[2]), color = "red", size = 2) +
labs(x = "x", y = "y") +
theme_bw() +
xlim(-1, 1) + ylim(-1, 1)
```

A convex optimization problem is called a quadratic program (QP) if the objective function is a convex quadratic and the constraint functions are affine.

An example:

`minimize`

\(x^2 + y^2\)

`subject to`

\(x \geq 0\)

\(2x+y=1\)

```
x <- Variable(1)
y <- Variable(1)
objective <- Minimize(x^2 + y^2)
constraints <- list(x >= 0, 2*x + y == 1)
problem <- Problem(objective, constraints)
## Checking problem solution
solution <- solve(problem)
solution$status
## [1] "optimal"
solution$getValue(x)
## [1] 0.3999978
solution$getValue(y)
## [1] 0.2000044
solution$value
## [1] 0.2
```

A very well-known QP is least squares (linear regression).

Let’s generate some data based on a linear model: \(Y = X \beta + \epsilon\)

```
set.seed(123)
n <- 100
p <- 10
beta <- -4:5 # the true parameters are [−4,...,−1,0,1,...,5]
X <- matrix(rnorm(n * p), nrow=n)
Y <- X %*% beta + rnorm(n)
```

The QP is

`minimize`

\(\|Y - X\beta\|_2^2\)

```
betaHat <- Variable(p)
objective <- Minimize(sum((Y - X %*% betaHat)^2))
problem <- Problem(objective)
result <- solve(problem)
result$getValue(betaHat) %>% round(3)
```

```
## [,1]
## [1,] -3.920
## [2,] -3.012
## [3,] -2.125
## [4,] -0.867
## [5,] 0.091
## [6,] 0.949
## [7,] 2.076
## [8,] 3.127
## [9,] 3.961
## [10,] 5.135
```

We can compare with R’s `lm`:

```
ls.model <- lm(Y ~ 0 + X)
matrix(coef(ls.model), ncol = 1) %>% round(3)
```

```
## [,1]
## [1,] -3.920
## [2,] -3.012
## [3,] -2.125
## [4,] -0.867
## [5,] 0.091
## [6,] 0.949
## [7,] 2.076
## [8,] 3.127
## [9,] 3.961
## [10,] 5.135
```
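The `lm` fit is in fact the analytical solution of the normal equations, \(\hat\beta = (X^TX)^{-1}X^TY\). Regenerating the same data, we can compute it directly (a sketch in base R):

```
set.seed(123)
n <- 100
p <- 10
beta <- -4:5
X <- matrix(rnorm(n * p), nrow = n)
Y <- X %*% beta + rnorm(n)
beta_cf <- solve(t(X) %*% X, t(X) %*% Y)  # normal equations
round(beta_cf, 3)
```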

So we have just emulated `lm`.

But this QP can be extended to variants of the classical method. Suppose we want all estimates of \(\beta\) to be nonnegative (nonnegative least squares regression).

While the classical regression has an analytical solution, this variant does not. However, it’s just an extra constraint for the QP:

```
constraints <- list(betaHat >= 0)
problem <- Problem(objective, constraints)
result <- solve(problem)
result$getValue(betaHat) %>% round(3)
```

```
## [,1]
## [1,] 0.000
## [2,] 0.000
## [3,] 0.000
## [4,] 0.000
## [5,] 1.237
## [6,] 0.623
## [7,] 2.123
## [8,] 2.804
## [9,] 4.445
## [10,] 5.207
```

Another example: suppose we want to enforce \(2\beta_2 + \beta_3 = 0\):

```
A <- matrix(c(0, 2, 1, rep(0, 7)), nrow = 1)
constraints <- list(A %*% betaHat == 0)
problem <- Problem(objective, constraints)
result <- solve(problem)
result$getValue(betaHat) %>% round(3)
```

```
## [,1]
## [1,] -3.526
## [2,] 0.262
## [3,] -0.524
## [4,] -1.078
## [5,] 0.797
## [6,] 0.319
## [7,] 1.836
## [8,] 3.512
## [9,] 4.187
## [10,] 5.478
```

Let’s make some data:

```
set.seed(101)
n <- 20
X1 <- c(rnorm(n), rnorm(n, mean=1.5))
X2 <- c(rnorm(n), rnorm(n, mean=1.5))
X <- matrix(c(X1,X2), ncol=2)
Y <- c(rep(1,n), rep(-1,n))
ggplot() +
geom_point(mapping = aes(x = X[,1], y = X[,2]), color = Y+2, size = 2) +
labs(x = "x1", y = "x2") +
theme_bw()
```

The Support Vector Classifier (SVC) is an affine function (hyperplane) that separates two sets of points by the widest margin. When the sets are not linearly separable, the SVC is determined by a trade-off between the width of the margin and the number of points that are misclassified.

Variable \(\xi_i\) is the amount by which point \(i\) may violate the margin. Violations are penalized at rate \(C>0\): as \(C\) increases, fewer misclassifications are allowed.

The model is:

`minimize`

\(\frac{1}{2} \Vert \beta \Vert^2 + C\sum_{i=1}^m \xi_i\)

`subject to`

\(\xi_i \geq 0\)

\(y_i(x_i^T\beta + \beta_0) \geq 1 - \xi_i, \quad i = 1,\ldots,m\)

```
C <- 100
beta0 <- Variable()
beta <- Variable(2)
slack <- Variable(2*n)
# Form problem
objective <- (1/2) * sum_squares(vstack(beta, beta0)) + C * sum(slack)
constraints <- list(Y * (X %*% beta + beta0) >= 1 - slack, slack >= 0)
problem <- Problem(Minimize(objective), constraints)
solution <- solve(problem)
```

To plot the separating hyperplane (here, a line) we need to get the \(\beta\) to find its slope and intercept:

```
beta.hat <- solution$getValue(beta)
beta0.hat <- solution$getValue(beta0)
ggplot() +
geom_point(mapping = aes(x = X[,1], y = X[,2]), color = Y+2, size = 2) +
geom_abline(slope = -beta.hat[1,]/beta.hat[2,], intercept = -beta0.hat/beta.hat[2,]) +
labs(x = "x1", y = "x2") +
theme_bw()
```

Vector optimization occurs when the objective function \(f_0\) to minimize is vector-valued, i.e., \(f_0 : \mathbb{R}^n \rightarrow \mathbb{R}^q\).

For this problem to make sense there must be a proper cone \(K\) with which to compare objective values. If \(x,y\) are two feasible points (they satisfy the constraints), the respective objective values \(f_0(x), f_0(y)\) are compared using the generalized inequality \(\preceq_K\). If \(f_0(x) \preceq_K f_0(y)\) then we interpret that as \(x\) being better than or equal to \(y\).

The main problem is that \(f_0(x)\) and \(f_0(y)\) might not be comparable, something that cannot happen with scalar objective functions.

If, for every feasible \(y\), \(f_0(x) \preceq_K f_0(y)\), we say that \(x\) is **optimal** and \(f_0(x)\) is the **optimal value**. If an optimal value exists, it is unique (\(\preceq_K\) is antisymmetric since \(K\) is a proper cone). This, however, is not common.

A feasible \(x\) is **Pareto optimal** if, for every feasible \(y\), \(f_0(y) \preceq_K f_0(x) \implies f_0(y) = f_0(x)\). That is, among all comparable values, \(x\) is the smallest.

A vector optimization problem usually has many Pareto optimal values. Every Pareto optimal value lies in the boundary of the set of feasible objective values.

Scalarization is a technique to find Pareto optimal values.

Choose any vector \(\lambda \succ_{K^*} 0\) (where \(K^*\) is the dual cone of \(K\); for \(K = \mathbb{R}^q_+\) this just means \(\lambda_i > 0\)) and transform the objective function into

`minimize`

\(\lambda^T f_0(x)\)

If \(x^*\) is optimal for this scalar problem, then \(x^*\) is Pareto optimal for the original vector optimization problem.

Vector \(\lambda\) is called the **weight vector**. It is a free parameter that we can vary to obtain different Pareto optimal solutions. However, there may be Pareto optimal solutions that cannot be found by scalarization.
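As a minimal sketch (an assumed toy problem, with \(K = \mathbb{R}^2_+\)): take the bi-objective \(f_0(x) = (\Vert x-a \Vert_2^2, \Vert x-b \Vert_2^2)\). Scalarizing with weights \(\lambda_1, \lambda_2 > 0\) and setting the gradient to zero gives the closed form \(x^* = (\lambda_1 a + \lambda_2 b)/(\lambda_1+\lambda_2)\), so varying the weights traces out Pareto optimal points along the segment between \(a\) and \(b\):

```
a <- c(0, 0); b <- c(1, 2)
# Minimizer of l1*||x - a||^2 + l2*||x - b||^2 (gradient set to zero)
pareto_point <- function(l1, l2) (l1 * a + l2 * b) / (l1 + l2)
pareto_point(1, 1)    # midpoint of the segment [a, b]
pareto_point(9, 1)    # close to a
pareto_point(1, 9)    # close to b
```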