Add logistic regression tutorial from JuMPTutorials (#2499)
frapac authored Mar 2, 2021
1 parent 749c828 commit 442aee8
Showing 2 changed files with 263 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/make.jl
@@ -14,6 +14,7 @@ const _TUTORIAL_SUBDIR = [
"Mixed-integer linear programs",
"Nonlinear programs",
"Quadratic programs",
"Conic programs",
"Semidefinite programs",
"Optimization concepts",
]
262 changes: 262 additions & 0 deletions docs/src/tutorials/Conic programs/logistic_regression.jl
@@ -0,0 +1,262 @@
# MIT License #src
# #src
# Copyright (c) 2020 François Pacaud #src
# #src
# Permission is hereby granted, free of charge, to any person obtaining a copy #src
# of this software and associated documentation files (the "Software"), to deal #src
# in the Software without restriction, including without limitation the rights #src
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell #src
# copies of the Software, and to permit persons to whom the Software is #src
# furnished to do so, subject to the following conditions: #src
# #src
# The above copyright notice and this permission notice shall be included in all #src
# copies or substantial portions of the Software. #src
# #src
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR #src
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, #src
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE #src
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER #src
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, #src
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE #src
# SOFTWARE. #src

# # Logistic Regression

# **Originally Contributed by**: François Pacaud

# This tutorial shows how to solve a logistic regression problem
# with JuMP. Logistic regression is a well-known method in machine learning,
# useful when we want to classify binary variables with the help of
# a given set of features. To this end,
# we find the optimal combination of features maximizing
# the (log)-likelihood over a training set. From a modern optimization
# perspective, the resulting problem is convex and differentiable;
# it is even conic representable.
#
# ## Formulating the logistic regression problem
#
# Suppose we have a set of training data points $i = 1, \cdots, n$, where
# for each $i$ we have a vector of features $x_i \in \mathbb{R}^p$ and a
# categorical observation $y_i \in \{-1, 1\}$.
#
# The log-likelihood is given by
#
# ```math
# l(\theta) = \sum_{i=1}^n \log \left( \dfrac{1}{1 + \exp(-y_i \theta^\top x_i)} \right)
# ```
#
# and the optimal $\theta$ minimizes the logistic loss function:
#
# ```math
# \min_{\theta}\; \sum_{i=1}^n \log(1 + \exp(-y_i \theta^\top x_i)).
# ```
#
# Most of the time, instead of solving the previous optimization problem directly, we
# prefer to add a regularization term:
#
# ```math
# \min_{\theta}\; \sum_{i=1}^n \log(1 + \exp(-y_i \theta^\top x_i)) + \lambda \| \theta \|
# ```
#
# with $\lambda \in \mathbb{R}_+$ a penalty parameter and $\|\cdot\|$ a norm function. By adding
# such a regularization term, we avoid overfitting on the training set and usually
# achieve better scores in cross-validation.
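#
# As a concrete illustration (not part of the original tutorial), the
# regularized objective can be evaluated in plain Julia. The helper
# `logistic_loss` below is a hypothetical name, and it assumes the
# $\ell_2$ norm for the penalty term.

function logistic_loss(θ, X, y, λ)
    n = size(X, 1)
    ## Sum of the per-sample logistic losses
    loss = sum(log(1 + exp(-y[i] * θ' * X[i, :])) for i in 1:n)
    ## Add the ℓ2 regularization term
    return loss + λ * sqrt(sum(abs2, θ))
end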

# ## Reformulation as a conic optimization problem
#
# By introducing auxiliary variables $t_1, \cdots, t_n$ and $r$,
# the optimization problem is equivalent to
#
# ```math
# \begin{aligned}
# \min_{t, r, \theta} \;& \sum_{i=1}^n t_i + \lambda r \\
# \text{subject to } & \quad t_i \geq \log(1 + \exp(- y_i \theta^\top x_i)) \\
# & \quad r \geq \|\theta\|
# \end{aligned}
# ```
#
# Now, the trick is to reformulate the constraints $t_i \geq \log(1 + \exp(- y_i \theta^\top x_i))$
# with the help of the *exponential cone*
#
# ```math
# K_{exp} = \{ (x, y, z) \in \mathbb{R}^3 : \; y \exp(x / y) \leq z, \; y > 0 \} .
# ```
#
# Indeed, by exponentiating both sides and then dividing by $\exp(t_i)$, we
# see that for all $i=1, \cdots, n$, the constraint $t_i \geq \log(1 + \exp(- y_i \theta^\top x_i))$
# is equivalent to
#
# ```math
# \exp(-t_i) + \exp(u_i - t_i) \leq 1
# ```
#
# with $u_i = -y_i \theta^\top x_i$. Then, by adding two auxiliary variables
# $z_{i1}$ and $z_{i2}$ such that $z_{i1} \geq \exp(u_i-t_i)$ and $z_{i2} \geq \exp(-t_i)$, we get
# the equivalent formulation
#
# ```math
# \left\{
# \begin{aligned}
# (u_i -t_i , 1, z_{i1}) & \in K_{exp} \\
# (-t_i , 1, z_{i2}) & \in K_{exp} \\
# z_{i1} + z_{i2} & \leq 1
# \end{aligned}
# \right.
# ```
#
# In this setting, the conic version of the logistic regression problem reads
#
# ```math
# \begin{aligned}
# \min_{t, z, r, \theta}& \; \sum_{i=1}^n t_i + \lambda r \\
# \text{subject to } & \quad (u_i -t_i , 1, z_{i1}) \in K_{exp} \\
# & \quad (-t_i , 1, z_{i2}) \in K_{exp} \\
# & \quad z_{i1} + z_{i2} \leq 1 \\
# & \quad u_i = -y_i x_i^\top \theta \\
# & \quad r \geq \|\theta\|
# \end{aligned}
# ```
#
# and thus involves $3n + p + 1$ variables and $3n + 1$ constraints ($u_i = -y_i \theta^\top x_i$
# is only a virtual constraint used to clarify the notation).
# Hence, if $n \gg 1$, we get a large number of variables and constraints.


# ## Fitting logistic regression with a conic solver
#
# It is now time to move to the implementation. We choose SCS, a conic solver that supports the exponential cone.
using JuMP
import Random
import SCS

Random.seed!(2713);

# We start by implementing a function to generate a fake dataset, in which
# we can tune the correlation between the feature variables. The function
# is a direct transcription of the one used in [this blog post](http://fa.bianp.net/blog/2013/numerical-optimizers-for-logistic-regression/).
function generate_dataset(n_samples=100, n_features=10; shift=0.0)
    X = randn(n_samples, n_features)
    w = randn(n_features)
    y = sign.(X * w)
    X .+= 0.8 * randn(n_samples, n_features) # add noise
    X .+= shift # shift the points in the feature space
    X = hcat(X, ones(n_samples, 1))
    return X, y
end

# We write a `softplus` function to formulate each constraint
# $t \geq \log(1 + \exp(u))$ with two exponential cones.
function softplus(model, t, u)
    z = @variable(model, [1:2], lower_bound = 0.0)
    @constraint(model, sum(z) <= 1.0)
    @constraint(model, [u - t, 1, z[1]] in MOI.ExponentialCone())
    @constraint(model, [-t, 1, z[2]] in MOI.ExponentialCone())
end
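
# As a quick sanity check (not part of the original tutorial), we can verify
# on a toy instance that minimizing `t` subject to `softplus(model, t, u)`
# recovers $\log(1 + \exp(u))$; the value `u = 0.5` below is an arbitrary
# choice.

check = Model(SCS.Optimizer)
set_silent(check)
@variable(check, t_check)
softplus(check, t_check, 0.5)
@objective(check, Min, t_check)
optimize!(check)
isapprox(value(t_check), log(1 + exp(0.5)); atol = 1e-3)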

# ### $\ell_2$ regularized logistic regression
# Then, with the help of the `softplus` function, we can write our
# optimization model. In the $\ell_2$ regularization case, the constraint
# $r \geq \|\theta\|_2$ can be written as a second-order cone constraint.
function build_logit_model(X, y, λ)
    n, p = size(X)
    model = Model()
    @variable(model, θ[1:p])
    @variable(model, t[1:n])
    for i in 1:n
        u = -(X[i, :]' * θ) * y[i]
        softplus(model, t[i], u)
    end
    ## Add ℓ2 regularization
    @variable(model, 0.0 <= reg)
    @constraint(model, [reg; θ] in MOI.SecondOrderCone(p + 1))
    ## Define objective
    @objective(model, Min, sum(t) + λ * reg)
    return model
end

# We generate the dataset.
#
# !!! warning
#     Be careful here: for large `n` and `p`, SCS could fail to converge.
#
n, p = 200, 10
X, y = generate_dataset(n, p, shift = 10.0);

## We can now solve the logistic regression problem.
λ = 10.0
model = build_logit_model(X, y, λ)
set_optimizer(model, SCS.Optimizer)
JuMP.optimize!(model)

#-

θ♯ = JuMP.value.(model[:θ])

# It appears that the speed of convergence is not much affected by the correlation
# of the dataset, nor by the penalty $\lambda$.
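#
# A minimal way to probe this claim (not part of the original tutorial) is to
# re-solve the model for a few penalty values and compare the timings; the
# values of $\lambda$ below are arbitrary.

for λ_try in (1.0, 10.0, 100.0)
    m = build_logit_model(X, y, λ_try)
    set_optimizer(m, SCS.Optimizer)
    set_silent(m)
    time_s = @elapsed optimize!(m)
    println("λ = ", λ_try, ": solved in ", round(time_s; digits = 2), " seconds")
end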


# ### $\ell_1$ regularized logistic regression
#
# We now formulate the logistic problem with an $\ell_1$ regularization term.
# The $\ell_1$ regularization promotes sparsity in the optimal
# solution of the resulting optimization problem. Luckily, the $\ell_1$ norm
# is implemented as a set in `MathOptInterface`. Thus, we can easily formulate
# the sparse logistic regression problem with the help of a `MOI.NormOneCone`
# set.
function build_sparse_logit_model(X, y, λ)
    n, p = size(X)
    model = Model()
    @variable(model, θ[1:p])
    @variable(model, t[1:n])
    for i in 1:n
        u = -(X[i, :]' * θ) * y[i]
        softplus(model, t[i], u)
    end
    ## Add ℓ1 regularization
    @variable(model, 0.0 <= reg)
    @constraint(model, [reg; θ] in MOI.NormOneCone(p + 1))
    ## Define objective
    @objective(model, Min, sum(t) + λ * reg)
    return model
end

## Auxiliary function to count the nonzero components:
count_nonzero(v::Vector; tol=1e-6) = sum(abs.(v) .>= tol)

## We solve the sparse logistic regression problem on the same dataset as before.
λ = 10.0
sparse_model = build_sparse_logit_model(X, y, λ)
set_optimizer(sparse_model, SCS.Optimizer)
JuMP.optimize!(sparse_model)

#-

θ♯ = JuMP.value.(sparse_model[:θ])
println(
    "Number of non-zero components: ", count_nonzero(θ♯),
    " (out of ", p, " features)",
)


# ### Extensions
# A direct extension would be to consider the sparse logistic regression with
# *hard* thresholding, which, contrary to the *soft* version using an $\ell_1$ regularization,
# adds an explicit cardinality constraint to its formulation:
#
# ```math
# \begin{aligned}
# \min_{\theta} & \; \sum_{i=1}^n \log(1 + \exp(-y_i \theta^\top x_i)) + \lambda \| \theta \|_2^2 \\
# \text{subject to } & \quad \| \theta \|_0 \leq k
# \end{aligned}
# ```
#
# where $k$ is the maximum number of non-zero components in the vector $\theta$,
# and $\|.\|_0$ is the $\ell_0$ pseudo-norm:
#
# ```math
# \| x\|_0 = \#\{i : \; x_i \neq 0\}
# ```
#
# The cardinality constraint $\|\theta\|_0 \leq k$ can be reformulated with
# binary variables. Thus, the hard sparse regression problem can be solved
# by any solver supporting mixed-integer conic problems.
#
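# As a rough sketch (not part of the original tutorial), the cardinality
# constraint can be modeled with binary variables $z_j$ and a big-M bound
# linking $z_j$ to $\theta_j$. The function name, the bound `M = 10.0`, and
# the budget `k` are hypothetical choices for illustration, and the sketch
# reuses the second-order cone penalty on $\|\theta\|_2$ from the $\ell_2$
# model (rather than its square) to stay conic. Solving such a model requires
# a mixed-integer conic solver.

function build_hard_sparse_logit_model(X, y, λ, k; M = 10.0)
    n, p = size(X)
    model = Model()
    @variable(model, θ[1:p])
    @variable(model, t[1:n])
    @variable(model, z[1:p], Bin)
    for i in 1:n
        u = -(X[i, :]' * θ) * y[i]
        softplus(model, t[i], u)
    end
    ## Big-M link: θ_j may be non-zero only if z_j = 1
    @constraint(model, θ .<= M .* z)
    @constraint(model, θ .>= -M .* z)
    ## Cardinality budget
    @constraint(model, sum(z) <= k)
    ## ℓ2 regularization
    @variable(model, 0.0 <= reg)
    @constraint(model, [reg; θ] in MOI.SecondOrderCone(p + 1))
    @objective(model, Min, sum(t) + λ * reg)
    return model
end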
