An important issue I have thought about and discussed in various places is the problem of doing likelihood-based inference in Omega.
Context
Basically, all probabilistic programming languages, including Turing, Pyro, and Gen, are based primarily on likelihood-based inference. In practice, this means that the conditions they are able to do inference on are of the form model | X = data, where X is a random variable with a known density or mass function, for example a multivariate Normal distribution.
Some languages impose this restriction through the syntax, the type system, or error messages. For example, they may only allow statements of the form condition(X, data), where X must belong to a built-in set of primitives.
This restriction excludes the vast majority of models: in particular, it excludes problems of the form model | f(X) = data. A simple example of this is X ~ Normal(0, 1), Y ~ Normal(0, 1), (X, Y) | X + Y == 1.
Problems of the form model | f(X) = data are excluded because they can be difficult for a number of reasons:
- Inversion/consistency problem: finding values of the model's variables that are consistent with the condition can be a hard constraint problem---it requires inverting f, which may be arbitrarily hard without constraints on f (the sketch after this list makes the difficulty concrete for the X + Y == 1 example).
- Integration problem: even if we can find consistent values, as Bayes' rule shows, computing the conditional density can require an intractable integral. In essence, we have to integrate over all the possible ways that f(X) == data can be true.
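For a concrete feel for the difficulty, here is a plain-Julia sketch (not Omega code): naive rejection sampling for the X + Y == 1 example essentially never terminates, because the conditioning event has probability zero, and relaxing the condition to a tolerance ε is both approximate and increasingly slow as ε shrinks.

```julia
using Distributions, Random

# Rejection sampling with exact equality would loop forever; even with a
# tolerance ε, the acceptance probability shrinks roughly linearly in ε.
function rejection_sample(ε; rng = Random.MersenneTwister(0))
    while true
        x, y = rand(rng, Normal(0, 1)), rand(rng, Normal(0, 1))
        abs(x + y - 1) < ε && return (x, y)
    end
end
```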
Question: most probabilistic programming languages allow you to build very complex probabilistic simulations; why should the conditions you can impose on them be restricted to such a narrow class?
Design Goals
Omega is the programming language analog of a causal graphical model, whereas the aforementioned PPLs are analogs of Bayesian networks. The advantage of this for Omega is that it allows for simple and composable causal reasoning, but conventional (likelihood-based) conditioning is less well understood and developed.
What do we want?
- Within Omega's semantics, support conventional likelihood-based inference
- Provide an interface for users to manually specify likelihoods
- Have everything continue to be fast
- Ideally, have everything be simple
The inversion problem
As mentioned, problems of the form model | f(X) = data require inversion. Moreover, when we look at fundamental probabilistic inference, it appears that all forms of conditioning---even the conventional case---require inversion. However, conventional statistics does not talk about inversion. What is going on?
Consider a normal-normal model:
X ~ Normal(0, 1)
Y ~ Normal(X, 1)
Now suppose we want the posterior $p(X | Y = y)$ where, for example, $y = 1$. On the one hand, Y is a random variable, and conditioning it to be y means eliminating all of the sample space on which the condition $Y = y$ is not true.
However most systems do not explicitly represent a random variable as a function, and a PPL will apply the chain rule to compute:
$$p(X = x)p(Y = y | X = x)$$
Both of these terms can be computed tractably. In Julia, it would be
using Distributions

"Unnormalized posterior density p(X = x, Y = 1) for the normal-normal model."
function posterior_x(x)
    y = 1
    X = Normal(0, 1)
    Y = Normal(x, 1)
    pdf(X, x) * pdf(Y, y)
end
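As a quick check (not part of the original derivation), normalizing this numerically recovers the known conjugate posterior, X | Y = 1 ~ Normal(0.5, sqrt(0.5)):

```julia
xs = -5:0.01:5
unnorm = posterior_x.(xs)
post_numeric = unnorm ./ (sum(unnorm) * step(xs))   # crude Riemann normalization
post_exact = pdf.(Normal(0.5, sqrt(0.5)), xs)
maximum(abs.(post_numeric .- post_exact))            # small, up to quadrature error
```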
So no explicit inversion is required. We might wonder where the problem has been hidden. If inference is about finding consistent values---or eliminating inconsistent ones---why does it seem like the conventional inference problem is unconstrained? I'll get to that later.
In Omega, however, random variables are literally functions and the inversion problem is all up in our face.
Here's how this plays out in Omega. In OmegaCore, you would express the model as:
μ = 1 ~ Normal(0, 1)
x = 2 ~ Normal.(μ, 1)
x_ = 0.123
μc = μ |ᶜ x .== x_
There is some amount of syntactic sugar going on here; written out explicitly, each random variable is just a function of ω.
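A rough sketch of those explicit forms, using the same stdunif / q notation as the derivation below (the names are illustrative rather than OmegaCore's exact internals):

```julia
using Distributions

stdunif(i, ω) = ω[i]                # toy: ω is indexable by primitive id
q(u) = quantile(Normal(0, 1), u)    # standard normal quantile function
μ(ω) = q(stdunif(1, ω))             # μ = 1 ~ Normal(0, 1)
x(ω) = q(stdunif(2, ω)) * 1 + μ(ω)  # x = 2 ~ Normal.(μ, 1)
```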
The inference problem is how to generate values according to the conditional distribution, which is simply the prior distribution excluding all samples that violate the constraint.
How can we generate values that satisfy the conditions?
In the example, the output of x is conditioned to be x_. We then have the following equation:
x(ω) = x_ = n(2, ω, μ(ω), 1)
Expanding out n, we get
x_ = q(stdunif(2, ω)) * 1 + μ(ω), where q is the standard normal quantile function and stdunif(2, ω) is the second standard-uniform component of ω.
Rearranging:
(x_ - μ(ω)) / σ = q(stdunif(2, ω))
And finally, defining c as the inverse of q, and hence the cdf of the standard normal, c(x) = cdf(Normal(0, 1), x), and substituting in σ = 1, we get:
stdunif(2, ω) = c((x_ - μ(ω)) / 1)
So, given that x == x_, the value of stdunif(2, ω), and hence the value of ω[2], is c((x_ - μ(ω)) / 1). This means we have a parametric representation of the subset of the sample space that is consistent with our conditions.
"return ω such consistent with `x == x_` as parametric function of `v \in [0,1]`"functionconsistentω(v, x_)
# Get value for μ_
partialω =Dict(1=> v)
μ_ =μ(partialω)
# invert
ω =Dict(1=> v, 2=>c(x_ - μ_))
end
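A hypothetical usage, assuming the Dict-style ω and the toy μ, x, and c definitions sketched above:

```julia
c(t) = cdf(Normal(0, 1), t)   # the standard normal cdf, as defined above

x_ = 0.123
ωs = [consistentω(v, x_) for v in 0.01:0.01:0.99]
all(ω -> x(ω) ≈ x_, ωs)       # every ω on this curve satisfies the condition
```

Note that this only enumerates consistent points; it does not yet weight them by how probable they are, which is where the integration problem comes back in.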
Implementation

This points to a way forward for implementing conventional conditioning in Omega: if we condition model | X = x, what we need to be able to do is invert X under the condition X = x being true. The critical hypothesis is:
For all the primitive distributions for which conventional likelihood-based inference is tractable, the inversion problem is also tractable.
This is true for inverse-transform families: the forward sampler pushes a standard uniform through the quantile function, so recovering the exogenous uniform from a sampled value is just an application of the cdf.
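For instance (a small check, with no Omega machinery involved):

```julia
using Distributions

d = Normal(2.0, 3.0)
u = rand()                  # exogenous standard uniform
xval = quantile(d, u)       # forward transform: sample a value from u
cdf(d, xval) ≈ u            # inverting back to the exogenous u is just the cdf
```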
So, do the following procedure:
- Invert from the conditions back to the exogenous variables / sample space
- Compute the joint pdf on the exogenous variables directly
All of the practical implementation issues come from the fact that inversion is not something that normal programming languages want you to do.
Possibilities:
- Modulate the forward execution. By intercepting calls of the form x(ω) (as we already do for causal interventions and a few other things), we can do a poor-man's inversion.
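A hedged, toy sketch of what that interception could look like, using a Dict-backed ω and a hand-rolled Normal primitive rather than Omega's actual dispatch machinery (all names here are illustrative):

```julia
using Distributions

# ω is a mutable Dict of exogenous uniforms; `pinned` records primitives whose
# outputs are fixed by a condition.
struct CondCtx
    ω::Dict{Int, Float64}
    pinned::Dict{Int, Float64}   # primitive id => value its output must take
end

# Evaluate the primitive `id ~ Normal(μ, σ)` under the context.
function evalnormal(ctx::CondCtx, id, μ, σ)
    if haskey(ctx.pinned, id)
        x = ctx.pinned[id]
        # Poor-man's inversion: write the exogenous value back into ω
        ctx.ω[id] = cdf(Normal(0, 1), (x - μ) / σ)
        return x
    else
        return quantile(Normal(0, 1), ctx.ω[id]) * σ + μ
    end
end
```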
Let's continue with a simpler example:
x = 1 ~ StdNormal{Float64}()
y = 2 ~ StdNormal{Float64}()
z = x .+ y
cnd(@joint(x, y), z .== 1.0)
So, if I was to solve this manually, I'd do:
using Random
using Distributions

function gen_samples()
    n = 1_000_000
    z = 1.0
    # Parameterize the constraint surface x + y == z by θ = y; invert to recover (x, y)
    pinvert(θ) = z - θ, θ
    # Joint density of the two standard normals, evaluated on the constraint surface
    function logdensity(θ)
        x, y = pinvert(θ)
        logpdf(Normal(0, 1), x) + logpdf(Normal(0, 1), y)
    end
    # Symmetric random-walk proposal, so the log proposal ratio is 0
    propose_and_logratio(rng, θ) = rand(rng, Normal(θ, 0.1)), 0.0
    rng = Random.MersenneTwister(0)
    samples = mh(rng, logdensity, n, 0.5, propose_and_logratio)
    pinvert.(samples)
end
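As a rough sanity check (assuming mh behaves as used above), the exact conditional here is known: given X + Y == 1 with X and Y independent standard normals, each coordinate is marginally Normal with mean 0.5 and variance 0.5, so:

```julia
using Statistics

xy = gen_samples()    # Vector of (x, y) pairs on the surface x + y == 1
xs = first.(xy)
mean(xs), var(xs)     # expect roughly (0.5, 0.5)
```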
Observations:
- We haven't run the model forward at all
- We're doing inference over a one-dimensional space
Here mh is a Metropolis-Hastings sampler with signature:
mh(rng, logdensity, n, state_init::X, propose_and_logratio;