An important issue I have thought about and discussed in various places is the problem of doing likelihood-based inference in Omega.
Context
Basically, all probabilistic programming languages, including Turing, Pyro, and Gen, are based primarily on likelihood-based inference. In practice, this means that the conditions they are able to do inference on are of the form model | X = data, where X is a random variable with a known density or mass function, for example a multivariate Normal distribution.
Some languages impose this restriction through the syntax, the type system, or error messages. For example, they may only allow statements of the form condition(X, data), where X must belong to a built-in set of primitives.
This restriction excludes the vast majority of models: in particular, it excludes problems of the form model | f(X) = data. A simple example of this is X ~ Normal(0, 1), Y ~ Normal(0, 1), (X, Y) | X + Y == 1.
Problems of the form model | f(X) = data are excluded because they can be difficult for a number of reasons:
- Inversion/consistency problem: finding values of the model's variables that are consistent with the condition can be a hard constraint problem---it requires inverting f, which may be arbitrarily hard without constraints on f (the sketch after this list makes the difficulty concrete for the X + Y == 1 example).
- Integration problem: even if we can find consistent values, as Bayes' rule shows, computing the conditional density can require an intractable integral. In essence, we have to integrate over all the possible ways that f(X) == data can be true.
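For a concrete feel for the difficulty, here is a plain-Julia sketch (not Omega code): naive rejection sampling for the X + Y == 1 example essentially never terminates, because the conditioning event has probability zero, and relaxing the condition to a tolerance ε is both approximate and increasingly slow as ε shrinks.

```julia
using Distributions, Random

# Rejection sampling with exact equality would loop forever; even with a
# tolerance ε, the acceptance probability shrinks roughly linearly in ε.
function rejection_sample(ε; rng = Random.MersenneTwister(0))
    while true
        x, y = rand(rng, Normal(0, 1)), rand(rng, Normal(0, 1))
        abs(x + y - 1) < ε && return (x, y)
    end
end
```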
Question: most probabilistic programming languages allow you to build very complex probabilistic simulations; why should the conditions you can impose on them be restricted to such a narrow class?
Design Goals
Omega is the programming language analog of a causal graphical model, whereas the aforementioned PPLs are analogs of Bayesian networks. The advantage of this for Omega is that it allows for simple and composable causal reasoning, but conventional (likelihood-based) conditioning is less well understood and developed.
What do we want?
- Within Omega's semantics, support conventional likelihood-based inference
- Provide an interface for users to manually specify likelihoods
- Have everything continue to be fast
- Ideally, have everything be simple
The inversion problem
As mentioned, problems of the form model | f(X) = data require inversion. Moreover, when we look at fundamental probabilistic inference, it appears that all forms of conditioning---even the conventional case---require inversion. However, conventional statistics does not talk about inversion. What is going on?
Consider a normal-normal model:
X ~ Normal(0, 1)
Y ~ Normal(X, 1)
Now suppose we want the posterior $p(X | Y = y)$ where, for example, $y = 1$. On the one hand, Y is a random variable, and conditioning it to be y means eliminating all of the sample space on which the condition $Y = y$ is not true.
However most systems do not explicitly represent a random variable as a function, and a PPL will apply the chain rule to compute:
$$p(X = x)p(Y = y | X = x)$$
Both of these terms can be computed tractably. In Julia, it would be
using Distributions

"Unnormalized posterior density p(X = x, Y = 1) for the normal-normal model."
function posterior_x(x)
    y = 1
    X = Normal(0, 1)
    Y = Normal(x, 1)
    pdf(X, x) * pdf(Y, y)
end
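As a quick check (not part of the original derivation), normalizing this numerically recovers the known conjugate posterior, X | Y = 1 ~ Normal(0.5, sqrt(0.5)):

```julia
xs = -5:0.01:5
unnorm = posterior_x.(xs)
post_numeric = unnorm ./ (sum(unnorm) * step(xs))   # crude Riemann normalization
post_exact = pdf.(Normal(0.5, sqrt(0.5)), xs)
maximum(abs.(post_numeric .- post_exact))            # small, up to quadrature error
```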
So no explicit inversion is required. We might wonder where the problem has been hidden. If inference is about finding consistent values---or eliminating inconsistent ones---why does it seem like the conventional inference problem is unconstrained? I'll get to that later.
In Omega, however, random variables are literally functions and the inversion problem is all up in our face.
Here's how this plays out in Omega. In OmegaCore, you would express the model as:
μ = 1 ~ Normal(0, 1)
x = 2 ~ Normal.(μ, 1)
x_ = 0.123
μc = μ |ᶜ x .== x_
There is some amount of syntactic sugar going on here; written out explicitly, each random variable is just a function of ω.
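A rough sketch of those explicit forms, using the same stdunif / q notation as the derivation below (the names are illustrative rather than OmegaCore's exact internals):

```julia
using Distributions

stdunif(i, ω) = ω[i]                # toy: ω is indexable by primitive id
q(u) = quantile(Normal(0, 1), u)    # standard normal quantile function
μ(ω) = q(stdunif(1, ω))             # μ = 1 ~ Normal(0, 1)
x(ω) = q(stdunif(2, ω)) * 1 + μ(ω)  # x = 2 ~ Normal.(μ, 1)
```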
The inference problem is how to generate values according to the conditional distribution, which is simply the prior distribution excluding all samples that violate the constraint.
How can we generate values that satisfy the conditions?
In the example, the output of x is conditioned to be x_. We then have the following equation:
x(ω) = x_ = n(2, ω, μ(ω), 1)
Expanding out n, we get
x_ = q(stdunif(2, ω)) * 1 + μ(ω), where q is the standard normal quantile function and stdunif(2, ω) is the second standard-uniform component of ω.
Rearranging:
(x_ - μ(ω)) / σ = q(stdunif(2, ω))
And finally, defining c as the inverse of q, and hence the cdf of the standard normal, c(x) = cdf(Normal(0, 1), x), and substituting in σ = 1, we get:
stdunif(2, ω) = c((x_ - μ(ω)) / 1)
So, given that x == x_, the value of stdunif(2, ω), and hence the value of ω[2], is c((x_ - μ(ω)) / 1). This means we have a parametric representation of the subset of the sample space that is consistent with our conditions.
"return ω such consistent with `x == x_` as parametric function of `v \in [0,1]`"functionconsistentω(v, x_)
# Get value for μ_
partialω =Dict(1=> v)
μ_ =μ(partialω)
# invert
ω =Dict(1=> v, 2=>c(x_ - μ_))
end
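A hypothetical usage, assuming the Dict-style ω and the toy μ, x, and c definitions sketched above:

```julia
c(t) = cdf(Normal(0, 1), t)   # the standard normal cdf, as defined above

x_ = 0.123
ωs = [consistentω(v, x_) for v in 0.01:0.01:0.99]
all(ω -> x(ω) ≈ x_, ωs)       # every ω on this curve satisfies the condition
```

Note that this only enumerates consistent points; it does not yet weight them by how probable they are, which is where the integration problem comes back in.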
Implementation

This points to a way forward for implementing conventional conditioning in Omega: if we condition model | X = x, what we need to be able to do is invert X under the condition X = x being true. The critical hypothesis is:
For all the primitive distributions for which conventional likelihood-based inference is tractable, the inversion problem is also tractable.
This is true for inverse-transform families: the forward sampler pushes a standard uniform through the quantile function, so recovering the exogenous uniform from a sampled value is just an application of the cdf.
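For instance (a small check, with no Omega machinery involved):

```julia
using Distributions

d = Normal(2.0, 3.0)
u = rand()                  # exogenous standard uniform
xval = quantile(d, u)       # forward transform: sample a value from u
cdf(d, xval) ≈ u            # inverting back to the exogenous u is just the cdf
```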
So, do the following procedure:
- Invert from the conditions back to the exogenous variables / sample space
- Compute the joint pdf on the exogenous variables directly
All of the practical implementation issues come from the fact that inversion is not something that normal programming languages want you to do.
Possibilities:
- Modulate the forward execution. By intercepting calls of the form x(ω) (as we already do for causal interventions and a few other things), we can do a poor-man's inversion.
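A hedged, toy sketch of what that interception could look like, using a Dict-backed ω and a hand-rolled Normal primitive rather than Omega's actual dispatch machinery (all names here are illustrative):

```julia
using Distributions

# ω is a mutable Dict of exogenous uniforms; `pinned` records primitives whose
# outputs are fixed by a condition.
struct CondCtx
    ω::Dict{Int, Float64}
    pinned::Dict{Int, Float64}   # primitive id => value its output must take
end

# Evaluate the primitive `id ~ Normal(μ, σ)` under the context.
function evalnormal(ctx::CondCtx, id, μ, σ)
    if haskey(ctx.pinned, id)
        x = ctx.pinned[id]
        # Poor-man's inversion: write the exogenous value back into ω
        ctx.ω[id] = cdf(Normal(0, 1), (x - μ) / σ)
        return x
    else
        return quantile(Normal(0, 1), ctx.ω[id]) * σ + μ
    end
end
```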
Let's continue with a simpler example:
x = 1 ~ StdNormal{Float64}()
y = 2 ~ StdNormal{Float64}()
z = x .+ y
cnd(@joint(x, y), z .== 1.0)
So, if I was to solve this manually, I'd do:
using Random
using Distributions

function gen_samples()
    n = 1_000_000
    z = 1.0
    # Parameterize the constraint surface x + y == z by θ = y; invert to recover (x, y)
    pinvert(θ) = z - θ, θ
    # Joint density of the two standard normals, evaluated on the constraint surface
    function logdensity(θ)
        x, y = pinvert(θ)
        logpdf(Normal(0, 1), x) + logpdf(Normal(0, 1), y)
    end
    # Symmetric random-walk proposal, so the log proposal ratio is 0
    propose_and_logratio(rng, θ) = rand(rng, Normal(θ, 0.1)), 0.0
    rng = Random.MersenneTwister(0)
    samples = mh(rng, logdensity, n, 0.5, propose_and_logratio)
    pinvert.(samples)
end
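As a rough sanity check (assuming mh behaves as used above), the exact conditional here is known: given X + Y == 1 with X and Y independent standard normals, each coordinate is marginally Normal with mean 0.5 and variance 0.5, so:

```julia
using Statistics

xy = gen_samples()    # Vector of (x, y) pairs on the surface x + y == 1
xs = first.(xy)
mean(xs), var(xs)     # expect roughly (0.5, 0.5)
```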
Observations:
- We haven't run the model forward at all
- We're doing inference over a one-dimensional space
Here mh is a Metropolis-Hastings sampler with signature:
mh(rng, logdensity, n, state_init::X, propose_and_logratio;