From 300d6332345846402b7443df9eb9dd0aac18e3a9 Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Wed, 9 Oct 2024 10:55:57 +0000 Subject: [PATCH] build based on cc0aa2c --- dev/.documenter-siteinfo.json | 2 +- dev/API/architectures/index.html | 24 ++++++++++----------- dev/API/core/index.html | 30 +++++++++++++-------------- dev/API/index.html | 2 +- dev/API/loss/index.html | 6 +++--- dev/API/simulation/index.html | 12 +++++------ dev/API/utility/index.html | 28 ++++++++++++------------- dev/framework/index.html | 2 +- dev/index.html | 2 +- dev/workflow/advancedusage/index.html | 2 +- dev/workflow/examples/index.html | 2 +- dev/workflow/overview/index.html | 2 +- 12 files changed, 57 insertions(+), 57 deletions(-) diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 44a7c9b..01bd5c5 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2024-10-09T09:27:30","documenter_version":"1.7.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2024-10-09T10:55:54","documenter_version":"1.7.0"}} \ No newline at end of file diff --git a/dev/API/architectures/index.html b/dev/API/architectures/index.html index 249595b..242e801 100644 --- a/dev/API/architectures/index.html +++ b/dev/API/architectures/index.html @@ -23,7 +23,7 @@ ϕ = Chain(Dense(qₜ + qₛ + qₓ, w, relu), Dense(w, p)) ds = DeepSet(ψ, ϕ; S = S) x = [rand32(qₓ) for _ ∈ eachindex(Z)] -ds((Z, x))source
+ds((Z, x))
source
NeuralEstimators.GNNSummaryType
GNNSummary(propagation, readout; globalfeatures = nothing)

A graph neural network (GNN) module designed to serve as the summary network ψ in the DeepSet representation when the data are graphical (e.g., irregularly observed spatial data).

The propagation module transforms graphical input data into a set of hidden-feature graphs. The readout module aggregates these feature graphs into a single hidden feature vector of fixed length (i.e., a vector of summary statistics). The summary network is then defined as the composition of the propagation and readout modules.

Optionally, one may also include a module that extracts features directly from the graph, through the keyword argument globalfeatures. This module, when applied to a GNNGraph, should return a matrix of features, where the columns of the matrix correspond to the independent replicates (e.g., a 5x10 matrix is expected for 5 hidden features for each of 10 independent replicates stored in the graph).

The data should be stored as a GNNGraph or Vector{GNNGraph}, where each graph is associated with a single parameter vector. The graphs may contain subgraphs corresponding to independent replicates.

Examples

using NeuralEstimators, Flux, GraphNeuralNetworks
 using Flux: batch
 using Statistics: mean
 
@@ -55,11 +55,11 @@
 g₃ = batch([g₁, g₂])
 θ̂(g₁)
 θ̂(g₃)
-θ̂([g₁, g₂, g₃])
source

+θ̂([g₁, g₂, g₃])
source

User-defined summary statistics

The following functions correspond to summary statistics that are often useful as user-defined summary statistics in DeepSet objects.

NeuralEstimators.samplesizeFunction
samplesize(Z::AbstractArray)

Computes the sample size of a set of independent realisations Z.

Note that this function is a wrapper around numberreplicates, but it returns the number of replicates as the eltype of Z, rather than as an integer.
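As a rough sketch (not taken from the package documentation), samplesize can be supplied as an expert summary statistic via the S keyword of DeepSet; the dimensions and architecture below are illustrative assumptions only:

using NeuralEstimators, Flux

d, w, p = 2, 32, 3                              # data dimension, network width, number of parameters
ψ = Chain(Dense(d, w, relu))
ϕ = Chain(Dense(w + 1, w, relu), Dense(w, p))   # one extra input neuron for the sample size
ds = DeepSet(ψ, ϕ; S = samplesize)

Z = [rand32(d, m) for m ∈ (10, 20, 30)]         # data sets with differing sample sizes
ds(Z)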

source
NeuralEstimators.samplecorrelationFunction
samplecorrelation(Z::AbstractArray)

Computes the sample correlation matrix, R̂, and returns the vectorised strict lower triangle of R̂.

Examples

# 5 independent replicates of a 3-dimensional vector
 z = rand(3, 5)
-samplecorrelation(z)
source
NeuralEstimators.samplecovarianceFunction
samplecovariance(Z::AbstractArray)

Computes the sample covariance matrix, Σ̂, and returns the vectorised lower triangle of Σ̂.

Examples

# 5 independent replicates of a 3-dimensional vector
+samplecorrelation(z)
source
NeuralEstimators.samplecovarianceFunction
samplecovariance(Z::AbstractArray)

Computes the sample covariance matrix, Σ̂, and returns the vectorised lower triangle of Σ̂.

Examples

# 5 independent replicates of a 3-dimensional vector
 z = rand(3, 5)
-samplecovariance(z)
source
NeuralEstimators.NeighbourhoodVariogramType
NeighbourhoodVariogram(h_max, n_bins) 
+samplecovariance(z)
source
NeuralEstimators.NeighbourhoodVariogramType
NeighbourhoodVariogram(h_max, n_bins) 
 (l::NeighbourhoodVariogram)(g::GNNGraph)

Computes the empirical variogram,

\[\hat{\gamma}(h \pm \delta) = \frac{1}{2|N(h \pm \delta)|} \sum_{(i,j) \in N(h \pm \delta)} (Z_i - Z_j)^2\]

where $N(h \pm \delta) \equiv \left\{(i,j) : \|\boldsymbol{s}_i - \boldsymbol{s}_j\| \in (h-\delta, h+\delta)\right\}$ is the set of pairs of locations separated by a distance within $(h-\delta, h+\delta)$, and $|\cdot|$ denotes set cardinality.

The distance bins are constructed to have constant width $2\delta$, chosen based on the maximum distance h_max to be considered, and the specified number of bins n_bins.

The input type is a GNNGraph, and the empirical variogram is computed based on the corresponding graph structure. Specifically, only locations that are considered neighbours will be used when computing the empirical variogram.

Examples

using NeuralEstimators, Distances, LinearAlgebra
   
 # Simulate Gaussian spatial data with exponential covariance function 
@@ -80,12 +80,12 @@
 nv = NeighbourhoodVariogram(r, 10) 
 
 # Compute the empirical variogram 
-nv(g)
source

+nv(g)
source

Layers

In addition to the built-in layers provided by Flux, the following layers may be used when constructing a neural-network architecture.

NeuralEstimators.DensePositiveType
DensePositive(layer::Dense, g::Function)
 DensePositive(layer::Dense; g::Function = Flux.relu)

Wrapper around the standard Dense layer that ensures positive weights (biases are left unconstrained).

This layer can be useful for constructing (partially) monotonic neural networks (see, e.g., QuantileEstimatorContinuous).

Examples

using NeuralEstimators, Flux
 
 layer = DensePositive(Dense(5 => 2))
 x = rand32(5, 64)
-layer(x)
source
+layer(x)
source
NeuralEstimators.PowerDifferenceType
PowerDifference(a, b)

Function $f(x, y) = |ax - (1-a)y|^b$ for trainable parameters a ∈ [0, 1] and b > 0.

Examples

using NeuralEstimators, Flux
 
 # Generate some data
 d = 5
@@ -114,10 +114,10 @@
 
 # Estimates of a and b
 f.a
-f.b
source
+f.b
source
NeuralEstimators.ResidualBlockType
ResidualBlock(filter, in => out; stride = 1)

Basic residual block (see here), consisting of two sequential convolutional layers and a skip (shortcut) connection that connects the input of the block directly to the output, facilitating the training of deep networks.

Examples

using NeuralEstimators
 z = rand(16, 16, 1, 1)
 b = ResidualBlock((3, 3), 1 => 32)
-b(z)
source
+b(z)

source
NeuralEstimators.SpatialGraphConvType
SpatialGraphConv(in => out, g=relu; args...)

Implements a spatial graph convolution for isotropic processes,

\[ \boldsymbol{h}^{(l)}_{j} = g\Big( \boldsymbol{\Gamma}_{\!1}^{(l)} \boldsymbol{h}^{(l-1)}_{j} + @@ -138,7 +138,7 @@ # Construct and apply spatial graph convolution layer l = SpatialGraphConv(1 => 10) -l(g)

source

+l(g)
source

Output activation functions

In addition to the standard activation functions provided by Flux, the following structs can be used at the end of an architecture to act as output activation functions that ensure valid estimates for certain models. NB: Although we refer to the following objects as "activation functions", they should be treated as layers that are included in the final stage of a Flux Chain().

NeuralEstimators.CompressType
Compress(a, b, k = 1)

Layer that compresses its input to be within the range a and b, where each element of a is less than the corresponding element of b.

The layer uses a logistic function,

\[l(θ) = a + \frac{b - a}{1 + e^{-kθ}},\]

where the arguments a and b together combine to shift and scale the logistic function to the range (a, b), and the growth rate k controls the steepness of the curve.

The logistic function given here contains an additional parameter, θ₀, which is the input value corresponding to the function's midpoint. In Compress, we fix θ₀ = 0, since the output of a randomly initialised neural network is typically around zero.

Examples

using NeuralEstimators, Flux
 
 a = [25, 0.5, -pi/2]
 b = [500, 2.5, 0]
@@ -151,7 +151,7 @@
 n = 20
 θ̂ = Chain(Dense(n, p), l)
 Z = randn(n, K)
-θ̂(Z)
source
+θ̂(Z)
source
NeuralEstimators.CorrelationMatrixType
CorrelationMatrix(d)
 (object::CorrelationMatrix)(x::Matrix, cholesky::Bool = false)

Transforms a vector 𝐯 ∈ ℝᵈ to the parameters of an unconstrained d×d correlation matrix or, if cholesky = true, the lower Cholesky factor of an unconstrained d×d correlation matrix.

The expected input is a Matrix with T(d-1) = (d-1)d÷2 rows, where T(d-1) is the (d-1)th triangular number (the number of free parameters in an unconstrained d×d correlation matrix), and the output is a Matrix of the same dimension. The columns of the input and output matrices correspond to independent parameter configurations (i.e., different correlation matrices).

Internally, the layer constructs a valid Cholesky factor 𝐋 for a correlation matrix, and then extracts the strict lower triangle from the correlation matrix 𝐑 = 𝐋𝐋'. The lower triangle is extracted and vectorised in line with Julia's column-major ordering: for example, when modelling the correlation matrix

\[\begin{bmatrix} 1 & R₁₂ & R₁₃ \\ R₂₁ & 1 & R₂₃ \\ R₃₁ & R₃₂ & 1 \end{bmatrix},\]

@@ -187,7 +187,7 @@
 L[diagind(L)] .= sqrt.(1 .- rowwisenorm(L).^2)
 L
 end
-L[1] * L[1]'

source
+L[1] * L[1]'
source
NeuralEstimators.CovarianceMatrixType
CovarianceMatrix(d)
 (object::CovarianceMatrix)(x::Matrix, cholesky::Bool = false)

Transforms a vector 𝐯 ∈ ℝᵈ to the parameters of an unconstrained d×d covariance matrix or, if cholesky = true, the lower Cholesky factor of an unconstrained d×d covariance matrix.

The expected input is a Matrix with T(d) = d(d+1)÷2 rows, where T(d) is the dth triangular number (the number of free parameters in an unconstrained d×d covariance matrix), and the output is a Matrix of the same dimension. The columns of the input and output matrices correspond to independent parameter configurations (i.e., different covariance matrices).

Internally, the layer constructs a valid Cholesky factor 𝐋 and then extracts the lower triangle from the positive-definite covariance matrix 𝚺 = 𝐋𝐋'. The lower triangle is extracted and vectorised in line with Julia's column-major ordering: for example, when modelling the covariance matrix

\[\begin{bmatrix} Σ₁₁ & Σ₁₂ & Σ₁₃ \\ Σ₂₁ & Σ₂₂ & Σ₂₃ \\ Σ₃₁ & Σ₃₂ & Σ₃₃ \end{bmatrix},\]

@@ -215,4 +215,4 @@
 # Obtain the Cholesky factor directly
 L = l(θ, true)
 L = [LowerTriangular(cpu(vectotril(x))) for x ∈ eachcol(L)]
-L[1] * L[1]'

source
+L[1] * L[1]'source diff --git a/dev/API/core/index.html b/dev/API/core/index.html index d6ed1fb..7b189f7 100644 --- a/dev/API/core/index.html +++ b/dev/API/core/index.html @@ -2,7 +2,7 @@ Core · NeuralEstimators.jl

Core

This page documents the classes and functions that are central to the workflow of NeuralEstimators. Its organisation reflects the order in which these classes and functions appear in a standard implementation; that is, from sampling parameters from the prior distribution, to using a neural Bayes estimator to make inference with observed data sets.

Sampling parameters

Parameters sampled from the prior distribution are stored as a $p \times K$ matrix, where $p$ is the number of parameters in the statistical model and $K$ is the number of parameter vectors sampled from the prior distribution.

It can sometimes be helpful to wrap the parameter matrix in a user-defined type that also stores expensive intermediate objects needed for data simulation (e.g., Cholesky factors). In this case, the user-defined type should be a subtype of the abstract type ParameterConfigurations, whose only requirement is a field θ that stores the matrix of parameters. See Storing expensive intermediate objects for data simulation for further discussion.
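For illustration only (this sampler is not part of the package), a prior sampler for a model with $p = 2$ parameters, each assigned a Uniform(0, 1) prior, could return such a matrix as follows:

# Sample K parameter vectors from the prior; returns a 2 x K matrix
function sample(K)
	μ = rand(1, K)
	σ = rand(1, K)
	return vcat(μ, σ)
end

θ_train = sample(10000)
θ_val   = sample(2000)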

NeuralEstimators.ParameterConfigurationsType
ParameterConfigurations

An abstract supertype for user-defined types that store parameters and any intermediate objects needed for data simulation.

The user-defined type must have a field θ that stores the $p$ × $K$ matrix of parameters, where $p$ is the number of parameters in the model and $K$ is the number of parameter vectors sampled from the prior distribution. There are no other restrictions.

See subsetparameters for the generic function for subsetting these objects.

Examples

struct P <: ParameterConfigurations
 	θ
 	# other expensive intermediate objects...
-end
source

Simulating data

NeuralEstimators facilitates neural estimation for arbitrary statistical models by having the user implicitly define their model via simulated data, either as fixed instances or via a function that simulates data from the statistical model.

The data are always stored as a Vector{A}, where each element of the vector corresponds to a data set of $m$ independent replicates associated with one parameter vector (note that $m$ is arbitrary), and where the type A depends on the multivariate structure of the data:

  • For univariate and unstructured multivariate data, A is a $d \times m$ matrix where $d$ is the dimension of each replicate (e.g., $d=1$ for univariate data); a minimal sketch of these storage conventions is given after this list.
  • For data collected over a regular grid, A is a ($N + 2$)-dimensional array, where $N$ is the dimension of the grid (e.g., $N = 1$ for time series, $N = 2$ for two-dimensional spatial grids, etc.). The first $N$ dimensions of the array correspond to the dimensions of the grid; the penultimate dimension stores the so-called "channels" (this dimension is singleton for univariate processes, two for bivariate processes, and so on); and the final dimension stores the independent replicates. For example, to store 50 independent replicates of a bivariate spatial process measured over a 10x15 grid, one would construct an array of dimension 10x15x2x50.
  • For spatial data collected over irregular spatial locations, A is a GNNGraph with independent replicates (possibly with differing spatial locations) stored as subgraphs using the function batch.
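The following sketch (with arbitrary dimensions, chosen purely for illustration) shows containers that conform to the first two storage conventions:

K = 100   # number of parameter vectors / data sets

# Unstructured multivariate data: d = 3, with m = 30 replicates per data set
Z₁ = [rand(Float32, 3, 30) for _ ∈ 1:K]

# Univariate process on a 16 x 16 grid, with m = 5 replicates per data set
Z₂ = [rand(Float32, 16, 16, 1, 5) for _ ∈ 1:K]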

Estimators

Several classes of neural estimators are available in the package.

The simplest class is PointEstimator, used for constructing arbitrary mappings from the sample space to the parameter space. When constructing a generic point estimator, the user defines the loss function and therefore the Bayes estimator that will be targeted.

Several classes cater for the estimation of marginal posterior quantiles, based on the quantile loss function (see quantileloss()); in particular, see IntervalEstimator and QuantileEstimatorDiscrete for estimating marginal posterior quantiles for a fixed set of probability levels, and QuantileEstimatorContinuous for estimating marginal posterior quantiles with the probability level as an input to the neural network.

In addition to point estimation, the package also provides the class RatioEstimator for approximating the so-called likelihood-to-evidence ratio. The binary classification problem at the heart of this approach proceeds based on the binary cross-entropy loss.

Users are free to choose the neural-network architecture of these estimators as they see fit (subject to some class-specific requirements), but the package also provides the convenience constructor initialise_estimator().

NeuralEstimators.PointEstimatorType
PointEstimator(deepset::DeepSet)

A neural point estimator, a mapping from the sample space to the parameter space.

The estimator leverages the DeepSet architecture. The only requirement is that the number of output neurons in the final layer of the inference network (i.e., the outer network) is equal to the number of parameters in the statistical model.
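A minimal construction sketch (the architecture choices and dimensions below are illustrative assumptions, not prescribed by the package):

using NeuralEstimators, Flux

p = 2                                     # number of parameters in the statistical model
w = 32                                    # width of each hidden layer
ψ = Chain(Dense(1, w, relu), Dense(w, w, relu))
ϕ = Chain(Dense(w, w, relu), Dense(w, p))
θ̂ = PointEstimator(DeepSet(ψ, ϕ))

Z = [rand32(1, m) for m ∈ (30, 30, 50)]   # three data sets
θ̂(Z)                                      # p x 3 matrix of estimates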

source

source
NeuralEstimators.IntervalEstimatorType
IntervalEstimator(u::DeepSet, v::DeepSet = u; probs = [0.025, 0.975], g::Function = exp)
 IntervalEstimator(u::DeepSet, c::Union{Function,Compress}; probs = [0.025, 0.975], g::Function = exp)
 IntervalEstimator(u::DeepSet, v::DeepSet, c::Union{Function,Compress}; probs = [0.025, 0.975], g::Function = exp)

A neural interval estimator which, given data $Z$, jointly estimates marginal posterior credible intervals based on the probability levels probs.

The estimator employs a representation that prevents quantile crossing, namely, it constructs marginal posterior credible intervals for each parameter $\theta_i$, $i = 1, \dots, p,$ of the form,

\[[c_i(u_i(\boldsymbol{Z})), \;\; c_i(u_i(\boldsymbol{Z})) + g(v_i(\boldsymbol{Z})))],\]

where $\boldsymbol{u}(⋅) \equiv (u_1(\cdot), \dots, u_p(\cdot))'$ and $\boldsymbol{v}(⋅) \equiv (v_1(\cdot), \dots, v_p(\cdot))'$ are neural networks that transform data into $p$-dimensional vectors; $g(\cdot)$ is a monotonically increasing function (e.g., exponential or softplus); and each $c_i(⋅)$ is a monotonically increasing function that maps its input to the prior support of $\theta_i$.

The functions $c_i(⋅)$ may be defined by a $p$-dimensional object of type Compress. If these functions are unspecified, they will be set to the identity function so that the range of the intervals will be unrestricted.

If only a single neural-network architecture is provided, it will be used for both $\boldsymbol{u}(⋅)$ and $\boldsymbol{v}(⋅)$.

The return value when applied to data is a matrix with $2p$ rows, where the first and second $p$ rows correspond to the lower and upper bounds, respectively.

See also QuantileEstimatorDiscrete and QuantileEstimatorContinuous.

Examples

using NeuralEstimators, Flux
 
@@ -28,7 +28,7 @@
 
 # Apply the (untrained) interval estimator
 estimator(Z)
-interval(estimator, Z)
source
NeuralEstimators.QuantileEstimatorDiscreteType
QuantileEstimatorDiscrete(v::DeepSet; probs = [0.05, 0.25, 0.5, 0.75, 0.95], g = Flux.softplus, i = nothing)
 (estimator::QuantileEstimatorDiscrete)(Z)
 (estimator::QuantileEstimatorDiscrete)(Z, θ₋ᵢ)

A neural estimator that jointly estimates a fixed set of marginal posterior quantiles with probability levels $\{\tau_1, \dots, \tau_T\}$, controlled by the keyword argument probs.

By default, the estimator approximates the marginal quantiles for all parameters in the model, that is, the quantiles of

\[\theta_i \mid \boldsymbol{Z}\]

for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_p)'$. Alternatively, if initialised with i set to a positive integer, the estimator approximates the quantiles of the full conditional distribution

\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]

where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed. For ease of exposition, when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (i.e., the default behaviour), we define $\text{dim}(\boldsymbol{\theta}_{-i}) ≡ 0$.

The estimator leverages the DeepSet architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must be equal to the number of neurons in the final layer of the summary network plus $\text{dim}(\boldsymbol{\theta}_{-i})$. Second, the number of output neurons in the final layer of the inference network must be equal to $p - \text{dim}(\boldsymbol{\theta}_{-i})$. The estimator employs a representation that prevents quantile crossing, namely,

\[\begin{aligned} \boldsymbol{q}^{(\tau_1)}(\boldsymbol{Z}) &= \boldsymbol{v}^{(\tau_1)}(\boldsymbol{Z}),\\ @@ -106,7 +106,7 @@ q₁(Z, θ₋ᵢ) # Estimate quantiles of μ∣Z,σ with σ = 0.5 for only a single data set -q₁(Z[1], θ₋ᵢ)

source
NeuralEstimators.QuantileEstimatorContinuousType
QuantileEstimatorContinuous(deepset::DeepSet; i = nothing, num_training_probs::Integer = 1)
 (estimator::QuantileEstimatorContinuous)(Z, τ)
 (estimator::QuantileEstimatorContinuous)(Z, θ₋ᵢ, τ)

A neural estimator targeting posterior quantiles.

Given as input data $\boldsymbol{Z}$ and the desired probability level $\tau ∈ (0, 1)$, by default the estimator approximates the $\tau$-quantile of

\[\theta_i \mid \boldsymbol{Z}\]

for parameters $\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_p)'$. Alternatively, if initialised with i set to a positive integer, the estimator approximates the $\tau$-quantile of the full conditional distribution

\[\theta_i \mid \boldsymbol{Z}, \boldsymbol{\theta}_{-i},\]

where $\boldsymbol{\theta}_{-i}$ denotes the parameter vector with its $i$th element removed. For ease of exposition, when targeting marginal posteriors of the form $\theta_i \mid \boldsymbol{Z}$ (i.e., the default behaviour), we define $\text{dim}(\boldsymbol{\theta}_{-i}) ≡ 0$.

The estimator leverages the DeepSet architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must be equal to the number of neurons in the final layer of the summary network plus $1 + \text{dim}(\boldsymbol{\theta}_{-i})$. Second, the number of output neurons in the final layer of the inference network must be equal to $p - \text{dim}(\boldsymbol{\theta}_{-i})$.

Although not a requirement, one may employ a (partially) monotonic neural network to prevent quantile crossing (i.e., to ensure that the $\tau_1$-quantile does not exceed the $\tau_2$-quantile for any $\tau_2 > \tau_1$). There are several ways to construct such a neural network: one simple yet effective approach is to ensure that all weights associated with $\tau$ are strictly positive (see, e.g., Cannon, 2018), and this can be done using the DensePositive layer as illustrated in the examples below.

The return value is a matrix with $p - \text{dim}(\boldsymbol{\theta}_{-i})$ rows, corresponding to the estimated quantile for each parameter not in $\boldsymbol{\theta}_{-i}$.

See also QuantileEstimatorDiscrete.

Examples

using NeuralEstimators, Flux, Distributions, InvertedIndices, Statistics
 using AlgebraOfGraphics, CairoMakie
@@ -211,7 +211,7 @@
 q̂(Z, θ₋ᵢ, τ)
 
 # Estimate quantiles for a single data set
-q̂(Z[1], θ₋ᵢ, τ)
source
+q̂(Z[1], θ₋ᵢ, τ)
source
NeuralEstimators.RatioEstimatorType
RatioEstimator(deepset::DeepSet)

A neural estimator that estimates the likelihood-to-evidence ratio,

\[r(\boldsymbol{Z}, \boldsymbol{\theta}) \equiv p(\boldsymbol{Z} \mid \boldsymbol{\theta})/p(\boldsymbol{Z}),\]

where $p(\boldsymbol{Z} \mid \boldsymbol{\theta})$ is the likelihood and $p(\boldsymbol{Z})$ is the marginal likelihood, also known as the model evidence.

The estimator leverages the DeepSet architecture, subject to two requirements. First, the number of input neurons in the first layer of the inference network (i.e., the outer network) must equal the number of output neurons in the final layer of the summary network plus the number of parameters in the statistical model. Second, the number of output neurons in the final layer of the inference network must be equal to one.

The ratio estimator is trained by solving a relatively straightforward binary classification problem. Specifically, consider the problem of distinguishing dependent parameter–data pairs ${(\boldsymbol{\theta}', \boldsymbol{Z}')' \sim p(\boldsymbol{Z}, \boldsymbol{\theta})}$ with class labels $Y=1$ from independent parameter–data pairs ${(\tilde{\boldsymbol{\theta}}', \tilde{\boldsymbol{Z}}')' \sim p(\boldsymbol{\theta})p(\boldsymbol{Z})}$ with class labels $Y=0$, and where the classes are balanced. Then the Bayes classifier under binary cross-entropy loss is given by

\[c(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{p(\boldsymbol{Z}, \boldsymbol{\theta})}{p(\boldsymbol{Z}, \boldsymbol{\theta}) + p(\boldsymbol{\theta})p(\boldsymbol{Z})},\]

and hence,

\[r(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{c(\boldsymbol{Z}, \boldsymbol{\theta})}{1 - c(\boldsymbol{Z}, \boldsymbol{\theta})}.\]

For numerical stability, training is done on the log-scale using $\log r(\boldsymbol{Z}, \boldsymbol{\theta}) = \text{logit}(c(\boldsymbol{Z}, \boldsymbol{\theta}))$.

When applying the estimator to data, by default the likelihood-to-evidence ratio $r(\boldsymbol{Z}, \boldsymbol{\theta})$ is returned (setting the keyword argument classifier = true will yield class probability estimates). The estimated ratio can then be used in various downstream Bayesian (e.g., Hermans et al., 2020) or Frequentist (e.g., Walchessen et al., 2023) inferential algorithms.

See also mlestimate and mapestimate for obtaining approximate maximum-likelihood and maximum-a-posteriori estimates, and sampleposterior for obtaining approximate posterior samples.

Examples

using NeuralEstimators, Flux, Statistics, Optim
 
 # Generate data from Z|μ,σ ~ N(μ, σ²) with μ, σ ~ U(0, 1)
 p = 2     # number of unknown parameters in the statistical model
@@ -253,7 +253,7 @@
 r̂(z, θ_grid)                              # likelihood-to-evidence ratios over grid
 mlestimate(r̂, z;  θ_grid = θ_grid)        # maximum-likelihood estimate
 mapestimate(r̂, z; θ_grid = θ_grid)        # maximum-a-posteriori estimate
-sampleposterior(r̂, z; θ_grid = θ_grid)    # posterior samples
source
+sampleposterior(r̂, z; θ_grid = θ_grid)    # posterior samples

source
NeuralEstimators.PiecewiseEstimatorType
PiecewiseEstimator(estimators, changepoints)

Creates a piecewise estimator (Sainsbury-Dale et al., 2024, sec. 2.2.2) from a collection of estimators and sample-size changepoints.

Specifically, with $l$ estimators and sample-size changepoints $m_1 < m_2 < \dots < m_{l-1}$, the piecewise estimator takes the form,

\[\hat{\boldsymbol{\theta}}(\boldsymbol{Z}) = \begin{cases} \hat{\boldsymbol{\theta}}_1(\boldsymbol{Z}) & m \leq m_1,\\ @@ -281,7 +281,7 @@ # Apply the (untrained) piecewise estimator to data Z = [rand(d, 1, m) for m ∈ (10, 50)] -θ̂(Z)

source
NeuralEstimators.EnsembleType
Ensemble(estimators)
 Ensemble(architecture::Function, J::Integer)
 (ensemble::Ensemble)(Z; aggr = median)

Defines an ensemble based on a collection of estimators which, when applied to data Z, returns the median (or another summary defined by aggr) of the estimates.

The ensemble can be initialised with a collection of trained estimators and then applied immediately to observed data. Alternatively, the ensemble can be initialised with a collection of untrained estimators (or a function defining the architecture of each estimator, and the number of estimators in the ensemble), trained with train(), and then applied to observed data. In the latter case, where the ensemble is trained directly, if savepath is specified both the ensemble and component estimators will be saved.

Note that train() currently acts sequentially on the component estimators.

The ensemble components can be accessed by indexing the ensemble directly; the number of component estimators can be obtained using length().

Examples

using NeuralEstimators, Flux
 
@@ -315,7 +315,7 @@
 rmse(assessment)
 
 # Apply to data
-ensemble(Z)
source

Training

The function train is used to train a single neural estimator, while the wrapper function trainx is useful for training multiple neural estimators over a range of sample sizes, making use of the technique known as pre-training.


NeuralEstimators.trainFunction
train(θ̂, sampler::Function, simulator::Function; ...)
 train(θ̂, θ_train::P, θ_val::P, simulator::Function; ...) where {P <: Union{AbstractMatrix, ParameterConfigurations}}
 train(θ̂, θ_train::P, θ_val::P, Z_train::T, Z_val::T; ...) where {T, P <: Union{AbstractMatrix, ParameterConfigurations}}

Train a neural estimator θ̂.

The methods cater for different variants of "on-the-fly" simulation. Specifically, a sampler can be provided to continuously sample new parameter vectors from the prior, and a simulator can be provided to continuously simulate new data conditional on the parameters. If provided with specific sets of parameters (θ_train and θ_val) and/or data (Z_train and Z_val), they will be held fixed during training.

In all methods, the validation parameters and data are held fixed to reduce noise when evaluating the validation risk.

Keyword arguments common to all methods:

  • loss = mae
  • epochs = 100
  • batchsize = 32
  • optimiser = ADAM()
  • savepath::String = "": path to save the trained estimator and other information; if an empty string (default), nothing is saved. Otherwise, the neural-network parameters (i.e., the weights and biases) will be saved during training as bson files; the risk function evaluated over the training and validation sets will also be saved, in the first and second columns of loss_per_epoch.csv, respectively; the best parameters (as measured by validation risk) will be saved as best_network.bson.
  • stopping_epochs = 5: cease training if the risk doesn't improve in this number of epochs.
  • use_gpu = true
  • verbose = true

Keyword arguments common to train(θ̂, sampler, simulator) and train(θ̂, θ_train, θ_val, simulator):

  • m: sample sizes (either an Integer or a collection of Integers). The simulator is called as simulator(θ, m).
  • epochs_per_Z_refresh = 1: the number of passes to make through the training set before the training data are refreshed.
  • simulate_just_in_time = false: flag indicating whether we should simulate just-in-time, in the sense that only a batchsize number of parameter vectors and corresponding data are in memory at a given time.

Keyword arguments unique to train(θ̂, sampler, simulator):

  • K = 10000: number of parameter vectors in the training set; the size of the validation set is K ÷ 5.
  • ξ = nothing: an arbitrary collection of objects that, if provided, will be passed to the parameter sampler as sampler(K, ξ); otherwise, the parameter sampler will be called as sampler(K). Can also be provided as xi.
  • epochs_per_θ_refresh = 1: the number of passes to make through the training set before the training parameters are refreshed. Must be a multiple of epochs_per_Z_refresh. Can also be provided as epochs_per_theta_refresh.

Examples

using NeuralEstimators, Flux
 
@@ -352,10 +352,10 @@
 # training: fixed parameters and fixed data
 Z_train = simulator(θ_train, m)
 Z_val   = simulator(θ_val, m)
-θ̂       = train(θ̂, θ_train, θ_val, Z_train, Z_val, epochs = 5)
source
+θ̂       = train(θ̂, θ_train, θ_val, Z_train, Z_val, epochs = 5)
source
NeuralEstimators.trainxFunction
trainx(θ̂, sampler::Function, simulator::Function, m::Vector{Integer}; ...)
 trainx(θ̂, θ_train, θ_val, simulator::Function, m::Vector{Integer}; ...)
 trainx(θ̂, θ_train, θ_val, Z_train, Z_val, m::Vector{Integer}; ...)
-trainx(θ̂, θ_train, θ_val, Z_train::V, Z_val::V; ...) where {V <: AbstractVector{AbstractVector{Any}}}

+trainx(θ̂, θ_train, θ_val, Z_train::V, Z_val::V; ...) where {V <: AbstractVector{AbstractVector{Any}}}

A wrapper around train() to construct neural estimators for different sample sizes.

The positional argument m specifies the desired sample sizes. Each estimator is pre-trained with the estimator for the previous sample size. For example, if m = [m₁, m₂], the estimator for sample size m₂ is pre-trained with the estimator for sample size m₁.

The method for Z_train and Z_val subsets the data using subsetdata(Z, 1:mᵢ) for each mᵢ ∈ m. The method for Z_train::V and Z_val::V trains an estimator for each element of Z_train::V and Z_val::V and, hence, it does not need to invoke subsetdata(), which can be slow or difficult to define in some cases (e.g., for graphical data). Note that, in this case, m is inferred from the data.

The keyword arguments inherit from train(). The keyword arguments epochs, batchsize, stopping_epochs, and optimiser can each be given as vectors. For example, if training two estimators, one may use a different number of epochs for each estimator by providing epochs = [epoch₁, epoch₂].
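As a hedged sketch (assuming sampler, simulator, and an untrained estimator θ̂ have been defined as in the train() examples above; the sample sizes and epochs are arbitrary illustrative choices):

m = [30, 100, 300]                                  # sample sizes for pre-training
estimators = trainx(θ̂, sampler, simulator, m; epochs = [20, 10, 10])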

source

Assessment/calibration

NeuralEstimators.assessFunction
assess(estimator, θ, Z)

Using an estimator (or a collection of estimators), computes estimates from data Z simulated based on true parameter vectors stored in θ.

The data Z should be a Vector, with each element corresponding to a single simulated data set. If Z contains more data sets than parameter vectors, the parameter matrix θ will be recycled by horizontal concatenation via the call θ = repeat(θ, outer = (1, J)) where J = length(Z) ÷ K is the number of simulated data sets and K = size(θ, 2) is the number of parameter vectors.

The output is of type Assessment; see ?Assessment for details.

Keyword arguments

  • estimator_names::Vector{String}: names of the estimators (sensible defaults provided).
  • parameter_names::Vector{String}: names of the parameters (sensible defaults provided). If ξ is provided with a field parameter_names, those names will be used.
  • ξ = nothing: an arbitrary collection of objects that are fixed (e.g., distance matrices). Can also be provided as xi.
  • use_ξ = false: a Bool or a collection of Bool objects with length equal to the number of estimators. Specifies whether or not the estimator uses ξ: if it does, the estimator will be applied as estimator(Z, ξ). This argument is useful when multiple estimators are provided, only some of which need ξ; hence, if only one estimator is provided and ξ is not nothing, use_ξ is automatically set to true. Can also be provided as use_xi.
  • use_gpu = true: a Bool or a collection of Bool objects with length equal to the number of estimators.
  • probs = range(0.01, stop=0.99, length=100): (relevant only for estimator::QuantileEstimatorContinuous) a collection of probability levels in (0, 1)

Examples

using NeuralEstimators, Flux
 
 n = 10 # number of observations in each realisation
 p = 4  # number of parameters in the statistical model
@@ -388,17 +388,17 @@
 θ̂ = DeepSet(ψ, ϕ)
 x = [rand(qₓ) for _ ∈ eachindex(Z)]
 assessment = assess(θ̂, θ, (Z, x));
-risk(assessment)
source
+risk(assessment)

source
NeuralEstimators.AssessmentType
Assessment(df::DataFrame, runtime::DataFrame)

A type for storing the output of assess(). The field runtime contains the total time taken for each estimator. The field df is a long-form DataFrame with columns:

  • estimator: the name of the estimator
  • parameter: the name of the parameter
  • truth: the true value of the parameter
  • estimate: the estimated value of the parameter
  • m: the sample size (number of iid replicates) for the given data set
  • k: the index of the parameter vector
  • j: the index of the data set (in the case that multiple data sets are associated with each parameter vector)

If estimator is an IntervalEstimator, the column estimate will be replaced by the columns lower and upper, containing the lower and upper bounds of the interval, respectively.

If estimator is a QuantileEstimator, the df will also contain a column prob indicating the probability level of the corresponding quantile estimate.

Multiple Assessment objects can be combined with merge() (used for combining assessments from multiple point estimators) or join() (used for combining assessments from a point estimator and an interval estimator).

source
NeuralEstimators.riskFunction
risk(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's Bayes risk,

\[r(\hat{\boldsymbol{\theta}}(\cdot)) \approx -\frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)})),\]

+\frac{1}{K} \sum_{k=1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)})),\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

Keyword arguments

  • loss = (x, y) -> abs(x - y): a binary operator defining the loss function (default absolute-error loss).
  • average_over_parameters::Bool = false: if true, the loss is averaged over all parameters; otherwise (default), the loss is averaged over each parameter separately.
  • average_over_sample_sizes::Bool = true: if true (default), the loss is averaged over all sample sizes $m$; otherwise, the loss is averaged over each sample size separately.
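For example, assuming an Assessment object has been produced as in the assess() examples above, the risk under the default and a squared-error loss could be computed as follows (illustrative only):

risk(assessment)                               # default absolute-error loss
risk(assessment; loss = (x, y) -> (x - y)^2)   # squared-error loss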
source
NeuralEstimators.biasFunction
bias(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's bias,

\[{\rm{bias}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx -\frac{1}{K} \sum_{k=1}^K \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)},\]

+\frac{1}{K} \sum_{k=1}^K \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)},\]

where $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ denotes a set of $K$ parameter vectors sampled from the prior and, for each $k$, data $\boldsymbol{Z}^{(k)}$ are simulated from the statistical model conditional on $\boldsymbol{\theta}^{(k)}$.

This function inherits the keyword arguments of risk (excluding the argument loss).

source
NeuralEstimators.rmseFunction
rmse(assessment::Assessment; ...)

Computes a Monte Carlo approximation of an estimator's root-mean-squared error,

\[{\rm{rmse}}(\hat{\boldsymbol{\theta}}(\cdot)) \approx -\sqrt{\frac{1}{K} \sum_{k=1}^K (\hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}) - \boldsymbol{\theta}^{(k)})^2},\]

 bootstrap(θ̂, parameters::P, simulator, m::Integer; B = 400) where P <: Union{AbstractMatrix, ParameterConfigurations}
bootstrap(θ̂, Z; B = 400, blocks = nothing)

Generates B bootstrap estimates from an estimator θ̂.

Parametric bootstrapping is facilitated by passing a single parameter configuration, parameters, and corresponding simulated data, Z, whose length implicitly defines B. Alternatively, one may provide a simulator and the desired sample size, in which case the data will be simulated using simulator(parameters, m).

Non-parametric bootstrapping is facilitated by passing a single data set, Z. The argument blocks caters for block bootstrapping, and it should be a vector of integers specifying the block for each replicate. For example, with 5 replicates, the first two corresponding to block 1 and the remaining three corresponding to block 2, blocks should be [1, 1, 2, 2, 2]. The resampling algorithm aims to produce resampled data sets that are of a similar size to Z, but this can only be achieved exactly if all blocks are equal in length.

The keyword argument use_gpu is a flag determining whether to use the GPU, if it is available (default true).

The return type is a p × B matrix, where p is the number of parameters in the model.

source
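As an illustration of the parametric and block-bootstrap variants described above, consider the following hedged sketch, in which θ̂ is a trained estimator, θ is a single parameter configuration (a p × 1 matrix), simulate(θ, m) is a user-defined model simulator, and Z₁ is an observed data set with five replicates; all of these names are assumptions made for illustration only.

# Parametric bootstrap: simulate B data sets, each of size m = 100, from the fitted model
θ_boot = bootstrap(θ̂, θ, simulate, 100; B = 400)

# Non-parametric block bootstrap: replicates 1-2 form block 1, replicates 3-5 form block 2
θ_boot = bootstrap(θ̂, Z₁; B = 400, blocks = [1, 1, 2, 2, 2])
interval(θ_boot)   # summarise the bootstrap distribution with an interval (default probs = [0.05, 0.95])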
NeuralEstimators.intervalFunction
interval(θ::Matrix; probs = [0.05, 0.95], parameter_names = nothing)
 interval(estimator::IntervalEstimator, Z; parameter_names = nothing, use_gpu = true)

Compute a confidence interval based either on a $p$ × $B$ matrix θ of parameters (typically containing bootstrap estimates or posterior draws) with $p$ the number of parameters in the model, or from an IntervalEstimator and data Z.

When given θ, the intervals are constructed by computing quantiles with probability levels controlled by the keyword argument probs.

The return type is a $p$ × 2 matrix, whose first and second columns respectively contain the lower and upper bounds of the interval. The rows of this matrix can be named by passing a vector of strings to the keyword argument parameter_names.

Examples

using NeuralEstimators
 p = 3
 B = 50
 θ = rand(p, B)
interval(θ)
source

Inference using likelihood and likelihood-to-evidence-ratio estimators

NeuralEstimators.mlestimateFunction
mlestimate(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, penalty::Function = θ -> 1, use_gpu = true)

Computes the (approximate) maximum likelihood estimate given data $\boldsymbol{Z}$,

\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z})\]

where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function derived from estimator.

If a vector θ₀ of initial parameter estimates is given, the approximate likelihood is maximised by gradient descent (requires Optim.jl to be loaded). Otherwise, if a matrix of parameters θ_grid is given, the approximate likelihood is maximised by grid search.

A maximum penalised likelihood estimate,

\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta}),\]

can be obtained by specifying the keyword argument penalty that defines the penalty term $p(\boldsymbol{\theta})$.

See also mapestimate() for computing (approximate) maximum a posteriori estimates.

source
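To make the grid-search option concrete, the following sketch assumes a trained RatioEstimator r̂ for a model with two parameters and an observed data set Z; both names are placeholders.

θ₁ = range(0.1f0, 1.0f0, length = 50)
θ₂ = range(0.1f0, 1.0f0, length = 50)
θ_grid = reduce(hcat, [[x, y] for x in θ₁ for y in θ₂])  # 2 × 2500 matrix of candidate parameter vectors

mlestimate(r̂, Z; θ_grid = θ_grid)    # maximum likelihood estimate via grid search
mlestimate(r̂, Z; θ₀ = [0.5, 0.5])    # gradient-based search (requires Optim.jl to be loaded)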
NeuralEstimators.mapestimateFunction
mapestimate(estimator::RatioEstimator, Z; θ₀ = nothing, θ_grid = nothing, prior::Function = θ -> 1, use_gpu = true)

Computes the (approximate) maximum a posteriori estimate given data $\boldsymbol{Z}$,

\[\argmax_{\boldsymbol{\theta}} \ell(\boldsymbol{\theta} ; \boldsymbol{Z}) + \log p(\boldsymbol{\theta})\]

where $\ell(\cdot ; \cdot)$ denotes the approximate log-likelihood function derived from estimator, and $p(\boldsymbol{\theta})$ denotes the prior density function controlled through the keyword argument prior (by default, a uniform prior is used).

If a vector θ₀ of initial parameter estimates is given, the approximate posterior density is maximised by gradient descent (requires Optim.jl to be loaded). Otherwise, if a matrix of parameters θ_grid is given, the approximate posterior density is maximised by grid search.

See also mlestimate() for computing (approximate) maximum likelihood estimates.

source
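Continuing the sketch above (same assumed r̂, Z, and θ_grid), a MAP analogue might look as follows; the prior density used here is purely illustrative.

mapestimate(r̂, Z; θ_grid = θ_grid, prior = θ -> exp(-sum(abs2, θ)))  # MAP estimate under an illustrative prior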
NeuralEstimators.sampleposteriorFunction
sampleposterior(estimator::RatioEstimator, Z, N::Integer = 1000; θ_grid, prior::Function = θ -> 1f0)

Samples from the approximate posterior distribution $p(\boldsymbol{\theta} \mid \boldsymbol{Z})$ implied by estimator.

The positional argument N controls the size of the posterior sample.

Currently, the sampling algorithm is based on a fine-gridding of the parameter space, specified through the keyword argument θ_grid (or theta_grid). The approximate posterior density is evaluated over this grid, which is then used to draw samples. This is very effective when making inference with a small number of parameters. For models with a large number of parameters, other sampling algorithms may be needed (please feel free to contact the package maintainer for discussion).

The prior distribution $p(\boldsymbol{\theta})$ is controlled through the keyword argument prior (by default, a uniform prior is used).

source
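A hedged sketch of grid-based posterior sampling, again assuming r̂, Z, and θ_grid as above; the draws are summarised here with interval(), on the assumption that they are returned in the p × N matrix format that interval() accepts.

θ_post = sampleposterior(r̂, Z, 1000; θ_grid = θ_grid)   # 1000 approximate posterior draws
interval(θ_post; probs = [0.025, 0.975])                # 95% marginal credible intervals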
diff --git a/dev/API/index.html b/dev/API/index.html
index 3376b7a..5ade176 100644
--- a/dev/API/index.html
+++ b/dev/API/index.html
@@ -1,2 +1,2 @@
Index · NeuralEstimators.jl

Index

diff --git a/dev/API/loss/index.html b/dev/API/loss/index.html
index bf52247..015b61f 100644
--- a/dev/API/loss/index.html
+++ b/dev/API/loss/index.html
@@ -1,5 +1,5 @@
Loss functions · NeuralEstimators.jl

Loss functions

In addition to the standard loss functions provided by Flux (e.g., mae, mse, etc.), NeuralEstimators provides the following loss functions.

NeuralEstimators.tanhlossFunction
tanhloss(θ̂, θ, k; agg = mean, joint = true)

For k > 0, computes the loss function,

\[L(θ̂, θ) = tanh(|θ̂ - θ|/k),\]

which approximates the 0-1 loss as k → 0. Compared with the kpowerloss, which may also be used as a continuous surrogate for the 0-1 loss, the gradient of the tanh loss is bounded as |θ̂ - θ| → 0, which can improve numerical stability during training.

If joint = true, the L₁ norm is computed over each parameter vector, so that, with k close to zero, the resulting Bayes estimator is the mode of the joint posterior distribution; otherwise, if joint = false, the Bayes estimator is the vector containing the modes of the marginal posterior distributions.

See also kpowerloss.

source
NeuralEstimators.kpowerlossFunction
kpowerloss(θ̂, θ, k; agg = mean, joint = true, safeorigin = true, ϵ = 0.1)

For k > 0, the k-th power absolute-distance loss function,

\[L(θ̂, θ) = |θ̂ - θ|ᵏ,\]

contains the squared-error, absolute-error, and 0-1 loss functions as special cases (the latter obtained in the limit as k → 0). It is Lipschitz continuous iff k = 1, convex iff k ≥ 1, and strictly convex iff k > 1; it is quasiconvex for all k > 0.

If joint = true, the L₁ norm is computed over each parameter vector, so that, with k close to zero, the resulting Bayes estimator is the mode of the joint posterior distribution; otherwise, if joint = false, the Bayes estimator is the vector containing the modes of the marginal posterior distributions.

If safeorigin = true, the loss function is modified to avoid pathologies around the origin, so that the resulting loss function behaves similarly to the absolute-error loss in the ϵ-interval surrounding the origin.

See also tanhloss.

source
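A small numerical illustration of these surrogate losses (loss values only, not a training loop); the parameter matrices below are arbitrary.

using NeuralEstimators

θ  = [0.5 1.0; 0.2 0.8]      # true parameters, stored as a p × K matrix
θ̂  = [0.6 0.9; 0.1 1.0]      # corresponding estimates
tanhloss(θ̂, θ, 0.1)          # approaches the 0-1 loss as k → 0
kpowerloss(θ̂, θ, 1.0)        # k = 1 corresponds to the absolute-error loss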
NeuralEstimators.quantilelossFunction
quantileloss(θ̂, θ, τ; agg = mean)
 quantileloss(θ̂, θ, τ::Vector; agg = mean)

The asymmetric quantile loss function,

\[ L(θ̂, θ; τ) = (θ̂ - θ)(𝕀(θ̂ - θ > 0) - τ),\]

where τ ∈ (0, 1) is a probability level and 𝕀(⋅) is the indicator function.

The method that takes τ as a vector is useful for jointly approximating several quantiles of the posterior distribution. In this case, the number of rows in θ̂ is assumed to be $pr$, where $p$ is the number of parameters and $r$ is the number of probability levels in τ (i.e., the length of τ).

Examples

p = 1
 K = 10
 θ = rand(p, K)
@@ -15,6 +15,6 @@
 quantileloss(θ̂, θ, 0.1)
 
 θ̂ = rand(3p, K)
quantileloss(θ̂, θ, [0.1, 0.5, 0.9])
source
NeuralEstimators.intervalscoreFunction
intervalscore(l, u, θ, α; agg = mean)
 intervalscore(θ̂, θ, α; agg = mean)
intervalscore(assessment::Assessment; average_over_parameters::Bool = false, average_over_sample_sizes::Bool = true)

Given an interval [l, u] with nominal coverage 100×(1-α)% and true value θ, the interval score is defined by

\[S(l, u, θ; α) = (u - l) + 2α⁻¹(l - θ)𝕀(θ < l) + 2α⁻¹(θ - u)𝕀(θ > u),\]

where α ∈ (0, 1) and 𝕀(⋅) is the indicator function.

The method that takes a single value θ̂ assumes that θ̂ is a matrix with $2p$ rows, where $p$ is the number of parameters in the statistical model. Then, the first and second set of $p$ rows will be used as l and u, respectively.

For further discussion, see Section 6 of Gneiting, T. and Raftery, A. E. (2007), "Strictly proper scoring rules, prediction, and estimation", Journal of the American Statistical Association, 102, 359–378.

source
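As a quick worked example of the definition above (assuming the method accepts scalar arguments), take l = 0, u = 1, and α = 0.05:

intervalscore(0.0, 1.0, 0.5, 0.05)   # θ inside the interval: score = width = 1.0
intervalscore(0.0, 1.0, 1.2, 0.05)   # θ above the interval: 1.0 + (2/0.05) × 0.2 = 9.0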
diff --git a/dev/API/simulation/index.html b/dev/API/simulation/index.html
index dcf977b..ed2e24e 100644
--- a/dev/API/simulation/index.html
+++ b/dev/API/simulation/index.html
@@ -8,7 +8,7 @@
 D = pairwise(Euclidean(), S, dims = 1)
 Σ = Symmetric(matern.(D, ρ, ν))
 L = cholesky(Σ).L
simulategaussian(L)
source
NeuralEstimators.simulatepottsFunction
simulatepotts(grid::Matrix{Int}, β)
 simulatepotts(grid::Matrix{Union{Int, Nothing}}, β)
 simulatepotts(nrows::Int, ncols::Int, num_states::Int, β)

Chequerboard Gibbs sampling from 2D Potts model with parameter β>0.

Approximately independent simulations can be obtained by setting nsims > 1 or num_iterations > burn. The degree to which the resulting simulations can be considered independent depends on the thinning factor (thin) and the burn-in (burn).

Keyword arguments

  • nsims = 1: number of approximately independent replicates.
  • num_iterations = 2000: number of MCMC iterations.
  • burn = num_iterations: burn-in.
  • thin = 10: thinning factor.

Examples

using NeuralEstimators 
 
@@ -32,7 +32,7 @@
 using Plots 
 grids = [simulatepotts(100, 100, 2, β) for β ∈ 0.3:0.1:1.2]
 heatmaps = heatmap.(grids, legend = false, aspect_ratio=1)
Plots.plot(heatmaps...)
source
NeuralEstimators.simulateschlatherFunction
simulateschlather(L::Matrix, m = 1; C = 3.5, Gumbel::Bool = false)

Simulates m independent and identically distributed (i.i.d.) realisations from Schlather's max-stable model using the algorithm for approximate simulation given by Schlather (2002).

Requires the lower Cholesky factor L associated with the covariance matrix of the underlying Gaussian process.

If m is not specified, the simulated data are returned as a vector with length equal to the number of spatial locations, $n$; otherwise, the data are returned as an $n$ × $m$ matrix.

Keyword arguments

  • C = 3.5: a tuning parameter that controls the accuracy of the algorithm: small C favours computational efficiency, while large C favours accuracy. Schlather (2002) recommends the use of C = 3.
  • Gumbel = true: flag indicating whether the data should be log-transformed from the unit Fréchet scale to the Gumbel scale.

Examples

using NeuralEstimators, Distances, LinearAlgebra
 
 n = 500
 ρ = 0.6
@@ -41,7 +41,7 @@
 D = pairwise(Euclidean(), S, dims = 1)
 Σ = Symmetric(matern.(D, ρ, ν))
 L = cholesky(Σ).L
simulateschlather(L)
source

Spatial point processes

NeuralEstimators.maternclusterprocessFunction
maternclusterprocess(; λ=10, μ=10, r=0.1, xmin=0, xmax=1, ymin=0, ymax=1, unit_bounding_box=false)

Simulates a Matérn cluster process with density of parent Poisson point process λ, mean number of daughter points μ, and radius of cluster disk r, over the simulation window defined by xmin and xmax, ymin and ymax.

If unit_bounding_box is true, then the simulated points will be scaled so that the longest side of their bounding box is equal to one (this may change the simulation window).

See also the R package spatstat, which provides functions for simulating from a range of point processes and which can be interfaced from Julia using RCall.

Examples

using NeuralEstimators
 
 # Simulate a realisation from a Matérn cluster process
 S = maternclusterprocess()
@@ -57,13 +57,13 @@
 plots = map(eachindex(λ)) do i
 	S = maternclusterprocess(λ = λ[i], μ = μ[i])
 	scatterplot(S[:, 1], S[:, 2])
end
source

Covariance functions

These covariance functions may be of use for various models.

NeuralEstimators.maternFunction
matern(h, ρ, ν, σ² = 1)

Given distance $\|\boldsymbol{h}\|$ (h), computes the Matérn covariance function,

\[C(\|\boldsymbol{h}\|) = \sigma^2 \frac{2^{1 - \nu}}{\Gamma(\nu)} \left(\frac{\|\boldsymbol{h}\|}{\rho}\right)^\nu K_\nu \left(\frac{\|\boldsymbol{h}\|}{\rho}\right),\]

where ρ is a range parameter, ν is a smoothness parameter, σ² is the marginal variance, $\Gamma(\cdot)$ is the gamma function, and $K_\nu(\cdot)$ is the modified Bessel function of the second kind of order $\nu$.

source
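For example, the function can be broadcast over a vector or matrix of distances, as in the simulation examples elsewhere on this page:

using NeuralEstimators

matern(0.0, 0.6, 1.0)            # at distance zero the covariance equals the marginal variance σ² = 1
matern.(0.1:0.1:0.5, 0.6, 1.0)   # broadcast over several distances, with range ρ = 0.6 and smoothness ν = 1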
NeuralEstimators.paciorekFunction
paciorek(s, r, ω₁, ω₂, ρ, β)

Given spatial locations s and r, computes the nonstationary covariance function,

\[C(\boldsymbol{s}, \boldsymbol{r}) = |\boldsymbol{\Sigma}(\boldsymbol{s})|^{1/4} |\boldsymbol{\Sigma}(\boldsymbol{r})|^{1/4} \left|\frac{\boldsymbol{\Sigma}(\boldsymbol{s}) + \boldsymbol{\Sigma}(\boldsymbol{r})}{2}\right|^{-1/2} C^0\big(\sqrt{Q(\boldsymbol{s}, \boldsymbol{r})}\big), \]

where $C^0(h) = \exp\{-(h/\rho)^{3/2}\}$ for range parameter $\rho > 0$, the matrix $\boldsymbol{\Sigma}(\boldsymbol{s}) = \exp(\beta\|\boldsymbol{s} - \boldsymbol{\omega}\|)\boldsymbol{I}$ is a kernel matrix (Paciorek and Schervish, 2006) with scale parameter $\beta > 0$ and $\boldsymbol{\omega} \equiv (\omega_1, \omega_2)' \in \mathcal{D}$, and

\[Q(\boldsymbol{s}, \boldsymbol{r}) = (\boldsymbol{s} - \boldsymbol{r})' \left(\frac{\boldsymbol{\Sigma}(\boldsymbol{s}) + \boldsymbol{\Sigma}(\boldsymbol{r})}{2}\right)^{-1} (\boldsymbol{s} - \boldsymbol{r})\]

is the squared Mahalanobis distance between $\boldsymbol{s}$ and $\boldsymbol{r}$.

source

Density functions

Density functions are not needed in the workflow of NeuralEstimators. However, as part of a series of comparison studies between neural estimators and likelihood-based estimators given in various papers, we have developed the following functions for evaluating the density function for several popular distributions. We include these in NeuralEstimators to cater for the possibility that they may be of use in future comparison studies.

NeuralEstimators.gaussiandensityFunction
gaussiandensity(y::V, L::LT) where {V <: AbstractVector, LT <: LowerTriangular}
 gaussiandensity(y::A, L::LT) where {A <: AbstractArray, LT <: LowerTriangular}
gaussiandensity(y::A, Σ::M) where {A <: AbstractArray, M <: AbstractMatrix}

Efficiently computes the density function for y ~ 𝑁(0, Σ), where Σ is the covariance matrix and L is the lower Cholesky factor of Σ.

The method gaussiandensity(y::A, L::LT) assumes that the last dimension of y contains independent and identically distributed (iid) replicates.

The log-density is returned if the keyword argument logdensity is true (default).

source
NeuralEstimators.schlatherbivariatedensityFunction
schlatherbivariatedensity(z₁, z₂, ψ; logdensity = true)

The bivariate density function for Schlather's max-stable model.

source
diff --git a/dev/API/utility/index.html b/dev/API/utility/index.html
index ff39c5c..1da41ba 100644
--- a/dev/API/utility/index.html
+++ b/dev/API/utility/index.html
@@ -1,5 +1,5 @@
Miscellaneous · NeuralEstimators.jl

Miscellaneous

Core

These functions can appear during the core workflow, and may need to be overloaded in some applications.

NeuralEstimators.numberreplicatesFunction
numberreplicates(Z)

Generic function that returns the number of replicates in a given object. Default implementations are provided for commonly used data formats, namely, data stored as an Array or as a GNNGraph.

source
NeuralEstimators.subsetdataFunction
subsetdata(Z::V, i) where {V <: AbstractArray{A}} where {A <: Any}
 subsetdata(Z::A, i) where {A <: AbstractArray{T, N}} where {T, N}
 subsetdata(Z::G, i) where {G <: AbstractGraph}

Return replicate(s) i from each data set in Z.

If the user is working with data that are not covered by the default methods, simply overload the function with the appropriate type for Z.

For graphical data, calls getgraph(), where the replicates are assumed to be stored as batched graphs. Since this can be slow, one should consider using a method of train() that does not require the data to be subsetted when working with graphical data (use numberreplicates() to check that the training and validation data sets are equally replicated, which prevents subsetting).

Examples

using NeuralEstimators
 using GraphNeuralNetworks
@@ -19,11 +19,11 @@
 e = 8 # number of edges
 Z = [batch([rand_graph(n, e, ndata = rand(d, n)) for _ ∈ 1:m]) for k ∈ 1:K]
 subsetdata(Z, 2)   # extract second replicate from each data set
subsetdata(Z, 1:3) # extract first 3 replicates from each data set
source
NeuralEstimators.subsetparametersFunction
subsetparameters(parameters::M, indices) where {M <: AbstractMatrix}
subsetparameters(parameters::P, indices) where {P <: ParameterConfigurations}

Subset parameters using a collection of indices.

Arrays in parameters::P with last dimension equal in size to the number of parameter configurations, K, are also subsetted (over their last dimension) using indices. All other fields are left unchanged. To modify this default behaviour, overload subsetparameters.

source
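For the matrix method, a minimal illustration (with arbitrary values):

θ_all = rand(3, 100)                   # p × K matrix of parameter configurations
θ_sub = subsetparameters(θ_all, 1:10)  # retain the first 10 configurations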

Downstream-inference algorithms

NeuralEstimators.EMType
EM(simulateconditional::Function, MAP::Union{Function, NeuralEstimator}, θ₀ = nothing)

Implements the (Bayesian) Monte Carlo expectation-maximisation (EM) algorithm, with $l$th iteration

\[\boldsymbol{\theta}^{(l)} = \argmax_{\boldsymbol{\theta}} \sum_{h = 1}^H \ell(\boldsymbol{\theta}; \boldsymbol{Z}_1, \boldsymbol{Z}_2^{(lh)}) + H\log \pi(\boldsymbol{\theta})\]

where $\ell(\cdot)$ is the complete-data log-likelihood function, $\boldsymbol{Z} \equiv (\boldsymbol{Z}_1', \boldsymbol{Z}_2')'$ denotes the complete data with $\boldsymbol{Z}_1$ and $\boldsymbol{Z}_2$ the observed and missing components, respectively, $\boldsymbol{Z}_2^{(lh)}$, $h = 1, \dots, H$, is simulated from the distribution of $\boldsymbol{Z}_2 \mid \boldsymbol{Z}_1, \boldsymbol{\theta}^{(l-1)}$, and $\pi(\boldsymbol{\theta})$ denotes the prior density.

Fields

The function simulateconditional should have a signature of the form,

simulateconditional(Z::A, θ; nsims = 1) where {A <: AbstractArray{Union{Missing, T}}} where T

The output of simulateconditional should be the completed-data Z, and it should be returned in whatever form is appropriate to be passed to the MAP estimator as MAP(Z). For example, if the data are gridded and the MAP is a neural MAP estimator based on a CNN architecture, then Z should be returned as a four-dimensional array.

The field MAP can be a function (to facilitate the conventional Monte Carlo EM algorithm) or a NeuralEstimator (to facilitate the so-called neural EM algorithm).

The starting values θ₀ may be provided during initialisation (as a vector), or when applying the EM object to data (see below). The starting values given in a function call take precedence over those stored in the object.

Methods

Once constructed, objects of type EM can be applied to data via the methods,

(em::EM)(Z::A, θ₀::Union{Nothing, Vector} = nothing; ...) where {A <: AbstractArray{Union{Missing, T}, N}} where {T, N}
(em::EM)(Z::V, θ₀::Union{Nothing, Vector, Matrix} = nothing; ...) where {V <: AbstractVector{A}} where {A <: AbstractArray{Union{Missing, T}, N}} where {T, N}

where Z is the complete data containing the observed data and Missing values. Note that the second method caters for the case that one has multiple data sets. The keyword arguments are:

  • nsims = 1: the number $H$ of conditional simulations in each iteration.
  • niterations = 50: the maximum number of iterations.
  • nconsecutive = 3: the number of consecutive iterations for which the convergence criterion must be met.
  • ϵ = 0.01: tolerance used to assess convergence; the algorithm halts if the relative change in parameter values in successive iterations is less than ϵ.
  • return_iterates::Bool: if true, the estimate at each iteration of the algorithm is returned; otherwise, only the final estimate is returned.
  • ξ = nothing: model information needed for conditional simulation (e.g., distance matrices) or in the MAP estimator.
  • use_ξ_in_simulateconditional::Bool = false: if set to true, the conditional simulator is called as simulateconditional(Z, θ, ξ; nsims = nsims).
  • use_ξ_in_MAP::Bool = false: if set to true, the MAP estimator is called as MAP(Z, ξ).
  • use_gpu::Bool = true
  • verbose::Bool = false

Examples

# See the "Missing data" section in "Advanced usage"
source
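A hedged sketch of constructing and applying an EM object is given below; simulateconditional, mapest, and Z are placeholders for a user-defined conditional simulator, a (neural) MAP estimator, and incomplete data containing Missing values, respectively.

em = EM(simulateconditional, mapest)    # conventional or neural EM, depending on mapest
θ̂_em = em(Z, [0.5, 0.5]; nsims = 10, niterations = 50, verbose = true)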

Utility functions

NeuralEstimators.adjacencymatrixFunction
adjacencymatrix(S::Matrix, k::Integer; maxmin = false, combined = false)
 adjacencymatrix(S::Matrix, r::AbstractFloat)
 adjacencymatrix(S::Matrix, r::AbstractFloat, k::Integer; random = true)
 adjacencymatrix(M::Matrix; k, r, kwargs...)

Computes a spatially weighted adjacency matrix from spatial locations S based on either the k-nearest neighbours of each location; all nodes within a disc of fixed radius r; or, if both r and k are provided, a subset of k neighbours within a disc of fixed radius r.

Several subsampling strategies are possible when choosing a subset of k neighbours within a disc of fixed radius r. If random=true (default), the neighbours are randomly selected from within the disc (note that this also approximately preserves the distribution of distances within the neighbourhood set). If random=false, a deterministic algorithm is used that aims to preserve the distribution of distances within the neighbourhood set, by choosing those nodes with distances to the central node corresponding to the $\{0, \frac{1}{k}, \frac{2}{k}, \dots, \frac{k-1}{k}, 1\}$ quantiles of the empirical distribution function of distances within the disc. (This algorithm in fact yields $k+1$ neighbours, since both the closest and furthest nodes are always included.)

If maxmin=false (default) the k-nearest neighbours are chosen based on all points in the graph. If maxmin=true, a so-called maxmin ordering is applied, whereby an initial point is selected, and each subsequent point is selected to maximise the minimum distance to those points that have already been selected. Then, the neighbours of each point are defined as the k-nearest neighbours amongst the points that have already appeared in the ordering. If combined=true, the neighbours are defined to be the union of the k-nearest neighbours and the k-nearest neighbours subject to a maxmin ordering.

If S is a square matrix, it is treated as a distance matrix; otherwise, it should be an $n$ x $d$ matrix, where $n$ is the number of spatial locations and $d$ is the spatial dimension (typically $d$ = 2). In the latter case, the distance metric is taken to be the Euclidean distance. Note that use of a maxmin ordering currently requires a matrix of spatial locations (not a distance matrix).

By convention with the functionality in GraphNeuralNetworks.jl which is based on directed graphs, the neighbours of location i are stored in the column A[:, i] where A is the returned adjacency matrix. Therefore, the number of neighbours for each location is given by collect(mapslices(nnz, A; dims = 1)), and the number of times each node is a neighbour of another node is given by collect(mapslices(nnz, A; dims = 2)).

By convention, we do not consider a location to neighbour itself (i.e., the diagonal elements of the adjacency matrix are zero).

Examples

using NeuralEstimators, Distances, SparseArrays
@@ -47,31 +47,31 @@
 adjacencymatrix(D, k)
 adjacencymatrix(D, r)
 adjacencymatrix(D, r, k)
adjacencymatrix(D, r, k; random = false)
source
NeuralEstimators.containertypeFunction
containertype(A::Type)
 containertype(::Type{A}) where A <: SubArray
 containertype(a::A) where A

Returns the container type of its argument.

If given a SubArray, returns the container type of the parent array.

Examples

a = rand(3, 4)
 containertype(a)
 containertype(typeof(a))
[containertype(x) for x ∈ eachcol(a)]
source
NeuralEstimators.encodedataFunction
encodedata(Z::A; c::T = zero(T)) where {A <: AbstractArray{Union{Missing, T}, N}} where T, N

For data Z with missing entries, returns an encoded data set (U, W) where W encodes the missingness pattern as an indicator vector and U is the original data Z with missing entries replaced by a fixed constant c.

The indicator vector W is stored in the second-to-last dimension of Z, which should be singleton. If the second-to-last dimension is not singleton, then two singleton dimensions will be added to the array, and W will be stored in the new second-to-last dimension.

Examples

using NeuralEstimators
 
 # Generate some missing data
 Z = rand(16, 16, 1, 1)
 Z = removedata(Z, 0.25)	 # remove 25% of the data
 
 # Encode the data
UW = encodedata(Z)
source
NeuralEstimators.estimateinbatchesFunction
estimateinbatches(θ̂, z, t = nothing; batchsize::Integer = 32, use_gpu::Bool = true, kwargs...)

Apply the estimator θ̂ on minibatches of z (and optionally other set-level information t) of size batchsize.

This can prevent memory issues that can occur with large data sets, particularly on the GPU.

Minibatching will only be done if there are multiple data sets in z; this will be inferred by z being a vector, or a tuple whose first element is a vector.

source
NeuralEstimators.IndicatorWeightsType
IndicatorWeights(h_max, n_bins::Integer)
 (w::IndicatorWeights)(h::Matrix)

For spatial locations $\boldsymbol{s}$ and $\boldsymbol{u}$, creates a spatial weight function defined as

\[\boldsymbol{w}(\boldsymbol{s}, \boldsymbol{u}) \equiv (\mathbb{I}(h \in B_k) : k = 1, \dots, K)',\]

where $\mathbb{I}(\cdot)$ denotes the indicator function, $h \equiv \|\boldsymbol{s} - \boldsymbol{u} \|$ is the spatial distance between $\boldsymbol{s}$ and $\boldsymbol{u}$, and $\{B_k : k = 1, \dots, K\}$ is a set of $K =$n_bins equally-sized distance bins covering the spatial distances between 0 and h_max.

Examples

using NeuralEstimators 
 
 h_max = 1
 n_bins = 10
 w = IndicatorWeights(h_max, n_bins)
 h = rand(1, 30) # distances between 30 pairs of spatial locations 
w(h)
source
NeuralEstimators.initialise_estimatorFunction
initialise_estimator(p::Integer; ...)

Initialise a neural estimator for a statistical model with p unknown parameters.

The estimator is couched in the DeepSets framework (see DeepSet) so that it can be applied to data sets containing an arbitrary number of independent replicates (including the special case of a single replicate).

Note also that the user is free to initialise their neural estimator however they see fit using arbitrary Flux code; see here for Flux's API reference.

Finally, the method with positional argument data_type is a wrapper that allows one to specify the type of their data (either "unstructured", "gridded", or "irregular_spatial").

Keyword arguments

  • architecture::String: for unstructured multivariate data, one may use a fully-connected multilayer perceptron ("MLP"); for data collected over a grid, a convolutional neural network ("CNN"); and for graphical or irregular spatial data, a graphical neural network ("GNN").
  • d::Integer = 1: for unstructured multivariate data (i.e., when architecture = "MLP"), the dimension of the data (e.g., d = 3 for trivariate data); otherwise, if architecture ∈ ["CNN", "GNN"], the argument d controls the number of input channels (e.g., d = 1 for univariate spatial processes).
  • estimator_type::String = "point": the type of estimator; either "point" or "interval".
  • depth = 3: the number of hidden layers; either a single integer or an integer vector of length two specifying the depth of the inner (summary) and outer (inference) network of the DeepSets framework.
  • width = 32: a single integer or an integer vector of length sum(depth) specifying the width (or number of convolutional filters/channels) in each hidden layer.
  • activation::Function = relu: the (non-linear) activation function of each hidden layer.
  • activation_output::Function = identity: the activation function of the output layer.
  • variance_stabiliser::Union{Nothing, Function} = nothing: a function that will be applied directly to the input, usually to stabilise the variance.
  • kernel_size = nothing: (applicable only to CNNs) a vector of length depth[1] containing integer tuples of length D, where D is the dimension of the convolution (e.g., D = 2 for two-dimensional convolution).
  • weight_by_distance::Bool = true: (applicable only to GNNs) flag indicating whether the estimator will weight by spatial distance; if true, a SpatialGraphConv layer is used in the propagation module; otherwise, a regular GraphConv layer is used.
  • probs = [0.025, 0.975]: (applicable only if estimator_type = "interval") probability levels defining the lower and upper endpoints of the posterior credible interval.

Examples

## MLP, GNN, 1D CNN, and 2D CNN for a statistical model with two parameters:
 p = 2
 initialise_estimator(p, architecture = "MLP")
 initialise_estimator(p, architecture = "GNN")
 initialise_estimator(p, architecture = "CNN", kernel_size = [10, 5, 3])
-initialise_estimator(p, architecture = "CNN", kernel_size = [(10, 10), (5, 5), (3, 3)])
source
NeuralEstimators.materncholsFunction
maternchols(D, ρ, ν, σ² = 1; stack = true)

Given a matrix D of distances, constructs the Cholesky factor of the covariance matrix under the Matérn covariance function with range parameter ρ, smoothness parameter ν, and marginal variance σ².

Providing vectors of parameters will yield a three-dimensional array of Cholesky factors (note that the vectors must be of the same length, but a mix of vectors and scalars is allowed). A vector of distance matrices D may also be provided.

If stack = true, the Cholesky factors will be "stacked" into a three-dimensional array (this is only possible if all distance matrices in D are the same size).

Examples

using NeuralEstimators
+initialise_estimator(p, architecture = "CNN", kernel_size = [(10, 10), (5, 5), (3, 3)])
source
NeuralEstimators.materncholsFunction
maternchols(D, ρ, ν, σ² = 1; stack = true)

Given a matrix D of distances, constructs the Cholesky factor of the covariance matrix under the Matérn covariance function with range parameter ρ, smoothness parameter ν, and marginal variance σ².

Providing vectors of parameters will yield a three-dimensional array of Cholesky factors (note that the vectors must of the same length, but a mix of vectors and scalars is allowed). A vector of distance matrices D may also be provided.

If stack = true, the Cholesky factors will be "stacked" into a three-dimensional array (this is only possible if all distance matrices in D are the same size).

Examples

using NeuralEstimators
 using LinearAlgebra: norm
 n  = 10
 S  = rand(n, 2)
@@ -90,7 +90,7 @@
 
 S̃  = rand(2n, 2)
 D̃  = [norm(sᵢ - sⱼ) for sᵢ ∈ eachrow(S̃), sⱼ ∈ eachrow(S̃)]
maternchols([D, D̃], ρ, ν, σ²; stack = false)
source
NeuralEstimators.removedataFunction
removedata(Z::Array, Iᵤ::Vector{Integer})
 removedata(Z::Array, p::Union{Float, Vector{Float}}; prevent_complete_missing = true)
 removedata(Z::Array, n::Integer; fixed_pattern = false, contiguous_pattern = false, variable_proportion = false)

Replaces elements of Z with missing.

The simplest method accepts a vector of integers Iᵤ that give the specific indices of the data to be removed.

Alternatively, there are two methods available to generate data that are missing completely at random (MCAR).

First, a vector p may be given that specifies the proportion of missingness for each element in the response vector. Hence, p should have length equal to the dimension of the response vector. If a single proportion is given, it will be replicated accordingly. If prevent_complete_missing = true, no replicates will contain 100% missingness (note that this can slightly alter the effective values of p).

Second, if an integer n is provided, all replicates will contain n observations after the data are removed. If fixed_pattern = true, the missingness pattern is fixed for all replicates. If contiguous_pattern = true, the data will be removed in a contiguous block. If variable_proportion = true, the proportion of missingness will vary across replicates, with each replicate containing between 1 and n observations after data removal, sampled uniformly (note that variable_proportion overrides fixed_pattern).

The return type is Array{Union{T, Missing}}.

Examples

d = 5           # dimension of each replicate
 m = 2000        # number of replicates
@@ -102,7 +102,7 @@
 
 # Passing a desired final sample size
 n = 3  # number of observed elements of each replicate: must have n <= d
removedata(Z, n)
source
NeuralEstimators.spatialgraphFunction
spatialgraph(S)
 spatialgraph(S, Z)
 spatialgraph(g::GNNGraph, Z)

Given spatial data Z measured at spatial locations S, constructs a GNNGraph ready for use in a graph neural network that employs SpatialGraphConv layers.

When $m$ independent replicates are collected over the same set of $n$ spatial locations,

\[\{\boldsymbol{s}_1, \dots, \boldsymbol{s}_n\} \subset \mathcal{D},\]

where $\mathcal{D} \subset \mathbb{R}^d$ denotes the spatial domain of interest, Z should be given as an $n \times m$ matrix and S should be given as an $n \times d$ matrix. Otherwise, when $m$ independent replicates are collected over differing sets of spatial locations,

\[\{\boldsymbol{s}_{i1}, \dots, \boldsymbol{s}_{in_i}\} \subset \mathcal{D}, \quad i = 1, \dots, m,\]

Z should be given as an $m$-vector of $n_i$-vectors, and S should be given as an $m$-vector of $n_i \times d$ matrices.

The spatial information between neighbours is stored as an edge feature, with the specific information controlled by the keyword arguments stationary and isotropic. Specifically, the edge feature between node $j$ and node $j'$ stores the spatial distance $\|\boldsymbol{s}_{j'} - \boldsymbol{s}_j\|$ (if isotropic), the spatial displacement $\boldsymbol{s}_{j'} - \boldsymbol{s}_j$ (if stationary), or the matrix of locations $(\boldsymbol{s}_{j'}, \boldsymbol{s}_j)$ (if !stationary).

Additional keyword arguments inherit from adjacencymatrix() to determine the neighbourhood of each node, with the default being a randomly selected set of k=30 neighbours within a disc of radius r=0.15 units.

Examples

using NeuralEstimators
 
@@ -120,14 +120,14 @@
 n = rand(50:100, m)
 S = rand.(n, d)
 Z = rand.(n)
g = spatialgraph(S, Z)
source
NeuralEstimators.stackarraysFunction
stackarrays(v::V; merge = true) where {V <: AbstractVector{A}} where {A <: AbstractArray{T, N}} where {T, N}

Stack a vector of arrays v along the last dimension of each array, optionally merging the final dimension of the stacked array.

The arrays must be of the same size for the first N-1 dimensions. However, if merge = true, the size of the final dimension can vary.

Examples

# Vector containing arrays of the same size:
 Z = [rand(2, 3, m) for m ∈ (1, 1)];
 stackarrays(Z)
 stackarrays(Z, merge = false)
 
 # Vector containing arrays with differing final dimension size:
 Z = [rand(2, 3, m) for m ∈ (1, 2)];
stackarrays(Z)
source
NeuralEstimators.vectotrilFunction
vectotril(v; strict = false)
 vectotriu(v; strict = false)

Converts a vector v of length $d(d+1)÷2$ (a triangular number) into a $d × d$ lower or upper triangular matrix.

If strict = true, the matrix will be strictly lower or upper triangular, that is, a $(d+1) × (d+1)$ triangular matrix with zero diagonal.

Note that the triangular matrix is constructed on the CPU, but the returned matrix will be a GPU array if v is a GPU array. Note also that the return type is not of type Triangular matrix (i.e., the zeros are materialised) since Triangular matrices are not always compatible with other GPU operations.

Examples

using NeuralEstimators
 
 d = 4
@@ -136,4 +136,4 @@
 vectotril(v)
 vectotriu(v)
 vectotril(v; strict = true)
vectotriu(v; strict = true)
source
diff --git a/dev/framework/index.html b/dev/framework/index.html
index 8da83e1..48ddc92 100644
--- a/dev/framework/index.html
+++ b/dev/framework/index.html
@@ -2,4 +2,4 @@ Framework · NeuralEstimators.jl

Framework

In this section, we provide an overview of point estimation using neural Bayes estimators. For a more detailed discussion on the framework and its implementation, see the paper Likelihood-Free Parameter Estimation with Neural Bayes Estimators. For an accessible introduction to amortised neural inferential methods more broadly, see the review paper Neural Methods for Amortised Inference.

Neural Bayes estimators

A parametric statistical model is a set of probability distributions on a sample space $\mathcal{Z} \subseteq \mathbb{R}^n$, where the probability distributions are parameterised via some parameter vector $\boldsymbol{\theta}$ on a parameter space $\Theta \subseteq \mathbb{R}^p$. Suppose that we have data from one such distribution, which we denote as $\boldsymbol{Z}$. Then, the goal of parameter point estimation is to come up with an estimate of the unknown $\boldsymbol{\theta}$ from $\boldsymbol{Z}$ using an estimator,

\[ \hat{\boldsymbol{\theta}} : \mathcal{Z} \to \Theta,\]

which is a mapping from the sample space to the parameter space.

Estimators can be constructed within a decision-theoretic framework. Consider a nonnegative loss function, $L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}))$, which assesses an estimator $\hat{\boldsymbol{\theta}}(\cdot)$ for a given $\boldsymbol{\theta}$ and data set $\boldsymbol{Z} \sim f(\boldsymbol{z} \mid \boldsymbol{\theta})$, where $f(\boldsymbol{z} \mid \boldsymbol{\theta})$ is the probability density function of the data conditional on $\boldsymbol{\theta}$. An estimator's Bayes risk is its loss averaged over all possible parameter values and data realisations,

\[\int_\Theta \int_{\mathcal{Z}} L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}(\boldsymbol{z}))f(\boldsymbol{z} \mid \boldsymbol{\theta}) \,\mathrm{d} \boldsymbol{z} \,\mathrm{d} \Pi(\boldsymbol{\theta}), \]

where $\Pi(\cdot)$ is a prior measure for $\boldsymbol{\theta}$. Any minimiser of the Bayes risk is said to be a Bayes estimator with respect to $L(\cdot, \cdot)$ and $\Pi(\cdot)$.
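
For example, under the squared-error loss, $L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}) = \|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\|^2$, the Bayes risk is minimised by the posterior mean,

\[\hat{\boldsymbol{\theta}}(\boldsymbol{z}) = \mathrm{E}(\boldsymbol{\theta} \mid \boldsymbol{Z} = \boldsymbol{z}),\]

while under the absolute-error loss it is minimised (componentwise) by the marginal posterior medians.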

Bayes estimators are theoretically attractive: for example, unique Bayes estimators are admissible and, under suitable regularity conditions and the squared-error loss, are consistent and asymptotically efficient. Further, for a large class of prior distributions, every set of conditions that implies consistency of the maximum likelihood (ML) estimator also implies consistency of Bayes estimators. Importantly, Bayes estimators are not motivated purely by asymptotics: by construction, they are Bayes irrespective of the sample size and model class. Unfortunately, however, Bayes estimators are typically unavailable in closed form for the complex models often encountered in practice. A way forward is to assume a flexible parametric model for $\hat{\boldsymbol{\theta}}(\cdot)$, and to optimise the parameters within that model in order to approximate the Bayes estimator. Neural networks are ideal candidates, since they are universal function approximators, and because they are also fast to evaluate, usually involving only simple matrix-vector operations.

Let $\hat{\boldsymbol{\theta}}(\boldsymbol{Z}; \boldsymbol{\gamma})$ denote a neural network that returns a point estimate from data $\boldsymbol{Z}$, where $\boldsymbol{\gamma}$ contains the neural-network parameters. Bayes estimators may be approximated with $\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma}^*)$ by solving the optimisation problem,

\[\boldsymbol{\gamma}^* \equiv \underset{\boldsymbol{\gamma}}{\mathrm{arg\,min}} \; \frac{1}{K} \sum_{k = 1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}; \boldsymbol{\gamma})),\]

whose objective function is a Monte Carlo approximation of the Bayes risk made using a set $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ of parameter vectors sampled from the prior $\Pi(\cdot)$ and, for each $k$, data $\boldsymbol{Z}^{(k)}$ simulated from $f(\boldsymbol{z} \mid \boldsymbol{\theta}^{(k)})$. Note that this Monte Carlo approximation does not involve evaluation, or knowledge, of the likelihood function.

The Monte Carlo approximation of the Bayes risk can be straightforwardly minimised with respect to $\boldsymbol{\gamma}$ using back-propagation and stochastic gradient descent. For sufficiently flexible architectures, the point estimator targets a Bayes estimator with respect to $L(\cdot, \cdot)$ and $\Pi(\cdot)$. We therefore call the fitted neural point estimator a neural Bayes estimator. Like Bayes estimators, neural Bayes estimators target a specific point summary of the posterior distribution. For instance, the absolute-error and squared-error loss functions lead to neural Bayes estimators that approximate the posterior median and mean, respectively.

Construction of neural Bayes estimators

The neural Bayes estimator is conceptually simple and can be used in a wide range of problems where other approaches, such as maximum-likelihood estimation, are computationally infeasible. The estimator also has marked practical appeal, as the general workflow for its construction is only loosely connected to the statistical or physical model being considered. The workflow is as follows:

  1. Define the prior, $\Pi(\cdot)$.
  2. Choose a loss function, $L(\cdot, \cdot)$, typically the mean-absolute-error or mean-squared-error loss.
  3. Design a suitable neural-network architecture for the neural point estimator $\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma})$.
  4. Sample parameters from $\Pi(\cdot)$ to form training/validation/test parameter sets.
  5. Given the above parameter sets, simulate data from the model, to form training/validation/test data sets.
  6. Train the neural network (i.e., estimate $\boldsymbol{\gamma}$) by minimising the loss function averaged over the training sets. During training, monitor performance and convergence using the validation sets.
  7. Assess the fitted neural Bayes estimator, $\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma}^*)$, using the test set.
+\frac{1}{K} \sum_{k = 1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}^{(k)}; \boldsymbol{\gamma})),\]

whose objective function is a Monte Carlo approximation of the Bayes risk made using a set $\{\boldsymbol{\theta}^{(k)} : k = 1, \dots, K\}$ of parameter vectors sampled from the prior $\Pi(\cdot)$ and, for each $k$, data $\boldsymbol{Z}^{(k)}$ simulated from $f(\boldsymbol{z} \mid \boldsymbol{\theta}^{(k)})$. Note that this Monte Carlo approximation does not involve evaluation, or knowledge, of the likelihood function.

The Monte Carlo approximation of the Bayes risk can be straightforwardly minimised with respect to $\boldsymbol{\gamma}$ using back-propagation and stochastic gradient descent. For sufficiently flexible architectures, the point estimator targets a Bayes estimator with respect to $L(\cdot, \cdot)$ and $\Pi(\cdot)$. We therefore call the fitted neural point estimator a neural Bayes estimator. Like Bayes estimators, neural Bayes estimators target a specific point summary of the posterior distribution. For instance, the absolute-error and squared-error loss functions lead to neural Bayes estimators that approximate the posterior median and mean, respectively.
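
As a minimal sketch of this Monte Carlo objective (sampleprior, simulatedata, and L below are user-supplied placeholders rather than package functions, and θ̂(Z, γ) stands for the neural point estimator viewed as a function of the data and its parameters; in practice, the package's train() function constructs and minimises such an objective internally):

using Statistics

K = 1000
θ = [sampleprior() for _ in 1:K]                    # θ⁽ᵏ⁾ sampled from the prior Π(⋅)
Z = [simulatedata(θ[k]) for k in 1:K]               # Z⁽ᵏ⁾ simulated from f(⋅ ∣ θ⁽ᵏ⁾)
risk(γ) = mean(L(θ[k], θ̂(Z[k], γ)) for k in 1:K)    # Monte Carlo approximation of the Bayes risk, minimised over γ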

Construction of neural Bayes estimators

The neural Bayes estimator is conceptually simple and can be used in a wide range of problems where other approaches, such as maximum-likelihood estimation, are computationally infeasible. The estimator also has marked practical appeal, as the general workflow for its construction is only loosely connected to the statistical or physical model being considered. The workflow is as follows (a brief code sketch is given after the list):

  1. Define the prior, $\Pi(\cdot)$.
  2. Choose a loss function, $L(\cdot, \cdot)$, typically the mean-absolute-error or mean-squared-error loss.
  3. Design a suitable neural-network architecture for the neural point estimator $\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma})$.
  4. Sample parameters from $\Pi(\cdot)$ to form training/validation/test parameter sets.
  5. Given the above parameter sets, simulate data from the model, to form training/validation/test data sets.
  6. Train the neural network (i.e., estimate $\boldsymbol{\gamma}$) by minimising the loss function averaged over the training sets. During training, monitor performance and convergence using the validation sets.
  7. Assess the fitted neural Bayes estimator, $\hat{\boldsymbol{\theta}}(\cdot; \boldsymbol{\gamma}^*)$, using the test set.
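
The following is a loose end-to-end sketch of these steps for unstructured (replicated vector) data. The prior sampler and model simulator are user-defined placeholders, the architecture is arbitrary, and argument orders and keyword names are indicative only; see the Examples page for complete workflows.

using NeuralEstimators, Flux

p = 2                                                  # number of model parameters
n = 50                                                 # dimension of each independent replicate
m = 30                                                 # number of replicates per data set

sampler(K) = rand32(p, K)                              # placeholder prior sampler: p × K matrix
simulator(θ, m) = [rand32(n, m) for _ in axes(θ, 2)]   # placeholder simulator: one n × m array per parameter vector

θ_train, θ_val = sampler(10_000), sampler(2_000)       # steps 1 and 4
Z_train, Z_val = simulator(θ_train, m), simulator(θ_val, m)   # step 5

ψ = Chain(Dense(n, 64, relu), Dense(64, 64, relu))     # summary network
ϕ = Chain(Dense(64, 64, relu), Dense(64, p))           # inference network
θ̂ = DeepSet(ψ, ϕ)                                      # step 3: neural point estimator

θ̂ = train(θ̂, θ_train, θ_val, Z_train, Z_val)           # step 6; the loss from step 2 can be supplied as a keyword argument
θ_test = sampler(1_000); Z_test = simulator(θ_test, m)
assessment = assess(θ̂, θ_test, Z_test)                 # step 7
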
diff --git a/dev/index.html b/dev/index.html index 56b1d41..099a0b8 100644 --- a/dev/index.html +++ b/dev/index.html @@ -8,4 +8,4 @@ pages = {1--14}, doi = {10.1080/00031305.2023.2249522}, url = {https://doi.org/10.1080/00031305.2023.2249522} -}

Papers using NeuralEstimators

Related packages

Several other software packages have been developed to facilitate neural likelihood-free inference. These include:

  1. BayesFlow (TensorFlow)
  2. LAMPE (PyTorch)
  3. sbi (PyTorch)
  4. swyft (PyTorch)

A summary of the functionality in these packages is given in Zammit-Mangion et al. (2024, Section 6.1). Note that this list of related packages was created in July 2024; if you have software to add to this list, please contact the package maintainer.

+}

Papers using NeuralEstimators

Related packages

Several other software packages have been developed to facilitate neural likelihood-free inference. These include:

  1. BayesFlow (TensorFlow)
  2. LAMPE (PyTorch)
  3. sbi (PyTorch)
  4. swyft (PyTorch)

A summary of the functionality in these packages is given in Zammit-Mangion et al. (2024, Section 6.1). Note that this list of related packages was created in July 2024; if you have software to add to this list, please contact the package maintainer.

diff --git a/dev/workflow/advancedusage/index.html b/dev/workflow/advancedusage/index.html index b7fbf2b..a6d24b2 100644 --- a/dev/workflow/advancedusage/index.html +++ b/dev/workflow/advancedusage/index.html @@ -244,4 +244,4 @@ θ₀ = mean.([Π...]) # initial estimate, the prior mean neuralem = EM(simulateconditional, θ̂) -neuralem(Z, θ₀, ξ = ξ, nsims = H, use_ξ_in_simulateconditional = true)

Censored data

Coming soon, based on the methodology presented in Richards et al. (2023+).

+neuralem(Z, θ₀, ξ = ξ, nsims = H, use_ξ_in_simulateconditional = true)

Censored data

Coming soon, based on the methodology presented in Richards et al. (2023+).

diff --git a/dev/workflow/examples/index.html b/dev/workflow/examples/index.html index 7052c60..0f91a01 100644 --- a/dev/workflow/examples/index.html +++ b/dev/workflow/examples/index.html @@ -185,4 +185,4 @@ θ̂(Z) # point estimates θ̃ = Parameters(θ̂(Z), S) # construct Parameters object from the point estimates bs = bootstrap(θ̂, θ̃, simulate, m) # bootstrap estimates -interval(bs) # parametric bootstrap confidence interval +interval(bs) # parametric bootstrap confidence interval diff --git a/dev/workflow/overview/index.html b/dev/workflow/overview/index.html index 3f6ebd2..5c9a1cc 100644 --- a/dev/workflow/overview/index.html +++ b/dev/workflow/overview/index.html @@ -1,2 +1,2 @@ -Overview · NeuralEstimators.jl

Overview

To develop a neural estimator with NeuralEstimators,

  • Sample parameters from the prior distribution. The parameters are stored as $p \times K$ matrices, with $p$ the number of parameters in the model and $K$ the number of parameter vectors in the given parameter set (i.e., training, validation, or test set).
  • Simulate data from the assumed model over the parameter sets generated above. These data are stored as a Vector{A}, with each element of the vector associated with one parameter configuration, and where A depends on the multivariate structure of the data and the representation of the neural estimator (e.g., an Array for CNN-based estimators, a GNNGraph for GNN-based estimators, etc.).
  • Initialise a neural network θ̂.
  • Train θ̂ under the chosen loss function using train().
  • Assess θ̂ using assess(), which uses simulation-based methods to assess the estimator with respect to its sampling distribution.

Once the estimator θ̂ has passed our assessments and is therefore deemed to be well calibrated, it may be applied to observed data. See the Examples and, once familiar with the basic workflow, see Advanced usage for practical considerations on how to most effectively construct neural estimators.

+Overview · NeuralEstimators.jl

Overview

To develop a neural estimator with NeuralEstimators,

  • Sample parameters from the prior distribution. The parameters are stored as $p \times K$ matrices, with $p$ the number of parameters in the model and $K$ the number of parameter vectors in the given parameter set (i.e., training, validation, or test set).
  • Simulate data from the assumed model over the parameter sets generated above. These data are stored as a Vector{A}, with each element of the vector associated with one parameter configuration, and where A depends on the multivariate structure of the data and the representation of the neural estimator (e.g., an Array for CNN-based estimators, a GNNGraph for GNN-based estimators, etc.). A brief sketch of these storage conventions is given after this list.
  • Initialise a neural network θ̂.
  • Train θ̂ under the chosen loss function using train().
  • Assess θ̂ using assess(), which uses simulation-based methods to assess the estimator with respect to its sampling distribution.
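
For concreteness, a small sketch of these storage conventions (shapes are illustrative only and not tied to any particular model or architecture):

K = 500                                           # number of parameter vectors in the given set
p = 3                                             # number of parameters in the model
θ = rand(p, K)                                    # parameters stored as a p × K matrix
Z = [rand(Float32, 16, 16, 1, 30) for _ in 1:K]   # e.g., gridded data for a CNN-based estimator: a Vector of Arrays, one element per parameter vector
# For a GNN-based estimator, Z would instead be a Vector{GNNGraph}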

Once the estimator θ̂ has passed our assessments and is therefore deemed to be well calibrated, it may be applied to observed data. See the Examples and, once familiar with the basic workflow, see Advanced usage for practical considerations on how to most effectively construct neural estimators.