Graphs#

This package provides classes and functions for estimating integrals using various sampling algorithms. The Integral class caches samples and evaluates the integral for a given hyperparameter psi. It takes two callable arguments: data_given_theta, the joint probability of the observed data viewed as a function of the parameters theta, and theta_given_psi, the conditional probability of the parameters theta given the hyperparameter psi. The method parameter selects the sampling algorithm, and the options parameter is a dictionary of options passed to it.

The package also implements several sampling algorithms as functions, including metropolis and langevin. metropolis is a Metropolis sampler: it takes a function returning the unnormalized density (or the log unnormalized probability), the number of samples to draw, the initial point, the scale of the proposal distribution, and a boolean indicating whether the function returns a log-probability. langevin is a Metropolis-adjusted Langevin (MALA) sampler: it takes the same density function, the number of samples, and the initial point, plus the gradient of the log unnormalized probability, the standard deviation of the proposal distribution, and the same log flag.
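
Below is a quick sketch of how the two samplers might be called on a standard normal target. The call to metropolis matches the documented signature; the positional order for langevin follows the description above, so its exact signature is an assumption:

>>> import random
>>> random.seed(12345)
>>> log_p = lambda x: -0.5 * x[0] ** 2          # log unnormalized density
>>> grad_log_p = lambda x: [-x[0]]              # gradient of the log density
>>> m = list(metropolis(log_p, 1000, [0], [0.5], log=True))
>>> # argument order assumed from the text: fun, draws, init, dfun, sigma, log
>>> l = list(langevin(log_p, 1000, [0], grad_log_p, 0.5, log=True))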

Summary#

graph.cmaes(fun, x0, sigma, g_max[, trace])

CMA-ES optimization

graph.Integral(data_given_theta, theta_given_psi)

Caches the samples to evaluate the integral several times.

graph.korali(fun, draws, lo, hi[, beta, ...])

Korali TMCMC sampler

graph.metropolis(fun, draws, init, scale[, log])

Metropolis sampler

graph.tmcmc(fun, draws, lo, hi[, beta, ...])

Generates samples from the target distribution using a transitional Markov chain Monte Carlo (TMCMC) algorithm.

Functions#

class graph.Integral(data_given_theta, theta_given_psi, method='metropolis', **options)#

Caches the samples to evaluate the integral several times.

We follow the methodology developed in [1], also described in S1.4 of [2]. The advantage of this approach is that the posterior of data_given_theta is sampled only once; evaluating the integral for a new value of psi only requires re-evaluating theta_given_psi on the cached samples.
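
A minimal sketch of the resulting estimator (an illustration of the idea, not the package's actual implementation): draw samples of theta from data_given_theta once, cache them, and estimate the integral for each psi as a sample mean of theta_given_psi over the cached samples.

>>> thetas = list(metropolis(data_given_theta, 1000, [0], [0.1]))  # sampled once, then cached
>>> def estimate(psi):  # cheap to evaluate for many values of psi
...     return sum(theta_given_psi(t, psi) for t in thetas) / len(thetas)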

Parameters#

data_given_theta : callable

The joint probability of the observed data viewed as a function of the parameters theta.

theta_given_psi : callable

The conditional probability of the parameters theta given the hyperparameter psi.

method : str or callable, optional

The sampling algorithm used to draw from data_given_theta. Can be one of:

  • ‘metropolis’ (default)

  • ‘langevin’

  • ‘tmcmc’

  • ‘korali’

  • ‘hamiltonian’ (WIP)

options : dict, optional

A dictionary of options for the sampling algorithm.

Attributes#

samples : list

A list of samples obtained from the sampling algorithm.

Raises#

ValueError

If the provided sampling method is unknown.

References#

1. Wu, S., Angelikopoulos, P., Tauriello, G., Papadimitriou, C., & Koumoutsakos, P. (2016). Fusing heterogeneous data for the calibration of molecular dynamics force fields using hierarchical Bayesian models. The Journal of Chemical Physics, 145(24), 244112.

2. Kulakova, L., Arampatzis, G., Angelikopoulos, P., Hadjidoukas, P., Papadimitriou, C., & Koumoutsakos, P. (2017). Data driven inference for the repulsive exponent of the Lennard-Jones potential in molecular dynamics simulations. Scientific reports, 7(1), 16576. (Appendix, S1.4 Hierarchical Bayesian models)

Examples#

>>> import random
>>> import numpy as np
>>> random.seed(123456)
>>> np.random.seed(123456)
>>> data = np.random.normal(0, 1, size=1000)
>>> data_given_theta = lambda theta: np.prod(np.exp(-0.5 * (data - theta[0]) ** 2))
>>> theta_given_psi = lambda theta, psi: np.exp(-0.5 * (theta[0] - psi[0]) ** 2)
>>> integral = Integral(data_given_theta, theta_given_psi, method='metropolis',
... init=[0], scale=[0.1], draws=1000)
>>> integral([0])  # Evaluate the integral for psi=0
0.9984153240011582
__call__(psi)#

Compute the integral estimate for a given hyperparameter.

Parameters#

psi : array_like

The hyperparameter values at which to evaluate the integral estimate.

Returns#

float

The estimated value of the integral.

Examples#

>>> import random
>>> import numpy as np
>>> from scipy.stats import norm
>>> random.seed(123456)
>>> np.random.seed(123456)
>>> def data_given_theta(theta):
...     return norm.pdf(theta[0], loc=1, scale=2) * norm.pdf(theta[0], loc=3, scale=2)
>>> def theta_given_psi(theta, psi):
...     return norm.pdf(theta[0], loc=psi[0], scale=[1])
>>> integral = Integral(data_given_theta, theta_given_psi, method="metropolis",
...    draws=1000, scale=[1.0], init=[0])
>>> integral([2])
0.2388726795076229
graph.cmaes(fun, x0, sigma, g_max, trace=False)#

CMA-ES optimization

Parameters#

fun : callable

the objective function to minimize

x0 : tuple

the initial point

sigma : float

the initial step size (standard deviation) of the search distribution

g_max : int

the maximum number of generations

trace : bool

return a trace of the algorithm (default: False)

Returns#

xmin : tuple

the best point found
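
Examples#

A small usage sketch on a quadratic objective; the values are illustrative, and xmin is assumed to have the same dimension as x0:

>>> xmin = cmaes(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, (0, 0), 0.5, 100)
>>> len(xmin)
2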

graph.korali(fun, draws, lo, hi, beta=1, return_evidence=False, num_cores=None, comm=None)#

Korali TMCMC sampler

Parameters#

fun : callable

log-probability

draws : int

the number of samples to draw

lo, hi : tuples

the bounds of the initial distribution

beta : float

The coefficient to scale the proposal distribution. Larger values of beta lead to larger proposal steps and potentially faster convergence, but may also increase the likelihood of rejecting proposals (default is 1)

return_evidence : bool

If True, return a tuple containing the samples and the evidence (the logarithm of the normalization constant). If False (the default), return only the samples

num_cores : int

The number of CPU cores to use for processing. If None (default), the code will not run concurrently

comm : mpi4py.MPI.Intracomm

MPI communicator for distributed runs. By default, the function runs in serial mode (i.e., comm=None)

Returns#

samples : list or tuple

a list of samples, or a tuple of (samples, log-evidence)
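
Examples#

A serial usage sketch, assuming the Korali library is installed; the call mirrors tmcmc:

>>> def log_prob(x):
...     return -0.5 * (x[0] ** 2 + x[1] ** 2)
>>> samples = korali(log_prob, 1000, [-5, -5], [5, 5])
>>> len(samples)
1000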

graph.metropolis(fun, draws, init, scale, log=False)#

Metropolis sampler

Parameters#

fun : callable

The unnormalized density or the log unnormalized probability. If log is True, fun should return the log unnormalized probability. Otherwise, it should return the unnormalized density.

draws : int

The number of samples to draw.

init : tuple

The initial point.

scale : tuple

The scale of the proposal distribution. Should be the same size as init.

log : bool, optional

If True, assume that fun returns the log unnormalized probability. Default is False.

Returns#

samples : list

A list of draws samples.

Examples#

Define a log unnormalized probability function for a standard normal distribution:

>>> import math
>>> import random
>>> import statistics
>>> random.seed(12345)
>>> def log_normal(x):
...     return -0.5 * x[0]**2

Draw 10000 samples from the distribution, starting at 0, with proposal standard deviation 1:

>>> samples = list(metropolis(log_normal, 10000, [0], [1], log=True))

Check that the mean and standard deviation of the samples are close to 0 and 1, respectively:

>>> math.isclose(statistics.fmean(s[0] for s in samples), 0, abs_tol=0.05)
True
>>> math.isclose(statistics.fmean(s[0]**2 for s in samples), 1, abs_tol=0.05)
True
graph.tmcmc(fun, draws, lo, hi, beta=1, return_evidence=False, trace=False)#

Generates samples from the target distribution using a transitional Markov chain Monte Carlo (TMCMC) algorithm.

Parameters#

fun : callable

log-probability

draws : int

the number of samples to draw

lo, hi : tuples

the bounds of the initial distribution

beta : float

The coefficient to scale the proposal distribution. Larger values of beta lead to larger proposal steps and potentially faster convergence, but may also increase the likelihood of rejecting proposals (default is 1)

return_evidence : bool

If True, return a tuple containing the samples and the evidence (the logarithm of the normalization constant). If False (the default), return only the samples

trace : bool

If True, return a trace of the algorithm, which is a list of tuples containing the current set of samples and the number of accepted proposals at each iteration. If False (the default), do not return a trace.

Returns#

samples : list or tuple

a list of samples, a tuple of (samples, log-evidence), or a trace

Examples#

>>> import numpy as np
>>> np.random.seed(123)
>>> def log_prob(x):
...     return -0.5 * sum(xi**2 for xi in x)
>>> samples = tmcmc(log_prob, 10000, [-5, -5], [5, 5])
>>> len(samples)
10000
>>> np.abs(np.mean(samples, axis=0)) < 0.1
array([ True,  True])
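
With return_evidence=True, the sampler also returns the logarithm of the normalization constant; the unpacking below follows the documented return value:

>>> samples, log_ev = tmcmc(log_prob, 1000, [-5, -5], [5, 5], return_evidence=True)
>>> len(samples)
1000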