Graphs#

This package provides classes and functions for estimating integrals using various sampling algorithms. The Integral class caches samples and evaluates the integral for a given hyperparameter psi. It takes two callable arguments: data_given_theta, the joint probability of the observed data viewed as a function of the parameters theta, and theta_given_psi, the conditional probability of the parameters theta given the hyperparameter psi. The method parameter selects the sampling algorithm, and the options parameter is a dictionary of options passed to it.

The package also implements several sampling algorithms as functions, including metropolis and langevin. metropolis is a Metropolis sampler: it takes a function returning the unnormalized density (or the log unnormalized probability), the number of samples to draw, the initial point, the scale of the proposal distribution, and a boolean indicating whether the function returns a log-probability. langevin is a Metropolis-adjusted Langevin (MALA) sampler: it takes the same density function, the number of samples, and the initial point, plus the gradient of the log unnormalized probability, the standard deviation of the proposal distribution, and the same log flag.
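
Below is a quick sketch of how the two samplers might be called on a standard normal target. The call to metropolis matches the documented signature; the positional order for langevin follows the description above, so its exact signature is an assumption:

>>> import random
>>> random.seed(12345)
>>> log_p = lambda x: -0.5 * x[0] ** 2          # log unnormalized density
>>> grad_log_p = lambda x: [-x[0]]              # gradient of the log density
>>> m = list(metropolis(log_p, 1000, [0], [0.5], log=True))
>>> # argument order assumed from the text: fun, draws, init, dfun, sigma, log
>>> l = list(langevin(log_p, 1000, [0], grad_log_p, 0.5, log=True))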

Summary#

graph.cmaes(fun, x0, sigma, g_max[, trace])

CMA-ES optimization

graph.Integral(data_given_theta, theta_given_psi)

Caches the samples to evaluate the integral several times.

graph.korali(fun, draws, lo, hi[, beta, ...])

Korali TMCMC sampler

graph.metropolis(fun, draws, init, scale[, log])

Metropolis sampler

graph.tmcmc(fun, draws, lo, hi[, beta, ...])

Generates samples from the target distribution using a transitional Markov chain Monte Carlo (TMCMC) algorithm.

Functions#

class graph.Integral(data_given_theta, theta_given_psi, method='metropolis', **options)#

Caches the samples to evaluate the integral several times.

We follow the methodology developed in [1], also described in S1.4 of [2]. The advantage of this approach is that the posterior of data_given_theta is sampled only once; evaluating the integral for a new value of psi only requires re-evaluating theta_given_psi on the cached samples.
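
A minimal sketch of the resulting estimator (an illustration of the idea, not the package's actual implementation): draw samples of theta from data_given_theta once, cache them, and estimate the integral for each psi as a sample mean of theta_given_psi over the cached samples.

>>> thetas = list(metropolis(data_given_theta, 1000, [0], [0.1]))  # sampled once, then cached
>>> def estimate(psi):  # cheap to evaluate for many values of psi
...     return sum(theta_given_psi(t, psi) for t in thetas) / len(thetas)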

Parameters#

data_given_theta : callable

The joint probability of the observed data viewed as a function of the parameters theta.

theta_given_psi : callable

The conditional probability of the parameters theta given the hyperparameter psi.

method : str or callable, optional

The sampling algorithm used to draw from data_given_theta. Can be one of:

  • ‘metropolis’ (default)

  • ‘langevin’

  • ‘tmcmc’

  • ‘korali’

  • ‘hamiltonian’ (WIP)

options : dict, optional

A dictionary of options for the sampling algorithm.

Attributes#

samples : list

A list of samples obtained from the sampling algorithm.

Raises#

ValueError

If the provided sampling method is unknown.

References#

1. Wu, S., Angelikopoulos, P., Tauriello, G., Papadimitriou, C., & Koumoutsakos, P. (2016). Fusing heterogeneous data for the calibration of molecular dynamics force fields using hierarchical Bayesian models. The Journal of Chemical Physics, 145(24), 244112.

2. Kulakova, L., Arampatzis, G., Angelikopoulos, P., Hadjidoukas, P., Papadimitriou, C., & Koumoutsakos, P. (2017). Data driven inference for the repulsive exponent of the Lennard-Jones potential in molecular dynamics simulations. Scientific reports, 7(1), 16576. (Appendix, S1.4 Hierarchical Bayesian models)

Examples#

>>> import random
>>> import numpy as np
>>> random.seed(123456)
>>> np.random.seed(123456)
>>> data = np.random.normal(0, 1, size=1000)
>>> data_given_theta = lambda theta: np.prod(np.exp(-0.5 * (data - theta[0]) ** 2))
>>> theta_given_psi = lambda theta, psi: np.exp(-0.5 * (theta[0] - psi[0]) ** 2)
>>> integral = Integral(data_given_theta, theta_given_psi, method='metropolis',
... init=[0], scale=[0.1], draws=1000)
>>> integral([0])  # Evaluate the integral for psi=0
0.9984153240011582
__call__(psi)#

Compute the integral estimate for a given hyperparameter.

Parameters#

psi : array_like

The hyperparameter values at which to evaluate the integral estimate.

Returns#

float

The estimated value of the integral.

Examples#

>>> import random
>>> import numpy as np
>>> from scipy.stats import norm
>>> random.seed(123456)
>>> np.random.seed(123456)
>>> def data_given_theta(theta):
...     return norm.pdf(theta[0], loc=1, scale=2) * norm.pdf(theta[0], loc=3, scale=2)
>>> def theta_given_psi(theta, psi):
...     return norm.pdf(theta[0], loc=psi[0], scale=[1])
>>> integral = Integral(data_given_theta, theta_given_psi, method="metropolis",
...    draws=1000, scale=[1.0], init=[0])
>>> integral([2])
0.2388726795076229
graph.cmaes(fun, x0, sigma, g_max, trace=False)#

CMA-ES optimization

Parameters#

fun : callable

the objective function to minimize

x0 : tuple

the initial point

sigma : float

the initial step size (standard deviation) of the search distribution

g_max : int

the maximum number of generations

trace : bool

return a trace of the algorithm (default: False)

Returns#

xmin : tuple

the best point found
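
Examples#

A small usage sketch on a quadratic objective; the values are illustrative, and xmin is assumed to have the same dimension as x0:

>>> xmin = cmaes(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, (0, 0), 0.5, 100)
>>> len(xmin)
2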

graph.korali(fun, draws, lo, hi, beta=1, return_evidence=False, num_cores=None, comm=None)#

Korali TMCMC sampler

Parameters#

fun : callable

log-probability

draws : int

the number of samples to draw

lo, hi : tuples

the bounds of the initial distribution

beta : float

The coefficient to scale the proposal distribution. Larger values of beta lead to larger proposal steps and potentially faster convergence, but may also increase the likelihood of rejecting proposals (default is 1)

return_evidence : bool

If True, return a tuple containing the samples and the evidence (the logarithm of the normalization constant). If False (the default), return only the samples

num_cores : int

The number of CPU cores to use for processing. If None (default), the code will not run concurrently

comm : mpi4py.MPI.Intracomm

MPI communicator for distributed runs. By default, the function runs in serial mode (i.e., comm=None)

Returns#

samples : list or tuple

a list of samples, or a tuple of (samples, log-evidence)
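
Examples#

A serial usage sketch, assuming the Korali library is installed; the call mirrors tmcmc:

>>> def log_prob(x):
...     return -0.5 * (x[0] ** 2 + x[1] ** 2)
>>> samples = korali(log_prob, 1000, [-5, -5], [5, 5])
>>> len(samples)
1000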

graph.metropolis(fun, draws, init, scale, log=False)#

Metropolis sampler

Parameters#

fun : callable

The unnormalized density or the log unnormalized probability. If log is True, fun should return the log unnormalized probability. Otherwise, it should return the unnormalized density.

draws : int

The number of samples to draw.

init : tuple

The initial point.

scale : tuple

The scale of the proposal distribution. Should be the same size as init.

log : bool, optional

If True, assume that fun returns the log unnormalized probability. Default is False.

Returns#

samples : list

A list of draws samples.

Examples#

Define a log unnormalized probability function for a standard normal distribution:

>>> import math
>>> import random
>>> import statistics
>>> random.seed(12345)
>>> def log_normal(x):
...     return -0.5 * x[0]**2

Draw 10000 samples from the distribution, starting at 0, with proposal standard deviation 1:

>>> samples = list(metropolis(log_normal, 10000, [0], [1], log=True))

Check that the mean and standard deviation of the samples are close to 0 and 1, respectively:

>>> math.isclose(statistics.fmean(s[0] for s in samples), 0, abs_tol=0.05)
True
>>> math.isclose(statistics.fmean(s[0]**2 for s in samples), 1, abs_tol=0.05)
True
graph.tmcmc(fun, draws, lo, hi, beta=1, return_evidence=False, trace=False)#

Generates samples from the target distribution using a transitional Markov chain Monte Carlo (TMCMC) algorithm.

Parameters#

fun : callable

log-probability

draws : int

the number of samples to draw

lo, hi : tuples

the bounds of the initial distribution

beta : float

The coefficient to scale the proposal distribution. Larger values of beta lead to larger proposal steps and potentially faster convergence, but may also increase the likelihood of rejecting proposals (default is 1)

return_evidence : bool

If True, return a tuple containing the samples and the evidence (the logarithm of the normalization constant). If False (the default), return only the samples

trace : bool

If True, return a trace of the algorithm, which is a list of tuples containing the current set of samples and the number of accepted proposals at each iteration. If False (the default), do not return a trace.

Returns#

samples : list or tuple

a list of samples, a tuple of (samples, log-evidence), or a trace

Examples#

>>> import numpy as np
>>> np.random.seed(123)
>>> def log_prob(x):
...     return -0.5 * sum(xi**2 for xi in x)
>>> samples = tmcmc(log_prob, 10000, [-5, -5], [5, 5])
>>> len(samples)
10000
>>> np.abs(np.mean(samples, axis=0)) < 0.1
array([ True,  True])
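
With return_evidence=True, the sampler also returns the logarithm of the normalization constant; the unpacking below follows the documented return value:

>>> samples, log_ev = tmcmc(log_prob, 1000, [-5, -5], [5, 5], return_evidence=True)
>>> len(samples)
1000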