Graphs#
The code is a Python package containing classes and functions for estimating integrals using various sampling algorithms. The Integral class in the package caches samples and evaluates the integral given a hyperparameter psi. It takes two callable arguments: data_given_theta, which is the joint probability of the observed data viewed as a function of parameter, and theta_given_psi, which is the conditional probability of parameters theta given hyperparameter psi. The method parameter specifies the type of sampling algorithm to use, and the options parameter is a dictionary of options for the sampling algorithm.
The package contains several sampling algorithms implemented as functions, including metropolis and langevin. metropolis is a Metropolis sampler, which takes as arguments a function that calculates the unnormalized density or log unnormalized probability, the number of samples to draw, the initial point, the scale of the proposal distribution, and a boolean indicating whether to assume log-probability. langevin is a Metropolis-adjusted Langevin (MALA) sampler, which takes as arguments a function that calculates the unnormalized density or log unnormalized probability, the number of samples to draw, the initial point, the gradient of the log unnormalized probability, the standard deviation of the proposal distribution, and a boolean indicating whether to assume log-probability.
Summary#
|
CMA-ES optimization |
|
Caches the samples to evaluate the integral several times. |
|
Korali TMCMC sampler |
|
Metropolis sampler |
|
Generates samples from the target distribution using a transitional Markov chain Monte Carlo(TMCMC) algorithm. |
Functions#
- class graph.Integral(data_given_theta, theta_given_psi, method='metropolis', **options)#
Caches the samples to evaluate the integral several times.
We follow the methodology developed in [1], also described in S1.4 [2]), The advantage of this approach is that the likelihoods (theta_given_psi’) not re-evaluated for each value of `psi.
Parameters#
- data_given_thetacallable
The joint probability of the observed data viewed as a function of parameter.
- theta_given_psicallable
The conditional probability of parameters theta given hyper-parameter psi.
- methodstr or callable, optional
The type of the sampling algorithm to sample from data_given_theta. Can be one of:
‘metropolis’ (default)
‘langevin’
‘tmcmc’
‘korali’
‘hamiltonian’ (WIP)
- optionsdict, optional
A dictionary of options for the sampling algorithm.
Attributes#
- samplesList
A list of samples obtained from the sampling algorithm.
Raises#
- ValueError
If the provided sampling method is unknown.
References#
1. Wu, S., Angelikopoulos, P., Tauriello, G., Papadimitriou, C., & Koumoutsakos, P. (2016). Fusing heterogeneous data for the calibration of molecular dynamics force fields using hierarchical Bayesian models. The Journal of Chemical Physics, 145(24), 244112.
2. Kulakova, L., Arampatzis, G., Angelikopoulos, P., Hadjidoukas, P., Papadimitriou, C., & Koumoutsakos, P. (2017). Data driven inference for the repulsive exponent of the Lennard-Jones potential in molecular dynamics simulations. Scientific reports, 7(1), 16576. (Appendix, S1.4 Hierarchical Bayesian models)
Examples#
>>> import random >>> import numpy as np >>> random.seed(123456) >>> np.random.seed(123456) >>> data = np.random.normal(0, 1, size=1000) >>> data_given_theta = lambda theta: np.prod(np.exp(-0.5 * (data - theta[0]) ** 2)) >>> theta_given_psi = lambda theta, psi: np.exp(-0.5 * (theta[0] - psi[0]) ** 2) >>> integral = Integral(data_given_theta, theta_given_psi, method='metropolis', ... init=[0], scale=[0.1], draws=1000) >>> integral([0]) # Evaluate the integral for psi=0 0.9984153240011582
- __call__(psi)#
Compute the integral estimate for a given hyperparameter.
Parameters#
- psiarray_like
The hyperparameter values at which to evaluate the integral estimate.
Returns#
- float
The estimated value of the integral.
Examples#
>>> import random >>> from scipy.stats import norm >>> random.seed(123456) >>> np.random.seed(123456) >>> def data_given_theta(theta): ... return norm.pdf(theta[0], loc=1, scale=2) * norm.pdf(theta[0], loc=3, scale=2) >>> def theta_given_psi(theta, psi): ... return norm.pdf(theta[0], loc=psi[0], scale=[1]) >>> integral = Integral(data_given_theta, theta_given_psi, method="metropolis", ... draws=1000, scale=[1.0], init=[0]) >>> integral([2]) 0.2388726795076229
- graph.cmaes(fun, x0, sigma, g_max, trace=False)#
CMA-ES optimization
Parameters#
- funcallable
a target function
- x0tuple
the initial point
- sigmadouble
initial variance
- g_maxint
maximum generation
- tracebool
return a trace of the algorithm (default: False)
Return#
xmin : tuple
- graph.korali(fun, draws, lo, hi, beta=1, return_evidence=False, num_cores=None, comm=None)#
Korali TMCMC sampler
Parameters#
- funcallable
log-probability
- drawsint
the number of samples to draw
- lo, hituples
the bounds of the initial distribution
- betafloat
The coefficient to scale the proposal distribution. Larger values of beta lead to larger proposal steps and potentially faster convergence, but may also increase the likelihood of rejecting proposals (default is 1)
- return_evidencebool
If True, return a tuple containing the samples and the evidence (the logarithm of the normalization constant). If False (the default), return only the samples
- tracebool
If True, return a trace of the algorithm, which is a list of tuples containing the current set of samples and the number of accepted proposals at each iteration. If False (the default), do not return a trace.
- num_cores: int
The number of CPU cores to use for processing. If None (default), the code will not run concurrently
- comm: mpi4py.MPI.Intracomm
MPI communicator for distributed runs. By default, the function runs in serial mode (i.e., comm=None)
Return#
- sampleslist or tuple
a list of samples, a tuple of (samples, log-evidence), or a trace
- graph.metropolis(fun, draws, init, scale, log=False)#
Metropolis sampler
Parameters#
- funcallable
The unnormalized density or the log unnormalized probability. If log is True, fun should return the log unnormalized probability. Otherwise, it should return the unnormalized density.
- drawsint
The number of samples to draw.
- inittuple
The initial point.
- scaletuple
The scale of the proposal distribution. Should be the same size as init.
- logbool, optional
If True, assume that fun returns the log unnormalized probability. Default is False.
Returns#
- sampleslist
A list of draws samples.
Examples#
Define a log unnormalized probability function for a standard normal distribution:
>>> import math >>> import random >>> import statistics >>> random.seed(12345) >>> def log_normal(x): ... return -0.5 * x[0]**2
Draw 1000 samples from the distribution starting at 0, with proposal standard deviation 1:
>>> samples = list(metropolis(log_normal, 10000, [0], [1], log=True))
Check that the mean and standard deviation of the samples are close to 0 and 1, respectively:
>>> math.isclose(statistics.fmean(s[0] for s in samples), 0, abs_tol=0.05) True >>> math.isclose(statistics.fmean(s[0]**2 for s in samples), 1, abs_tol=0.05) True
- graph.tmcmc(fun, draws, lo, hi, beta=1, return_evidence=False, trace=False)#
Generates samples from the target distribution using a transitional Markov chain Monte Carlo(TMCMC) algorithm.
Parameters#
- funcallable
log-probability
- drawsint
the number of samples to draw
- lo, hituples
the bounds of the initial distribution
- betafloat
The coefficient to scale the proposal distribution. Larger values of beta lead to larger proposal steps and potentially faster convergence, but may also increase the likelihood of rejecting proposals (default is 1)
- return_evidencebool
If True, return a tuple containing the samples and the evidence (the logarithm of the normalization constant). If False (the default), return only the samples
- tracebool
If True, return a trace of the algorithm, which is a list of tuples containing the current set of samples and the number of accepted proposals at each iteration. If False (the default), do not return a trace.
Return#
- sampleslist or tuple
a list of samples, a tuple of (samples, log-evidence), or a trace
Examples#
>>> import numpy as np >>> np.random.seed(123) >>> def log_prob(x): ... return -0.5 * sum(x**2 for x in x) >>> samples = tmcmc(log_prob, 10000, [-5, -5], [5, 5]) >>> len(samples) 10000 >>> np.abs(np.mean(samples, axis=0)) < 0.1 array([ True, True])