NPE_B
- class NPE_B(prior=None, density_estimator='maf', device='cpu', logging_level='WARNING', summary_writer=None, tracker=None, show_progress_bars=True)
Bases: PosteriorEstimatorTrainer
Neural Posterior Estimation algorithm (NPE-B) as in Lueckmann et al. (2017) [1].
NPE-B (also known as SNPE-B) trains a neural network to directly approximate the posterior \(p(\theta|x)\) using an importance-weighted loss. Unlike NPE-A, this importance weighting ensures convergence to the true posterior in multi-round inference, and it is not limited to Gaussian proposals. NPE-B can use flexible density estimators like normalizing flows.
For single-round inference, NPE-A, NPE-B, and NPE-C are equivalent and use plain NLL loss.
- [1] *Flexible statistical inference for mechanistic models of neural dynamics*, Lueckmann, Gonçalves et al., NeurIPS 2017. https://arxiv.org/abs/1711.01861
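The importance-weighted loss described above can be illustrated with a small, library-free sketch. It assumes a one-dimensional uniform prior on [0, 1] and a Gaussian proposal, and it uses precomputed `log_q_values` as a hypothetical stand-in for the density estimator's log-probabilities. None of these helper names come from sbi, and the library's actual implementation may differ. Each sample drawn from the proposal is weighted by the ratio of prior density to proposal density.

```python
import math
import random


def uniform_pdf(theta, low=0.0, high=1.0):
    """Density of the (assumed) uniform prior on [low, high]."""
    return 1.0 / (high - low) if low <= theta <= high else 0.0


def gaussian_pdf(theta, mean, std):
    """Density of the (assumed) Gaussian proposal."""
    z = (theta - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weighted_nll(thetas, log_q_values, proposal_mean, proposal_std):
    """Mean of -w_i * log q(theta_i | x_i), with w_i = prior / proposal."""
    total = 0.0
    for theta, log_q in zip(thetas, log_q_values):
        weight = uniform_pdf(theta) / gaussian_pdf(theta, proposal_mean, proposal_std)
        total += -weight * log_q
    return total / len(thetas)


rng = random.Random(0)
proposal_mean, proposal_std = 0.5, 0.1

# Parameters drawn from the proposal rather than from the prior.
thetas = [rng.gauss(proposal_mean, proposal_std) for _ in range(1000)]

# Hypothetical stand-in for the network output: a fixed Gaussian log-density.
log_q_values = [math.log(gaussian_pdf(t, 0.5, 0.2)) for t in thetas]

loss = importance_weighted_nll(thetas, log_q_values, proposal_mean, proposal_std)
```

Samples that fall outside the prior support receive zero weight, and samples in regions the proposal over-represents are down-weighted accordingly.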
Example:
import torch
from sbi.inference import NPE_B
from sbi.utils import BoxUniform

# 1. Setup simulator, prior, and observation
prior = BoxUniform(low=torch.zeros(3), high=torch.ones(3))
x_o = torch.randn(1, 3)  # Observed data

def simulator(theta):
    return theta + torch.randn_like(theta) * 0.1

# 2. Multi-round inference
inference = NPE_B(prior=prior)
proposal = prior

for round_idx in range(5):
    theta = proposal.sample((100,))
    x = simulator(theta)
    # Pass the proposal so that rounds after the first use the
    # importance-weighted loss (see the `proposal` argument below).
    density_estimator = inference.append_simulations(
        theta, x, proposal=proposal
    ).train()
    posterior = inference.build_posterior(density_estimator)
    proposal = posterior.set_default_x(x_o)

# 3. Sample from final posterior
samples = posterior.sample((1000,), x=x_o)
- append_simulations(theta, x, proposal=None, exclude_invalid_x=None, data_device=None)
Store parameters and simulation outputs to use them for later training.
Data are stored as entries in lists for each type of variable (parameter/data).
Stores \(\theta\), \(x\), prior_masks (indicating if simulations are coming from the prior or not) and an index indicating which round the batch of simulations came from.
- Parameters:
theta (Tensor) – Parameter sets.
x (Tensor) – Simulation outputs.
proposal (DirectPosterior | None) – The distribution that the parameters \(\theta\) were sampled from. Pass None if the parameters were sampled from the prior. If not None, it will trigger a different loss function.
exclude_invalid_x (bool | None) – Whether invalid simulations are discarded during training. For single-round SNPE, it is fine to discard invalid simulations, but for multi-round SNPE (atomic), discarding invalid simulations gives systematically wrong results. If None, it will be True in the first round and False in later rounds.
data_device (str | None) – Where to store the data. Defaults to the device on which training happens. When training on a large dataset on a GPU with limited VRAM, set this to ‘cpu’ to store the data in system memory instead.
- Returns:
NeuralInference object (returned so that this function is chainable).
- Return type:
Self
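What discarding invalid simulations amounts to can be sketched without any library. The sketch assumes that "invalid" means a simulation output containing NaN or infinite entries; `split_valid` is a hypothetical helper, not part of sbi, and the library's own handling may differ.

```python
import math


def split_valid(theta, x):
    """Keep only (theta, x) pairs whose simulation output is fully finite."""
    kept_theta, kept_x, num_invalid = [], [], 0
    for theta_i, x_i in zip(theta, x):
        if all(math.isfinite(value) for value in x_i):
            kept_theta.append(theta_i)
            kept_x.append(x_i)
        else:
            num_invalid += 1
    return kept_theta, kept_x, num_invalid


theta = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
x = [[1.0, 2.0], [float("nan"), 0.0], [float("inf"), 1.0]]

valid_theta, valid_x, num_invalid = split_valid(theta, x)
```

Dropping pairs like this changes which parameters remain in the training set, which is why the setting matters differently in single-round and multi-round inference.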
- build_posterior(density_estimator=None, prior=None, sample_with='direct', mcmc_method='slice_np_vectorized', vi_method='rKL', direct_sampling_parameters=None, mcmc_parameters=None, vi_parameters=None, rejection_sampling_parameters=None, importance_sampling_parameters=None, posterior_parameters=None)
Build posterior from the neural density estimator.
For SNPE, the posterior distribution that is returned here implements the following functionality over the raw neural density estimator:
- correct the calculation of the log probability such that it compensates for the leakage.
- reject samples that lie outside of the prior bounds.
- alternatively, if leakage is very high (which can happen for multi-round SNPE), sample from the posterior with MCMC.
- Parameters:
density_estimator (ConditionalDensityEstimator | None) – The density estimator that the posterior is based on. If None, use the latest neural density estimator that was trained.
prior (Distribution | None) – Prior distribution.
sample_with (Literal['mcmc', 'rejection', 'vi', 'importance', 'direct']) – Method to use for sampling from the posterior. Must be one of [direct | mcmc | rejection | vi | importance].
mcmc_method (Literal['slice_np', 'slice_np_vectorized', 'hmc_pyro', 'nuts_pyro', 'slice_pymc', 'hmc_pymc', 'nuts_pymc']) – Method used for MCMC sampling, one of slice_np, slice_np_vectorized, hmc_pyro, nuts_pyro, slice_pymc, hmc_pymc, nuts_pymc. slice_np is a custom numpy implementation of slice sampling. slice_np_vectorized is identical to slice_np, except that when num_chains>1 it runs the chains vectorized, whereas slice_np runs them sequentially. The samplers ending in _pyro use Pyro, and the samplers ending in _pymc use PyMC.
vi_method (Literal['rKL', 'fKL', 'IW', 'alpha']) – Method used for VI, one of [rKL, fKL, IW, alpha]. Note that some of the methods are mode-seeking (e.g. rKL), whereas others are mass-covering (e.g. fKL).
direct_sampling_parameters (Dict[str, Any] | None) – Additional kwargs passed to DirectPosterior.
mcmc_parameters (Dict[str, Any] | None) – Additional kwargs passed to MCMCPosterior.
vi_parameters (Dict[str, Any] | None) – Additional kwargs passed to VIPosterior.
rejection_sampling_parameters (Dict[str, Any] | None) – Additional kwargs passed to RejectionPosterior.
importance_sampling_parameters (Dict[str, Any] | None) – Additional kwargs passed to ImportanceSamplingPosterior.
posterior_parameters (DirectPosteriorParameters | MCMCPosteriorParameters | VIPosteriorParameters | RejectionPosteriorParameters | ImportanceSamplingPosteriorParameters | None) – Configuration passed to the init method for the posterior. Must be one of the following: VIPosteriorParameters, ImportanceSamplingPosteriorParameters, MCMCPosteriorParameters, DirectPosteriorParameters, RejectionPosteriorParameters.
- Returns:
Posterior \(p(\theta|x)\) with .sample() and .log_prob() methods (the returned log-probability is unnormalized).
- Return type:
NeuralPosterior
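The leakage correction and the rejection of out-of-bounds samples described for build_posterior can be sketched in one dimension. The sketch assumes a box prior on [0, 1] and a hypothetical raw estimator, a Gaussian centred on the upper bound, so that roughly half of its mass leaks outside the prior support. The function names are illustrative and not part of sbi.

```python
import math
import random


def sample_within_prior(sample_fn, low, high, num_samples, rng):
    """Rejection-sample from `sample_fn`, keeping only draws inside [low, high]."""
    accepted, num_drawn = [], 0
    while len(accepted) < num_samples:
        theta = sample_fn(rng)
        num_drawn += 1
        if low <= theta <= high:
            accepted.append(theta)
    # Fraction of the raw estimator's mass that lies inside the prior support.
    acceptance_rate = num_samples / num_drawn
    return accepted, acceptance_rate


def corrected_log_prob(raw_log_prob, acceptance_rate):
    """Renormalize the raw log-density to the prior support."""
    return raw_log_prob - math.log(acceptance_rate)


def raw_sampler(rng):
    """Hypothetical raw density estimator: Gaussian centred on the upper bound."""
    return rng.gauss(1.0, 0.2)


rng = random.Random(0)
samples, acceptance_rate = sample_within_prior(raw_sampler, 0.0, 1.0, 2000, rng)
log_prob_inside = corrected_log_prob(0.0, acceptance_rate)
```

When the acceptance rate becomes very small, rejection sampling needs many raw draws per accepted sample, which is the situation in which switching to MCMC becomes attractive.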
- get_dataloaders(starting_round=0, training_batch_size=200, validation_fraction=0.1, resume_training=False, dataloader_kwargs=None)
Return dataloaders for training and validation.
- Parameters:
dataset – holding all theta and x, optionally masks.
training_batch_size (int) – training arg of inference methods.
resume_training (bool) – Whether the current call is resuming training so that no new training and validation indices into the dataset have to be created.
dataloader_kwargs (dict | None) – Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn).
starting_round (int)
validation_fraction (float)
- Returns:
Tuple of dataloaders for training and validation.
- Return type:
- get_simulations(starting_round=0)
Returns all \(\theta\), \(x\), and prior_masks from rounds >= starting_round.
If requested, do not return invalid data.
- Parameters:
starting_round (int) – The earliest round to return samples from (we start counting from zero).
warn_on_invalid – Whether to give out a warning if invalid simulations were found.
- Returns:
Parameters, simulation outputs, prior masks.
- property summary
- train(training_batch_size=200, learning_rate=0.0005, validation_fraction=0.1, stop_after_epochs=20, max_num_epochs=2147483647, clip_max_norm=5.0, calibration_kernel=None, resume_training=False, force_first_round_loss=False, discard_prior_samples=False, retrain_from_scratch=False, show_train_summary=False, dataloader_kwargs=None)
Return density estimator that approximates the distribution \(p(\theta|x)\).
- Parameters:
training_batch_size (int) – Training batch size.
learning_rate (float) – Learning rate for Adam optimizer.
validation_fraction (float) – The fraction of data to use for validation.
stop_after_epochs (int) – The number of epochs to wait for improvement on the validation set before terminating training.
max_num_epochs (int) – Maximum number of epochs to run. If reached, we stop training even when the validation loss is still decreasing. Otherwise, we train until validation loss increases (see also stop_after_epochs).
clip_max_norm (float | None) – Value at which to clip the total gradient norm in order to prevent exploding gradients. Use None for no clipping.
calibration_kernel (Callable | None) – A function to calibrate the loss with respect to the simulations x (optional). See Lueckmann, Gonçalves et al., NeurIPS 2017. If None, no calibration is used.
resume_training (bool) – Can be used in case training time is limited, e.g. on a cluster. If True, the split between train and validation set, the optimizer, the number of epochs, and the best validation log-prob will be restored from the last time .train() was called.
force_first_round_loss (bool) – If True, train with maximum likelihood, i.e., potentially ignoring the correction for using a proposal distribution different from the prior.
discard_prior_samples (bool) – Whether to discard samples simulated in round 1, i.e. from the prior. Training may be sped up by ignoring such less targeted samples.
retrain_from_scratch (bool) – Whether to retrain the conditional density estimator for the posterior from scratch each round.
show_train_summary (bool) – Whether to print the number of epochs and validation loss after the training.
dataloader_kwargs (dict | None) – Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn)
- Returns:
Density estimator that approximates the distribution \(p(\theta|x)\).
- Return type:
ConditionalDensityEstimator
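How stop_after_epochs and max_num_epochs interact can be sketched as a plain early-stopping loop over per-epoch validation losses. This is a minimal illustration under the assumption that "improvement" means a strictly lower validation loss; `train_with_early_stopping` is a hypothetical helper, not the library's training loop.

```python
def train_with_early_stopping(
    validation_losses, stop_after_epochs=20, max_num_epochs=2**31 - 1
):
    """Return (epochs_run, best_loss) for a sequence of validation losses."""
    best_loss = float("inf")
    epochs_since_improvement = 0
    epochs_run = 0
    for loss in validation_losses:
        # Hard cap: stop even if the validation loss is still decreasing.
        if epochs_run >= max_num_epochs:
            break
        epochs_run += 1
        if loss < best_loss:
            best_loss = loss
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
        # Patience exhausted: no improvement for `stop_after_epochs` epochs.
        if epochs_since_improvement >= stop_after_epochs:
            break
    return epochs_run, best_loss


losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
epochs_run, best_loss = train_with_early_stopping(losses, stop_after_epochs=3)
capped_epochs, capped_best = train_with_early_stopping(losses, max_num_epochs=2)
```

With a patience of 3, the loop stops after the sixth epoch because the best loss (0.7, reached in epoch 3) was not beaten in the following three epochs.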
- Parameters:
prior (Distribution | None)
density_estimator (Literal['nsf', 'maf', 'mdn', 'made'] | ConditionalEstimatorBuilder[ConditionalDensityEstimator])
device (str)
summary_writer (SummaryWriter | None)
tracker (Tracker | None)
show_progress_bars (bool)