NPE_B

class NPE_B(prior=None, density_estimator='maf', device='cpu', logging_level='WARNING', summary_writer=None, tracker=None, show_progress_bars=True)

Bases: PosteriorEstimatorTrainer

Neural Posterior Estimation algorithm (NPE-B) as in Lueckmann et al. (2017) [1].

NPE-B (also known as SNPE-B) trains a neural network to directly approximate the posterior \(p(\theta|x)\) using an importance-weighted loss. In contrast to NPE-A, the importance weighting ensures convergence to the true posterior in multi-round inference, and NPE-B is not limited to Gaussian proposals. NPE-B can also use flexible density estimators such as normalizing flows.

For single-round inference, NPE-A, NPE-B, and NPE-C are equivalent and use plain NLL loss.

[1] *Flexible statistical inference for mechanistic models of neural dynamics*, Lueckmann, Gonçalves et al., NeurIPS 2017. https://arxiv.org/abs/1711.01861
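The importance weighting behind NPE-B can be illustrated with a minimal pure-Python sketch. The sketch makes several assumptions. A one-dimensional Gaussian \(q(\theta|x)\) stands in for the neural density estimator, and the prior and proposal are plain log-density functions. `gaussian_log_prob` and `npe_b_loss` are hypothetical helpers, not sbi API, and sbi itself operates on batched tensors. Each term of the negative log-likelihood is weighted by \(p(\theta)/\tilde{p}(\theta)\), where \(\tilde{p}\) is the proposal. It is optionally also weighted by a calibration kernel value \(K(x)\), giving \(\mathcal{L} = -\frac{1}{N}\sum_n K(x_n)\,\frac{p(\theta_n)}{\tilde{p}(\theta_n)}\log q(\theta_n|x_n)\).

```python
import math
from typing import Callable, Optional, Sequence


def gaussian_log_prob(value: float, mean: float, std: float) -> float:
    """Log-density of a univariate normal distribution."""
    z = (value - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def npe_b_loss(
    thetas: Sequence[float],
    xs: Sequence[float],
    log_q: Callable[[float, float], float],
    log_prior: Callable[[float], float],
    log_proposal: Callable[[float], float],
    kernel: Optional[Callable[[float], float]] = None,
) -> float:
    """Hypothetical sketch of an importance-weighted NLL in the spirit of NPE-B."""
    total = 0.0
    for theta, x in zip(thetas, xs):
        # Importance weight p(theta) / p_tilde(theta), computed in log space.
        weight = math.exp(log_prior(theta) - log_proposal(theta))
        if kernel is not None:
            weight *= kernel(x)  # optional calibration kernel K(x)
        total += -weight * log_q(theta, x)
    return total / len(thetas)


# Toy setup (assumed): q(theta | x) = N(x, 0.5), prior N(0, 1), proposal N(0.5, 0.7).
def log_q(theta: float, x: float) -> float:
    return gaussian_log_prob(theta, x, 0.5)


def log_prior(theta: float) -> float:
    return gaussian_log_prob(theta, 0.0, 1.0)


def log_proposal(theta: float) -> float:
    return gaussian_log_prob(theta, 0.5, 0.7)


thetas = [0.1, 0.4, 0.9]
xs = [0.2, 0.3, 1.0]

# Proposal == prior: all weights are 1, so this is the plain NLL.
plain_nll = npe_b_loss(thetas, xs, log_q, log_prior, log_prior)
weighted_loss = npe_b_loss(thetas, xs, log_q, log_prior, log_proposal)
```

When the proposal is the prior, every weight equals 1 and the loss reduces to the plain NLL used in single-round inference.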

Example:

import torch
from sbi.inference import NPE_B
from sbi.utils import BoxUniform

# 1. Setup simulator, prior, and observation
prior = BoxUniform(low=torch.zeros(3), high=torch.ones(3))
x_o = torch.randn(1, 3)  # Observed data

def simulator(theta):
    return theta + torch.randn_like(theta) * 0.1

# 2. Multi-round inference
inference = NPE_B(prior=prior)
proposal = prior

for round_idx in range(5):
    theta = proposal.sample((100,))
    x = simulator(theta)
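    # In round 0 the proposal is the prior (plain NLL); later rounds pass the
    # previous posterior so that NPE-B applies its importance-weighted loss.
    density_estimator = inference.append_simulations(theta, x, proposal=proposal).train()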
    posterior = inference.build_posterior(density_estimator)
    proposal = posterior.set_default_x(x_o)

# 3. Sample from final posterior
samples = posterior.sample((1000,), x=x_o)

append_simulations(theta, x, proposal=None, exclude_invalid_x=None, data_device=None)

Store parameters and simulation outputs to use them for later training.

Data are stored as list entries, with one list per variable type (parameters and data).

Stores \(\theta\), \(x\), prior_masks (indicating whether each simulation was drawn from the prior), and an index indicating the round each batch of simulations came from.
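The following is a minimal sketch of this per-round bookkeeping. It uses plain Python lists in place of tensors. `SimulationStore` is hypothetical and only mirrors the behavior described above: one list entry per call, a prior mask per simulation, a round index per batch, and a chainable return value. Assigning the round index as the number of earlier appends is a simplifying assumption, not sbi's exact rule.

```python
from typing import List, Optional


class SimulationStore:
    """Hypothetical per-round store mirroring the description of append_simulations."""

    def __init__(self) -> None:
        self.theta_roundwise: List[List[float]] = []
        self.x_roundwise: List[List[float]] = []
        self.prior_masks: List[List[bool]] = []
        self.round_indices: List[int] = []

    def append_simulations(
        self, theta: List[float], x: List[float], proposal: Optional[object] = None
    ) -> "SimulationStore":
        if len(theta) != len(x):
            raise ValueError("theta and x must have the same number of entries")
        # Simulations are marked as prior samples when no proposal is given.
        from_prior = proposal is None
        # Assumption: the round index is the number of earlier appends.
        round_idx = len(self.round_indices)
        self.theta_roundwise.append(list(theta))
        self.x_roundwise.append(list(x))
        self.prior_masks.append([from_prior] * len(theta))
        self.round_indices.append(round_idx)
        return self  # returned so that calls can be chained


store = SimulationStore()
store.append_simulations([0.1, 0.2], [1.0, 1.1]).append_simulations(
    [0.3], [1.3], proposal="posterior_from_round_0"  # placeholder proposal object
)
```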

Parameters:
  • theta (Tensor) – Parameter sets.

  • x (Tensor) – Simulation outputs.

  • proposal (DirectPosterior | None) – The distribution that the parameters \(\theta\) were sampled from. Pass None if the parameters were sampled from the prior. If not None, training uses a different loss function (for NPE-B, the importance-weighted loss).

  • exclude_invalid_x (bool | None) – Whether invalid simulations are discarded during training. For single-round SNPE, discarding invalid simulations is fine, but for multi-round SNPE (atomic), it gives systematically wrong results. If None, it is True in the first round and False in later rounds.

  • data_device (str | None) – Where to store the data. Defaults to the device used for training. When training on a large dataset with a GPU that has limited VRAM, set this to ‘cpu’ to keep the data in system memory instead.

Returns:

NeuralInference object (returned so that this function is chainable).

Return type:

Self
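The default behavior of exclude_invalid_x can be sketched in pure Python. This sketch assumes that a simulation is invalid when any output entry is NaN or infinite. `is_valid` and `split_valid` are hypothetical helpers, and the first round is taken to be round index 0. Invalid rows are dropped only when exclusion is enabled, and exclusion defaults to True only in the first round.

```python
import math
from typing import List, Optional, Sequence, Tuple


def is_valid(x_row: Sequence[float]) -> bool:
    """Assumed rule: a simulation output is valid if every entry is finite."""
    return all(math.isfinite(v) for v in x_row)


def split_valid(
    theta: List[List[float]],
    x: List[List[float]],
    exclude_invalid_x: Optional[bool],
    round_idx: int,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical helper: drop invalid simulations if exclusion is enabled."""
    # Documented default: exclude in the first round, keep in later rounds.
    if exclude_invalid_x is None:
        exclude_invalid_x = round_idx == 0
    if not exclude_invalid_x:
        return theta, x
    keep = [i for i, row in enumerate(x) if is_valid(row)]
    return [theta[i] for i in keep], [x[i] for i in keep]


theta = [[0.1], [0.2], [0.3]]
x = [[1.0], [float("nan")], [float("inf")]]
theta_r0, x_r0 = split_valid(theta, x, None, round_idx=0)  # invalid rows dropped
theta_r1, x_r1 = split_valid(theta, x, None, round_idx=1)  # all rows kept
```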

build_posterior(density_estimator=None, prior=None, sample_with='direct', mcmc_method='slice_np_vectorized', vi_method='rKL', direct_sampling_parameters=None, mcmc_parameters=None, vi_parameters=None, rejection_sampling_parameters=None, importance_sampling_parameters=None, posterior_parameters=None)

Build posterior from the neural density estimator.

For SNPE, the posterior returned here adds the following functionality on top of the raw neural density estimator:

  • correct the calculation of the log probability so that it compensates for the leakage (the probability mass the estimator places outside the prior support).

  • reject samples that lie outside of the prior bounds.

  • alternatively, if leakage is very high (which can happen for multi-round SNPE), sample from the posterior with MCMC.
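The leakage correction and prior-bounds rejection can be sketched in pure Python. The sketch makes three assumptions. A Gaussian \(\mathcal{N}(0.9, 0.3^2)\) stands in for the trained estimator at a fixed observation, the prior is Uniform(0, 1), and the acceptance rate is estimated by Monte Carlo. `sample_within_prior`, `estimate_acceptance`, and `corrected_log_prob` are hypothetical names, not the DirectPosterior API. The fraction of estimator mass inside the prior support is the acceptance rate, and subtracting its log renormalizes the truncated density.

```python
import math
import random
from typing import List

MEAN, STD = 0.9, 0.3  # toy Gaussian standing in for q(theta | x_o)
LOW, HIGH = 0.0, 1.0  # support of a Uniform(0, 1) prior


def in_support(theta: float) -> bool:
    return LOW <= theta <= HIGH


def q_log_prob(theta: float) -> float:
    z = (theta - MEAN) / STD
    return -0.5 * z * z - math.log(STD) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(v: float) -> float:
    return 0.5 * (1.0 + math.erf((v - MEAN) / (STD * math.sqrt(2.0))))


def sample_within_prior(n: int, rng: random.Random) -> List[float]:
    """Rejection step: draw from q and keep only samples inside the prior bounds."""
    samples: List[float] = []
    while len(samples) < n:
        theta = rng.gauss(MEAN, STD)
        if in_support(theta):
            samples.append(theta)
    return samples


def estimate_acceptance(n: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the mass of q inside the prior support."""
    hits = sum(in_support(rng.gauss(MEAN, STD)) for _ in range(n))
    return hits / n


def corrected_log_prob(theta: float, acceptance: float) -> float:
    """Leakage-corrected log-density: -inf outside the prior, renormalized inside."""
    if not in_support(theta):
        return -math.inf
    return q_log_prob(theta) - math.log(acceptance)


rng = random.Random(0)
acceptance = estimate_acceptance(100_000, rng)
samples = sample_within_prior(1_000, rng)
exact_acceptance = normal_cdf(HIGH) - normal_cdf(LOW)  # analytic value for comparison
```

In this toy setting roughly a third of the estimator's mass leaks outside the prior. When the acceptance rate becomes very small, rejection sampling gets slow, which is the situation in which the text above suggests MCMC instead.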

Parameters:
  • density_estimator (ConditionalDensityEstimator | None) – The density estimator that the posterior is based on. If None, use the latest neural density estimator that was trained.

  • prior (Distribution | None) – Prior distribution.

  • sample_with (Literal['mcmc', 'rejection', 'vi', 'importance', 'direct']) – Method to use for sampling from the posterior. Must be one of [direct | mcmc | rejection | vi | importance].

  • mcmc_method (Literal['slice_np', 'slice_np_vectorized', 'hmc_pyro', 'nuts_pyro', 'slice_pymc', 'hmc_pymc', 'nuts_pymc']) – Method used for MCMC sampling, one of slice_np, slice_np_vectorized, hmc_pyro, nuts_pyro, slice_pymc, hmc_pymc, nuts_pymc. slice_np is a custom NumPy implementation of slice sampling. slice_np_vectorized is identical to slice_np, except that when num_chains > 1 its chains are vectorized, whereas slice_np runs them sequentially. Samplers ending in _pyro use Pyro; likewise, samplers ending in _pymc use PyMC.

  • vi_method (Literal['rKL', 'fKL', 'IW', 'alpha']) – Method used for VI, one of [rKL, fKL, IW, alpha]. Note that some methods are mode-seeking (e.g., rKL), whereas others are mass-covering (e.g., fKL).

  • direct_sampling_parameters (Dict[str, Any] | None) – Additional kwargs passed to DirectPosterior.

  • mcmc_parameters (Dict[str, Any] | None) – Additional kwargs passed to MCMCPosterior.

  • vi_parameters (Dict[str, Any] | None) – Additional kwargs passed to VIPosterior.

  • rejection_sampling_parameters (Dict[str, Any] | None) – Additional kwargs passed to RejectionPosterior.

  • importance_sampling_parameters (Dict[str, Any] | None) – Additional kwargs passed to ImportanceSamplingPosterior.

  • posterior_parameters (DirectPosteriorParameters | MCMCPosteriorParameters | VIPosteriorParameters | RejectionPosteriorParameters | ImportanceSamplingPosteriorParameters | None) – Configuration passed to the init method for the posterior. Must be one of VIPosteriorParameters, ImportanceSamplingPosteriorParameters, MCMCPosteriorParameters, DirectPosteriorParameters, or RejectionPosteriorParameters.

Returns:

Posterior \(p(\theta|x)\) with .sample() and .log_prob() methods (the returned log-probability is unnormalized).

Return type:

NeuralPosterior
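The mcmc_method description names slice sampling as the technique behind slice_np. The core idea can be sketched in one dimension in pure Python using the stepping-out and shrinkage procedure. `slice_sample` and its `width` and `max_steps_out` parameters are hypothetical, and this is not sbi's implementation. The sampler only needs an unnormalized log-density, which is why MCMC can work with the unnormalized posterior log-probability.

```python
import math
import random
from typing import Callable, List


def slice_sample(
    log_density: Callable[[float], float],
    x0: float,
    num_samples: int,
    width: float = 1.0,
    max_steps_out: int = 50,
    seed: int = 0,
) -> List[float]:
    """Univariate slice sampler with stepping-out and shrinkage (sketch)."""
    rng = random.Random(seed)
    x = x0
    samples: List[float] = []
    for _ in range(num_samples):
        # Slice height in log space: log(u * f(x)) with u in (0, 1].
        log_y = log_density(x) + math.log(1.0 - rng.random())
        # Step out: randomly place an interval of size `width` around x, then
        # grow it, splitting the step budget randomly between the two sides.
        left = x - width * rng.random()
        right = left + width
        steps_left = int(max_steps_out * rng.random())
        steps_right = max_steps_out - 1 - steps_left
        while steps_left > 0 and log_y < log_density(left):
            left -= width
            steps_left -= 1
        while steps_right > 0 and log_y < log_density(right):
            right += width
            steps_right -= 1
        # Shrink: sample uniformly in [left, right], shrinking toward x on rejection.
        while True:
            candidate = rng.uniform(left, right)
            if log_density(candidate) >= log_y:
                x = candidate
                break
            if candidate < x:
                left = candidate
            else:
                right = candidate
        samples.append(x)
    return samples


def std_normal_log_density(v: float) -> float:
    # Unnormalized on purpose: the constant -0.5 * log(2 * pi) is omitted.
    return -0.5 * v * v


draws = slice_sample(std_normal_log_density, x0=0.0, num_samples=5_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```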

get_dataloaders(starting_round=0, training_batch_size=200, validation_fraction=0.1, resume_training=False, dataloader_kwargs=None)

Return dataloaders for training and validation.

Parameters:
  • dataset – holding all theta and x, optionally masks.

  • training_batch_size (int) – Training batch size.

  • resume_training (bool) – Whether the current call is resuming training so that no new training and validation indices into the dataset have to be created.

  • dataloader_kwargs (dict | None) – Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn).

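  • starting_round (int) – The earliest round from which to use simulations (counting from zero).

  • validation_fraction (float) – The fraction of data to use for validation.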

Returns:

Tuple of dataloaders for training and validation.

Return type:

Tuple[DataLoader, DataLoader]
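The split controlled by validation_fraction can be sketched in pure Python. `split_indices` and `batches` are hypothetical helpers, not sbi internals. The sketch shuffles the dataset indices once, carves off the validation fraction, and groups the training indices into batches of training_batch_size. Keeping the returned index lists around is the kind of state that resume_training avoids recomputing.

```python
import random
from typing import List, Tuple


def split_indices(
    num_examples: int, validation_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle indices once and split them into training and validation sets."""
    num_val = int(validation_fraction * num_examples)
    num_train = num_examples - num_val
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    return indices[:num_train], indices[num_train:]


def batches(indices: List[int], batch_size: int) -> List[List[int]]:
    """Group indices into consecutive batches; the last batch may be smaller."""
    return [indices[i : i + batch_size] for i in range(0, len(indices), batch_size)]


train_idx, val_idx = split_indices(1_000, validation_fraction=0.1)
train_batches = batches(train_idx, batch_size=200)
```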

get_simulations(starting_round=0)

Returns all \(\theta\), \(x\), and prior_masks from rounds >= starting_round.

If requested, do not return invalid data.

Parameters:
  • starting_round (int) – The earliest round to return samples from (we start counting from zero).

  • warn_on_invalid – Whether to emit a warning if invalid simulations were found.

Return type:

Tuple[Tensor, Tensor, Tensor]

Returns: Parameters, simulation outputs, prior masks.
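Selecting data by round amounts to concatenating the stored batches whose round index is at least starting_round. The following is a minimal sketch with plain lists. `get_simulations_sketch` and the per-round list layout are assumptions, not sbi internals.

```python
from typing import List, Tuple


def get_simulations_sketch(
    theta_roundwise: List[List[float]],
    x_roundwise: List[List[float]],
    prior_masks: List[List[bool]],
    round_indices: List[int],
    starting_round: int = 0,
) -> Tuple[List[float], List[float], List[bool]]:
    """Concatenate all stored batches whose round index is >= starting_round."""
    theta: List[float] = []
    x: List[float] = []
    masks: List[bool] = []
    for t, xs, m, r in zip(theta_roundwise, x_roundwise, prior_masks, round_indices):
        if r >= starting_round:
            theta.extend(t)
            x.extend(xs)
            masks.extend(m)
    return theta, x, masks


# Round 0 was simulated from the prior; rounds 1 and 2 from proposals.
theta_rw = [[0.1, 0.2], [0.3], [0.4, 0.5]]
x_rw = [[1.1, 1.2], [1.3], [1.4, 1.5]]
masks_rw = [[True, True], [False], [False, False]]
rounds = [0, 1, 2]

theta, x, masks = get_simulations_sketch(theta_rw, x_rw, masks_rw, rounds, starting_round=1)
```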

property summary

train(training_batch_size=200, learning_rate=0.0005, validation_fraction=0.1, stop_after_epochs=20, max_num_epochs=2147483647, clip_max_norm=5.0, calibration_kernel=None, resume_training=False, force_first_round_loss=False, discard_prior_samples=False, retrain_from_scratch=False, show_train_summary=False, dataloader_kwargs=None)

Return density estimator that approximates the distribution \(p(\theta|x)\).

Parameters:
  • training_batch_size (int) – Training batch size.

  • learning_rate (float) – Learning rate for Adam optimizer.

  • validation_fraction (float) – The fraction of data to use for validation.

  • stop_after_epochs (int) – The number of epochs to wait for improvement on the validation set before terminating training.

  • max_num_epochs (int) – Maximum number of epochs to run. If reached, training stops even if the validation loss is still decreasing. Otherwise, training continues until the validation loss stops improving (see stop_after_epochs).

  • clip_max_norm (float | None) – Value at which to clip the total gradient norm in order to prevent exploding gradients. Use None for no clipping.

  • calibration_kernel (Callable | None) – A function to calibrate the loss with respect to the simulations x (optional). See Lueckmann, Gonçalves et al., NeurIPS 2017. If None, no calibration is used.

  • resume_training (bool) – Can be used in case training time is limited, e.g. on a cluster. If True, the split between train and validation set, the optimizer, the number of epochs, and the best validation log-prob will be restored from the last time .train() was called.

  • force_first_round_loss (bool) – If True, train with maximum likelihood, i.e., potentially ignoring the correction for using a proposal distribution different from the prior.

  • discard_prior_samples (bool) – Whether to discard the samples simulated in the first round, i.e., from the prior. Ignoring these less targeted samples may speed up training.

  • retrain_from_scratch (bool) – Whether to retrain the conditional density estimator for the posterior from scratch each round.

  • show_train_summary (bool) – Whether to print the number of epochs and validation loss after the training.

  • dataloader_kwargs (dict | None) – Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn).

Returns:

Density estimator that approximates the distribution \(p(\theta|x)\).

Return type:

ConditionalDensityEstimator
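The interaction of stop_after_epochs, max_num_epochs, and clip_max_norm can be sketched without a neural network. The sketch assumes validation losses are supplied as a precomputed list (lower is better) and gradients are plain lists of floats. `epochs_until_stop` and `clip_total_norm` are hypothetical helpers. In PyTorch, rescaling by the total gradient norm is what `torch.nn.utils.clip_grad_norm_` performs.

```python
import math
from typing import List, Optional, Sequence


def epochs_until_stop(
    val_losses: Sequence[float], stop_after_epochs: int, max_num_epochs: int
) -> int:
    """Return how many epochs run before early stopping or the epoch cap."""
    best = math.inf
    epochs_since_improvement = 0
    epoch = 0
    for loss in val_losses:
        if epoch >= max_num_epochs:
            break
        epoch += 1
        if loss < best:
            best = loss
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
        if epochs_since_improvement >= stop_after_epochs:
            break
    return epoch


def clip_total_norm(grads: List[float], max_norm: Optional[float]) -> List[float]:
    """Rescale gradients so their total L2 norm is at most max_norm (None: no clipping)."""
    if max_norm is None:
        return list(grads)
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]


losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.69]
```

With stop_after_epochs=3, the sketch stops after the sixth epoch, because epochs four to six do not improve on 0.7. The later improvement to 0.69 is therefore never reached.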

Parameters:
  • prior (Distribution | None)

  • density_estimator (Literal['nsf', 'maf', 'mdn', 'made'] | ConditionalEstimatorBuilder[ConditionalDensityEstimator])

  • device (str)

  • logging_level (int | str)

  • summary_writer (SummaryWriter | None)

  • tracker (Tracker | None)

  • show_progress_bars (bool)