NPSE
- class NPSE(prior=None, vf_estimator='mlp', score_estimator=None, density_estimator=None, sde_type='ve', device='cpu', logging_level='WARNING', summary_writer=None, tracker=None, show_progress_bars=True)
Bases: VectorFieldTrainer
Neural Posterior Score Estimation (NPSE) [1, 2].
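NPSE trains a neural network to estimate the score function, i.e. the gradient of the log posterior \(\nabla_\theta \log p(\theta|x)\), using denoising score matching. More precisely, NPSE learns the time-dependent score of a diffusion process that gradually transforms the posterior into a simple Gaussian reference distribution; reversing this process turns Gaussian noise into posterior samples. The neural network can be any expressive architecture. Sampling is performed by numerically solving the reverse-time SDE (e.g., with Langevin dynamics) or the corresponding probability flow ODE. This is typically slower than sampling from flow-based NPE, but the estimator can be more expressive.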
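To make the denoising score matching objective concrete, here is a minimal NumPy sketch for a variance-exploding (VE) diffusion. It is not sbi's implementation: the noise range (`SIGMA_MIN`, `SIGMA_MAX`), the σ(t)² loss weighting and the helper names `ve_sigma` and `dsm_loss` are illustrative assumptions, and a plain callable stands in for the neural network. The toy check uses a Gaussian posterior θ | x ~ N(x, τ²I), whose diffused score is known in closed form, so the exact score should reach a lower loss than a trivial all-zero score.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0  # assumed VE noise range, not sbi's defaults


def ve_sigma(t):
    # Geometric noise schedule of a variance-exploding (VE) SDE.
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def dsm_loss(score_fn, theta, x, rng, t_min=1e-3, t_max=1.0):
    """Denoising score matching loss with sigma(t)^2 weighting (sketch)."""
    n = theta.shape[0]
    t = rng.uniform(t_min, t_max, size=(n, 1))
    sigma = ve_sigma(t)
    eps = rng.standard_normal(theta.shape)
    theta_t = theta + sigma * eps  # draw from the kernel p_t(theta_t | theta)
    # The kernel's score is -(theta_t - theta) / sigma^2 = -eps / sigma, so the
    # weighted regression target ||sigma * s + eps||^2 vanishes for that score.
    s = score_fn(theta_t, x, t)
    return np.mean(np.sum((sigma * s + eps) ** 2, axis=-1))


rng = np.random.default_rng(0)
tau = 0.1
x = rng.uniform(size=(50_000, 3))
theta = x + tau * rng.standard_normal(x.shape)  # theta | x ~ N(x, tau^2 I)


def exact_score(theta_t, x, t):
    # The diffused posterior is N(x, (tau^2 + sigma(t)^2) I).
    return -(theta_t - x) / (tau**2 + ve_sigma(t) ** 2)


def zero_score(theta_t, x, t):
    return np.zeros_like(theta_t)


loss_exact = dsm_loss(exact_score, theta, x, rng)
loss_zero = dsm_loss(zero_score, theta, x, rng)
```

The exact score does not reach zero loss: denoising score matching regresses onto a noisy target, so its minimum is a positive constant, but that minimum is attained by the true diffused score.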
NOTE: NPSE does not support multi-round inference with flexible proposals yet. You can try multi-round with truncated proposals, but this is not tested.
[1] Compositional score modeling for simulation-based inference, Geffner et al., ICML 2023.
[2] Sequential neural score estimation: Likelihood-free inference with conditional score based diffusion models, Sharrock et al., ICML 2024.
Example:
```python
import torch

from sbi.inference import NPSE
from sbi.utils import BoxUniform

# 1. Setup prior and simulate data
prior = BoxUniform(low=torch.zeros(3), high=torch.ones(3))
theta = prior.sample((100,))
x = theta + torch.randn_like(theta) * 0.1

# 2. Train score estimator
inference = NPSE(prior=prior, sde_type="ve")
score_estimator = inference.append_simulations(theta, x).train()

# 3. Build posterior (uses SDE solver by default)
posterior = inference.build_posterior(score_estimator)

# 4. Sample from posterior using Langevin dynamics
x_o = torch.randn(1, 3)
samples = posterior.sample((1000,), x=x_o)
```
- build_posterior(vector_field_estimator=None, prior=None, sample_with='sde', vectorfield_sampling_parameters=None, posterior_parameters=None)
Build posterior from the vector field estimator.
This is the same posterior as for FMPE, except that sample_with defaults to “sde”.
For NPSE, the posterior distribution that is returned here implements the following functionality over the raw neural density estimator:
- correct the calculation of the log probability such that samples outside of the prior bounds have log probability -inf.
- reject samples that lie outside of the prior bounds.
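The two prior-bound corrections can be sketched as follows for a box-shaped prior such as `BoxUniform`. The function names are hypothetical and NumPy arrays stand in for tensors. A real posterior would presumably keep drawing until it has the requested number of accepted samples, whereas this sketch only filters a single batch.

```python
import numpy as np


def within_support(samples, low, high):
    # True for rows of `samples` that lie inside the box [low, high].
    return np.all((samples >= low) & (samples <= high), axis=-1)


def restrict_log_prob(log_prob, samples, low, high):
    # Samples outside the prior support get log probability -inf.
    return np.where(within_support(samples, low, high), log_prob, -np.inf)


def reject_outside(samples, low, high):
    # Keep only the samples inside the prior support.
    return samples[within_support(samples, low, high)]


low, high = np.zeros(2), np.ones(2)
samples = np.array([[0.5, 0.5], [1.2, 0.3], [0.1, -0.1]])
lp = restrict_log_prob(np.array([-1.0, -2.0, -3.0]), samples, low, high)
kept = reject_outside(samples, low, high)
```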
- Parameters:
vector_field_estimator (ConditionalVectorFieldEstimator | None) – The vector field estimator that the posterior is based on. If None, use the latest vector field estimator that was trained.
prior (Distribution | None) – Prior distribution.
sample_with (Literal['ode', 'sde']) – Method to use for sampling from the posterior. Can be one of ‘sde’ (default) or ‘ode’. The ‘sde’ method uses the learned score to simulate the reverse-time diffusion SDE step by step, while the ‘ode’ method solves the corresponding probability flow ODE with a numerical ODE solver.
vectorfield_sampling_parameters (Dict[str, Any] | None) – Additional keyword arguments passed to VectorFieldPosterior.
posterior_parameters (VectorFieldPosteriorParameters | None) – Configuration passed to the init method for VectorFieldPosterior.
- Returns:
Posterior \(p(\theta|x)\) with .sample() and .log_prob() methods.
- Return type:
NeuralPosterior
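The difference between the two `sample_with` options can be illustrated on a 1-D toy problem. The sketch below is an assumption-laden illustration, not sbi's sampler: it uses a VE diffusion with an assumed noise range, replaces the trained network by the exact score of a Gaussian target N(μ, τ²), starts from the approximate base distribution N(0, σ_max²), and integrates backward in time with plain Euler(-Maruyama) steps. The helper names (`sigma`, `g2`, `gaussian_score`, `sample`) are hypothetical. Both routes should recover the target's mean and standard deviation up to discretization and base-distribution error.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0  # assumed VE noise range


def sigma(t):
    # Geometric noise schedule of a variance-exploding (VE) SDE.
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def g2(t):
    # Squared diffusion coefficient g(t)^2 = d[sigma(t)^2]/dt of the VE SDE.
    return 2.0 * sigma(t) ** 2 * np.log(SIGMA_MAX / SIGMA_MIN)


def gaussian_score(theta, t, mu, tau):
    # Exact score of the diffused target N(mu, tau^2 + sigma(t)^2);
    # it stands in for a trained score network.
    return -(theta - mu) / (tau**2 + sigma(t) ** 2)


def sample(n, mu, tau, sample_with="sde", n_steps=1000, t_min=1e-3, seed=0):
    if sample_with not in ("sde", "ode"):
        raise ValueError("sample_with must be 'sde' or 'ode'")
    rng = np.random.default_rng(seed)
    ts = np.linspace(1.0, t_min, n_steps + 1)
    theta = rng.normal(0.0, SIGMA_MAX, size=n)  # approximate base at t = 1
    for t, t_next in zip(ts[:-1], ts[1:]):
        dt = t - t_next  # positive step; we integrate backward in time
        drift = g2(t) * gaussian_score(theta, t, mu, tau)
        if sample_with == "sde":
            # Euler-Maruyama step of the reverse-time SDE.
            noise = np.sqrt(g2(t) * dt) * rng.standard_normal(n)
            theta = theta + drift * dt + noise
        else:
            # Euler step of the probability flow ODE (half the drift, no noise).
            theta = theta + 0.5 * drift * dt
    return theta


ode_samples = sample(20_000, mu=1.0, tau=0.5, sample_with="ode")
sde_samples = sample(20_000, mu=1.0, tau=0.5, sample_with="sde")
```

The ODE route is deterministic given the initial noise, which is what makes it usable for density evaluation; the SDE route injects fresh noise at every step, which tends to correct errors made earlier in the trajectory (here, the slight mismatch of the base distribution's mean shrinks more under the SDE).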
- append_simulations(theta, x, proposal=None, exclude_invalid_x=None, data_device=None)
Store parameters and simulation outputs to use them for later training.
Data are stored as entries in lists for each type of variable (parameter/data).
Stores \(\theta\), \(x\), prior_masks (indicating if simulations are coming from the prior or not) and an index indicating which round the batch of simulations came from.
- Parameters:
theta (Tensor) – Parameter sets.
x (Tensor) – Simulation outputs.
proposal (DirectPosterior | None) – The distribution that the parameters \(\theta\) were sampled from. Pass None if the parameters were sampled from the prior. Multi-round training is not yet implemented, so anything other than None will raise an error.
exclude_invalid_x (bool | None) – Whether invalid simulations are discarded during training. For single-round training, it is fine to discard invalid simulations, but for multi-round sequential (atomic) training, discarding invalid simulations gives systematically wrong results. If None, it will be True in the first round and False in later rounds. Note that multi-round training is not yet implemented.
data_device (str | None) – Where to store the data. Defaults to the device used for training. When training on a large dataset with a GPU that has limited VRAM, set this to ‘cpu’ to keep the data in system memory instead.
- Returns:
VectorFieldTrainer object (returned so that this function is chainable).
- Return type:
Self
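The bookkeeping described above (θ, x, prior masks and a round index per appended batch) can be sketched with a small hypothetical class. This is not sbi's internal data structure. In particular, the rule that prior batches belong to round 0 while any other batch gets the next round index is an assumption of this sketch, and the `from_prior` flag stands in for passing `proposal=None`.

```python
import numpy as np


class SimulationStore:
    """Hypothetical store keeping theta, x, prior masks and round indices per batch."""

    def __init__(self):
        self.theta, self.x, self.prior_masks, self.round_idx = [], [], [], []

    def append(self, theta, x, from_prior=True):
        # Assumption: prior batches are round 0; other batches get the next round index.
        round_ = 0 if from_prior else max(self.round_idx, default=0) + 1
        self.theta.append(theta)
        self.x.append(x)
        self.prior_masks.append(np.full(len(theta), from_prior))
        self.round_idx.append(round_)
        return self  # returning self makes calls chainable

    def get(self, starting_round=0):
        # Concatenate all batches from rounds >= starting_round.
        keep = [i for i, r in enumerate(self.round_idx) if r >= starting_round]

        def cat(batches):
            return np.concatenate([batches[i] for i in keep])

        return cat(self.theta), cat(self.x), cat(self.prior_masks)


store = SimulationStore()
store.append(np.zeros((4, 2)), np.ones((4, 3))).append(
    np.ones((2, 2)), np.zeros((2, 3)), from_prior=False
)
theta, x, prior_masks = store.get()
theta_late, _, masks_late = store.get(starting_round=1)
```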
- get_dataloaders(starting_round=0, training_batch_size=200, validation_fraction=0.1, resume_training=False, dataloader_kwargs=None)
Return dataloaders for training and validation.
- Parameters:
dataset – holding all theta and x, optionally masks.
training_batch_size (int) – Training batch size (the training argument of the inference method).
resume_training (bool) – Whether the current call is resuming training so that no new training and validation indices into the dataset have to be created.
dataloader_kwargs (dict | None) – Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn).
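starting_round (int) – The earliest round whose simulations are used (counting from zero).
validation_fraction (float) – The fraction of data to use for validation.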
- Returns:
Tuple of dataloaders for training and validation.
- Return type:
Tuple[DataLoader, DataLoader]
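The train/validation split behind `validation_fraction` and the mini-batching behind `training_batch_size` can be sketched with plain index arithmetic. This is an illustrative NumPy version with hypothetical helper names, not the code of `get_dataloaders`, which returns PyTorch dataloaders. Keeping the split indices around is what would let a resumed run (`resume_training=True`) reuse the same split instead of drawing a new one.

```python
import numpy as np


def split_indices(num_examples, validation_fraction=0.1, seed=0):
    # Shuffle once, then take the first num_val indices as the validation set.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_examples)
    num_val = int(validation_fraction * num_examples)
    return perm[num_val:], perm[:num_val]


def iterate_batches(indices, batch_size=200, seed=0):
    # Yield shuffled mini-batches of indices (the last batch may be smaller).
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(indices)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]


train_idx, val_idx = split_indices(1000, validation_fraction=0.1)
batch_sizes = [len(b) for b in iterate_batches(train_idx, batch_size=200)]
```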
- get_simulations(starting_round=0)
Returns all \(\theta\), \(x\), and prior_masks from rounds >= starting_round.
If requested, do not return invalid data.
- Parameters:
starting_round (int) – The earliest round to return samples from (we start counting from zero).
warn_on_invalid – Whether to emit a warning if invalid simulations were found.
- Returns:
Parameters, simulation outputs, prior masks.
- Return type:
Tuple[Tensor, Tensor, Tensor]
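Filtering out invalid data can be sketched as a row mask over x. The sketch assumes that a simulation counts as invalid when any entry of x is NaN or ±inf; the helper names are hypothetical, and `warn_on_invalid` mirrors the parameter described above.

```python
import warnings

import numpy as np


def valid_rows(x):
    # Assumption: a simulation is invalid if any entry of x is NaN or +/-inf.
    return np.all(np.isfinite(x), axis=-1)


def drop_invalid(theta, x, prior_masks, warn_on_invalid=True):
    # Return only the rows with valid x, optionally warning about the rest.
    mask = valid_rows(x)
    num_invalid = int((~mask).sum())
    if warn_on_invalid and num_invalid > 0:
        warnings.warn(f"Found {num_invalid} invalid simulation(s); excluding them.")
    return theta[mask], x[mask], prior_masks[mask]


theta = np.arange(8.0).reshape(4, 2)
x = np.array([[0.1, 0.2], [np.nan, 0.0], [0.3, np.inf], [0.4, 0.5]])
prior_masks = np.ones(4, dtype=bool)
theta_ok, x_ok, masks_ok = drop_invalid(theta, x, prior_masks, warn_on_invalid=False)
```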
- property summary
- train(training_batch_size=200, learning_rate=0.0005, validation_fraction=0.1, stop_after_epochs=20, max_num_epochs=2147483647, clip_max_norm=5.0, calibration_kernel=None, ema_loss_decay=0.1, validation_times=10, validation_times_nugget=0.05, resume_training=False, force_first_round_loss=False, discard_prior_samples=False, retrain_from_scratch=False, show_train_summary=False, dataloader_kwargs=None)
Returns a vector field estimator that approximates the posterior \(p(\theta|x)\) through a continuous transformation from the base distribution to the target posterior.
NOTE: This method is common for both score-based methods (NPSE) and flow matching methods (FMPE).
The denoising score matching loss has high variance, which makes convergence harder to detect. To reduce this variance, the validation loss is evaluated at a fixed set of diffusion times. In addition, the training and validation losses are tracked as exponential moving averages, whereas the other trainer classes track the raw losses directly.
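The smoothing effect of an exponential moving average on a noisy loss can be seen in a few lines. The exact update used by the trainer is not stated here, so the sketch assumes the common rule ema ← (1 − decay)·ema + decay·loss, initialized with the first value; under that reading, `decay` plays the role of `ema_loss_decay`, and a smaller value means stronger smoothing.

```python
import numpy as np


def ema(values, decay=0.1):
    # Assumed update rule: ema <- (1 - decay) * ema + decay * value,
    # initialised with the first value.
    smoothed = np.empty(len(values))
    smoothed[0] = values[0]
    for i in range(1, len(values)):
        smoothed[i] = (1.0 - decay) * smoothed[i - 1] + decay * values[i]
    return smoothed


rng = np.random.default_rng(0)
raw = 1.0 + 0.5 * rng.standard_normal(2_000)  # noisy loss around a flat level
smoothed = ema(raw, decay=0.1)
```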
- Parameters:
training_batch_size (int) – Training batch size.
learning_rate (float) – Learning rate for Adam optimizer.
validation_fraction (float) – The fraction of data to use for validation.
stop_after_epochs (int) – The number of epochs to wait for improvement on the validation set before terminating training.
max_num_epochs (int) – Maximum number of epochs to run. If reached, we stop training even when the validation loss is still decreasing. Otherwise, we train until validation loss increases (see also stop_after_epochs).
clip_max_norm (float | None) – Value at which to clip the total gradient norm in order to prevent exploding gradients. Use None for no clipping.
calibration_kernel (Callable | None) – A function to calibrate the loss with respect to the simulations x (optional). See Lueckmann, Gonçalves et al., NeurIPS 2017. If None, no calibration is used.
ema_loss_decay (float) – Loss decay strength for exponential moving average of training and validation losses.
validation_times (Tensor | int) – Diffusion times at which to evaluate the validation loss to reduce variance of validation loss.
validation_times_nugget (float) – Because both diffusion and flow matching losses often have high variance near the ends of the time interval, the validation times are kept a small nugget away from them. Default is 0.05, i.e., t_min + 0.05 or t_max - 0.05.
resume_training (bool) – Can be used in case training time is limited, e.g. on a cluster. If True, the split between train and validation set, the optimizer, the number of epochs, and the best validation log-prob will be restored from the last time .train() was called.
force_first_round_loss (bool) – If True, train with maximum likelihood, i.e., potentially ignoring the correction for using a proposal distribution different from the prior.
discard_prior_samples (bool) – Whether to discard samples simulated in round 1, i.e. from the prior. Training may be sped up by ignoring such less targeted samples.
retrain_from_scratch (bool) – Whether to retrain the conditional density estimator for the posterior from scratch each round.
show_train_summary (bool) – Whether to print the number of epochs and validation loss after the training.
dataloader_kwargs (dict | None) – Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn)
- Returns:
Vector field estimator that approximates the posterior.
- Return type:
ConditionalVectorFieldEstimator
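Two of the training controls above can be sketched in isolation: a fixed grid of validation times kept `validation_times_nugget` away from both ends of the time interval, and early stopping governed by `stop_after_epochs` and `max_num_epochs`. Both functions are hypothetical illustrations; the assumed time interval [1e-3, 1] and the exact stopping rule (stop once the validation loss has not improved for `stop_after_epochs` consecutive epochs) are not taken from the library's code.

```python
import numpy as np


def validation_time_grid(num_times=10, t_min=1e-3, t_max=1.0, nugget=0.05):
    # Fixed diffusion times, kept `nugget` away from both ends of [t_min, t_max].
    return np.linspace(t_min + nugget, t_max - nugget, num_times)


def epochs_until_stop(val_losses, stop_after_epochs=20, max_num_epochs=2**31 - 1):
    """Return the number of epochs run before early stopping triggers."""
    best, since_best = np.inf, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
        if since_best >= stop_after_epochs or epoch >= max_num_epochs:
            return epoch
    return len(val_losses)


grid = validation_time_grid()
losses = [5.0, 4.0, 3.0, 2.0, 1.0] + [1.0] * 30
```

With a noisy validation loss, this criterion would fire early by chance, which is why the trainer evaluates the loss at fixed times and smooths it before comparing epochs.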
- Parameters:
prior (Distribution | None)
vf_estimator (Literal['mlp', 'ada_mlp', 'transformer', 'transformer_cross_attn'] | ConditionalEstimatorBuilder[ConditionalVectorFieldEstimator])
score_estimator (Literal['mlp', 'ada_mlp', 'transformer', 'transformer_cross_attn'] | ConditionalEstimatorBuilder[ConditionalVectorFieldEstimator] | None)
density_estimator (ConditionalEstimatorBuilder[ConditionalVectorFieldEstimator] | None)
sde_type (Literal['vp', 've', 'subvp'])
device (str)
summary_writer (SummaryWriter | None)
tracker (Tracker | None)
show_progress_bars (bool)
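The three `sde_type` options differ in their perturbation kernels θ_t = m(t)·θ_0 + s(t)·ε with ε ~ N(0, I). The sketch below uses the standard forms from the score-based SDE literature (Song et al., 2021): VE keeps the mean and grows the noise, VP shrinks the mean while preserving unit total variance, and sub-VP uses a smaller noise scale than VP. The schedule constants and the function name are assumptions and need not match sbi's defaults or internals.

```python
import numpy as np

BETA_MIN, BETA_MAX = 0.1, 20.0  # assumed VP / sub-VP schedule
SIGMA_MIN, SIGMA_MAX = 0.01, 10.0  # assumed VE schedule


def kernel_mean_scale_and_std(t, sde_type):
    """Return (m, s) such that theta_t = m * theta_0 + s * eps, eps ~ N(0, I)."""
    t = np.asarray(t, dtype=float)
    # Integral from 0 to t of the linear schedule beta(u) = BETA_MIN + u * (BETA_MAX - BETA_MIN).
    int_beta = BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2
    if sde_type == "ve":
        return np.ones_like(t), SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t
    if sde_type == "vp":
        return np.exp(-0.5 * int_beta), np.sqrt(1.0 - np.exp(-int_beta))
    if sde_type == "subvp":
        return np.exp(-0.5 * int_beta), 1.0 - np.exp(-int_beta)
    raise ValueError(f"unknown sde_type: {sde_type!r}")


ts = np.linspace(0.01, 1.0, 50)
m_vp, s_vp = kernel_mean_scale_and_std(ts, "vp")
m_sub, s_sub = kernel_mean_scale_and_std(ts, "subvp")
m_ve, s_ve = kernel_mean_scale_and_std(ts, "ve")
```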