FMPE

class FMPE(prior=None, vf_estimator='mlp', density_estimator=None, device='cpu', logging_level='WARNING', summary_writer=None, tracker=None, show_progress_bars=True)

Bases: VectorFieldTrainer

Flow Matching Posterior Estimation (FMPE) [1].

FMPE trains a continuous normalizing flow (CNF) to transform samples from the prior distribution to the posterior distribution using flow matching. Instead of maximizing the likelihood, it trains a vector field to match the marginal vector field of a conditional flow that interpolates between the prior and the posterior. Unlike in discrete normalizing flows, the neural network for the vector field is not architecturally constrained and can be any expressive network. Sampling requires solving an ODE, which can be slower than sampling with flow-based NPE, and log_prob evaluation can also be slower.
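The training target described above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: it assumes the straight-line conditional path of Lipman et al. with sigma_min = 0, a standard normal base sample, and hypothetical helper names (`interpolate`, `target_velocity`, `flow_matching_loss`).

```python
import random


def interpolate(noise, theta, t):
    """Point on the straight path from a base sample (t=0) to theta (t=1)."""
    return [(1.0 - t) * n + t * th for n, th in zip(noise, theta)]


def target_velocity(noise, theta):
    """Velocity of the straight path; constant in t for this choice of path."""
    return [th - n for n, th in zip(noise, theta)]


def flow_matching_loss(predicted, noise, theta):
    """Mean squared error between a predicted velocity and the target."""
    target = target_velocity(noise, theta)
    return sum((p - u) ** 2 for p, u in zip(predicted, target)) / len(target)


rng = random.Random(0)
theta = [0.2, 0.5, 0.9]                        # one parameter set
noise = [rng.gauss(0.0, 1.0) for _ in theta]   # one base sample (assumed Gaussian)
t = rng.random()                               # time drawn uniformly from [0, 1)
theta_t = interpolate(noise, theta, t)         # input a network would see at time t

# A prediction equal to the target gives zero loss.
loss = flow_matching_loss(target_velocity(noise, theta), noise, theta)
```

In an actual training loop, `predicted` would come from a network evaluated at `theta_t`, `t`, and the simulation output `x`; the sketch only fixes what that network is regressed onto.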

NOTE: FMPE does not support multi-round inference with flexible proposals yet. You can try multi-round with truncated proposals, but this is not tested.

[1] Flow Matching for Generative Modeling, Lipman et al., ICLR 2023, https://arxiv.org/abs/2210.02747

Example:

import torch
from sbi.inference import FMPE
from sbi.utils import BoxUniform

# 1. Setup prior and simulate data
prior = BoxUniform(low=torch.zeros(3), high=torch.ones(3))
theta = prior.sample((100,))
x = theta + torch.randn_like(theta) * 0.1

# 2. Train flow matching estimator
inference = FMPE(prior=prior)
flow_estimator = inference.append_simulations(theta, x).train()

# 3. Build posterior (uses ODE solver for sampling)
posterior = inference.build_posterior(flow_estimator)

# 4. Sample from posterior
x_o = torch.randn(1, 3)
samples = posterior.sample((1000,), x=x_o)

build_posterior(vector_field_estimator=None, prior=None, sample_with='ode', vectorfield_sampling_parameters=None, posterior_parameters=None)

Build posterior from the flow matching estimator.

Note that this is the same as the NPSE posterior, but the sample_with method is set to “ode” by default.

For FMPE, the posterior distribution that is returned here implements the following functionality over the raw neural density estimator:

  • correct the calculation of the log probability such that samples outside of the prior bounds have log probability -inf.

  • reject samples that lie outside of the prior bounds.
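The two corrections listed above can be illustrated for a box-shaped prior. This is a sketch under stated assumptions, not the library's code: the prior support is taken to be an axis-aligned box, and `within_bounds`, `corrected_log_prob`, and `reject_outside` are hypothetical names.

```python
import math


def within_bounds(theta, low, high):
    """True if every coordinate of theta lies inside the box [low, high]."""
    return all(lo <= value <= hi for value, lo, hi in zip(theta, low, high))


def corrected_log_prob(raw_log_prob, theta, low, high):
    """Keep the estimator's log probability inside the box, -inf outside."""
    return raw_log_prob if within_bounds(theta, low, high) else -math.inf


def reject_outside(samples, low, high):
    """Drop samples that fall outside the prior bounds."""
    return [s for s in samples if within_bounds(s, low, high)]


low, high = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
samples = [[0.1, 0.5, 0.9], [0.2, 1.3, 0.4], [-0.1, 0.5, 0.5]]
kept = reject_outside(samples, low, high)  # only the first sample survives
```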

Parameters:
  • vector_field_estimator (ConditionalVectorFieldEstimator | None) – The flow matching estimator that the posterior is based on. If None, use the latest neural flow matching estimator that was trained.

  • prior (Distribution | None) – Prior distribution.

  • sample_with (Literal['ode', 'sde']) – Method to use for sampling from the posterior. Can be one of ‘ode’ (default) or ‘sde’. The ‘ode’ method uses the velocity field to define a probability flow ODE and solves it with a numerical ODE solver. The ‘sde’ method uses the score to perform Langevin diffusion steps.

  • vectorfield_sampling_parameters (Dict[str, Any] | None) – Additional keyword arguments passed to VectorFieldPosterior.

  • posterior_parameters (VectorFieldPosteriorParameters | None) – Configuration passed to the init method for VectorFieldPosterior.

Returns:

Posterior \(p(\theta|x)\) with .sample() and .log_prob() methods.

Return type:

NeuralPosterior
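To make the ‘ode’ sampling route concrete, the sketch below integrates a velocity field with fixed Euler steps. It is an illustration only: the posterior may use a different (for example adaptive) solver, `euler_solve` is a hypothetical helper, and `decay_field` is a toy field with a known solution standing in for a trained network.

```python
import math


def euler_solve(velocity, x0, num_steps=1000, t0=0.0, t1=1.0):
    """Integrate dx/dt = velocity(x, t) from t0 to t1 with fixed Euler steps."""
    dt = (t1 - t0) / num_steps
    x = list(x0)
    for step in range(num_steps):
        t = t0 + step * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def decay_field(x, t):
    """Toy velocity field whose exact solution is x(t) = x0 * exp(-t)."""
    return [-xi for xi in x]


# Start from a "base sample" and push it through the flow to t = 1.
x_end = euler_solve(decay_field, [1.0, -2.0])
```

Each sample requires `num_steps` evaluations of the velocity field, which is why ODE-based sampling can be slower than a single pass through a discrete flow.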

append_simulations(theta, x, proposal=None, exclude_invalid_x=None, data_device=None)

Store parameters and simulation outputs to use them for later training.

Data are stored as entries in lists for each type of variable (parameter/data).

Stores \(\theta\), \(x\), prior_masks (indicating if simulations are coming from the prior or not) and an index indicating which round the batch of simulations came from.

Parameters:
  • theta (Tensor) – Parameter sets.

  • x (Tensor) – Simulation outputs.

  • proposal (DirectPosterior | None) – The distribution that the parameters \(\theta\) were sampled from. Pass None if the parameters were sampled from the prior. Multi-round training is not yet implemented, so anything other than None will raise an error.

  • exclude_invalid_x (bool | None) – Whether invalid simulations are discarded during training. For single-round training, it is fine to discard invalid simulations, but for multi-round sequential (atomic) training, discarding invalid simulations gives systematically wrong results. If None, it will be True in the first round and False in later rounds. Note that multi-round training is not yet implemented.

  • data_device (str | None) – Where to store the data. By default, data are stored on the same device on which training happens. When training on a large dataset on a GPU with limited VRAM, set this to ‘cpu’ to store the data in system memory instead.

Returns:

VectorFieldTrainer object (returned so that this function is chainable).

Return type:

Self
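The effect of discarding invalid simulations can be sketched as follows. The sketch assumes that "invalid" means a simulation output containing NaN or infinite values; `is_valid` and `drop_invalid` are hypothetical helpers, not library functions.

```python
import math


def is_valid(x):
    """A simulation output counts as valid if all of its entries are finite."""
    return all(math.isfinite(value) for value in x)


def drop_invalid(theta, x):
    """Keep only the (theta, x) pairs whose simulation output is valid."""
    kept = [(th, xi) for th, xi in zip(theta, x) if is_valid(xi)]
    return [th for th, _ in kept], [xi for _, xi in kept]


theta = [[0.1], [0.2], [0.3]]
x = [[1.0], [float("nan")], [float("inf")]]
theta_valid, x_valid = drop_invalid(theta, x)  # only the first pair remains
```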

get_dataloaders(starting_round=0, training_batch_size=200, validation_fraction=0.1, resume_training=False, dataloader_kwargs=None)

Return dataloaders for training and validation.

Parameters:
  • dataset – holding all theta and x, optionally masks.

  • training_batch_size (int) – training arg of inference methods.

  • resume_training (bool) – Whether the current call is resuming training so that no new training and validation indices into the dataset have to be created.

  • dataloader_kwargs (dict | None) – Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn).

  • starting_round (int)

  • validation_fraction (float)

Returns:

Tuple of dataloaders for training and validation.

Return type:

Tuple[DataLoader, DataLoader]
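The roles of validation_fraction and training_batch_size can be illustrated with a plain-Python split. This is a sketch under assumptions: the rounding rule, the shuffling, and the helper names `split_indices` and `batches` are illustrative and need not match how the dataloaders are built internally.

```python
import random


def split_indices(num_examples, validation_fraction=0.1, seed=0):
    """Shuffle indices and split them into training and validation parts."""
    num_val = int(validation_fraction * num_examples)
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    return indices[num_val:], indices[:num_val]


def batches(indices, batch_size):
    """Group indices into consecutive batches; the last one may be smaller."""
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


train_idx, val_idx = split_indices(100, validation_fraction=0.1)
train_batches = batches(train_idx, batch_size=40)
```

Reusing the same index split when resuming training keeps examples from moving between the training and validation sets, which is the purpose of resume_training.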

get_simulations(starting_round=0)

Returns all \(\theta\), \(x\), and prior_masks from rounds >= starting_round.

If requested, do not return invalid data.

Parameters:
  • starting_round (int) – The earliest round to return samples from (we start counting from zero).

  • warn_on_invalid – Whether to give out a warning if invalid simulations were found.

Returns:

Parameters, simulation outputs, prior masks.

Return type:

Tuple[Tensor, Tensor, Tensor]
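Selecting simulations by round can be pictured with per-round lists. This sketch assumes the data are kept as one list entry per call to append_simulations together with a round index; `collect_from_round` is a hypothetical helper operating on plain lists rather than tensors.

```python
def collect_from_round(theta_rounds, x_rounds, mask_rounds, round_indices,
                       starting_round=0):
    """Concatenate all batches whose round index is >= starting_round."""
    theta, x, masks = [], [], []
    for th, xs, ms, rnd in zip(theta_rounds, x_rounds, mask_rounds, round_indices):
        if rnd >= starting_round:
            theta.extend(th)
            x.extend(xs)
            masks.extend(ms)
    return theta, x, masks


theta_rounds = [[0.1, 0.2], [0.3]]      # two batches of parameters
x_rounds = [[1.0, 2.0], [3.0]]          # matching simulation outputs
mask_rounds = [[True, True], [False]]   # True if sampled from the prior
round_indices = [0, 1]                  # round each batch came from

all_data = collect_from_round(theta_rounds, x_rounds, mask_rounds, round_indices)
later = collect_from_round(theta_rounds, x_rounds, mask_rounds, round_indices,
                           starting_round=1)
```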

property summary

train(training_batch_size=200, learning_rate=0.0005, validation_fraction=0.1, stop_after_epochs=20, max_num_epochs=2147483647, clip_max_norm=5.0, calibration_kernel=None, ema_loss_decay=0.1, validation_times=10, validation_times_nugget=0.05, resume_training=False, force_first_round_loss=False, discard_prior_samples=False, retrain_from_scratch=False, show_train_summary=False, dataloader_kwargs=None)

Returns a vector field estimator that approximates the posterior \(p(\theta|x)\) through a continuous transformation from the base distribution to the target posterior.

NOTE: This method is common for both score-based methods (NPSE) and flow matching methods (FMPE).

The denoising score matching loss has high variance, which makes it more difficult to detect convergence. To reduce this variance, we evaluate the validation loss at a fixed set of times. Unlike the other trainer classes, which track the loss directly, we also use an exponential moving average of the training and validation losses.
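The smoothing effect of an exponential moving average can be sketched as follows. Which term the decay weights is an assumption here (the new loss gets weight `decay`, the previous average gets `1 - decay`), and `ema_losses` is a hypothetical helper.

```python
def ema_losses(raw_losses, decay=0.1):
    """Exponential moving average of a sequence of per-epoch losses."""
    smoothed = []
    for loss in raw_losses:
        if smoothed:
            # Blend the previous average with the new, noisy loss value.
            loss = (1.0 - decay) * smoothed[-1] + decay * loss
        smoothed.append(loss)
    return smoothed


raw = [0.0, 10.0, 0.0, 10.0]           # a deliberately noisy loss curve
smoothed = ema_losses(raw, decay=0.1)  # fluctuates far less than raw
```

With a small decay, a single noisy epoch moves the tracked loss only slightly, so a convergence check based on it is less likely to react to noise.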

Parameters:
  • training_batch_size (int) – Training batch size.

  • learning_rate (float) – Learning rate for Adam optimizer.

  • validation_fraction (float) – The fraction of data to use for validation.

  • stop_after_epochs (int) – The number of epochs to wait for improvement on the validation set before terminating training.

  • max_num_epochs (int) – Maximum number of epochs to run. If reached, we stop training even when the validation loss is still decreasing. Otherwise, we train until validation loss increases (see also stop_after_epochs).

  • clip_max_norm (float | None) – Value at which to clip the total gradient norm in order to prevent exploding gradients. Use None for no clipping.

  • calibration_kernel (Callable | None) – A function to calibrate the loss with respect to the simulations x (optional). See Lueckmann, Gonçalves et al., NeurIPS 2017. If None, no calibration is used.

  • ema_loss_decay (float) – Loss decay strength for exponential moving average of training and validation losses.

  • validation_times (Tensor | int) – Diffusion times at which to evaluate the validation loss to reduce variance of validation loss.

  • validation_times_nugget (float) – Both diffusion and flow matching losses often have high variance near the ends of the time interval, so the validation loss is evaluated a small nugget away from the endpoints. Default is 0.05, i.e., validation times range from t_min + 0.05 to t_max - 0.05.

  • resume_training (bool) – Can be used in case training time is limited, e.g. on a cluster. If True, the split between train and validation set, the optimizer, the number of epochs, and the best validation log-prob will be restored from the last time .train() was called.

  • force_first_round_loss (bool) – If True, train with maximum likelihood, i.e., potentially ignoring the correction for using a proposal distribution different from the prior.

  • discard_prior_samples (bool) – Whether to discard samples simulated in round 1, i.e. from the prior. Training may be sped up by ignoring such less targeted samples.

  • retrain_from_scratch (bool) – Whether to retrain the conditional density estimator for the posterior from scratch each round.

  • show_train_summary (bool) – Whether to print the number of epochs and validation loss after the training.

  • dataloader_kwargs (dict | None) – Additional or updated kwargs to be passed to the training and validation dataloaders (like, e.g., a collate_fn).
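How validation_times and validation_times_nugget might combine into a fixed evaluation grid is sketched below. The evenly spaced grid, the defaults t_min = 0 and t_max = 1, and the helper name `validation_time_grid` are assumptions made for illustration.

```python
def validation_time_grid(num_times=10, nugget=0.05, t_min=0.0, t_max=1.0):
    """Evenly spaced times in [t_min + nugget, t_max - nugget]."""
    start, stop = t_min + nugget, t_max - nugget
    if num_times == 1:
        return [start]
    step = (stop - start) / (num_times - 1)
    return [start + i * step for i in range(num_times)]


# Ten fixed times that stay away from both ends of the time interval.
times = validation_time_grid()
```

Evaluating the validation loss at the same times every epoch removes the randomness that would otherwise come from sampling times, so changes in the loss reflect changes in the model.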

Returns:

Vector field estimator that approximates the posterior.

Return type:

ConditionalVectorFieldEstimator
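The interaction of stop_after_epochs and max_num_epochs can be sketched as a patience-based stopping rule. The exact rule used by the trainer may differ; `stopping_epoch` is a hypothetical helper that replays a recorded sequence of validation losses.

```python
def stopping_epoch(val_losses, stop_after_epochs=20, max_num_epochs=2**31 - 1):
    """Return the number of epochs run before stopping, or None if never."""
    best = float("inf")
    epochs_since_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the patience counter.
            best, epochs_since_improvement = loss, 0
        else:
            epochs_since_improvement += 1
        if epochs_since_improvement >= stop_after_epochs or epoch >= max_num_epochs:
            return epoch
    return None


losses = [3.0, 2.0, 1.0, 1.5, 1.6, 1.7]
stopped_by_patience = stopping_epoch(losses, stop_after_epochs=2)
stopped_by_cap = stopping_epoch(losses, stop_after_epochs=2, max_num_epochs=3)
```

In the first call the best loss occurs at epoch 3 and two epochs without improvement follow, so training stops after epoch 5; in the second call the epoch cap is reached first.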

Parameters:
  • prior (Distribution | None)

  • vf_estimator (Literal['mlp', 'ada_mlp', 'transformer', 'transformer_cross_attn'] | ~sbi.neural_nets.estimators.base.ConditionalEstimatorBuilder[~sbi.neural_nets.estimators.base.ConditionalVectorFieldEstimator])

  • density_estimator (ConditionalEstimatorBuilder[ConditionalVectorFieldEstimator] | None)

  • device (str)

  • logging_level (int | str)

  • summary_writer (SummaryWriter | None)

  • tracker (Tracker | None)

  • show_progress_bars (bool)