VIPosterior
- class VIPosterior(potential_fn, prior=None, q='maf', theta_transform=None, vi_method='rKL', device='cpu', x_shape=None, parameters=None, modules=None, num_transforms=5, hidden_features=50, z_score_theta='independent', z_score_x='independent')
Bases: NeuralPosterior
Provides VI (Variational Inference) to sample from the posterior.
SNLE or SNRE train neural networks to approximate the likelihood (or likelihood ratios).
VIPosterior allows learning a tractable variational posterior \(q(\theta)\) which approximates the true posterior \(p(\theta|x_o)\). After this second training stage, we can produce approximate posterior samples by sampling from \(q\) at no additional cost. For additional information, see [1] and [2].
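As a reading aid for the default vi_method='rKL', the reverse KL objective can be written out explicitly. This is the standard identity, stated under the assumption that the potential \(\mathcal{P}(\theta)\) is the unnormalized log-posterior, so that \(\log p(\theta|x_o) = \mathcal{P}(\theta) - \log Z\) with \(Z\) independent of \(q\). The symbols \(\mathcal{P}\) and \(Z\) are introduced here for illustration and are not taken from the library source.

```latex
\[
\mathrm{KL}\big(q \,\|\, p(\cdot|x_o)\big)
  = \mathbb{E}_{\theta \sim q}\big[\log q(\theta) - \log p(\theta|x_o)\big]
  = \mathbb{E}_{\theta \sim q}\big[\log q(\theta) - \mathcal{P}(\theta)\big] + \log Z
\]
```

Because \(\log Z\) does not depend on \(q\), minimizing the reverse KL is equivalent to maximizing \(\mathbb{E}_{\theta \sim q}[\mathcal{P}(\theta) - \log q(\theta)]\), which only requires evaluating the potential and \(\log q\) at samples from \(q\).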
References
- Parameters:
potential_fn (BasePotential | CustomPotential)
prior (Distribution | None)
q (Literal['maf', 'nsf', 'naf', 'unaf', 'nice', 'sospf', 'gf', 'gaussian', 'gaussian_diag'] | ~torch.distributions.distribution.Distribution | VIPosterior | ~typing.Callable)
theta_transform (Transform | None)
vi_method (Literal['rKL', 'fKL', 'IW', 'alpha'])
x_shape (Size | None)
parameters (Iterable | None)
modules (Iterable | None)
num_transforms (int)
hidden_features (int)
z_score_theta (Literal['none', 'independent', 'structured'])
z_score_x (Literal['none', 'independent', 'structured'])
- to(device)
Move all components to the given device.
- property q: Distribution | ZukoUnconditionalFlow | TransformedZukoFlow | LearnableGaussian
Returns the variational posterior.
- set_q(q, parameters=None, modules=None)
Defines the variational family.
You can specify which parameters/modules are optimized. This is required for custom distributions that, for example, do not inherit from nn.Module or do not provide parameters or modules methods giving direct access to their trainable parameters. You can also pass a function that constructs a variational distribution when called.
- Parameters:
q (Literal['maf', 'nsf', 'naf', 'unaf', 'nice', 'sospf', 'gf', 'gaussian', 'gaussian_diag'] | ~torch.distributions.distribution.Distribution | ~sbi.inference.posteriors.vi_posterior.VIPosterior | ~typing.Callable) –
Variational distribution, either string, distribution, or a VIPosterior object. This specifies a parametric class of distribution over which the best possible posterior approximation is searched. For string input, we support normalizing flows [maf, nsf, naf, unaf, nice, sospf] via Zuko, and simple Gaussian families [gaussian, gaussian_diag] via pure PyTorch. You can also specify your own variational family by passing a parameterized distribution object i.e. a torch.distributions Distribution with methods parameters returning an iterable of all parameters (you can pass them within the parameters/modules attribute). Additionally, we allow a Callable with signature (event_shape: torch.Size, link_transform: TorchTransform, device: str) -> Distribution, which builds a custom distribution. If q is already a VIPosterior, then the arguments will be copied from it (relevant for multi-round training).
Note: For 1D parameter spaces, autoregressive normalizing flows may be unstable. Consider using q='gaussian' or q='gf' for 1D.
parameters (Iterable | None) – List of parameters associated with the distribution object.
modules (Iterable | None) – List of modules associated with the distribution object.
- Return type:
None
- set_vi_method(method)
Sets variational inference method.
- Parameters:
method (str) – One of [rKL, fKL, IW, alpha].
- Returns:
VIPosterior for chainable calls.
- Return type:
- sample(sample_shape=(), x=None, show_progress_bars=True)
Draw samples from the variational posterior distribution \(p(\theta|x)\).
For single-x mode (trained via train()): samples from q(θ) trained on x_o. For amortized mode (trained via train_amortized()): samples from q(θ|x).
- Parameters:
sample_shape (Size | Tuple[int, ...]) – Desired shape of samples that are drawn from the posterior.
x (Tensor | None) – Conditioning observation. In single-x mode, must match trained x_o (or be None to use default). In amortized mode, required and can be any observation. For batched observations, shape should be (batch_size, x_dim).
show_progress_bars (bool) – Unused for VIPosterior since sampling from the variational distribution is fast. Included for API consistency.
- Returns:
Samples from posterior with shape (*sample_shape, θ_dim) for single x, or (*sample_shape, batch_size, θ_dim) for batched observations in amortized mode.
- Raises:
ValueError – If mode requirements are not met.
- Return type:
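The two output shapes described for sample() can be summarized in a small helper. This is a minimal sketch of the shape rule only; the function name expected_sample_shape and the convention that batch_size=None stands for a single observation are assumptions made for illustration and are not part of the library.

```python
def expected_sample_shape(sample_shape, theta_dim, batch_size=None):
    """Return the sample shape described in the Returns entry of sample().

    batch_size=None stands for a single observation x (hypothetical convention).
    """
    if batch_size is None:
        # Single x: (*sample_shape, theta_dim)
        return (*sample_shape, theta_dim)
    # Batched x in amortized mode: (*sample_shape, batch_size, theta_dim)
    return (*sample_shape, batch_size, theta_dim)


single = expected_sample_shape((1000,), theta_dim=3)
batched = expected_sample_shape((1000,), theta_dim=3, batch_size=8)
print(single, batched)
```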
- sample_batched(sample_shape, x, max_sampling_batch_size=10000, show_progress_bars=True)
Sample from posterior for a batch of observations.
In amortized mode, this is efficient as all x values are processed in parallel through the conditional flow.
In single-x mode, this raises NotImplementedError since the unconditional flow is trained for a specific x_o.
- Parameters:
- Returns:
Samples of shape (*sample_shape, num_obs, θ_dim).
- Raises:
NotImplementedError – If called in single-x mode.
- Return type:
- log_prob(theta, x=None, track_gradients=False)
Returns the log-probability of theta under the variational posterior.
For single-x mode: returns log q(θ). For amortized mode: returns log q(θ|x).
- Parameters:
theta (Tensor) – Parameters to evaluate, shape (batch_theta, θ_dim).
x (Tensor | None) – Observation. In single-x mode, must match trained x_o (or be None). In amortized mode, required and can be any observation. For single x, shape (1, x_dim) or (x_dim,). For batched x, shape (batch_x, x_dim).
track_gradients (bool) – Whether the returned tensor supports tracking gradients. This can be helpful for e.g. sensitivity analysis but increases memory consumption.
- Returns:
Log-probability of shape (batch,), where batch is:
batch_theta if x has batch size 1 (broadcast x)
batch_x if theta has batch size 1 (broadcast theta)
batch_theta if batch_theta == batch_x (paired evaluation)
- Return type:
- Raises:
ValueError – If mode requirements are not met or batch sizes incompatible.
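The batch-size rule for log_prob() can be sketched as a plain function. This only mirrors the three cases listed under Returns and the ValueError for incompatible sizes. The helper name log_prob_batch_size is hypothetical, and the order in which the cases are checked is an assumption.

```python
def log_prob_batch_size(batch_theta, batch_x):
    """Return the size of the leading dimension of the log-probability.

    Mirrors the documented cases: broadcast x, broadcast theta, or paired.
    """
    if batch_x == 1:
        return batch_theta  # x is broadcast over all theta
    if batch_theta == 1:
        return batch_x  # theta is broadcast over all x
    if batch_theta == batch_x:
        return batch_theta  # paired evaluation
    raise ValueError(
        f"Incompatible batch sizes: theta has {batch_theta}, x has {batch_x}."
    )


print(log_prob_batch_size(5, 1))
print(log_prob_batch_size(1, 7))
print(log_prob_batch_size(4, 4))
```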
- train(x=None, n_particles=256, learning_rate=0.001, gamma=0.999, max_num_iters=2000, min_num_iters=10, clip_value=10.0, warm_up_rounds=100, retrain_from_scratch=False, reset_optimizer=False, show_progress_bar=True, check_for_convergence=True, quality_control=True, quality_control_metric='psis', **kwargs)
This method trains the variational posterior for a single observation.
- Parameters:
x (Tensor | None) – The observation, optional, defaults to self._x.
n_particles (int) – Number of samples used to approximate expectations within the variational bounds. Larger values give more accurate gradient estimates but increase the computational cost per iteration.
learning_rate (float) – Learning rate of the optimizer.
gamma (float) – Learning rate decay per iteration. We use an exponential decay scheduler.
max_num_iters (int) – Maximum number of iterations.
min_num_iters (int) – Minimum number of iterations.
clip_value (float) – Gradient clipping value; decreasing it may help if you see invalid values.
warm_up_rounds (int) – Number of warm-up rounds, during which the posterior is initialized as the prior.
retrain_from_scratch (bool) – Retrain the variational distributions from scratch.
reset_optimizer (bool) – Reset the divergence optimizer.
show_progress_bar (bool) – Whether to display a progress report.
quality_control (bool) – If False, quality control is skipped.
quality_control_metric (str) – Which metric to use for evaluating the quality.
kwargs –
Hyperparameters; check the corresponding DivergenceOptimizer for details.
- eps: Determines sensitivity of convergence check.
- retain_graph: Boolean which decides whether to retain the computation graph. This may be required for some exotic user-specified q's.
- optimizer: A PyTorch Optimizer class, e.g. Adam or SGD. See DivergenceOptimizer for details.
- scheduler: A PyTorch learning rate scheduler. See DivergenceOptimizer for details.
- alpha: Only used if vi_method='alpha'. Determines the alpha divergence.
- K: Only used if vi_method='IW'. Determines the number of importance weighted particles.
- stick_the_landing: Whether to use the STL estimator (only for rKL, IW, alpha).
- dreg: Whether to use the DREG estimator (only for rKL, IW, alpha).
- weight_transform: Callable applied to importance weights (only for fKL).
check_for_convergence (bool)
- Returns:
VIPosterior (can be used to chain calls).
- Return type:
- Raises:
ValueError – If hyperparameters are invalid.
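To get a feel for the interaction of learning_rate, gamma and max_num_iters, the exponential decay can be computed by hand. This is a minimal sketch assuming the scheduler multiplies the learning rate by gamma once per iteration, as the gamma entry suggests. The helper learning_rate_at is hypothetical and does not reproduce the DivergenceOptimizer internals.

```python
def learning_rate_at(iteration, learning_rate=0.001, gamma=0.999):
    """Learning rate after `iteration` steps of per-iteration exponential decay."""
    return learning_rate * gamma**iteration


# Inspect the schedule at the start, midway, and at the default max_num_iters.
for step in (0, 1000, 2000):
    print(step, learning_rate_at(step))
```

With the defaults, the learning rate at iteration 2000 is roughly 13.5% of its initial value, so a much smaller gamma can effectively freeze training before max_num_iters is reached.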
- train_amortized(theta, x, n_particles=128, learning_rate=0.001, gamma=0.999, max_num_iters=500, clip_value=5.0, batch_size=64, validation_fraction=0.1, validation_batch_size=None, validation_n_particles=None, stop_after_iters=20, show_progress_bar=True, retrain_from_scratch=False, flow_type=None, num_transforms=None, hidden_features=None, z_score_theta=None, z_score_x=None, params=None)
Train a conditional flow q(θ|x) for amortized variational inference.
This allows sampling from q(θ|x) for any observation x without retraining. Uses the ELBO (Evidence Lower Bound) objective with early stopping based on validation loss.
- Parameters:
theta (Tensor) – Training θ values from simulations (num_sims, θ_dim).
x (Tensor) – Training x values from simulations (num_sims, x_dim).
n_particles (int) – Number of samples to estimate ELBO per x.
learning_rate (float) – Learning rate for Adam optimizer.
gamma (float) – Learning rate decay per iteration.
max_num_iters (int) – Maximum training iterations.
clip_value (float) – Gradient clipping threshold.
batch_size (int) – Number of x values per training batch.
validation_fraction (float) – Fraction of data to use for validation.
validation_batch_size (int | None) – Batch size for validation loss. Defaults to batch_size.
validation_n_particles (int | None) – Number of particles for validation loss. Defaults to n_particles.
stop_after_iters (int) – Stop training after this many iterations without improvement in validation loss.
show_progress_bar (bool) – Whether to show progress.
retrain_from_scratch (bool) – If True, rebuild the flow from scratch.
flow_type (ZukoFlowType | str | None) – Flow architecture for the variational distribution. Use ZukoFlowType.NSF, ZukoFlowType.MAF, etc., or a string. If None, uses value from params or instance default.
num_transforms (int | None) – Number of transforms in the flow. If None, uses value from params or instance default.
hidden_features (int | None) – Hidden layer size in the flow. If None, uses value from params or instance default.
z_score_theta (Literal['none', 'independent', 'structured'] | None) – Method for z-scoring θ (the parameters being modeled). One of “none”, “independent”, “structured”. If None, uses value from params or instance default.
z_score_x (Literal['none', 'independent', 'structured'] | None) – Method for z-scoring x (the conditioning variable). One of “none”, “independent”, “structured”. Use “structured” for structured data like images with spatial correlations. If None, uses value from params or instance default.
params (VIPosteriorParameters | None) – Optional VIPosteriorParameters dataclass. Values are used as fallbacks when explicit arguments are None. Priority order: explicit args > params > instance attributes (from __init__).
- Returns:
self for method chaining.
- Return type:
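The priority order stated for params (explicit args > params > instance attributes) can be illustrated with a short fallback chain. This is a sketch of the rule only: HypotheticalParams stands in for VIPosteriorParameters, whose actual fields are not shown here, and resolve is an illustrative helper, not a library function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HypotheticalParams:
    """Illustrative stand-in for a parameters dataclass (fields are assumptions)."""

    num_transforms: Optional[int] = None
    hidden_features: Optional[int] = None


def resolve(explicit, params_value, instance_default):
    """Pick the first value that is not None: explicit, then params, then default."""
    if explicit is not None:
        return explicit
    if params_value is not None:
        return params_value
    return instance_default


params = HypotheticalParams(num_transforms=8)

# Explicit argument wins over params and the instance default.
print(resolve(3, params.num_transforms, 5))
# No explicit argument: the params value is used.
print(resolve(None, params.num_transforms, 5))
# Neither explicit nor params: fall back to the instance default.
print(resolve(None, params.hidden_features, 50))
```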
- property default_x: Tensor | None
Return default x used by .sample(), .log_prob as conditioning context.
- potential(theta, x=None, track_gradients=False)
Evaluates \(\theta\) under the potential that is used to sample the posterior.
The potential is the unnormalized log-probability of \(\theta\) under the posterior.
- set_default_x(x)
Set new default x for .sample(), .log_prob to use as conditioning context.
Reset the MAP stored for the old default x if applicable.
This is a pure convenience to avoid having to repeatedly specify x in calls to .sample() and .log_prob(); only \(\theta\) needs to be passed.
This convenience is particularly useful when the posterior is focused, i.e. has been trained over multiple rounds to be accurate in the vicinity of a particular x=x_o (you can check if your posterior object is focused by printing it).
NOTE: this method is chainable, i.e. will return the NeuralPosterior object so that calls like posterior.set_default_x(my_x).sample(mytheta) are possible.
- Parameters:
x (Tensor) – The default observation to set for the posterior \(p(\theta|x)\).
- Returns:
NeuralPosterior that will use a default x when not explicitly passed.
- Return type:
NeuralPosterior
- evaluate(quality_control_metric='psis', N=50000)
Evaluates the quality of the variational posterior distribution. Two metrics are currently supported: psis, which checks the quality based on the tails of the importance weights (few weights should be very large), and prop, which checks the proportionality between q and potential_fn.
NOTE: In our experience, prop is sensitive in distinguishing good from ok, whereas psis is more sensitive in distinguishing very bad from ok.
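Both metrics build on importance weights between the potential and q. The sketch below computes such weights for a toy 1D problem and summarizes them with the effective sample size (ESS). ESS is a simpler stand-in chosen for illustration; it is not the psis or prop metric, and the toy potential, the Gaussian q, and all function names are assumptions.

```python
import math
import random


def potential(theta):
    """Toy unnormalized log-posterior: a standard normal without its constant."""
    return -0.5 * theta**2


def gaussian_log_prob(theta, mean, std):
    """Log-density of the toy variational distribution q = Normal(mean, std)."""
    z = (theta - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def effective_sample_size(mean, std, num_samples=5000, seed=0):
    """ESS of the importance weights w = exp(potential - log q) for samples from q."""
    rng = random.Random(seed)
    log_w = []
    for _ in range(num_samples):
        theta = rng.gauss(mean, std)
        log_w.append(potential(theta) - gaussian_log_prob(theta, mean, std))
    # Subtract the maximum before exponentiating for numerical stability.
    max_log_w = max(log_w)
    w = [math.exp(lw - max_log_w) for lw in log_w]
    return sum(w) ** 2 / sum(wi * wi for wi in w)


ess_good = effective_sample_size(mean=0.0, std=1.0)  # q matches the target
ess_bad = effective_sample_size(mean=3.0, std=0.5)   # q is shifted and too narrow
print(f"ESS (good q): {ess_good:.1f}, ESS (bad q): {ess_bad:.1f}")
```

When q matches the target, all weights are equal and the ESS is close to the number of samples. A poorly fitted q yields a few dominant weights and a much smaller ESS, which is the kind of heavy-tailed behaviour the psis metric is described as checking.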
- map(x=None, num_iter=1000, num_to_optimize=100, learning_rate=0.01, init_method='proposal', num_init_samples=10000, save_best_every=10, show_progress_bars=False, force_update=False)
Returns the maximum-a-posteriori estimate (MAP).
The method can be interrupted (Ctrl-C) when the user sees that the log-probability converges. The best estimate will be saved in self._map and can be accessed with self.map(). The MAP is obtained by running gradient ascent from a given number of starting positions (samples from the posterior with the highest log-probability). After the optimization is done, we select the parameter set that has the highest log-probability after the optimization.
Warning: The default values used by this function are not well-tested. They might require hand-tuning for the problem at hand.
For developers: if the prior is a BoxUniform, we carry out the optimization in unbounded space and transform the result back into bounded space.
- Parameters:
x (Tensor | None) – Deprecated - use .set_default_x() prior to .map().
num_iter (int) – Number of optimization steps that the algorithm takes to find the MAP.
learning_rate (float) – Learning rate of the optimizer.
init_method (str | Tensor) – How to select the starting parameters for the optimization. If it is a string, it can be either [posterior, prior], which samples the respective distribution num_init_samples times. If it is a tensor, the tensor will be used as init locations.
num_init_samples (int) – Draw this number of samples from the posterior and evaluate the log-probability of all of them.
num_to_optimize (int) – From the drawn num_init_samples, use the num_to_optimize with highest log-probability as the initial points for the optimization.
save_best_every (int) – The best log-probability is computed, saved in the map-attribute, and printed every save_best_every-th iteration. Computing the best log-probability creates a significant overhead (thus, the default is 10).
show_progress_bars (bool) – Whether to show a progress bar during sampling from the posterior.
force_update (bool) – Whether to re-calculate the MAP when x is unchanged and a cached value exists.
log_prob_kwargs – Will be empty for SNLE and SNRE. Will contain {'norm_posterior': True} for SNPE.
- Returns:
The MAP estimate.
- Return type:
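The procedure described for map() (draw initial candidates, keep the best num_to_optimize, run gradient ascent, return the point with the highest log-probability) can be sketched on a toy 1D density. Everything here is illustrative: the bimodal log_prob, the uniform initial candidates standing in for posterior samples, the finite-difference gradient, and the function toy_map are assumptions and do not reproduce the library implementation.

```python
import math
import random


def log_prob(theta):
    """Toy unnormalized log-density with a minor mode near -2 and a major mode near 2."""
    return math.log(
        0.3 * math.exp(-0.5 * (theta + 2.0) ** 2)
        + 0.7 * math.exp(-0.5 * (theta - 2.0) ** 2)
    )


def grad_log_prob(theta, h=1e-5):
    """Central finite-difference gradient (stand-in for automatic differentiation)."""
    return (log_prob(theta + h) - log_prob(theta - h)) / (2.0 * h)


def toy_map(num_iter=500, num_to_optimize=10, learning_rate=0.05,
            num_init_samples=200, seed=0):
    rng = random.Random(seed)
    # Hypothetical initial candidates; the documented method samples the posterior.
    candidates = [rng.uniform(-6.0, 6.0) for _ in range(num_init_samples)]
    # Keep the candidates with the highest log-probability as starting points.
    starts = sorted(candidates, key=log_prob, reverse=True)[:num_to_optimize]
    optimized = []
    for theta in starts:
        for _ in range(num_iter):
            theta += learning_rate * grad_log_prob(theta)  # gradient ascent step
        optimized.append(theta)
    # Select the optimized point with the highest log-probability.
    return max(optimized, key=log_prob)


theta_map = toy_map()
print(f"MAP estimate: {theta_map:.3f}")
```

Starting from the highest-scoring candidates rather than a single point reduces the risk of ending in the minor mode, which is the motivation for num_to_optimize in the method above.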