How to tune hyperparameters with Optuna
This guide shows a minimal optuna (documentation) loop for hyperparameter
tuning in sbi. Optuna is a lightweight hyperparameter optimization library. You define
an objective function that trains a model (e.g., NPE) and returns a validation metric,
and Optuna runs multiple trials to explore the search space and track the best
configuration. As the validation metric, we recommend the negative log probability of
a held-out validation set (theta, x) under the current posterior estimate (see
Lueckmann et al. 2021 for details).
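To make the recommended metric concrete, here is a minimal, dependency-free sketch of an average negative log probability on held-out pairs. It does not use sbi: the "posterior" is a hypothetical stand-in, a Gaussian centered on x with a chosen width, and the helper names `gaussian_log_prob` and `mean_nll` are made up for this illustration.

```python
import math
import random


def gaussian_log_prob(value, mean, std):
    """Log density of a 1D normal distribution evaluated at `value`."""
    var = std**2
    return -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)


def mean_nll(pairs, std):
    """Average negative log probability of theta given x over held-out pairs.

    Hypothetical stand-in posterior: q(theta | x) = Normal(x, std).
    """
    total = sum(gaussian_log_prob(theta, x, std) for theta, x in pairs)
    return -total / len(pairs)


rng = random.Random(0)

# Toy held-out set: x is theta plus noise with standard deviation 0.1.
val_pairs = []
for _ in range(1000):
    theta = rng.uniform(-2.0, 2.0)
    val_pairs.append((theta, theta + rng.gauss(0.0, 0.1)))

nll_matched = mean_nll(val_pairs, std=0.1)   # width matches the noise level
nll_too_wide = mean_nll(val_pairs, std=1.0)  # overly broad estimate
```

The estimate whose width matches the simulator noise assigns more probability to the held-out parameters, so its value is lower. This ordering is what lets the metric rank hyperparameter configurations.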
Note that Optuna is not a dependency of sbi; you need to install it yourself in your
environment.
Here, we use a toy simulator and do NPE with an embedding network built using the posterior_nn helper. We tune just two hyperparameters: the embedding dimension and the number of flow transforms in an nsf density estimator.
Set up a tiny simulation task
import optuna
import torch
from sbi.inference import NPE
from sbi.neural_nets import posterior_nn
from sbi.neural_nets.embedding_nets import FCEmbedding
from sbi.utils import BoxUniform
torch.manual_seed(0)
def simulator(theta):
    return theta + 0.1 * torch.randn_like(theta)
prior = BoxUniform(low=-2 * torch.ones(2), high=2 * torch.ones(2))
theta = prior.sample((6000,))
x = simulator(theta)
# Use a separate validation data set for optuna
theta_train, x_train = theta[:5000], x[:5000]
theta_val, x_val = theta[5000:], x[5000:]
Define the Optuna objective
Optuna expects the objective function to return a scalar value that it will optimize. When creating a study, you specify the optimization direction: direction="minimize" to find the configuration with the lowest objective value, or direction="maximize" for the highest. Here, we minimize the negative log probability (NLL) on a held-out validation set, so lower is better.
def objective(trial):
    # Optuna will track these parameters internally.
    embedding_dim = trial.suggest_categorical("embedding_dim", [16, 32, 64])
    num_transforms = trial.suggest_int("num_transforms", 2, 6)
    embedding_net = FCEmbedding(input_dim=x_train.shape[1], output_dim=embedding_dim)
    density_estimator = posterior_nn(
        model="nsf",
        embedding_net=embedding_net,
        num_transforms=num_transforms,
    )
    inference = NPE(prior=prior, density_estimator=density_estimator)
    inference.append_simulations(theta_train, x_train)
    estimator = inference.train(
        training_batch_size=128,
        show_train_summary=False,
    )
    posterior = inference.build_posterior(estimator)
    with torch.no_grad():
        nll = -posterior.log_prob_batched(theta_val.unsqueeze(0), x=x_val).mean().item()
    # Return the metric to be optimized by Optuna.
    return nll
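The contract between an objective and a study can be illustrated without any dependencies. The sketch below is not Optuna and does not train a network: it uses plain random search over the same two hyperparameters, and `toy_objective` is a hypothetical stand-in loss invented for this example. It only shows the pattern of proposing parameters, scoring them with a scalar, and keeping the lowest value.

```python
import random


def toy_objective(params):
    """Hypothetical validation loss (made up for illustration, not a real NLL)."""
    return abs(params["embedding_dim"] - 32) / 16 + abs(params["num_transforms"] - 4)


def random_search(objective, n_trials, seed=0):
    """Propose random configurations and keep the one with the lowest score."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        params = {
            "embedding_dim": rng.choice([16, 32, 64]),  # categorical choice
            "num_transforms": rng.randint(2, 6),        # integer range, inclusive
        }
        history.append((objective(params), params))
    # "minimize" direction: the best trial is the one with the smallest value.
    best_value, best_params = min(history, key=lambda item: item[0])
    return best_value, best_params, history


best_value, best_params, history = random_search(toy_objective, n_trials=25)
```

An Optuna study plays the role of `random_search` here, with the difference that its sampler can use the history of earlier trials to propose more promising configurations.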
Run the study and retrain
Optuna defaults to the TPE (Tree-structured Parzen Estimator) sampler, which is a good starting point for many experiments. TPE is a Bayesian optimization method that
models good vs. bad trials with nonparametric densities and samples new points
that are likely to improve the objective. You can swap in other samplers (random
search, Gaussian Process-based, etc.) by passing a different sampler instance to create_study.
The TPE sampler uses n_startup_trials random trials to seed the model. With
n_trials=25 and n_startup_trials=10, the first 10 trials are random and the
remaining 15 are guided by the acquisition function. To make sure the default
configuration is evaluated, enqueue it before optimization.
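The trial budget described above can be written down as a small bookkeeping sketch. It does not call Optuna, and `trial_schedule` is a hypothetical helper. It also rests on an assumption worth checking against the Optuna version in use: that an enqueued trial runs first with its fixed parameters and counts toward the startup budget once it has finished.

```python
def trial_schedule(n_trials, n_startup_trials, n_enqueued=0):
    """Label each trial index by how its parameters are assumed to be chosen.

    Assumption (not verified here against Optuna): enqueued trials run first
    and count toward the `n_startup_trials` budget.
    """
    labels = []
    for index in range(n_trials):
        if index < n_enqueued:
            labels.append("enqueued")  # fixed parameters supplied by the user
        elif index < n_startup_trials:
            labels.append("random")    # startup phase, sampled at random
        else:
            labels.append("tpe")       # guided by the TPE model
    return labels


# Without an enqueued trial: 10 random trials, then 15 guided ones.
plain = trial_schedule(n_trials=25, n_startup_trials=10)

# With one enqueued default configuration, under the assumption above.
with_default = trial_schedule(n_trials=25, n_startup_trials=10, n_enqueued=1)
```

Under this assumption, enqueuing the default configuration leaves the number of guided trials unchanged and replaces one of the random startup trials.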
sampler = optuna.samplers.TPESampler(n_startup_trials=10)
study = optuna.create_study(direction="minimize", sampler=sampler)
# Optional: ensure the default config is evaluated
study.enqueue_trial({"embedding_dim": 32, "num_transforms": 4})
# This will run the above NPE training up to 25 times
study.optimize(objective, n_trials=25)
best_params = study.best_params
embedding_net = FCEmbedding(
    input_dim=x_train.shape[1],
    output_dim=best_params["embedding_dim"],
)
density_estimator = posterior_nn(
    model="nsf",
    embedding_net=embedding_net,
    num_transforms=best_params["num_transforms"],
)
inference = NPE(prior=prior, density_estimator=density_estimator)
inference.append_simulations(theta, x)
final_estimator = inference.train(training_batch_size=128)
posterior = inference.build_posterior(final_estimator)