How to tune hyperparameters with Optuna

This guide shows a minimal Optuna loop for hyperparameter tuning in sbi. Optuna is a lightweight hyperparameter optimization library; see its documentation for the full API. You define an objective function that trains a model (e.g., NPE) and returns a validation metric, and Optuna runs multiple trials to explore the search space and track the best configuration. As the validation metric, we recommend the negative log probability of a held-out validation set (theta, x) under the current posterior estimate (see Lueckmann et al. 2021 for details).
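Written out, the recommended metric is the average negative log probability over the held-out pairs. The symbols below are notation introduced here for illustration: N is the number of validation pairs and q_phi is the current posterior estimate.

```latex
\mathcal{L}_{\text{val}} = -\frac{1}{N} \sum_{i=1}^{N} \log q_{\phi}(\theta_i \mid x_i)
```

Lower values mean the estimator assigns higher probability to the held-out parameters given their simulated observations.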

Optuna is not a dependency of sbi; you need to install it yourself in your environment.
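Since Optuna is optional, a script can check for it up front and fail with a readable message. The following is a minimal sketch using only the standard library; the helper name require_package is hypothetical (it is not part of sbi or Optuna), and the pip command in the message is just the usual install route, so adjust it for conda or other environments.

```python
import importlib.util


def require_package(name: str) -> None:
    """Raise a readable error if an optional package cannot be imported."""
    # find_spec returns None when no top-level module of that name is found.
    if importlib.util.find_spec(name) is None:
        raise ImportError(
            f"'{name}' is not installed. Install it first, e.g. `pip install {name}`."
        )
```

Calling require_package("optuna") at the top of a tuning script would then stop early if the package is missing.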

Here, we use a toy simulator and do NPE with an embedding network built using the posterior_nn helper. We tune just two hyperparameters: the embedding dimension and the number of flow transforms in an nsf density estimator.

Set up a tiny simulation task

import optuna
import torch

from sbi.inference import NPE
from sbi.neural_nets import posterior_nn
from sbi.neural_nets.embedding_nets import FCEmbedding
from sbi.utils import BoxUniform

torch.manual_seed(0)


def simulator(theta):
    return theta + 0.1 * torch.randn_like(theta)


prior = BoxUniform(low=-2 * torch.ones(2), high=2 * torch.ones(2))

theta = prior.sample((6000,))
x = simulator(theta)
# Use a separate validation data set for optuna
theta_train, x_train = theta[:5000], x[:5000]
theta_val, x_val = theta[5000:], x[5000:]

Define the Optuna objective

Optuna expects the objective function to return a scalar value that it will optimize. When creating a study, you specify the optimization direction: direction="minimize" to find the configuration with the lowest objective value, or direction="maximize" for the highest. Here, we minimize the negative log probability of a held-out validation set (stored as nll in the code), so lower is better.

def objective(trial):
    # Optuna will track these parameters internally.
    embedding_dim = trial.suggest_categorical("embedding_dim", [16, 32, 64])
    num_transforms = trial.suggest_int("num_transforms", 2, 6)

    embedding_net = FCEmbedding(input_dim=x_train.shape[1], output_dim=embedding_dim)
    density_estimator = posterior_nn(
        model="nsf",
        embedding_net=embedding_net,
        num_transforms=num_transforms,
    )

    inference = NPE(prior=prior, density_estimator=density_estimator)
    inference.append_simulations(theta_train, x_train)
    estimator = inference.train(
        training_batch_size=128,
        show_train_summary=False,
    )
    posterior = inference.build_posterior(estimator)

    with torch.no_grad():
        nll = -posterior.log_prob_batched(theta_val.unsqueeze(0), x=x_val).mean().item()
    # Return the metric to be optimized by Optuna.
    return nll

Run the study and retrain

Optuna defaults to the TPE (Tree-structured Parzen Estimator) sampler, which is a good starting point for many experiments. TPE is a Bayesian optimization method that models good vs. bad trials with nonparametric densities and samples new points that are likely to improve the objective. You can swap in other samplers (random search, Gaussian Process-based, etc.) by passing a different sampler instance to create_study.

The TPE sampler uses n_startup_trials random trials to seed the model. With n_trials=25 and n_startup_trials=10, the first 10 trials are random and the remaining 15 are guided by the acquisition function. To make sure the default configuration is evaluated, enqueue it before optimization; the enqueued trial runs first and counts toward n_trials.

sampler = optuna.samplers.TPESampler(n_startup_trials=10)
study = optuna.create_study(direction="minimize", sampler=sampler)
# Optional: ensure the default config is evaluated
study.enqueue_trial({"embedding_dim": 32, "num_transforms": 4})
# This will run the above NPE training up to 25 times
study.optimize(objective, n_trials=25)

best_params = study.best_params
embedding_net = FCEmbedding(
    input_dim=x_train.shape[1],
    output_dim=best_params["embedding_dim"],
)
density_estimator = posterior_nn(
    model="nsf",
    embedding_net=embedding_net,
    num_transforms=best_params["num_transforms"],
)

inference = NPE(prior=prior, density_estimator=density_estimator)
inference.append_simulations(theta, x)
final_estimator = inference.train(training_batch_size=128)
posterior = inference.build_posterior(final_estimator)