How to use GPUs

sbi supports GPU training. A GPU will speed up training if you use a very large batch size, a large embedding network, or high-dimensional simulation outputs. This guide shows you how to train and perform inference on a GPU.

Main syntax

from sbi.inference import NPE

inference = NPE(prior, device="cuda", density_estimator="maf")
density_estimator = inference.append_simulations(theta, x, data_device="cpu").train()

More explanation

When creating the inference object, you can pass the device as an argument. This is the device on which the neural network resides, and thus also the device on which it is trained. The returned density_estimator is on this device as well.

Often, you do not want to keep your entire simulated dataset on the GPU, but instead transfer only individual batches to the GPU. To do this, pass data_device="cpu" to .append_simulations(...).
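
Conceptually, keeping the data on the CPU and moving one batch at a time resembles the plain PyTorch loop below. This is only a sketch of the idea, not sbi's internal training code. The tensor shapes, the batch size, and the fallback to "cpu" when no GPU is present are assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Assumption: fall back to the CPU so the sketch also runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical simulated data; the full dataset stays on the CPU.
theta = torch.randn(1000, 2)
x = torch.randn(1000, 10)

loader = DataLoader(TensorDataset(theta, x), batch_size=100, shuffle=True)

batch_devices = set()
for theta_batch, x_batch in loader:
    # Only the current batch is copied to the training device.
    theta_batch = theta_batch.to(device)
    x_batch = x_batch.to(device)
    batch_devices.add(x_batch.device.type)
```

With this pattern, GPU memory only ever needs to hold the network and a single batch, which matters when the simulated dataset is large.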

Note that the prior must already be on the training device. For example, when passing device="cuda:0", make sure to pass a prior object that was created on that device:

import torch

prior = torch.distributions.MultivariateNormal(
    loc=torch.zeros(2, device="cuda:0"),
    covariance_matrix=torch.eye(2, device="cuda:0")
)
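
One quick way to check that a prior lives on the intended device is to inspect the device of its samples. The snippet below is a minimal sketch of such a check. The fallback to "cpu" is an assumption added so that it also runs on machines without a CUDA device.

```python
import torch

# Assumption: use "cuda:0" if present, otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

prior = torch.distributions.MultivariateNormal(
    loc=torch.zeros(2, device=device),
    covariance_matrix=torch.eye(2, device=device),
)

# Samples are created on the same device as the distribution parameters.
samples = prior.sample((5,))
on_expected_device = samples.device.type == torch.device(device).type
```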

Supported devices

The device is set to "cpu" by default. It can be set to any value that maps to an existing PyTorch GPU device, e.g., device="cuda" or device="cuda:2". sbi will take care of copying the net and the training data to and from the device. We also support MPS as a GPU device for GPU-accelerated training on Apple Silicon chips, i.e., you can pass device="mps".
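
If the same script has to run on machines with different hardware, the device string can be chosen at runtime. The helper below is a hypothetical convenience function (pick_device is not part of sbi) that relies only on PyTorch's availability checks and prefers CUDA, then MPS, then the CPU.

```python
import torch


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", in that order of preference."""
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing in older PyTorch versions, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
# Creating a tensor is a simple way to confirm the device string is usable.
probe = torch.zeros(1, device=device)
```

The resulting string can then be passed as the device argument when creating the inference object.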

Performance

Whether training on a GPU reduces your training time depends on the problem at hand. We provide a couple of default density estimators for NPE, NLE and NRE, e.g., a mixture density network (density_estimator="mdn") or a Masked Autoregressive Flow (density_estimator="maf"). For these default density estimators, we do not expect a speed-up. This is because the underlying neural networks are relatively shallow and small, i.e., they do not have many parameters or matrix operations that benefit from being executed on the GPU.

A speed-up from training on the GPU will most likely become visible when using convolutional modules in your neural networks, e.g., when passing an embedding net for image processing like in this example.
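
To make the convolutional case concrete, here is a minimal sketch of the kind of embedding net that tends to benefit from a GPU. ConvEmbedding, the 1x32x32 image size, and the layer sizes are all hypothetical choices for illustration and are not taken from sbi.

```python
import torch
from torch import nn


class ConvEmbedding(nn.Module):
    """Hypothetical embedding net: maps 1x32x32 images to a feature vector."""

    def __init__(self, num_features: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),   # -> 6 x 32 x 32
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 6 x 16 x 16
            nn.Conv2d(6, 12, kernel_size=5, padding=2),  # -> 12 x 16 x 16
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 12 x 8 x 8
        )
        self.fc = nn.Linear(12 * 8 * 8, num_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x)
        return self.fc(h.flatten(start_dim=1))


# Assumption: fall back to the CPU so the sketch also runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
embedding_net = ConvEmbedding().to(device)

images = torch.randn(4, 1, 32, 32, device=device)  # batch of 4 fake images
features = embedding_net(images)
```

In sbi, a module like this would typically be handed to the density estimator builder, for instance via posterior_nn(model="maf", embedding_net=embedding_net) from sbi.neural_nets. Treat that call as an assumption and check it against the API of your installed version.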