How to use GPUs
sbi supports training on a GPU. A GPU can speed up training if you use a very large
batch size, a large embedding network, or high-dimensional simulation outputs.
This guide shows you how to train and perform inference on a GPU.
Main syntax
from sbi.inference import NPE

inference = NPE(prior, device="cuda", density_estimator="maf")
density_estimator = inference.append_simulations(theta, x, data_device="cpu").train()
More explanation
When creating the inference object, you can pass the device as an argument. This
will be the device that the neural network lies on, and thus also the device that
it is trained on. It is also the device that the returned density_estimator is on.
Often, you do not want to have your entire simulated data on GPU, but instead only
transfer individual batches to GPU. To do this, pass .append_simulations(..., data_device="cpu").
Note that the prior must already be on the training device. For example, when
passing device="cuda:0", make sure to pass a prior object that was created on
that device:
import torch

prior = torch.distributions.MultivariateNormal(
    loc=torch.zeros(2, device="cuda:0"),
    covariance_matrix=torch.eye(2, device="cuda:0"),
)
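Putting the pieces of this section together, a minimal end-to-end sketch might look as follows. The function name `train_npe`, the toy simulator (Gaussian noise added to theta), and the epoch cap are illustrative assumptions and not part of sbi. The imports sit inside the function only so that the snippet can be loaded on machines where torch or sbi are missing.

```python
def train_npe(device: str = "cuda:0", num_simulations: int = 2_000):
    """Train NPE on `device` while keeping the simulated data on the CPU."""
    # Imported lazily (an illustrative choice) so that defining this function
    # does not require torch or sbi to be installed.
    import torch
    from sbi.inference import NPE

    # The prior must live on the training device.
    prior = torch.distributions.MultivariateNormal(
        loc=torch.zeros(2, device=device),
        covariance_matrix=torch.eye(2, device=device),
    )

    # Hypothetical toy simulator: parameters plus Gaussian noise.
    theta = prior.sample((num_simulations,))
    x = theta + 0.1 * torch.randn_like(theta)

    inference = NPE(prior, device=device, density_estimator="maf")
    # data_device="cpu" keeps the full dataset on the CPU; only individual
    # batches are transferred to `device` during training.
    density_estimator = inference.append_simulations(
        theta, x, data_device="cpu"
    ).train(max_num_epochs=50)  # cap chosen only to keep the sketch short
    return density_estimator
```

Calling `train_npe(device="cpu")` runs the same code without a GPU, which can be handy for a quick check before moving to a GPU machine.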
Supported devices
The device defaults to "cpu". It can be set to any string that maps to an
existing PyTorch GPU device, e.g., device="cuda" or device="cuda:2". sbi will
take care of copying the net and the training data to and from the device.
We also support MPS as a GPU device for GPU-accelerated training on an Apple
Silicon chip, i.e., you can pass device="mps".
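If the same script has to run on machines with different hardware, one option is to choose the device string at runtime. The helper below is a sketch, not part of sbi: the name `pick_device` is hypothetical, and it relies only on PyTorch's own availability checks, falling back to "cpu" when nothing else is usable.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", whichever PyTorch reports as usable."""
    # Fall back to the CPU if PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing in older PyTorch releases, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Training device: {device}")
```

The resulting string can be passed as the device argument when creating the inference object. Remember that the prior has to be created on that same device.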
Performance
Whether or not you reduce your training time when training on a GPU depends on
the problem at hand. We provide a couple of default density estimators for
NPE, NLE and NRE, e.g., a mixture density network
(density_estimator="mdn") or a Masked Autoregressive Flow
(density_estimator="maf"). For these default density estimators, we do not
expect a speed-up. This is because the underlying neural networks are relatively
small, i.e., they do not have many parameters or matrix operations that benefit
from being executed on the GPU.
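Because the outcome depends on the problem, it can be worth measuring the wall-clock training time on each device before committing to one. The helper below is a generic sketch using only the Python standard library. The name `time_call` is hypothetical, and the workload shown is a stand-in that you would replace with your own training call.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload so the snippet runs anywhere; replace it with your own
# training call, once per device you want to compare.
result, seconds = time_call(sum, range(1_000_000))
print(f"finished in {seconds:.4f} s")
```

To compare devices, wrap the full training call in a function or lambda, time it once with device="cpu" and once with the GPU device, and use the same simulations and training settings for both runs. The first use of a CUDA device in a process includes some one-off initialization time, so very short runs may make the GPU look slower than it is.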
A speed-up from training on the GPU will most likely become visible when your neural networks contain convolutional modules, e.g., when passing an embedding net for image processing like in this example.
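As a rough, separate sketch of such a setup, a small convolutional embedding net can be combined with a "maf" density estimator as shown here. The function name `build_npe_with_cnn`, the layer sizes, and the assumed image shape (single-channel 32x32, i.e., x of shape (N, 1, 32, 32)) are illustrative assumptions. The sketch also assumes an sbi version that exposes `posterior_nn` under `sbi.neural_nets`.

```python
def build_npe_with_cnn(prior, device: str = "cuda"):
    """Return an NPE object whose density estimator embeds images with a CNN."""
    # Lazy imports (illustrative choice): torch and sbi are needed only on call.
    from torch import nn
    from sbi.inference import NPE
    from sbi.neural_nets import posterior_nn

    # Hypothetical embedding net for single-channel 32x32 images.
    embedding_net = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.MaxPool2d(2),  # feature maps: (8, 16, 16)
        nn.Conv2d(8, 16, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.MaxPool2d(2),  # feature maps: (16, 8, 8)
        nn.Flatten(),
        nn.Linear(16 * 8 * 8, 32),  # 32 learned summary features
    )

    # The embedding net is trained jointly with the density estimator.
    density_estimator = posterior_nn(model="maf", embedding_net=embedding_net)
    return NPE(prior, device=device, density_estimator=density_estimator)
```

As before, the prior passed to this function has to be on the same device, and the simulations can stay on the CPU by passing data_device="cpu" to append_simulations.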