How to choose a diagnostic tool

sbi implements a range of diagnostic tools. Here, we provide an overview of what they do:

expected coverage: allows you to evaluate whether the joint posterior is over- or under-confident (on average across prior predictives). Unlike SBC (see below), it can also identify issues in posterior correlation. Requires relatively few additional simulations (~300).
simulation-based calibration (SBC): allows you to evaluate, for every parameter, whether its marginal posterior is too narrow, too wide, or skewed (on average across prior predictives). Requires relatively few additional simulations (~300).
TARP: can provide a sufficient condition for the posterior to be correct, but requires tuning additional hyperparameters (poorly chosen values can make the diagnostic less powerful).
L-C2ST: can provide a sufficient condition for the posterior given a specific observation to be correct. However, it requires training an additional neural network, and poor convergence of that network can make the diagnostic less powerful. Requires many additional simulations (>1k) to train the additional neural network.
model misspecification checks: These diagnostic tools are different from all other methods: they do not evaluate the quality of the posterior. Instead, they evaluate whether the observation can be generated by the simulator. If these checks fail, then any sbi method will likely perform poorly.
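To make the idea behind SBC concrete, here is a minimal sketch in plain Python. It does not use the sbi API. It assumes a toy model (prior theta ~ N(0, 1), simulator x ~ N(theta, 1)) whose true posterior is known analytically, and the names `sbc_ranks`, `tail_fraction`, and `posterior_std_scale` are hypothetical helpers introduced only for this illustration. For a well-calibrated posterior, the rank of the true parameter among the posterior samples is uniformly distributed; an over-confident (too narrow) posterior piles ranks up at the extremes, and an under-confident (too wide) posterior concentrates them in the middle.

```python
import random


def sbc_ranks(posterior_std_scale, num_sbc_runs=300, num_posterior_samples=100, seed=0):
    """Compute SBC ranks for a toy Gaussian model.

    Assumed toy model: theta ~ N(0, 1), x | theta ~ N(theta, 1).
    The exact posterior is N(x / 2, sqrt(1 / 2)). `posterior_std_scale`
    rescales the posterior standard deviation to mimic a miscalibrated
    estimator (< 1: too narrow, > 1: too wide, 1: exact).
    """
    rng = random.Random(seed)
    ranks = []
    for _ in range(num_sbc_runs):
        # Draw a prior predictive pair (theta, x).
        theta_true = rng.gauss(0.0, 1.0)
        x = rng.gauss(theta_true, 1.0)

        # Sample from the (possibly miscalibrated) posterior given x.
        post_mean = x / 2.0
        post_std = posterior_std_scale * 0.5 ** 0.5
        samples = [rng.gauss(post_mean, post_std) for _ in range(num_posterior_samples)]

        # Rank = number of posterior samples below the true parameter.
        ranks.append(sum(s < theta_true for s in samples))
    return ranks


def tail_fraction(ranks, num_posterior_samples=100, tail=0.1):
    """Fraction of ranks in the lowest or highest `tail` share of the rank range."""
    low = tail * num_posterior_samples
    high = (1.0 - tail) * num_posterior_samples
    return sum(r < low or r > high for r in ranks) / len(ranks)


if __name__ == "__main__":
    for label, scale in [("exact", 1.0), ("too narrow", 0.5), ("too wide", 2.0)]:
        frac = tail_fraction(sbc_ranks(scale))
        # Uniform ranks put roughly 20% of the mass in the two 10% tails.
        print(f"{label:>10}: fraction of ranks in the tails = {frac:.2f}")
```

With uniform ranks, about 20% of them fall into the two outer 10% bands; a clearly larger share suggests an over-confident posterior and a clearly smaller share an under-confident one. In practice you would replace the analytic posterior with samples from your trained estimator and inspect the full rank histogram for every parameter rather than a single summary number.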

Each of these checks offers a different angle on the correctness of the posterior, but all of them require additional simulations. We therefore recommend running as many diagnostic tools as your simulation budget allows.