    \caption{Comparison in the DeepSea environment.}\label{fig:deepsea_best_run_8}
  \hspace*{\fill}   % maximize separation between the subfigures
    \caption{Comparison of LSVI-PHE for different $M$ values in DeepSea.}\label{fig:deepsea_sweep_M_sig0.0005}
(a) Results are averaged over 5 independent runs, with error bars shown for the return per episode. Here, $\beta = 5 \times 10^{-3}$ for LSVI-UCB and $\sigma^2 = 5 \times 10^{-5}$ for LSVI-PHE. (b) Results are averaged over 5 runs, with error bars shown for the return per episode. Here, we fix $\sigma^2 = 5 \times 10^{-4}$.





