# Horseshoe Shrinkage MCMC

We would like to see whether an adaptive shrinkage prior such as the horseshoe improves variable selection under the negative binomial model. Recall that the hierarchical model with the horseshoe prior on $\boldsymbol{\beta}$ is

$$
\begin{align}
\left[\beta_{k}\mid\lambda_{k}\right] & \overset{\text{indep}}{\sim}\text{Normal}\left(0,\lambda_{k}^{2}\right),\\
\left[\lambda_{k}\mid A\right] & \overset{\text{iid}}{\sim}C^{+}\left(0,A\right),\\
A & \sim\text{Uniform}\left(0,10\right),
\end{align}
$$

where $\beta_{k}$ is adaptively shrunk by its local scale $\lambda_{k}$, the $\lambda_{k}$ are iid half-Cauchy, and $A$ controls the global level of shrinkage. We can rewrite the horseshoe prior using the following parameter expansion:

$$
\begin{align*}
\left[\beta_{k}\mid\eta_{k}\right] & \overset{\text{indep}}{\sim}\text{Normal}\left(0,\eta_{k}^{-1}\right),\\
\left[\eta_{k}\mid\gamma_{k}\right] & \overset{\text{indep}}{\sim}\text{Gamma}\left(\frac{1}{2},\gamma_{k}\right),\\
\left[\gamma_{k}\mid\tau_{A}\right] & \overset{\text{indep}}{\sim}\text{Gamma}\left(\frac{1}{2},\tau_{A}\right),\\
\tau_{A} &\sim\text{Gamma}_{\left[0.01,\infty\right)}\left(-\frac{1}{2},0\right),
\end{align*}
$$
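
As a quick sanity check (not part of the derivation), the sketch below draws $\lambda_{k}=\eta_{k}^{-1/2}$ through this Gamma-Gamma expansion and compares it against direct half-Cauchy draws for a fixed $A$. The Gamma distributions are taken in the rate parameterization, which is consistent with the full conditionals below; the variable names are mine.

```python
# Monte Carlo check (assumption: rate parameterization of the Gamma distributions):
# with tau_A = A^{-2}, gamma_k | tau_A ~ Gamma(1/2, tau_A), eta_k | gamma_k ~ Gamma(1/2, gamma_k),
# lambda_k = eta_k^{-1/2} should be marginally half-Cauchy C+(0, A).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
A = 2.0                       # fix the global scale for the check
tau_A = A ** -2
n = 200_000

gamma_k = rng.gamma(shape=0.5, scale=1.0 / tau_A, size=n)  # Gamma(1/2, rate = tau_A)
eta_k = rng.gamma(shape=0.5, scale=1.0 / gamma_k)          # Gamma(1/2, rate = gamma_k)
lam = eta_k ** -0.5                                        # lambda_k = eta_k^{-1/2}

direct = stats.halfcauchy(scale=A).rvs(size=n, random_state=rng)
print(stats.ks_2samp(lam, direct))  # large p-value => the two samples agree
```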

where we note that $\eta_{k}=\lambda_{k}^{-2}$, $\tau_{A}=A^{-2}$, and $\tau_{A}$ follows a truncated Gamma distribution. The posterior distribution of all parameters of interest is as follows, where we again eliminate the additive effect of the intercept by integrating it out of the posterior distribution:

$$
p\left(\boldsymbol{\beta},\beta_{0},\boldsymbol{\eta},\boldsymbol{\gamma},\tau_{A}\mid\boldsymbol{X},\boldsymbol{y},r,\boldsymbol{\omega}\right)\propto p\left(\boldsymbol{y}\mid\boldsymbol{\psi},\boldsymbol{\omega},r\right)p\left(\boldsymbol{\beta}\mid\boldsymbol{\eta}\right)p\left(\boldsymbol{\eta}\mid\boldsymbol{\gamma}\right)p\left(\boldsymbol{\gamma}\mid\tau_{A}\right)p\left(\tau_{A}\right)p\left(\beta_{0}\right),
$$

$$
\int p\left(\boldsymbol{y}\mid\boldsymbol{\psi},\boldsymbol{\omega},r\right)p\left(\beta_{0}\right)d\beta_{0}\propto\frac{1}{\sqrt{\bar{\boldsymbol{\omega}}}}\exp\left(\frac{\bar{\boldsymbol{\kappa}}^{2}}{2\bar{\boldsymbol{\omega}}}\right)\exp\left(\hat{\boldsymbol{\kappa}}^{T}\boldsymbol{X}\boldsymbol{\beta}-\frac{1}{2}\boldsymbol{\beta}^{T}\boldsymbol{X}^{T}\hat{\boldsymbol{\Omega}}\boldsymbol{X}\boldsymbol{\beta}\right),
$$

where $\bar{\boldsymbol{\kappa}}=\sum_{i}\kappa_{i}$, $\bar{\boldsymbol{\omega}}=\sum_{i}\omega_{i}$, $\hat{\boldsymbol{\Omega}}$ is the normalizing kernel, and $\hat{\boldsymbol{\kappa}}$ is the normalized version of $\boldsymbol{\kappa}$.
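
The exact forms of $\hat{\boldsymbol{\kappa}}$ and $\hat{\boldsymbol{\Omega}}$ are not spelled out here; as an implementation note (my assumption, not quoted from the source), integrating $\beta_{0}$ against a flat prior and completing the square suggests $\hat{\boldsymbol{\kappa}}=\boldsymbol{\kappa}-\left(\bar{\boldsymbol{\kappa}}/\bar{\boldsymbol{\omega}}\right)\boldsymbol{\omega}$ and $\hat{\boldsymbol{\Omega}}=\boldsymbol{\Omega}-\boldsymbol{\omega}\boldsymbol{\omega}^{T}/\bar{\boldsymbol{\omega}}$ with $\boldsymbol{\Omega}=\operatorname{diag}\left(\boldsymbol{\omega}\right)$. A minimal sketch under that assumption:

```python
# Hypothetical helper (assumption: flat prior on beta_0, Polya-Gamma quantities kappa_i, omega_i given).
import numpy as np

def intercept_free_terms(kappa: np.ndarray, omega: np.ndarray):
    """Return (kappa_hat, Omega_hat) after integrating the intercept out."""
    kappa_bar = kappa.sum()
    omega_bar = omega.sum()
    kappa_hat = kappa - (kappa_bar / omega_bar) * omega            # centered kappa
    Omega_hat = np.diag(omega) - np.outer(omega, omega) / omega_bar  # rank-one corrected kernel
    return kappa_hat, Omega_hat
```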

## Full conditional distributions

The posterior of $\boldsymbol{\beta}$ follows a $p$-dimensional multivariate normal distribution

$$
\begin{align*}
\left[\boldsymbol{\beta}\mid\cdots\right] & \sim\text{MVN}\left(\boldsymbol{Q}_{\beta}^{-1}\boldsymbol{\ell}_{\beta},\boldsymbol{Q}_{\beta}^{-1}\right),\\
\boldsymbol{Q}_{\beta} & =\boldsymbol{X}^{T}\hat{\boldsymbol{\Omega}}\boldsymbol{X}+\boldsymbol{H},\\
\boldsymbol{\ell}_{\beta} & =\boldsymbol{X}^{T}\hat{\boldsymbol{\kappa}},
\end{align*}
$$

given $\boldsymbol{H}=\operatorname{diag}\left(\left[\eta_{1},\eta_{2},\cdots,\eta_{p}\right]\right)$.
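
A minimal sketch of this draw (function and argument names are mine), using a Cholesky factor of $\boldsymbol{Q}_{\beta}$ rather than an explicit matrix inverse:

```python
import numpy as np

def draw_beta(X, Omega_hat, kappa_hat, eta, rng):
    """One Gibbs draw of beta ~ MVN(Q^{-1} l, Q^{-1}), Q = X' Omega_hat X + diag(eta)."""
    Q = X.T @ Omega_hat @ X + np.diag(eta)                  # precision matrix Q_beta
    ell = X.T @ kappa_hat                                   # linear term l_beta
    L = np.linalg.cholesky(Q)                               # Q = L L^T
    mean = np.linalg.solve(L.T, np.linalg.solve(L, ell))    # Q^{-1} l_beta
    z = rng.standard_normal(len(eta))
    return mean + np.linalg.solve(L.T, z)                   # add N(0, Q^{-1}) noise
```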

The posterior of $\eta_{k}$ follows a Gamma distribution

$$
\left[\eta_{k}\mid\cdots\right]\sim\text{Gamma}\left(1,\frac{1}{2}\beta_{k}^{2}+\gamma_{k}\right).
$$

The posterior of $\gamma_{k}$ follows a Gamma distribution

$$
\left[\gamma_{k}\mid\cdots\right]\sim\text{Gamma}\left(1,\eta_{k}+\tau_{A}\right).
$$
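
Both of these updates are elementwise Gamma draws (rate parameterization), so they vectorize directly; a minimal sketch with hypothetical names:

```python
import numpy as np

def draw_eta_gamma(beta, gamma, tau_A, rng):
    """Vectorized Gibbs draws of eta_k and gamma_k (second Gamma arguments are rates, not scales)."""
    eta = rng.gamma(shape=1.0, scale=1.0 / (0.5 * beta**2 + gamma))  # Gamma(1, beta_k^2/2 + gamma_k)
    gamma = rng.gamma(shape=1.0, scale=1.0 / (eta + tau_A))          # Gamma(1, eta_k + tau_A)
    return eta, gamma
```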

The posterior of $\tau_{A}$ follows a truncated Gamma distribution

$$
\left[\tau_{A}\mid\cdots\right]\sim\text{Gamma}_{\left[0.01,\infty\right)}\left(\frac{p}{2},\sum_{j=1}^{p}\gamma_{j}\right).
$$
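
A minimal sketch of this truncated draw by inverting the Gamma CDF (names are mine):

```python
import numpy as np
from scipy import stats

def draw_tau_A(gamma, rng, lower=0.01):
    """Draw tau_A ~ Gamma(p/2, sum_j gamma_j) truncated to [lower, inf) via the inverse CDF."""
    p = len(gamma)
    dist = stats.gamma(a=p / 2, scale=1.0 / gamma.sum())  # scale = 1 / rate
    u = rng.uniform(dist.cdf(lower), 1.0)                 # uniform over the allowed CDF range
    return dist.ppf(u)
```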

The conditional distributions for the overdispersion parameter $r$, the intercept $\beta_{0}$, and each $\omega_{i}$ are the same as in the paper.