# Horseshoe Shrinkage MCMC

We would like to see whether an adaptive shrinkage prior such as the horseshoe improves variable selection under the negative binomial model. Recall that the hierarchical model with the horseshoe prior on $\boldsymbol{\beta}$ is

$$
\begin{align}
\left[\beta_{k}\mid\lambda_{k}\right] & \overset{\text{indep}}{\sim}\text{Normal}\left(0,\lambda_{k}^{2}\right),\\
\left[\lambda_{k}\mid A\right] & \overset{\text{iid}}{\sim}C^{+}\left(0,A\right),\\
A & \sim\text{Uniform}\left(0,10\right),
\end{align}
$$

where $\beta_{k}$ is adaptively shrunk by its local scale $\lambda_{k}$, the $\lambda_{k}$ are iid half-Cauchy, and $A$ controls the global level of shrinkage. We can rewrite the horseshoe prior using the following parameter expansion:

$$
\begin{align*}
\left[\beta_{k}\mid\eta_{k}\right] & \overset{\text{indep}}{\sim}\text{Normal}\left(0,\eta_{k}^{-1}\right),\\
\left[\eta_{k}\mid\gamma_{k}\right] & \overset{\text{indep}}{\sim}\text{Gamma}\left(\frac{1}{2},\gamma_{k}\right),\\
\left[\gamma_{k}\mid\tau_{A}\right] & \overset{\text{indep}}{\sim}\text{Gamma}\left(\frac{1}{2},\tau_{A}\right),\\
\tau_{A} &\sim\text{Gamma}_{\left[0.01,\infty\right)}\left(-\frac{1}{2},0\right),
\end{align*}
$$
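
As a quick sanity check (not part of the derivation), the sketch below draws $\lambda_{k}=\eta_{k}^{-1/2}$ through this Gamma-Gamma expansion and compares it against direct half-Cauchy draws for a fixed $A$. The Gamma distributions are taken in the rate parameterization, which is consistent with the full conditionals below; the variable names are mine.

```python
# Monte Carlo check (assumption: rate parameterization of the Gamma distributions):
# with tau_A = A^{-2}, gamma_k | tau_A ~ Gamma(1/2, tau_A), eta_k | gamma_k ~ Gamma(1/2, gamma_k),
# lambda_k = eta_k^{-1/2} should be marginally half-Cauchy C+(0, A).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
A = 2.0                       # fix the global scale for the check
tau_A = A ** -2
n = 200_000

gamma_k = rng.gamma(shape=0.5, scale=1.0 / tau_A, size=n)  # Gamma(1/2, rate = tau_A)
eta_k = rng.gamma(shape=0.5, scale=1.0 / gamma_k)          # Gamma(1/2, rate = gamma_k)
lam = eta_k ** -0.5                                        # lambda_k = eta_k^{-1/2}

direct = stats.halfcauchy(scale=A).rvs(size=n, random_state=rng)
print(stats.ks_2samp(lam, direct))  # large p-value => the two samples agree
```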

where we note that $\eta_{k}=\lambda_{k}^{-2}$, $\tau_{A}=A^{-2}$, and $\tau_{A}$ follows a truncated Gamma distribution. The posterior distribution of all parameters of interest is as follows, where we again eliminate the additive effect of the intercept by integrating it out of the posterior distribution:

$$
p\left(\boldsymbol{\beta},\beta_{0},\boldsymbol{\eta},\boldsymbol{\gamma},\tau_{A}\mid\boldsymbol{X},\boldsymbol{y},r,\boldsymbol{\omega}\right)\propto p\left(\boldsymbol{y}\mid\boldsymbol{\psi},\boldsymbol{\omega},r\right)p\left(\boldsymbol{\beta}\mid\boldsymbol{\eta}\right)p\left(\boldsymbol{\eta}\mid\boldsymbol{\gamma}\right)p\left(\boldsymbol{\gamma}\mid\tau_{A}\right)p\left(\tau_{A}\right)p\left(\beta_{0}\right),
$$

$$
\int p\left(\boldsymbol{y}\mid\boldsymbol{\psi},\boldsymbol{\omega},r\right)p\left(\beta_{0}\right)d\beta_{0}\propto\frac{1}{\sqrt{\bar{\boldsymbol{\omega}}}}\exp\left(\frac{\bar{\boldsymbol{\kappa}}^{2}}{2\bar{\boldsymbol{\omega}}}\right)\exp\left(\hat{\boldsymbol{\kappa}}^{T}\boldsymbol{X}\boldsymbol{\beta}-\frac{1}{2}\boldsymbol{\beta}^{T}\boldsymbol{X}^{T}\hat{\boldsymbol{\Omega}}\boldsymbol{X}\boldsymbol{\beta}\right),
$$

where $\bar{\boldsymbol{\kappa}}=\sum_{i}\kappa_{i}$, $\bar{\boldsymbol{\omega}}=\sum_{i}\omega_{i}$, $\hat{\boldsymbol{\Omega}}$ is the normalizing kernel, and $\hat{\boldsymbol{\kappa}}$ is the normalized version of $\boldsymbol{\kappa}$.
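
The exact forms of $\hat{\boldsymbol{\kappa}}$ and $\hat{\boldsymbol{\Omega}}$ are not spelled out here; as an implementation note (my assumption, not quoted from the source), integrating $\beta_{0}$ against a flat prior and completing the square suggests $\hat{\boldsymbol{\kappa}}=\boldsymbol{\kappa}-\left(\bar{\boldsymbol{\kappa}}/\bar{\boldsymbol{\omega}}\right)\boldsymbol{\omega}$ and $\hat{\boldsymbol{\Omega}}=\boldsymbol{\Omega}-\boldsymbol{\omega}\boldsymbol{\omega}^{T}/\bar{\boldsymbol{\omega}}$ with $\boldsymbol{\Omega}=\operatorname{diag}\left(\boldsymbol{\omega}\right)$. A minimal sketch under that assumption:

```python
# Hypothetical helper (assumption: flat prior on beta_0, Polya-Gamma quantities kappa_i, omega_i given).
import numpy as np

def intercept_free_terms(kappa: np.ndarray, omega: np.ndarray):
    """Return (kappa_hat, Omega_hat) after integrating the intercept out."""
    kappa_bar = kappa.sum()
    omega_bar = omega.sum()
    kappa_hat = kappa - (kappa_bar / omega_bar) * omega            # centered kappa
    Omega_hat = np.diag(omega) - np.outer(omega, omega) / omega_bar  # rank-one corrected kernel
    return kappa_hat, Omega_hat
```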

## Full conditional distributions

The posterior of $\boldsymbol{\beta}$ follows a $p$-dimensional multivariate normal distribution

$$
\begin{align*}
\left[\boldsymbol{\beta}\mid\cdots\right] & \sim\text{MVN}\left(\boldsymbol{Q}_{\beta}^{-1}\boldsymbol{\ell}_{\beta},\boldsymbol{Q}_{\beta}^{-1}\right),\\
\boldsymbol{Q}_{\beta} & =\boldsymbol{X}^{T}\hat{\boldsymbol{\Omega}}\boldsymbol{X}+\boldsymbol{H},\\
\boldsymbol{\ell}_{\beta} & =\boldsymbol{X}^{T}\hat{\boldsymbol{\kappa}},
\end{align*}
$$

given $\boldsymbol{H}=\operatorname{diag}\left(\left[\eta_{1},\eta_{2},\cdots,\eta_{p}\right]\right)$.
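
A minimal sketch of this draw (function and argument names are mine), using a Cholesky factor of $\boldsymbol{Q}_{\beta}$ rather than an explicit matrix inverse:

```python
import numpy as np

def draw_beta(X, Omega_hat, kappa_hat, eta, rng):
    """One Gibbs draw of beta ~ MVN(Q^{-1} l, Q^{-1}), Q = X' Omega_hat X + diag(eta)."""
    Q = X.T @ Omega_hat @ X + np.diag(eta)                  # precision matrix Q_beta
    ell = X.T @ kappa_hat                                   # linear term l_beta
    L = np.linalg.cholesky(Q)                               # Q = L L^T
    mean = np.linalg.solve(L.T, np.linalg.solve(L, ell))    # Q^{-1} l_beta
    z = rng.standard_normal(len(eta))
    return mean + np.linalg.solve(L.T, z)                   # add N(0, Q^{-1}) noise
```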

The posterior of $\eta_{k}$ follows a Gamma distribution

$$
\left[\eta_{k}\mid\cdots\right]\sim\text{Gamma}\left(1,\frac{1}{2}\beta_{k}^{2}+\gamma_{k}\right).
$$

The posterior of $\gamma_{k}$ follows a Gamma distribution

$$
\left[\gamma_{k}\mid\cdots\right]\sim\text{Gamma}\left(1,\eta_{k}+\tau_{A}\right).
$$
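
Both of these updates are elementwise Gamma draws (rate parameterization), so they vectorize directly; a minimal sketch with hypothetical names:

```python
import numpy as np

def draw_eta_gamma(beta, gamma, tau_A, rng):
    """Vectorized Gibbs draws of eta_k and gamma_k (second Gamma arguments are rates, not scales)."""
    eta = rng.gamma(shape=1.0, scale=1.0 / (0.5 * beta**2 + gamma))  # Gamma(1, beta_k^2/2 + gamma_k)
    gamma = rng.gamma(shape=1.0, scale=1.0 / (eta + tau_A))          # Gamma(1, eta_k + tau_A)
    return eta, gamma
```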

The posterior of $\tau_{A}$ follows a truncated Gamma distribution

$$
\left[\tau_{A}\mid\cdots\right]\sim\text{Gamma}_{\left[0.01,\infty\right)}\left(\frac{p}{2},\sum_{j=1}^{p}\gamma_{j}\right).
$$
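
A minimal sketch of this truncated draw by inverting the Gamma CDF (names are mine):

```python
import numpy as np
from scipy import stats

def draw_tau_A(gamma, rng, lower=0.01):
    """Draw tau_A ~ Gamma(p/2, sum_j gamma_j) truncated to [lower, inf) via the inverse CDF."""
    p = len(gamma)
    dist = stats.gamma(a=p / 2, scale=1.0 / gamma.sum())  # scale = 1 / rate
    u = rng.uniform(dist.cdf(lower), 1.0)                 # uniform over the allowed CDF range
    return dist.ppf(u)
```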

The conditional distributions for the overdispersion parameter $r$, the intercept $\beta_{0}$, and each $\omega_{i}$ are the same as in the paper.