update and simplify introduction
drbenvincent committed Aug 31, 2024
1 parent 2184277 commit c176b41
Showing 2 changed files with 8 additions and 23 deletions.
19 changes: 5 additions & 14 deletions examples/generalized_linear_models/GLM-simpsons-paradox.ipynb
@@ -18,20 +18,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook covers:\n",
"- [Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) and its resolution through mixed or hierarchical models. This is a situation where there might be a negative relationship between two variables within a group, but when data from multiple groups are combined, that relationship may disappear or even reverse sign. The gif below (from the [Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) Wikipedia page) demonstrates this very nicely.\n",
"- How to build linear regression models, starting with linear regression, moving up to hierarchical linear regression. Simpon's paradox is a nice motivation for why we might want to do this - but of course we should aim to build models which incorporate as much as our knowledge about the structure of the data (e.g. it's nested nature) as possible.\n",
"- Use of `pm.Data` containers to facilitate posterior prediction at different $x$ values with the same model.\n",
"- Providing array dimensions (see `coords`) to models to help with shape problems. This involves the use of [xarray](http://xarray.pydata.org/) and is quite helpful in multi-level / hierarchical models.\n",
"- Differences between posteriors and posterior predictive distributions.\n",
"- How to visualise models in data space and parameter space, using a mixture of [ArviZ](https://arviz-devs.github.io/arviz/) and [matplotlib](https://matplotlib.org/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](https://upload.wikimedia.org/wikipedia/commons/f/fb/Simpsons_paradox_-_animation.gif)"
"[Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) describes a situation where there might be a negative relationship between two variables within a group, but when data from multiple groups are combined, that relationship may disappear or even reverse sign. The gif below (from the [Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) Wikipedia page) demonstrates this very nicely.\n",
"\n",
"![](https://upload.wikimedia.org/wikipedia/commons/f/fb/Simpsons_paradox_-_animation.gif)\n",
"\n",
"This paradox can be resolved by assuming a causal DAG which includes how the main predictor variable _and_ group membership influence the outcome variable. We demonstrate an example where we _don't_ incorporate group membership (so our causal DAG is wrong, or in other words out model is misspecified). We then show 2 wayes to resolve this by including group membership as causal influence upon the outcome variable. This is shown in an unpooled model (which we could also call a fixed effects model) and a hierarchical model (which we could also call a mixed effects model)."
]
},
{
12 changes: 3 additions & 9 deletions examples/generalized_linear_models/GLM-simpsons-paradox.myst.md
@@ -21,18 +21,12 @@ kernelspec:

+++

This notebook covers:
- [Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) and its resolution through mixed or hierarchical models. This is a situation where there might be a negative relationship between two variables within a group, but when data from multiple groups are combined, that relationship may disappear or even reverse sign. The gif below (from the [Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) Wikipedia page) demonstrates this very nicely.
- How to build linear regression models, starting with simple linear regression and moving up to hierarchical linear regression. Simpson's paradox is a nice motivation for why we might want to do this - but of course we should aim to build models which incorporate as much of our knowledge about the structure of the data (e.g. its nested nature) as possible.
- Use of `pm.Data` containers to facilitate posterior prediction at different $x$ values with the same model.
- Providing array dimensions (see `coords`) to models to help with shape problems. This involves the use of [xarray](http://xarray.pydata.org/) and is quite helpful in multi-level / hierarchical models.
- Differences between posteriors and posterior predictive distributions.
- How to visualise models in data space and parameter space, using a mixture of [ArviZ](https://arviz-devs.github.io/arviz/) and [matplotlib](https://matplotlib.org/).

+++
[Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) describes a situation where there might be a negative relationship between two variables within a group, but when data from multiple groups are combined, that relationship may disappear or even reverse sign. The gif below (from the [Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox) Wikipedia page) demonstrates this very nicely.

![](https://upload.wikimedia.org/wikipedia/commons/f/fb/Simpsons_paradox_-_animation.gif)

This paradox can be resolved by assuming a causal DAG which includes how the main predictor variable _and_ group membership influence the outcome variable. We demonstrate an example where we _don't_ incorporate group membership (so our causal DAG is wrong, or in other words our model is misspecified). We then show two ways to resolve this by including group membership as a causal influence upon the outcome variable. This is shown in an unpooled model (which we could also call a fixed effects model) and a hierarchical model (which we could also call a mixed effects model).
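To make the distinction between these two resolutions concrete, here is a minimal PyMC sketch of the hierarchical (mixed effects) model described above. This is an illustration only, not the notebook's actual code: the synthetic `x`, `y`, and `group_idx` arrays are assumptions invented for the example.

```python
import numpy as np
import pymc as pm

# Hypothetical Simpson's-paradox-style data (an assumption for illustration):
# within each group the x-y relationship is negative, but group means rise
# together, so pooling all groups suggests a positive trend.
rng = np.random.default_rng(0)
n_groups, n_obs = 5, 30
group_idx = np.repeat(np.arange(n_groups), n_obs)
x = rng.normal(loc=group_idx, scale=0.3)
y = 2.0 * group_idx - 1.0 * x + rng.normal(scale=0.3, size=x.size)

coords = {"group": np.arange(n_groups)}

with pm.Model(coords=coords) as hierarchical_model:
    # Population-level hyperpriors
    mu_intercept = pm.Normal("mu_intercept", 0, 2)
    sigma_intercept = pm.HalfNormal("sigma_intercept", 2)
    mu_slope = pm.Normal("mu_slope", 0, 2)
    sigma_slope = pm.HalfNormal("sigma_slope", 2)

    # Group-level parameters, partially pooled toward the population means
    intercept = pm.Normal("intercept", mu_intercept, sigma_intercept, dims="group")
    slope = pm.Normal("slope", mu_slope, sigma_slope, dims="group")

    # Likelihood: each observation uses its own group's intercept and slope
    sigma = pm.HalfNormal("sigma", 1)
    mu = intercept[group_idx] + slope[group_idx] * x
    pm.Normal("y_obs", mu, sigma, observed=y)

    idata = pm.sample()
```

The unpooled (fixed effects) variant would simply drop the four hyperpriors and give each group independent priors, e.g. `slope = pm.Normal("slope", 0, 2, dims="group")`, so that no information is shared across groups.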

```{code-cell} ipython3
import arviz as az
import graphviz as gr
