01-load-data.Rmd

---
title: "Context and Data Description"
output: html_document
bibliography: bibliography.bib
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = FALSE)
```


## Context

We will be using the data from the following study [@Dobert_Logging_2017]:

> Döbert, T.F., Webber, B.L., Sugau, J.B., Dickinson, K.J.M. and Didham, R.K. (2017), Logging increases the functional and phylogenetic dispersion of understorey plant communities in tropical lowland rain forest. J Ecol, 105: 1235-1245. https://doi.org/10.1111/1365-2745.12794

Logging is the major cause of forest degradation in the Tropics. The effect of logging on taxonomic diversity is well known but more rarely studied on other facets of biodiversity such as functional diversity and phylogenetic diversity. Functional diversity and phylogenetic diversity should better reflect the impact of logging on ecosystem. For example logging can decrease the functional "redundancy" observed in ecosystems, meaning that some functional traits could be lost.

The tropical lowland rain forests on the island of Borneo are floristically among the most diverse systems on the planet, yet large-scale timber extraction and conversion to commercial tree plantations continue to drive their rapid degradation and loss. As is the case for the majority of tropical forests, the effects of logging on habitat quality in these forests have rarely been assessed, despite the critical implications for biodiversity conservation. Moreover, studies investigating the effects of logging on plant community dynamics across both tropical and temperate forest ecosystems have rarely focused on the understorey, despite its crucial relevance for successional trajectories.

Our goal with this tutorial is to reproduce the analyses from the paper and analyze how logging impacts the different facets of the diversity of the understorey vegetation, and to reveal to what extent similar facets give similar answers. The general goal is to familiarize yourself with the data and functions needed to compute diversity indices. As well the general principles behind them.


## Associated slides

This practical session comes with some slides that cover the general context of
the study as well as some basic facts regarding functional diversity indices.


```{r download-slides, eval = TRUE}
downloadthis::download_link(
  link = "https://github.com/Rekyt/biodiversity_facets_tutorial/raw/main/biodiversity_facets_presentation.odp",
  button_label = "Download Context Slides",
  button_type = "danger",
  has_icon = TRUE,
  icon = "fa fa-save"
)
```


## Loading the data

Fortunately for us, the authors of the study have shared openly the data they used in their article [@Dobert_Data_2018]. They are available through the Dryad platform at the following link: https://doi.org/10.5061/dryad.f77p7

> Döbert, Timm F. et al. (2018), Data from: Logging increases the functional and phylogenetic dispersion of understorey plant communities in tropical lowland rainforest, Dryad, Dataset, https://doi.org/10.5061/dryad.f77p7

The fact that these data researchers provided the full dataset including all data and meta-data will help us reproduce the exact same analyses as well as additional analyses not in their paper.

### Getting the data

To get the data you can follow the above-mentioned link https://doi.org/10.5061/dryad.f77p7 and click on the "Download Dataset" button available on the top right of the webpage. It will download a .zip file that you can unzip in the folder you created for the project. This will create a folder named `doi_10.5061_dryad.f77p7__v1` that contains all needed data files.

```{r download-data, echo = FALSE, eval = TRUE}
downloadthis::download_link(
  link = "https://datadryad.org/stash/downloads/download_resource/5179",
  button_label = "Download Original Data Files",
  button_type = "danger",
  has_icon = TRUE,
  icon = "fa fa-save",
  self_contained = FALSE
)
```


### Summarizing the data

The zip file contains 4 files (also available in the `data/doi_10.5061_dryad.f77p7__v1/` folder):

* `README.txt` which is a text file that describes the content of the other files with great precision. It details all the columns available in the other files.
* `PlotData.csv` is a comma-separated file that describes characteristics for each of the sampled vegetation plots including logging metrics, environmental variables as well as taxonomic, functional and phylogenetic diversity indices (to which we'll compare the indices we compute ourselves).
* `PlotSpeciesData.csv` is a comma-separated file that contains a matrix of biomass values for the plant taxa sampled across the sampled vegetation plots.
* `SpeciesTraitData.csv` contains the complete list of species  sampled across all vegetation plots, with their associated traits both continuous and discrete.

We will load each of the file (apart from the phylogenetic tree) in your workspace now with the `read.csv()` function:

```{r loading-data}
plot_data         = read.csv("data/doi_10.5061_dryad.f77p7__v1/PlotData.csv",
                             na.strings = c("NA", "na"),
                             stringsAsFactors = TRUE)
plot_species_data = read.csv("data/doi_10.5061_dryad.f77p7__v1/PlotSpeciesData.csv")
species_traits    = read.csv("data/doi_10.5061_dryad.f77p7__v1/SpeciesTraitData.csv",
                             na.strings = c("NA", "na"), stringsAsFactors = TRUE)
```

To describe the data we will use the `str()`, `summary()`, and `dim()` functions.

```{r str-summary-data}
str(plot_data)
summary(plot_data)

str(plot_species_data[, 1:5])
summary(head(plot_species_data)[,1:5])
dim(plot_species_data)

# Transform one column for further analyses
species_traits$seed = ordered(species_traits$seed)
str(species_traits)
summary(species_traits)
```

:::: {.questions}
#### Questions to you

* **Q1**: How many plots were sampled?
* **Q2**: How many species are there in the dataset?
* **Q3**: How many traits are available?
* **Q4**: How many of them are continuous? How many of them are discrete?
* **Q5**: What is the most numerous family among all observed species?
* **Q6**: What is the most numerous genus?
::::

### Environment variables

Forest loss proportion is one of the main driver variable. The data has been acquired across different block with different proportion of logging and compared to unlogged forest.

```{r forest-block}
boxplot(forestloss17 ~ block, data = plot_data,
        xlab = "Block of plot", ylab = "Forest loss (%)",
        main = "Forest loss in funciton of block of data")
```


## References