21-AAV_DLVE.Rmd

---
title: "Compare differentially expressed LVs across AAV datasets"
output:   
  html_notebook: 
    toc: true
    toc_float: true
---

**J. Taroni 2018**

Here, we'll compare what latent variables are differentially expressed in each 
of the ANCA-associated vasculitis data sets.
An advantage of the multiPLIER approach is that we don't have to compare across 
multiple models.
We'll use a cutoff of `FDR < 0.05`. 
Recall that we used "BH" correction for multiple testing in each case.

At first, we'll take the naive approach of just looking at the overlapping
sets of significant LVs.
There's no guarantee that the directionality will be in agreement this way.

## Functions and directory set up

```{r}
# magrittr pipe
`%>%` <- dplyr::`%>%`
```

```{r}
# plot and result directory setup for this notebook
plot.dir <- file.path("plots", "21")
dir.create(plot.dir, recursive = TRUE, showWarnings = FALSE)
results.dir <- file.path("results", "21")
dir.create(results.dir, recursive = TRUE, showWarnings = FALSE)
```


## Find overlap

### Read in data

```{r}
# files with all the limma results
nares.file <- file.path("results", "18", 
                        "NARES_recount2_model_LV_limma_results.tsv")
blood.file <- file.path("results", "19", 
                        "GPA_blood_recount2_model_LV_limma_results.tsv")
kidney.file <- file.path("results", "20", 
                         "ERCB_glom_recount2_model_LV_limma_results.tsv")
```

```{r}
nares.df <- readr::read_tsv(nares.file)
blood.df <- readr::read_tsv(blood.file)
kidney.df <- readr::read_tsv(kidney.file)
```

### Identify significant LVs

```{r}
# initialize list to hold significant pathways for each of the datasets -- this
# is what VennDiagram functions generally take as an argument
sig.list <- list()
sig.list[["nares"]] <- nares.df$LV[which(nares.df$adj.P.Val < 0.05)]
sig.list[["blood"]] <- blood.df$LV[which(blood.df$adj.P.Val < 0.05)]
sig.list[["kidney"]] <- kidney.df$LV[which(kidney.df$adj.P.Val < 0.05)]
```

### Overlap

```{r}
overlap <- VennDiagram::calculate.overlap(sig.list)
overlap.file <- file.path(results.dir, "AAV_FDR_0.05_overlap.RDS")
saveRDS(overlap, file = overlap.file)
```

### Venn Diagram

```{r}
vd.file <- file.path(plot.dir, "AAV_FDR_0.05_overlap.png")
VennDiagram::venn.diagram(x = sig.list, filename = vd.file,
                          imagetype = "png",
                          resolution = 600,
                          category.names = c("NARES", "GPA blood", 
                                             "ERCB glomeruli"))
```

### What are the 22 LVs that are differentially expressed in all datasets/tissues?

```{r}
# which element of the overlap list are we looking for?
lapply(overlap, length)
```

```{r}
overlap$a5
```

```{r}
# remove files & data.frame no longer needed once we have the overlap
rm(list = setdiff(ls(), c("%>%", "overlap", "plot.dir", "results.list")))
```

## Plotting overlapping LVs

```{r}
lvs.to.plot <- overlap$a5
```

### B data.frames

We'll use these, from the individual dataset notebooks, for plotting.

```{r}
nares.file <- file.path("results", "18", 
                        "NARES_recount2_model_B_long_sample_info.tsv")
nares.df <- readr::read_tsv(nares.file)
gpa.file <- file.path("results", "19", 
                      "GPA_blood_recount2_model_B_long_sample_info.tsv")
gpa.df <- readr::read_tsv(gpa.file)
kidney.file <- file.path("results", "20", 
                         "ERCB_glom_recount2_model_B_long_sample_info.tsv")
kidney.df <- readr::read_tsv(kidney.file)
```

#### Recoding and reordering

We'll want the plots to be in the same general order and for the x-axis labels
to display well.

```{r}
# Nasal brushing data
nares.df <- nares.df %>%  
  dplyr::mutate(Classification =
                  dplyr::recode(Classification,
                    `GPA with active nasal disease` = "GPA active",
                    `GPA with prior nasal disease` = "GPA prior",
                    `GPA (no prior nasal disease)` = "GPA none"
                  )) %>%
  dplyr::mutate(Classification = 
                  factor(Classification, 
                            levels = c("GPA active",
                                       "GPA prior",
                                       "GPA none",
                                       "EGPA",
                                       "Allergic Rhinitis",
                                       "Sarcoidosis",
                                       "Healthy")))
```

```{r}
# GPA PBMC data
gpa.df <- gpa.df %>%
  dplyr::mutate(GPA_signature = dplyr::recode(GPA_signature, 
                                              GPApos = "GPA-positive",
                                              GPAneg = "GPA-negative",
                                              Control = "Control")) %>%
  dplyr::mutate(GPA_signature = factor(GPA_signature,
                                       levels = c("GPA-positive", 
                                                  "GPA-negative", 
                                                  "Control"))) 
```

```{r}
kidney.df <- kidney.df %>%
  dplyr::mutate(Diagnosis = factor(Diagnosis, 
                                   levels = c("Vasculitis", 
                                              "Nephrotic syndrome",
                                              "Living donor control")))
```

### Plotting functions

```{r}
mean_ci <- function(x) ggplot2::mean_se(x, mult = 2)

# custom ggplot2 theme
custom_theme <- function() {
  ggplot2::theme_bw() +
    ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, 
                                                       hjust = 1),
                   legend.position = "none",
                   plot.title = ggplot2::element_text(hjust = 0.5, 
                                                      face = "bold"),
                   axis.title.x = ggplot2::element_blank()) 
}
```

```{r}
# wrapper function for plotting the three datasets -- this is only intended to 
# be used in this environment
MultiPlotWrapper <- function(lv.name) {
  # lv.name: string to be passed to dplyr::filter -- one LV plotted at a time
  # returns a 3 panel plot with strip charts from the three tissues, the output
  # of cowplot::plot_grid
  
  #### NARES ####
  nares.p <- nares.df %>%
    dplyr::filter(LV == lv.name) %>%
    ggplot2::ggplot(ggplot2::aes(x = Classification, y = Value)) +
      ggplot2::geom_jitter(width = 0.2, alpha = 0.6,
                           ggplot2::aes(colour = Classification)) +
      ggplot2::stat_summary(geom = "pointrange", 
                            fun.data = mean_ci) +
      ggplot2::labs(y = lv.name, title = "NARES") +
      ggplot2::scale_color_manual(values = c("#2F4F4F", "#191970", "#20B2AA",
                                             "#666666", "#CDCD00", "#FF4500",
                                             "#FF8C69")) +
      custom_theme()
  
  #### Glomeruli ####
  glom.p <- kidney.df %>% 
    dplyr::filter(LV == lv.name) %>%
    ggplot2::ggplot(ggplot2::aes(x = Diagnosis, y = Value)) +
      ggplot2::geom_jitter(width = 0.2, alpha = 0.6,
                           ggplot2::aes(colour = Diagnosis)) +
      ggplot2::stat_summary(geom = "pointrange",
                            fun.data = mean_ci) +
      ggplot2::labs(y = lv.name, title = "Glomeruli") +
      ggplot2::scale_color_manual(values = c("#2F4F4F", "#666666", "#FF8C69")) +
      custom_theme()
  
  #### PBMC ####
  pbmc.p <- gpa.df %>%
    dplyr::filter(LV == lv.name) %>%
    ggplot2::ggplot(ggplot2::aes(x = GPA_signature, y = Value)) +
      ggplot2::geom_jitter(width = 0.2, alpha = 0.6,
                         ggplot2::aes(colour = GPA_signature)) +
      ggplot2::stat_summary(geom = "pointrange", 
                            fun.data = mean_ci) +
      ggplot2::labs(y = lv.name, title = "GPA PBMCs") +
      ggplot2::scale_color_manual(values = c("#2F4F4F", "#20B2AA", "#FF8C69")) +
      custom_theme()  
  
  #### Return multipanel plot ####
  return(cowplot::plot_grid(nares.p, glom.p, pbmc.p, align = "h", ncol = 3))
    
}

```

### Plotting itself

```{r}
# list to hold the multipanel plots
plot.list <- lapply(lvs.to.plot, MultiPlotWrapper)
```

```{r}
plot.file <- file.path(plot.dir, "significant_LVs_3_AAV_sets.pdf")
pdf(plot.file, width = 8, height = 6, onefile = TRUE)
plot.list
dev.off()
```