scDEED (single-cell dubious embeddings detector): a statistical method for detecting dubious non-linear embeddings
- This package assesses the reliability of non-linear dimension reduction embeddings. It provides functions to detect dubious cells and trustworthy cells in tSNE and UMAP embeddings. By minimizing the number of dubious cells, its functions also find the best perplexity for tSNE and the best n.neighbors/min.dist pair for UMAP.
- Choose a suitable number of dimensions for PCA (num_pc).
- The input count matrix should have genes as rows and cells as columns.
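Because the orientation of the count matrix is easy to get wrong, it can help to check it before calling any scDEED function. The following is a minimal base R sketch, not part of scDEED; the toy matrix, its values, and the variable names are made up for illustration.

```r
# Toy count matrix stored the "wrong" way round: 3 cells (rows) x 4 genes (columns).
# All names and values here are hypothetical.
cells_by_genes <- matrix(
  c(0, 2, 1, 5,
    3, 0, 0, 1,
    1, 1, 4, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), paste0("gene", 1:4))
)

# Transpose so that genes are rows and cells are columns.
genes_by_cells <- t(cells_by_genes)

dim(genes_by_cells)       # number of genes, then number of cells
rownames(genes_by_cells)  # gene names
colnames(genes_by_cells)  # cell names
```

Inspecting `dim()` and the dimnames this way is only a sanity check; whether a given object needs transposing depends on how it was loaded.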
You can install the released version of scDEED from GitHub with:
library(devtools)
devtools::install_github("JSB-UCLA/scDEED")
This is a basic example showing how to find the best parameters. With our example input data and the default parameter list(s), the results are available in about 2 minutes. As a demo, we use example input data consisting of 10,000 cells randomly sampled from the Hydra data:
suppressPackageStartupMessages(library(scDEED))
data(input_counts)
chooseK(input_counts)
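This README does not describe how `chooseK` builds its plot. As a generic illustration of how a number of principal components might be chosen, the sketch below runs base R's `prcomp` on simulated data and reports the cumulative proportion of variance explained. It is not scDEED's implementation; the simulated matrix and the 90% threshold are arbitrary assumptions.

```r
set.seed(1)
# Simulated stand-in for a normalized expression matrix: 50 genes x 200 cells.
expr <- matrix(rnorm(50 * 200), nrow = 50, ncol = 200)

# prcomp expects observations (cells) in rows, so transpose the genes x cells matrix.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component, and its running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)

# Smallest number of components reaching the (arbitrary) 90% threshold.
num_pc_candidate <- which(cum_explained >= 0.9)[1]
num_pc_candidate
```

On real data the variance-explained curve usually flattens much earlier than on random noise, so a visual elbow is often a more useful guide than a fixed threshold.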
ChooseK plot:
umap_example <- scDEED(input_counts, num_pc = 16, use_method = "umap", visualization = TRUE)
head(umap_example$`number of dubious cells corresponding to pair of n.neighbors and min.dist list`)
| | n.neighbors | min.dist | number of dubious cells |
|---|---|---|---|
| 1 | 5 | 0.1 | 42 |
| 2 | 6 | 0.1 | 54 |
| 3 | 7 | 0.1 | 39 |
| 4 | 8 | 0.1 | 76 |
| 5 | 9 | 0.1 | 29 |
| 6 | 10 | 0.1 | 43 |
umap_example$`best pair of n.neighbors and min.dist`
5 0.5
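The best pair is the one that minimizes the number of dubious cells. A minimal sketch of that selection, applied by hand to the six rows printed by `head()` in this example, might look as follows. The data frame and the shortened column name `dubious` are hypothetical. The six rows cover only min.dist = 0.1, so the row selected here differs from the best pair reported over the full grid.

```r
# The six rows printed by head() in this example, re-entered by hand.
umap_head <- data.frame(
  n.neighbors = c(5, 6, 7, 8, 9, 10),
  min.dist = rep(0.1, 6),
  dubious = c(42, 54, 39, 76, 29, 43)
)

# Row with the fewest dubious cells among these six rows only.
best_row <- umap_head[which.min(umap_head$dubious), ]
best_row
```

Note that `which.min` returns the first minimum, so ties would be resolved in favor of the earlier row.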
Comparison of UMAP plots of the 10,000 randomly selected cells from the Hydra dataset with n.neighbors = 50, min.dist = 0.7 and with n.neighbors = 5, min.dist = 0.5 (optimized by scDEED):
umap_example$`plot. # of dubious embeddings vs pair of n.neighbors and min.dist`
Plot of the number of dubious embeddings versus the pair of n.neighbors and min.dist for UMAP:
tsne_example <- scDEED(input_counts, num_pc = 10, use_method = "tsne", visualization = TRUE)
head(tsne_example$`number of dubious cells corresponding to perplexity list`)
| | perplexity | number of dubious cells |
|---|---|---|
| 1 | 20 | 323 |
| 2 | 50 | 6 |
| 3 | 80 | 7 |
| 4 | 110 | 10 |
| 5 | 140 | 13 |
| 6 | 170 | 12 |
tsne_example$`best perplexity`
50
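To see how the candidate perplexities compare beyond the single best value, the printed rows can be ranked by their dubious-cell counts. This is a small base R sketch on hand-entered values from the six rows shown by `head()` in this example; the data frame and the column name `dubious` are hypothetical, and the full result may contain more rows.

```r
# The six rows printed by head() in this example, re-entered by hand.
tsne_head <- data.frame(
  perplexity = c(20, 50, 80, 110, 140, 170),
  dubious = c(323, 6, 7, 10, 13, 12)
)

# Sort from fewest to most dubious cells.
ranked <- tsne_head[order(tsne_head$dubious), ]
ranked$perplexity
```

Among these six rows, the top-ranked perplexity agrees with the reported best perplexity of 50, and perplexity 20 stands out as far worse than the rest.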
Comparison of tSNE plots of the 10,000 randomly selected cells from the Hydra dataset with perplexity = 20 and with perplexity = 50 (optimized by scDEED):
tsne_example$`plot. # of dubious embeddings vs parameters`
Plot of the number of dubious embeddings versus perplexity for tSNE: