Assay
The `Assay` class stores single-cell data.
For typical scRNA-seq experiments, a Seurat object will have a single Assay ("RNA"). This assay will also store multiple 'transformations' of the data, including raw counts (@counts slot), normalized data (@data slot), and scaled data for dimensional reduction (@scale.data slot).
For more complex experiments, an object could contain multiple assays. These could include multi-modal data types (CITE-seq antibody-derived tags, ADTs), or imputed/batch-corrected measurements. Each of those assays has the option to store the same data transformations as well.
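Here is a rough sketch of the multi-assay case. The snippet builds a tiny `Seurat` object from a made-up RNA count matrix, then adds a second assay that holds made-up antibody-derived tag (ADT) counts for the same cells. The matrices, feature names (`gene1`, `CD3`, and so on) and object names are all hypothetical. It assumes a Seurat version that provides `CreateSeuratObject`, `CreateAssayObject`, and `Assays`.

```r
library(Seurat)

# Hypothetical toy data: 5 genes and 3 ADTs measured in the same 4 cells
cells <- paste0("cell", 1:4)
rna.counts <- matrix(
  data = 0:19,
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), cells)
)
adt.counts <- matrix(
  data = 1:12,
  nrow = 3,
  dimnames = list(c("CD3", "CD4", "CD8"), cells)
)

# CreateSeuratObject stores the first matrix in an assay named "RNA"
obj <- CreateSeuratObject(counts = rna.counts)

# A second Assay must cover the same cells as the object
obj[["ADT"]] <- CreateAssayObject(counts = adt.counts)

# Names of all assays stored in the object
Assays(object = obj)
```

Each assay keeps its own set of features, so the two matrices differ in rows but share columns.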
| Slot | Function |
|---|---|
| `counts` | Stores unnormalized data such as raw counts or TPMs |
| `data` | Normalized data matrix |
| `scale.data` | Scaled data matrix |
| `key` | A character string to facilitate looking up features from a specific `Assay` |
| `var.features` | A vector of features identified as variable |
| `meta.features` | Feature-level meta data |
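To see how the table above maps onto the class definition, you can list the slots of a small `Assay`. This is a minimal sketch that uses a made-up 5 x 4 count matrix. Depending on the Seurat version, the class may define slots that the table does not list, so only the slots from the table are checked here.

```r
library(Seurat)

# Hypothetical toy counts: 5 features (rows) by 4 cells (columns)
counts <- matrix(
  data = 0:19,
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)
toy <- CreateAssayObject(counts = counts)

# All slots defined for the Assay class
slotNames(toy)
```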
Summary information about `Assay` objects can be obtained quickly and easily using standard R functions. Object shape/dimensions can be found using the `dim`, `ncol`, and `nrow` functions; cell and feature names can be found using the `colnames` and `rownames` functions, respectively, or the `dimnames` function.
```r
# The following examples use the RNA assay from the PBMC 3k dataset
> rna
Assay data with 13714 features for 2638 cells
Top 10 variable features:
 PPBP, DOK3, NFE2L2, ARVCF, YPEL2, UBE2D4, FAM210B, CTB-113I20.2, GBGT1,
 GMPPA
# nrow and ncol provide the number of features and cells, respectively
# dim provides both nrow and ncol at the same time
> dim(x = rna)
[1] 13714  2638
# In addition to rownames and colnames, one can use dimnames,
# which provides a two-length list with both rownames and colnames
> head(x = rownames(x = rna))
[1] "AL627309.1"    "AP006222.2"    "RP11-206L10.2" "RP11-206L10.9"
[5] "LINC00115"     "NOC2L"
> head(x = colnames(x = rna))
[1] "AAACATACAACCAC" "AAACATTGAGCTAC" "AAACATTGATCAGC" "AAACCGTGCTTCCG"
[5] "AAACCGTGTATGCG" "AAACGCACTGGTAC"
```
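The `rna` object above comes from the PBMC 3k dataset, and this page does not show how it is built. For a self-contained version, the sketch below tries the same summary functions on a hypothetical toy `Assay`, created with `CreateAssayObject` from a made-up 5 x 4 matrix. The transcript above only uses `dim`, `rownames`, and `colnames`, so this sketch also calls `nrow`, `ncol`, and `dimnames` explicitly.

```r
library(Seurat)

# Hypothetical toy counts: 5 features (rows) by 4 cells (columns)
counts <- matrix(
  data = 0:19,
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)
toy <- CreateAssayObject(counts = counts)

nrow(x = toy)  # number of features
ncol(x = toy)  # number of cells
dim(x = toy)   # both values at once

dn <- dimnames(x = toy)
dn[[1]]        # feature names, same as rownames(toy)
dn[[2]]        # cell names, same as colnames(toy)
```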
Accessing data from an `Assay` object is done in several ways. Expression data is accessed with the `GetAssayData` function. Pulling expression data from the `data` slot can also be done with the single `[` extract operator. Adding expression data to either the `counts`, `data`, or `scale.data` slots can be done with `SetAssayData`. New data must have the same cells in the same order as the current expression data.
```r
# Slicing data using the single [ extract operator can take
# numeric slices or vectors of row/column names
> rna[1:3, 1:3]
3 x 3 sparse Matrix of class "dgCMatrix"
              AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC
AL627309.1                 .              .              .
AP006222.2                 .              .              .
RP11-206L10.2              .              .              .
# GetAssayData allows pulling from a specific slot rather than just data
> GetAssayData(object = rna, slot = 'scale.data')[1:3, 1:3]
              AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC
AL627309.1       -0.06547546    -0.10052277    -0.05804007
AP006222.2       -0.02690776    -0.02820169    -0.04508318
RP11-206L10.2    -0.03596234    -0.17689415    -0.09997719
# SetAssayData example...
```
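The `SetAssayData` step appears only as a placeholder above. What follows is a minimal sketch, not the normalization Seurat itself performs. It takes the raw counts from a hypothetical toy `Assay`, applies a plain `log1p` transform as a stand-in for normalization, and writes the result back into the `data` slot, keeping the assay's cells in the same order. Newer Seurat releases may rename the `slot` argument of `GetAssayData`/`SetAssayData` to `layer`, in which case passing `slot` can trigger a deprecation warning.

```r
library(Seurat)

# Hypothetical toy counts: 5 features (rows) by 4 cells (columns)
counts <- matrix(
  data = 0:19,
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)
toy <- CreateAssayObject(counts = counts)

# Pull the raw counts and apply a simple log1p transform
# (a stand-in for real normalization, chosen only for illustration)
raw <- GetAssayData(object = toy, slot = "counts")
transformed <- log1p(raw)

# Same cells in the same order as the Assay, so it can replace the data slot
toy <- SetAssayData(object = toy, slot = "data", new.data = transformed)

# The single [ operator now returns values from the updated data slot
toy[1:2, 1:2]
```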
Feature-level meta data can be accessed with the double `[[` extract operator, which can also be used to add feature-level meta data. The `HVFInfo` function is a specialized version of the double `[[` extract operator that pulls certain columns from the meta data.
```r
# Feature-level meta data is stored as a data frame
# Standard data frame functions work on the meta data data frame
> colnames(x = rna[[]])
[1] "mean"              "dispersion"        "dispersion.scaled"
# HVFInfo pulls mean, dispersion, and dispersion scaled
# Useful for viewing the results of FindVariableFeatures
> head(x = HVFInfo(object = rna))
                     mean dispersion dispersion.scaled
AL627309.1    0.013555659   1.432845        -0.6236875
AP006222.2    0.004695980   1.458631        -0.5728009
RP11-206L10.2 0.005672517   1.325459        -0.8356099
RP11-206L10.9 0.002644177   0.859264        -1.7556304
LINC00115     0.027437275   1.457477        -0.5750770
NOC2L         0.376037723   1.876440        -0.4162432
# One can pull multiple values from the data frame at any time
> head(x = rna[[c('mean', 'dispersion')]])
                     mean dispersion
AL627309.1    0.013555659   1.432845
AP006222.2    0.004695980   1.458631
RP11-206L10.2 0.005672517   1.325459
RP11-206L10.9 0.002644177   0.859264
LINC00115     0.027437275   1.457477
NOC2L         0.376037723   1.876440
# Passing `drop = TRUE` will turn the meta data into a named vector
# with each entry being named for the feature it corresponds to
> head(x = rna[['mean', drop = TRUE]])
   AL627309.1    AP006222.2 RP11-206L10.2 RP11-206L10.9     LINC00115
  0.013555659   0.004695980   0.005672517   0.002644177   0.027437275
        NOC2L
  0.376037723
# Add meta data example
```
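The `# Add meta data example` placeholder above can be sketched as follows. The sketch assumes that assigning with the double `[[` operator accepts a vector named by feature. The per-feature statistic (total counts per gene) and the column name `total.counts` are hypothetical and exist only to show the round trip through `[[<-` and `[[`.

```r
library(Seurat)

# Hypothetical toy counts: 5 features (rows) by 4 cells (columns)
counts <- matrix(
  data = 0:19,
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)
toy <- CreateAssayObject(counts = counts)

# A made-up feature-level statistic, named by feature
gene.totals <- rowSums(counts)

# Store it as a new column of the feature-level meta data
toy[["total.counts"]] <- gene.totals

# Read it back: list the meta data columns, then pull it as a named vector
colnames(x = toy[[]])
toy[["total.counts", drop = TRUE]]
```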
The vector of variable features can be pulled with the `VariableFeatures` function. `VariableFeatures` can also set the vector of variable features.
```r
# VariableFeatures both accesses and sets the vector of variable features
> head(x = VariableFeatures(object = rna))
[1] "PPBP"   "DOK3"   "NFE2L2" "ARVCF"  "YPEL2"  "UBE2D4"
# Set variable features example
```
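The `# Set variable features example` placeholder above can be sketched with the replacement form `VariableFeatures<-`. In practice the variable features usually come from `FindVariableFeatures`. Here, purely for illustration, the two features with the largest total counts in a hypothetical toy `Assay` are marked as variable.

```r
library(Seurat)

# Hypothetical toy counts: 5 features (rows) by 4 cells (columns)
counts <- matrix(
  data = 0:19,
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)
toy <- CreateAssayObject(counts = counts)

# Illustrative selection rule: the two features with the most total counts
totals <- rowSums(counts)
top <- names(sort(totals, decreasing = TRUE))[1:2]

# Assign, then read back the vector of variable features
VariableFeatures(object = toy) <- top
VariableFeatures(object = toy)
```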
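The key is stored in the `key` slot and can be accessed and set with the `Key` function. It is also used to pull features from a specific assay at the `Seurat` object level, for example with `FetchData`.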
```r
# Key both accesses and sets the key slot for an Assay object
> Key(object = rna)
[1] "rna_"
> Key(object = rna) <- 'myRNA_'
> Key(object = rna)
[1] "myRNA_"
# Pull a feature from the RNA assay on the Seurat level
> head(x = FetchData(object = pbmc, vars = 'rna_MS4A1'))
               rna_MS4A1
AAACATACAACCAC  0.000000
AAACATTGAGCTAC  2.583047
AAACATTGATCAGC  0.000000
AAACCGTGCTTCCG  0.000000
AAACCGTGTATGCG  0.000000
AAACGCACTGGTAC  0.000000
```
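The sketch below shows how the key links an `Assay` to lookups at the `Seurat` level. It builds a small hypothetical object, reads the key of its `RNA` assay, and uses that key as a prefix in `FetchData`. It assumes that `CreateSeuratObject` gives the `RNA` assay the default key `"rna_"`, as in the example above. It also runs `NormalizeData` first so that the `data` slot holds normalized values.

```r
library(Seurat)

# Hypothetical toy counts: 5 features (rows) by 4 cells (columns)
counts <- matrix(
  data = 0:19,
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)
obj <- CreateSeuratObject(counts = counts)
obj <- NormalizeData(object = obj, verbose = FALSE)

# The key of the RNA assay, e.g. "rna_"
rna.key <- Key(object = obj[["RNA"]])

# Key + feature name tells FetchData which assay to pull "gene1" from
fetched <- FetchData(object = obj, vars = paste0(rna.key, "gene1"))
fetched
```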
Methods for the `Assay` class can be found with the following:

```r
library(Seurat)
utils::methods(class = 'Assay')
```
- `[`: access expression data from the `data` slot
- `[[`: access feature-level metadata
- `[[<-`: add feature-level metadata
- `colMeans`: calculate means across columns (cells) of any expression matrix within the `Assay`
- `colSums`: calculate sums across columns (cells) of any expression matrix within the `Assay`
- `dimnames`: get a list with row (feature) and column (cell) names
- `dim`: get the number of features (in `data`) and cells in the `Assay`
- `GetAssayData`: pull one of the expression matrices within the `Assay`
- `HVFInfo`: pull the mean, dispersion, and scaled dispersion of each feature from the feature-level metadata
- `Key`: get the key assigned to the `Assay`
- `Key<-`: ...
- `merge`: ...
- `RenameCells`: ...
- `rowMeans`: calculate means across rows (features) of any expression matrix within the `Assay`
- `rowSums`: calculate sums across rows (features) of any expression matrix within the `Assay`
- `SetAssayData`: add data to or replace one of the expression matrices within the `Assay`
- `SubsetData`: ...
- `VariableFeatures`: pull the names of features designated as variable
- `VariableFeatures<-`: assign a vector of features that are considered variable
- `WhichCells`: ...
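As a quick sketch of the summary methods listed above, the snippet applies `colSums` and `rowMeans` to a hypothetical toy `Assay`, passing no extra arguments. It assumes these methods read from the `data` slot by default. `CreateAssayObject` initially copies the counts into `data`, so in this sketch the results match the plain column sums and row means of the input matrix.

```r
library(Seurat)

# Hypothetical toy counts: 5 features (rows) by 4 cells (columns)
counts <- matrix(
  data = 0:19,
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)
toy <- CreateAssayObject(counts = counts)

colSums(x = toy)   # one value per cell
rowMeans(x = toy)  # one value per feature
```

`colMeans` and `rowSums` follow the same pattern.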