Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

odd result from Lee et al. 2019 data #38

Open
sshen82 opened this issue Apr 29, 2023 · 0 comments
Open

odd result from Lee et al. 2019 data #38

sshen82 opened this issue Apr 29, 2023 · 0 comments

Comments

@sshen82
Copy link

sshen82 commented Apr 29, 2023

Hi,
I was using ALLCools on the Lee et al. 2019 single-cell methylation data (https://pubmed.ncbi.nlm.nih.gov/31501549/). The code I use to generate the mcds data is something like below:

allcools generate-dataset --allc_table /p/keles/schic/volumeD/HumanPatchSeq/allcfiles.tsv --output_path /p/keles/schic/volumeD/HumanPatchSeq/Lee2019.mcds --chrom_size_path hg19.chrom.sizes--obs_dim cell --cpu 30 --chunk_size 50 --regions geneslop2k /p/keles/schic/volumeD/HumanPatchSeq/hg19.bed --quantifiers geneslop2k count CGN,CHN

I load the output data like what is in the example. However, when I look at the data using mcds.get_adata(mc_type = "CHN").to_df(), the data shown is like below:

image

Some cells are consistently 10 among all genes. Here is the result if you want to check it. https://drive.google.com/drive/folders/1PQ5jhvTFuRsm9qn8oVDRS9MzCkiMQnx3?usp=share_link

Is this the problem of the data of am I doing something wrong?

Edit: I guess it is the easiest if you can tell me how you do integration between single-cell methylation and other transcriptomics data in a bit more detail? Especially how to form the cell by gene matrix for methylation? Previously, I directly calculate mCH fraction by first filtering out those CG reads, and among all the rest CH reads, for each cell, I calculate the mean of the fifth column in the gene body divided by the mean methylation of the whole cell. However, this approach failed. Below is the r file if you are interested.

library(foreach)
data(Annotations)

# generate a gene annotation data for hg19.
colnames(hg19Annotations) = c("chr", "start", "end", "strand", "names")
hg19Annotations = hg19Annotations[order(hg19Annotations$names), ]
genes = makeGRangesFromDataFrame(hg19Annotations, keep.extra.columns = TRUE)

# list all the files, all of them are tsv.gz methylation files.
filenames <- list.files('/storage08/kwangmoon/data/Lee2019/methylation/rawdata', full.names = TRUE)
names <- list.files('/storage08/kwangmoon/data/Lee2019/methylation/rawdata', full.names = FALSE)

registerDoParallel(8)
foreach(cell = 1:length(filenames)) %dopar% {
    tmp = fread(filenames[cell], select = c(1, 2, 4, 5, 6), fill = TRUE)
    if(!grepl('chr', tmp$V1[1])){tmp$V1 = paste0("chr", tmp$V1)} #add "chr" in the front of chromosomes
    tmpmCG <- tmp %>% filter(!grepl("CG", V4)) # eliminate all CG and keep only CH.
    tmpmCG <- tmpmCG %>% filter(V1 %in% paste0("chr", c(1:22, "X", "Y"))) # only use chr1-22, X, Y.
    methy = GRanges(tmpmCG$V1, IRanges(tmpmCG$V2, width = 1))
    overlapPos = findOverlaps(methy, genes) # get all the reads that overlap with genes.
    output = data.frame(tmpmCG[queryHits(overlapPos), ], 
               genes = hg19Annotations[subjectHits(overlapPos), ]$names)
    mCHmean = mean(output$V5) # this is the mean methylation for all cells
    write.table(output %>% group_by(genes) %>% summarize(Methyl_rate = mean(V5) / mCHmean), # we calculate mean methylation in the gene body divide by the mean methylation of all cells
			file = paste0("/p/keles/schic/volumeD/HumanPatchSeq/MethylGene/", names[cell]), row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t")
} 
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant