Running Matilda with expression matrix #1

Open
vanhoan310 opened this issue May 30, 2023 · 9 comments

Comments

@vanhoan310

I would like to run Matilda (classification). However, I don't have the inputs in .h5 format. I only have a gene expression matrix (.csv), an ADT matrix (.csv), and labels. Is there an easy way to run Matilda using these inputs?

Thanks!

@liuchunlei0430
Collaborator

liuchunlei0430 commented May 31, 2023

Thank you for your interest in Matilda. In R, you can read a '.csv' file and convert the matrix into a '.h5' format using the following function:

library(rhdf5)      # h5createFile, h5createGroup, h5write, h5ls
library(HDF5Array)  # writeHDF5Array

# Write each matrix in exprs_list (features in rows, cells in columns)
# to the corresponding path in h5file_list.
write_h5 <- function(exprs_list, h5file_list) {
  if (length(unique(lapply(exprs_list, rownames))) != 1) {
    stop("rownames of exprs_list are not identical.")
  }
  for (i in seq_along(exprs_list)) {
    if (file.exists(h5file_list[i])) {
      warning("h5file exists! will rewrite it.")
      file.remove(h5file_list[i])  # portable replacement for system("rm ...")
    }
    h5createFile(h5file_list[i])
    h5createGroup(h5file_list[i], "matrix")
    writeHDF5Array(t(exprs_list[[i]]), h5file_list[i], name = "matrix/data")
    h5write(rownames(exprs_list[[i]]), h5file_list[i], name = "matrix/features")
    h5write(colnames(exprs_list[[i]]), h5file_list[i], name = "matrix/barcodes")
    print(h5ls(h5file_list[i]))
  }
}
# Example: saved_path could be "./rna.h5"
write_h5(exprs_list = list(data = your_matrix), h5file_list = c(saved_path))

When saving your data into an '.h5' format, make sure to replace 'your_matrix' with your actual data and 'saved_path' with the desired file path where you want to save the data.

Hope this helps.
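
For anyone preparing inputs directly in Python, the same file layout can be sketched with h5py. This is a minimal sketch and not part of Matilda: the helper names `read_csv_matrix` and `write_matilda_h5` are hypothetical, and it assumes a CSV with features in rows, cell barcodes in the header row, and feature names in the first column. It also assumes `matrix/data` should appear to h5py as features x cells, which is what the R function above should produce because rhdf5 stores dimensions in reversed order. Compare the shape against a file written by the R function before relying on it.

```python
import csv


def read_csv_matrix(csv_path):
    """Read a features-by-cells CSV.

    Assumed layout: header row = empty corner cell followed by cell barcodes,
    first column = feature names, remaining cells = numeric values.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        barcodes = header[1:]
        features, rows = [], []
        for record in reader:
            if not record:  # skip blank lines
                continue
            features.append(record[0])
            rows.append([float(value) for value in record[1:]])
    return features, barcodes, rows


def write_matilda_h5(features, barcodes, rows, h5_path):
    """Write matrix/data, matrix/features and matrix/barcodes to h5_path."""
    import numpy as np  # imported lazily so the CSV reader has no dependencies
    import h5py

    data = np.asarray(rows, dtype="float32")
    if data.shape != (len(features), len(barcodes)):
        raise ValueError("matrix shape does not match features x barcodes")

    with h5py.File(h5_path, "w") as f:  # "w" overwrites an existing file
        group = f.create_group("matrix")
        # Assumption: stored as features x cells (see the note above).
        group.create_dataset("data", data=data)
        # Fixed-length byte strings, comparable to what h5write produces.
        group.create_dataset("features", data=np.array(features, dtype="S"))
        group.create_dataset("barcodes", data=np.array(barcodes, dtype="S"))


# Hypothetical usage:
# features, barcodes, rows = read_csv_matrix("rna.csv")
# write_matilda_h5(features, barcodes, rows, "rna.h5")
```

The same two calls can be repeated for the ADT matrix to produce a second .h5 file.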

@vanhoan310
Author

Thanks a lot! Which R library provides the function writeHDF5Array?

@liuchunlei0430
Collaborator

library(HDF5Array)

You can install it using the following commands:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("HDF5Array")

@vanhoan310
Author

Dear developers,

How can I use Matilda to train on and classify cell types in CITE-seq data (RNA + ADT)? I followed the tutorial and omitted the --atac parameter, but it does not work.

Thanks!

@liuchunlei0430
Collaborator

For training on CITE-seq data:
python main_matilda_train.py --rna ../data/TEAseq/train_rna.h5 --adt ../data/TEAseq/train_adt.h5 --cty ../data/TEAseq/train_cty.csv  # Training on CITE-seq

For classifying CITE-seq data:
python main_matilda_task.py --rna ../data/TEAseq/test_rna.h5 --adt ../data/TEAseq/test_adt.h5 --cty ../data/TEAseq/test_cty.csv --classification True --query True  # Classification for CITE-seq

Hope this helps.

@vanhoan310
Author

Thanks for your fast reply. I followed your instructions but got the following error:

Traceback (most recent call last):
  File "main_matilda_task.py", line 136, in <module>
    atac_name = h5py.File(atac_data_path,"r")['matrix/features'][:]
NameError: name 'atac_data_path' is not defined

@liuchunlei0430
Collaborator

Thanks for reporting this. I have updated the code as follows:

rna_name = h5py.File(rna_data_path, "r")['matrix/features'][:]
if args.adt != "NULL":
    adt_name = h5py.File(adt_data_path, "r")['matrix/features'][:]
if args.atac != "NULL":
    atac_name = h5py.File(atac_data_path, "r")['matrix/features'][:]

You can re-download the code to solve this problem.
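
The NameError came from reading a path variable that is only defined when the corresponding modality is passed on the command line. The guard pattern in the fix can be sketched in isolation as below. This is an illustration, not Matilda's actual code: the "NULL" default is inferred from the snippet above, and `parse_args`, `provided_modalities`, and `read_feature_names` are hypothetical helper names.

```python
import argparse


def parse_args(argv=None):
    # Assumption: each modality flag defaults to the string "NULL" when
    # omitted, as the `args.adt != "NULL"` checks in the fix suggest.
    parser = argparse.ArgumentParser()
    for flag in ("rna", "adt", "atac"):
        parser.add_argument(f"--{flag}", default="NULL")
    return parser.parse_args(argv)


def provided_modalities(args):
    """Return {modality: path} for the modalities that were actually supplied."""
    return {name: path for name, path in vars(args).items() if path != "NULL"}


def read_feature_names(path):
    """Read the feature names stored under matrix/features."""
    import h5py  # imported lazily so the helpers above work without h5py

    with h5py.File(path, "r") as f:
        return f["matrix/features"][:]


# CITE-seq style call: --atac is omitted, so no ATAC file is ever opened.
args = parse_args(["--rna", "train_rna.h5", "--adt", "train_adt.h5"])
modalities = provided_modalities(args)
print(sorted(modalities))  # prints ['adt', 'rna']

# Feature names would then be read only for the supplied modalities:
# feature_names = {m: read_feature_names(p) for m, p in modalities.items()}
```

Looping over only the supplied modalities avoids one `if` per modality and makes it harder to reference a path that was never set.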

@vanhoan310
Author

It works now. Thanks a lot.

@hongfeiZhang-source

Hello, I downloaded the TEA-seq dataset mentioned in your article for replication purposes. However, even with the README, I am still unsure how to preprocess the data into a format suitable for input to the model. Could you share the steps you used to prepare the data for model input? Thank you.
