key-r-code/naive-bayes-multi-level-basic

naive-bayes-multi-level-basic

NBC-Based Novelty Detection in Multi-Level Taxonomy

This project investigates novelty detection with a Naive Bayes classifier (NBC) trained on k-mer counts, evaluated at different taxonomic levels. The database used for this project consists of 4634 unique species.

Methodology

Dataset creation

The first step is to create balanced training datasets. At the superkingdom level, training sets were generated by selecting 2 of the 4 available classes (archaea, eukaryota, bacteria, viruses), resulting in 6 unique combinations.
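The six superkingdom combinations can be enumerated directly. The following is a minimal sketch, not the project's own script: the function name is hypothetical, and treating the two left-out classes as the novel ones is an assumption based on the known/unknown labeling described later in this README.

```python
from itertools import combinations

SUPERKINGDOMS = ["archaea", "eukaryota", "bacteria", "viruses"]


def superkingdom_training_pairs(classes=SUPERKINGDOMS):
    """Return every unordered pair of classes usable as a training set.

    Hypothetical helper for illustration only.
    """
    return list(combinations(classes, 2))


pairs = superkingdom_training_pairs()
for known in pairs:
    # Assumption: classes left out of training act as the novel classes.
    held_out = [c for c in SUPERKINGDOMS if c not in known]
    print(known, "-> held out:", held_out)
```

Choosing 2 of 4 classes gives C(4, 2) = 6 pairs, which matches the number of combinations stated above.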

At lower taxonomic levels (phylum, class, order, family), classes containing fewer than 30 instances were first excluded from the database. Then, 50% of the total representatives were randomly sampled five times, resulting in five trials at each level for each k-mer length used.
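One way to read the filtering and sampling step is sketched below. It assumes that sampling is done over individual genomes pooled across the remaining classes and that each trial uses its own random seed; the README does not specify either detail, and the function name, the `labels` mapping, and the seed handling are all hypothetical.

```python
import random
from collections import Counter


def filter_and_sample(labels, min_count=30, fraction=0.5, n_trials=5, seed=0):
    """Drop small classes, then draw `n_trials` random samples.

    labels: dict mapping a genome id to its class label at one taxonomic level.
    Returns a list of `n_trials` sorted lists of sampled genome ids.
    """
    counts = Counter(labels.values())
    # Keep only genomes whose class has at least `min_count` instances.
    kept = sorted(g for g, c in labels.items() if counts[c] >= min_count)
    n_sample = int(len(kept) * fraction)
    trials = []
    for t in range(n_trials):
        rng = random.Random(seed + t)  # assumption: one seed per trial
        trials.append(sorted(rng.sample(kept, n_sample)))
    return trials


# Toy data: class "A" has 40 genomes (kept), class "B" has 10 (excluded).
labels = {f"g{i}": "A" for i in range(40)}
labels.update({f"h{i}": "B" for i in range(10)})
trials = filter_and_sample(labels)
print(len(trials), [len(t) for t in trials])
```

If the project instead sampled 50% of the classes, or 50% of the genomes within each class, only the construction of `kept` and `n_sample` would need to change.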

k-mer counting

Models are trained on k-mer frequencies. All k-mer count files were generated using Jellyfish. The k-mers used in this project were of length 3, 6, 9, 12 and 15.
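To make the k-mer counts concrete, here is a small pure-Python sketch of what a k-mer count table contains. It is for illustration only and is not how the project produced its counts (those came from Jellyfish). Skipping k-mers with non-ACGT characters is an assumption, and the sketch does not merge reverse complements, since this README does not say whether canonical k-mers were used.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")
K_VALUES = (3, 6, 9, 12, 15)  # k-mer lengths listed in this README


def count_kmers(sequence, k):
    """Count every overlapping k-mer in `sequence`.

    Assumption: k-mers containing characters other than A, C, G, T are skipped.
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if VALID_BASES.issuperset(kmer):
            counts[kmer] += 1
    return counts


toy_sequence = "ACGTACGTACGTACGTAC"  # made-up sequence, 18 bases
for k in K_VALUES:
    counts = count_kmers(toy_sequence, k)
    print(k, len(counts), "distinct k-mers,", sum(counts.values()), "total")
```

A sequence of length n yields at most n - k + 1 k-mers, so longer k values produce sparser count tables for the same genome.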

Testing data

The testing data consisted of 100 random reads from each species in the database, with each species treated as a class. The same set of test reads was used for all trials.
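Drawing a fixed number of random reads from a genome could look like the sketch below. The read length of 150 bases, the uniform choice of start positions, and the fixed seed are assumptions made for illustration; this README does not state how the reads were generated.

```python
import random


def sample_reads(genome, n_reads=100, read_length=150, seed=0):
    """Draw `n_reads` substrings of `read_length` at uniform random positions.

    Assumptions: read_length and seed are placeholders, and reads may overlap.
    """
    if len(genome) < read_length:
        raise ValueError("genome is shorter than the requested read length")
    rng = random.Random(seed)  # fixed seed so every trial sees the same reads
    max_start = len(genome) - read_length
    starts = [rng.randint(0, max_start) for _ in range(n_reads)]
    return [genome[s:s + read_length] for s in starts]


# Toy genome of 1000 random bases, standing in for a real sequence.
toy_rng = random.Random(42)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(1000))
reads = sample_reads(toy_genome)
print(len(reads), len(reads[0]))
```

Fixing the seed is one simple way to keep the test set identical across trials, as described above.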

Post-data analysis and ROC/AUC generation

Each classification run produces a CSV file containing the log probability for each genome in the testing data. For each trial, genome sequences present in the training data were labeled as "known", while those not present were labeled as "unknown", reducing this multi-class problem to a binary classification task. ROC curves with AUC values, as well as distribution plots, were then generated to assess novelty detection.
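The known/unknown labeling and the AUC computation can be sketched in plain Python. This sketch assumes that each test read has a single score (for example its best log probability) and that higher scores indicate a "known" read; the function names and the toy numbers are made up for illustration and are not project results.

```python
def label_known(test_classes, training_classes):
    """Return 1 for reads whose true class was in training ("known"), else 0."""
    training = set(training_classes)
    return [1 if c in training else 0 for c in test_classes]


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a random known read scores higher than a
    random unknown read, counting ties as one half.
    """
    known = [s for y, s in zip(labels, scores) if y == 1]
    unknown = [s for y, s in zip(labels, scores) if y == 0]
    if not known or not unknown:
        raise ValueError("need at least one known and one unknown read")
    wins = 0.0
    for k in known:
        for u in unknown:
            if k > u:
                wins += 1.0
            elif k == u:
                wins += 0.5
    return wins / (len(known) * len(unknown))


# Toy example with made-up log probabilities.
test_classes = ["bacteria", "archaea", "viruses", "eukaryota"]
training_classes = ["bacteria", "archaea"]
scores = [-120.0, -135.5, -210.3, -190.8]

labels = label_known(test_classes, training_classes)
print(labels, roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of reads, which is fine for a sketch; for full test sets a rank-based computation or a library routine would be the more practical choice.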

Results

[Figure: results plot]

All scripts in this project were executed on Picotte, Drexel's main high-performance computing cluster.
