Provided data: a list of ~2000 gene fusion sequences in nucleotide (A, T, C, G) format
-
Create a new dataset by translating the above-mentioned sequences into the protein (amino acid) alphabet
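A minimal sketch of the translation step, assuming the standard genetic code and reading frame 0 (neither is stated in the task). The helper name `translate`, the `stop_at_stop` flag, and the use of `X` for codons containing non-ACGT characters are illustrative choices, not part of the assignment. Trailing bases that do not fill a codon are dropped.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, stop_at_stop=False):
    """Translate a DNA string into one-letter amino acid codes.

    Assumes frame 0. Stop codons become '*'; unknown codons become 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and stop_at_stop:
            break  # optionally truncate at the first stop codon
        protein.append(aa)
    return "".join(protein)


# Build the protein dataset from a (hypothetical) list of nucleotide sequences.
dna_sequences = ["ATGGCCTAA", "atgaaattt"]
protein_sequences = [translate(s) for s in dna_sequences]
```

Whether to stop at the first stop codon, or to try all three frames, depends on how the fusion sequences were extracted, so it is worth checking a few translated sequences by hand.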
-
Create a Long Short-Term Memory (LSTM) classifier that classifies gene fusions as Oncogenic or NotOncogenic. You have to build two classifiers: one trained on the dataset provided by us and one trained on the dataset you built at step 1.
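One possible shape for the LSTM classifier, sketched in PyTorch (the task does not prescribe a framework). Everything named here is an assumption for illustration: the helpers `make_vocab` and `encode`, the class `LSTMClassifier`, the layer sizes, and the label convention 1 = Oncogenic, 0 = NotOncogenic. The same model works for the protein dataset by building the vocabulary from the amino acid alphabet instead.

```python
import torch
import torch.nn as nn


def make_vocab(alphabet):
    """Map each symbol to an id; 0 is reserved for padding, 1 for unknown."""
    return {ch: i + 2 for i, ch in enumerate(alphabet)}


def encode(seqs, vocab):
    """Turn non-empty strings into a right-padded id tensor plus lengths."""
    lengths = torch.tensor([len(s) for s in seqs])
    batch = torch.zeros(len(seqs), int(lengths.max()), dtype=torch.long)
    for row, s in enumerate(seqs):
        batch[row, :len(s)] = torch.tensor([vocab.get(ch, 1) for ch in s])
    return batch, lengths


class LSTMClassifier(nn.Module):
    def __init__(self, vocab, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(len(vocab) + 2, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)  # one logit: Oncogenic vs. not

    def forward(self, x, lengths):
        # Packing makes the LSTM skip padded positions.
        packed = nn.utils.rnn.pack_padded_sequence(
            self.embed(x), lengths, batch_first=True, enforce_sorted=False
        )
        _, (h_n, _) = self.lstm(packed)
        return self.out(h_n[-1]).squeeze(-1)  # final hidden state -> logit


# One training step on toy data (labels here are made up).
dna_vocab = make_vocab("ACGT")
model = LSTMClassifier(dna_vocab)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, lengths = encode(["ATGGCC", "TTGA"], dna_vocab)
labels = torch.tensor([1.0, 0.0])
logits = model(x, lengths)
loss = nn.BCEWithLogitsLoss()(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

With only ~2000 sequences, a held-out validation split and a small model are a reasonable starting point; the sizes above are placeholders, not tuned values.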
-
Implement a bidirectional LSTM classifier that classifies gene fusions as Oncogenic or NotOncogenic. You have to build two classifiers: one trained on the dataset provided by us and one trained on the dataset you built at step 1.
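A hedged PyTorch sketch of the bidirectional variant. The class name `BiLSTMClassifier`, the `num_tokens` parameter, and the layer sizes are assumptions; input is expected as integer ids with 0 used for padding. The main difference from a unidirectional model is that the final forward and backward hidden states are concatenated, so the output layer takes `2 * hidden_dim` features.

```python
import torch
import torch.nn as nn


class BiLSTMClassifier(nn.Module):
    def __init__(self, num_tokens, embed_dim=16, hidden_dim=32):
        super().__init__()
        # num_tokens counts every id in use, including padding id 0.
        self.embed = nn.Embedding(num_tokens, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(
            embed_dim, hidden_dim, batch_first=True, bidirectional=True
        )
        self.out = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x, lengths):
        # Packing ensures the backward pass starts at each sequence's true
        # last token rather than at the padding.
        packed = nn.utils.rnn.pack_padded_sequence(
            self.embed(x), lengths, batch_first=True, enforce_sorted=False
        )
        _, (h_n, _) = self.lstm(packed)
        # For a single layer, h_n[-2] is the forward direction's final state
        # and h_n[-1] is the backward direction's final state.
        features = torch.cat([h_n[-2], h_n[-1]], dim=1)
        return self.out(features).squeeze(-1)


# Toy batch of two already-encoded sequences (ids are arbitrary examples).
x = torch.tensor([[2, 3, 4, 5], [3, 2, 0, 0]])
lengths = torch.tensor([4, 2])
model = BiLSTMClassifier(num_tokens=6)
logits = model(x, lengths)  # one logit per sequence
```

Training proceeds as for any binary classifier, for example with `nn.BCEWithLogitsLoss` on these logits.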