This repository contains classification models, used for the ANDDigest project.
This notebook can be used for the fine-tuning of the classification models for the binary classification of the names of biological entities for ANDDigest, the code was developed on the basis of the McCormick, Chris, and Nick Ryan. "Bert fine-tuning tutorial with pytorch." Retrieved January 24 (2019): 2021.
Please keep in mind, that in order to repeat a fine-tuning, as it was done in the manuscript, the same dataset should be downloaded, and valid pathes to the correspond files from the dataset need to be specified within the notebook.
This notebook can be used for the application of the fine-tuned classifiers.
The examples folder contains positive and negative input and output examples, used for the classification of the short names of cell components.
The Gold Standards folder contains 2 subdirectories, a first one, entitled ANDSystem, contains data used for the main assesement of the accuracy for each developed model, as it is described in the manuscript. The Other subfolder contains input and output data, that was generated using the existing gold standards.
All the datasets used for the fine-tuning are freely available, and can be downloaded from the datasets section via the following link. The models section contains the fine-tuned models used as a part of AI NER module in ANDDigest.
If you found ANDDigest, or the developed models, to be useful in your research, please cite the following articles:
Ivanisenko, T.V., Saik, O.V., Demenkov, P.S. et al. ANDDigest: a new web-based module of ANDSystem for the search of knowledge in the scientific literature. BMC Bioinformatics 21 (Suppl 11), 228 (2020). https://doi.org/10.1186/s12859-020-03557-8
Ivanisenko, T.V.; Demenkov, P.S.; Kolchanov, N.A.; Ivanisenko, V.A. The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition. Int. J. Mol. Sci. 2022, 23, 14934. https://doi.org/10.3390/ijms232314934