This project won 2nd prize at the Lumiata COVID-19 Global AI Hackathon.
CSN searcher leverages the Siamese RNN architecture proposed by Mueller and Thyagarajan (2016) to provide document search for COVID-19 articles based on section-level similarity. You provide a section you would like to explore further, and the tool finds research articles that contain similar sections. The network is built on the dataset generously provided by AI2 on Kaggle.
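Mueller and Thyagarajan's model (often called MaLSTM) passes both inputs through the same recurrent encoder, with shared weights, and scores the pair as exp(-‖h_a − h_b‖₁), the exponential of the negative Manhattan distance between the two final hidden states, which gives a similarity in (0, 1]. The sketch below illustrates that idea in plain Python. It is not CSN searcher's code: to stay short it swaps the LSTM for a plain tanh RNN, and the names `TinyRNN`, `manhattan_similarity`, and `siamese_score`, as well as the random weights, are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


class TinyRNN:
    """Hypothetical Elman RNN encoder: h_t = tanh(W x_t + U h_{t-1} + b).

    A simplified stand-in for the LSTM used by Mueller and Thyagarajan (2016).
    """

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
        self.U = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
        self.b = [0.0] * hidden_dim

    def encode(self, sequence: Sequence[Vector]) -> Vector:
        """Return the final hidden state after reading the whole sequence."""
        h = [0.0] * len(self.b)
        for x in sequence:
            # The comprehension reads the previous h before the new list replaces it.
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.W[j], x))
                    + sum(u * hi for u, hi in zip(self.U[j], h))
                    + self.b[j]
                )
                for j in range(len(self.b))
            ]
        return h


def manhattan_similarity(h_a: Vector, h_b: Vector) -> float:
    """MaLSTM-style score exp(-||h_a - h_b||_1), which lies in (0, 1]."""
    return math.exp(-sum(abs(a - b) for a, b in zip(h_a, h_b)))


def siamese_score(encoder: TinyRNN, seq_a: Sequence[Vector], seq_b: Sequence[Vector]) -> float:
    # The same encoder (shared weights) embeds both inputs; that is what makes it Siamese.
    return manhattan_similarity(encoder.encode(seq_a), encoder.encode(seq_b))


if __name__ == "__main__":
    encoder = TinyRNN(input_dim=3, hidden_dim=4)
    a = [[0.1, 0.2, 0.3], [0.0, 0.5, 0.1]]  # toy sequence of 3-d "word vectors"
    b = [[0.9, -0.4, 0.2]]
    print(siamese_score(encoder, a, a))  # identical inputs score exactly 1.0
    print(siamese_score(encoder, a, b))  # different inputs score below 1.0
```

Sharing one encoder is the key design choice: both texts are mapped into the same space by the same function, so the distance between their encodings is meaningful.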
- Python 3.7+
- Activate your Python virtual environment.
- Run the following command to install our package.
pip install -i https://test.pypi.org/simple/ csn-searcher==0.1.1
Note: If you see an error message saying you need torchtext==0.5, please run the following:
pip install torchtext==0.5
- Run the following command to download the data (this may take a while).
csn-search
It downloads the following data:
- Siamese LSTM model (340MB)
- CSN (680MB)
- Vocabulary (71KB)
- Create a .txt file with some input. For example, the website of The New England Journal of Medicine lists a number of articles related to COVID-19.
Copy a section from one of these articles and save it in a .txt file, e.g. input.txt.
- Run the following command to query the most similar articles in the CSN.
csn-search \
--input-path input.txt \
--num-search 5
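If you would rather drive these steps from a Python script, the sketch below saves a section to a text file and calls `csn-search` with the `--input-path` and `--num-search` flags used in the quick start. It assumes `csn-search` is on your `PATH` and the data download has already finished; the helper names `write_section` and `build_search_command` are hypothetical and not part of the package.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def write_section(text: str, path: str = "input.txt") -> Path:
    """Save the section you want to explore as a plain-text file."""
    out = Path(path)
    out.write_text(text.strip() + "\n", encoding="utf-8")
    return out


def build_search_command(input_path: str, num_search: int = 3) -> List[str]:
    """Assemble a csn-search call (3 results by default, matching the CLI default)."""
    if num_search < 1:
        raise ValueError("num_search must be a positive integer")
    return ["csn-search", "--input-path", str(input_path), "--num-search", str(num_search)]


if __name__ == "__main__":
    section = "Paste a section copied from a COVID-19 article here."
    path = write_section(section)
    if shutil.which("csn-search") is None:
        print("csn-search not found; install the package first.")
    else:
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(build_search_command(str(path), num_search=5), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when the file path contains spaces.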
CSN searcher requires Python 3.7+. Run the following command to install it:
pip install -i https://test.pypi.org/simple/ csn-searcher==0.1.1
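Before installing, you can check the prerequisites from Python. The sketch below tests the Python 3.7+ requirement, reports the installed torchtext version (the quick start notes that torchtext==0.5 may be needed), and checks for enough free disk space for the model, CSN, and vocabulary downloads, which total roughly 1 GB. The function names and the 1.1 GB threshold are assumptions for illustration, not part of csn-searcher.

```python
import shutil
import sys
from typing import Optional, Tuple

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib_metadata backport installed
    from importlib_metadata import PackageNotFoundError, version

# Model (340MB) + CSN (680MB) + vocabulary (71KB), rounded up with some headroom.
REQUIRED_BYTES = 1_100_000_000


def python_ok(version_info: Tuple = tuple(sys.version_info)) -> bool:
    """True if the interpreter is Python 3.7 or newer."""
    return tuple(version_info[:2]) >= (3, 7)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def enough_disk_space(path: str = ".", required: int = REQUIRED_BYTES) -> bool:
    """True if the filesystem holding `path` has at least `required` free bytes."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    print("Python 3.7+:", python_ok())
    print("torchtext:", installed_version("torchtext") or "not installed")
    print("~1 GB free for data:", enough_disk_space())
```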
CSN searcher lets you explore COVID-19 articles based on a section you want to learn more about. So far, we provide only a command-line interface. All you need to do is save a section of a research article as a .txt file, open your terminal, and run csn-search with the path to the file and the number of search results (3 by default).
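Conceptually, a section-level search compares the encoding of your input section with the encoding of each section stored in the CSN and keeps the best `num_search` matches. The sketch below shows only that ranking step, using precomputed toy vectors and the MaLSTM-style score exp(-‖a − b‖₁). It is an assumption about how such a search can work, not the package's implementation; the `Section` record and `search` function are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Section:
    """Hypothetical index entry: one section of one article plus its encoding."""
    article_title: str
    section_title: str
    vector: Tuple[float, ...]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """exp(-L1 distance): 1.0 for identical vectors, approaching 0 as they diverge."""
    return math.exp(-sum(abs(x - y) for x, y in zip(a, b)))


def search(query: Sequence[float], index: Sequence[Section],
           num_search: int = 3) -> List[Tuple[float, Section]]:
    """Return the num_search most similar sections, best first."""
    scored = ((similarity(query, s.vector), s) for s in index)
    # nlargest compares only the scores (via key), so Section objects are never compared.
    return heapq.nlargest(num_search, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    index = [
        Section("Article A", "Introduction", (0.1, 0.2)),
        Section("Article B", "Methods", (0.9, 0.9)),
        Section("Article C", "Results", (0.1, 0.25)),
    ]
    for rank, (score, sec) in enumerate(search((0.1, 0.2), index, num_search=2), start=1):
        print(f"No. {rank}  {score:.4f}  {sec.article_title} / {sec.section_title}")
```

`heapq.nlargest` keeps only the top results while scanning, which scales better than sorting the whole index when `num_search` is small.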
The following example prints the titles of the articles, and of the sections within them, that are most similar to your input, together with their similarity scores.
csn-search \
--input-path data/input.txt \
--num-search 2
Start searching...
+++++++++++++ Search Results +++++++++++++
------------ No. 1 ------------
Similarity score - 0.5761
Article title - ... errors in the icu dj melia ...
Section title - ... transplantation may be associated ...
------------ No. 2 ------------
Similarity score - 0.5739
Article title - ... biliary ...
Section title - ... pathophysiologic rationale ...
+++++++++++++++++++++++++++++++++++++++++++++++++
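To post-process the results, for example to collect them in a spreadsheet, one option is to parse the text the CLI prints. The sketch below assumes the output layout shown above (a `------------ No. N ------------` header followed by `Similarity score -`, `Article title -`, and `Section title -` lines); that layout may change between versions, and `parse_results` and `SearchResult` are hypothetical helpers, not part of csn-searcher.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchResult:
    rank: int
    score: float
    article_title: str
    section_title: str


_HEADER = re.compile(r"^-+\s*No\.\s*(\d+)\s*-+$")
_FIELD = re.compile(r"^(Similarity score|Article title|Section title)\s*-\s*(.*)$")


def parse_results(output: str) -> List[SearchResult]:
    """Turn printed csn-search results into SearchResult records."""
    results: List[SearchResult] = []
    current: Dict[str, object] = {}
    for raw in output.splitlines():
        line = raw.strip()
        header = _HEADER.match(line)
        if header:
            current = {"rank": int(header.group(1))}
            continue
        field = _FIELD.match(line)
        if field and current:
            current[field.group(1)] = field.group(2).strip()
            if field.group(1) == "Section title":  # last field of each result block
                results.append(
                    SearchResult(
                        rank=int(current["rank"]),
                        score=float(current["Similarity score"]),
                        article_title=str(current["Article title"]),
                        section_title=str(current["Section title"]),
                    )
                )
                current = {}
    return results
```

You could feed it the `stdout` of `subprocess.run(["csn-search", "--input-path", "data/input.txt", "--num-search", "2"], capture_output=True, text=True)`.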