Heuristic approach to curate disease taxonomy beyond nosology-based standards

Environment: Python 3.7.0

1. Infodemiological study

Experimental corpus: GDELT Summary

The textual and visual narratives of different queries
Online news in 65 languages
Machine translation capability
Online image recognition capability

2. Historiographical study

Experimental corpus: Google Books Ngram Corpus

n-grams from approximately 8 million books
6% of all books published, in eight languages:
- English
- Hebrew
- French
- German
- Spanish
- Russian
- Italian
- Chinese
Book data from 1500 to 2019
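
As an illustration of how per-year term frequencies might be derived from this corpus, here is a minimal sketch. It assumes the tab-separated record layout of the 2020 Google Books Ngram export (`ngram`, then one `year,match_count,volume_count` entry per year). The helper names, the sample record, and the yearly totals are hypothetical placeholders, not data or code from this repository.

```python
from typing import Dict, Tuple


def parse_ngram_line(line: str) -> Tuple[str, Dict[int, int]]:
    """Split one tab-separated record into (ngram, {year: match_count})."""
    fields = line.rstrip("\n").split("\t")
    ngram = fields[0]
    counts = {}  # year -> number of occurrences in that year
    for entry in fields[1:]:
        year, match_count, _volume_count = entry.split(",")
        counts[int(year)] = int(match_count)
    return ngram, counts


def relative_frequency(counts: Dict[int, int],
                       totals: Dict[int, int]) -> Dict[int, float]:
    """Divide each yearly count by that year's total n-gram count."""
    return {
        year: counts[year] / totals[year]
        for year in counts
        if totals.get(year)  # skip years with a missing or zero total
    }


# Illustrative record and totals (made-up numbers, for demonstration only).
sample = "Spanish flu\t1918,12,9\t1919,40,25\t2019,300,180"
totals = {1918: 1000000, 1919: 2000000, 2019: 30000000}

ngram, counts = parse_ngram_line(sample)
freqs = relative_frequency(counts, totals)
```

Normalising by the yearly total makes frequencies comparable across years, since the number of scanned books varies widely over the period covered.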

3. Semantic similarity experiments

Experimental corpus

BERT: BookCorpus (800M words) and English Wikipedia (2,500M words)
PubMedBERT: PubMed abstracts (14M abstracts, 3.2B words, 21 GB)
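
Embeddings produced by models such as BERT or PubMedBERT are commonly compared with cosine similarity. The sketch below assumes that measure; whether this repository uses it is not stated here. The vectors are toy placeholders standing in for model outputs, and `cosine_similarity` is a hypothetical helper name.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors (placeholders for real model embeddings).
toy_embeddings = {
    "term_a": [1.0, 2.0, 0.0],
    "term_b": [2.0, 4.0, 0.0],  # same direction as term_a
    "term_c": [0.0, 0.0, 3.0],  # orthogonal to term_a
}

sim_ab = cosine_similarity(toy_embeddings["term_a"], toy_embeddings["term_b"])
sim_ac = cosine_similarity(toy_embeddings["term_a"], toy_embeddings["term_c"])
```

Because cosine similarity ignores vector length, `term_a` and `term_b` score as identical in direction even though their magnitudes differ.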

4. Semantic drift experiments

Experimental corpus: Google Books Ngram Corpus
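
One simple way to quantify semantic drift is to compare a term's nearest-neighbour sets across time periods. The sketch below uses Jaccard distance between consecutive periods. This is an assumed technique chosen for illustration, not necessarily the method used in this repository, and the neighbour sets are placeholder tokens rather than real results.

```python
from typing import Dict, Iterable, List, Tuple


def jaccard_distance(a: Iterable[str], b: Iterable[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means identical sets, 1.0 means disjoint."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def drift_series(
    neighbours_by_period: Dict[int, Iterable[str]]
) -> List[Tuple[int, int, float]]:
    """Return (previous period, current period, distance) for consecutive periods."""
    periods = sorted(neighbours_by_period)
    return [
        (prev, curr, jaccard_distance(neighbours_by_period[prev],
                                      neighbours_by_period[curr]))
        for prev, curr in zip(periods, periods[1:])
    ]


# Placeholder neighbour sets for one term in three periods (not real data).
toy_neighbours = {
    1900: {"w1", "w2", "w3"},
    1950: {"w2", "w3", "w4"},
    2000: {"w4", "w5", "w6"},
}

series = drift_series(toy_neighbours)
```

A larger distance between two consecutive periods indicates that the term's context changed more in that interval.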

Citing

Code for our paper "Heuristic approach to curate disease taxonomy beyond nosology-based standards". Please cite our paper if you find this repository helpful in your research.

Computational-social-science/Naming_human_disease
