Computational-social-science/Naming_human_disease

Heuristic approach to curate disease taxonomy beyond nosology-based standards

Environment: Python 3.7.0

1. Infodemiological study

Experimental corpus: GDELT Summary

  • The textual and visual narratives of different queries
  • Online news in 65 languages
  • Machine translation capability
  • Network image recognition capability

2. Historiographical study

Experimental corpus: Google Books Ngram Corpus

  • n-grams from approximately 8 million books
  • About 6% of all books ever published, in eight languages:
    • English
    • Hebrew
    • French
    • German
    • Spanish
    • Russian
    • Italian
    • Chinese
  • Book data from 1500 to 2019
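
Raw n-gram counts are hard to compare across centuries because the number of digitized books varies a lot from year to year. The sketch below shows one common way to normalize them, dividing a term's yearly match count by the total token count for that year. It is a minimal illustration, not the code used in the paper: the function name `relative_frequency` and all counts are hypothetical toy values.

```python
from typing import Dict


def relative_frequency(match_counts: Dict[int, int],
                       total_counts: Dict[int, int]) -> Dict[int, float]:
    """Return per-year relative frequency: term matches / total tokens."""
    freq = {}
    for year, total in total_counts.items():
        if total > 0:  # skip years with no recorded tokens
            freq[year] = match_counts.get(year, 0) / total
    return freq


# Hypothetical toy counts for illustration only (not real corpus values).
toy_matches = {1918: 30, 1919: 45, 1920: 15}
toy_totals = {1918: 1_000_000, 1919: 1_500_000, 1920: 1_000_000}

freqs = relative_frequency(toy_matches, toy_totals)
for year in sorted(freqs):
    print(year, freqs[year])
```

In this toy data, 1919 has more raw matches than 1918 but the same relative frequency, which is the effect normalization is meant to expose.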

3. Semantic similarity experiments

Experimental corpus

  • BERT: BookCorpus (800M words) and English Wikipedia (2,500M words)
  • PubMedBERT: PubMed abstracts (14M abstracts, 3.2B words, 21 GB)
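
Once a model such as BERT or PubMedBERT has mapped two disease names to embedding vectors, a common way to score their semantic similarity is cosine similarity. The sketch below assumes that metric (the list above does not state which one the experiments use) and works on plain Python lists so that it runs without downloading a model. The vectors are hypothetical toy values, not real model outputs.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings; real BERT vectors have 768 dimensions.
name_a = [0.2, 0.7, 0.1]
name_b = [0.25, 0.65, 0.05]
score = cosine_similarity(name_a, name_b)
print(round(score, 3))
```

Scores close to 1 indicate that the two names sit in nearly the same direction of the embedding space, and scores near 0 indicate unrelated directions.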

4. Semantic drift experiments

Experimental corpus: Google Books Ngram Corpus
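
One simple way to quantify semantic drift is to compare a term's nearest neighbors in two time periods: the less the neighbor sets overlap, the more the term's usage has shifted. The sketch below uses the Jaccard distance between the two sets. This is an assumed, illustrative measure rather than the method from the paper, and `neighbor_drift` and the placeholder tokens are hypothetical.

```python
from typing import Iterable


def neighbor_drift(neighbors_then: Iterable[str],
                   neighbors_now: Iterable[str]) -> float:
    """Jaccard distance between two neighbor sets (0 = identical, 1 = disjoint)."""
    then_set, now_set = set(neighbors_then), set(neighbors_now)
    union = then_set | now_set
    if not union:
        return 0.0  # two empty sets: treat as no measurable drift
    return 1.0 - len(then_set & now_set) / len(union)


# Placeholder tokens standing in for a term's nearest neighbors in each period.
period_1 = ["w1", "w2", "w3"]
period_2 = ["w2", "w3", "w4"]
drift = neighbor_drift(period_1, period_2)
print(drift)  # 2 shared neighbors out of 4 distinct ones
```

A set-based measure like this avoids aligning the embedding spaces of different periods, at the cost of ignoring how far each neighbor actually moved.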

Citing

This repository contains the code for our paper "Heuristic approach to curate disease taxonomy beyond nosology-based standards". Please cite the paper if you find this repository helpful in your research.

About

Naming human diseases: why do names matter?
