Word Sense Disambiguation using GlossBERT on the PMB dataset
This project contains two directories: jupyter and data.
The jupyter directory contains all the Jupyter notebooks that were used to extract and inspect the data and to build the models.
The data directory contains the data used in the project, divided into two folders:
- only_sns - contains the raw PMB data
- processed - contains the data as Weak Supervision Context-Gloss pairs
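As a rough illustration of what a Weak Supervision Context-Gloss pair looks like, the sketch below assumes the format described in the GlossBERT paper: the target word is wrapped in quotation marks inside the context sentence, the gloss is prefixed with the target word and a colon, and the label is 1 only for the gold sense. The function name, the sense keys, and the example glosses are hypothetical and are not taken from the notebooks in this project.

```python
from typing import Dict, List, Tuple


def build_context_gloss_pairs(
    tokens: List[str],
    target_index: int,
    gold_sense: str,
    candidate_glosses: Dict[str, str],
) -> List[Tuple[str, str, int]]:
    """Build (context, gloss, label) triples for one target word.

    Hypothetical helper: the weak-supervision markup (quotes around the
    target, "word : gloss" prefix) is an assumption based on GlossBERT.
    """
    target = tokens[target_index]

    # Wrap the target word in quotation marks inside the context sentence.
    marked = tokens[:target_index] + ['"', target, '"'] + tokens[target_index + 1:]
    context = " ".join(marked)

    pairs = []
    for sense, gloss in candidate_glosses.items():
        # Prefix each gloss with the target word; label 1 marks the gold sense.
        pairs.append((context, f"{target} : {gloss}", int(sense == gold_sense)))
    return pairs


# Illustrative input only; the sense keys and glosses are made up.
example_pairs = build_context_gloss_pairs(
    tokens=["He", "sat", "on", "the", "bank"],
    target_index=4,
    gold_sense="bank.n.01",
    candidate_glosses={
        "bank.n.01": "sloping land beside a body of water",
        "bank.n.02": "a financial institution",
    },
)
for context, gloss, label in example_pairs:
    print(label, "|", context, "|", gloss)
```

Each target word yields one pair per candidate sense, so the task can be treated as binary classification over sentence pairs.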
The data in the only_sns folder was obtained by running the following command on the PMB 4.0.0 dataset:
python3 src/extract_conll.py en data test_dir -j statuses.json -ls sns:g
Running this command extracts the gold English data together with its annotated senses.
We recommend running this project in Google Colab.
All required dependencies are installed in the notebooks.
- Folkert Leistra
- Milan van Wouden
- Xi Yu