This repository contains a PyTorch reimplementation of the EMNLP 2019 paper "BERT for Coreference Resolution: Baselines and Analysis" by Joshi et al. The code is built upon this repository, includes substantial modifications and bug fixes, and uses SpanBERT as the document encoder.
The source code assumes access to the English train, development, and test data of OntoNotes Release 5.0, located in a folder called 'data' inside the main directory. The data consist of 2,802 training documents, 343 development documents, and 348 test documents. Documents are 454 words long on average, with a maximum length of 4,009 words. The number of mentions and coreferences per document varies drastically, but it is generally correlated with document length.
Since the data require a license from the Linguistic Data Consortium, they are not supplied here. Information on how to download them can be found here. Run the following script to preprocess the data (note: it requires Python 2):
bash preprocess_conll.sh PATH_TO_ONTONOTES_5.0 data/ english
Then run the following to preprocess the data into the format used here and save it:
python prepare_data.py
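As a quick sanity check, you can inspect one preprocessed document. The snippet below is only a minimal sketch: it assumes prepare_data.py writes one JSON object per line to a file such as data/train.english.jsonlines with doc_key, sentences, and clusters fields; the actual file name and keys in this repository may differ.

import json

# Hypothetical output path and field names; adjust to what prepare_data.py actually writes.
with open("data/train.english.jsonlines") as f:
    doc = json.loads(f.readline())  # first preprocessed document

# Each document is assumed to carry its tokenized sentences and gold coreference clusters.
print("doc_key:", doc.get("doc_key"))
print("number of sentences:", len(doc.get("sentences", [])))
print("number of clusters:", len(doc.get("clusters", [])))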
First, install the requirements as specified in requirements.txt.
You can start training as follows:
python main.py --train
You can add --pretrained_coref_path PATH, where PATH is the path to a saved model checkpoint, to resume training if it was interrupted. The pretrained model can be downloaded here.
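For example, to resume an interrupted training run from a saved checkpoint (checkpoints/model.pt is a hypothetical path; substitute your own):

python main.py --train --pretrained_coref_path checkpoints/model.pt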
To evaluate a trained model on the test set, run:

python main.py --test --pretrained_coref_path PATH
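For reference, the commands above assume that main.py exposes roughly the following command-line interface. This is an illustrative sketch, not the actual parser in this repository; only --train, --test, and --pretrained_coref_path are taken from the commands shown above, everything else is an assumption.

import argparse

def build_parser():
    # Sketch of the assumed CLI; the real main.py may define additional options.
    parser = argparse.ArgumentParser(description="Coreference resolution with a SpanBERT encoder")
    parser.add_argument("--train", action="store_true", help="run training")
    parser.add_argument("--test", action="store_true", help="evaluate on the test split")
    parser.add_argument("--pretrained_coref_path", type=str, default=None,
                        help="path to a saved model checkpoint to load before training or testing")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)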