Parallel corpus mining

Metadata

Status: Proposed (NB: still has to be discussed with relevant researchers)
Type: Specific
Work Package: WP3
Research Coordinators: Time in Translation group
Coordinators for CLARIAH: Jesse de Does, Vincent Vandeghinste
Participating Institutes: INT, UU
End-users: Time in Translation group
Developers: (Who is involved in implementing this use-case (if any)? Try to mention name, institute, role/responsibility)
Interest Groups: (A list of CLARIAH interest groups, such as Text and DevOps, for which this use case may be relevant. See the list of IGs at: https://github.com/clariah/ig/.)
Task IDs: WP3 search engine extensions: parallel corpora; treebanks

Description

Progress in studying verbal tense and aspect semantics can be made by applying quantitative corpus methods in the field of semantic micro-typology, in particular by exploiting the possibilities of translation corpora.

What is the research about?

Tense-aspect categories found across languages.

What problem is hindering the research?

The absence of a flexible, open-source, and user-friendly environment for exploring the corpus data.

What is needed to do the research?

We propose extensions to BlackLab, BlackLab Server, and AutoSearch to enable:

parallel concordancing
extraction of relevant statistics
upload of parallel data created by researchers into AutoSearch
exploitation of existing parallel corpora
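
As a rough illustration of what parallel concordancing and statistics extraction could mean for tense research, the following minimal Python sketch searches the source side of sentence-aligned pairs and counts tense values on the target side. It is not BlackLab or AutoSearch code: the `Token` class, the function names, and the two hand-annotated toy sentence pairs are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated token (hypothetical structure, not a BlackLab type)."""
    form: str
    lemma: str
    upos: str
    tense: str = ""  # value of the UD Tense feature, empty if absent


def concordance(pairs, lemma):
    """Return the aligned (source, target) pairs whose source contains the lemma."""
    return [(src, tgt) for src, tgt in pairs
            if any(tok.lemma == lemma for tok in src)]


def target_tense_counts(hits):
    """Count Tense values of verbs and auxiliaries on the target side of the hits."""
    counts = Counter()
    for _src, tgt in hits:
        for tok in tgt:
            if tok.upos in ("VERB", "AUX") and tok.tense:
                counts[tok.tense] += 1
    return counts


# Toy, hand-annotated English-Dutch sentence pairs (illustrative data only).
pairs = [
    (
        [Token("She", "she", "PRON"), Token("has", "have", "AUX", "Pres"),
         Token("arrived", "arrive", "VERB", "Past")],
        [Token("Zij", "zij", "PRON"), Token("is", "zijn", "AUX", "Pres"),
         Token("aangekomen", "aankomen", "VERB")],
    ),
    (
        [Token("He", "he", "PRON"), Token("arrived", "arrive", "VERB", "Past"),
         Token("yesterday", "yesterday", "ADV")],
        [Token("Hij", "hij", "PRON"), Token("kwam", "komen", "VERB", "Past"),
         Token("gisteren", "gisteren", "ADV"), Token("aan", "aan", "ADP")],
    ),
]

hits = concordance(pairs, "arrive")
counts = target_tense_counts(hits)
print(len(hits), dict(counts))
```

A real extension would run such queries against an indexed corpus rather than in-memory lists; the sketch only shows the shape of the result a researcher might expect: aligned hits plus a frequency table over target-side annotations.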

Data

Parallel UD-enriched corpora (tagging, lemmatization, dependency syntax):

created by researchers
existing corpora (OPUS, etc.)
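
UD annotations are commonly exchanged in the tab-separated CoNLL-U format. Assuming the parallel data is available in that format (the use case does not specify this), a minimal reader for the token lines might look like the sketch below. The function name `parse_conllu` and the sample sentence are made up for illustration.

```python
def parse_conllu(text):
    """Parse CoNLL-U text into sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment, e.g. "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        feats = {}
        if cols[5] != "_":
            feats = dict(item.split("=", 1) for item in cols[5].split("|"))
        current.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "feats": feats,
            "head": int(cols[6]) if cols[6].isdigit() else None,
            "deprel": cols[7],
        })
    if current:
        sentences.append(current)
    return sentences


# Hand-made sample sentence (illustrative annotations only).
SAMPLE = "\n".join([
    "# text = Hij kwam",
    "\t".join(["1", "Hij", "hij", "PRON", "_", "Case=Nom|Person=3", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "kwam", "komen", "VERB", "_", "Number=Sing|Tense=Past|VerbForm=Fin", "0", "root", "_", "_"]),
    "",
])

sentences = parse_conllu(SAMPLE)
print(sentences[0][1]["lemma"], sentences[0][1]["feats"].get("Tense"))
```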

Tools

Extended version of BlackLab/AutoSearch
Visualization and analysis tools developed by the Time in Translation group

What software and services are involved?

How to evaluate this?

Is the researcher satisfied?

References

References to related resources and publications and especially links to related use-cases:

CLARIAH
