FuSe - a Multi-Layered Parallel Treebank


Authors: Cyrus, Lea; Feddes, Hendrik; Schumacher, Frank
Department/Institution: Department 09: Philology; IKM-Service
Document type: Article
Media type: Text
Publication date: 2003
Published in MIAMI: 27 July 2004
Last modified: 6 April 2022
Edition: [Electronic ed.]
Source: Proceedings of the Second Workshop on Treebanks and Linguistic Theories (14-15 November 2003), pp. 213-216
Keywords: corpus linguistics; computational linguistics; syntactic annotation; semantic annotation
Subject (DDC): 400: Language
License: InC 1.0
Language: English
Format: PDF document
URN: urn:nbn:de:hbz:6-85659523905
Permalink: https://nbn-resolving.de/urn:nbn:de:hbz:6-85659523905
Online access: 0311_tlt.pdf

While a number of bilingual and even multilingual corpora exist, syntactically analyzed parallel corpora are rare. At Münster University, we have initiated a treebank project that aims to close this gap. Our goal is to build a multi-layered treebank of aligned parallel texts in English and German. While we confine ourselves to annotating a single language pair, the design will allow additional languages to be added, provided that appropriate translations exist. Our working title for the treebank is FuSe, which stands for functional semantic annotation and connotes that two or more languages are fused with each other. Although our main motivation is to contribute to linguistic research rather than to develop a corpus tailor-made for a particular NLP application, we believe that the corpus will prove useful in several fields of application, the most obvious being machine translation. The linguistic annotation of the FuSe corpus will contain the following layers: POS tags, constituent structure, functional relations, predicate-argument structure, and alignment information. The alignment layer is the only one that is defined for a language pair rather than for a single language; apart from this layer, each subcorpus is a complete monolingual resource in its own right. In the following, we concentrate on the predicate-argument structure and on the representation of alignment information.
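The abstract does not specify a storage format, so the following is only a rough illustration of the layer architecture it describes, under stated assumptions. It sketches a hypothetical in-memory representation: every class and field name (`Node`, `Predicate`, `Sentence`, `AlignedPair`) and every label (`NNP`, `NE`, `SUBJ`, `HEAD`, `ARG0`) is invented for illustration and is not the FuSe format or label inventory. The point it shows is the split described above: POS tags, constituent structure, functional relations and predicate-argument structure live in self-contained monolingual sentences, while the alignment layer is a separate object that merely links node IDs across a language pair.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """One node of a monolingual tree (hypothetical format, not FuSe's).

    Terminals carry a word and a POS tag; phrase nodes carry a category.
    `function` holds the functional relation to the parent node.
    """
    node_id: str
    label: str                        # POS tag (terminal) or phrase category
    word: Optional[str] = None        # None for phrase nodes
    function: Optional[str] = None    # illustrative labels such as "SUBJ"
    children: list[Node] = field(default_factory=list)

    def walk(self) -> Iterator[Node]:
        """Yield this node and all of its descendants, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Predicate:
    """Predicate-argument layer: a predicate and its role-labelled arguments."""
    pred_node: str                    # node_id of the predicate
    args: dict[str, str]              # hypothetical role label -> node_id


@dataclass
class Sentence:
    """A monolingual analysis that is usable without any alignment."""
    lang: str
    tree: Node
    predicates: list[Predicate] = field(default_factory=list)

    def ids(self) -> set[str]:
        return {n.node_id for n in self.tree.walk()}


@dataclass
class AlignedPair:
    """Alignment layer: the only layer defined for a language pair."""
    src: Sentence
    tgt: Sentence
    links: list[tuple[str, str]]      # (source node_id, target node_id)

    def check(self) -> None:
        """Raise ValueError if a link refers to a node that does not exist."""
        src_ids, tgt_ids = self.src.ids(), self.tgt.ids()
        for s, t in self.links:
            if s not in src_ids or t not in tgt_ids:
                raise ValueError(f"dangling link {s!r} -> {t!r}")


# Usage: a toy English/German pair (Penn- and STTS-style tags chosen for illustration).
en = Sentence("en", Node("e0", "S", children=[
    Node("e1", "NNP", word="John", function="SUBJ"),
    Node("e2", "VBZ", word="sleeps", function="HEAD"),
]), [Predicate("e2", {"ARG0": "e1"})])

de = Sentence("de", Node("g0", "S", children=[
    Node("g1", "NE", word="John", function="SUBJ"),
    Node("g2", "VVFIN", word="schläft", function="HEAD"),
]), [Predicate("g2", {"ARG0": "g1"})])

pair = AlignedPair(en, de, [("e0", "g0"), ("e1", "g1"), ("e2", "g2")])
pair.check()  # passes: every link refers to an existing node
```

Because links in this sketch store node IDs rather than copies of nodes, either subcorpus can be read without the alignment layer, and a further language could in principle be attached as another pairing over the same monolingual data.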