Unsupervised Text Segmentation for Automated Error Reduction
Abstract
Challenging the assumption that traditional whitespace/punctuation-based tokenisation is the best solution for any NLP application, I propose an alternative approach to segmenting text into processable units. The proposed approach is nearly knowledge-free, in that it does not rely on language-dependent, man-made resources. The text segmentation approach is applied to the task of automated error reduction in texts with high noise. The results are compared to conventional tokenisation.
Publication type
ConferencePaper
Author
Furrer, Lenz
Publication date
2014
Published in
Proceedings of the 12th edition of the KONVENS conference
First page
178
Last page
185
URN
urn:nbn:de:gbv:hil2-opus-2804
Files: p046.pdf (936.62 KB)
Main Conference Proceedings of the 12th Konvens 2014