Discourse Type Clustering using POS n-gram Profiles and High-Dimensional Embeddings

Details

Resource 1 Download: BIB_5A2CBDB06CA2.P001.pdf (1815.32 KB)
State: Public
Version: author
Serval ID
serval:BIB_5A2CBDB06CA2
Type
Inproceedings: an article in a conference proceedings.
Collection
Publications
Title
Discourse Type Clustering using POS n-gram Profiles and High-Dimensional Embeddings
Title of the conference
Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Author(s)
Cocco C.
Publisher
Association for Computational Linguistics
Organization
Université d'Avignon
Address
Stroudsburg
ISBN
978-1-937284-19-0
Publication state
Published
Issued date
04/2012
Peer-reviewed
Yes
Pages
55-63
Language
English
Notes
Online conference proceedings
Abstract
To cluster textual sequence types (discourse types/modes) in French texts, the K-means algorithm with high-dimensional embeddings and a fuzzy clustering algorithm were applied to clauses whose POS (part-of-speech) n-gram profiles had previously been extracted. Uni-, bi- and trigrams were used on four 19th-century French short stories by Maupassant. For the high-dimensional embeddings, power transformations of the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of clustering, in contrast to the use of bi- and trigrams, whose performance is disappointing, possibly because of feature space sparsity.
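As an illustration of the quantities the abstract mentions, the following minimal sketch computes chi-squared dissimilarities between clause POS profiles and applies a power transformation to them. It is not the paper's implementation: the correspondence-analysis form of the (squared) chi-squared distance, the function names `chi2_sq_distances` and `power_transform`, the exponent `q = 0.5`, and the toy counts over three hypothetical POS tags are all assumptions made for the example.

```python
def chi2_sq_distances(counts):
    """Squared chi-squared distances between the rows of a count matrix.

    Assumed form (as in correspondence analysis):
    D[a][b] = sum_j (p_aj - p_bj)^2 / rho_j, where p_aj is the share of
    tag j in clause a and rho_j is the share of tag j in the whole corpus.
    Every row and every column must have a non-zero total.
    """
    n_total = sum(sum(row) for row in counts)
    col_weights = [sum(col) / n_total for col in zip(*counts)]
    profiles = [[c / sum(row) for c in row] for row in counts]
    return [
        [sum((a - b) ** 2 / w for a, b, w in zip(pa, pb, col_weights))
         for pb in profiles]
        for pa in profiles
    ]


def power_transform(dist, q):
    """Raise every dissimilarity to the power q (q is a free parameter here)."""
    return [[d ** q for d in row] for row in dist]


# Toy data (invented): 4 clauses x 3 hypothetical POS tags (NOUN, VERB, ADJ).
counts = [
    [3, 1, 0],
    [2, 1, 1],
    [0, 3, 1],
    [1, 4, 0],
]

D = chi2_sq_distances(counts)
D_q = power_transform(D, q=0.5)

for row in D_q:
    print(["%.3f" % d for d in row])
```

The transformed matrix `D_q` could then be passed to a clustering routine that accepts pairwise dissimilarities; which routine and which exponent are appropriate is left open here.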
Keywords
Discourse types, K-means, high-dimensional embeddings, fuzzy clustering
Create date
22/08/2012 13:18
Last modification date
20/08/2019 14:13