Abstract
There is increasing interest in providing ordinary Web users with access to structured knowledge bases such as DBpedia, for example by means of question answering systems. An essential task of such systems is transforming natural language questions into formal queries, e.g. expressed in SPARQL. To this end, such systems require knowledge about how the vocabulary elements used in the available ontologies and datasets are verbalized in natural language, covering different verbalization variants, possibly in multiple languages. Adjectives constitute an important part of such lexical knowledge. In this paper, we present and evaluate a machine learning approach to extracting adjective lexicalizations from DBpedia, a challenge that has not been addressed so far. Our approach achieves an accuracy of 91.15% in a tenfold cross-validation regime. In addition to providing a first baseline system for the task of extracting adjective lexicalizations from DBpedia, we publish the extracted adjective lexicalizations in lemon format for free use by the community.
Notes
Which can be downloaded at http://sebastianwalter.org/downloads/dblexipedia/jods_arff_files.tar.gz as .arff files.
Acknowledgments
This work was supported by the Cluster of Excellence Cognitive Interaction Technology CITEC (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).
Cite this article
Walter, S., Unger, C. & Cimiano, P. Automatic Acquisition of Adjective Lexicalizations of Restriction Classes: a Machine Learning Approach. J Data Semant 6, 113–123 (2017). https://doi.org/10.1007/s13740-016-0069-0
DOI: https://doi.org/10.1007/s13740-016-0069-0