This article focuses on the complexity of finding and analyzing the totality of educational information published by the University of Bologna on its website over the last twenty years. In particular, it emphasizes some issues related to the use of the Wayback Machine, the most important international web archive, and the need for a different research tool that would allow more solid analyses of the corpus. Such a tool could initially rely on standard Natural Language Processing techniques (such as tokenization, stop-word removal, and parsing), but more complex solutions should also be considered, such as text-mining analyses, WordNet integration, and an ontological representation of knowledge. Thanks to approaches like the one presented here, future historians will be able to study the evolution of a university website efficiently.
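As a minimal illustration of the standard preprocessing steps mentioned above, the following Python sketch uses the NLTK library (only one of several possible toolkits) to tokenize a hypothetical sentence taken from an archived page and to remove English stop words. The sample text is purely illustrative and is not drawn from the actual corpus.

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download the required NLTK resources (needed only once; resource
# names may vary slightly across NLTK versions).
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

# Hypothetical snippet of text extracted from an archived page of the
# University of Bologna website (for illustration only).
text = "The University of Bologna offers degree programmes in many fields of study."

# Tokenization: split the raw text into individual word tokens.
tokens = word_tokenize(text)

# Stop-word removal: discard high-frequency function words that carry
# little topical information, keeping only alphabetic content words.
stop_words = set(stopwords.words("english"))
content_tokens = [t for t in tokens if t.isalpha() and t.lower() not in stop_words]

print(content_tokens)
# Approximate output:
# ['University', 'Bologna', 'offers', 'degree', 'programmes', 'many', 'fields', 'study']
```

Steps such as parsing, text mining, or WordNet integration would build on token lists of this kind, but their concrete implementation depends on the research tool ultimately adopted.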