Design and Implementation of Architectures for Interactive Textual Documents Collation Systems

Please always quote using this URN: urn:nbn:de:bvb:20-opus-56601
  • One of the main purposes of textual documents collation is to identify a base text or closest witness to the base text, by analyzing and interpreting differences also known as types of changes that might exist between those documents. Based on this fact, it is reasonable to argue that, explicit identification of types of changes such as deletions, additions, transpositions, and mutations should be part of the collation process. The identification could be carried out by an interpretation module after alignment has taken place. UnfortunatelyOne of the main purposes of textual documents collation is to identify a base text or closest witness to the base text, by analyzing and interpreting differences also known as types of changes that might exist between those documents. Based on this fact, it is reasonable to argue that, explicit identification of types of changes such as deletions, additions, transpositions, and mutations should be part of the collation process. The identification could be carried out by an interpretation module after alignment has taken place. Unfortunately existing collation software such as CollateX1 and Juxta2’s collation engine do not have interpretation modules. In fact they implement the Gothenburg model [1] for collation process which does not include an interpretation unit. Currently both CollateX and Juxta’s collation engine do not distinguish in their critical apparatus between the types of changes, and do not offer statistics about those changes. This paper presents a model for both integrated and distributed collation processes that improves the Gothenburg model. The model introduces an interpretation component for computing and distinguishing between the types of changes that documents could have undergone. Moreover two architectures implementing the model in order to solve the problem of interactive collation are discussed as well. Each architecture uses CollateX library, and provides on the one hand preprocessing functions for transforming input documents into CollateX input format, and on the other hand a post-processing module for enabling interactive collation. Finally simple algorithms for distinguishing between types of changes, and linking collated source documents with the collation results are also introduced.show moreshow less

Download full text files

Export metadata

Additional Services

Share in Twitter Search Google Scholar Statistics
Metadaten
Author: Mohamadou Nassourou
URN:urn:nbn:de:bvb:20-opus-56601
Document Type:Preprint
Faculties:Philosophische Fakultät (Histor., philolog., Kultur- und geograph. Wissensch.) / Institut für deutsche Philologie
Language:English
Year of Completion:2011
Dewey Decimal Classification:0 Informatik, Informationswissenschaft, allgemeine Werke / 00 Informatik, Wissen, Systeme / 004 Datenverarbeitung; Informatik
GND Keyword:Softwarearchitektur; Textvergleich
Tag:Gothenburg model of collation process; interactive collation of textual variants; service based software architecture; service brokerage
Release Date:2011/05/17
Licence (German):License LogoDeutsches Urheberrecht