A Hybrid Approach to Assignment of Library of Congress Subject Headings

  • Library of Congress Subject Headings (LCSH) are widely used for indexing library records. We studied the possibility of assigning LCSH automatically in two ways: by training classifiers for headings that occur frequently in a large collection of abstracts of the literature on hand, and by extracting headings from those abstracts. The resulting classifiers reach an acceptable level of precision but fall short on recall, partly because we could train classifiers for only a small number of LCSH. Extraction, i.e., the matching of headings in the text, produces better recall but extremely low precision. We found that combining both methods leads to a significant improvement in recall and a slight improvement in F1 score, with only a small decrease in precision.
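
The trade-off described in the abstract can be illustrated with a small sketch. It assumes that "combining" means taking the union of the headings proposed by the classifiers and those matched in the text, which the abstract does not specify. The function names, the toy vocabulary, the sample text, and the gold headings are all hypothetical, and the resulting numbers are not results from the article.

```python
def extract_headings(text, vocabulary):
    """Return every vocabulary heading that occurs verbatim in the text
    (case-insensitive substring match; an assumed, simplistic matcher)."""
    lowered = text.lower()
    return {h for h in vocabulary if h.lower() in lowered}


def combine(classifier_headings, extracted_headings):
    """Assumed combination strategy: plain set union of both proposals."""
    return set(classifier_headings) | set(extracted_headings)


def precision_recall_f1(predicted, gold):
    """Standard set-based precision, recall and F1 for one record."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, for illustration only.
vocabulary = {"Machine learning", "Classification", "Libraries",
              "Information retrieval"}
abstract = ("We apply machine learning to the classification "
            "of catalogue records in libraries.")
gold = {"Machine learning", "Automatic indexing", "Libraries",
        "Subject headings"}

# Stand-in for trained classifiers: few headings, all of them correct.
from_classifiers = {"Machine learning", "Automatic indexing"}
from_extraction = extract_headings(abstract, vocabulary)
combined = combine(from_classifiers, from_extraction)

for name, predicted in [("classifiers", from_classifiers),
                        ("extraction", from_extraction),
                        ("combined", combined)]:
    p, r, f1 = precision_recall_f1(predicted, gold)
    print(f"{name:12s} precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this toy record the union recovers more gold headings than either method alone, at the cost of carrying over the false positive that extraction introduces.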

Metadata
Author:Christian Wartena, Michael Franke-Maier
URN:urn:nbn:de:bsz:960-opus4-15658
URL:https://publikationen.bibliothek.kit.edu/1000105121
DOI:https://doi.org/10.25968/opus-1565
DOI original:https://doi.org/10.5445/KSP/1000085951/22
ISSN:2363-9881
Parent Title (English):Archives of Data Science, Series A
Document Type:Article
Language:English
Year of Completion:2018
Publishing Institution:Hochschule Hannover
Release Date:2020/01/23
Tag:Classification; Keyword Extraction; LCSH; Machine Learning
GND Keyword:Library of Congress; Schlagwort; Automatische Klassifikation; Maschinelles Lernen
Volume:4
Issue:1
Link to catalogue:1688179658
Institutes:Fakultät III - Medien, Information und Design
DDC classes:020 Library and information sciences
Licence:Creative Commons - CC BY-SA - Attribution - ShareAlike 4.0 International