Definition and Evaluation of the NEOCR Dataset for Natural-Image Text Recognition

Language
en
Document Type
Report
Issue Date
2011-09-28
Issue Year
2011
Authors
Nagy, Robert
Dicker, Anders
Meyer-Wegener, Klaus
Abstract

Recognizing text in natural images has recently received growing attention. OCR on natural images is far more complex than OCR on scanned documents: text in real-world environments appears in arbitrary colors, font sizes, and typefaces, and is often affected by perspective distortion, lighting effects, textures, or occlusion. Currently, no publicly available dataset covers all aspects of natural-image OCR. This report defines and creates a comprehensive, well-annotated, configurable dataset for optical character recognition in natural images, intended for evaluating and comparing approaches to natural-image text OCR. Furthermore, current open-source and commercial OCR tools are analyzed in various test scenarios using the proposed NEOCR dataset. Based on the results, further steps toward all-embracing natural-image text recognition are identified for the OCR community to address.

Series
Technical reports / Department Informatik
Series Nr.
CS-2011-07