Definition and Evaluation of the NEOCR Dataset for Natural-Image Text Recognition

Nagy, Robert; Dicker, Anders; Meyer-Wegener, Klaus

Files

1915_main_small.pdf (1.09 MB)

Language

en

Document Type

Report

Issue Date

2011-09-28

Issue Year

2011

Authors

Nagy, Robert

Dicker, Anders

Meyer-Wegener, Klaus

Abstract

Recently, growing attention has been paid to recognizing text in natural images. Natural-image text OCR is far more complex than OCR on scanned documents. Text in real-world environments appears in arbitrary colors, font sizes, and typefaces, and is often affected by perspective distortion, lighting effects, textures, or occlusion. Currently, no publicly available dataset covers all aspects of natural-image OCR. A comprehensive, well-annotated, configurable dataset for optical character recognition in natural images is defined and created for the evaluation and comparison of approaches tackling natural-image text OCR. Furthermore, current open-source and commercial OCR tools are analyzed in various test scenarios using the proposed NEOCR dataset. Based on the results, further steps for the OCR community to address on the way toward all-embracing natural-image text recognition are derived.

Series

Technical reports / Department Informatik

Series Nr.

CS-2011-07