U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts

Zottin, Silvia; De Nardin, Axel; Colombi, Emanuela; Piciarelli, Claudio; Pavan, Filippo; Foresti, Gian Luca

doi:10.1007/s00521-023-09356-5

Special Issue on Visual Pattern Recognition and Extraction for Cultural Heritage
Published: 16 January 2024

Neural Computing and Applications (2024)

Silvia Zottin¹ (ORCID: orcid.org/0000-0003-0820-7260),
Axel De Nardin¹,²,
Emanuela Colombi³,
Claudio Piciarelli¹,
Filippo Pavan³ &
Gian Luca Foresti¹

Abstract

Document Layout Analysis, the task of identifying the different semantic regions within a document page, is of great interest to both computer scientists and humanities scholars: for the former it is a fundamental step towards further analysis tasks, and for the latter it is a powerful tool that improves and facilitates the study of documents. However, many works in the current literature, especially the available datasets, fail to meet the needs of both communities. In particular, they tend to lean towards the needs and common practices of computer science, which leads to resources that are not representative of the real needs of the humanities. For this reason, the present paper introduces U-DIADS-Bib, a novel, pixel-precise, non-overlapping and noiseless document layout analysis dataset developed in close collaboration between specialists in computer vision and the humanities. Furthermore, we propose a novel computer-aided segmentation pipeline to alleviate the burden of the time-consuming manual annotation required to generate the ground truth segmentation maps. Finally, we present a standardized few-shot version of the dataset (U-DIADS-BibFS), with the aim of encouraging the development of models and solutions that can address this task with as few samples as possible. This would allow more effective use in real-world scenarios, where collecting a large number of segmentations is not always feasible.

Data availability

The datasets generated and analysed during the current study are available in the U-DIADS-Bib repository (see footnote 6).

Acknowledgements

The authors would like to acknowledge the Bibliothèque nationale de France for providing access to the digital library Gallica.

Funding

Partial financial support was received from the Piano Nazionale di Ripresa e Resilienza (PNRR), DD 3277 of 30 December 2021 (PNRR Mission 4, Component 2, Investment 1.5): Interconnected Nord-Est Innovation Ecosystem (iNEST).

Author information

Authors and Affiliations

Department of Mathematics, Computer Science and Physics, University of Udine, Udine, Italy
Silvia Zottin, Axel De Nardin, Claudio Piciarelli & Gian Luca Foresti
Department of Engineering and Architecture, University of Trieste, Trieste, Italy
Axel De Nardin
Department of Humanities and Cultural Heritage, University of Udine, Udine, Italy
Emanuela Colombi & Filippo Pavan

Corresponding author

Correspondence to Silvia Zottin.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zottin, S., De Nardin, A., Colombi, E. et al. U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-023-09356-5

Received: 03 April 2023
Accepted: 24 October 2023
Published: 16 January 2024
DOI: https://doi.org/10.1007/s00521-023-09356-5
