U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts

  • Special Issue on Visual Pattern Recognition and Extraction for Cultural Heritage
Neural Computing and Applications

Abstract

Document Layout Analysis, the task of identifying the different semantic regions within a document page, is of great interest to both computer scientists and humanities scholars. For the former, it is a fundamental step towards further analysis tasks. For the latter, it is a powerful tool that facilitates the study of documents. However, many works in the literature, and the available datasets in particular, fail to meet the needs of both communities: they tend to follow the needs and common practices of computer science, producing resources that do not reflect the actual needs of the humanities. For this reason, this paper introduces U-DIADS-Bib, a novel pixel-precise, non-overlapping and noiseless document layout analysis dataset developed in close collaboration between specialists in computer vision and in the humanities. Furthermore, we propose a novel computer-aided segmentation pipeline that reduces the burden of manual annotation, a time-consuming process required to generate the ground truth segmentation maps. Finally, we present a standardized few-shot version of the dataset (U-DIADS-BibFS) to encourage the development of models and solutions that can address this task with as few samples as possible. Such models would be more effective in real-world scenarios, where collecting a large number of segmentations is not always feasible.
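A standardized few-shot split only works if every user trains on the same small set of pages. The following is a minimal sketch of how such a split could be drawn reproducibly, not the procedure used to build U-DIADS-BibFS (whose official split is distributed with the dataset). It assumes a hypothetical dictionary of page identifiers per manuscript; the function name `make_few_shot_split`, the shot count, and the page names are illustrative assumptions.

```python
import random


def make_few_shot_split(pages_by_manuscript, shots, seed=0):
    """Split each manuscript's pages into `shots` training pages and the rest.

    Hypothetical helper for illustration only; the official U-DIADS-BibFS
    split is distributed with the dataset and is not reproduced here.
    """
    rng = random.Random(seed)  # private seeded generator: same seed, same split
    split = {}
    for name in sorted(pages_by_manuscript):  # fixed order keeps the draws reproducible
        pages = sorted(pages_by_manuscript[name])
        if not 0 < shots < len(pages):  # need at least one train and one test page
            raise ValueError(f"{name}: cannot take {shots} of {len(pages)} pages")
        train = sorted(rng.sample(pages, shots))
        train_set = set(train)
        split[name] = {
            "train": train,
            "test": [p for p in pages if p not in train_set],
        }
    return split


if __name__ == "__main__":
    # Hypothetical page identifiers for two manuscripts.
    pages = {
        "manuscript_A": [f"A_{i:03d}" for i in range(20)],
        "manuscript_B": [f"B_{i:03d}" for i in range(15)],
    }
    split = make_few_shot_split(pages, shots=3, seed=42)
    for name, parts in split.items():
        print(name, "train:", parts["train"], "| test pages:", len(parts["test"]))
```

Fixing the seed and sorting both the manuscripts and their pages makes the split depend only on the inputs, so two researchers calling the function with the same arguments obtain identical training sets.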


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5


Data availability

The datasets generated and analysed during the current study are available in the U-DIADS-Bib repository (see note 6).
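Because the ground truth is described as pixel-precise, non-overlapping and noiseless, each pixel of a segmentation map should decode to exactly one class label. The sketch below shows one way to perform that decoding and to flag stray colours. It assumes the maps are colour-coded RGB images already decoded into rows of RGB tuples (for instance with Pillow); the palette, class names, and the function name `rgb_map_to_classes` are hypothetical placeholders, not the dataset's official specification.

```python
from collections import Counter

# Hypothetical palette mapping RGB colours to class indices. The actual classes
# and colour codes must be read from the U-DIADS-Bib documentation.
PALETTE = {
    (0, 0, 0): 0,      # background
    (255, 0, 255): 1,  # main text
    (0, 255, 255): 2,  # decoration
    (255, 0, 0): 3,    # title
}


def rgb_map_to_classes(rgb_rows, palette=PALETTE):
    """Turn a colour-coded ground-truth map (rows of RGB tuples) into class indices.

    Each pixel carries exactly one colour, so it receives exactly one class;
    colours outside the palette (e.g. anti-aliased edges) are counted and
    reported instead of being silently assigned.
    """
    unknown = Counter()
    class_rows = []
    for row in rgb_rows:
        out_row = []
        for rgb in row:
            cls = palette.get(tuple(rgb))
            if cls is None:
                unknown[tuple(rgb)] += 1
                cls = -1  # placeholder; the whole map is rejected below
            out_row.append(cls)
        class_rows.append(out_row)
    if unknown:
        raise ValueError(
            f"{sum(unknown.values())} pixel(s) with colours outside the palette: {dict(unknown)}"
        )
    return class_rows


if __name__ == "__main__":
    toy_map = [
        [(0, 0, 0), (255, 0, 255)],
        [(255, 0, 0), (0, 255, 255)],
    ]
    print(rgb_map_to_classes(toy_map))  # [[0, 1], [3, 2]]
```

Rejecting the whole map on any unknown colour, rather than snapping it to the nearest palette entry, keeps the check strict: a noiseless annotation should never trigger it.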

Notes

  1. Source: https://gallica.bnf.fr.

  2. https://gallica.bnf.fr/ark:/12148/btv1b8452767n.

  3. https://gallica.bnf.fr/ark:/12148/btv1b84429190.

  4. https://gallica.bnf.fr/ark:/12148/btv1b85144288.

  5. https://gallica.bnf.fr/ark:/12148/btv1b10527102b.

  6. https://ai4ch.uniud.it/udiadsbib/.


Acknowledgements

The authors would like to acknowledge the Bibliothèque nationale de France for providing access to the digital library Gallica.

Funding

Partial financial support was received from the Piano Nazionale di Ripresa e Resilienza (PNRR), DD 3277 of 30 December 2021 (PNRR Mission 4, Component 2, Investment 1.5), Interconnected Nord-Est Innovation Ecosystem (iNEST).

Author information

Corresponding author

Correspondence to Silvia Zottin.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zottin, S., De Nardin, A., Colombi, E. et al. U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-023-09356-5

  • DOI: https://doi.org/10.1007/s00521-023-09356-5
