Infrastructures for a Community-Developed Text Processing Library

Authors

  • Florian Barth, Göttingen State and University Library (SUB), https://orcid.org/0000-0003-3408-7311
  • George Dogaru, Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), https://orcid.org/0000-0001-9500-9638
  • Tillmann Dönicke, Göttingen State and University Library (SUB)
  • Mathias Göbel, Göttingen State and University Library (SUB)

DOI:

https://doi.org/10.14279/eceasst.v85.2684

Keywords:

Natural language processing, Digital Humanities, High-Performance Computing

Abstract

This paper introduces MONAPipe, a modular natural language processing (NLP) pipeline developed within the German National Research Data Infrastructure (NFDI) and coordinated by the Göttingen State and University Library (SUB). Designed to support a wide range of text-based research disciplines, including literary studies, digital humanities, and (computational) linguistics, MONAPipe integrates both general-purpose NLP tools and community-developed components into a flexible, spaCy-based framework. MONAPipe supports reproducible and sustainable research through comprehensive documentation, versioned resource management via long-term repositories such as GRO.data, and automated testing and deployment pipelines. Docker-based containerisation allows scalable deployment on both local machines and high-performance computing (HPC) infrastructures, and components are accessible via REST APIs. The pipeline is available as an installable Python package, complemented by example workflows and training materials. Future developments may include integration into the Jupyter4NFDI environment, online service provision via the KISSKI HPC infrastructure, and a graphical user interface.

Published

2025-12-15

How to Cite

[1] F. Barth, G. Dogaru, T. Dönicke, and M. Göbel, “Infrastructures for a Community-Developed Text Processing Library”, ECEASST, vol. 85, Dec. 2025.