Infrastructures for a Community-Developed Text Processing Library
DOI: https://doi.org/10.14279/eceasst.v85.2684
Keywords: Natural language processing, Digital Humanities, High-Performance Computing
Abstract
This paper introduces MONAPipe, a modular natural language processing (NLP) pipeline developed within the German National Research Data Infrastructure (NFDI) and coordinated by the Göttingen State and University Library (SUB). Designed to support a wide range of text-based research disciplines, including literary studies, digital humanities, and (computational) linguistics, MONAPipe integrates both general-purpose NLP tools and community-developed components into a flexible, spaCy-based framework. MONAPipe supports reproducible and sustainable research through comprehensive documentation, versioned resource management via long-term repositories such as GRO.data, and automated testing and deployment pipelines. Docker-based containerisation enables scalable deployment on both local machines and high-performance computing (HPC) infrastructures, and individual components are accessible via REST APIs. The pipeline is available as an installable Python package, complemented by example workflows and training materials. Future developments may include integration into the Jupyter4NFDI environment, online service provision via the KISSKI HPC infrastructure, and a graphical user interface.
License
Copyright (c) 2025 Florian Barth, George Dogaru, Tillmann Dönicke, Mathias Göbel

This work is licensed under a Creative Commons Attribution 4.0 International License.
