Linguistic corpus research software at the Leibniz-Institute for the German Language (IDS)

Authors

  • Nils Diewald Leibniz-Institute for the German Language
  • Franck Bodmer Leibniz-Institut für Deutsche Sprache
  • Peter M. Fischer Leibniz-Institut für Deutsche Sprache
  • Elena Frick Leibniz-Institut für Deutsche Sprache
  • Marc Kupietz Leibniz-Institut für Deutsche Sprache
  • Mark-Christoph Müller Leibniz-Institut für Deutsche Sprache
  • Helge Stallkamp Leibniz-Institut für Deutsche Sprache
  • Uyen-Nhu Tran Leibniz-Institut für Deutsche Sprache

DOI:

https://doi.org/10.14279/eceasst.v85.2692

Keywords:

Corpus Linguistics, Language Resources, User Interface Design, Legacy Software

Abstract

Empirical linguistic research requires access to richly annotated and metadata-enhanced language corpora. This paper presents the ongoing development of corpus search and analysis platforms at the Leibniz-Institute for the German Language (IDS), which provide access to DeReKo, the world’s largest collection of contemporary written German corpora, and the Archive for Spoken German (AGD), among others. We describe our platforms, focusing on improving, extending, and evaluating their user interfaces. Challenges addressed include legal constraints, handling large and heterogeneous datasets, ensuring reproducibility, and, in particular, meeting accessibility and usability standards for a diverse scientific audience from the humanities. This work contributes to the broader effort of advancing research infrastructure in linguistics and offers insights into sustainable and user-friendly corpus technology design.

Published

2025-12-15

How to Cite

[1]
N. Diewald, “Linguistic corpus research software at the Leibniz-Institute for the German Language (IDS)”, ECEASST, vol. 85, Dec. 2025.