Main Projects
Large language models through the prism of corpus linguistics
PI: Jiří Milička
This project examines the differences between human-written texts and texts produced by large language models (LLMs) using corpus-linguistic methods such as classical stylistics and multidimensional analysis, while also addressing perceptual aspects. It covers both English and Czech, as well as the differences between the two languages. These topics are investigated using a unique corpus of texts generated by various LLMs, which will be published and made available to the scholarly community.
Sensitivity to register variability: Combining corpus-based and experimental methods
PI: Anna Marklová
This project investigates how humans and large language models process variation in linguistic register, that is, shifts in style, formality, and context-dependent language use. By integrating large-scale corpus analyses with controlled psycholinguistic experiments, the research explores whether readers and AI systems exhibit similar sensitivities to subtle register cues in text. The project examines comprehension, memory, and prediction patterns across different registers (e.g., formal vs. colloquial), aiming to uncover the cognitive mechanisms underlying register adaptation in humans and to evaluate how well LLMs capture these nuances. Ultimately, it bridges computational and experimental approaches to provide a deeper understanding of register awareness in both human cognition and artificial intelligence.