In the carnival heart of Mainz, Germany, a centuries-old dialect is exposing a blind spot in artificial intelligence that few expected: the machines we've trained to understand language are almost entirely deaf to Meenzerisch, the lyrical regional tongue that has shaped the cultural identity of this Rhine city for generations.

Researchers at Johannes Gutenberg University Mainz have discovered something both humbling and urgent. When they asked large language models to explain what Meenzerisch words mean, the AI systems got it right only 4.24% of the time. When asked to generate a dialect word from a standard German definition, accuracy plummeted to 0.56%—barely better than guessing blindly. This is not a minor technical glitch. It reveals how quickly smaller language varieties vanish from the digital world, becoming invisible to the technologies that increasingly mediate how we understand and preserve human culture.

The team, led by Minh Duc Bui and Professor Katharina von der Wense of JGU's Institute of Computer Science, took a practical first step: they created the first machine-readable lexicon of Meenzerisch, digitizing a dialect dictionary originally published in 1966 and producing a dataset of 2,351 words with their definitions in standard German. With that resource in hand, they systematically tested several open-source language models of different sizes, examining both comprehension and generation. The findings, presented at the 2026 Language Resources and Evaluation Conference in Palma de Mallorca, are unsparing. Even the strongest models performed poorly. Additional scaffolding—providing the AI with example sentences or automatically derived grammatical rules—made almost no difference. Accuracy remained stuck below 10% across every approach.

What makes this research matter extends far beyond technical curiosity. Meenzerisch remains a living part of Mainz's cultural landscape, known across Germany through the satirical carnival speeches of the city's famous Fastnacht tradition. Yet like hundreds of regional dialects across Europe and the world, it is slowly disappearing from everyday use. The digital age, which promised to democratize and preserve human knowledge, is paradoxically erasing linguistic diversity—not through deliberate suppression, but through neglect. Language models are trained on vast quantities of text, but dialects live primarily in speech, leaving minimal written records for the algorithms to learn from.

"Regional dialects have so far received very little attention in digital language research," said Bui. "Yet language technologies could help make them more visible and preserve them in the long run." The research points to a path forward, though it requires deliberate effort. Building targeted datasets, developing new training approaches, and designing AI systems that can process culturally and regionally significant varieties rather than only standard languages—these are not peripheral tasks but urgent work in an era when digital systems increasingly shape what gets remembered and what gets forgotten.

The irony is sharp: the very technology that could document and revitalize Meenzerisch and dialects like it remains, for now, fundamentally unable to understand them. But this study is a beginning. By naming the gap, measuring it precisely, and showing what tools are needed to close it, the researchers have created a template for preserving not just one dialect, but the linguistic mosaic that connects us to place, memory, and identity.