This is the first in a series of posts I’ll be writing on the Texas Czech Legacy Project.
Introduction: The Unexpected Journey
If you’d told me a few years ago that I’d be spending my time listening to recordings of interviews with elderly Texas Czechs, I probably would have laughed and said, “Yeah, I WISH.” I have spent more than two decades as a professional linguist and technical expert, in a career defined by academic experience, government service, and the latest advances in all types of language tools. Yet here I am, volunteering my time to help preserve a unique Czech dialect in Texas, and I can honestly say it’s the most meaningful and fun work I’ve ever done.
This journey has taken me from the world of academic research and professional translation into the heart of a grassroots cultural preservation effort. I have reconnected with a powerful family legacy that stretches back 140 years, connecting my own career to the remarkable story of my great-grandfather, Jan Štěpán. His dedication to immigrant communities in America laid the foundation for the work I’m doing today, and I’m honored to continue his mission in a new era.
My Professional Background: From Academia to Language Technology
Let me start with a bit of context. I earned my Ph.D. in Slavic Linguistics from UNC Chapel Hill, where my dissertation focused on politeness strategies in Russian, Polish, and Czech. Over the past 20+ years, I’ve worked as a translator, transcriber, language instructor, and technical leader. Much of my professional life, both in government service and since retiring, has revolved around analysis and language technology: everything from computer-assisted translation (CAT) tools to advanced AI-driven linguistic analysis.
I’ve specialized in Russian ↔ English and Czech ↔ English translation, and I’ve consulted on cross-cultural communication projects for businesses and academic institutions. My work has always been about bridging gaps—between languages, cultures, and people. And in a way, that makes sense, given how deeply my own family history was intertwined with this mission.
Discovering the Texas Czech Legacy Project
My connection to Czech heritage runs deep. My ancestors were Iowa Czechs—Bohemians who settled in the Midwest in the late 1800s. While researching my family history, I stumbled upon the Texas Czech Legacy Project (TCLP), an academic initiative led by Dr. Lida Cope and housed at UT Austin. The project’s goal is ambitious: to document and preserve the unique Czech dialect spoken by descendants of 19th-century immigrants in Texas.
When I joined the project, an international team (including collaborators from Charles University in Prague) had already transcribed about 50 hours of oral history recordings. But there are still more than 450 hours left—an enormous task. Traditional transcription methods would have taken decades, and time is running out for the few native speakers who remain among us. That’s when I realized my background in language technology could make a real difference.
The Family Legacy Connection: Jan Štěpán’s Story
Here’s where things get personal. My great-grandfather, Jan Štěpán, emigrated from a tiny village near Prague, in Bohemia, in 1889 at the age of 20. Already a trained teacher, he dedicated 59 years to serving immigrant communities in America. In Cedar Rapids, Iowa, he taught at the first exclusively Czech school in the United States, founded in 1901. Over his career, he taught citizenship classes to more than 3,000 immigrants, speaking eight languages fluently to help anyone who needed it.
Jan’s teaching methods were so effective that a Columbia University professor once called them “the best in the United States.” He was also the editor of Cedar Rapidské Listy, a Czech newspaper, and contributed to Svornost, the first Czech daily in America. Jan and his wife (my great-grandmother Kateřina, née Šnydrová) raised their children, all born in the US, to be bilingual in Czech and English. That means my grandfather spoke Czech like a native. Unfortunately, he did not pass the language on to my father. He did, however, keep a large collection of Czech language materials: dictionaries, grammars, books, and magazines. Those materials sat in our family for decades until my grandfather died and I inherited them at around age 8. They were the spark that ignited my own passion for Slavic linguistics.
There’s a direct line from Jan’s service in the late 1800s, through the materials he and my grandfather left behind, to my academic career and government work, and now to this Texas project in 2025. It’s a legacy of service, innovation, and dedication to community that I’m proud to continue.
The Technical Challenge
One of the most exciting (and challenging) aspects of the Texas Czech Legacy Project is the language itself.
What is Texas Czech, you ask? In short:
According to the TCLP website, Texas Czech, an immigrant variety of European Czech, is a product of over a century and a half of contact between Moravian Czech and English in Texas. Texas Czech blends the archaic features of Northeastern Moravian dialects – the Lachian (Lašsko) and Valachian (Valašsko) regions of the present-day Czech Republic, aspects of Standard (Written) Czech, and features of English spoken in Texas.
And it’s worth a couple of quick notes on the Czech language in general.
- The Foreign Service Institute categorizes it as a Category IV language, meaning it takes a native English speaker approximately 1,100 class hours to reach professional working proficiency. It is considered a very difficult language to learn, even in its standard form.
- Czech has complex distinctions between spoken, written, and formal registers. Czech is often considered an example of diglossia:
  - High code: Literary Czech, the formal, standard language used in official, written, and public domains.
  - Low code: Common Czech, the informal, colloquial language used in everyday, private speech.
In short, when I, as a non-native speaker, started learning Czech, I learned the “high code,” standard Czech. When I went to the Czech Republic for the first time and spoke to Czechs in real life, I was often told how beautifully I spoke, but in return I couldn’t understand a word being said to me! (And that was when they were speaking the standard Prague form, Spoken Prague Czech!) Imagine if in English our “High Code” was Shakespearean English, and the “Low Code” that everybody speaks was how we speak today. You learn “The lady doth protest too much, methinks,” but people actually say “The harder you deny it, the less I believe you.” “Hark! Who goes there?” becomes “Hey, who’s that?” You get the idea.
So Texas Czech is a dialect of a Moravian dialect (which is REALLY different from how they speak in Prague), with standard Czech features and Texas English in the mix as well. It’s hard for educated native speakers of Czech to make sense of. It’s also hard for non-native speakers to make sense of (as dialects often are).
In short, challenges abound. The language is extremely difficult. There are few native speakers left these days. Thankfully, we have access to hundreds of hours of recordings, but manual transcription is extremely labor- and time-intensive even under the very best of circumstances (standard language usage; clear recording with no static or background noise; clear, slow speech; and so on).
How in the world, then, can we expose more of this linguistic and cultural material to the world, as soon as possible?
Technology.
This is where my background in both language technology and linguistics comes into play.
Maybe we could sprinkle some AI on this project… to scale the effort – but we need to do it in a way that is respectful of speakers of Texas Czech and their culture. The end goal is to make the invaluable knowledge and experiences (and language features!) hidden in the recordings available in a meaningful way to anyone who is interested: friends, relatives, historians, linguists, genealogists…
This is a teaser. More to come in a future post dedicated to this unique language preservation challenge and how to use technology to make the most of limited resources.
Why This Matters: Heritage, Community, and the Future
Why does this work matter? Because these stories are our stories—the stories of Czech families who came to America, built new lives, and preserved their language and culture against the odds. Language preservation isn’t just about words; it’s about identity, memory, and belonging. Technology can help, but it takes community support and human expertise to make it meaningful.
The Broader Implications: A Model for Heritage Language Preservation
The Texas Czech Legacy Project is more than just a local initiative. It’s a model for heritage language preservation across American immigrant communities. AI technology has enormous potential for cultural preservation, but it must be paired with academic rigor and community involvement. Our work could be replicated for other heritage dialects, creating archives and corpora that support both linguistic research and cultural continuity.
Call to Action: How You Can Help
If you’re reading this and feel inspired, there are several ways you can help:
- Share this story within your academic, professional, or Czech-American networks. We are also on LinkedIn!
- Connect us with potential funders who value scholarly cultural preservation.
- Collaborate with us if you work on another heritage language or language preservation project.
- If you have a family story or connection to Czech heritage, reach out—I’d love to hear from you.
Conclusion: Honoring the Past, Serving the Future
As I reflect on this 140-year family tradition of service, I’m struck by the parallels between Jan’s tools and methods and the modern technology I use today. The mission is the same: to serve, to preserve, and to connect. Continuing Jan’s legacy through innovation and community engagement is the most fulfilling chapter of my career so far. Together, we can ensure that the voices of Texas-Czech families—and the heritage they represent—are preserved for generations to come.
Watch this space: In my next post, I will discuss the team behind the Texas Czech Legacy Project: what’s been done to date, how, and by whom; its goals; and opportunities to engage with and support it. In future posts, you can expect more on language technology, the actual “experiments” we undertook to see what would work, how technology might help other language and cultural preservation efforts, and more. Stay tuned!
