Next-gen AI call centres: fluent Estonian, no training – magic or tech?

Agnieszka Wiącek
Use cases
30/07/2025

Magic or tech? That’s the question that inevitably arises when a company’s CEO suddenly starts speaking fluent Estonian – with accurate pronunciation, natural intonation, and absolute confidence – without knowing a single word of the language. What may sound like magic is, in fact, the result of a carefully engineered solution to one of the most overlooked challenges in AI: support for low-resource languages (LRLs).

Low-resource languages are those for which limited digital data exists to train generative AI systems, machine translation engines, speech synthesis tools, and other natural language processing (NLP) technologies. In contrast to high-resource languages such as English, Chinese, French, Spanish, or Japanese, LRLs suffer from a shortage of corpora, annotated datasets, audio material, and digital infrastructure.

It’s important to note that the “low-resource” status of a language is not determined by the number of its speakers. For example, Swahili is spoken by tens of millions, yet it remains under-resourced due to a lack of digitised linguistic assets. Conversely, Welsh, with relatively fewer native speakers, benefits from a strong digital infrastructure. The real challenge lies not in geography or demography, but in the availability of structured, machine-readable data – shifting the conversation from the “size” of a language to the state of its digital ecosystem.

Even in Europe – despite formal support for multilingualism – many Baltic (Lithuanian, Latvian, Estonian) and Balkan languages (Albanian, Macedonian, Serbian, Montenegrin, Slovenian, among others) remain underrepresented in language models compared to English, German, or French. These languages may be technically included in multilingual large language models (LLMs), but are often covered only superficially, lacking depth, quality, and quantity of data. This makes it extremely difficult for such languages to “compete for attention” in AI systems.

However, governments in several countries are taking structured steps to address this gap:

  • In Latvia, national-scale initiatives are underway – from the AII innovation programme Antenna to a dedicated Artificial Intelligence Centre – all aimed at building up language technologies, digital infrastructure, and multilingual solutions.
  • In Estonia, substantial work has gone into releasing open linguistic corpora: large volumes of language data have been shared with Meta to help integrate Estonian into global AI models. The government is also developing the AI Leap 2025 educational programme and actively supporting digital linguistics at a national level.
  • In Serbia, efforts include an official AI development strategy, the COMtext.SR platform, a national AI Institute, and coordinated governmental efforts to develop NLP tools for the Serbian language.

These developments demonstrate that support for LRLs is no longer just an academic concept – it’s becoming a pillar of national policy and long-term digital investment. And it’s not just about cultural preservation – it’s about equitable access to technology, digital participation, and economic opportunity in the global marketplace.

This trend is particularly relevant in light of projections that the global LLM market will grow from $8 billion in 2025 to $84 billion by 2033 – a compound annual growth rate of roughly 34%. Multilingual capability has already been identified as one of the primary drivers of that growth. Companies that begin working with underrepresented languages today will hold a significant competitive advantage tomorrow – through technological agility, localised products, and extended audience reach.

Apifonica is one such company. In this article, we’ll demonstrate how we approach the LRL challenge not in theory but in practice – with real-world, scalable solutions that bring value to businesses today. How we generate realistic speech in languages with minimal digital resources. How our CEO can speak Estonian convincingly without ever learning it. And how it’s not magic – but applied engineering.

Let’s begin – with call centres.

AI in English, dictaphone in Estonian?

Despite the impressive progress in generative AI, its benefits are distributed highly unevenly. While English-speaking contact centres are rapidly replacing live agents with intelligent assistants, automating customer interactions and rolling out conversational interfaces across every channel, countries with low-resource languages still rely on “traditional values”: human agents, outdated IVR systems, manual data entry, and long hold times.

These “values” are not a matter of strategic choice – they’re a direct consequence of technological scarcity. The lack of language data, weak representation in LLMs, and absence of plug-and-play solutions for Latvian, Estonian, or Lithuanian create significant barriers.

The result? Contact centres are overwhelmed, staff are burning out, service quality drops, and customer trust erodes. For the end user, it means one thing: if you don’t speak English, prepare to wait, repeat yourself, and hope you’re understood.

Meanwhile, in English-speaking environments, AI agents are not only operational – they’re actively selling, advising, managing enquiries, and continuing to generate value even after business hours. This technology gap isn’t just an issue of fairness – it’s a strategic failure. Entire markets remain untapped, and customers underserved, simply because their language is underrepresented.

So what are we waiting for?

Businesses today are facing pressure not only from customers but also from regulators. Language inclusivity and fair access to AI are becoming core elements of ESG strategies, national digital agendas, and corporate reputations. Major European initiatives – such as Digital Europe, European Language Grid and European Language Equality – are already working to promote linguistic equity in the digital sphere. These programmes help lay the regulatory and technological foundations for multilingual AI, but they are often rooted in academic frameworks and follow long research timelines.

Apifonica is moving ahead of the curve: we are integrating low-resource languages into real business solutions now. In some projects, our AI-powered tools cover up to 60% of customer service interactions. Through an engineering-driven approach, a flexible tech stack, and smart implementation techniques, we transform cutting-edge AI into working tools – from local-language voice agents for contact centres to scenario platforms and conversational interfaces.

Apifonica means business: why we choose engineering over linguistics

When it comes to supporting low-resource languages (LRLs), many still picture university labs, academic research centres, linguists, and lengthy grammatical treatises. But Apifonica isn’t an academic institution. We’re an engineering company solving real business problems through technology. Our mission isn’t to “study language” – it’s to make it work in real-world communication: in calls, chats, dialogues, and voice interfaces.

Building scalable, natural-sounding, multilingual voice solutions requires more than just a language model. It demands a robust end-to-end architecture – from text input to audio output. At Apifonica, we use a hybrid approach that combines the best of open-source technologies, commercial tools, and our own engineering. This enables us to build systems that work reliably in low-resource languages while delivering high-quality interaction and client-specific flexibility.
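To make this concrete, here is a deliberately simplified sketch of how such a hybrid pipeline can be composed. All interfaces and names are hypothetical – an illustration of the architectural idea, not Apifonica’s actual components:

```python
# A minimal sketch (hypothetical interfaces) of a hybrid text-to-audio
# pipeline: each stage can be backed by an open-source model, a commercial
# API, or an in-house component, and a low-resource language simply maps
# to a different set of backends.
from dataclasses import dataclass
from typing import Protocol


class Normaliser(Protocol):
    def normalise(self, text: str, lang: str) -> str: ...


class Synthesiser(Protocol):
    def synthesise(self, text: str, lang: str, voice: str) -> bytes: ...


@dataclass
class TTSPipeline:
    normaliser: Normaliser   # e.g. rules for Estonian numbers, dates, names
    primary: Synthesiser     # e.g. a commercial neural-TTS backend
    fallback: Synthesiser    # e.g. an in-house model covering LRL voices

    def speak(self, text: str, lang: str, voice: str) -> bytes:
        clean = self.normaliser.normalise(text, lang)
        try:
            return self.primary.synthesise(clean, lang, voice)
        except Exception:
            # Route to the fallback engine when the primary backend
            # does not cover the requested language or voice.
            return self.fallback.synthesise(clean, lang, voice)
```

The point of the design is swappability: when a commercial backend lacks a voice for a low-resource language, the same pipeline routes that language to an in-house or open-source component without the calling code changing.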

Operating in the niche of low-resource languages gives Apifonica a strategic edge. Unlike the crowded space of “big languages” dominated by tech giants, our work focuses on environments that require deep customisation, adaptation, and a practical understanding of real use cases. Here, we’re not “just another platform” – we’re one of the few providers that consistently deliver results in linguistically complex settings.

Yes, in some cases our solutions may come at a somewhat higher cost – supporting LRLs demands more tailored configuration, fine-tuning, and the creation or retraining of language components. But this is a conscious engineering investment that pays off in reliability, quality of interaction, and real business impact.

Unlike general-purpose platforms, we don’t build language capabilities “in theory” – we develop functional solutions tied to specific business goals. This allows us to achieve high-quality speech generation and dialogue management within clearly defined contexts – whether it’s HR, fintech, logistics, or customer support.

To achieve this, we’ve developed Hermes – our proprietary voice engine and a core part of Apifonica’s communication architecture. Hermes generates natural human speech and supports fluent, multilingual dialogues, drawing on advanced models for speech recognition and intent detection.

Internal benchmarks and comparative analyses confirm that Hermes outperforms Google Dialogflow – particularly in managing spoken interactions, with superior recognition accuracy, adaptability, and conversation context retention.

Hermes is a clear example of our philosophy: starting from the problem, not the model. It enables us not just to “speak” a language, but to operationalise it in applied scenarios – regardless of how underrepresented that language may be in mainstream LLMs.
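To illustrate what the conversation context retention mentioned above means in practice, here is a toy sketch – hypothetical code, not Hermes internals – of a dialogue state in which earlier turns fill slots that later turns can amend, so a follow-up like “actually, make it Friday” resolves against the stored booking rather than being treated as a fresh, ambiguous request:

```python
# Toy dialogue-state tracker: each turn may update the intent and/or
# refine previously collected slot values.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DialogueState:
    intent: Optional[str] = None
    slots: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def update(self, utterance: str, intent: Optional[str], slots: dict) -> None:
        self.history.append(utterance)
        if intent is not None:      # keep the previous intent when the new
            self.intent = intent    # turn is only a correction or refinement
        self.slots.update(slots)    # later turns override earlier values


state = DialogueState()
state.update("I'd like to book a delivery for Thursday",
             intent="book_delivery", slots={"day": "Thursday"})
state.update("actually, make it Friday", intent=None, slots={"day": "Friday"})
assert state.intent == "book_delivery" and state.slots["day"] == "Friday"
```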

Or take, for instance, the recent integration of an advanced GPT-based three-level intent recognition feature. Activated with a single click, it significantly improves understanding of user intent – especially in unpredictable or informal dialogue. GPT enables the system to generalise meaning without needing to predefine every possible phrase, dramatically speeding up scenario deployment and making our voice bots more flexible and adaptive, even in data-scarce linguistic environments.
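As a rough illustration of the layered idea (the names, labels, and rules here are hypothetical, not Apifonica’s implementation), a three-level recogniser can try cheap deterministic layers first and fall back to a GPT-style model only for utterances they cannot resolve:

```python
# Hypothetical three-level intent recogniser: exact phrases, then keyword
# rules, then an LLM fallback for everything the rule layers miss.
from typing import Optional

EXACT_PHRASES = {"yes": "confirm", "no": "decline", "stop calling me": "opt_out"}
KEYWORDS = {"invoice": "billing", "password": "account_access", "delivery": "logistics"}
LABELS = ["confirm", "decline", "opt_out", "billing", "account_access", "logistics", "other"]


def call_llm(prompt: str) -> str:
    # Stub standing in for a hosted GPT-style API call; returns the
    # catch-all label so the sketch runs without network access.
    return "other"


def level1_exact(utterance: str) -> Optional[str]:
    # Level 1: exact phrases – instant, zero-cost, fully predictable.
    return EXACT_PHRASES.get(utterance.strip().lower())


def level2_keywords(utterance: str) -> Optional[str]:
    # Level 2: keyword rules for common, well-understood requests.
    words = utterance.lower().split()
    return next((intent for kw, intent in KEYWORDS.items() if kw in words), None)


def level3_llm(utterance: str) -> str:
    # Level 3: an LLM generalises over phrasings the rule layers have
    # never seen – valuable where LRL training data is scarce.
    prompt = (f"Classify the caller's intent as one of {LABELS}.\n"
              f"Utterance: {utterance!r}\nLabel:")
    return call_llm(prompt)


def recognise_intent(utterance: str) -> str:
    for layer in (level1_exact, level2_keywords):
        if (intent := layer(utterance)) is not None:
            return intent
    return level3_llm(utterance)
```

The economics follow from the ordering: the deterministic layers answer the bulk of routine traffic at negligible cost, while the model handles the long tail of free-form phrasing – exactly where predefined phrase lists break down.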

Case study: “I don’t speak Estonian – but it sure sounds like I do”

Listen to the voice of our CEO from Latvia. He doesn’t speak a word of Estonian. And yet – he sounds so confident, clear, and natural when talking about our business, you’d almost believe it’s his native language. You might even feel compelled to sign up for every Apifonica service right on the spot!

Of course, it’s an AI-generated voiceover. But it is his real voice – his intonation, rhythm, and speaking style – just in a language he doesn’t know at all. Estonian, in this case – a language that remains severely underrepresented in most language models and digital infrastructures.
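For readers curious about the general technique: open-source cross-lingual voice-cloning systems such as Coqui’s XTTS take a short reference recording of a speaker and synthesise new text in another language with that speaker’s timbre. A minimal sketch follows – illustrative only, not Apifonica’s production pipeline; note that Estonian is not in XTTS v2’s stock language list, so `language="et"` assumes a checkpoint fine-tuned or extended to cover it:

```python
# Illustrative only: cross-lingual voice cloning with an XTTS-style model.
# Not Apifonica's stack. XTTS v2 does not ship with Estonian support, so
# passing language="et" presumes a fine-tuned or extended checkpoint.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="Tere! Räägime Apifonica lahendustest.",  # "Hello! Let's talk about Apifonica's solutions."
    speaker_wav="ceo_reference.wav",  # a short clip of the speaker's real voice
    language="et",                    # target language differs from the reference
    file_path="ceo_estonian.wav",
)
```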

Still, Apifonica already offers a ready-to-use solution that turns this “impossibility” into a working, practical business tool. And it’s not a massive, custom-built system – it’s compact, accessible, and affordable. For businesses, that means no need to build costly internal infrastructure, design complex pipelines, curate datasets, or hire dedicated teams of language model specialists.

Let’s be honest: a bank may be an expert in finance and cybersecurity – but how much does it really know about language modelling, speech synthesis, or generative AI? And yet, we’re increasingly seeing major enterprises try to build in-house R&D units focused on developing their own language models for internal use – especially for contact centres. This often stretches far beyond their core competencies and leads to spiralling costs with questionable returns.

Apifonica offers the exact opposite: a way to save on those efforts – by delivering a modular, flexible solution that integrates quickly into an organisation’s existing infrastructure, with no deep technical setup required. It’s a case where engineering maturity and domain specialisation don’t just “show off” AI capabilities – they solve real business problems, here and now.

This is especially valuable for low-resource languages, where digital underdevelopment makes DIY solutions disproportionately expensive – with minimal impact.

With Apifonica, the impact becomes real, measurable, and immediate.

What Apifonica teaches us about rethinking AI strategy for multilingual speech

Don’t rely on generic LLMs – adapt and customise.

Time is ticking while business stands still. The limitations of general-purpose multilingual LLMs for low-resource languages are well documented: inconsistent performance, data imbalance, and issues like lexical hallucinations (i.e. gibberish). Relying solely on these models leads to suboptimal results and missed opportunities. A successful strategy requires adaptation and customisation – fine-tuning LLMs on domain-specific and LRL-specific datasets, and building language-sensitive architectures with applied tools.
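As a sketch of what fine-tuning on domain- and LRL-specific data can look like in practice (the model identifier and data file below are placeholders, not our production setup), here is a standard continued-training loop with Hugging Face tooling:

```python
# Placeholder continued-training loop: adapting a multilingual causal LM
# to an LRL- and domain-specific corpus with Hugging Face tooling.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "some-multilingual-base-model"  # placeholder identifier

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# e.g. transcribed Estonian contact-centre dialogues, one utterance per line
dataset = load_dataset("text", data_files={"train": "estonian_dialogues.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lrl-finetuned",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal) language modelling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```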

Apifonica’s three-level intent recognition system, which integrates GPT while using proprietary layers for interpretation, is a prime example of such an adaptive, layered approach.

You don’t need one model – you need an engineering mindset.

Low-resource languages demand a pragmatic, task-driven engineering mindset – not a one-size-fits-all model. This is our position not as a research institution, but as a commercial AI company. It means mapping business processes, identifying pain points, and tailoring solutions – whether through intent-based voice bots, integrated communications platforms, tuned LLMs, or hybrid customisation layers – to achieve measurable business impact.

Apifonica’s focus on practicality, customisation, and scalability reflects this principle. We’re a working example of how an engineering-first, commercially grounded approach to LRLs can deliver real results. By prioritising real-world business outcomes over linguistic theory, combining open-source and proprietary components, and adapting our stack to complex language scenarios, we provide scalable and effective AI communication solutions across diverse linguistic markets.

Let’s talk about your AI project – so your customers can leave the wait-and-repeat nightmare behind. Book a 30-minute call with one of our experts. No pressure, no commitments – just clear answers to your questions and a chance to hear the tech (or magic?) in action.
