Smarter scenarios, more human voices: this month's platform updates

Darija Fjodorova
General
21/05/2026

Voice platforms are moving quickly toward more flexible control, lower-cost infrastructure and faster scenario building. This month, two areas received major updates: text-to-speech providers and the scenario editor.

Human-like voices at a fraction of the cost

We added support for new Gemini text-to-speech models, expanding the platform to six voice providers, with over 25 models and more than 200 voices to choose from.

These models sound more natural than the existing premium options on the market and produce high-quality voice output at four to five times lower cost.

  • Model Selection: Clients can now choose between multiple voice models depending on their needs. Flash-Lite is the fastest and works well for short phrases. Flash is a balanced mid-tier model with strong emotional delivery for broader conversational use cases. Pro is the most advanced option, offering premium quality and deeper conversational capabilities. All three models support LLM-style prompting, allowing clients to shape delivery directly through prompts with no separate voice configuration or extra setup.
  • Dynamic Tone Control: The same Voice Agent can now shift register between use cases - "empathetic" for a service-recovery call and "formal" for a renewal reminder - set directly through prompts, with no separate voice configuration. Currently available on Gemini models.
  • Realistic Audio: The voices include natural breathing patterns and pacing, closer to human speech than to synthetic playback.
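To make the model and tone choices above concrete, here is a minimal sketch in Python. It is an illustration only, not the platform's API: the names `TONE_PROMPTS`, `SpeechRequest`, `pick_model` and `build_request`, the tier identifiers and the eight-word threshold for a "short phrase" are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical tone prompts per use case (wording is illustrative).
TONE_PROMPTS = {
    "service_recovery": "Speak in an empathetic, reassuring tone.",
    "renewal_reminder": "Speak in a formal, concise tone.",
}
DEFAULT_PROMPT = "Speak in a neutral, friendly tone."


@dataclass
class SpeechRequest:
    """Hypothetical container for one text-to-speech request."""
    model: str
    prompt: str
    text: str


def pick_model(text: str, needs_premium: bool = False) -> str:
    """Pick a tier: Pro on request, Flash-Lite for short phrases, else Flash."""
    if needs_premium:
        return "pro"
    # Assumed cut-off: treat anything up to eight words as a short phrase.
    if len(text.split()) <= 8:
        return "flash-lite"
    return "flash"


def build_request(use_case: str, text: str, needs_premium: bool = False) -> SpeechRequest:
    """Combine the tier choice with a tone prompt for the given use case."""
    prompt = TONE_PROMPTS.get(use_case, DEFAULT_PROMPT)
    return SpeechRequest(model=pick_model(text, needs_premium), prompt=prompt, text=text)


if __name__ == "__main__":
    request = build_request(
        "renewal_reminder",
        "Your plan renews on the first of next month, please confirm.",
    )
    print(request)
```

The point of the sketch is that the same agent text can be paired with a different tone prompt per use case, while the tier is chosen separately from the length and importance of the phrase.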

"We've tested every major TTS model on the market. Until this year, high-quality human-like voices were expensive. Gemini matches the premium providers on quality but costs four to five times less."

- Ilya Ostrovskiy, Chief Product Officer at Apifonica

Alongside Gemini, we also expanded the ElevenLabs integration with support for Eleven v3. Clients now have access to six model types within the same interface, including low-latency, multilingual and highly expressive models. This helps clients balance latency against expressiveness, depending on the use case.

A practical addition is the new audio sample download capability. Clients can generate and download voice samples directly from the platform using real scenario text. Tuning a complex scenario no longer requires a live test call: clients can hear the output, adjust the dialogue and regenerate the sample in the same session.

Scenario editor: search and visibility improvements

Large voice scenarios can grow quickly in complexity, making navigation a challenge. The new Scenario Editor Search solves this by introducing a real-time, global search across the entire scenario.

  • Scenario Search: A Spotlight-style search bar now lets you locate any brick or text inside a scenario instantly, no matter how complex.
  • At-a-Glance Data Extraction: Entity extraction values – languages, CRM ticket numbers, anything the brick captures – now display directly on the canvas. No more opening each one to see what's inside.
  • Instant Audio Samples: When a client wants to hear a specific phrase, they can use the new download button to generate the audio sample and share it, if needed.
For large scenarios with hundreds of logic blocks, global search and on-canvas values become a critical usability upgrade.

What this means in practice

Together, these updates improve two core parts of the voice AI workflow:
  • Better voice quality with more expressive and natural speech
  • Accelerated scenario development and debugging at scale
The focus is clear: shorten the path from writing a scenario to hearing how it sounds.
See the new Gemini TTS voices in action and start building faster today
