Full control over your AI model selection: introducing multi-LLM support for Claude, Gemini, Mistral and custom LLM deployments

Darija Fjodorova
General
22/04/2026

Our platform now supports native integration with OpenAI, Anthropic's Claude, Google's Gemini and Mistral, with GDPR-compliant options available across all four providers. Models can be assigned at the scenario level or to individual dialogue steps, allowing different capabilities to handle different parts of a single conversation without engineering effort.

For organisations that require complete control over data residency and model logic, the platform also supports custom LLM deployments. This means enterprise clients can connect their own private LLM (for example, a Llama instance they host internally) directly to the Voice Agent platform, treating it as a native provider. This is particularly relevant where regulatory requirements, sector-specific terminology or government tender conditions make a shared-infrastructure model unsuitable.
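
For illustration, here is a minimal sketch of what calling such a deployment can look like from the client side, assuming the private model is served behind an OpenAI-compatible chat endpoint (a common setup for self-hosted Llama servers). The URL, model name and payload shape are illustrative assumptions, not the Voice Agent platform's actual integration API.

```python
# Minimal sketch: querying a privately hosted Llama model through an
# OpenAI-compatible endpoint. The URL, model name and payload shape are
# illustrative; authentication headers are omitted for brevity.
import requests

PRIVATE_LLM_URL = "https://llm.internal.example.com/v1/chat/completions"

def ask_private_llm(prompt: str) -> str:
    """Send a single-turn request to a self-hosted, OpenAI-compatible LLM."""
    response = requests.post(
        PRIVATE_LLM_URL,
        json={
            "model": "llama-3-70b-instruct",  # whatever the internal deployment serves
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```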

Performance and Differentiation

Speed vs Accuracy

LLMs involve a tradeoff between latency and reasoning depth. Lightweight models (e.g., Gemini 2.5 Flash) respond in ~600 ms, while more complex models (e.g., Claude Opus) can take up to 2 seconds, introducing pauses.
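
To make the tradeoff concrete, the back-of-the-envelope sketch below adds assumed speech-to-text and text-to-speech timings to the model latencies quoted above. The component timings and the 1,500 ms budget are illustrative assumptions, not platform measurements.

```python
# Back-of-the-envelope turn-latency budget for a voice agent.
# All component timings here are illustrative assumptions.
ASR_MS = 300              # speech-to-text
TTS_FIRST_AUDIO_MS = 250  # time to first synthesized audio
TURN_BUDGET_MS = 1500     # rough threshold for a natural-feeling pause

for model, llm_ms in {"lightweight (e.g. Gemini 2.5 Flash)": 600,
                      "heavyweight (e.g. Claude Opus)": 2000}.items():
    total = ASR_MS + llm_ms + TTS_FIRST_AUDIO_MS
    verdict = "fits" if total <= TURN_BUDGET_MS else "exceeds"
    print(f"{model}: {total} ms total, {verdict} the {TURN_BUDGET_MS} ms budget")
```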

Both speed and accuracy matter, but priorities depend on the use case. In voice, low latency is critical for natural turn-taking, while accuracy is key for correctly handling complex inputs like uncommon names or domain-specific terms.

Beyond recognition, systems also need to interpret intent and context to produce relevant responses. In chat, latency is less constrained than in voice, allowing more focus on deeper understanding and response quality over raw speed. Effective platform differentiation lies in balancing fast response times with sufficient linguistic and reasoning capability for real-world scenarios.

Flexibility

LLM selection in the platform is highly modular. Models can be assigned either at the full-scenario level or to individual building blocks (“bricks”), enabling different models to handle specific tasks: for instance, one model for conversation and another for data extraction.
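
As a rough illustration of the idea (the structure and field names below are hypothetical, not the platform's actual configuration format), a scenario might pin a fast default model and override it only on accuracy-critical bricks:

```python
# Sketch of per-brick model assignment in a dialogue scenario.
# The structure and field names are hypothetical, shown only to
# illustrate mixing models within one conversation.
scenario = {
    "name": "appointment_booking",
    "default_model": "gemini-2.5-flash",  # fast model for general turns
    "bricks": [
        {"id": "greet",        "model": None},            # inherit default
        {"id": "collect_info", "model": None},
        {"id": "extract_data", "model": "claude-opus"},   # accuracy-critical step
        {"id": "confirm",      "model": "mistral-large"},
    ],
}

def model_for(brick_id: str) -> str:
    """Resolve the model for a brick, falling back to the scenario default."""
    brick = next(b for b in scenario["bricks"] if b["id"] == brick_id)
    return brick["model"] or scenario["default_model"]

print(model_for("extract_data"))  # -> claude-opus
print(model_for("greet"))         # -> gemini-2.5-flash
```

The design benefit is that a latency-sensitive conversational turn and a precision-sensitive extraction step no longer have to share one model.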

“Many companies are already used to working with specific LLM providers in their internal processes, and their employees are skilled at prompting and handling the responses. We are no longer limiting our clients to using only the models that were previously available on our platform. They can continue using what already works in their other processes without any need for retraining or transitioning to new models.

Most importantly, if a company has its own local LLMs trained on its proprietary data, this opens up significant potential for Voice Agent applications.”

- Alexander Mishin, Voice Agent Product Owner

This flexibility is available through a low-code/no-code visual interface, used both by clients (via their dialogue designers) and by our internal teams. Within the same environment, flows can be configured, tested and updated without developer involvement. As a result, teams can iteratively optimise each step of the interaction, balancing speed, accuracy and compliance, while maintaining full control over the logic.

This approach removes dependency on a single provider and allows the platform to adapt to real-world complexity, where different parts of a single conversation often require different model capabilities.

GDPR Compliance

All models in the platform are explicitly tagged as GDPR-compliant or non-compliant, with a built-in filter that prioritises European-hosted models by default for European clients.

At the same time, the platform maintains flexibility: for use cases where strict data residency is not required, non-compliant models can be selectively enabled to access broader capabilities or performance characteristics. This allows teams to optimise quality, speed or specific features when regulatory constraints permit it.
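
A simple sketch of this kind of compliance filtering follows; the model catalogue, flag names and region handling are invented purely for illustration.

```python
# Illustrative compliance filter: every model carries a GDPR tag, and
# European clients see European-hosted, compliant models first by default.
# Catalogue entries and flag names are assumptions, not the real catalogue.
from dataclasses import dataclass

@dataclass
class ModelEntry:
    name: str
    gdpr_compliant: bool
    eu_hosted: bool

CATALOGUE = [
    ModelEntry("mistral-large-eu", gdpr_compliant=True,  eu_hosted=True),
    ModelEntry("gemini-2.5-flash", gdpr_compliant=True,  eu_hosted=False),
    ModelEntry("us-only-model",    gdpr_compliant=False, eu_hosted=False),
]

def available_models(client_region: str, allow_non_compliant: bool = False):
    """Filter the catalogue, optionally admitting non-compliant models."""
    models = [m for m in CATALOGUE if m.gdpr_compliant or allow_non_compliant]
    if client_region == "EU":
        # European-hosted, compliant models are surfaced first by default.
        models.sort(key=lambda m: (not m.eu_hosted, not m.gdpr_compliant))
    return models

for m in available_models("EU"):
    print(m.name)  # mistral-large-eu, then gemini-2.5-flash
```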

LLM Usage Costs Are Covered

LLM-powered capabilities are now included as part of the standard offering. There is no additional charge for built-in model usage or “ChatGPT functionality”. This ensures clients can use advanced AI features without needing to manage model-level costs.

For teams with specific requirements, custom LLM integrations remain available as an add-on project. As always, different models may be used to balance speed and response quality depending on the use case, so performance stays aligned with each scenario.

Run your first multi-LLM dialogue at no additional cost
