Xiaomi MiMo API Provider
Model catalog

Choose the right MiMo model for each workload

Start with a lightweight models landing page today, then keep expanding each model detail page as you publish comparisons, examples, and real use cases.

Model pages ready to grow

Each card below already maps to its own SEO-friendly route so we can keep adding content without changing the architecture.

MiMo-V2.5-Pro
Available
Flagship reasoning for coding and agent workflows

The primary model for complex agent execution, coding, long-context reasoning, and tool-heavy workflows.

Context window: 1M
Output window: 128K

Capabilities

Text generation, Deep reasoning, Streaming, Function calling, Structured output, Web search
MiMo-V2.5
Available
Full-modal understanding with 1M context

Built for applications that need to understand text, images, video, and audio in one model.

Context window: 1M
Output window: 128K

Capabilities

Full-modal understanding, Deep reasoning, Streaming, Function calling, Structured output, Web search
MiMo-V2.5-TTS
Available
Expressive text-to-speech with built-in voices

Generates natural speech from assistant messages, with style control through instructions and audio tags.

Context window: 8K
Output window: 8K

Capabilities

Text-to-speech, Audio output, Singing, Style control
MiMo-V2.5-TTS-VoiceClone
Available
Voice cloning from audio samples

Replicates a target voice from an audio sample and uses it for speech synthesis.

Context window: 8K
Output window: 8K

Capabilities

Text-to-speech, Voice cloning, Audio output
MiMo-V2.5-TTS-VoiceDesign
Available
Custom voice design from text descriptions

Creates a voice from a text description, then synthesizes speech in that custom voice.

Context window: 8K
Output window: 8K

Capabilities

Text-to-speech, Voice design, Audio output
MiMo-V2-Pro
Available
Reasoning and production-grade text generation

Built for agent workflows, structured output, and long-context reasoning tasks.

Context window: 1M
Output window: 128K

Capabilities

Text generation, Deep reasoning, Streaming, Function calling, Structured output, Web search
MiMo-V2-Omni
Available
Multimodal understanding for image, audio, and richer inputs

Designed for teams building assistants and applications that need multimodal perception.

Context window: 256K
Output window: 128K

Capabilities

Multimodal understanding, Deep reasoning, Streaming, Function calling, Web search
MiMo-V2-TTS
Available
Text-to-speech output for voice experiences

A focused speech model for teams adding natural voice output to products and workflows.

Context window: 8K
Output window: 8K

Capabilities

Text-to-speech, Audio output, Voice experiences
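
Picking a model from the catalog above can be sketched in code. The following is a minimal illustration, not an official client: the `ModelCard` type and the `parse_window` and `pick_models` helpers are hypothetical names introduced here, and the conversion of "K" and "M" to 1,000 and 1,000,000 tokens is an assumption, since the page does not say how the window labels are counted. The model names, window sizes, and capability tags are copied from the cards above.

```python
from dataclasses import dataclass


def parse_window(label: str) -> int:
    """Convert a window label such as '128K' or '1M' to a token count.

    Assumption: K = 1,000 and M = 1,000,000 (the catalog does not specify).
    """
    units = {"K": 1_000, "M": 1_000_000}
    return int(label[:-1]) * units[label[-1]]


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical record mirroring one catalog card."""
    name: str
    context_tokens: int
    output_tokens: int
    capabilities: frozenset


def card(name: str, context: str, output: str, *caps: str) -> ModelCard:
    return ModelCard(name, parse_window(context), parse_window(output), frozenset(caps))


TEXT = ("Deep reasoning", "Streaming", "Function calling", "Structured output", "Web search")

# Values transcribed from the catalog cards, in page order.
CATALOG = [
    card("MiMo-V2.5-Pro", "1M", "128K", "Text generation", *TEXT),
    card("MiMo-V2.5", "1M", "128K", "Full-modal understanding", *TEXT),
    card("MiMo-V2.5-TTS", "8K", "8K",
         "Text-to-speech", "Audio output", "Singing", "Style control"),
    card("MiMo-V2.5-TTS-VoiceClone", "8K", "8K",
         "Text-to-speech", "Voice cloning", "Audio output"),
    card("MiMo-V2.5-TTS-VoiceDesign", "8K", "8K",
         "Text-to-speech", "Voice design", "Audio output"),
    card("MiMo-V2-Pro", "1M", "128K", "Text generation", *TEXT),
    card("MiMo-V2-Omni", "256K", "128K", "Multimodal understanding",
         "Deep reasoning", "Streaming", "Function calling", "Web search"),
    card("MiMo-V2-TTS", "8K", "8K",
         "Text-to-speech", "Audio output", "Voice experiences"),
]


def pick_models(required, min_context: int = 0) -> list:
    """Return names of models that have every required capability
    and at least `min_context` tokens of context window."""
    needed = frozenset(required)
    return [
        m.name
        for m in CATALOG
        if needed <= m.capabilities and m.context_tokens >= min_context
    ]


if __name__ == "__main__":
    # Models with function calling and structured output and at least 512K context.
    print(pick_models({"Function calling", "Structured output"}, parse_window("512K")))
    # Models that can clone a voice.
    print(pick_models({"Voice cloning"}))
```

Filtering on capability tags first and context size second mirrors how the cards are laid out; a real integration would presumably read this data from the provider's documentation or API rather than hard-coding it.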

Unified access to Xiaomi MiMo models for agent, multimodal, and voice workloads.

© 2026 Xiaomi MiMo API Provider. All Rights Reserved.