MiMo API Docs

Audio Understanding

Use MiMo-V2-Omni for audio understanding and transcription.

MiMo-V2-Omni accepts audio input, so you can send audio data for transcription, analysis, and audio-based question answering.

Sending Audio as Base64

Audio must be sent as base64-encoded data in an input_audio content part of the message, with the format field naming the audio format:

from openai import OpenAI
import base64

client = OpenAI(
    api_key="your_mimo_api_key",
    base_url="https://api.mimo-v2.com/v1"
)

# Read the audio file and base64-encode it into a string.
with open("audio.wav", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="mimo-v2-omni",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is being said in this audio?"},
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": audio_data,
                        "format": "wav"
                    }
                }
            ]
        }
    ]
)

print(completion.choices[0].message.content)

Supported Formats

Format    Extension
WAV       .wav
MP3       .mp3
FLAC      .flac
OGG       .ogg
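
As a minimal sketch of how the table above might be used, the helper below picks the format value from a file's extension and builds the input_audio content part. The names build_audio_part and SUPPORTED_FORMATS are hypothetical, and the format strings "mp3", "flac", and "ogg" are an assumption that follows the "wav" pattern from the example; only "wav" appears in the documented request.

```python
import base64
from pathlib import Path

# Extensions from the Supported Formats table. Only "wav" is shown in the
# documented request; the other format strings are assumed to match it.
SUPPORTED_FORMATS = {".wav": "wav", ".mp3": "mp3", ".flac": "flac", ".ogg": "ogg"}


def build_audio_part(path):
    """Return an input_audio content part for a supported audio file."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        # Fail before reading the file if the extension is not in the table.
        raise ValueError(f"Unsupported audio format: {suffix or '(none)'}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "input_audio",
        "input_audio": {"data": encoded, "format": SUPPORTED_FORMATS[suffix]},
    }
```

The returned dictionary can be placed in the content list of a user message next to a text part, in the same position as the input_audio entry in the example request.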

Audio token usage is calculated from the duration of the audio, so longer clips consume more tokens.
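
Because usage depends on duration, it can help to check a clip's length before sending it. The sketch below uses Python's standard wave module and therefore assumes an uncompressed PCM WAV file; the function name wav_duration_seconds is hypothetical. This page does not state a tokens-per-second rate, so the sketch reports duration only.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration is the number of frames divided by frames per second.
        return wav_file.getnframes() / wav_file.getframerate()
```

For MP3, FLAC, or OGG files a third-party audio library would be needed, since the wave module reads WAV only.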
