# Audio Understanding
Use MiMo-V2-Omni for audio understanding and transcription.
MiMo-V2-Omni supports audio understanding, enabling you to send audio data for transcription, analysis, and audio-based question answering.
## Sending Audio as Base64
Audio must be sent as base64-encoded data within the message content:
```python
from openai import OpenAI
import base64

client = OpenAI(
    api_key="your_mimo_api_key",
    base_url="https://api.mimo-v2.com/v1",
)

# Read the audio file and base64-encode it
with open("audio.wav", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="mimo-v2-omni",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is being said in this audio?"},
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": audio_data,
                        "format": "wav",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
```

## Supported Formats
| Format | Extension |
|---|---|
| WAV | .wav |
| MP3 | .mp3 |
| FLAC | .flac |
| OGG | .ogg |
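To guard against sending an unsupported file, a small helper can map a file's extension to the `format` value used in the `input_audio` payload. This is a minimal sketch, not part of any official SDK; the mapping simply mirrors the table above.

```python
from pathlib import Path

# Extension-to-format mapping, taken from the supported-formats table.
SUPPORTED_FORMATS = {".wav": "wav", ".mp3": "mp3", ".flac": "flac", ".ogg": "ogg"}

def audio_format_for(path: str) -> str:
    """Return the input_audio format string for a file, or raise if unsupported."""
    ext = Path(path).suffix.lower()
    try:
        return SUPPORTED_FORMATS[ext]
    except KeyError:
        raise ValueError(f"Unsupported audio format: {ext}") from None
```

For example, `audio_format_for("speech.MP3")` returns `"mp3"`, while a `.txt` path raises a `ValueError` before any API call is made.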
Audio tokens are calculated based on the duration of the audio. Longer audio clips consume more tokens.