Video Understanding
Use MiMo-V2-Omni for video understanding and analysis.
MiMo-V2-Omni supports video understanding, allowing you to send video content for analysis, description, and visual question answering. Videos can be provided via URL or base64-encoded data.
Using Video URL
from openai import OpenAI

client = OpenAI(
    api_key="your_mimo_api_key",
    base_url="https://api.mimo-v2.com/v1"
)

completion = client.chat.completions.create(
    model="mimo-v2-omni",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this video?"},
                {
                    "type": "video_url",
                    "video_url": {"url": "https://example.com/video.mp4"}
                }
            ]
        }
    ]
)

print(completion.choices[0].message.content)

Using Base64 Encoded Video
from openai import OpenAI
import base64

client = OpenAI(
    api_key="your_mimo_api_key",
    base_url="https://api.mimo-v2.com/v1"
)

# Read the local video file and encode it as base64
with open("video.mp4", "rb") as f:
    video_data = base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="mimo-v2-omni",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what happens in this video"},
                {
                    "type": "video_url",
                    "video_url": {"url": f"data:video/mp4;base64,{video_data}"}
                }
            ]
        }
    ]
)

print(completion.choices[0].message.content)

Token Consumption
Video content consumes significantly more tokens than images or text due to the frame-by-frame analysis. Token usage depends on:
- Video duration: Longer videos consume more tokens.
- Resolution: Higher-resolution videos are sampled in greater detail, consuming more tokens per frame.
- Frame rate: The model samples frames at regular intervals from the video, so more sampled frames mean higher token usage.
Video content can consume a large number of tokens. Consider using shorter clips or lower resolutions to manage costs. For long videos, consider extracting key frames as images instead.
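To get an intuition for how these factors interact, the sketch below estimates token cost from duration and sampling behavior. The sampling rate and tokens-per-frame constants are illustrative assumptions, not published MiMo-V2-Omni figures; consult the pricing documentation for actual values.

```python
# Rough cost estimate for video input. The constants below are
# illustrative assumptions, NOT official MiMo-V2-Omni figures.
ASSUMED_SAMPLE_FPS = 1.0        # assumed: one sampled frame per second
ASSUMED_TOKENS_PER_FRAME = 256  # assumed: tokens consumed per sampled frame

def estimate_video_tokens(duration_s: float,
                          sample_fps: float = ASSUMED_SAMPLE_FPS,
                          tokens_per_frame: int = ASSUMED_TOKENS_PER_FRAME) -> int:
    """Estimate token usage for a video of the given duration."""
    sampled_frames = int(duration_s * sample_fps)
    return sampled_frames * tokens_per_frame

# A 60-second clip under these assumptions:
print(estimate_video_tokens(60))  # 60 frames * 256 tokens = 15360
```

Under these assumptions, halving a clip's length (or its sampling rate) halves its token cost, which is why shorter clips or key-frame extraction can reduce spend substantially.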