Video Understanding
Use MiMo-V2-Omni for video understanding and analysis.
MiMo-V2-Omni supports video understanding, allowing you to send video content for analysis, description, and visual question answering. Videos can be provided via URL or base64-encoded data.
Using Video URL
from openai import OpenAI

client = OpenAI(
    api_key="your_mimo_api_key",
    base_url="https://api.mimo-v2.com/v1"
)

completion = client.chat.completions.create(
    model="mimo-v2-omni",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this video?"},
                {
                    "type": "video_url",
                    "video_url": {"url": "https://example.com/video.mp4"}
                }
            ]
        }
    ]
)

print(completion.choices[0].message.content)

Using Base64 Encoded Video
from openai import OpenAI
import base64

client = OpenAI(
    api_key="your_mimo_api_key",
    base_url="https://api.mimo-v2.com/v1"
)

# Read the local video file and encode it as base64
with open("video.mp4", "rb") as f:
    video_data = base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="mimo-v2-omni",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what happens in this video"},
                {
                    "type": "video_url",
                    "video_url": {"url": f"data:video/mp4;base64,{video_data}"}
                }
            ]
        }
    ]
)

print(completion.choices[0].message.content)

Token Consumption
Video content consumes significantly more tokens than images or text due to the frame-by-frame analysis. Token usage depends on:
- Video duration: Longer videos consume more tokens.
- Resolution: Higher-resolution videos are sampled in greater detail, consuming more tokens per frame.
- Frame rate: The model samples frames at regular intervals from the video, so more sampled frames mean higher token usage.
Video content can consume a large number of tokens. Consider using shorter clips or lower resolutions to manage costs. For long videos, consider extracting key frames as images instead.
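To get an intuition for how these factors interact, the sketch below estimates token cost from duration and sampling behavior. The sampling rate and tokens-per-frame constants are illustrative assumptions, not published MiMo-V2-Omni figures; consult the pricing documentation for actual values.

```python
# Rough cost estimate for video input. The constants below are
# illustrative assumptions, NOT official MiMo-V2-Omni figures.
ASSUMED_SAMPLE_FPS = 1.0        # assumed: one sampled frame per second
ASSUMED_TOKENS_PER_FRAME = 256  # assumed: tokens consumed per sampled frame

def estimate_video_tokens(duration_s: float,
                          sample_fps: float = ASSUMED_SAMPLE_FPS,
                          tokens_per_frame: int = ASSUMED_TOKENS_PER_FRAME) -> int:
    """Estimate token usage for a video of the given duration."""
    sampled_frames = int(duration_s * sample_fps)
    return sampled_frames * tokens_per_frame

# A 60-second clip under these assumptions:
print(estimate_video_tokens(60))  # 60 frames * 256 tokens = 15360
```

Under these assumptions, halving a clip's length (or its sampling rate) halves its token cost, which is why shorter clips or key-frame extraction can reduce spend substantially.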