# Model Parameters

Recommended parameters for the MiMo-V2 series models.

## Recommended Parameters
The following table lists the recommended parameter values for each MiMo-V2 model:
| Parameter | Description | MiMo-V2-Pro | MiMo-V2-Omni | MiMo-V2-Flash |
|---|---|---|---|---|
| temperature | Controls randomness. Higher = more creative | 1.0 | 1.0 | 1.0 |
| top_p | Nucleus sampling threshold | 0.95 | 0.95 | 0.95 |
| max_completion_tokens | Maximum tokens in the response | 1024-128000 | 1024-128000 | 1024-64000 |
| frequency_penalty | Penalizes repeated tokens | 0 | 0 | 0 |
| presence_penalty | Penalizes tokens already present | 0 | 0 | 0 |
| stream | Enable streaming output | true/false | true/false | true/false |
| stop | Stop sequences | null | null | null |
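Assuming an OpenAI-compatible chat completions endpoint (the model identifier and values below are illustrative placeholders, not verified names), a request body combining these parameters might look like:

```python
# Hypothetical request payload for an OpenAI-compatible chat completions
# endpoint; values mirror the recommendations in the table above.
payload = {
    "model": "MiMo-V2-Flash",           # placeholder model identifier
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 1.0,                  # recommended default
    "top_p": 0.95,                       # recommended default
    "max_completion_tokens": 4096,       # within the 1024-64000 range for Flash
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stream": False,
    "stop": None,
}
```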
## Parameter Details

### temperature
Controls the randomness of the model's output. A value of 0 makes the output nearly deterministic, while higher values increase creativity and variation. The recommended default is 1.0 for all MiMo-V2 models.
- Range: 0.0 to 2.0
- Default: 1.0
- Tip: Use lower values (e.g., 0.2) for factual or deterministic tasks. Use higher values (e.g., 1.0-1.5) for creative writing or brainstorming.
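A minimal sketch of how temperature is typically applied: logits are divided by the temperature before the softmax, so low values sharpen the distribution toward the top token and high values flatten it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax. Lower temperature
    sharpens the distribution; higher temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)  # near-deterministic
warm = softmax_with_temperature(logits, 1.5)  # more uniform
# The top token's probability rises as temperature falls.
```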
### top_p
Also known as nucleus sampling. The model considers tokens whose cumulative probability mass reaches top_p. A value of 0.95 means the model samples from the smallest set of tokens whose cumulative probability is at least 95%.
- Range: 0.0 to 1.0
- Default: 0.95
- Tip: Generally, adjust either temperature or top_p, but not both simultaneously.
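The candidate-set selection described above can be sketched as follows: sort tokens by probability and keep the smallest prefix whose cumulative mass reaches top_p.

```python
def nucleus_filter(probs, top_p):
    """Return indices of the smallest set of highest-probability tokens
    whose cumulative probability is at least top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

probs = [0.5, 0.3, 0.15, 0.05]
nucleus_filter(probs, 0.95)  # keeps the first three tokens (0.5+0.3+0.15 = 0.95)
```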
### max_completion_tokens
The maximum number of tokens the model can generate in a single response. This includes both the visible output and any internal reasoning tokens when thinking mode is enabled.
- Range: Varies by model (see table above)
- Default: 1024
- Tip: Set this high enough to accommodate your expected output length. For complex reasoning tasks, consider using higher values to allow the model sufficient space for thinking.
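Conceptually, the limit acts as a hard cap on the generation loop; a toy sketch (not the actual serving implementation) of that behavior:

```python
def generate(next_token_fn, max_completion_tokens):
    """Toy generation loop: next_token_fn returns the next token or None
    at end of sequence; generation stops when the budget is exhausted."""
    out = []
    while len(out) < max_completion_tokens:
        token = next_token_fn(out)
        if token is None:  # model emitted end-of-sequence
            break
        out.append(token)
    return out

# A source that never emits end-of-sequence is truncated at the budget:
tokens = generate(lambda out: "x", max_completion_tokens=8)
len(tokens)  # 8
```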
### frequency_penalty
Penalizes tokens based on how frequently they appear in the generated text so far. Positive values reduce repetition.
- Range: -2.0 to 2.0
- Default: 0
- Tip: Use small positive values (e.g., 0.1-0.5) to reduce repetitive phrasing in longer outputs.
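A sketch of the mechanism: each token's logit is reduced in proportion to how many times that token has already been generated, so repetition becomes progressively less likely.

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, penalty):
    """Subtract penalty * count(token) from each token's logit.
    Tokens lose probability in proportion to how often they recur."""
    counts = Counter(generated_tokens)
    return [l - penalty * counts.get(i, 0) for i, l in enumerate(logits)]

logits = [1.0, 1.0, 1.0]
apply_frequency_penalty(logits, [0, 0, 1], 0.5)
# token 0 appeared twice -> 0.0; token 1 once -> 0.5; token 2 unseen -> 1.0
```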
### presence_penalty
Penalizes tokens based on whether they have appeared in the generated text at all, regardless of frequency. Positive values encourage the model to introduce new topics.
- Range: -2.0 to 2.0
- Default: 0
- Tip: Use small positive values to encourage more diverse outputs and topic exploration.
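In contrast to the frequency penalty, the presence penalty is a flat, one-time reduction: a token that has appeared at all is penalized the same amount whether it appeared once or ten times. A sketch:

```python
def apply_presence_penalty(logits, generated_tokens, penalty):
    """Subtract a flat penalty from any token that has already appeared,
    regardless of how many times."""
    seen = set(generated_tokens)
    return [l - (penalty if i in seen else 0.0) for i, l in enumerate(logits)]

apply_presence_penalty([1.0, 1.0, 1.0], [0, 0, 1], 0.5)
# tokens 0 and 1 get the same flat penalty -> [0.5, 0.5, 1.0]
```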
### stream
When set to true, the model sends partial responses as server-sent events (SSE) as they are generated. This provides a better user experience for interactive applications by showing output incrementally.
- Values: true or false
- Default: false
- Tip: Enable streaming for chat interfaces and real-time applications. Disable it for batch processing or when you need the complete response at once.
### stop
A list of sequences where the model will stop generating further tokens. When the model encounters any of the specified stop sequences, it ends the response.
- Type: null or array of strings (up to 4 sequences)
- Default: null
- Tip: Use stop sequences to control output format, such as stopping at a specific delimiter or marker.
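The effect of a stop list can be sketched client-side as truncation at the earliest matching sequence (the stop sequence itself is excluded from the output, matching the behavior described above):

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest occurrence of any stop sequence;
    the stop sequence itself is not included in the result."""
    if not stop:
        return text
    cut = len(text)
    for sequence in stop:
        index = text.find(sequence)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

truncate_at_stop("Answer: 42\n###\nextra", ["###"])  # "Answer: 42\n"
```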