OpenAI o4-mini: A Compact Reasoning Model by OpenAI

What is OpenAI o4-mini?

OpenAI o4-mini is a compact reasoning model launched by OpenAI, optimized for fast and cost-effective inference tasks. It excels in mathematics, programming, and visual tasks, achieving top performance in the AIME 2024 and 2025 benchmarks. o4-mini supports high-volume, high-throughput inference, making it ideal for processing large quantities of queries quickly. With multimodal capabilities, it integrates images into reasoning chains, supports tool usage, and generates detailed, well-thought-out responses. Compared to its predecessors, o4-mini offers significant improvements in performance and cost-efficiency. Currently, ChatGPT Plus, Pro, and Team users can access o4-mini and o4-mini-high in the model selector, replacing o1, o3-mini, and o3-mini-high. ChatGPT Enterprise and Edu users will gain access within a week. Developers can utilize the model via the Chat Completions API and Responses API.

Key Features of OpenAI o4-mini

Fast Inference: Excels in rapidly processing mathematics, programming, and visual tasks, ideal for high-throughput scenarios.
Multimodal Capabilities: Combines images and text for reasoning, supporting image processing.
Tool Usage: Leverages tools like web search and Python programming to assist in problem-solving.
Cost-Effective: Outperforms the previous o3-mini at the same price, making it a top choice for upgrades.
Safe and Reliable: Trained for safety, capable of rejecting inappropriate requests.

Performance of OpenAI o4-mini

Mathematical Reasoning:

In AIME 2024 and 2025 benchmarks, o4-mini achieves a 93.4% accuracy rate without tools, rising to 98.7% with Python, nearing perfect scores.
It surpasses o3-mini in complex mathematical problem-solving, approaching the performance of the full o3 model in some tasks.

Programming Capabilities:

SWE-Lancer: o4-mini excels in efficiently completing complex programming tasks with outstanding results.
SWE-Bench Verified (Software Engineering Dataset): Outperforms o3-mini in algorithms, system design, and API calls, with higher accuracy and efficiency.
Aider Polyglot Code Editing (Multilingual Code Editing Benchmark): Excels in code editing tasks, including full rewrites and patch-style modifications, surpassing o3-mini.

Multimodal Capabilities:

MMMU (University-Level Visual Math Dataset): Solves problems combining images and mathematical symbols, achieving an 87.5% accuracy rate, far exceeding o1’s 71.8%.
MathVista (Visual Math Reasoning): Performs exceptionally in tasks involving geometric shapes and function curves, with an 87.5% accuracy rate.
CharXiv-Reasoning (Scientific Chart Reasoning): Understands charts and diagrams in scientific papers, achieving a 75.4% accuracy rate, significantly better than o1’s 55.1%.

Tool Usage:

Scale Multichallenge (Multi-Turn Instruction Following): Handles complex multi-turn instruction tasks, accurately executing commands.
BrowseComp Agentic Browsing (Browser Tasks): Performs searches, clicks, and page navigation in a virtual browser, integrating information with performance close to o3, far surpassing traditional AI search capabilities.
Tau-bench Function Calling: Delivers stable performance in generating structured API calls, though further optimization is needed for complex scenarios.

Comprehensive Testing：

Expert-Level Comprehensive Test (Humanity’s Last Exam): Achieves a 14.3% accuracy rate without tools, improving to 17.7% with plugins, falling short of o3’s 24.9% but excelling among compact models.
Interdisciplinary PhD-Level Science Questions (GPQA Diamond): Scores 81.4% accuracy, slightly below o3’s 83.3%, but highly competitive among compact models.

Project URL for OpenAI o4-mini

Official Website: https://openai.com/index/introducing-o4-mini/

Applications of OpenAI o4-mini

Educational Support: Assists students in solving math and programming problems.
Data Analysis: Quickly generates data charts and analytical results.
Software Development: Produces code snippets and aids in code debugging.
Content Creation: Provides creative inspiration and generates descriptions based on images.
Daily Queries: Answers questions using search and image analysis.

What is OpenAI o4-mini?

Key Features of OpenAI o4-mini

Fast Inference: Excels in rapidly processing mathematics, programming, and visual tasks, ideal for high-throughput scenarios.
Multimodal Capabilities: Combines images and text for reasoning, supporting image processing.
Tool Usage: Leverages tools like web search and Python programming to assist in problem-solving.
Cost-Effective: Outperforms the previous o3-mini at the same price, making it a top choice for upgrades.
Safe and Reliable: Trained for safety, capable of rejecting inappropriate requests.

Performance of OpenAI o4-mini

Mathematical Reasoning:

In AIME 2024 and 2025 benchmarks, o4-mini achieves a 93.4% accuracy rate without tools, rising to 98.7% with Python, nearing perfect scores.
It surpasses o3-mini in complex mathematical problem-solving, approaching the performance of the full o3 model in some tasks.

Programming Capabilities:

SWE-Lancer: o4-mini excels in efficiently completing complex programming tasks with outstanding results.
SWE-Bench Verified (Software Engineering Dataset): Outperforms o3-mini in algorithms, system design, and API calls, with higher accuracy and efficiency.
Aider Polyglot Code Editing (Multilingual Code Editing Benchmark): Excels in code editing tasks, including full rewrites and patch-style modifications, surpassing o3-mini.

Multimodal Capabilities:

MMMU (University-Level Visual Math Dataset): Solves problems combining images and mathematical symbols, achieving an 87.5% accuracy rate, far exceeding o1’s 71.8%.
MathVista (Visual Math Reasoning): Performs exceptionally in tasks involving geometric shapes and function curves, with an 87.5% accuracy rate.
CharXiv-Reasoning (Scientific Chart Reasoning): Understands charts and diagrams in scientific papers, achieving a 75.4% accuracy rate, significantly better than o1’s 55.1%.

Tool Usage:

Scale Multichallenge (Multi-Turn Instruction Following): Handles complex multi-turn instruction tasks, accurately executing commands.
BrowseComp Agentic Browsing (Browser Tasks): Performs searches, clicks, and page navigation in a virtual browser, integrating information with performance close to o3, far surpassing traditional AI search capabilities.
Tau-bench Function Calling: Delivers stable performance in generating structured API calls, though further optimization is needed for complex scenarios.

Comprehensive Testing：

Expert-Level Comprehensive Test (Humanity’s Last Exam): Achieves a 14.3% accuracy rate without tools, improving to 17.7% with plugins, falling short of o3’s 24.9% but excelling among compact models.
Interdisciplinary PhD-Level Science Questions (GPQA Diamond): Scores 81.4% accuracy, slightly below o3’s 83.3%, but highly competitive among compact models.

Project URL for OpenAI o4-mini

Official Website: https://openai.com/index/introducing-o4-mini/

Applications of OpenAI o4-mini

Educational Support: Assists students in solving math and programming problems.
Data Analysis: Quickly generates data charts and analytical results.
Software Development: Produces code snippets and aids in code debugging.
Content Creation: Provides creative inspiration and generates descriptions based on images.
Daily Queries: Answers questions using search and image analysis.

OpenAI o4-mini: A Compact Reasoning Model by OpenAI

What is OpenAI o4-mini?

Key Features of OpenAI o4-mini

Performance of OpenAI o4-mini

Project URL for OpenAI o4-mini

Applications of OpenAI o4-mini

Summary

OpenAI o4-mini: A Compact Reasoning Model by OpenAI

What is OpenAI o4-mini?

Key Features of OpenAI o4-mini

Performance of OpenAI o4-mini

Project URL for OpenAI o4-mini

Applications of OpenAI o4-mini

Summary