OpenAI introduces GPT-4o, its advanced model integrating text, audio, and vision. It offers real-time responses, surpassing previous models in speed and cost, with improved multilingual and visual-auditory comprehension capabilities.
OpenAI has announced the launch of GPT-4o, its latest AI model designed to understand and generate text, audio, and images in an integrated manner. Known as "omni" for its multimodal capability, this model responds to audio inputs in just 232 milliseconds on average, comparable to human response time in conversations.
GPT-4o not only maintains the performance of GPT-4 Turbo in English and programming but also significantly improves performance in other languages, being faster and 50% cheaper in the API. Additionally, it shows notable advancements in visual and auditory comprehension compared to previous models.
Unlike previous models that used multiple models to handle audio, text, and vision inputs and outputs, GPT-4o uses a single neural network trained integrally to process all these types of data. This allows for more natural and rich interaction, capturing tones, multiple speakers, and background sounds, and can generate laughter, songs, and emotions.
The model has been evaluated and demonstrated superior performance in traditional text, reasoning, and coding benchmarks, as well as new visual and auditory perception tests. Additionally, OpenAI has implemented rigorous safety measures and risk assessments to mitigate potential dangers, ensuring that GPT-4o does not exceed medium risk in any safety category.
GPT-4o is immediately available to free users and ChatGPT Plus subscribers, with text and image capabilities. Soon, support for audio and video will be added to a select group of partners in the API. This launch marks a significant step towards more natural and efficient human-computer interaction.
For more information, visit the official announcement.
ChatGPT helps you get answers, find inspiration and be more productive. It is free to use and easy to try. Just ask and ChatGPT can help with writing, learning, brainstorming and more. ChatGPT is a ...
OpenAI develops artificial intelligence with a focus on safety and social benefit. The company integrates advanced research and ethical principles to drive general-purpose AI ...
03/06/2025
ElevenLabs has released Eleven v3 (alpha), a text-to-speech model that incorporates emotional control tools and multi-speaker dialogue capabilities ...
29/05/2025
Black Forest Labs introduces FLUX.1 Kontext, a new family of artificial intelligence models that enables image generation and editing using both text ...
22/05/2025
Anthropic presents Claude Opus 4 and Sonnet 4, artificial intelligence models that achieve new records in code evaluations and incorporate extended ...
16/05/2025
Codex is an AI-powered agent that optimizes software development by automating multiple tasks simultaneously. OpenAI has launched a preliminary ...