OpenAI Introduces GPT-4o to the World

May 23, 2024

In a blog post, OpenAI announced the release of GPT-4o, a new flagship model that accepts any combination of text, audio, image, and video as input and can generate text, audio, and image outputs. The "o" stands for "omni," and the model represents a significant step toward more natural and efficient interactions with AI.

What sets GPT-4o apart is its ability to process several types of input and generate several types of output, which makes it a versatile tool for a wide range of applications. Unlike its predecessors, GPT-4o can respond to audio inputs in as little as 232 milliseconds, close to human response times in conversation.

This is a considerable improvement over the earlier Voice Mode, which had average latencies of 2.8 seconds with GPT-3.5 and 5.4 seconds with GPT-4. Because a single model is trained end to end across text, vision, and audio, it works directly on the original input rather than on an intermediate text transcript, so it retains and interprets more of the information in that input.

This end-to-end approach lets GPT-4o understand and produce nuances such as laughter, singing, and emotional expression, which were out of reach for the pipeline of separate models used in earlier versions.
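To make the difference between a chain of separate models and a single end-to-end model concrete, here is a toy sketch. It is not OpenAI's implementation: the `Utterance` type and the `transcribe`, `text_model`, `cascaded_pipeline`, and `end_to_end_model` functions are all hypothetical stand-ins. The point it illustrates is that once speech has been reduced to a transcript, cues such as tone or the number of speakers are gone, and no later stage can recover them.

```python
from dataclasses import dataclass


# Hypothetical toy type: the words of a spoken clip plus paralinguistic cues
# that a plain transcript cannot carry.
@dataclass(frozen=True)
class Utterance:
    words: str
    tone: str      # e.g. "excited", "sarcastic"
    speakers: int  # number of distinct voices in the clip


def transcribe(u: Utterance) -> str:
    """Cascade stage 1 (hypothetical): speech-to-text keeps only the words."""
    return u.words


def text_model(prompt: str) -> str:
    """Cascade stage 2 (hypothetical): a text-only model sees just the transcript."""
    return f"Reply to: {prompt!r}"


def cascaded_pipeline(u: Utterance) -> str:
    # Tone and speaker count are dropped before the language model runs,
    # so its reply cannot depend on them.
    return text_model(transcribe(u))


def end_to_end_model(u: Utterance) -> str:
    # A single model that consumes the raw input can condition on every field.
    return f"Reply to: {u.words!r} (heard {u.tone} tone, {u.speakers} speaker(s))"


if __name__ == "__main__":
    clip = Utterance(words="Great, another meeting.", tone="sarcastic", speakers=1)
    print(cascaded_pipeline(clip))
    print(end_to_end_model(clip))
```

Only the end-to-end reply mentions the sarcastic tone; in the cascade, that information never reaches the model that writes the reply.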

GPT-4o achieves GPT-4 Turbo-level performance in text, reasoning, and coding, while significantly enhancing multilingual, audio, and vision capabilities. It excels on several benchmarks, including:

Reasoning: GPT-4o sets a new high score of 88.7% on 0-shot chain-of-thought (CoT) MMLU, a general-knowledge benchmark, surpassing previous models.
Audio: The model dramatically improves speech recognition and translation performance, outperforming Whisper-v3, especially in lower-resourced languages.
Vision: GPT-4o achieves state-of-the-art performance on visual perception benchmarks, including MMMU, MathVista, and ChartQA.
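"0-shot CoT" means the model receives no worked examples in the prompt and is asked to reason step by step before giving its final answer. The sketch below shows the general shape of such an evaluation for a multiple-choice benchmark like MMLU. It is an illustration under stated assumptions: the prompt wording in `TEMPLATE` and the `Answer: X` extraction rule are hypothetical, not the exact template OpenAI used.

```python
import re
from typing import List, Optional

# Hypothetical template: it shows the typical shape of a zero-shot
# chain-of-thought multiple-choice prompt, not OpenAI's exact wording.
TEMPLATE = (
    "Answer the following multiple choice question. "
    "Think step by step, then finish with a line of the form 'Answer: X', "
    "where X is one of A, B, C, D.\n\n"
    "{question}\n\n{options}"
)


def build_prompt(question: str, choices: List[str]) -> str:
    """Format one question with lettered choices and no worked examples (0-shot)."""
    options = "\n".join(f"{letter}) {text}" for letter, text in zip("ABCD", choices))
    return TEMPLATE.format(question=question, options=options)


def extract_answer(completion: str) -> Optional[str]:
    """Take the last 'Answer: X' letter from the model's reasoning, or None."""
    matches = re.findall(r"Answer:\s*([ABCD])", completion)
    return matches[-1] if matches else None


def accuracy(predictions: List[Optional[str]], gold: List[str]) -> float:
    """Fraction of items whose extracted letter matches the gold label."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

For example, `extract_answer("2 + 3 = 5, which is option B.\nAnswer: B")` returns `"B"`. A headline figure such as 88.7% would correspond to `accuracy` computed over every question in the benchmark.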

As for safety, OpenAI says it is a top priority. GPT-4o incorporates safety measures across all modalities, using techniques such as filtering training data and refining the model's behavior after training. The model has been evaluated under OpenAI's Preparedness Framework, and OpenAI reports that it does not exceed Medium risk in cybersecurity, persuasion, and model autonomy.

External red teaming with more than 70 experts in fields such as social psychology, bias, and misinformation has helped identify and mitigate new risks. Text and image inputs and text outputs are available now; audio outputs are limited to a selection of preset voices and comply with existing safety policies.

OpenAI plans to release more modalities in the coming months and to keep improving the model based on user feedback. As of this article's publication, GPT-4o is available with expanded access for Plus users and to developers.

Originally posted on OpenDataScience.com

Written by ODSC - Open Data Science

No responses yet