Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN
Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks.
— OpenAI (@OpenAI) May 13, 2024
OpenAI has introduced its most comprehensive artificial intelligence endeavor yet: a multimodal model that will be able to communicate with users through both text and voice.
GPT-4o, which will be rolling out in ChatGPT as well as in the API over the next few weeks, is also able to recognize objects and images in real time, the company said Monday.
The model synthesizes a slew of AI capabilities that are already separately available in various other OpenAI models. But by combining all these modalities in a single model, OpenAI's latest release is expected to process any combination of text, audio and visual inputs more efficiently than chaining separate models together.
Users can relay visuals — through their phone camera, by uploading documents, or by sharing their screen — all while conversing with the AI model as if they are in a video call. The technology will be available for free, the company announced, but paid users will have five times the capacity limit.
https://www.nbcnews.com/tech/rcna151947
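For developers, the text-and-image input mentioned in the announcement goes through the same chat completions endpoint as earlier models. Below is a minimal sketch of such a call, assuming the official OpenAI Python SDK (v1.x) and an OPENAI_API_KEY set in the environment; the prompt and image URL are placeholders, not from the announcement.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single user message can mix text and image parts.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What objects are in this photo?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)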
Totally insane.
Watch the first couple of minutes; it shows the demos OpenAI released today.
AC