ChatGPT now interprets photos better than an art critic and an investigator combined
ChatGPT's recent image generation capabilities have challenged our previous understanding of AI-generated media. The recently announced GPT-4o model demonstrates noteworthy abilities in interpreting images with high accuracy and recreating them to viral effect, such as the trend inspired by Studio Ghibli. It even masters text in AI-generated images, something that has previously been difficult for AI. And now, OpenAI is launching two new models capable of dissecting images for cues, gathering far more information than might survive a human glance.
OpenAI announced two new models earlier this week that take ChatGPT's thinking abilities up a notch. Its new o3 model, which OpenAI calls its "most powerful reasoning model," improves on existing interpretation and perception abilities, getting better at "coding, math, science, visual perception, and more," the organization claims. Meanwhile, o4-mini is a smaller, faster model for "cost-efficient reasoning" in the same areas. The news follows OpenAI's recent launch of the GPT-4.1 class of models, which brings faster processing and deeper context.
ChatGPT is now "thinking with images"
With their improved reasoning abilities, both models can now incorporate images into their chain of thought, making them capable of "thinking with images," OpenAI proclaims. Going beyond basic image analysis, the o3 and o4-mini models can investigate images more closely and even manipulate them through actions such as cropping, zooming, flipping, or enhancing details, extracting visual cues that could improve ChatGPT's ability to provide solutions.
According to the announcement, the models blend visual and textual reasoning, which can be combined with other ChatGPT features such as web search, data analysis, and code generation, and is expected to become the basis for more advanced AI agents with multimodal analysis.
Among other practical applications, you can include pictures of a multitude of items, from flow charts and scribbled handwritten notes to images of real-world objects, and expect ChatGPT to understand them more deeply and produce better output, even without a descriptive text prompt. With this, OpenAI inches closer to Google's Gemini, which offers the impressive ability to interpret the real world through live video.
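For developers, the same "picture plus question" pattern is exposed through OpenAI's API, where an image is attached to a user message alongside text. As a rough illustration, the snippet below assembles such a multimodal message in the Chat Completions image-input format (base64 data URL); the actual model name and client call are omitted, and the exact field names should be checked against OpenAI's current API documentation:

```python
import base64


def build_image_message(prompt_text: str, image_bytes: bytes,
                        mime: str = "image/png") -> dict:
    """Build a multimodal user message: a text part plus an image part
    encoded as a base64 data URL, in the Chat Completions input shape."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt_text},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{b64}"},
            },
        ],
    }


# Example: pair a (placeholder) image payload with a question about it.
msg = build_image_message("What does this flow chart describe?",
                          b"\x89PNG placeholder bytes")
```

A message built this way would go into the `messages` list of a chat request, letting the model reason over the photo and the question together.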
Despite the bold claims, OpenAI is limiting access to paid members, presumably to prevent its GPUs from "melting" again as it struggles to keep up with the compute demand for new reasoning features. For now, the o3, o4-mini, and o4-mini-high models are exclusively available to ChatGPT Plus, Pro, and Team members, while Enterprise and Education tier users get them a week later. Meanwhile, Free users will have limited access to o4-mini by selecting the "Think" button in the prompt bar.