OpenAI released two powerful reasoning models a few days ago that make ChatGPT even more impressive. These are o3 and o4-mini, which you can test right away in ChatGPT. They’re much better at reasoning than their predecessors and might excel at coding and math if those are your hobbies.
However, the head-turning new feature in o3 and o4-mini is, at least for me, the AI’s ability to interpret data in images. Essentially, ChatGPT has computer vision like in the movies, including reasoning capabilities that let the AI extract location data from photos. You can ask the AI, “Where was this photo taken?” and the AI will do everything in its power to answer.
ChatGPT o3 and o4-mini will get things right, as you’re about to see in my highly scientific test that follows. That is, they’ll get things right even if I try to use AI to fool ChatGPT.
Because yes, I used GPT-4o image generation to create a lifelike photo of a well-known ski location in the Alps rather than uploading a real picture of my own. I then told ChatGPT to alter that image in a way that would change the skyline.
After that, I started new chats with o3 and o4-mini, convinced that ChatGPT would recognize the location in the fake photo I had just submitted. I wasn’t wrong; both models did give me the result I expected, proving that you can use AI-generated content to fool the AI. But they blew my mind nonetheless.
I explained recently how the Apple Watch algorithms let me down while skiing last week, and that’s what I used as inspiration in my experiment to fool the AI.
I asked ChatGPT to generate a photo showing the well-known Matterhorn peak on a sunny day, with skiers enjoying their time. The photo had to have a 16:9 aspect ratio and resemble an iPhone photo.
I told the AI to put a gondola in it for good measure, but, as you can see, on the first try that gondola wasn’t going places. No matter; I only needed a first image from the AI so that I could alter it. Enter the following image:
I instructed ChatGPT to remove the gondola and place a smaller Matterhorn peak towards the right.
I took a screenshot of the image so it wouldn’t preserve any metadata, and then turned the file into a JPG photo:
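For readers curious about what "not preserving metadata" means in practice, here is a minimal sketch of another way to get there. It is not what I did (I just took a screenshot), and it rests on a simplifying assumption: that the file is a plain baseline JPEG whose metadata lives in APP1 (EXIF/XMP), APP13 (IPTC), and COM segments. The function name `strip_jpeg_metadata` and the helper `make_segment` are made up for this illustration, and the sample bytes are a toy stand-in rather than a real photo.

```python
import struct

# Segment markers assumed to carry metadata (an illustrative, non-exhaustive list):
# 0xE1 = APP1 (EXIF/XMP), 0xED = APP13 (IPTC), 0xFE = COM (comment).
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG byte string without its metadata segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed JPEG: expected a marker")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed pixel data follows, copy the rest untouched.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in METADATA_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    out += data[pos:]
    return bytes(out)


def make_segment(marker: int, payload: bytes) -> bytes:
    """Hypothetical helper that builds one JPEG segment for the toy sample."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


# Toy byte string shaped like a JPEG: SOI, APP0, APP1 (fake EXIF), SOS, data, EOI.
sample = (
    b"\xff\xd8"
    + make_segment(0xE0, b"JFIF\x00")
    + make_segment(0xE1, b"Exif\x00\x00GPS-DATA")
    + make_segment(0xDA, b"\x00")
    + b"pixels"
    + b"\xff\xd9"
)
cleaned = strip_jpeg_metadata(sample)
```

In this toy run, the fake EXIF segment is gone from `cleaned` while the JFIF header and the image data are left alone. A real tool would need to handle more edge cases than this sketch does.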
Then, I started two separate chats, with ChatGPT o3 and ChatGPT o4-mini, where I uploaded the fake Matterhorn photo and asked the AI to tell me where the picture was taken and how they figured it out.
Unsurprisingly, both reasoning AI models successfully identified the Matterhorn as the location.
ChatGPT o3
First, we have o3, which gave me ample details about how it determined the location. The AI is incredibly confident in its response, telling me that “Flanking peaks such as the Dent Blanche and Weisshorn” are telling signs.
I had a smile on my face. I had beaten the AI with AI by making it recognize the location in a fake photo. It was even better that o3 was so sure of itself after only 34 seconds of thinking.
But then I thought I’d push things further to see whether it could figure out the image was fake. I asked it to draw circles on Dent Blanche and Weisshorn.
This is where seeing o3 in action blew my mind. This time, the AI spent almost six minutes looking at the photo, trying to reliably pinpoint the two peaks it said it could see in the distance.
As you’ll see, the mini Matterhorn on the right immediately threw the AI off, but ChatGPT didn’t stop there. It kept looking at the photo and searched the web for pictures of the Alps region where these peaks are located.
It also looked at the photo to determine the relative location of additional peaks in the region. “I can try overlaying approximate local maxima based on brightness, but honestly, I think it’s easier to just use my eyes for this,” o3 thought, and I was blown away to read it.
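To give a rough idea of what "local maxima based on brightness" could mean, here is a minimal sketch. It is my own guess at the concept, not the code o3 considered running: it assumes the image has already been reduced to a grid of grayscale values from 0 to 255, and the function name `local_brightness_maxima`, the `threshold` parameter, and the toy grid are all invented for this example.

```python
def local_brightness_maxima(gray, threshold=200):
    """Return (row, col) of pixels that are at least `threshold` bright
    and strictly brighter than every one of their neighbours."""
    peaks = []
    rows, cols = len(gray), len(gray[0])
    for r in range(rows):
        for c in range(cols):
            value = gray[r][c]
            if value < threshold:
                continue
            # Collect the up-to-8 surrounding pixels, clipped at the image border.
            neighbours = [
                gray[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Toy 5x5 "image" with two bright spots standing in for sunlit summits.
toy = [
    [10, 20, 10, 10, 10],
    [20, 250, 20, 10, 10],
    [10, 20, 10, 30, 10],
    [10, 10, 30, 220, 30],
    [10, 10, 10, 30, 10],
]
peaks = local_brightness_maxima(toy)
print(peaks)  # [(1, 1), (3, 3)]
```

Bright pixels are a crude proxy for snow-covered summits, which may be why o3 decided it was easier to just look at the photo.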
The AI went on to zoom in to see parts of the fake AI photo better:
It cropped parts of the image trying to figure out details it would expect to be there in a real photo of the areas surrounding the Matterhorn. In its chain of thought, ChatGPT said it couldn’t quite spot mountain shapes it thought should be there.
The AI started annotating the image, looking for the answer as it continued to search the web for more images that would help it pinpoint the location of the two peaks I asked it to place red circles around.
As you can see, the fake mini-Matterhorn on the right kept fooling the AI.
Ultimately, ChatGPT o3 acknowledged the uncertainties but still decided to mark the two peaks I asked for. It ran code in the chat and gave me the following image.
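I don’t have the exact code o3 ran, so what follows is only a sketch of the general idea of marking a spot with a red circle. It assumes the image is a plain grid of RGB tuples rather than a real image file, and the names `draw_circle`, `canvas`, and the coordinates are hypothetical.

```python
import math

RED = (255, 0, 0)
WHITE = (255, 255, 255)


def draw_circle(pixels, cx, cy, radius, color=RED, thickness=1.0):
    """Draw a ring in place on a grid of RGB tuples indexed as pixels[y][x]."""
    for y, row in enumerate(pixels):
        for x in range(len(row)):
            # Colour the pixels whose distance from the centre is close to the radius.
            distance = math.hypot(x - cx, y - cy)
            if abs(distance - radius) <= thickness / 2:
                row[x] = color
    return pixels


# An 11x11 white canvas standing in for the photo; circle a made-up "peak" at (5, 5).
canvas = [[WHITE] * 11 for _ in range(11)]
draw_circle(canvas, cx=5, cy=5, radius=3)
```

An imaging library would do this in a line or two on a real photo. The hard part, as the transcript shows, is deciding where the circles should go.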
I would have loved to see ChatGPT o3 call my bluff and tell me this photo isn’t real. Maybe future versions of the AI will be able to do that. But I must say that reading those five minutes of “thinking,” most of them seen in the image above, was even better.
It showed me that AI is putting in work to get the job done and reinforced my idea that AI computer vision is incredible in these new versions of ChatGPT.
But wait, it gets better.
ChatGPT o4-mini
My experiment wouldn’t be complete without ChatGPT o4-mini. After all, o4-mini is the precursor of o4, which should be even better than o3. o4-mini was also much faster than o3 at giving me the answer.
The AI thought for 15 seconds, during which time it surfaced images from the internet to support its view that the photo I had uploaded was a real image of the Matterhorn.
o4-mini also explained how it identified the location, and it felt certain it was right: this is the Matterhorn, given all it has learned about it from the web.
Unlike ChatGPT o3, o4-mini didn’t mention the additional peaks. But I asked o4-mini to do the same thing as o3: Identify Dent Blanche and Weisshorn.
o4-mini blew my mind with its speed here. It took 18 seconds to give me the following image, which has red circles around the two peaks.
Yeah, it’s not a great job, and I have no idea why the AI put those circles there because the more limited chain-of-thought transcript doesn’t explain it.
It’s obviously wrong, considering that we’re working with a fake AI image here. And yes, o4-mini could not tell the photo was fake.
The real Matterhorn
The conclusions are obvious, and it’s not all great news.
First, 4o image generation can easily be abused. I’ve actually never seen the Matterhorn in person, and that’s why I asked the AI to make this specific image. I recognized its famous silhouette from real-life photos, but I’m definitely not familiar with the other peaks in the region. This goes to show that ChatGPT-created images can fool people. They can fool other AI models as well.
Second, o3 and o4-mini are simply amazing at analyzing data in images. Of course, they have to be. If 4o can create stunning, lifelike photos, it’s because the AI can interpret data in images.
Third, finding location information from photos will be trivially easy for OpenAI models like o3 and o4-mini. Competitors will probably get similar powers. This is a privacy issue that we’ll need to account for in the future.
Fourth, ChatGPT o3 takes the reasoning job very seriously. If it spent all that time on a fake AI photo trying to match it to the real world, it’ll spend similar time on other jobs you might throw at it, and it’ll use a bunch of tools available in ChatGPT (like coding, web search, image manipulation) to get the job done.
I’m sure that if I had spent more time with the AI reasoning over the image, we’d have eventually reached the conclusion that the image the AI was investigating was fake.
Fifth, ChatGPT o4-mini can be really fast. Too fast. It’s something you want from genAI chatbots, but also something to worry about. o4-mini didn’t recognize the fake photo either, but its approach was a lot sloppier. That makes me think you need to pay extra attention when working with the mini version to ensure the AI gets the job done. But hey, I’m working with a very limited experiment here.
Finally, here’s the Matterhorn and surrounding area from a YouTube clip that was uploaded in December 2020. I say that because, in the age of AI, the video you’re about to see could always be a fake. The video gets you a “view from above the Weisshorn Nordwand looking towards the Matterhorn (L) and Dent Blanche (R). Mt Blanc is visible in the distance (Far R).” It’s a different angle, but at least good enough to give you an idea of what ChatGPT o3 was looking for.