Calvin Wankhede / Android Authority
TL;DR
- OpenAI had demoed live vision capabilities in the Advanced Voice Mode within ChatGPT but had not shared a release timeline beyond the alpha version.
- We spotted strings that suggest that the feature, which could be called “Live camera,” could soon be released in ChatGPT’s beta version.
Many people rely heavily on ChatGPT for their daily professional and personal needs. OpenAI added a level of friendliness to ChatGPT with features like Advanced Voice Mode for natural conversations, but users have been waiting for the promised vision capabilities to roll out as well. There's good news on this front, as ChatGPT's Live Video feature in Advanced Voice Mode could soon be rolling out to more users.
When OpenAI announced GPT-4o in May 2024, it touted advanced live vision capabilities coming to ChatGPT's Advanced Voice Mode. The company famously showed off a demo in which the new Advanced Voice Mode recognized the subject in the camera feed as a dog, remembered its name, recognized the ball, and associated the ball and the dog through an activity like fetch.
The demo was fairly impressive, considering how little information the user had to provide manually and how quickly the AI assistant responded to the live camera feed. It was almost like the user was video-calling a human.
Some users had the chance to try out the Live Video (vision) feature in alpha, and they walked away equally impressed.
Trying #ChatGPT’s new Advanced Voice Mode that just got released in Alpha. It feels like face-timing a super knowledgeable friend, which in this case was super helpful — reassuring us with our new kitten. It can answer questions in real-time and use the camera as input too! pic.twitter.com/Xx0HCAc4To
However, users have been waiting rather patiently for the feature to arrive on the app outside of the alpha. To our knowledge, OpenAI did not promise a release timeline for the vision capabilities in Advanced Voice Mode beyond the alpha rollout.
OpenAI now appears to be getting ready for a beta rollout, as we spotted strings related to the vision capabilities in Advanced Voice Mode in the latest ChatGPT v1.2024.317 beta build.
Code
<string name="video_nux_beta_label">Beta</string>
<string name="video_nux_description">Tap the camera icon to let ChatGPT view and chat about your surroundings.</string>
<string name="video_nux_title">Live camera</string>
<string name="video_warning">Don't use for live navigation or decisions that may impact your health or safety.</string>
The above strings indicate that the feature could be called “Live camera” when it rolls out in beta. They also include a warning that advises users not to rely on the Live camera feature for live navigation or for decisions that may impact their health or safety.
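For readers curious how strings like these can be sifted out of an app's resources, here is a minimal sketch in Python. It assumes the resources have already been decoded into a standard Android-style string resource file; the wrapping `<resources>` element, the `SAMPLE` constant, and the `find_strings` helper are our own illustrative assumptions, not anything taken from the ChatGPT app or from OpenAI.

```python
import xml.etree.ElementTree as ET

# The four strings quoted above, wrapped in an assumed <resources> root
# so that they parse as a single XML document.
SAMPLE = """<resources>
<string name="video_nux_beta_label">Beta</string>
<string name="video_nux_description">Tap the camera icon to let ChatGPT view and chat about your surroundings.</string>
<string name="video_nux_title">Live camera</string>
<string name="video_warning">Don't use for live navigation or decisions that may impact your health or safety.</string>
</resources>"""


def find_strings(xml_text, prefix):
    """Return {name: text} for every <string> whose name starts with prefix.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    return {
        el.get("name"): (el.text or "")
        for el in root.iter("string")
        if el.get("name", "").startswith(prefix)
    }


if __name__ == "__main__":
    # List every string whose resource name begins with "video_".
    for name, text in find_strings(SAMPLE, "video_").items():
        print(f"{name}: {text}")
```

Filtering by a name prefix such as `video_` is only one plausible way to narrow down a large resource file; it works here because all four strings happen to share that prefix.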
Since the strings were spotted in the app’s beta version, the company may now be preparing for a wider beta rollout, possibly in the near future. This is speculation on our part, but the feature could soon become available to ChatGPT Plus subscribers and possibly other paid tiers of the AI assistant.
We have reached out to OpenAI for comment on the release timeline for the real-time vision capabilities within ChatGPT’s Advanced Voice Mode. We’ll update this article when we get a response from the company.