One of ChatGPT’s best features is the ability to summarize information, which I use regularly. While the AI has no problem reviewing text-based content like websites or large PDFs, it can’t yet watch YouTube videos and summarize them or answer questions about the content in those clips.
That kind of ability will probably come to AI products in the near future. After all, companies like OpenAI, Google, and others have already given AI the ability to “see.” You can feed images and real-time video to various AI models, and they’ll be able to extract information. They just can’t stream video clips, which is a bit of a letdown.
Still, there is a way to summarize YouTube videos with AI, and it’s how I’ve been watching some clips for a while now. I use ChatGPT to summarize those videos even before I watch them, or at least the parts I’m interested in.
Say I need to listen to a two-hour podcast on a specific topic and don’t have two hours to spare. I can use ChatGPT to summarize the whole thing, then focus on the part I care about. Once the AI finishes its summary, I can watch the segments that deal with that topic and skip the rest while still getting a general sense of the full discussion.
The same goes for any type of clip, and it can be a big timesaver. For one thing, summarizing a YouTube video before watching it means I don’t have to sit through ads. I’m not a YouTube Premium subscriber and don’t plan to become one, so ads are just part of the deal.
Also, the ChatGPT summary helps me decide if I need to watch the whole video or just rely on the information from the AI. Depending on what I’m after, I might still watch the clip to verify that ChatGPT didn’t hallucinate any of the details it gave me.
How to summarize YouTube videos
This all works thanks to a YouTube feature I was using long before AI chatbots changed how we use computers: transcripts. So, instead of having the AI watch the video, I have it summarize the video’s transcript, and summarizing text is something it happens to do really well.
Not all clips had transcripts at first, but most do now. To find the transcript, go to the video’s description and scroll until you see the Show transcript button.
A Transcript box with timestamps will appear next to the description.
Before copying anything, I remove the timestamps by tapping the 3-dot menu and selecting Toggle timestamps.
Then, just copy all the text in that box and paste it into a chat with ChatGPT. Ask it to summarize the text, and wait a few seconds.
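If the transcript was copied with the timestamps still on, a few lines of Python can clean it up before it goes into the chat. This is only a rough sketch: it assumes the pasted text puts each timestamp (like 0:05 or 1:02:03) at the start of a line, and the function names and the prompt wording are made up for illustration, not anything YouTube or ChatGPT provides.

```python
import re

# Assumption: timestamps look like 0:05, 12:34, or 1:02:03 and sit at the start of a line.
TIMESTAMP = re.compile(r"^\s*(?:\d{1,2}:)?\d{1,2}:\d{2}\s*")


def strip_timestamps(transcript: str) -> str:
    """Drop leading timestamps and join the remaining text into one paragraph."""
    parts = []
    for line in transcript.splitlines():
        text = TIMESTAMP.sub("", line).strip()
        if text:  # skip lines that held only a timestamp
            parts.append(text)
    return " ".join(parts)


def build_prompt(transcript: str) -> str:
    """Hypothetical helper: wrap the cleaned transcript in a summary request."""
    return (
        "Summarize the following YouTube video transcript:\n\n"
        + strip_timestamps(transcript)
    )
```

The output of build_prompt is what gets pasted into the chat. One caveat: a spoken line that itself begins with something like "3:45" would lose those characters, so a quick skim of the result is worth it.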
Once the AI summarizes the transcript, I get a good overview of the video. I can ask ChatGPT follow-up questions about the information, and then use the browser’s search feature to jump to the specific parts of the video I care about.
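The browser’s search box does the job, but the same lookup can be sketched in code for anyone who saves transcripts as text. The example below assumes a copy made with timestamps left on, where each timestamp starts a line and the spoken text follows on the same line or on the next ones. The function name find_mentions is hypothetical.

```python
import re

# Assumption: a timestamp (0:05, 12:34, 1:02:03) starts the line, optionally followed by text.
TIMESTAMP_LINE = re.compile(r"^\s*((?:\d{1,2}:)?\d{1,2}:\d{2})\s*(.*)$")


def find_mentions(transcript: str, keyword: str) -> list[tuple[str, str]]:
    """Return (timestamp, text) pairs whose text mentions the keyword, ignoring case."""
    matches = []
    current = None  # most recent timestamp seen so far
    for line in transcript.splitlines():
        m = TIMESTAMP_LINE.match(line)
        if m:
            current = m.group(1)
            text = m.group(2).strip()
        else:
            text = line.strip()
        if text and current is not None and keyword.lower() in text.lower():
            matches.append((current, text))
    return matches
```

Each result pairs a timestamp with the line that mentions the keyword, which is enough to know where to scrub to in the video.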
For this example, I used a video from YouTuber Foot Doctor Zach, my go-to channel for running shoe info. I discovered it last year when I used ChatGPT to find shoes that met some specific needs. Without ChatGPT surfacing that clip, I probably wouldn’t have subscribed.
The video is only 11 minutes long and covers one of Nike’s newer sneakers. Thanks to ChatGPT o3, I now have a summary and know exactly which part of the video I want to watch.
Wait, it might get better
This has long been my method for summarizing YouTube videos with ChatGPT: copy the transcript into the chat and let the AI do its thing.
But according to ChatGPT o3, the chatbot can now grab the transcript on its own. All I have to do is feed it the video URL, and it should be able to read the transcript and other text data from the clip.
It’s not just ChatGPT either. Other AI models can also summarize YouTube clips as long as they’re able to browse the web and pull data from the page.
In practice, it doesn’t always work. I gave ChatGPT o3 the URL of the same running shoes clip and asked for a summary. It tried to fetch the transcript using the method it had described earlier, but failed. o3 said the “caption route is being blocked upstream,” so it couldn’t access the text it needed.
Is Google blocking rival AI platforms from accessing YouTube’s text data? I wouldn’t be surprised, but I don’t really care either. As long as I can grab the transcripts manually, I can feed that data to ChatGPT and get a summary.
The 11-minute clip I used earlier isn’t the best example. I’d probably just watch that one. But for much longer videos, I’ll always consider using AI to summarize the content before diving into the parts I care about.