Introduction
As a product manager specializing in AI experiences, staying updated with industry innovations and changes is a constant priority. To achieve this, I frequently engage with resources like Spotify and YouTube, which provide valuable insights. However, the information on these platforms is often filtered through creators, leading to delays in the dissemination of knowledge. For more immediate access to accurate information, I turn to studies, research papers, and articles published by individuals and organizations.
These documents, however, are typically lengthy, often 80 to 100 pages, and written in highly technical language, making it challenging to process them comprehensively from start to finish. To navigate this, I often rely on ChatGPT. By uploading a PDF or sharing a link, I prompt the model to summarize the key points or extract answers to specific queries. Initially, this seemed like an efficient solution. However, over time, I began noticing inconsistencies. Many of the summaries felt incomplete or misaligned with the paper’s abstract or title. The more I used the model, the more apparent these discrepancies became.
This prompted a deeper investigation. In several cases, I asked ChatGPT to locate a specific answer in a study, particularly when I knew the author had addressed the topic. Despite this, the model provided inaccurate or overly generalized responses. On one occasion, I requested a citation for a specific summary. ChatGPT supplied one, but it was completely irrelevant to my query. Reiterating prompts and adjusting my approach yielded no significant improvements. Eventually, I read the article myself and confirmed that the author had not addressed the question in detail, meaning there was no accurate answer to provide.
This experience was a turning point. I realized that while ChatGPT could assist in navigating dense information, it could not replace the critical process of reading and analyzing the material myself. Determined to understand the root of these issues, I delved into research to explore how the model’s results could be improved. During this exploration, I came across Matteo Wong’s Atlantic article, “Generative AI Can’t Cite Its Sources”, which highlighted the challenges generative AI faces in providing reliable citations. Wong’s findings resonated deeply with my own experiences.
Currently, countless articles and posts offer tips on how to use ChatGPT effectively, recommending specific prompts to achieve certain outcomes. However, most of these guides fail to evaluate the accuracy of the results they advocate. By sharing my experiences and findings, I aim to help others use ChatGPT more effectively. Understanding its limitations and adapting accordingly is crucial to maximizing its potential. My goal is not just to highlight the challenges but to empower others to approach ChatGPT critically, ensuring they get the best outcomes while remaining aware of its shortcomings.
Areas of Research
In my research, I focused on several key areas that I believe posed the greatest challenges.
- Obtaining Accurate, Up-to-Date Information: Prompting ChatGPT to explore specific problems and analyze multiple sources.
- Comparing Offline and Online Search Responses: Evaluating the impact of OpenAI’s web-enabled search feature, now available to all users.
- Enhancing Prompts for Better Summaries and Citations: Iteratively refining prompts to achieve more accurate results.
The evaluation of results across different areas will be based on the following criteria:
- Good: The model effectively fulfills the task and provides a high-quality response to the query.
- Acceptable: The model’s results are not of high quality, but they can be improved through prompt optimization.
- Bad: The model consistently fails to produce correct outputs.
Obtaining Accurate, Up-to-Date Information
Nowadays, many industry experts share critical insights and forward-looking perspectives on platforms like Spotify through podcasts, YouTube videos, and especially LinkedIn. These platforms have become hubs for timely, often groundbreaking information. We live in an era where long-held beliefs are increasingly challenged and revised, and the first venues for these revelations are often social media platforms. The process of information reaching the top search results on Google, where it is often perceived as a widely accepted “truth,” can take several months or even years. With effective SEO, this timeline can typically be reduced to approximately 90 to 180 days, depending on the industry. In the absence of SEO strategies, it may take significantly longer for Google’s algorithm to assess and prioritize the relevance of the information.
This delay in information spread creates a significant gap. For instance, recent studies have provided compelling evidence that weightlifting is more effective than cardio for overall health and fitness. The 2021 study from the University of New South Wales is just one example of this growing body of evidence. However, if you were to query Google on this topic today, the top results would likely continue to endorse cardio, relying on outdated studies. This lag illustrates how traditional search engines often fail to keep pace with the cutting-edge insights shared by experts on platforms like Spotify and YouTube. Industry professionals have been discussing such studies for years, but because their content is often shared via spoken media or niche communities, it doesn’t always make it into widely read articles or Google’s algorithm.
This realization leads to a critical question: how can one access the most up-to-date and disruptive insights on any topic? How can one leverage the latest findings to challenge conventional thinking, explore different perspectives, and become an expert with just a single query? In this context, I turned to ChatGPT to explore whether it could provide an advantage over Google Search in aggregating and synthesizing the latest knowledge.
ChatGPT, using a vast range of publicly available sources such as web pages, blogs, and YouTube transcripts, offers significant potential. It can even process video content from platforms like YouTube and generate summaries or answers based on that material, complete with citations. However, it has notable limitations. For instance, ChatGPT does not have direct access to platforms like Spotify or LinkedIn, which are rich sources of real-time, critical information in specific industries. While it can analyze text, audio, or video samples provided by the user, finding and supplying this content remains a manual process.
These constraints highlight both the strengths and weaknesses of ChatGPT. While it can streamline the process of synthesizing widely available information, it requires users to be proactive in curating content from platforms that fall outside its reach. This means that to fully leverage ChatGPT’s capabilities, users must supplement it with their own efforts to source the latest and most relevant materials.
Validating Online vs. Offline Search
The next phase of my research involved validating ChatGPT’s new online web search feature and comparing it to its offline functionality, with a specific focus on question-answering capabilities. This step was particularly challenging, as I wanted to understand how the model sources its answers and whether enabling online search truly enhanced the quality of responses. I approached this by prompting ChatGPT with a series of queries designed to test its ability to deliver accurate and relevant information.
Initially, I experimented with the online search enabled. I noticed a pattern: ChatGPT would typically rely on a single source, often highly ranked in Google search results or sourced from YouTube, to construct its response. While this approach sometimes yielded useful information, the model often provided additional sources of questionable quality. These supplemental sources would partially address the topic but failed to offer a comprehensive or precise answer to the query. In essence, the model’s response with online search enabled mirrored the type of answer one might find on the first page of Google search results.
One of the most striking examples arose when I tested ChatGPT with a common interview question: “Where do you see yourself in five years?” With online search enabled, ChatGPT delivered a response sourced from a highly ranked web page, advising the applicant to frame their answer around career growth, progression into senior roles, and leading teams. While this advice may appear correct, it is, in fact, not what most hiring managers or HR professionals consider ideal. Such responses can come across as overly ambitious or self-centered, rather than focused on the role at hand. To investigate further, I rephrased and reiterated my query, prompting ChatGPT to explain its reasoning. Despite my efforts, the responses remained largely unchanged, with additional sources cited to reinforce the same perspective.
Frustrated but curious, I decided to disable the online search and explore how the offline mode would handle the same question. To my surprise, the response was markedly different and far more accurate. ChatGPT proposed an approach centered on mastering the role being applied for, excelling in it, and supporting the broader team. This advice aligns with what hiring professionals recommend, as it demonstrates a commitment to the immediate role while hinting at long-term value. Even more intriguing, the model cited a relevant and specific article: “10 Sample Answers to ‘Where Do You See Yourself in 5 Years?’”. This was exactly the type of nuanced and practical advice I had been hoping to uncover.
The contrast between the two modes was both fascinating and revealing. The online search appeared to prioritize top-ranked sources, which introduced a potential bias and limited the depth of the responses. In contrast, offline search seemed to draw from a broader dataset, offering insights that were more balanced and aligned with real-world expectations. My hypothesis is that offline mode, relying on pre-trained data, synthesizes information from a wider range of sources, whereas online mode narrows its focus to a handful of prominent results, potentially sacrificing quality and nuance.
Enhancing Prompts for Better Summaries and Citations
The third area of my research centered around refining prompts and iterating on them to guide ChatGPT into rethinking its responses. This exploration aimed to validate whether the model’s answers could evolve or improve with strategic prompting. My focus was threefold: question-answering, document summarization, and ensuring accurate citations.
Through this process, I found that ChatGPT excels in reasoning and iterative refinement for search-based queries. By asking why the model provided a specific response and following up with additional prompts, it demonstrated an ability to identify its limitations and reframe its answers when the initial response was incorrect. This iterative dialogue allowed for some improvement, but the extent of change depended heavily on the complexity of the query.
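The iterative dialogue described above can be expressed as a simple loop: ask the question, then repeatedly prompt the model to justify and revise its answer. The sketch below illustrates the pattern; `ask_model` is a hypothetical stub standing in for a real chat-completion call, and the canned replies are purely illustrative.

```python
def ask_model(history):
    """Hypothetical model call: returns a canned reply for illustration.
    A real implementation would wrap an API client and send `history`."""
    last = history[-1]["content"].lower()
    if "why" in last:
        return "On reflection, my earlier answer overgeneralized; here is a narrower one."
    return "Initial answer."

def iterative_refine(question, max_rounds=3):
    """Ask a question, then repeatedly probe the model's reasoning."""
    history = [{"role": "user", "content": question}]
    answers = []
    for _ in range(max_rounds):
        reply = ask_model(history)
        answers.append(reply)
        history.append({"role": "assistant", "content": reply})
        # Follow-up probe: ask the model to explain and revise its answer.
        history.append(
            {"role": "user", "content": "Why did you answer that way? Revise if needed."}
        )
    return answers

answers = iterative_refine("Where do you see yourself in five years?")
```

In practice, the value of the loop comes from the "why" probe: it forces the model to re-examine its own output rather than simply restating it.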
For document summarization, structured prompting yielded acceptable results. By explicitly outlining the desired structure and key focus areas, I could guide ChatGPT to deliver better summaries. However, even with clear instructions, the model often struggled to capture the core value of a study or article. Instead, it tended to highlight surface-level themes or multiple unrelated points, which did not always reflect the primary benefits or key insights. Additionally, ChatGPT frequently failed to identify the reasoning or evidence underpinning an author’s arguments, which diminished the depth and accuracy of its summaries.
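One way to make the structure explicit, as described above, is to build the prompt programmatically so the desired sections and focus areas are always spelled out. The section names and paper title below are illustrative assumptions, not a prescribed format:

```python
def build_summary_prompt(title, focus_areas):
    """Compose an explicit, structured summarization prompt."""
    sections = "\n".join(f"- {area}" for area in focus_areas)
    return (
        f"Summarize the paper '{title}' using exactly this structure:\n"
        "1. Core claim (one sentence)\n"
        "2. Evidence the author gives for that claim\n"
        "3. Limitations the author acknowledges\n"
        f"Focus on these areas:\n{sections}\n"
        "If a section is not addressed in the paper, write 'Not addressed' "
        "rather than inferring an answer."
    )

prompt = build_summary_prompt(
    "Resistance Training and Health Outcomes",  # hypothetical title
    ["study design", "effect sizes", "population studied"],
)
```

The final instruction, telling the model to say "Not addressed" instead of inferring, is a small guard against the tendency to answer regardless of whether the source supports it.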
The most challenging aspect of this research was improving citation accuracy. Despite repeated prompt refinement and reasoning-based queries, I could not consistently achieve correct or relevant citations. Even when the model provided citations, they were often irrelevant to the context of the query or sourced from inaccurate interpretations of the document. My hypothesis is that ChatGPT’s tendency to generate affirmative or positive responses creates an inherent bias, where the model attempts to provide an answer regardless of the availability of concrete supporting information.
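Given how often the cited passages turned out to be irrelevant, one pragmatic safeguard is to check whether a quoted citation actually appears in the source text before trusting it. The sketch below shows a minimal version of that check; real pipelines would add fuzzy matching (e.g. `difflib`) rather than requiring a verbatim hit, and the sample sentences are invented for illustration.

```python
import re

def normalize(text):
    """Collapse whitespace and lowercase for a tolerant comparison."""
    return re.sub(r"\s+", " ", text).strip().lower()

def citation_in_source(quote, source_text):
    """Return True if the quoted passage occurs verbatim in the source."""
    return normalize(quote) in normalize(source_text)

# Hypothetical source sentence and two candidate citations.
source = "Our results suggest that resistance training improves bone density in older adults."
good = citation_in_source("resistance training improves bone density", source)
bad = citation_in_source("cardio is superior for bone density", source)
```

A check like this cannot confirm that a citation is relevant, but it does catch the worst failure mode: a quote that does not exist in the document at all.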
This bias is particularly evident when addressing queries that cannot be resolved with a straightforward “yes” or “no.” The model’s generative nature often amends the outcome in a way that affects the overall quality of its response. While ChatGPT can summarize content reasonably well, it frequently interprets nuanced arguments through a binary lens, which oversimplifies complex information and can lead to misrepresentation.
Summary
In conclusion, ChatGPT is a powerful tool that can significantly enhance productivity and save valuable time. However, its effectiveness depends heavily on how it is utilized and for what purposes. This article has focused less on the biases inherent in information retrieval and more on the model’s ability to access the most recent, high-quality data that surpasses traditional search engines in responsiveness and relevance. Despite its strengths, ChatGPT remains far from delivering outputs that could quickly establish expertise in an industry. In my view, the primary limitation lies in its lack of access to platforms like LinkedIn and Spotify, which are critical sources for real-time, diverse perspectives. Additionally, the model’s tendency to simplify responses into binary yes-or-no frameworks restricts its ability to provide nuanced, multi-dimensional insights.
ChatGPT excels in initial research tasks, effectively outlining high-level pillars of a given query. However, it struggles to extract and synthesize key outcomes when tasked with processing detailed information. While employing the techniques discussed in this article to improve its reading capabilities, I encountered several challenges and ultimately fell short of achieving the desired level of accuracy. This shortfall becomes particularly problematic when users lack domain expertise, as they may unknowingly accept biased or outdated responses.
From a professional perspective, I rely on ChatGPT primarily for writing tasks, where it truly shines. The model’s ability to rewrite text into a polished, professional format is a game-changer, particularly for non-native English speakers. However, when it comes to information synthesis and analysis, my trust in the model diminishes. I now primarily use ChatGPT to outline high-level details when exploring a new area but prefer to invest more time in reading and analyzing content myself to ensure the most accurate and comprehensive understanding. By recognizing these strengths and limitations, users can better harness ChatGPT’s potential while mitigating its weaknesses.