Universities all around the world teach a course called “research methods”. This mixes philosophy of science and epistemology with hypothesis formulation, significance testing and a study of quality, quantity and bias in statistics. I’ve taught it quite a few times. It’s an important bedrock of academic research. But it needs to catch up with the horrors of the modern digital age.
Internet searching has become a central research methodology for all academics. All modern research rests on an assumption of some accessible network of information “out there”. Sure, academics have better tools at their disposal than the average web user, including private databases for searching out papers. Mainly, though, everyone uses the traditional advanced features of common search engines, such as Boolean operators, sorting and filtering.
Recently, however, the quality of search results across all popular engines has fallen, and we need to ask whether university research, as currently taught, can survive.
The original world wide web page, produced by Cern, depicts a very different paradigm from today’s “web”, describing the pioneering venture as an “information retrieval initiative aiming to give universal access to a large universe of documents”. Even then, though, you were unlikely to find things easily just by following links from one URL to another, as you did in Gopher, the predecessor of the modern web. Hence, in the early days of the web, search engines built indexes by “crawling” the network with their “spiders”, looking for content.
That worked reasonably well before artificial “intelligence” went mainstream. But today content is generated “on the fly”, as it were – and the spiders can’t catch these flies. They are no sooner there than gone. This is partly because of the mushrooming glut of low-grade content spam spewed out by aggressively search engine-optimised (SEO) link farms, designed to boost the ranking of a particular website – usually a commercial one – in search engine results.
A recently published Leipzig University study, a longitudinal investigation of SEO spam covering about 8,000 product queries, suggested that all search engines have significant problems with such spam drowning out useful information. Google’s response amounted to, “Well, we’re still doing a little better than the other search engines that are failing”.
The spammers can instruct a large language model, such as ChatGPT, to spin a piece of advertising copy into 100,000 articles, each introducing the product under a different subject lead-in – cookery, sports, medicine, pet care – and then create 100,000 websites to host them. To search engines, these are indistinguishable from organic content, even though they are poorly written and full of errors. Troll farms can do something similar to promote disinformation.
Traditional URLs are “universal resource locators”, but they are increasingly replaced by internalised tracking links (which redirect a URL so as to track whether it is shared, and with whom), by shortened URLs, or by links simply hallucinated by the “AI” generators. Hence, the number of dead links in circulation is rising. This means that even if Google were the most benevolent of gatekeepers, it still couldn’t sort the wheat from the chaff.
But it isn’t the most benevolent of gatekeepers: in common with other search engines, its business priorities have long since turned away from users toward advertisers.
Google’s flaws matter enormously because of its dominance. Despite being only one window into the vast public network of URLs, through which any traveller could freely walk, Google has set itself up as the sole gatekeeper (apart from its wannabe, Bing). We coined the verb “to google”, meaning to consult one special URL as a path to all others – even though Google’s mighty spider can’t penetrate the worldwide walls built by its Silicon Valley neighbours, the social media companies.
Once Google established its dominance, it’s fair to say that the organic web, and cultural knowledge of the roads through it, died. So much so that even in university research methods classes, students are still told to use Google or Bing as their first port of call.
But surely this must end, given the increasing uselessness of the results these engines throw up. For too long, we’ve free-ridden on commercial applications instead of building solid, home-grown information systems that also serve the public interest. We need to revise the idea of search and reconsider what tools are really best for it. And we need to start confronting students with the reality of the internet as it actually is, rather than as it was idealised in the 1990s.
Andy Farnell has been a visiting and associate professor in signals, systems and cybersecurity at a range of European universities. With Helen Plews and Ed Nevard, he now co-hosts The Cybershow, which seeks to restore understanding, safer use and control of everyday technology to ordinary people.