World of Software
Behind the Scenes of Using Web Scraping and AI in Investigative Journalism | HackerNoon

News Room
Last updated: 2025/05/05 at 9:15 PM
News Room Published 5 May 2025

While the work of investigative journalists sometimes involves contacting anonymous sources for hidden information or even going undercover, threads for great stories often lie in open sources accessible to everyone. For this reason, web scraping has become indispensable for journalists over the last couple of decades. Recently, developments in AI have provided another way to upgrade the reporter’s toolkit.

Why does web scraping matter to journalists?

Web scraping is the automated collection of data from the Internet using specialized software tools known as web scrapers. As a robust data collection method, it can be used for both good and bad. The general public often hears more about the latter, which fuels the belief that web scraping is something shady that should probably be banned altogether. However, when a case that threatened to make web scraping illegal came before the U.S. Supreme Court, it was journalists who stood up against that outcome. An investigative nonprofit newsroom, The Markup, filed an amicus brief claiming that web scraping is vital to democracy.

This is not an overstatement. In some cases, only web data extraction tools allow journalists to keep government agencies accountable. By scraping public information, investigators can check if the data really supports the official position, report on otherwise ignored anomalies, or uncover negligent data management practices of state institutions.

Additionally, tracking disinformation spread all across the web would not be possible without such automated solutions. Artificial Intelligence can boost this spread by easily generating fake visual and audio content. On the bright side, AI-powered web scraping tools can also monitor, identify, and remove such fakes.

Web scraping also enables journalists to uncover stories from the criminal underground. Here, the work of journalistic investigators resembles that of forensic investigators. Both types of investigators can use data scraping to detect human trafficking activities and illegal marketplaces.

How to use the latest tech for high-quality journalism?

Investigative journalism today is closely related to data journalism, which uses data as the primary source for investigating and reporting stories. However, not all journalists are data scientists, analysts, or coders. And even for tech-savvy reporters, the ways to leverage web scraping and AI tools for journalism are not always obvious. A few things might help reporters get started.

Utilize no-code tools

Tools and tutorials are available for those who do not possess coding skills yet believe in the power of data to bring forth relevant stories. Some advocates of scraping in journalism share content online on using such no-code tools and provide tips for leveraging web scraping in investigations and storytelling. For example, one can seek guidance from fellow journalists in the Global Investigative Journalism Network on using free browser extensions like Data Miner to extract data from the web.

Think about the scale

Sometimes, the work of journalists is made harder by the abundance of information rather than a lack of it. This is evident on the Internet, where the truth can be publicly accessible yet drowned in more disinformation than even an army of humans could quickly sift through.

Thus, one way to approach a scraping-based investigation is by thinking about the threads of the stories that would be impossible to follow manually. For example, if you notice some suspicious reporting, you might want to review all the articles written by the same reporter. However, searching for them all manually can be hard and time-consuming. With web scraping, you can quickly discover that the quantity of articles itself proves your suspicions.

This happened when data scraping tools helped show that 38,000 articles on the war in Ukraine, all published in the same year, were attributed to the same “journalist.” With the right scraping tools, real journalists can thus expose fake journalism attributed to non-existent persons.
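The idea of counting articles per byline can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the investigation mentioned above: it assumes the article listing pages have already been downloaded, that each author name sits in an element with a `byline` class containing only text, and that the `max_per_year` threshold is a hypothetical cutoff a reporter would choose.

```python
from collections import Counter
from html.parser import HTMLParser


class BylineParser(HTMLParser):
    """Collect the text of every element with class "byline" (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.bylines = []
        self._in_byline = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "byline" in classes:
            self._in_byline = True
            self.bylines.append("")

    def handle_data(self, data):
        if self._in_byline:
            self.bylines[-1] += data

    def handle_endtag(self, tag):
        # Assumes a byline element holds plain text only, so any end tag closes it.
        self._in_byline = False


def count_articles_by_author(pages):
    """Count bylines across an iterable of HTML strings."""
    counts = Counter()
    for html in pages:
        parser = BylineParser()
        parser.feed(html)
        counts.update(name.strip() for name in parser.bylines if name.strip())
    return counts


def flag_implausible(counts, max_per_year=1000):
    """Return authors credited with more articles than the chosen threshold."""
    return {author: n for author, n in counts.items() if n > max_per_year}
```

Passing a year's worth of listing pages to `count_articles_by_author` and then calling `flag_implausible` yields the bylines whose output no single person could plausibly produce, which is a starting point for reporting rather than proof in itself.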

Let AI read and connect the dots

While web scraping can help journalists get large data sets, AI tools are well-suited to assist in going through this data. These tools have been used for years to analyze satellite imagery, which would take immense personnel, time, and resources to do manually. Recently, the New York Times utilized AI just this way to reinforce its findings on the bombardment of Gaza.

However, journalistic investigations often involve reading documents and piecing together information scattered across vast amounts of text. This needed to be done when the International Consortium of Investigative Journalists (ICIJ) got hold of the 11.5 million documents comprising the “Panama Papers.” A few years later, ICIJ collaborated with Stanford AI Lab to explore how emerging machine learning (ML) techniques could be enlisted in such projects and quickly learned the value of such mutually beneficial collaborations.

In a more recent case, a Filipino journalist used an OpenAI feature for creating custom agents on top of ChatGPT to build one that supports watchdog journalism. The custom agent can read and summarize many pages of audit reports and other official documents to identify potentially newsworthy angles. Without such solutions, journalists have to spend hours on a single report, while governments can publish thousands of them every year.
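The triage step, deciding which of thousands of reports deserves a reporter's hours, can be illustrated without any AI service. The sketch below is not the custom agent described above. It swaps in plain keyword matching as a deliberately simple stand-in, and the watch terms, file names, and function names are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical watch list; a real desk would tailor these terms to its beat.
WATCH_TERMS = ("unliquidated", "disallowance", "overpricing", "irregular")


@dataclass
class Flag:
    document: str
    term: str
    excerpt: str


def flag_passages(documents, terms=WATCH_TERMS, context=60):
    """Return one Flag per occurrence of a watch term, with surrounding text."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    flags = []
    for name, text in documents.items():
        for match in pattern.finditer(text):
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            flags.append(Flag(name, match.group(0).lower(), text[start:end].strip()))
    return flags


def rank_documents(flags):
    """Order document names by number of flagged passages, most first."""
    counts = {}
    for flag in flags:
        counts[flag.document] = counts.get(flag.document, 0) + 1
    return sorted(counts, key=lambda name: (-counts[name], name))
```

Here `documents` is a dictionary mapping a file name to the extracted text of a report. The ranked list only tells a journalist where to start reading; the judgment about what is newsworthy stays with the human.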

Ethical data gathering and AI usage

The strict ethical guidelines journalists follow when conducting investigations also apply to utilizing data scraping and AI solutions. Journalists are advised to identify their scrapers to the website when possible. In some cases, however, this would ruin the investigation. For example, when monitoring illegal activity on dark web forums and marketplaces, journalists can achieve their goals only by using proxy IPs, since hiding their real online identity is the only way to avoid being blocked or targeted by hackers.
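For the ordinary case, where identifying the scraper is possible, one common approach is to send a descriptive User-Agent header and to respect the site's robots.txt rules. The following is a minimal sketch using only the Python standard library; the bot name and contact address are placeholders, and the helper function names are made up for illustration.

```python
from urllib import request, robotparser

# Hypothetical identity string: the newsroom name and contact address are placeholders.
USER_AGENT = "ExampleNewsroomBot/1.0 (+mailto:data-desk@example.org)"


def build_identified_request(url: str) -> request.Request:
    """Return a request whose User-Agent header names the scraper and a contact."""
    return request.Request(url, headers={"User-Agent": USER_AGENT})


def allowed_by_robots(robots_txt: str, url: str) -> bool:
    """Check an already downloaded robots.txt body against our user agent."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)
```

A scraper built this way would call `allowed_by_robots` before fetching a page and pass the result of `build_identified_request` to `urllib.request.urlopen`, so that site operators can see who is collecting data and how to reach them.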

Additionally, reporters should be careful about the data they gather and store to avoid breaking laws or leaking sensitive information. In this area, specially trained AI can help manage data-gathering activities so that only important public data is targeted. However, AI itself should never be trusted with final decisions when reporting a story. Ultimately, human oversight, journalistic integrity, and domain expertise remain the most important investigative tools that AI does not threaten to replace.

Conclusion

Data journalism is now a vital part of investigative journalism. Both web scraping and emerging AI technologies boost journalists’ work and help track elusive threads of fascinating stories behind mountains of data. In the future, AI tools will likely be used increasingly for generating story ideas, catching anomalies, and visualizing the findings, among many other tasks. Meanwhile, the power of web scraping to extract value from public data and reveal what is hidden in plain sight can make it the definitive tool of investigative journalism in the 21st century.
