A Cloudflare report claims that the conversational AI search engine Perplexity is evading restrictions designed to keep its web crawlers away from certain sites. The report revives concerns about AI systems collecting content without permission, and about the credibility of the companies that engage in such practices.
AI is insatiable and consumes an overwhelming amount of resources: mainly computing power and energy, but also data for training, learning, and search. Not all of that data is obtained legally or transparently. With no regulation in sight, the sector is becoming a jungle, and AI scraping, the technique that uses artificial intelligence to automatically extract data from digital sources, is out of control.
Perplexity in the eye of the storm
This is not the first time Perplexity has been accused of such practices. Last year the company was caught bypassing paywalls and ignoring sites’ robots.txt files. At the time, its chief executive, Aravind Srinivas (formerly of OpenAI), attributed this to the activity of third-party crawlers used by the company.
Now Cloudflare, one of the world’s largest internet infrastructure providers, says it received complaints from customers who reported that Perplexity’s bots kept accessing their websites even after they had stated their preferences in their robots.txt files and created web application firewall (WAF) rules to restrict access by the startup’s bots.
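To make the robots.txt mechanism concrete, here is a minimal sketch of the check a well-behaved crawler is expected to run before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, and the `example.com` URL are illustrative assumptions, not the rules used by any site in Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish to opt out of
# Perplexity's declared crawlers while leaving other bots unaffected.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    for agent in ("PerplexityBot", "Perplexity-User", "SomeOtherBot"):
        print(agent, is_allowed(agent, page))
```

The check runs entirely on the crawler's side. robots.txt states a preference and enforces nothing, which is why site owners who want a hard block also add WAF rules.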
Cloudflare is not making accusations blindly. The company says it ran a series of tests and experiments to determine whether Perplexity really was trying to get around the limits set by the owners of the websites it crawled. To test this, it created new domains with similar restrictions against Perplexity’s crawlers and found that the firm first tried to access the sites identifying itself with its declared crawler names: “PerplexityBot” or “Perplexity-User.”
But if the website had restrictions against AI scraping, Perplexity changed its user agent (the string that tells a website what type of browser and device a visitor is using, or whether the visitor is a bot) to “impersonate Google Chrome on macOS.” Cloudflare says this “undeclared crawler” uses “rotating” IP addresses that the company does not include in the published list of IP addresses used by its bots.
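A short sketch shows why a block keyed only on the user agent is easy to sidestep. The `is_declared_crawler` function and both header strings below are illustrative assumptions; they are not the exact values observed by Cloudflare, and real WAF rules are more elaborate.

```python
# Crawler names Perplexity declares, as cited in the article.
DECLARED_CRAWLERS = ("perplexitybot", "perplexity-user")


def is_declared_crawler(user_agent: str) -> bool:
    """Naive filter: flag a request only if its User-Agent names a declared bot."""
    ua = user_agent.lower()
    return any(name in ua for name in DECLARED_CRAWLERS)


# Illustrative header values (assumptions, not taken from the report).
DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
CHROME_MAC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    print(is_declared_crawler(DECLARED_UA))    # the declared bot is caught
    print(is_declared_crawler(CHROME_MAC_UA))  # a browser-like string passes
```

Because the client chooses the header, a filter like this only stops crawlers that identify themselves honestly.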
In addition, Cloudflare states that Perplexity switches its autonomous system numbers (ASNs), the numbers used to identify groups of IP networks controlled by a single operator, to get around blocks. “This activity was observed across tens of thousands of domains and millions of requests per day,” the researchers say.
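The published IP list matters because a site can compare each request's source address against the ranges an operator has declared. A minimal sketch using Python's standard `ipaddress` module follows; the ranges are reserved documentation networks standing in for a real list, and `from_published_range` is a hypothetical helper.

```python
import ipaddress

# Hypothetical "published" crawler ranges. These are reserved documentation
# networks (RFC 5737), used here as placeholders for an operator's real list.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def from_published_range(ip: str) -> bool:
    """Return True if ip falls inside one of the declared crawler ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in PUBLISHED_RANGES)


if __name__ == "__main__":
    print(from_published_range("192.0.2.10"))   # inside a declared range
    print(from_published_range("203.0.113.7"))  # outside every declared range
```

Traffic that rotates through addresses or networks outside the declared ranges cannot be tied to the operator by this check alone, which is why such behavior makes blocking harder.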
Cloudflare has removed Perplexity from its verified bots list and has implemented methods to block its stealth crawling. Last month, the infrastructure provider began allowing websites to charge AI companies for crawling their content and started blocking AI crawlers by default. It is a jungle out there…