Generative AI is undoubtedly fueling a surge of spammy content farms that can overrun social media and search engine results. The existence of these farms is one thing, but how easily they are surfaced by Google and other search engines is another. Detecting these sites is a cat-and-mouse game for search engines, and the challenge of distinguishing genuine content from generated muck is only going to get harder.
The phenomenon of using AI to generate articles isn't new, nor is it limited to no-name sites. In January, The Verge wrote a piece documenting CNET's use of artificial intelligence and its lucrative SEO machine.
In this post, I will document a couple of new examples of what the problem looks like and the downward path we're on.
On July 20, 2023, Reddit user C0RDE_ made a post on the World of Warcraft subreddit to share their Google Discover feed. The feed is filled with World of Warcraft-related articles from a site called Z League (aka "The Portal"). These articles appear to have been auto-generated from recent content shared to that subreddit.
Unsurprisingly, The Portal took that very post and generated a now-deleted article about itself titled "World of Warcraft (WoW) Players React to AI-Generated Content on Popular Gaming Sites". A Wayback Machine archive of the article is available here.
Later that day, another user made a bait post about how excited they were about the introduction of the fictitious "Glorbo". And of course, The Portal published another now-deleted article about it, which likely showed up in the Google feeds of many unsuspecting World of Warcraft players.
The so-called "author" of this article, Lucy Reed, published over 80 articles about dozens of games on July 20th.
That same day, travel blogger Leslie Harvey posted a thread on Twitter about deception in her field. A site called Family Destinations Guide features hundreds of articles and lists. Harvey wrote,
"The site Family Destinations Guide purports to have dozens and dozens of local writers with personal destination experience. Each has a lengthy text bio about their travels, their home, & their experience - along with a funny & personal 'embarrassing travel moment' story."
As it turns out, these authors aren't real people.
"They don't have LinkedIn profiles or write for other publications. They don't appear in other photos on the Family Destinations Guide site. Nothing else shows up when you Google them. Why? They aren't real people."
While I haven't seen any confirmation, I wouldn't be surprised at all if this so-called "Family Destinations Guide" also uses large language models to generate the bulk of its hundreds of articles. And these articles rank very well in Google search results. Harvey ends with a call to action:
"Read and publish carefully out there, folks. And call out this kind of deception when you see it. Because so far I can tell you that Google isn't very good at telling the difference."
These two incidents surfaced for me in just one evening of scrolling. Content mills have always been a thing, but thanks to the arrival of ChatGPT and similar AIs, we are going to see a huge rise in machine-generated "news" and "guides". Human writers who put in the hours to create genuine, meaningful content are going up against thoughtless machines that can churn out articles by the minute and still be ranked favorably by Google.
I'm not sure how best to solve this issue. But here are some ideas:
- Make it a norm (or, wherever possible, a legal requirement) for publishers to clearly disclose when AI is used to generate content.
- Come up with tools that can detect AI-generated content. There is already a lot of work going on in this area, including GPTZero, a tool written by a college student to detect AI-generated essays. These tools will, however, never be perfect, and false positives can be damaging (like when a professor tried to fail his students after incorrectly concluding they had used ChatGPT).
- Create extensions and other tools that can serve as blacklists. Earlier this year, I created Indie Wiki Buddy, a browser extension that filters out corporate wiki farms from search engine results and redirects people to quality, independent wikis. Can we have similar tools to warn people when they are being exposed to mass-generated content? Can the community maintain lists of content farms to blacklist?
- Finally, support your favorite creators and writers. Follow them and boost them across social media. If they offer a subscription and you have the means to do so, subscribe.