Can Google Detect and Does It Penalize AI Content?


On March 5th, 2024, Google announced a major update aimed at reducing unhelpful content in search results by 40%. The update began with a significant wave of manual actions that left the affected sites completely deindexed. These manual actions coincide with the update but are separate from the algorithm update, which is rolling out over 2 to 4 weeks.

We have completed 2 significant studies in the last 24 hours to help people understand these updates, and both are included here.

  • Study 1 - List of Websites That Had a Manual Action in March 2024
    • 1,446 websites had a manual action applied to them out of 79k websites checked in March 2024.
  • Study 2 - Was AI Content Spam to Blame for the Manual Action?
    • 100% of the websites analyzed had some posts flagged as AI-generated
    • 50% of the sites had 90%-100% of their posts flagged as AI-generated

In this post we cover:

  1. Overview of the Google update
  2. What manual actions and site deindexation are
  3. The most complete list of sites that were impacted
  4. What we have learned about the sites that were impacted

The methodology for each study is described in its section below.

If you have any questions, want access to the data, or would like to extend this research, reach out to [email protected]

Overview of the March 2024 AI Content Manual Action Update

This update, announced by Google on March 5th, aims to penalize:

  • “Scaled content abuse”
    • Aims to stop mass-published content spam (and who produces content spam without using AI in 2024 and beyond?)
  • “Expired Domain Abuse”
    • Aims to stop people from repurposing expired domains for SEO benefit
  • “Site Reputation Abuse”
    • Aims to stop reputable websites from manipulating search engines by using a portion of their site to publish Parasite SEO articles. One example is the Sports Illustrated Parasite SEO case.

Google has communicated more about this update in its first days than about most similar updates.

**Coverage of the March 2024 Update:**

Google Blog Posts:

It is important to note again that manual actions are not the same as the algorithm update that is also rolling out.

What is a Manual Action and Site Deindexation

If Google identifies a site that does not meet its guidelines, it can apply a “manual action” and completely remove the site from its search results (i.e., deindex the website).

Starting on March 5th, an increasing number of sites received these notifications in the Manual Actions section of Google Search Console…

[Image: Google Search Console's Manual Actions tab showing a single issue detected]

The consequence of these manual actions appears to be a complete removal from Google's search results. 

[Image: Checking indexation with the site: operator]

Now what do we know about the sites that were impacted…

Study 1 - List of Websites That Had a Manual Action in March 2024

To better understand the extent of this wave of manual actions, we ran a study to identify content websites that Google has deindexed but that had organic Google traffic until very recently.

This study focuses on content-first websites, not ecommerce or other types of sites.

Findings:

  • Manual actions were applied to 1,446 sites running ads from MediaVine, Raptive or Ezoic
  • Of the approximately 79k sites checked, 1.9% had a manual action applied
  • Cumulative traffic loss is estimated at over 20 million visitors/month
  • 3 websites went from over 1 million organic visitors per month to zero
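As a quick sanity check, the headline share can be recomputed from the per-platform counts listed in the Study 1 methodology. This is plain arithmetic on those published numbers, not the study's actual tooling. With the 18 duplicates removed, it comes to about 1.8%, in line with the roughly 2% headline figure.

```python
# Sites checked per ad platform (counts from the BuiltWith lists in the methodology).
platform_counts = {"MediaVine": 21_808, "Raptive": 6_428, "Ezoic": 51_293}
duplicates_removed = 18
sites_with_manual_action = 1_446

# Unique sites actually checked, and the share of them that had a manual action.
sites_checked = sum(platform_counts.values()) - duplicates_removed
share = sites_with_manual_action / sites_checked

print(f"Sites checked: {sites_checked:,}")         # Sites checked: 79,511
print(f"Share with a manual action: {share:.2%}")  # Share with a manual action: 1.82%
```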

[Image: Manual actions applied to ~2% of websites checked (1,446 deindexed sites)]

Ahrefs Domain Rating (DR) of Sites with Manual Action:

[Image: Histogram of Ahrefs DR for sites with a manual action]

3 Websites With Over 1 Million Organic Visitors Per Month to Zero

  • zacjohnson.com
  • beingselfish.in
  • equityatlas.org

[Image: Checking a website's traffic overview in the Ahrefs dashboard]

[Image: Ahrefs overview for beingselfish.in after Google's manual action]

[Image: Another Ahrefs overview showing traffic declining]

Methodology:

Here is how we completed this study.

Summary: We identified a list of 79k websites that are a better-than-average reflection of the internet and checked whether each is currently indexed. For the sites that were not indexed, we checked 2 sources (Ahrefs and SimilarWeb) to verify whether they recently had organic traffic.

If a website recently had organic traffic (in February) but is now not indexed in Google, we assume a March 5th update manual action was applied.

  1. We used BuiltWith lists to compile all URLs displaying ads from popular advertising providers. These platforms were selected because they apply a minimum standard before sites can join. AdSense sites were not analyzed due to the volume of low-quality sites. This list should (arguably) be a better-than-average representation of all websites online, so it may underestimate the % of websites that received a manual action. The number of websites checked per advertising provider:
    1. MediaVine: 21,808
    2. Raptive: 6,428
    3. Ezoic: 51,293
  2. Removed 18 duplicates
  3. Checked whether each URL was deindexed by searching Google for “site:URL”
  4. For every deindexed site, checked February organic traffic numbers using Ahrefs and SimilarWeb
  5. Confirmed that some of the publicly shared deindexed sites were captured by this method (they were)
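The decision rule from the summary and steps 2 to 4 can be sketched as follows. This is a minimal sketch under stated assumptions. The names `SiteCheck`, `normalize`, `dedupe`, `site_query` and `likely_manual_action` are hypothetical. The indexation result and February traffic numbers are assumed to have been collected beforehand, so the sketch does not query Google, Ahrefs or SimilarWeb. Treating traffic from either source as "recent traffic" is also our assumption, not a documented detail of the study.

```python
from dataclasses import dataclass


@dataclass
class SiteCheck:
    """Hypothetical record of what was gathered for one domain."""
    domain: str
    indexed_now: bool            # did a Google "site:" search return any results?
    feb_traffic_ahrefs: int      # February organic traffic estimate from Ahrefs
    feb_traffic_similarweb: int  # February organic traffic estimate from SimilarWeb


def normalize(domain: str) -> str:
    """Lower-case a domain and drop a leading 'www.' so duplicates match."""
    return domain.strip().lower().removeprefix("www.")


def dedupe(domains: list[str]) -> list[str]:
    """Remove duplicates (e.g. a site listed by two ad networks), keeping first-seen order."""
    return list(dict.fromkeys(normalize(d) for d in domains))


def site_query(domain: str) -> str:
    """Query string typed into Google to test whether a domain is indexed."""
    return f"site:{normalize(domain)}"


def likely_manual_action(check: SiteCheck) -> bool:
    """Not indexed now, but had organic traffic in February.

    Assumption: traffic reported by either source counts as recent traffic.
    """
    had_recent_traffic = check.feb_traffic_ahrefs > 0 or check.feb_traffic_similarweb > 0
    return not check.indexed_now and had_recent_traffic


print(dedupe(["Example.com", "www.example.com", "other.org"]))  # ['example.com', 'other.org']
print(site_query("www.Example.com"))                             # site:example.com
print(likely_manual_action(SiteCheck("example.com", False, 1200, 0)))  # True
```

Requiring both sources to show traffic would be the stricter alternative. It would reduce false positives at the cost of missing sites that one tool under-reports.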

Dataset:

If you want access to the dataset of deindexed websites, including Ahrefs and SimilarWeb data, please reach out to [email protected]

Study 2 - Was AI Content Spam to Blame for the Manual Action?

Many media outlets were quick to jump to the conclusion that this update is aimed at squashing AI spam in Google's search results… 

[Image: Wired.com post: Google Is Finally Trying to Kill AI Clickbait]

Many SEOs on X agree…

[Image: Post on X from Brendan Oconnel agreeing with Google's manual actions to remove AI clickbait]

But we wanted to use our AI checker to do a more rigorous analysis.

We looked at 100 recent articles from each of the deindexed sites that had already been publicly shared, to measure the prevalence of AI content on sites that received a manual action.

Findings:

  • 100% of the websites that had a manual action showed signs of using AI
  • 7 of the 14 websites we analyzed had 90% or more of their sampled articles flagged as AI-generated

[Image: Percentage of websites that published AI content]

[Image: 7 out of 14 websites had 90%+ AI-generated content]

[Image: Presence of AI content on websites that had a manual action]

[Image: List of websites and the percentage of AI-generated content on each]

Methodology:

  1. Identified deindexed sites whose URLs had already been disclosed on X:
    1. fresherslive.com
    2. qmunicatemagazine.com
    3. hnbgu.net
    4. zacjohnson.com
    5. newsunzip.com
    6. Bognor.news
    7. popularbio.com
    8. popularnetworth.com
    9. bioofy.com
    10. istaunch.com
    11. healthyceleb.com
    12. GoDownSize.com
    13. networthpost.org
    14. tvguidetime.com
    15. thesocialtalks.com
    16. juliangoldie.com
    17. chipperbird.com
    18. EquityAtlas.org
    19. filmifeed.com
  2. Scraped the 100 most recent posts longer than 100 words from each site
  3. Several sites were excluded:
    1. juliangoldie.com - no longer fully deindexed - 10 pages are in the index (but publicly admits to using AI)
    2. chipperbird.com - unable to get content (but the website owner publicly admits to using AI)
    3. equityatlas.org - unable to get content
    4. filmifeed.com - unable to get content
    5. thesocialtalks.com - unable to get content
  4. Ran each article through our AI detector (model 2.0 Standard) on March 7th (see detector efficacy)
  5. For each site, calculated the average AI score and the % of articles suspected of being AI-generated
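Steps 2 and 5 can be sketched as follows. This is a hypothetical illustration, not Originality.ai's code. `Post`, `select_sample` and `site_summary` are made-up names. Each post's `ai_score` is assumed to already hold the detector result on a 0.0 to 1.0 scale. The 0.5 cut-off for counting an article as "suspected AI" is an assumed value, since the study does not state one.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Post:
    published: date
    text: str
    ai_score: float  # assumed: detector's AI score on a 0.0-1.0 scale


def select_sample(posts: list[Post], n: int = 100, min_words: int = 100) -> list[Post]:
    """Step 2: keep posts longer than min_words, newest first, then take n."""
    long_posts = [p for p in posts if len(p.text.split()) > min_words]
    long_posts.sort(key=lambda p: p.published, reverse=True)
    return long_posts[:n]


def site_summary(sample: list[Post], ai_threshold: float = 0.5) -> dict[str, float]:
    """Step 5: average AI score and % of articles at or above a hypothetical cut-off."""
    if not sample:
        raise ValueError("no articles to analyze")
    scores = [p.ai_score for p in sample]
    flagged = sum(score >= ai_threshold for score in scores)
    return {
        "average_ai_score": mean(scores),
        "pct_suspected_ai": 100 * flagged / len(scores),
    }


# Usage with made-up posts: the 50-word post is too short and is excluded.
posts = [
    Post(date(2024, 3, 1), "word " * 150, 0.9),
    Post(date(2024, 2, 1), "word " * 150, 0.2),
    Post(date(2024, 3, 2), "word " * 50, 0.99),
]
sample = select_sample(posts)
print(site_summary(sample))
```

Reporting both the average score and the flagged share is useful because a few confident AI articles can raise the average without changing the share much.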

Was AI Content to Blame for Sites Being Deindexed By Google?

The short answer is yes. After checking over 40k URLs from 200 deindexed sites with our AI checker (30,614 URLs across 175 sites after cleaning), it is clear that the vast majority of sites that received a manual action were likely using AI content.

With the March 5th update, Google deindexed almost 2% of the sites we checked on popular advertising platforms like MediaVine, Ezoic and Raptive. Some of these platforms, such as MediaVine, took a proactive No AI Content stance immediately after these manual actions (source).

[Image: Mediavine policy on AI-generated content]

At the time of the update, we analyzed the 14 publicly revealed websites and found that all of them had some amount of AI content. This got a lot of discussion going…

[Image: Discussion on Twitter following Google's manual actions on AI content]

But… we wanted to do a more thorough analysis and go WAY deeper than just the 14 sites that had already been revealed at the time.

Methodology:

  1. Identified the 200 highest-traffic sites that were deindexed
  2. Identified around 200 recent articles per site
  3. Checked for AI content on over 40k URLs using the Originality.ai AI detector’s URL API
    1. Removed errors
    2. Removed sites with fewer than 25 articles checked
  4. Analyzed the content of the remaining 30,614 URLs across 175 websites
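The two cleaning steps, removing errors and then dropping sites with fewer than 25 checked articles, might look like the sketch below. The `(domain, url, ai_score)` tuple format, the use of `None` to mark a detector error, and the name `clean_results` are hypothetical conventions chosen for illustration.

```python
from collections import defaultdict
from collections.abc import Iterable
from typing import Optional

MIN_ARTICLES_PER_SITE = 25  # sites with fewer checked articles were dropped


def clean_results(
    results: Iterable[tuple[str, str, Optional[float]]],
    min_articles: int = MIN_ARTICLES_PER_SITE,
) -> dict[str, list[float]]:
    """Group detector results by site, drop errors, then drop thin sites.

    Each result is (domain, url, ai_score). ai_score is None when the URL
    could not be checked (hypothetical convention for an error).
    """
    scores_by_site: dict[str, list[float]] = defaultdict(list)
    for domain, _url, ai_score in results:
        if ai_score is None:  # step 3.1: remove errors
            continue
        scores_by_site[domain].append(ai_score)
    # step 3.2: remove sites with fewer than min_articles checked articles
    return {d: s for d, s in scores_by_site.items() if len(s) >= min_articles}
```

Dropping errors before counting matters: a site with 30 URLs but 6 errors ends up with 24 usable articles and falls below the cut-off.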

Key Findings:

151 of the 175 sites had likely published AI content

The vast majority of the deindexed sites appear to have published some AI content. This uses a conservative AI threshold of 5% (the false positive rate is below 3% in almost all datasets tested).

[Image: Percentage of suspected AI articles on deindexed sites]

51 of the 175 Were Pure AI-Generated Content Sites (95%+ AI)

Some sites appear to have mixed AI-generated and human-written content, while others clearly published nothing but AI-generated content.

51 of the sites (~30%) published purely AI-generated content.

[Image: 51 out of 175 sites had over 95% AI-generated content]
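The site-level thresholds can be expressed as a small helper. This sketch assumes the 5% threshold applies to the share of a site's checked articles flagged as AI. That is a reading of the text, not a documented rule. The sketch uses 95% as the "pure AI" cut-off, and `summarize_sites` is a hypothetical name.

```python
from collections.abc import Iterable

LIKELY_AI_SITE_PCT = 5.0  # assumed: min % of a site's articles flagged as AI
PURE_AI_SITE_PCT = 95.0   # "pure AI" sites: 95%+ of articles flagged


def summarize_sites(pct_suspected_ai: Iterable[float]) -> dict[str, float]:
    """Count sites that likely published AI content and sites that are pure AI.

    Each input value is one site's % of checked articles flagged as AI-generated.
    Pure AI sites are also counted in likely_ai, since they clearly published
    AI content too.
    """
    pcts = list(pct_suspected_ai)
    if not pcts:
        raise ValueError("no sites to summarize")
    likely = sum(p >= LIKELY_AI_SITE_PCT for p in pcts)
    pure = sum(p >= PURE_AI_SITE_PCT for p in pcts)
    return {
        "sites": len(pcts),
        "likely_ai": likely,
        "pure_ai": pure,
        "likely_ai_share": likely / len(pcts),
        "pure_ai_share": pure / len(pcts),
    }


print(summarize_sites([0.0, 3.0, 5.0, 40.0, 96.0, 100.0]))
```

For reference, the reported counts work out to 151 / 175 ≈ 86% of sites likely publishing AI content and 51 / 175 ≈ 29% pure AI sites.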

Other Similarities with Deindexed Sites:

  • Aggressive Ads
  • Content Only Sites

These similarities may reflect sampling bias, however, since the list of sites was taken from MediaVine, Raptive and Ezoic (all ad networks popular among web publishers).

Is This Amount of AI Content Normal?

These findings would not justify blaming AI content for the deindexing if the same amount of AI content turned out to be common across the rest of Google’s SERPs.

We have an ongoing study examining the content of pages in the top 20 search results for 500 different keywords, going back 60+ months.

Conclusion

AI-generated spam overwhelming Google's search results is an existential threat to Google. This update looks like a clear attempt by Google not just to punish such spam but also to make a statement about its view on it.

Summary
Google announced a major update on March 5th, 2024, aimed at reducing unhelpful content by 40%, accompanied by manual actions that deindexed sites. The update targets scaled content abuse (mass-published content spam), expired domain abuse (repurposing expired domains), and site reputation abuse (reputable sites publishing Parasite SEO articles). Manual actions are separate from the algorithm update and can lead to complete removal from Google's search results, and Google has communicated more about this update than about previous ones. Two studies were conducted. The first identified 1,446 sites with manual actions in March 2024, with an estimated traffic loss of over 20 million visitors/month; notable affected sites include zacjohnson.com, beingselfish.in, and equityatlas.org. The second found AI-generated posts on all of the publicly revealed sites analyzed, with 50% of them having 90-100% AI content.