The problem with AI detection tools: Are they accurate? (with tests)
Unreliable AI content detectors can have negative impacts on marketing teams
Earlier this year, I had a conversation with content professional Meenakshi Nautiyal via the Women in Tech SEO slack community.
She was struggling with a client’s reliance on AI content detectors, and it was holding back the progress she could make for them.
Long story short, the organization had created too many AI-generated generic content pieces, and their site had paid the price.
Meenakshi was tasked with helping the site recover by reworking them, optimizing them for higher quality, and better highlighting her client’s products and services in their content.
But the client required the use of AI detection tools once an article had been completed.
The problem?
She and her writing partner were rewriting the older articles, but the tool they were using showed their drafts as 98% likely to be AI content—even though they weren’t.
Meenakshi explained to me, “It's tricky situation of wanting to change the AI content, but then stakeholders see [the AI detection tool results] and say it still shows its AI... then why are we even reworking them?”
Meenakshi’s concerns represent just one story, but you’d be surprised how often I’ve seen discussions about this issue in SEO communities.
As someone who’s spent years now in the content marketing and SEO trenches, I’ve seen trends come and go—some helpful, others more hype than substance.
There’s no doubt that artificial intelligence and machine learning have changed the way we approach marketing and content generation, from brainstorming to drafting.
But when it comes to accurate detection of human vs. AI-generated text or measuring the quality of your content, these AI writing detection tools don’t quite hit the mark.
If they’re going to go the distance and enter into the “future of AI” with us all, they’ve got some growing to do.
Below, I’ll break down:
Balancing stakeholder requests for AI content detection software
Whether Google is actually penalizing AI content—or not
5 real-world examples of reliability issues (I ran these tests 15 to 20 times so you don’t have to)
What your team can do to mitigate concerns and strengthen content quality
Why Clearscope doesn’t use AI detectors and why we don't offer one
Alternative solutions to traditional AI checkers that help produce quality content
But you need to know...
Are AI detection tools accurate? Does it even matter if you use them?
Unfortunately, AI writing detector tools regularly produce false negatives (AI text that reads as human) and false positives (human text flagged as AI).
Because they err so often when distinguishing human-written text from AI-generated content, they can cause huge headaches for SEO and content practitioners who are encouraged or required to use them.
And that's no good.
TL;DR? AI checker tool test results
My tests show that if your team runs these tools on purely AI-generated copy produced with no editing, voice guidance, or style direction at all, the tools will flag it as AI-generated content.
I also walk you through the full test below, but if you're in a rush, here's a quick video synopsis.
Here’s why AI writing detector tools are flawed
AI writing detectors aren’t reliable—much to the frustration of the many content marketers and SEO strategists whose leadership or clients require them.
Remember, AI language models like Claude, Perplexity, Gemini, and OpenAI’s GPT models (GPT-3, GPT-4, and beyond) have advanced as far as they have because they’re trained on us.
The humans! (We did this to ourselves, y’all!)
Even the best AI detectors rely on machine learning models to classify what’s human-generated and what isn’t.
And let’s be honest—AI chatbots and AI writer tools get facts wrong or give bad advice frequently.
So why wouldn’t their ability to detect themselves be just as flawed?
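To make that concrete, here is a minimal sketch (in Python, using GPT-2 via the Hugging Face transformers library) of the general idea many detectors lean on: score text with a language model and treat unusually predictable text as a sign of machine generation. The threshold is one I made up for illustration; no commercial detector publishes its exact method, and this is not how any particular tool is implemented.

```python
# Toy illustration of the general idea behind many AI-text detectors:
# score a passage with a language model and treat unusually predictable
# (low-perplexity) text as more likely to be machine-generated.
# The threshold below is made up for illustration, not from any real product.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under GPT-2."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    return torch.exp(out.loss).item()

def naive_verdict(text: str, threshold: float = 30.0) -> str:
    # Lower perplexity = more predictable = "more AI-like" in this toy model.
    ppl = perplexity(text)
    label = "looks AI-ish" if ppl < threshold else "looks human-ish"
    return f"perplexity={ppl:.1f} -> {label}"

print(naive_verdict("The quick brown fox jumps over the lazy dog."))
```

Even in this toy version you can see the failure mode: a careful human writer producing clean, predictable prose scores "AI-ish," while a chatbot prompted to write quirky prose scores "human-ish."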
Also...
Google doesn’t necessarily penalize AI-generated content
Google takes the stance of “rewarding high-quality content, no matter how it’s produced.” (Source)
Instead of flagging what is AI and what isn’t, Google prefers to showcase content to its users that follows its E-E-A-T guidelines: experience, expertise, authoritativeness, and trustworthiness.
However, using automation, like AI, to create SEO content solely for the purpose of manipulating search rankings is against Google’s spam policies, and you could be penalized. Google gives more guidance here.
So even though Google doesn’t explicitly “punish” AI-assisted or AI-generated content at this point (unless it seems crafted to manipulate search rankings), it does reward original, high-quality content.
Hold up...
What if your stakeholders shy away from AI content and require the use of AI detectors, even for original, human-written content?
If Google isn’t penalizing AI-assisted content that meets EEAT guidelines, what is a marketer to do in this situation? (More on that later in this article.)
Do AI detectors work? 5 tools tested to answer the question
Below, I walk through a few tests I ran using content from this very article to determine whether AI detection tools are accurate.
The 5 tools I test below are often recommended to clients who reach out to us for AI detection support.
(Although we always add the disclaimer that we feel AI writing detectors are unreliable, these are some of the most trusted ones our team has found.)
I used free accounts for each of these tests below, and I have not received any payment to include them.
While I didn’t test Turnitin, GPTZero, or Copyleaks, I’ve also heard that some folks like to use those for their checks.
If you’ve tested a tool extensively and found it has reliable results, connect with me on LinkedIn and let me know (send screenshots!).
Quick note: This guide has no affiliate links to AI content writing tools or generative AI content detector tools, and I’m not selling any courses or templates. It’s purely educational content to support you and your team.
If you’re not an SEO or content marketing professional, you might still find value here, especially if you’re checking student work for academic integrity. For more on that, consider this MIT article on the topic for those in school settings or higher education.
Test #1: Giant Language Model Test Room (GLTR)
GLTR is one of Clearscope founder Bernard Huang’s favorites.
It “detects” AI content, but not in the way that many content marketers might expect, since it doesn't give you a percentage reading or “pass/fail” grade.
The free GLTR demo allows you to visually analyze text to check if it was generated by a language model or written by a human.
It’s a joint project by Hendrik Strobelt, Sebastian Gehrmann, and Alexander Rush from the MIT-IBM Watson AI Lab and Harvard NLP. (Thanks y’all!)
Here’s how it works: You drop in your copy, and the presence of more red and purple occurrences equals more human-like intervention.
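Under the hood, that coloring comes from asking a language model how predictable each word was given the words before it. Here is a rough, simplified sketch of the idea in Python, using GPT-2 and the transformers library; the rank buckets mirror GLTR’s green/yellow/red/purple scheme, but the details are my own approximation, not the lab’s actual code.

```python
# Rough approximation of GLTR's approach: for each token, ask GPT-2 how
# highly that token ranked among the model's predictions at that position.
# Mostly high-rank (green/yellow) tokens suggest machine-like text;
# red/purple tokens suggest more surprising, human-like word choices.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def gltr_buckets(text: str):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    results = []
    # Predictions at position i describe the token at position i + 1.
    for i in range(ids.shape[1] - 1):
        next_id = ids[0, i + 1].item()
        rank = (logits[0, i].argsort(descending=True) == next_id).nonzero().item() + 1
        if rank <= 10:
            bucket = "green"    # one of the model's top-10 guesses
        elif rank <= 100:
            bucket = "yellow"
        elif rank <= 1000:
            bucket = "red"
        else:
            bucket = "purple"   # very surprising, "human-like"
        results.append((tokenizer.decode([next_id]), rank, bucket))
    return results

for token, rank, bucket in gltr_buckets("Search engines reward helpful, original content."):
    print(f"{token!r:>15}  rank={rank:<5}  {bucket}")
```

Mostly green and yellow tokens read as "the model would have picked this word too," which is exactly why clean, well-edited human prose can look machine-written in this view.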
But here’s the problem: I dropped in my human-crafted introduction to this article... and look at these results. There are very few red and purple indicators of human-like writing. And when I dropped in my AI-generated conclusion to this article? I got about the same results.
I promise you, I’m a real human and not a natural language processing (NLP) content generator — and I actually wrote that introduction myself.
(Although, semantically and technically, all human writers are living, breathing NLPs. But that’s beside the point here.)
Here’s a piece of purely AI-generated copy dropped into the same tool—this is the AI-generated conclusion for the first draft of this article.
If I were required to give these results as proof to stakeholders that I wrote or contributed to this content, how would this serve them?
It wouldn’t.
While this tool is pretty neat and can give us some insights, it’s not a reliable measurement of AI writing vs. human writing.
Test #2: Originality.AI
I personally like Originality.AI because it also has a plagiarism checker.
If you’re working with junior writers, it’s a good way to ensure any research incorporated into a draft is properly sourced with citations.
Here’s how it scores your content: Green indicates original, human-written text, while red indicates AI-written text.
It boasts being the “most accurate AI detector,” but it also had a false reading when I did my test.
Here, I dropped in content from my human-written introduction, and of course, I was pretty pleased with the score.
However, when I added my completely AI-generated conclusion from this article’s first draft, it also read as unlikely to be AI.
I also had to add more text before I could run the test, so I added my bullet point list that was AI-generated from the top of this article.
Here’s what Originality.AI has to say about AI detectors working.
I used ChatGPT to generate the conclusion content I tested.
But to be fair to Originality.AI, I have sharply trained my ChatGPT. (More on that below in my final test.)
Test #3: QuillBot
QuillBot is another tool that lets you test the product via a free account.
What’s nice about this tool’s approach is that you’re given a percentage reading.
Here are my results when I input my human-generated introduction—so far so good.
And here are my results when I input my AI-written conclusion.
Again, I had to add more text before I could run the test, so I included the same AI-generated bullet point list.
It reads as 0% AI, although this copy is all 100% AI-generated:
Test #4: Merlin
Merlin has an AI checker that’s also free to try. And I got the exact same results.
My human-written intro is read as 100% human.
And so is my AI-written conclusion draft.
Test #5: ZeroGPT
ZeroGPT is my final test, and again, it’s free for you to try out for yourself.
Same results here.
My human-written intro is read as 100% human.
And so is my AI-written conclusion draft.
Conclusion: I gave up, then went wild
Okay, every bit of copy I fed these tools was read as human, even if some of it was purely AI generated.
I then generated some fresh, quick copy about Google’s EEAT guidelines from one of my trained GPTs.
I ran it through each of these tools, and they indicated that it was about 60% AI generated.
I was getting really frustrated by this point. So I wanted to try something different.
For a final test, I created an educational piece of text about Google’s Knowledge Graph via ChatGPT—without using a custom GPT or any training or prompts on sentence structure, writing style, sentence length, or word choices.
I left all the text generation as-is, and I didn’t change anything.
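If you want to reproduce that step yourself, here is roughly what it looks like with the openai Python client: a single bare prompt, no system message, no brand voice, no style constraints. The model name is just an example placeholder; use whichever model you have access to.

```python
# Sketch of the "no guidance at all" generation step: a bare prompt with
# no system message and no style or voice instructions. The model name
# below is an example placeholder, not a recommendation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {"role": "user", "content": "Write a short educational piece about Google's Knowledge Graph."}
    ],
)

raw_text = response.choices[0].message.content
print(raw_text)  # paste this, unedited, into each detector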
Next, I threw that in all the tools.
And the technology agreed: It was, in fact, completely from an AI text generator.
Finally. They were right.
However, I wasn’t done yet.
I became hyper-curious: What if I, the human, wrote a piece of content in the exact style of ChatGPT, without the use of any AI content?
What would the checkers say then?
I really put my “mastered the art of brand voice” skills to work on this one.
So I wrote up about 100 words on the topic of SEO without doing any brainstorming or research with AI.
Then, I popped my copy into ChatGPT to fix any grammatical issues that would give my humanity away.
The results were mixed. (Video below.)
GLTR: This tool decided the majority was AI-generated with some human writing involved. (Remember, the presence of more red and purple occurrences equals more human-like intervention.)
Originality.AI: Despite boasting that it’s the most accurate detector, this one rated my ChatGPT-style, human-written content as AI-generated.
QuillBot: QuillBot rated this text at 63% AI-generated. While it was completely human-written, I did use ChatGPT’s style and ran it through ChatGPT for grammar and sentence structure fixes before testing. So this still seems like a fair rating.
Merlin: Merlin believes 0% of my content written in the style of ChatGPT is likely to be AI-generated.
ZeroGPT: Same with ZeroGPT here. The tool scored my content as 0% AI generated.
In conclusion, if you’re using these tools on your team and your writers are submitting pure AI copy without any editing or style direction at all, the tools will likely flag it as AI-generated content.
Here’s what our team thinks about AI detection tools
At Clearscope, we’re laser-focused on guiding you in creating high-quality, user-first content that ranks.
We don’t use AI detection software because we believe it’s an unnecessary step in the content production process.
But you’d be surprised at the number of requests we get to add an AI detector as a new feature in our software...
We’re not going to do it, though. Here’s why:
Because of the high occurrence of false positives and false negatives, these tools are unreliable. Plain and simple.
Plus, knowing whether or not your team used AI in the content creation process doesn’t add value or help you create better SEO content.
Want to create better SEO content? Do the following:
Abide by Google’s EEAT guidelines
Incorporate information gain into your content, so it can stand the test of algorithm changes
Write with unique perspectives on your topic (i.e., Ranch-style SEO)
Sharply match your target audience’s search intent, including content formats
Create entity-rich, people-first content that gets noticed by search engines
Hire experienced editors and writers who are trained in fact-checking to avoid misinformation
Better alternatives to AI writing detection tools
Most of the questions we get about AI writing detection software come from clients who are required to use it by their stakeholders, managers, or clients.
It’s our responsibility as marketers to educate our clients and teams about why AI checkers don’t work—and why AI-assisted content isn’t something to be afraid of.
Here are some great alternatives to making your team jump through the hoops of unreliable AI checkers:
Using plagiarism checkers
Outlining strict AI use guidelines in your SEO content briefs—and sticking to them
Training your AI writing tools in the style and semantics of your brand voice (see the sketch after this list)
Monitoring your content’s success by writer, evaluating performance based on effectiveness rather than just AI detection or output quantity (Clearscope can help you do this easily with Custom Content Views)
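For the brand-voice point above, here is one simple way that training can look with the openai Python client: a system prompt carrying your style rules plus a short sample of your own writing to imitate. Everything in it (the rules, the sample, the model name) is a placeholder to adapt to your own style guide; custom GPTs and fine-tuning are other routes to the same goal.

```python
# One simple way to steer an AI writing tool toward your brand voice:
# a system prompt with explicit style rules plus a short sample of your
# own writing to imitate. Everything here is a placeholder sketch,
# not the only (or necessarily best) way to do it.
from openai import OpenAI

client = OpenAI()

BRAND_VOICE = """You are a writer for our team.
Style rules:
- Conversational, first person, short sentences.
- Concrete examples over abstract claims.
- No filler phrases like "in today's fast-paced world"."""

SAMPLE_OF_OUR_WRITING = "Here's the thing: nobody reads a 2,000-word intro. Get to the point."

draft = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {"role": "system", "content": BRAND_VOICE},
        {"role": "user", "content": f"Match the voice of this sample:\n{SAMPLE_OF_OUR_WRITING}\n\n"
                                     "Now draft a 100-word intro about AI content detectors."},
    ],
)
print(draft.choices[0].message.content)
```

The point isn’t this specific prompt; it’s that voice and style direction live somewhere explicit and repeatable instead of being reinvented with every draft.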
So, do AI content detectors matter for SEO?
You already know what I think if you’ve made it this far.
But in my experience—and in the experience of our broader Clearscope team—AI content detection tools don’t matter much when it comes to creating successful, impactful SEO results.
It’s not whether a piece of content is flagged as AI-written that should concern you.
It’s whether that content follows Google’s EEAT guidelines (Experience, Expertise, Authoritativeness, Trustworthiness) and provides real information gain for your audience.
That’s what drives results.
Need some convincing? Run some tests for yourself.
If you’re a Clearscope user, you can create custom tags and content views in your Content Inventory to monitor tests easily.
You can monitor your pages by writer, topic, stage of the funnel, or whether you used AI or not.
That way, you can easily track your results and make adjustments ASAP to increase your organic visibility.