How AI Text Detection Actually Works
I'll be honest — when I first started looking into AI detection, I assumed it was basically magic. You paste text, the algorithm does something opaque, a number comes out. I didn't question it too hard because the numbers seemed roughly right. Then I actually dug into how these tools work, and I came away with a much more complicated opinion.
The short version: AI detection is real, it works better than chance, and anyone who tells you it's either foolproof or useless is oversimplifying. The longer version is more interesting.
The two approaches — and why most tools use the wrong one
There are two ways to detect AI text. The first is statistical: measure properties of the writing — how predictable the word choices are, how uniform the sentence lengths are, how much the vocabulary varies. These signals come from how language models work at a mechanical level. LLMs generate text by picking statistically likely continuations. That leaves fingerprints.
The second approach is behavioral: look for the presence or absence of things that human writers do naturally. Do opinions shift mid-argument? Are there specific details that couldn't have been invented? Does the writer seem to be figuring something out, or just delivering a pre-formed answer?
Most commercial detectors lean almost entirely on the statistical approach because it's easier to quantify and faster to compute. That's a mistake, in my view. Statistical signals are real, but they're also the easiest to game — either accidentally, by humans who write in structured styles, or deliberately, by prompting models to be less predictable. Behavioral signals are harder to fake. They're also harder to measure, which is why most tools don't bother.
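To see why the statistical route is so tempting, consider how little code a crude version of two of those signals takes. This is my own illustrative sketch, not any particular detector's implementation; the function name and the choice of estimators are assumptions.

```python
import re
import statistics

def statistical_signals(text: str) -> dict:
    """Toy versions of two statistical signals: sentence-length
    uniformity and vocabulary variety (type-token ratio).
    Real detectors use far more robust estimators; this is a sketch."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return {
        # A low standard deviation means uniform sentence lengths,
        # which is the kind of regularity detectors flag.
        "length_stdev": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        # A low ratio of unique words to total words means
        # repetitive vocabulary.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
    }
```

Both numbers fall out of a single pass over the text, which is exactly why tools gravitate here: quantifiable, fast, and easy to threshold. The problem described above, that structured human writing can trip these same signals, falls out just as quickly.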
What perplexity actually measures
Perplexity is the signal everyone in this space talks about. The concept is straightforward: given what came before in a sentence, how surprising is the next word? Language models assign probabilities to every possible next token. Low perplexity means the text was predictable — each word followed naturally from the last. High perplexity means the text was full of unexpected choices.
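The arithmetic behind that is compact: perplexity is the exponential of the average negative log-probability the model assigned to each token that was actually written. A minimal sketch, with hand-picked probabilities standing in for what a real language model would assign:

```python
import math

def perplexity(token_probs: list[float]) -> float:
    """Perplexity over a sequence, given the probability the model
    assigned to each token actually chosen:
        perplexity = exp(-(1/N) * sum(log p_i))
    The probabilities here are invented for illustration; in practice
    they come from running a language model over the text."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# Predictable text: the model expected nearly every word. Low perplexity.
print(perplexity([0.9, 0.8, 0.95, 0.85]))

# Surprising text: several words the model found unlikely. High perplexity.
print(perplexity([0.2, 0.05, 0.5, 0.1]))
```

One property worth noticing in the formula: a single very improbable token drags the whole score up, which is why one odd human word choice per sentence is enough to separate the distributions.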
Human writers have higher perplexity. Not because we're trying to be unpredictable, but because we're not optimizing for statistical safety. We use the weird word that fits better. We start a sentence in an unusual place. We reference something specific that shifts the expected trajectory of the paragraph.
The catch — and this took me a while to really internalize — is that perplexity is measured relative to a specific model. A sentence that's low-perplexity to GPT-4 might be high-perplexity to a smaller model. This means detector accuracy depends heavily on which model it's calibrated against, and as models improve, older detectors become less reliable. It's a moving target.
The behavioral signals that actually matter
Here's what I find genuinely interesting about behavioral detection: the signals aren't arbitrary. They reflect something real about how human cognition shows up in writing.
Take opinion drift. When a human writer works through an argument, their thinking often changes in motion. They start a paragraph committed to one position and end it somewhere slightly different, because the act of writing clarified something. AI doesn't do this. It commits to a conclusion before the first word and executes toward it. The result is writing that's technically logical but lacks the texture of actual thought.
Or take specificity. Human writers reach for concrete details — a particular number, a specific place, a named person — in ways that feel autobiographical rather than illustrative. AI reaches for illustrative details, which are subtly different. They serve the point without anchoring it to reality.
These are the signals Telltale Proof weighs most heavily, and honestly, they're the ones I trust most. They're harder to fake, harder to prompt away, and more directly connected to whether an actual mind was engaged in the writing.
Why no detector is 100% — and why that's okay
I want to push back on something I see a lot in coverage of AI detection: the framing that detectors are only useful if they're perfect. That's not how we evaluate any other diagnostic tool.
A doctor reading an X-ray isn't right 100% of the time. A fraud detection algorithm isn't right 100% of the time. The question is whether the signal is better than chance, whether the errors are systematic in ways you can account for, and whether the tool is honest about its limitations. Good AI detectors can absolutely meet that bar — if they explain their reasoning instead of hiding behind a single score.
That's the philosophy behind Telltale Proof. Not "here's your verdict," but "here are 32 signals, here's what each one found, make your own judgment." I think that's a more honest way to do this — and ultimately more useful than false confidence in either direction.