Six years ago generative text was a novelty. Today it writes student essays, news articles, marketing copy, and social-media threads at indistinguishable-from-human quality. This is the short history of how we got here — and why detection moved from academic research to everyday practice.
Pre-GPT-3 generative text was mostly a research curiosity. Markov chains, recurrent neural networks, and the earliest transformer-based models could produce coherent sentences but fell apart at paragraph length. A short sample could fool an inattentive reader; a full document never did.
AI-detection research existed but was niche. Papers such as Zellers et al.'s Grover (2019) built detectors for GPT-2-era fake news, but practical demand was low: the volume of machine-generated text in circulation was minimal. Detection was a solution looking for a problem.
Three things changed simultaneously in 2020–2021: model scale crossed the hundred-billion-parameter threshold (GPT-3 at 175B), training corpora reached hundreds of billions of tokens, and OpenAI opened API access with a simple, human-readable prompt interface. Text generation moved from research labs to anyone with a credit card.
ChatGPT launched in November 2022 on top of GPT-3.5 and acquired 100 million users within two months, the fastest consumer-product adoption recorded to that point. Within six months, student submissions, marketing copy, and customer-service scripts had measurably shifted toward LLM-generated content.
Educators noticed first. By spring 2023, most major universities had convened emergency AI-policy meetings, and many had mandated temporary AI-free assessment formats (in-class exams, oral defences). The detection-tool market exploded: Originality.ai, GPTZero, Copyleaks AI, and a dozen others launched within 12 months of ChatGPT's release.
The pattern repeated in publishing. AI-generated articles flooded content farms and were caught by ranking algorithms; Google's helpful-content update deprioritised low-quality AI output along with other thin content; news publishers issued author-disclosure policies; academic journals began requiring AI-use disclosures in author statements.
The first AI-detection tools achieved moderate accuracy on GPT-3.5 output. Vendors published AUC numbers in the 0.85–0.95 range on standard benchmarks. Within six months, humaniser tools emerged explicitly targeting these detectors — Undetectable AI (Oct 2023), StealthWriter, Humanbeing — offering paraphrasing services priced per 1000 words.
Detection vendors responded by retraining on humanised samples. Humaniser vendors responded by training against the new detectors. The arms-race cycle tightened from months to weeks. By mid-2024, no publicly deployed detector could honestly claim stable accuracy without continuous retraining against humaniser output.
Meanwhile, generator sophistication accelerated. GPT-4 (March 2023), Claude 3 (March 2024), Gemini 1.5 (Feb 2024), Llama 2/3 (July 2023 / April 2024), Mistral releases — each generation was measurably harder to detect than the previous. Detection became a moving-baseline problem.
As of 2026-04, the detection landscape has reached a rough steady state. Production detectors — including ours — achieve AUC in the 0.95–0.99 range on in-distribution academic text, dropping to 0.85–0.92 on frontier models (GPT-5, Claude 4.5, Gemini 2.5) until retraining catches up. See our accuracy benchmark for current per-generator numbers.
The tools that survived the 2023–2024 shakeout are the ones that treated detection as a continuous-retraining problem from day one. Vendors that shipped a one-shot model and called it done have quietly faded. The market has consolidated around a handful of providers with ongoing research investment — us, a small number of specialist vendors, and the detection features embedded in major plagiarism-detection platforms.
The user landscape has also stabilised. Educators have published policies; publishers have disclosure requirements; search engines deprioritise low-quality AI; social platforms label AI-generated content. Detection is now routine, not exceptional — embedded in workflows rather than run ad-hoc.
Try our AI & Plagiarism Checker on any text. Real numbers, real per-sentence verdict, no signup.
Two trends dominate the 2026–2027 outlook. Multi-modal evidence: text-only detection will be joined by typing-dynamics analysis, edit-history verification, and authorship-consistency checks against a known writing corpus. The pure-text score becomes a voting member in a richer decision.
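To make the "voting member" idea concrete, here is a minimal sketch of score fusion across evidence channels. The channel names, weights, and threshold are illustrative assumptions, not our production logic:

```python
# Illustrative only: fuse independent evidence channels into one
# AI-likelihood verdict. Channel names and weights are hypothetical.
def combined_verdict(text_score, typing_score=None, history_score=None,
                     weights=(0.5, 0.3, 0.2), threshold=0.5):
    """Each score is a probability in [0, 1] that the text is AI-generated.
    Channels without data are skipped and the remaining weights renormalised,
    so the text-only score degrades gracefully to being the sole voter."""
    channels = [text_score, typing_score, history_score]
    pairs = [(s, w) for s, w in zip(channels, weights) if s is not None]
    total_weight = sum(w for _, w in pairs)
    combined = sum(s * w for s, w in pairs) / total_weight
    return combined, combined >= threshold

# Text detector is confident, typing dynamics unavailable,
# authorship-consistency check is mildly contrary:
score, is_ai = combined_verdict(0.9, typing_score=None, history_score=0.4)
```

The point of the renormalisation step is exactly the trend described above: workflows that capture only text still get a verdict, while richer workflows dilute any single channel's errors.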
Watermarking at generation time: OpenAI has deployed experimental text-watermarking in some GPT interfaces. If watermarking becomes standard across major providers, detection shifts from probabilistic inference to cryptographic verification. This is a fundamental architectural change and would reduce the value of statistical detection for watermarked models — while leaving open-weights models entirely in statistical territory.
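For readers unfamiliar with how verification differs from inference, here is a toy sketch of the statistical test behind one published watermarking scheme, the "green list" approach of Kirchenbauer et al. (2023). It is an assumption for illustration, not OpenAI's (unpublished) scheme: the generator biases sampling toward context-dependent "green" tokens, and the detector merely counts them and computes a z-score.

```python
import hashlib
import math

def is_green(prev_token: str, token: str, gamma: float = 0.5) -> bool:
    """Hash the (context, token) pair; a fraction gamma of the
    vocabulary counts as 'green' for that context. Toy stand-in for
    the keyed pseudorandom partition a real scheme would use."""
    digest = hashlib.sha256((prev_token + "|" + token).encode()).digest()
    return (digest[0] / 255.0) < gamma

def watermark_z(tokens, gamma: float = 0.5) -> float:
    """z-score of the observed green-token count against the binomial
    expectation for unwatermarked text. Large positive z -> watermarked."""
    n = len(tokens) - 1  # number of (context, token) pairs
    greens = sum(is_green(a, b, gamma) for a, b in zip(tokens, tokens[1:]))
    expected = gamma * n
    std = math.sqrt(n * gamma * (1 - gamma))
    return (greens - expected) / std
```

Human text scores near z = 0; watermarked text, where the sampler preferred green tokens, scores several standard deviations above it. This is why the shift matters: the verdict follows from a known key and a counting argument, not from modelling how humans write, and it says nothing about text from models that never embedded the key.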
Neither change eliminates the need for text-based statistical detection. Open-weights models will continue to generate un-watermarked text. Multi-modal evidence requires data that many workflows don't capture. Statistical text detection will remain the first-line defence for the foreseeable future — our commitment is to keep that line honest and current.
This is a historical overview intended to situate current AI-detection practice. Specific dates and product references reflect the 2026-04 state of the field. Consult the individual tool and generator vendors for authoritative timeline data.