If English is not your first language and an AI detector flagged your paper, the research is on your side. Published studies have documented that widely used AI detection tools misclassify writing from non-native English speakers as AI-generated at substantially higher rates than writing from native speakers, and independent testing has raised reliability concerns about commercial detectors such as Turnitin. Here is what the published evidence actually shows, and how to use it.
The foundational Stanford finding
The most widely cited study on this question comes from a team led by James Zou at Stanford University. The paper, GPT detectors are biased against non-native English writers (Liang et al., 2023), was published in the Cell Press journal Patterns. The researchers ran TOEFL essays written by non-native English speakers, alongside essays written by native English-speaking eighth-graders in the United States, through seven widely used GPT detectors.
The detectors flagged native English writing as AI-generated at low rates. They flagged the non-native TOEFL essays at dramatically higher rates: averaged across the seven detectors, more than half of the TOEFL essays (about 61 percent) were misclassified as AI-generated. About one in five (18 of 91) was flagged by all seven detectors, and nearly all of them (89 of 91, about 98 percent) were flagged by at least one detector.
The researchers also showed that prompting GPT to rewrite the non-native essays with more sophisticated vocabulary dropped the false-positive rate sharply. In other words, the detectors were not identifying AI authorship. They were identifying low linguistic complexity, and penalizing it.
What is known about Turnitin specifically
Turnitin has publicly stated its AI detector has a false-positive rate of less than one percent at the document level, based on internal testing. Independent researchers have reported higher rates in real-world conditions, and the company itself has acknowledged that shorter documents and writing from non-native English speakers are more likely to be misclassified. Turnitin's public guidance instructs instructors not to use AI detection scores as the sole basis for an academic integrity finding.
The Weber-Wulff et al. (2023) study published in the International Journal for Educational Integrity tested fourteen AI detection tools, including Turnitin, and concluded that none performed reliably enough to serve as standalone evidence in academic integrity proceedings. The paper recommended that institutions not rely on detector output as conclusive proof of AI use.
Why non-native writing triggers false positives
AI detectors rely on statistical features of text, primarily perplexity (how predictable each word is given the preceding context) and burstiness (variation in sentence length and complexity). Text with low perplexity and low burstiness looks, by these metrics, like text generated by a large language model.
Non-native English writers often produce text with exactly these characteristics, for reasons that have nothing to do with AI:
- Use of common, high-frequency vocabulary instead of rarer synonyms
- Simpler, more uniform sentence structures that prioritize clarity over stylistic variation
- Avoidance of idiom, slang, and culturally specific references
- Heavier reliance on standard discourse markers and transitions taught in ESL instruction
These are features of careful, learned English. The same patterns appear in writing by some neurodivergent students, students writing in a second academic register, and students who have been explicitly trained in formal academic style.
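To make the two metrics described above concrete, here is a minimal Python sketch. It is a toy illustration, not how Turnitin or any commercial detector works: real detectors score perplexity with a large language model, while this sketch assumes a simple add-one-smoothed unigram model built from a tiny made-up reference text. The function names (`unigram_perplexity`, `burstiness`) and all sample sentences are hypothetical, chosen only for illustration.

```python
import math
import re
import statistics
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_perplexity(text, reference_counts, vocab_size):
    """Perplexity of `text` under an add-one-smoothed unigram model.

    Assumption: a unigram model stands in for the large language
    model a real detector would use. Lower values = more predictable.
    """
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")
    total = sum(reference_counts.values())
    log_prob = 0.0
    for tok in tokens:
        # Add-one smoothing gives unseen words a small nonzero probability.
        p = (reference_counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


def burstiness(text):
    """Population standard deviation of sentence lengths, in words.

    Assumption: sentence-length variation is used here as a simple
    proxy for burstiness.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(tokenize(s)) for s in sentences]
    if not lengths:
        raise ValueError("text contains no sentences")
    return statistics.pstdev(lengths)


# Tiny made-up reference text standing in for "typical" English.
reference = (
    "the cat sat on the mat. the dog sat on the rug. "
    "the cat and the dog are friends."
)
counts = Counter(tokenize(reference))
vocab_size = len(counts) + 1  # one extra slot for unseen words

plain = "the cat sat on the mat"             # common, high-frequency words
rare = "a feline reposed upon the tapestry"  # rarer synonyms

uniform = "I like tea. I like rice. I like fish."
varied = (
    "I like tea. When the weather turns cold in the evening, "
    "I usually drink it with honey. Delicious."
)

print("perplexity (plain):", unigram_perplexity(plain, counts, vocab_size))
print("perplexity (rare): ", unigram_perplexity(rare, counts, vocab_size))
print("burstiness (uniform):", burstiness(uniform))
print("burstiness (varied): ", burstiness(varied))
```

In this toy setup, the sentence built from common words scores lower perplexity than the one built from rarer synonyms, and the paragraph of uniform sentences has zero burstiness. Both results point in the direction a detector reads as machine-like, even though every sample here was written by a person.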
How to use this research in your response
If you are a non-native English speaker accused based on a detector score, the Liang et al. study is directly relevant and should be cited by name in your written response. A useful response includes:
- A clear statement of your language background, including when you began learning English and your first language
- A specific citation of Liang et al. (2023) in Patterns, with the finding that the seven GPT detectors tested misclassified, on average, more than half of the non-native TOEFL essays as AI-generated
- A citation of Weber-Wulff et al. (2023) in the International Journal for Educational Integrity, concluding that no tested detector was reliable enough for institutional decisions
- A reference to Turnitin's own published guidance that AI scores should not be the sole basis of a finding
- Process evidence: drafts, version history, browser history, notes, and any other artifacts that document how the paper was written
Our FAQ on procedural rights explains what you can request from your institution before a hearing, including the specific detector used and the score threshold applied. For a deeper look at detector limitations across the board, see our overview of what AI detector scores actually prove.
If you have already received an adverse decision, the appeal package is structured around the procedural and evidentiary grounds these cases turn on. If you are drafting your written response now, NotBot generates a personalized defense package that cites this research, addresses your specific detector, and incorporates your writing process and language background, ready in about a minute. If your case carries serious consequences such as expulsion or visa impact, this research can strengthen your defense, but consulting an education law attorney before your hearing is advisable.