AI Detection Bias Against Non-Native English Writers: The Research

May 29, 2026

If you write in English as a second language and an AI detector flagged your work, the score does not mean what your instructor thinks it means. Peer-reviewed research has found that widely used AI detectors misclassify writing by non-native English speakers as machine-generated at rates far too high for a score to count as reliable evidence on its own.

The research record on detector bias against non-native writers

The most cited study in this area is Liang et al. (2023), published in the Cell Press journal Patterns. The Stanford team tested seven widely used GPT detectors on 91 TOEFL essays written by non-native English speakers and 88 essays written by US eighth-grade students. The detectors classified the eighth-grade essays as human with near-perfect accuracy, but on average they misclassified about 61% of the TOEFL essays as AI-generated. Nearly 98% of the TOEFL essays were flagged by at least one detector, and about 20% were flagged by all seven.

This is the foundational study that subsequent work, including replication studies and university-led audits, has continued to test against. If you are building a defense, the Liang paper is the citation that does the heaviest lifting. Our deeper breakdown of the Liang study covers the exact tools tested and how to cite the numbers in a response letter.

Detector misclassification of TOEFL essays vs. US student essays (Liang et al., 2023)

  • US 8th-grade essays flagged as AI: near 0%
  • TOEFL essays flagged as AI (average across 7 detectors): ~61%
  • TOEFL essays flagged as AI by at least one of the 7 detectors: ~98%

What the UC Davis pattern adds

UC Davis is one of several large public universities where international and multilingual students have reported being flagged by AI detectors at rates that do not match the native-speaker population on campus. The campus paper, The California Aggie, and student government statements have documented the concern, and UC Davis's Office of Student Support and Judicial Affairs has publicly cautioned instructors that AI detector output alone is not sufficient evidence in a Code of Academic Conduct case.

We have not been able to verify a single published, peer-reviewed UC Davis study quantifying the bias on Davis student writing specifically. What is on the record is the institutional caution about detector reliability and a steady pattern of cases involving multilingual students. If anyone tells you there is a "UC Davis study" with a specific percentage, ask for the citation before relying on the number. The defensible move is to cite the underlying research (Liang and follow-ups) and pair it with the UC system's own procedural guidance.

Important
Do not paraphrase a statistic you cannot source. In a hearing, an unverifiable number is worse than no number. Cite Liang et al. (2023) by author, year, and journal, and quote the institutional language your campus has actually published.

Why detectors penalize non-native English writing

Most detectors rely on two statistical signals: perplexity (how predictable each next word is, given a language model) and burstiness (variation in sentence length and complexity). Low perplexity and low burstiness push a text toward an AI verdict. Both signals can describe AI output. Neither is unique to it.
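To make the two signals concrete, here is a minimal Python sketch. It is an illustration only, not the method of any particular detector: real detectors score each word with a neural language model conditioned on the preceding text, whereas the hypothetical unigram_perplexity function below uses smoothed word counts from a reference text, and the hypothetical burstiness function reduces "variation" to the spread of sentence lengths.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase word tokens; straight-apostrophe contractions stay whole."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def unigram_perplexity(text, reference):
    """Perplexity of `text` under an add-one-smoothed unigram model of `reference`.

    Toy stand-in (assumption): word counts replace the neural language
    model a real detector would use to score each next word.
    """
    counts = Counter(tokenize(reference))
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text has no word tokens")
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def burstiness(text):
    """Coefficient of variation of sentence lengths, measured in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(tokenize(s)) for s in sentences]
    if not lengths or mean(lengths) == 0:
        return 0.0
    return pstdev(lengths) / mean(lengths)


reference = "the the the cat"
predictable = unigram_perplexity("the the", reference)    # frequent words only
surprising = unigram_perplexity("zebra cat", reference)   # includes an unseen word
uniform = burstiness("One two three. Four five six.")     # equal-length sentences
varied = burstiness("One. Two three four five.")          # unequal sentences
```

Under this toy model, text built from the reference's most frequent words gets a lower perplexity than text containing unfamiliar words, and sentences of uniform length get a burstiness of zero. Both outcomes sit at the "predictable" end of the scale that detectors treat as machine-like, regardless of who wrote the text.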

Non-native English writers often produce text with lower lexical diversity, more conventional syntax, and fewer idiomatic constructions. This is the result of writing carefully in a second language, and it produces exactly the statistical fingerprint detectors associate with machine output. The detector is not measuring whether AI wrote the text. It is measuring how closely the text resembles the most probable English a model would generate, and a careful L2 writer often lands in the same statistical neighborhood.

Signals that overlap with L2 writing

  • Short or moderately long sentences with low syntactic variation
  • Common, high-frequency vocabulary instead of rarer synonyms
  • Few contractions, idioms, or culturally specific references
  • Clear topic sentences and predictable paragraph structure (often explicitly taught in EAP and TOEFL preparation)
  • Repetition of connective phrases such as "in addition," "however," and "therefore"
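Several of the surface features in the list above can be counted directly. The sketch below is a hypothetical illustration: the function name, the feature names, and the three-phrase connective list are assumptions drawn from the examples above, not the feature set of any real detector. It shows that these are measurable properties of a writing style, which is not the same thing as evidence of authorship.

```python
import re

# Connectives taken from the examples in the list above; extend as needed.
CONNECTIVES = ("in addition", "however", "therefore")


def surface_features(text):
    """Count a few surface signals of careful, conventional prose."""
    lowered = text.lower()
    # Words, keeping straight-apostrophe forms such as "don't" whole.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", lowered)
    if not words:
        raise ValueError("text has no word tokens")
    connective_count = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in CONNECTIVES
    )
    return {
        # Share of distinct words: lower means more repeated vocabulary.
        "type_token_ratio": len(set(words)) / len(words),
        # Crude: any word with an apostrophe, so possessives count too.
        "contraction_count": sum("'" in w for w in words),
        "connective_count": connective_count,
    }


sample = (
    "However, the plan is clear. In addition, the plan is simple. "
    "Therefore, the plan works."
)
features = surface_features(sample)
```

A human-written passage in the style taught for TOEFL preparation, like the sample here, yields repeated vocabulary, no contractions, and a connective in every sentence.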

How to use this research if you have been accused

Citing the research only works if it is tied to your specific situation. A response that says "studies show detectors are biased" reads as deflection. A response that establishes your language background, names the detector used, and cites the study that tested that detector against writers like you is much harder to dismiss. The procedural rights FAQ covers what you can request in writing before a hearing, including the detector name and the score threshold the instructor used.

A defensible written response generally includes:

  1. A factual statement of your language background: first language, years of English instruction, TOEFL or IELTS scores if relevant.
  2. The specific detector and score, requested in writing from the instructor or conduct office.
  3. A citation to Liang et al. (2023) in Patterns and any institutional guidance your university has published on detector reliability.
  4. Process evidence: drafts, version history (Google Docs or Word), notes, outlines, and any feedback from a writing center or tutor.
  5. Where possible, the same text run through a second detector to document inconsistency.

Tip
Do not edit your draft files after the accusation. Version history is evidence. Open the file once to confirm it exists, then export the revision history as a PDF and stop editing the original.

If this is you at UC Davis or another UC campus

At UC Davis, academic misconduct allegations are routed through the Office of Student Support and Judicial Affairs (OSSJA). The standard of evidence is "preponderance of the evidence," meaning the instructor must show it is more likely than not that you violated the Code of Academic Conduct. A detector score alone does not meet that standard, particularly when the writer belongs to the population that the Liang study found detectors misclassify most often.

If you are preparing a written response, NotBot generates a personalized defense package that addresses the specific detector, cites the research relevant to your language background, and follows the procedural shape of UC conduct cases. If a finding has already been issued, the appeal package covers the procedural grounds that matter once the case is past the first hearing.

If your enrollment status depends on your visa, or if the proposed sanction includes suspension or dismissal, consult an education law attorney before your hearing. The research will help your case. It does not replace counsel when the stakes are that high.
