@alex_prompter: This paper just exposed the bi...

51 views Oct 24, 2025

This paper just exposed the biggest AI research scam 💀

MIT just proved AI can generate novel research papers.

Stanford confirmed it. OpenAI showcased examples. the papers passed peer review at major conferences. scored higher than human-written work on novelty and feasibility.

major AI labs started citing these as evidence that autonomous research agents are here. that LLMs can actually do science now.

except... they didn't prove that at all.

researchers at Indian Institute of Science ran the exact same AI systems - same prompts, same models, same pipeline. generated 50 research documents using Claude and GPT-4o.

but they changed one thing in how they evaluated them.

previous studies asked experts: "rate this on novelty and feasibility." experts looked at shuffled papers - some human, some AI - and judged them blind. no reason to suspect plagiarism. just scoring ideas.

this study asked: "find what this plagiarized from."

they told 13 domain experts to presume plagiarism exists. go hunting for it. find the source papers.

different question. nuclear results.

24% plagiarized: 12 of the 50 documents scored 4 or 5 on a 5-point plagiarism scale, verified by contacting the original paper authors.

not sloppy copy-paste that any undergrad could spot. sophisticated methodological rewording that fooled everyone... expert reviewers who literally work in these subfields, conference peer reviewers, academic integrity officers.

every automated plagiarism detector failed. Turnitin? 0% detection rate. OpenScholar with its 45 million paper database? 0%. the Semantic Scholar RAG systems these AI agents use internally to "check their own work" for plagiarism before publishing? caught 51% in the easiest possible test scenario where proposals were deliberately plagiarized from single papers.

in real-world generation where the AI is trying to be novel? way worse.

the exemplar papers everyone's been citing as proof AI can do real science?

one had perfect 1-to-1 mapping with "Generating with Confidence: Uncertainty Quantification for Black-box LLMs" published in 2023.

each component of the "novel" methodology corresponded exactly to sections in the original paper. just skillfully reworded.

"resonance graph" instead of "weighted adjacency matrix."

"semantic resonance uncertainty quantification" instead of "uncertainty quantification."

"pairwise evaluations for consistency" instead of "pairwise similarity scores."

five steps. five direct correspondences. same methodology. same scientific contribution. same insight.
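a quick way to see why surface matching struggles here: a minimal sketch that measures word overlap (Jaccard similarity) on the three term pairs quoted above. the `jaccard` helper is my own illustration, not something from the paper, and real detectors are more elaborate than this.

```python
def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two phrases have in common."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# (reworded term, original term) pairs exactly as quoted in the thread
pairs = [
    ("resonance graph", "weighted adjacency matrix"),
    ("semantic resonance uncertainty quantification", "uncertainty quantification"),
    ("pairwise evaluations for consistency", "pairwise similarity scores"),
]

scores = [jaccard(new, old) for new, old in pairs]
for (new, old), score in zip(pairs, scores):
    print(f"{new!r} vs {old!r}: {score:.2f}")
```

the first pair shares no words at all, so any check that counts shared vocabulary sees two unrelated phrases even though they name the same object.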

zero attribution. zero citations.

the original authors (Lin et al.) confirmed the plagiarism after reviewing both documents.

this paper was showcased as an exemplar of AI-generated research. it passed through expert review in the original study. nobody caught it.

another exemplar combined two papers without credit - one on diffusion model gating mechanisms, another on multi-resolution training. repackaged as "DualDiff." authors of the source papers confirmed: definitively plagiarized.

these aren't edge cases.

human-written papers from major conferences? plagiarism rate around 2-6% based on peer review comments.

AI-generated proposals? 24%.

and this assumes the experts found everything. the authors explicitly say this is likely a lower bound because finding plagiarism is incredibly labor-intensive.
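50 documents is a small sample, so it's fair to ask how much the 24% could wobble. a rough sketch, assuming the 50 documents can be treated as independent draws (my assumption, not a claim from the paper), using a standard Wilson score interval:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

flagged = round(0.24 * 50)  # 24% of 50 documents is 12
low, high = wilson_interval(flagged, 50)
print(f"{flagged}/50 flagged, interval roughly {low:.1%} to {high:.1%}")
```

under that assumption the lower end of the interval still sits well above the 2-6% quoted for human-written papers, though the two figures were measured in different ways and aren't strictly comparable.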

the really disturbing part?

the AI-generated proposals are less diverse than human work. they cluster together in embedding space. you can train a basic classifier with 93% accuracy to detect them just from titles and abstracts.
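"cluster together in embedding space" can be made concrete with mean pairwise cosine similarity: the tighter the cluster, the closer the value gets to 1. a minimal sketch with made-up 3-d vectors (NOT real embeddings, and not the paper's measurement):

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def mean_pairwise_similarity(vectors):
    """Average cosine similarity over every unordered pair of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy vectors standing in for title/abstract embeddings
spread_out = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
clustered = [(1.0, 0.1, 0.0), (1.0, 0.0, 0.1), (0.9, 0.1, 0.1)]

print(mean_pairwise_similarity(spread_out))
print(mean_pairwise_similarity(clustered))
```

a set of proposals that all point the same way is also easy to separate from everything else, which fits with a simple classifier doing well on titles and abstracts alone.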

which means these systems aren't exploring novel research directions. they're pattern-matching within a narrow band of what "sounds like research" and skillfully remixing existing papers.

we built systems that repackage existing ideas so well, we convinced ourselves - and expert reviewers - they were breakthroughs.

here's the methodology shift that broke everything:

previous studies: "here are 100 research proposals. some human, some AI. rate them on novelty and feasibility."

experts had no reason to be suspicious. they're evaluating ideas, not investigating fraud. they scored AI proposals higher than human ones.

this study: "here are 50 AI-generated proposals. presume plagiarism. find the source papers."

same caliber of experts. same AI systems. completely different question.

and suddenly... 24% plagiarism rate.

this isn't about the quality of previous research. it's about what the evaluation is designed to look for.

when you evaluate novelty, you ask: "is this idea interesting? feasible? would it work?"

when you search for plagiarism, you ask: "what paper does this remind me of? let me search those keywords. let me check the methodology mapping."

one approach assumes good faith. the other assumes adversarial behavior.

the original studies weren't wrong to assume good faith. that's how science works. you don't peer review every paper expecting deliberate fraud.

but here's the problem:

LLMs aren't deliberately deceiving anyone. they're not trying to plagiarize.

they're just doing what they're trained to do—pattern match against their training data and generate text that "sounds right."

when that training data includes thousands of research papers... and you ask it to generate a research proposal... it remixes what it's seen.

not consciously. not maliciously.

just statistically.

"resonance graph" is a plausible-sounding relabel of "weighted adjacency matrix." so the model uses it. the methodology still maps 1-to-1 to the source paper, but now it's linguistically novel.

this is adversarial plagiarism without adversarial intent.

and that's why it's so dangerous.

because the people building these systems aren't trying to plagiarize either. they genuinely believe they're generating novel research. they implement plagiarism detectors—Semantic Scholar RAG systems that query 100 papers, run pairwise comparisons, filter out matches.
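the retrieve, compare, filter loop described above looks roughly like this. a hedged sketch only: the tiny in-memory `corpus` stands in for a real paper index, `retrieve` and `flag_matches` are hypothetical names, the one-line "papers" are made-up stand-ins rather than quotes, and `difflib.SequenceMatcher` is swapped in for whatever pairwise comparison a real agent uses.

```python
from difflib import SequenceMatcher

def keyword_overlap(a: str, b: str) -> int:
    """Number of distinct words two texts share."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve(proposal: str, corpus: list[str], k: int = 100) -> list[str]:
    """Return the k corpus entries sharing the most words with the proposal."""
    ranked = sorted(corpus, key=lambda paper: keyword_overlap(proposal, paper), reverse=True)
    return ranked[:k]

def surface_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the pairwise check."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def flag_matches(proposal: str, corpus: list[str], k: int = 100, threshold: float = 0.8) -> list[str]:
    """Retrieve candidates, compare each one to the proposal, keep the close ones."""
    return [paper for paper in retrieve(proposal, corpus, k)
            if surface_similarity(proposal, paper) >= threshold]

# Made-up one-line stand-ins for paper methods
corpus = [
    "build a weighted adjacency matrix from pairwise similarity scores",
    "train a diffusion model with a gating mechanism",
]
copied = "build a weighted adjacency matrix from pairwise similarity scores"
reworded = "construct a resonance graph from pairwise evaluations for consistency"

print(flag_matches(copied, corpus))    # verbatim copy gets flagged
print(flag_matches(reworded, corpus))  # reworded version is not flagged
```

the pipeline shape is fine. the weak point is the comparison step: if it scores wording instead of methodology, a reworded proposal passes straight through.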

those systems missed nearly half the cases.

51% detection rate in the easiest possible scenario. in real-world use? the researchers estimate much worse.

the systems we built to verify novelty... aren't verifying novelty.

they're verifying linguistic dissimilarity.

and LLMs are really good at linguistic dissimilarity while maintaining semantic and methodological identity.

so what happens now?

every paper showcased as "proof AI can do research" needs to be re-evaluated under adversarial assumptions.

every AI-generated proposal needs domain experts actively searching for source papers.

every conference considering AI-generated submissions needs reviewers who presume plagiarism and go looking.

because the default assumption that novel-sounding proposals are actually novel... just got destroyed.

source: arxiv.org/abs/2502.16487

