Former Cloudflare exec launches archive of pre-AI human content in time capsule-style move

Former Cloudflare executive John Graham-Cumming has launched lowbackgroundsteel.ai, a catalog of human-generated content created before widespread AI contamination took hold in 2022. The archive takes its name from “low-background steel,” metal that scientists once salvaged from pre-nuclear-era shipwrecks to avoid radiation contamination, and draws a parallel between nuclear fallout and AI-generated content polluting the internet.

The big picture: The project treats pre-AI content as a precious commodity, recognizing that distinguishing between human and machine-generated material has become increasingly difficult since ChatGPT’s November 2022 launch.

Why this matters: AI contamination has already forced at least one major research project to shut down entirely—wordfreq, a Python library that tracked word frequency across 40+ languages, announced in September 2024 it would stop updating because “the Web at large is full of slop generated by large language models, written by no one to communicate nothing.”

What’s included: The archive points to several major repositories of verified pre-AI content that researchers and developers can trust.

  • A Wikipedia dump from August 2022, captured before ChatGPT’s release.
  • Project Gutenberg’s collection of public domain books.
  • The Library of Congress photo archive.
  • GitHub’s Arctic Code Vault, a snapshot of open source code archived in February 2020 in a decommissioned coal mine on the Norwegian Arctic archipelago of Svalbard.
  • The now-frozen wordfreq project, preserved from before AI contamination made its methodology untenable.

Model collapse concerns: Some researchers worry about AI models training on their own outputs, potentially degrading quality over time, though recent evidence suggests this fear may be overblown under certain conditions.

  • Research by Gerstgrasser et al. (2024) indicates model collapse can be avoided when synthetic data accumulates alongside real data rather than replacing it entirely.
  • Properly curated synthetic data can actually assist with training newer, more capable models when combined with real data.
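The accumulate-versus-replace distinction in that research can be illustrated with a toy simulation (a hedged sketch for intuition, not the paper’s actual method): here a “model” is just a fitted mean and standard deviation, retrained each generation either on its own latest outputs alone or on a growing pool that still contains the original real data.

```python
import random
import statistics

def fit(data):
    # "Train" a toy model: estimate the mean and standard deviation.
    return statistics.fmean(data), statistics.stdev(data)

def sample(model, n, rng):
    # Generate n synthetic points from the fitted model.
    mu, sigma = model
    return [rng.gauss(mu, sigma) for _ in range(n)]

def simulate(generations=200, n=10, accumulate=False, seed=0):
    rng = random.Random(seed)
    real = [rng.gauss(0.0, 1.0) for _ in range(n)]  # the "human" data
    pool = list(real)
    model = fit(pool)
    for _ in range(generations):
        synthetic = sample(model, n, rng)
        if accumulate:
            pool.extend(synthetic)   # synthetic data joins real data
            model = fit(pool)
        else:
            model = fit(synthetic)   # train only on the latest outputs
    return model

_, sigma_replace = simulate(accumulate=False)
_, sigma_accum = simulate(accumulate=True)
```

In the accumulate regime the fitted spread stays anchored near the real data’s, while in the replace regime it shrinks toward zero over generations, which is the degradation the model-collapse literature describes.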

The backstory: Graham-Cumming created the website in March 2023 but only recently announced it publicly, having kept it as a quiet clearinghouse for uncontaminated online resources.

  • He’s known for creating the POPFile spam-filtering software and for his successful 2009 petition asking the UK government to apologize for its persecution of codebreaker Alan Turing.
  • The site accepts new submissions through its Tumblr page.

Looking ahead: Graham-Cumming emphasizes the project documents human creativity rather than opposing AI itself, similar to how low-background steel eventually became unnecessary as atmospheric nuclear testing ended and radiation levels normalized.

