About This Archive

In 2025, the U.S. Department of Justice released over 370 gigabytes of documents related to its investigation of Jeffrey Epstein. Released as 12 data sets, the collection contains approximately 1.4 million files spanning 3.5 million pages: FBI interview summaries, police reports, emails, financial records, flight manifests, seized photographs and videos, and more.

This archive exists to make these public records genuinely accessible. Raw document dumps are functionally opaque to most people. We've indexed every file, applied OCR to scanned documents, and built full-text search across the entire collection. Every document is browsable, searchable, and downloadable.

Beyond the original DOJ release, our web crawler continuously discovers and indexes new Epstein-related documents from court dockets, government transparency portals, the Internet Archive, and news publications. New content is scored for relevance, deduplicated, and added to the searchable archive automatically.

How It Works

Full-Text Search

Every document's text is extracted (with OCR for scanned pages) and indexed with Meilisearch. Search across 3.5 million pages with typo tolerance, faceted filtering by data set and file type, and sub-200 ms results.

Continuous Crawling

Our web crawler actively discovers new Epstein-related documents from court dockets, government sites, news outlets, and public archives. New content is automatically scored for relevance, deduplicated, and indexed.
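The page does not say how deduplication works; one plausible baseline is hashing each crawled file's bytes and skipping exact duplicates, sketched here:

```python
import hashlib

# In-memory set of digests; a real crawler would persist this (e.g. in SQLite).
seen: set[str] = set()

def is_new(data: bytes) -> bool:
    """Return True the first time a byte-identical document is seen.

    SHA-256 of the raw bytes is an assumed dedup key; it catches exact
    mirrors but not re-scans or re-encodes of the same page.
    """
    key = hashlib.sha256(data).hexdigest()
    if key in seen:
        return False
    seen.add(key)
    return True
```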

Censorship Resistant

The archive is distributed via the Spill P2P network using Hyperswarm. If this server goes offline, other peer nodes retain full copies of the data.

Document Viewer

PDFs render inline with PDF.js. Images, videos, and audio files play natively. Extracted text is available for every document for accessibility and copy-paste.

Data Sources

All documents in this archive are public records released by the U.S. Department of Justice. The raw data sets are available from:

  • U.S. Department of Justice official release
  • Internet Archive community mirrors
  • BitTorrent community distribution
  • Court docket filings (PACER / public access)
  • Government transparency portals
  • News publications and investigative journalism

Processing Pipeline

1. Download

All 12 data sets downloaded via BitTorrent and verified against published checksums.
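Verification against a published checksum amounts to streaming the file through a hash and comparing digests. A sketch assuming the published checksums are SHA-256 (the actual algorithm is not stated here):

```python
import hashlib

def verify_checksum(path: str, expected_sha256: str, chunk_size: int = 1 << 20) -> bool:
    """Stream a downloaded file and compare its SHA-256 to the published value.

    Reads in 1 MiB chunks so multi-gigabyte data sets never load fully into memory.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256.lower()
```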

2. Catalog

Every file cataloged by type, size, and data set membership. File types detected by extension and magic bytes.

3. Text Extraction

Text-layer PDFs processed with PyMuPDF. Scanned documents OCR'd with Tesseract. Emails and spreadsheets parsed for content.
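PyMuPDF and Tesseract need native dependencies, but the email-parsing step can be sketched with the standard library alone. A minimal version that pulls the plain-text body out of a raw message for indexing (real pipeline details are assumptions):

```python
from email import message_from_string

def email_text(raw: str) -> str:
    """Extract the text/plain parts of a raw RFC 822 email for indexing."""
    msg = message_from_string(raw)
    parts = []
    for part in msg.walk():  # walk() also handles multipart messages
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload:
                parts.append(payload.decode(part.get_content_charset() or "utf-8", "replace"))
    return "\n".join(parts)
```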

4. Thumbnail Generation

PDF pages, images, and video frames thumbnailed for visual browsing.

5. Indexing

All extracted text indexed in Meilisearch with filterable facets for data set, file type, and category.
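Filterable facets in Meilisearch are declared once as index settings rather than per query. A sketch of the settings payload such a pipeline might apply (the attribute names are assumptions; the setting keys are standard Meilisearch ones):

```python
# Assumed index schema; the archive's real attribute names are not published.
INDEX_SETTINGS = {
    "searchableAttributes": ["title", "extracted_text"],
    "filterableAttributes": ["data_set", "file_type", "category"],
    "typoTolerance": {"enabled": True},
}

# Applied once at setup time with the official Python client, e.g.:
#   client.index("documents").update_settings(INDEX_SETTINGS)
```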

6. P2P Distribution

Files published to Hyperdrives and announced on the Spill network for decentralized replication.

7. Web Crawling

A continuous crawler discovers new Epstein-related documents from court systems, government sites, and public archives. Content is scored for relevance by source-specific adapters and added to the archive automatically.
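The scoring heuristics inside the source-specific adapters aren't described; a hypothetical weighted-keyword scorer shows the shape such an adapter could take (keywords, weights, and threshold are all invented for illustration):

```python
# Hypothetical weights; real adapters and their heuristics are not published.
KEYWORDS = {"epstein": 3.0, "deposition": 2.0, "flight": 1.5, "manifest": 1.5}

def relevance(text: str, threshold: float = 3.0) -> tuple[float, bool]:
    """Score a crawled page by weighted keyword hits.

    Returns (score, keep): the page is kept when its score clears the threshold.
    """
    score = sum(KEYWORDS.get(w, 0.0) for w in text.lower().split())
    return score, score >= threshold
```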

Privacy & Security

This archive does not require an account, does not set tracking cookies, and does not log search queries. No analytics service is used. The site is served over HTTPS with a Let's Encrypt certificate. The P2P distribution layer uses end-to-end encrypted connections via the Noise protocol.

Technical Stack

Frontend: Next.js + Tailwind CSS
Search: Meilisearch
Database: SQLite
P2P: Hyperswarm + Hyperdrive
OCR: Tesseract + PyMuPDF
Hosting: Hetzner Dedicated