EpsteinScan Update: False Alarm Fixed, 37 New Files Found, and TikTok Parser Improvements
Had quite the debugging session today with EpsteinScan. What looked like a massive new-file alert turned out to be a classic "missing baseline" situation.
The Great False Alarm of 625k Files
Woke up to alerts claiming 625,161 new DOJ files had appeared overnight. My heart sank for a moment; that would have been huge news. Turns out the monitoring system was comparing against zero because I forgot to populate the baseline table, so every file already on the DOJ site counted as "new." Classic developer move, right?
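A guard against this failure mode is cheap. Below is a minimal sketch, assuming a hypothetical `baseline(dataset, file_count)` table and a hypothetical `check_for_new_files` helper (neither is EpsteinScan's actual code): if any dataset has no baseline row, it refuses to alert instead of silently treating the missing baseline as zero.

```python
import sqlite3


def check_for_new_files(conn, current_counts):
    """Return {dataset: new_file_count} relative to the stored baseline.

    Assumes a hypothetical table baseline(dataset TEXT, file_count INTEGER).
    If any dataset has no baseline row, raise instead of comparing against
    zero, which is what turns every existing file into a "new" one.
    """
    rows = conn.execute("SELECT dataset, file_count FROM baseline").fetchall()
    baseline = {dataset: count for dataset, count in rows}

    missing = sorted(set(current_counts) - set(baseline))
    if missing:
        raise RuntimeError(f"No baseline for {missing}; refusing to alert")

    # Only report datasets whose count actually grew.
    return {
        ds: count - baseline[ds]
        for ds, count in current_counts.items()
        if count > baseline[ds]
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baseline (dataset TEXT PRIMARY KEY, file_count INTEGER)")
    try:
        check_for_new_files(conn, {"DS12": 100})  # empty baseline
    except RuntimeError as err:
        print(err)  # No baseline for ['DS12']; refusing to alert
    conn.execute("INSERT INTO baseline VALUES ('DS12', 90)")
    print(check_for_new_files(conn, {"DS12": 100}))  # {'DS12': 10}
```

Raising rather than skipping is a judgment call: a loud failure on a misconfigured monitor seems preferable to either a flood of false alerts or silent gaps in coverage. The counts in the usage example are illustrative, not real dataset numbers.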
After resetting the baseline with actual file counts across all 12 datasets, I ran a proper filename-level diff and found the real story: only 37 files missing from our index. Much more reasonable.
- DS12: 26 new files (now downloaded and indexed)
- DS3: 11 files that were actually duplicates from DS2
- DS8 & DS11: Zero missing (we actually have more than DOJ does)
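The breakdown above comes from a filename-level diff. Here is one way to sketch that diff, assuming each side is available as a mapping from dataset name to a set of filenames; the `filename_diff` function and its inputs are hypothetical. It also flags files that are missing from one dataset but already stored under another, which is how something like the DS3/DS2 duplicates would surface. Matching is by filename only; hashing file contents would be more robust if the same document can appear under different names.

```python
def filename_diff(doj_listing, local_index):
    """Filename-level diff between a DOJ listing and our local index.

    Both arguments map dataset name -> set of filenames (hypothetical shape).
    Returns (new_files, duplicates):
      new_files:  dataset -> sorted filenames DOJ lists that we have nowhere
      duplicates: dataset -> {filename: other_dataset} for files missing from
                  this dataset but already stored locally under another one
    """
    # Reverse index: filename -> first local dataset it is stored under.
    where_local = {}
    for ds, names in local_index.items():
        for name in names:
            where_local.setdefault(name, ds)

    new_files, duplicates = {}, {}
    for ds, doj_names in doj_listing.items():
        missing = doj_names - local_index.get(ds, set())
        for name in sorted(missing):
            if name in where_local:
                duplicates.setdefault(ds, {})[name] = where_local[name]
            else:
                new_files.setdefault(ds, []).append(name)
    return new_files, duplicates


if __name__ == "__main__":
    doj = {"DS2": {"a.pdf"}, "DS3": {"a.pdf", "b.pdf"}}
    local = {"DS2": {"a.pdf"}, "DS3": set()}
    new, dups = filename_diff(doj, local)
    print(new)   # {'DS3': ['b.pdf']}
    print(dups)  # {'DS3': {'a.pdf': 'DS2'}}
```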
Infrastructure Improvements
Deployed 9 updated templates from dev to production, covering everything from the admin dashboard to social media management. The TikTok creator tool was also mishandling old-format posts that lacked the ||| delimiters, dumping the title, description, and hashtags into a single field. Fixed the parser so all three now land in the right places.
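A delimiter-with-fallback parser might look roughly like this. It is a sketch under assumptions: the new format is taken to be `title|||description|||#tag1 #tag2`, and for old-format posts it guesses that the first non-empty line is the title and that hashtags are `#word` tokens anywhere in the text. The `parse_tiktok_post` name and that heuristic are hypothetical, not the tool's actual logic.

```python
import re

# A hashtag is '#' followed by word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#\w+")


def parse_tiktok_post(raw):
    """Split a stored post into title, description and hashtags."""
    if "|||" in raw:
        # Assumed new format: at most three fields; pad if trailing ones are absent.
        parts = [p.strip() for p in raw.split("|||", 2)]
        title, description, tag_text = parts + [""] * (3 - len(parts))
        return {"title": title, "description": description,
                "hashtags": HASHTAG_RE.findall(tag_text)}

    # Old format (no delimiters): heuristic fallback.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return {"title": "", "description": "", "hashtags": []}
    title = HASHTAG_RE.sub("", lines[0]).strip()
    hashtags = HASHTAG_RE.findall(raw)
    # Remaining lines become the description, with hashtags and extra spaces removed.
    description = " ".join(HASHTAG_RE.sub("", " ".join(lines[1:])).split())
    return {"title": title, "description": description, "hashtags": hashtags}


if __name__ == "__main__":
    print(parse_tiktok_post("New title|||Some description|||#files #doj"))
    print(parse_tiktok_post("Old title\nFirst line of text.\nMore text #files #doj"))
```

Splitting with `maxsplit=2` keeps any stray `|||` inside the hashtag field from creating extra fields.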
Also hit a classic Python gotcha: sqlite3.Row objects don't support the .get() method like regular dicts do. Changed blog_post.get('summary', '') to (blog_post['summary'] or '') in the auto-social script; the or '' also turns a NULL summary into an empty string.
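The gotcha is easy to reproduce with an in-memory database. The sketch below uses a hypothetical `blog_posts` table, applies the same `or ''` fix, and adds a small hypothetical `row_get` helper for cases where a dict-style default for a possibly absent column is still wanted. Note that indexing a Row with an unknown column name raises `IndexError`, so the helper checks `row.keys()` first.

```python
import sqlite3


def row_get(row, key, default=None):
    """dict.get()-style lookup for sqlite3.Row, which has no .get() method."""
    return row[key] if key in row.keys() else default


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows come back as sqlite3.Row objects
    conn.execute("CREATE TABLE blog_posts (id INTEGER PRIMARY KEY, title TEXT, summary TEXT)")
    conn.execute("INSERT INTO blog_posts (title, summary) VALUES (?, ?)", ("Update", None))
    blog_post = conn.execute("SELECT * FROM blog_posts").fetchone()

    print(hasattr(blog_post, "get"))    # False: calling .get() would raise AttributeError
    summary = blog_post["summary"] or ""  # NULL comes back as None, so this yields ""
    print(repr(summary))                  # ''
    print(row_get(blog_post, "missing_column", "n/a"))  # n/a
```

An alternative is converting each row with `dict(row)`, which gives a real `.get()` at the cost of an extra copy per row.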
Error Handling
Created a custom Cloudflare error page styled like a DOJ memo (seemed fitting for the project). It's ready to go; I just need to upload it through the Cloudflare dashboard by hand.
Interesting Data Anomalies
Found some weird discrepancies in the dataset counts:
- DS9: DOJ shows 50 files, we have 531,282
- DS10: DOJ shows 278,450, we have 503,150
Could be bulk imports from other sources or files that DOJ has since reorganized/removed. Worth investigating later.
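Spotting these mismatches is simple arithmetic on the two count tables. As a rough sketch, a hypothetical `reconcile_counts` function could label each dataset as missing, surplus, or match relative to DOJ's listing, treating a dataset absent from either side as a count of zero. Fed the DS9 and DS10 numbers above, it reports surpluses of 531,232 and 224,700 files respectively.

```python
def reconcile_counts(doj_counts, local_counts):
    """Classify each dataset by comparing our file count with DOJ's listing.

    Returns {dataset: (status, ours_minus_doj)} where status is
    "missing" (we have fewer), "surplus" (we have more) or "match".
    """
    report = {}
    for ds in sorted(set(doj_counts) | set(local_counts)):
        doj, ours = doj_counts.get(ds, 0), local_counts.get(ds, 0)
        if ours < doj:
            status = "missing"
        elif ours > doj:
            status = "surplus"
        else:
            status = "match"
        report[ds] = (status, ours - doj)
    return report


if __name__ == "__main__":
    doj = {"DS9": 50, "DS10": 278_450}
    ours = {"DS9": 531_282, "DS10": 503_150}
    print(reconcile_counts(doj, ours))
    # {'DS10': ('surplus', 224700), 'DS9': ('surplus', 531232)}
```

Sorting is lexicographic, so DS10 lists before DS9; a natural-sort key would fix that if the order matters.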
What's Next
Need to upload that Cloudflare error page and investigate those count discrepancies. Also planning to expand the missing file checker to cover the remaining datasets.
All in all, a productive debugging session. Sometimes the scariest alerts have the simplest explanations.