Infrastructure Consolidation Gone Wrong: How I Lost User Data and Learned from It
Today was one of those days where infrastructure improvements turn into production nightmares. I was working on consolidating shared assets for EpsteinScan, and let's just say... things got messy.
The Great Shared Directory Migration
I decided to clean up the fragile cross-symlink pattern I had between production and dev environments. Created a new /home/epsteinscan/shared/ directory to hold all the common assets - databases, static files, the works.
Seemed like a solid plan: move everything to the shared location, create symlinks from both apps, and boom - cleaner architecture.
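In hindsight, the safer order is copy, verify, and only then replace the original with a symlink. Here is a minimal sketch of that idea, assuming the app is stopped while it runs (a plain file copy of a live SQLite database is not safe). The names `move_to_shared` and `sha256` are made up for illustration; this is not the script I actually ran.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_to_shared(original: Path, shared_dir: Path) -> Path:
    """Copy `original` into `shared_dir`, verify it, then swap in a symlink."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    target = shared_dir.resolve() / original.name
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite")

    # Copy first. The original stays untouched until the copy is verified.
    shutil.copy2(original, target)
    if sha256(original) != sha256(target):
        target.unlink()
        raise IOError(f"checksum mismatch while copying {original}")

    # Build the symlink under a temporary name, then rename it over the
    # original. os.replace is atomic on POSIX, so the path never goes missing.
    tmp_link = original.with_name(original.name + ".symlink-tmp")
    os.symlink(target, tmp_link)
    os.replace(tmp_link, original)
    return target
```

The point of the design is that there is never a moment when the only copy of the file is mid-flight: either the original still exists, or the verified copy does.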
When File Operations Go Bad
Here's where I learned a painful lesson about file system operations. During the migration, I managed to completely lose the auth.db file. It was a race condition between hardlink, mv, and rm commands that just... ate my database.
Gone. 45KB of user accounts, bookmarks, and Stripe data - vanished.
The good news? It was only a few dozen users. The schema auto-recreated itself, so the app didn't break completely. Still stings though.
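A backup taken before touching anything would have made this a non-event. Here is a rough sketch using the backup API in Python's standard `sqlite3` module, which copies a database consistently even if it is in use. The function name `backup_sqlite` and the paths you pass in are placeholders, not something that exists in the app.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(db_path: Path, backup_path: Path) -> None:
    """Write a consistent copy of db_path to backup_path and sanity-check it."""
    # sqlite3.connect() silently creates an empty database if the file is
    # missing, so fail loudly instead of "backing up" nothing.
    if not db_path.is_file():
        raise FileNotFoundError(db_path)

    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)  # copies every page of the source database
        result = dst.execute("PRAGMA integrity_check").fetchone()[0]
        if result != "ok":
            raise RuntimeError(f"backup failed integrity check: {result}")
    finally:
        dst.close()
        src.close()
```

Running something like this against auth.db before the first mv would have left a verified copy to restore from.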
Production Outage Drama
As if losing data wasn't enough, production decided to take a dive. The new auth.db file got created by root instead of the app user, so when SQLite tried to create its WAL journal files next to the database, it hit permission errors. Every gunicorn worker failed on startup, and supervisor kept frantically respawning them, leaving stale processes holding on to port 8000.
The fix was simple once I figured it out: chown epsteinscan:epsteinscan /home/epsteinscan/shared/database. But getting there involved a lot of pkill -9 and fuser -k commands to clear out the stale processes still holding the port.
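This kind of ownership mismatch is cheap to check before starting the workers. Here is a small sketch of such a pre-flight check; `ownership_problems` is a hypothetical helper, and it only compares owner uids rather than testing every permission bit.

```python
import os
from pathlib import Path
from typing import List


def ownership_problems(db_path: Path, expected_uid: int) -> List[str]:
    """List the paths SQLite must write to that expected_uid does not own."""
    problems = []
    # In WAL mode SQLite creates its -wal and -shm files next to the
    # database, so the directory matters as much as the file itself.
    for path in (db_path.parent, db_path):
        if not path.exists():
            problems.append(f"{path}: missing")
            continue
        owner = path.stat().st_uid
        if owner != expected_uid:
            problems.append(f"{path}: owned by uid {owner}, expected {expected_uid}")
    return problems
```

The expected uid for the app user can be looked up with `pwd.getpwnam("epsteinscan").pw_uid`. An empty list means both the directory and the database file belong to that user.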
Some Wins Along the Way
Not everything went sideways. I managed to:
- Fix the missing TikTok Creator routes on dev (had to copy ~180 lines of route code from production)
- Add password visibility toggles to all the login forms - those little eye icons that let you see what you're typing
- Implement refresh buttons across all admin pages for better UX
- Clean up the TikTok Creator page styling to match the rest of the admin interface
The Messy Reality of Solo Development
One thing that's clear from this session: I really need better deployment practices. I'm manually copying files between environments, making changes directly on the server, and not committing things to git nearly enough.
It works when it works, but when it doesn't... well, you lose databases.
What's Next
I've got a bunch of UI improvements sitting uncommitted on the dev server that need to get pushed to production. The shared directory pattern is solid now that the permissions are sorted out. And I really should add some proper git tracking to these server changes.
Oh, and I need to reach out to those users whose accounts got nuked. That's going to be a fun conversation.
Sometimes you learn the most from the sessions that go completely off the rails. Today was definitely one of those days.