
When Cloudflare Sabotages Your SEO: A Deep Dive Into EpsteinScan's Meta Tag Nightmare


Just wrapped up a marathon SEO audit and fix session for EpsteinScan, and wow - sometimes the biggest SEO problems are hiding in plain sight.

The Plot Twist Nobody Saw Coming

I started what I thought would be a routine meta tag audit across our 15+ page templates. You know the drill - checking OG tags, Twitter cards, H1s, structured data. Pretty standard stuff.

But then I discovered something that made my eye twitch: Cloudflare's "Managed Robots" feature was completely overriding our Flask robots.txt. Not only was it stripping out our sitemap directive, but it was also blocking Google-Extended with a blanket Disallow: /.
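A quick way to catch this class of problem is to parse the robots.txt that production actually serves and check for the directives you expect. The sketch below does this with Python's standard urllib.robotparser, working on raw text. The sample rules and the https://example.com URLs are placeholders and do not reproduce EpsteinScan's real file.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt: str, site: str) -> dict:
    """Parse robots.txt text and report the directives that matter for SEO."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {
        # site_maps() returns None when there is no Sitemap: line (Python 3.8+)
        "sitemaps": parser.site_maps() or [],
        "googlebot_allowed": parser.can_fetch("Googlebot", f"{site}/"),
        "google_extended_allowed": parser.can_fetch("Google-Extended", f"{site}/"),
    }


# Placeholder text resembling a managed override: no Sitemap line,
# and a blanket Disallow for Google-Extended.
served = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

report = audit_robots(served, "https://example.com")
print(report)
if not report["sitemaps"]:
    print("WARNING: served robots.txt has no Sitemap directive")
```

In practice you would feed it the body fetched from the live /robots.txt rather than a string literal, so you see what crawlers see after the CDN has had its say.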

Then came plot twist number two: Cloudflare's challenge page (that "Just a moment..." spinner) was blocking /sitemap.xml and /sitemap_index.xml from ALL crawlers. Google couldn't even see our sitemap!
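To spot this from a script, fetch the sitemap and look for challenge signals instead of trusting the status code alone. The helper below is a heuristic sketch. It checks for a "cf-mitigated: challenge" response header, which Cloudflare's docs describe for challenge responses, and falls back to looking for the "Just a moment..." title in a 403/503 HTML body. The function name is hypothetical, and it takes an already-fetched response so it can be tested offline.

```python
from typing import Mapping


def looks_like_cf_challenge(status: int, headers: Mapping[str, str], body: str) -> bool:
    """Heuristic: does this response look like a Cloudflare challenge page instead of real content?"""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("cf-mitigated", "").strip().lower() == "challenge":
        return True
    # Fallback: an HTML 403/503 carrying the interstitial's title text.
    is_html = "text/html" in lowered.get("content-type", "")
    return status in (403, 503) and is_html and "Just a moment..." in body


# A sitemap request should come back as XML, not an interstitial page.
print(looks_like_cf_challenge(200, {"Content-Type": "application/xml"}, "<urlset/>"))  # False
print(looks_like_cf_challenge(403, {"cf-mitigated": "challenge"}, ""))  # True
```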

Sometimes the tools meant to help you end up being your biggest obstacle.

The Cleanup Mission

Once I identified the real culprits, I dove into fixing the more traditional SEO issues:

  • 9 templates were missing twitter:title and twitter:description tags
  • 2 templates had no og:type specified
  • Our pricing.html template was missing the meta SEO block entirely (oops)
  • The main sitemap had bloated to 50,136 URLs, over the sitemap protocol's limit of 50,000 URLs per file
  • Every sitemap lastmod showed today's date instead of the content's real date
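You can script a check for gaps like these across 15+ templates. The following minimal sketch uses Python's built-in html.parser to list which expected meta tags a rendered page lacks and to count its H1s. The REQUIRED_META set is an assumed checklist. The audit also runs on rendered HTML, not on the raw Jinja templates.

```python
from html.parser import HTMLParser

# Assumed checklist; adjust to your own SEO baseline.
REQUIRED_META = {
    "description",
    "og:title",
    "og:description",
    "og:type",
    "twitter:card",
    "twitter:title",
    "twitter:description",
}


class MetaCollector(HTMLParser):
    """Collects meta tag keys (from name= or property=) and counts <h1> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.keys = set()  # lowercased meta keys seen on the page
        self.h1_count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key:
                self.keys.add(key.lower())
        elif tag == "h1":
            self.h1_count += 1


def audit_page(html: str) -> dict:
    collector = MetaCollector()
    collector.feed(html)
    return {
        "missing": sorted(REQUIRED_META - collector.keys),
        "h1_count": collector.h1_count,
    }


page = """<html><head>
<meta name="description" content="Pricing plans">
<meta property="og:title" content="Pricing">
<meta property="og:description" content="Pricing plans">
</head><body><h1>Pricing</h1></body></html>"""
print(audit_page(page))
```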

The fixes were pretty straightforward once I knew what needed doing: added proper Twitter cards everywhere, fixed the OG types, gave pricing.html the meta tags it deserved, and split the sitemap into files capped at 45K URLs each.
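A 45K-URL cap per file means the URLs have to be spread across several sitemap files, with a sitemap index (like the /sitemap_index.xml mentioned above) tying them together. The sketch below assumes the URLs are already collected in a list. The sitemap-N.xml naming scheme and the example.com base URL are hypothetical placeholders, not EpsteinScan's actual layout.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 45_000  # headroom under the protocol's 50,000-URL limit
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def render_urlset(urls):
    """Render one sitemap file; escape() handles &, <, > in URLs."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return f'<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'


def render_index(base_url, n_files):
    """Render a sitemap index pointing at hypothetical /sitemap-1.xml, /sitemap-2.xml, ..."""
    entries = "".join(
        f"<sitemap><loc>{escape(base_url)}/sitemap-{i}.xml</loc></sitemap>"
        for i in range(1, n_files + 1)
    )
    return f'<?xml version="1.0" encoding="UTF-8"?><sitemapindex xmlns="{SITEMAP_NS}">{entries}</sitemapindex>'


urls = [f"https://example.com/doc/{i}" for i in range(50_136)]
chunks = chunk(urls)
print([len(c) for c in chunks])  # [45000, 5136]
index_xml = render_index("https://example.com", len(chunks))
```

Staying below the hard limit, rather than exactly at 50,000, leaves room for growth between sitemap rebuilds.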

The Real Win: Proper Sitemap Dates

One thing I'm particularly happy about is fixing the sitemap dates. Instead of lazy "today's date for everything," we're now using:

  • MAX(processed_date) for the index
  • published_at for blog posts
  • processed_date for documents
  • Latest blog date as fallback for static pages
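Here is one way those date sources could map to lastmod values. It uses an in-memory SQLite database so it runs standalone. The tables and columns (documents.processed_date, blog_posts.published_at) and the sample rows are hypothetical stand-ins for the real schema. Only the selection rules come from the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates are stored as ISO-8601 text, so MAX() compares them correctly.
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER PRIMARY KEY, processed_date TEXT);
    CREATE TABLE blog_posts (slug TEXT PRIMARY KEY, published_at TEXT);
    INSERT INTO documents VALUES (1, '2024-03-01'), (2, '2024-05-20');
    INSERT INTO blog_posts VALUES ('launch', '2024-04-10'), ('seo-fixes', '2024-06-02');
    """
)


def index_lastmod(conn):
    # Sitemap index: newest processed document overall.
    return conn.execute("SELECT MAX(processed_date) FROM documents").fetchone()[0]


def document_lastmod(conn, doc_id):
    row = conn.execute("SELECT processed_date FROM documents WHERE id = ?", (doc_id,)).fetchone()
    return row[0] if row else None


def blog_lastmod(conn, slug):
    row = conn.execute("SELECT published_at FROM blog_posts WHERE slug = ?", (slug,)).fetchone()
    return row[0] if row else None


def static_page_lastmod(conn):
    # Static pages have no content date of their own; fall back to the latest blog post.
    return conn.execute("SELECT MAX(published_at) FROM blog_posts").fetchone()[0]


print(index_lastmod(conn))  # 2024-05-20
print(document_lastmod(conn, 1))  # 2024-03-01
print(blog_lastmod(conn, "launch"))  # 2024-04-10
print(static_page_lastmod(conn))  # 2024-06-02
```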

Google uses lastmod values when they're consistently accurate, so real dates should help with crawl prioritization.

The Cloudflare Reckoning

All the template fixes are deployed, but I still need to wrestle with Cloudflare's dashboard to:

  1. Disable that "helpful" Managed Robots feature
  2. Add a WAF bypass rule so /sitemap*.xml doesn't get challenge pages

It's wild how a CDN feature that's supposed to be "helpful" can completely torpedo your search visibility. Always check your robots.txt in production, folks.

What's Next

Once Cloudflare stops being overly protective, I'll get the sitemap submitted to Google Search Console and start monitoring our indexing. Might also add some BreadcrumbList schema to the person and blog pages for extra search engine love.
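In schema.org, a BreadcrumbList is an ordered itemListElement of ListItem entries, each with a position, name, and item URL. A minimal Python sketch that builds the JSON-LD for a blog post follows. The page names and example.com URLs are hypothetical.

```python
import json


def breadcrumb_jsonld(trail):
    """Build schema.org BreadcrumbList JSON-LD from an ordered list of (name, url) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)  # positions start at 1
        ],
    }


trail = [
    ("Home", "https://example.com/"),
    ("Blog", "https://example.com/blog"),
    ("Example Post", "https://example.com/blog/example-post"),
]
data = breadcrumb_jsonld(trail)
# The serialized output would go inside <script type="application/ld+json"> in the page template.
print(json.dumps(data, indent=2))
```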

Sometimes the best debugging sessions are the ones where you discover the problem wasn't in your code at all. Now excuse me while I go commit all these changes to git...
