Deep Bug Hunt: Finding 4 Real Issues Among 15 Alarms in OptiPortal

Just wrapped up a massive bug-hunting session on OptiPortal, my Hytale chunk preloading plugin. Used an AI agent to scan all 55 Java source files, and wow - the signal-to-noise ratio was... interesting.

The Hunt

The agent flagged 15 potential issues. I manually verified each one, and only 4 turned out to be real bugs. The other 11? Classic false positives that look scary but are actually fine.

Here's what I actually found:

Bug 1: Memory Leak in Zone Cleanup (HIGH)

This one's nasty. I have two code paths for deregistering chunks - deregisterAllChunks and releaseZoneChunks. When I added the new per-zone pinning model, I updated the first one but forgot about the second. Result? Zones released via releaseZoneChunks never free their memory pins unless they're the absolute last owner.

The fix is straightforward but critical: capture the result of the zone removal and call tryReleaseKeepLoaded whenever a zone was actually removed.
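
Here's a minimal Java sketch of the fixed release path. Only releaseZoneChunks and tryReleaseKeepLoaded come from the description above; the ChunkPinRegistry class, registerZoneChunk, the two maps, and the rule that each owning zone holds exactly one pin are assumptions for illustration, not OptiPortal's actual code.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-zone pinning model: every zone that owns a chunk
// contributes one keep-loaded pin. synchronized keeps the sketch simple.
class ChunkPinRegistry {
    private final Map<Long, Set<String>> owners = new HashMap<>();
    private final Map<Long, Integer> pins = new HashMap<>();

    synchronized void registerZoneChunk(String zoneId, long chunkKey) {
        if (owners.computeIfAbsent(chunkKey, k -> new HashSet<>()).add(zoneId)) {
            pins.merge(chunkKey, 1, Integer::sum); // one pin per owning zone
        }
    }

    synchronized void releaseZoneChunks(String zoneId, Collection<Long> chunkKeys) {
        for (long chunkKey : chunkKeys) {
            Set<String> zones = owners.get(chunkKey);
            // The fix: capture the removal result. Only a zone that really
            // owned the chunk gives back a pin; the buggy path skipped this.
            boolean removed = zones != null && zones.remove(zoneId);
            if (removed) {
                if (zones.isEmpty()) {
                    owners.remove(chunkKey); // forget chunks nobody owns
                }
                tryReleaseKeepLoaded(chunkKey);
            }
        }
    }

    // Drops one pin; the entry is removed once the count reaches zero.
    synchronized void tryReleaseKeepLoaded(long chunkKey) {
        pins.computeIfPresent(chunkKey, (k, n) -> n > 1 ? n - 1 : null);
    }

    synchronized int pinCount(long chunkKey) {
        return pins.getOrDefault(chunkKey, 0);
    }
}
```

Because Set.remove reports whether the zone was actually present, releasing the same zone twice can't drop a pin that belongs to another zone.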

Bug 2: Thread Pool Leak in Retry Logic (HIGH)

Found the same bug in two places in my RetryPolicy class. I create a scheduled executor for retry delays and shut it down on success, but if the initial schedule() call throws an exception, the executor never gets shut down. Hello, resource leak.

Easy fix: add defaultExecutor.shutdownNow() to both catch blocks.
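
A rough sketch of the pattern, assuming RetryPolicy creates a single-thread ScheduledExecutorService per retry. RetryPolicySketch, retryAfter, and the overload that accepts an executor are hypothetical; the overload exists only so the failing-schedule() path can be exercised.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical retry helper: a dedicated scheduler delays the attempt.
class RetryPolicySketch {
    static <T> CompletableFuture<T> retryAfter(Supplier<T> attempt, long delayMs) {
        return retryAfter(attempt, delayMs, Executors.newSingleThreadScheduledExecutor());
    }

    static <T> CompletableFuture<T> retryAfter(Supplier<T> attempt, long delayMs,
                                               ScheduledExecutorService defaultExecutor) {
        CompletableFuture<T> result = new CompletableFuture<>();
        try {
            defaultExecutor.schedule(() -> {
                try {
                    result.complete(attempt.get());
                } catch (RuntimeException e) {
                    result.completeExceptionally(e);
                } finally {
                    defaultExecutor.shutdown(); // normal path: executor is done
                }
            }, delayMs, TimeUnit.MILLISECONDS);
        } catch (RuntimeException e) {
            // schedule() itself threw, so the task above never runs and never
            // shuts the executor down. This is the line the fix adds.
            defaultExecutor.shutdownNow();
            result.completeExceptionally(e);
        }
        return result;
    }
}
```

Calling shutdown() from inside the executor's own task is safe: it stops new submissions and lets the current task finish without waiting on itself.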

Bug 3: Dead Code in Load Balancer (MEDIUM)

I'm tracking totalExecutionTime in my AsyncLoadBalancer but literally never reading it. It's not exposed in stats, not logged, nothing. The comment even lies about why it exists. Classic dead code that somehow survived reviews.

Solution: delete it entirely.

Bug 4: Broken Chunk Residency Check (MEDIUM)

This one's subtle. My KeepaliveManager is supposed to demote HOT zones to WARM when their chunks get evicted. The bug? I'm checking if a CompletableFuture is null, which... it never is. The actual chunk data might be null, but the future wrapper isn't.

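The fix is to check the value inside the future with CompletableFuture.getNow(null), which returns the result if the future has already completed and null otherwise, without blocking.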
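
The difference between the two checks, as a small sketch. KeepaliveSketch, Tier, loadedChunks, getChunkIfLoaded, and nextTier are hypothetical stand-ins; the assumptions are that the real lookup returns a future completed with null for an evicted chunk, and that a still-pending load counts as not resident.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

enum Tier { HOT, WARM }

class KeepaliveSketch {
    // Hypothetical stand-in for the server's loaded-chunk store.
    final Map<Long, Object> loadedChunks = new ConcurrentHashMap<>();

    // Like the lookup described above, this never returns null:
    // an evicted chunk yields a future completed with null.
    CompletableFuture<Object> getChunkIfLoaded(long chunkKey) {
        return CompletableFuture.completedFuture(loadedChunks.get(chunkKey));
    }

    boolean isResident(long chunkKey) {
        CompletableFuture<Object> future = getChunkIfLoaded(chunkKey);
        // Buggy version: `return future != null;` is always true.
        // getNow would throw for a failed future, hence the guard;
        // a pending future yields null and counts as not resident.
        return !future.isCompletedExceptionally() && future.getNow(null) != null;
    }

    // Demote a HOT zone to WARM as soon as one of its chunks is gone.
    Tier nextTier(Tier current, long... chunkKeys) {
        if (current != Tier.HOT) {
            return current;
        }
        for (long chunkKey : chunkKeys) {
            if (!isResident(chunkKey)) {
                return Tier.WARM;
            }
        }
        return Tier.HOT;
    }
}
```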

The False Positives

The dismissed findings were classic AI confusion:

  • "Integer division truncation" on variables that are actually doubles
  • "Null pointer risk" on code that uses proper short-circuiting
  • "Race conditions" on atomic CAS operations that are explicitly thread-safe
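
For reference, here's what the three dismissed patterns look like in plain Java. These are illustrative snippets with made-up names, not the code the agent actually flagged.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative patterns that look risky to a scanner but are fine.
class FalseAlarmExamples {
    // 1. No truncation: with a double operand, '/' is floating-point division.
    static double averageLoadMs(double totalMs, int samples) {
        return samples == 0 ? 0.0 : totalMs / samples;
    }

    // 2. No NPE: '&&' short-circuits, so isEmpty() only runs when name != null.
    static boolean hasName(String name) {
        return name != null && !name.isEmpty();
    }

    // 3. No race: compareAndSet only succeeds if the value is still the one
    //    just read; otherwise the loop re-reads and retries, so no update is lost.
    static long recordMax(AtomicLong max, long sample) {
        long current;
        do {
            current = max.get();
            if (sample <= current) {
                return current;
            }
        } while (!max.compareAndSet(current, sample));
        return sample;
    }
}
```

The CAS loop is written out to show why it's safe; AtomicLong.accumulateAndGet(sample, Math::max) does the same thing in one call.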

What's Next

I wrote up a detailed fix plan with exact before/after code blocks. The memory leak and residency check bugs have the highest impact, so I'll tackle those first. The thread pool leak is mechanical but important for long-running servers.

No code changes in this session - pure analysis and planning. Sometimes the best coding sessions are the ones where you don't write any code at all.

The real lesson here? AI agents are great for broad scanning, but you absolutely need human judgment to separate real issues from false alarms. Four out of 15 (about 27%) isn't a terrible hit rate, but it definitely requires that verification step.

Share this post