Battle-Tested: Fixing Flaky Playwright Tests and Async Race Conditions
Had one of those satisfying debugging sessions today where everything finally clicks into place. Been wrestling with flaky Playwright tests on the honeybun dashboard, and I'm happy to report we're now sitting pretty at 28/28 tests passing consistently.
The Infrastructure Fix
First victory was sorting out the test infrastructure. The webServer was being stubborn about serving files, throwing 404s left and right for JS modules. Turned out I needed to serve from the project root instead of trying to be clever about it. Sometimes the simple solution is the right one.
Updated all the page.goto() calls to use the /frontend/tools/ prefix across 4 test files, and suddenly everything started behaving.
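For anyone hitting the same 404s, here is a minimal sketch of what "serve from the project root" can look like in playwright.config.js. The http-server command, port 8080, and the CommonJS module style are assumptions for illustration, not details of the real project.

```javascript
// playwright.config.js (sketch; the server command and port are assumptions)

/** @type {import('@playwright/test').PlaywrightTestConfig} */
const config = {
  webServer: {
    // Serve the whole project root so JS module imports resolve instead of 404ing.
    command: 'npx http-server . -p 8080 -c-1',
    url: 'http://127.0.0.1:8080',
    reuseExistingServer: !process.env.CI,
  },
  use: {
    // Lets tests use root-relative paths such as page.goto('/frontend/tools/...').
    baseURL: 'http://127.0.0.1:8080',
  },
};

module.exports = config;
```

With baseURL set, every page.goto() path is resolved against the server root, which is why the /frontend/tools/ prefix has to appear in each call.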
The Google Integration Rewrite
The real meat of today's work was completely rewriting google-integration.spec.js. The old version was a mess of incorrect selectors and wrong assumptions:
- Fixed the attribute selector to use data-client instead of data-client-id
- Corrected the route pattern to **/analytics/discover/* (was using query params before)
- This is where it gets interesting: rewrote the expandCard helper to use page.evaluate with JS injection instead of simple clicks
Why the JS injection? Because clicking on .card-company triggers toggleCard, which has the side effect of auto-calling discoverGoogleServices when the panel opens. This was pre-opening panels before my badge click tests could run properly. Sometimes you need to be surgical about how you interact with the UI.
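Roughly, the helper can take the shape below. This is a sketch rather than the actual helper: the .card-panel selector and the open class are hypothetical names, and the real markup may open the panel differently.

```javascript
// Sketch of an expandCard helper. `.card-panel` and the `open` class are
// hypothetical; only the data-client attribute comes from the real markup.
async function expandCard(page, clientId) {
  // The callback runs inside the browser. Opening the panel directly skips
  // the click handler (toggleCard), so discoverGoogleServices is not auto-called.
  const opened = await page.evaluate((id) => {
    const card = document.querySelector(`[data-client="${id}"]`);
    const panel = card ? card.querySelector('.card-panel') : null;
    if (!panel) return false;
    panel.classList.add('open');
    return true;
  }, clientId);

  if (!opened) {
    throw new Error(`No card panel found for client ${clientId}`);
  }
}
```

Returning a boolean from page.evaluate and throwing on the Node side keeps a missing card from failing silently, which would otherwise surface later as a confusing timeout.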
Hunting Down the Flaky Tests
Two particularly stubborn flaky tests taught me some lessons:
The Case Sensitivity Gotcha: One test was checking if a dialog message contained 'disconnect' (lowercase), but dialog.message() actually returns 'Disconnect' with a capital D. Added .toLowerCase() and boom, fixed.
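A sketch of a dialog check that no longer cares about capitalization. The helper name is made up for illustration; page.once('dialog', ...), dialog.message() and dialog.accept() are the standard Playwright calls.

```javascript
// Hypothetical helper: resolves with the dialog text if it mentions
// "disconnect" in any casing, rejects otherwise.
function waitForDisconnectDialog(page) {
  return new Promise((resolve, reject) => {
    page.once('dialog', async (dialog) => {
      const message = dialog.message();
      // Accept first so the page is never left blocked on an open dialog.
      await dialog.accept();
      if (message.toLowerCase().includes('disconnect')) {
        resolve(message);
      } else {
        reject(new Error(`Unexpected dialog message: ${message}`));
      }
    });
  });
}
```

The listener has to be attached before the click that opens the dialog, so a test would call waitForDisconnectDialog(page) first, trigger the click, and then await the returned promise.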
The Race Condition From Hell: This one was trickier. I had a route override for /clients that was getting registered before expandCard ran. This caused fetchClients to re-render and remove the showGoogleStatus badge before the test could click it.
The fix? Move the route override inside the if block, after expect(panel).toBeVisible() has confirmed the panel is open. Order matters with async operations!
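The ordering looks roughly like this. It is a sketch under assumptions: the helper name, the **/clients glob and the JSON payload are illustrative, and it waits with locator.waitFor() rather than the expect assertion so the snippet has no imports.

```javascript
// Hypothetical helper showing the order of operations, not the real test code.
async function mockClientsAfterPanelOpens(page, panel, clients) {
  // 1. Pre-condition: the panel must be visible before any mock is registered.
  await panel.waitFor({ state: 'visible' });

  // 2. Only now override the route, so the mocked response cannot drive a
  //    re-render while the card is still being expanded.
  await page.route('**/clients', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify(clients),
    })
  );
}
```

Anything that should affect requests made after the interaction gets registered after the interaction's pre-conditions hold, never at the top of the test by habit.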
The Async Content Fix
Also learned to use expect(locator).toContainText(str, {timeout}) instead of grabbing textContent() directly. The former has built-in retry logic for async DOM content, while the latter is just a one-shot check.
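To make the difference concrete, here is a hand-rolled illustration of one-shot versus retrying reads. It is not how Playwright implements its assertions; waitForText is a made-up stand-in that only mimics the retry-until-timeout behaviour.

```javascript
// One-shot: returns whatever the DOM holds at this instant.
async function readOnce(locator) {
  return locator.textContent();
}

// Retrying: polls until the text appears or the timeout expires.
// A simplified stand-in for the behaviour of expect(locator).toContainText().
async function waitForText(locator, expected, { timeout = 5000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  for (;;) {
    const text = (await locator.textContent()) ?? '';
    if (text.includes(expected)) return text;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for "${expected}", last saw "${text}"`);
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}
```

In a real spec there is no need for the helper: the built-in form is await expect(locator).toContainText('some text', { timeout: 5000 }), with the text and timeout chosen per test.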
Key Takeaways
- When mocking routes in Playwright, register post-action mocks AFTER pre-conditions are confirmed
- JS injection with page.evaluate gives you surgical control when regular clicks have unwanted side effects
- Always account for async content with proper retry mechanisms
- Case sensitivity will bite you when you least expect it
Deployed both commits to production and everything's running smoothly. There's something deeply satisfying about seeing that green "28/28 tests passing" status after a good debugging session.
Next up: back to feature work on the dashboard, but now with the confidence that my tests won't randomly fail on me.