V2: Intermittent 404 errors across our Docker containers

We have multiple containers serving a backend API (Django) for a frontend (Angular).

This has been working with Traefik in between for about a month, but since early this week we have been having a problem.

If I access the frontend e.g. stage.domain.com I will see the page.
If I access the backend e.g. stage-api.domain.com, I will get a 404.

If I remain on the frontend and do not access the backend services directly, the frontend will (for the most part) remain stable.

If I press CTRL+F5 to refresh the backend (stage-api), I may get a 200 OK after a few seconds. Going back to the frontend (stage), I then see a 404 when refreshing. It flip-flops pretty consistently, though sometimes we see the 404 when accessing any of the containers. While Chrome is getting 404s, Firefox may keep working fine for a while. Eventually the issue is seen in either browser.

The HTTP request does not seem to reach the containers when we get the 404, so we can confirm that it's coming from Traefik, but we're not seeing anything in the Traefik access log that would help.
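For anyone who wants to measure this rather than refresh by hand, here is a rough Python sketch that polls each hostname and tallies what comes back. It assumes that Traefik answers unmatched routes with the plain-text body `404 page not found` (Go's default), while a 404 produced by Django would carry a different body. The names `fetch`, `classify` and `poll` are made up for this example, and you would pass in your own URLs.

```python
import collections
import time
import urllib.error
import urllib.request

# Assumption: a 404 generated by the proxy itself has this plain-text body,
# whereas a 404 generated by the application looks different.
PROXY_404_BODY = "404 page not found"


def fetch(url, timeout=5.0):
    """Return (status, body) for one GET; HTTP error statuses are not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def classify(status, body):
    """Label a response so proxy-level 404s stand out from application 404s."""
    if status == 404 and body.strip() == PROXY_404_BODY:
        return "404-proxy"
    return str(status)


def poll(urls, rounds=20, delay=1.0, fetcher=fetch):
    """Request every URL `rounds` times and count the labels seen per URL."""
    tally = {url: collections.Counter() for url in urls}
    for i in range(rounds):
        for url in urls:
            status, body = fetcher(url)
            tally[url][classify(status, body)] += 1
        if i < rounds - 1:
            time.sleep(delay)  # pause between rounds, not after the last one
    return tally
```

Calling `poll(["https://stage.domain.com/", "https://stage-api.domain.com/"])` (the `https` scheme is a guess) returns one counter per URL. A mix of `200` and `404-proxy` for the same URL would suggest the route is intermittently missing inside the proxy rather than the backend rejecting the request.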

I suspect it has something to do with session management. Specifically because:

I can have three tabs open accessing different resources behind Traefik: two tabs return 404 on CTRL+F5 while the third works fine. Closing Chrome completely or switching to Firefox will typically alleviate the issue until it pops up again shortly after.

My thinking is that Traefik is binding to the client ip:port to establish a session with one particular backend server.
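One way to test this hypothesis outside the browser is to compare requests sent over a single reused (keep-alive) connection with requests that each open a fresh connection. The Python sketch below does that with the standard `http.client` module. The function names and the `make_conn` parameter are made up for this example, and it only speaks HTTP/1.1, so it will not reproduce anything specific to the HTTP/2 connections a browser may use.

```python
import http.client


def statuses_reused(host, path="/", count=10,
                    make_conn=http.client.HTTPSConnection):
    """Send `count` GETs over one keep-alive connection; return the status codes."""
    conn = make_conn(host, timeout=5)
    codes = []
    try:
        for _ in range(count):
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
            codes.append(resp.status)
    finally:
        conn.close()
    return codes


def statuses_fresh(host, path="/", count=10,
                   make_conn=http.client.HTTPSConnection):
    """Send `count` GETs, each on a new connection; return the status codes."""
    codes = []
    for _ in range(count):
        conn = make_conn(host, timeout=5)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()
            codes.append(resp.status)
        finally:
            conn.close()
    return codes
```

If `statuses_reused("stage-api.domain.com")` is consistently all 200 or all 404 while `statuses_fresh("stage-api.domain.com")` is mixed, that would be consistent with the result being tied to the connection. If both are mixed, per-connection binding is probably not the explanation.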

Since everything works intermittently, we're at a loss to understand where the break actually is. Are there any logs that would clearly indicate why Traefik is sending a 404 response? We enabled debug logging but see nothing new, only a generic 404 returned from Traefik.

Downgraded from v2.2.2 to v2.2.1 and everything works fine with the same configuration; the only change is the downgrade.

Consider this post closed, but I'll leave it up in case anyone else runs into the problem.

I had a perfectly working setup until I updated to 2.2.4 last night, and then it started to behave exactly as you've described. I was on 2.2.1 before that. Either we're making the same mistake somewhere or there is a bug in the later versions.

Interesting. Can you enable debug logging and see if there is anything useful there?

Already have. Nothing to indicate any issues at all. I get a lot of successful authentication entries for the dashboard, plus entries for the sites that work, logged when they work. The interesting thing is that it seems intermittent, i.e. sometimes the Portainer page would work but not the Whoami one, then after a restart the Whoami page would work but not the Pihole admin page.

Are you using Swarm and SSL? I'm wondering if we could create a minimal repo (with whoami or something) that is as simple as possible and still reproduces the issue: no Swarm, no SSL, etc. Is this something you could help with?

If you feel you can contribute to the "Consolidated 404 issues thread in versions since 2.2.1", that would be awesome!