A link to my comment on the GitHub issue board: https://github.com/containous/traefik/issues/2986#issuecomment-506820573
I have two servers configured for my backend, one with a very large weight (performs better), and another with weight of 1 (slower, used as a backup if main server fails). If the server with the high weight fails a health check and then recovers, the wrr load balancer treats both servers like they have a weight of 1, until I manually change the value of the weight in EITHER server. Once one of the server’s weights is changed, the server’s load balancing returns to normal.
I believe I found the cause of the problem in Traefik’s source code. In the file pkg/healthcheck/healthcheck.go, when the health check detects that a failed server is up again it runs the following code:
if err := checkHealth(disableURL, backend); err == nil {
log.Warnf("Health check up: Returning to server list. Backend: %q URL: %q", backend.name, disableURL.String())
if err = backend.LB.UpsertServer(disableURL, roundrobin.Weight(1)); err != nil {
log.Error(err)
}
// FIXME serverUpMetricValue = 1
} else {
I think the issue specifically lies with if err = backend.LB.UpsertServer(disableURL, roundrobin.Weight(1)); err != nil {
because instead of passing in the previously assigned weight from the server, it is passing in a hard-coded roundrobin.Weight(1)
. I’ve been trying to dive deeper into traefik’s source to see if there’s a way to obtain the server’s previous weight in the function that contains the above code, but I’ve had no luck so far since I’m very unfamiliar with traefik’s code.
My question for anyone with good understanding of traefik’s source, is my assumption above correct? If so, how are server weights stored internally and how can it be obtained within the checkBackend
method?