TCP Failover for Master/Backup scenario

I'm trying to set up a configuration with two SFTP servers: one acts as the master, and when it fails, traffic should fail over to the backup. The backup in this case is just another SFTP server, perhaps with some replication from the master. The backup is never used unless the master fails.

I've tried a WRR (weighted round robin) setup with the backup server's weight set to 0 (see config below). This doesn't seem to work, though. When I disable the master server, connecting through the Traefik route gives a "connect: no route to host" error, as it's still trying to connect to the master server. I don't think it's possible to set up health checks for TCP services, so I don't think that can help me.

Is this even possible?

Static:

[entryPoints]

  [entryPoints.http]
    address = ":80"
  [entryPoints.https]
    address = ":443"
  [entryPoints.sftp]
    address = ":22"

Dynamic:

# Dynamic configuration for test. Managed by Ansible.

[tcp]
  [tcp.routers]
    [tcp.routers.test_at_sftp]
      entryPoints = ["sftp"]
      rule = "HostSNI(`*`)"
      service = "test"

  [tcp.services]
    #-------------------------------------------------------------------------------------------------------------------
    # Weighted Round Robin services
    #-------------------------------------------------------------------------------------------------------------------
    [tcp.services.test]
        [[tcp.services.test.weighted.services]]
        name = "local_sftp"
        weight = 1

        [[tcp.services.test.weighted.services]]
        name = "backup-sftp"
        weight = 0

    [tcp.services.local_sftp]
      [tcp.services.local_sftp.loadBalancer]
        [[tcp.services.local_sftp.loadBalancer.servers]]
          address = "sftp:22"

    [tcp.services.backup-sftp]
      [tcp.services.backup-sftp.loadBalancer]
        [[tcp.services.backup-sftp.loadBalancer.servers]]
          address = "centos-sftp2:2200"

Thanks,
Chris.

Hello,

A weight of 0 means that the server is disabled.

Hi,
I'm trying to build a TCP load balancer as well, which should fail over if one of the backend systems stops working. Did you solve your problem, and if so, how?
Thanks,
Thomas

Hi Thomas,

I don't think it's possible with the current capabilities of Traefik. Certainly, nobody on here has piped up to tell me any different :)

We've left our setup as manual failover for now. But I've been thinking about solutions involving our other systems. We use Zabbix (monitoring), Jenkins (CI and operations jobs) and Ansible (configuration provisioning; this is how we roll out and configure Traefik). I was thinking of linking those together:

Zabbix alert after master fail detected -> webhook call -> Jenkins -> Job that runs Ansible -> Ansible re-configures Traefik TCP config to switch the slave/backup to master.
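
As a rough sketch of what the last step might write out: assuming the Ansible job only swaps the weights in the dynamic config from my first post (I haven't built this yet, so treat it as an illustration rather than a tested config), the weighted service after a failover run could look like this:

```toml
# Assumed outcome of the Ansible failover job. Only the two weights differ
# from the original dynamic configuration; the service names are unchanged.
[tcp.services.test]
  [[tcp.services.test.weighted.services]]
  name = "local_sftp"
  weight = 0   # master disabled, receives no connections

  [[tcp.services.test.weighted.services]]
  name = "backup-sftp"
  weight = 1   # backup now receives all connections
```

Failing back would be the same job run with the weights reversed.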

That is relatively complicated compared to Traefik simply offering failover on the TCP side, similar to what it has on the HTTP side of things.

Hope that helps,
Thanks,
Chris.

Hi Chris,
thank you, that helped a lot, and that's what I expected ;-( We are thinking about creating a "provider" for Traefik that can change the configuration dynamically... hope that works...

Thank you,
best regards
Thomas


@chrisbrookes, you can achieve this failover goal if you take a "router priority" approach, rather than a service weighted round robin approach.

Instead of using weightings with services, create two routers: set priority=2 on the router that routes to the primary host and priority=1 on the router for the failover (all other config can be the same). That way, all traffic gets routed to the primary, and when the primary disappears, traffic gets routed to the secondary. I've got this working with ConsulCatalog for some regional endpoints that can fail over to any service globally if there are no valid services in the same region.
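
Translated to the file-based config from the original post, the idea might look roughly like the sketch below. The router names are made up for illustration, and it assumes a Traefik version whose TCP routers accept a priority option. Note that in my setup the primary router drops out of the dynamic configuration when ConsulCatalog no longer lists a valid service; with a hand-written file the higher-priority router stays in place, so this alone may not fail over unless something removes it.

```toml
[tcp.routers]
  # Hypothetical router names. Both routers use the same entry point and rule.
  [tcp.routers.sftp_primary]
    entryPoints = ["sftp"]
    rule = "HostSNI(`*`)"
    service = "local_sftp"
    priority = 2   # higher priority, so it is matched first while it exists

  [tcp.routers.sftp_failover]
    entryPoints = ["sftp"]
    rule = "HostSNI(`*`)"
    service = "backup-sftp"
    priority = 1   # only matched once the primary router is gone
```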

On the topic of service weightings, I was wondering if I could take the same approach as you did with the weightings, since I suspect it would allow me to have a little more control over the order in which regional failovers could be defined...

For example, suppose a weight below zero meant that a server receives no traffic as long as there are services with a weight above zero. If there were no servers with a weight above zero, the services with the largest negative value would receive traffic instead. I think that would let us support some interesting failover scenarios. But that's beside the point.