Health Check failing with Network Load Balancer on EKS

gkrizek · July 30, 2020, 5:14am

Hello!

I'm having a hard time understanding what's going wrong here. I'm running Traefik (2.2) on Kubernetes (1.17) via the CRD in Amazon EKS, and I'm trying to configure a Network Load Balancer in front of Traefik. The pod is failing health checks in the Target Group, and it seems the Kubernetes Service's HealthCheck port is returning a 503. You can see this when I describe the Traefik K8s Service:

$ kubectl describe svc/traefik -n traefik
Name:                     traefik
Namespace:                traefik
Labels:                   app=traefik
Annotations:
                          service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: Name=traefik-staging,Provisoner=kubernetes
                          service.beta.kubernetes.io/aws-load-balancer-type: nlb
Selector:                 app=traefik
Type:                     LoadBalancer
Port:                     admin  8888/TCP
TargetPort:               8888/TCP
NodePort:                 admin  32457/TCP
Endpoints:                172.17.113.150:8888,172.18.32.210:8888
...
External Traffic Policy:  Local
HealthCheck NodePort:     31769

The HealthCheck is automatically configured to use port 31769. This is also the port the Target Group is using. When I get into my network and curl port 31769 on the K8s node, it returns:

< HTTP/1.1 503 Service Unavailable
< Content-Type: application/json
< X-Content-Type-Options: nosniff
< Date: Thu, 30 Jul 2020 04:54:45 GMT
< Content-Length: 88
<
{
	"service": {
		"namespace": "traefik",
		"name": "traefik"
	},
	"localEndpoints": 0
}

So it seems the health check is returning a 503 and thus failing the Load Balancer health checks. Why is it failing with a 503 and reporting 0 local endpoints? I can see from my K8s Service that there are indeed endpoints in the service. Thank you in advance for the help!

gkrizek · July 30, 2020, 4:37pm

I figured out the issue, and it had nothing to do with Traefik. I was mistaken in thinking that the 503 came from a health check in Traefik when it actually came from a health check served by kube-proxy. For anyone interested in the solution:

Ultimately, this was due to the fact that my EC2 instance's hostname was different from the node name registered in Kubernetes. For whatever reason, kube-proxy does not like that. So I had to set kube-proxy's hostname to the same value as my node's name in the control plane.
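For anyone who wants to check whether they are hitting the same mismatch, here is a minimal sketch. As far as I understand it, kube-proxy only counts an endpoint as local when the endpoint's node name matches the hostname kube-proxy believes it has, which would explain "localEndpoints": 0. The function name hostname_is_registered is my own, and the usage lines assume you have kubectl access to the cluster.

```shell
# Hypothetical helper: succeeds if the given hostname exactly matches
# one of the Kubernetes node names passed as the remaining arguments.
hostname_is_registered() {
  host="$1"
  shift
  for name in "$@"; do
    if [ "$name" = "$host" ]; then
      return 0
    fi
  done
  return 1
}

# Example usage (requires kubectl access, not run here):
#   nodes="$(kubectl get nodes -o jsonpath='{.items[*].metadata.name}')"
#   hostname_is_registered "$(hostname)" $nodes \
#     || echo "hostname does not match any registered node name"
```

Run `hostname` on the EC2 instance itself, since that is the value kube-proxy picks up when no --hostname-override is set.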

This was done by patching the kube-proxy daemon set with:

env:
- name: NODE_NAME
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: spec.nodeName

Then you need to reference that NODE_NAME variable in your --hostname-override flag.

- command:
  - kube-proxy
  - --hostname-override=$(NODE_NAME)
  - --v=2
  - --config=/var/lib/kube-proxy-config/config

Now the health check is passing, and Traefik is working flawlessly behind a Network Load Balancer in EKS.
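To confirm the fix, you can repeat the curl from the first post and look at localEndpoints in the response. A node that runs a Traefik pod should now report at least 1 and answer with 200 instead of 503. Below is a rough sketch that pulls the count out of the JSON body without needing jq. The function name local_endpoints and the NODE_IP variable are placeholders of mine; 31769 is the HealthCheck NodePort from the Service output above and will differ in other clusters.

```shell
# Hypothetical helper: prints the localEndpoints count found in the JSON
# body returned by the Service's HealthCheck NodePort.
local_endpoints() {
  printf '%s' "$1" \
    | tr -d '[:space:]' \
    | sed -n 's/.*"localEndpoints":\([0-9][0-9]*\).*/\1/p'
}

# Example usage (needs network access to the node, not run here):
#   body="$(curl -s "http://$NODE_IP:31769/healthz")"
#   if [ "$(local_endpoints "$body")" -gt 0 ]; then
#     echo "node has local endpoints"
#   fi
```

Nodes without a Traefik pod will keep returning 503 with a count of 0. That is expected with External Traffic Policy: Local, and the load balancer simply leaves those nodes out of rotation.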
