I am seeing Traefik v1 get out of sync with Kubernetes, usually about once a week. When this happens, Traefik returns 'Service Unavailable' for some ingresses. Fortunately I am running two replicas of Traefik, and right now only one of the pods is in this state.
I have the following ingress:
$ kubectl -n monitoring get ingress prometheus -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    traefik.ingress.kubernetes.io/frontend-entry-points: http
  creationTimestamp: "2020-01-01T19:58:29Z"
  generation: 1
  name: prometheus
  namespace: monitoring
  resourceVersion: "94371309"
  selfLink: /apis/extensions/v1beta1/namespaces/monitoring/ingresses/prometheus
  uid: 265b224a-0ac7-450c-9cf0-a34cd010b1ec
spec:
  rules:
  - host: prometheus.k8s.lan
    http:
      paths:
      - backend:
          serviceName: prometheus
          servicePort: http
        path: /
status:
  loadBalancer: {}
$ kubectl -n monitoring get endpoints prometheus -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2020-03-22T15:21:11Z"
  creationTimestamp: "2020-01-01T19:57:08Z"
  labels:
    service.kubernetes.io/headless: ""
  name: prometheus
  namespace: monitoring
  resourceVersion: "133667974"
  selfLink: /api/v1/namespaces/monitoring/endpoints/prometheus
  uid: 9ccf0e33-a1ac-447c-ae93-6d2a7162d75f
subsets:
- addresses:
  - ip: 10.112.10.2
    nodeName: k8s-node12
    targetRef:
      kind: Pod
      name: prometheus-5654c5c5df-rcwx6
      namespace: monitoring
      resourceVersion: "133667972"
      uid: 6977a701-b639-4b6c-b1c6-d893a87e1315
  ports:
  - name: http
    port: 9090
    protocol: TCP
And I have two Traefik pods, with these Endpoints:
$ kubectl -n ingress-traefik get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
traefik-ingress-controller-6656b6b56-5mrcj 1/1 Running 0 139m 10.112.6.216 k8s-node06 <none> <none>
traefik-ingress-controller-6656b6b56-8s2tn 1/1 Running 0 142m 10.112.13.40 k8s-node05 <none> <none>
$ kubectl -n ingress-traefik get endpoints traefik-ingress-service -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2020-03-22T13:25:55Z"
  creationTimestamp: "2020-01-24T19:22:56Z"
  name: traefik-ingress-service
  namespace: ingress-traefik
  resourceVersion: "133626548"
  selfLink: /api/v1/namespaces/ingress-traefik/endpoints/traefik-ingress-service
  uid: ca7441cf-6f33-446a-b328-5617e07ac26e
subsets:
- addresses:
  - ip: 10.112.13.40
    nodeName: k8s-node05
    targetRef:
      kind: Pod
      name: traefik-ingress-controller-6656b6b56-8s2tn
      namespace: ingress-traefik
      resourceVersion: "133625315"
      uid: 77fb2877-9591-40d3-834f-1e61a74cd894
  - ip: 10.112.6.216
    nodeName: k8s-node06
    targetRef:
      kind: Pod
      name: traefik-ingress-controller-6656b6b56-5mrcj
      namespace: ingress-traefik
      resourceVersion: "133626546"
      uid: b6b93c7d-ee5d-496c-8ee6-ede116dfd4c1
  ports:
  - name: https
    port: 8443
    protocol: TCP
  - name: http
    port: 8000
    protocol: TCP
Now, if I try to access the ingress through each Traefik pod, I get different results:
# curl -H 'Host: prometheus.k8s.lan' http://10.112.6.216:8000/
<a href="/graph">Found</a>.
# curl -H 'Host: prometheus.k8s.lan' http://10.112.13.40:8000/
Service Unavailable
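Since the deployment runs with --api (admin port 8080), the configuration each pod has actually loaded can also be compared via Traefik v1's /api/providers endpoint, instead of curling each ingress. A minimal sketch, assuming v1's JSON layout for that endpoint (the helper names here are my own):

```python
import json
from urllib.request import urlopen


def kubernetes_hosts(providers):
    """Collect the Host rules from the decoded JSON of a Traefik v1
    pod's /api/providers endpoint (kubernetes provider only)."""
    hosts = set()
    for frontend in providers.get("kubernetes", {}).get("frontends", {}).values():
        for route in frontend.get("routes", {}).values():
            rule = route.get("rule", "")
            if rule.startswith("Host:"):
                hosts.update(rule[len("Host:"):].split(","))
    return hosts


def pod_hosts(pod_ip):
    """Query one pod's admin API (port 8080, enabled by --api)."""
    with urlopen("http://%s:8080/api/providers" % pod_ip) as resp:
        return kubernetes_hosts(json.load(resp))
```

Comparing pod_hosts("10.112.6.216") against pod_hosts("10.112.13.40") should show prometheus.k8s.lan missing from the stale pod's set, which would confirm it is the pod's Kubernetes provider state, not the proxying itself, that is out of date.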
Simply deleting the ingress-traefik pod with IP 10.112.13.40 usually resolves the problem.
$ kubectl -n ingress-traefik get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
traefik-ingress-controller-6656b6b56-5mrcj 1/1 Running 0 145m 10.112.6.216 k8s-node06 <none> <none>
traefik-ingress-controller-6656b6b56-bd9pq 1/1 Running 0 14s 10.112.4.238 k8s-node11 <none> <none>
Both the old pod and the new one now serve the ingress correctly:
# curl -H 'Host: prometheus.k8s.lan' http://10.112.6.216:8000/
<a href="/graph">Found</a>.
# curl -H 'Host: prometheus.k8s.lan' http://10.112.4.238:8000/
<a href="/graph">Found</a>.
My traefik deployment is:
$ kubectl -n ingress-traefik get deploy traefik-ingress-controller -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "50"
  creationTimestamp: "2019-09-23T20:32:27Z"
  generation: 59
  labels:
    k8s-app: traefik-ingress-lb
  name: traefik-ingress-controller
  namespace: ingress-traefik
  resourceVersion: "133626552"
  selfLink: /apis/apps/v1/namespaces/ingress-traefik/deployments/traefik-ingress-controller
  uid: 0154be87-f7ac-4e3f-affe-f4132422a216
spec:
  progressDeadlineSeconds: 2147483647
  replicas: 2
  revisionHistoryLimit: 2147483647
  selector:
    matchLabels:
      k8s-app: traefik-ingress-lb
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: traefik-ingress-lb
      name: traefik-ingress-lb
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - traefik-ingress-lb
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --api
        - --kubernetes
        - --entrypoints=Name:http Address::8000
        - --entrypoints=Name:https Address::8443 TLS
        - --forwardingtimeouts.dialtimeout=5s
        - --kubernetes.namespaces=xx,yy,zz
        env:
        - name: GOGC
          value: "50"
        image: traefik:v1.7.21
        imagePullPolicy: IfNotPresent
        name: traefik-ingress-lb
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        - containerPort: 8443
          name: https
          protocol: TCP
        - containerPort: 8080
          name: admin
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          requests:
            cpu: 250m
            memory: 32Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      nodeSelector:
        ingress-controller: traefik
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsGroup: 8888
        runAsUser: 8888
      serviceAccount: traefik-ingress-controller
      serviceAccountName: traefik-ingress-controller
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 60
I am running this on bare Raspberry Pi boards. They are not as fast as a VM on Intel hardware, but I would still expect this to work. Could Traefik be missing events from the Kubernetes API while it is busy processing another one?
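One way I could try to check that hypothesis is to raise Traefik's log level, so the configuration events each pod receives from the Kubernetes provider show up in kubectl logs and the two pods' logs can be diffed the next time one goes stale. A sketch of the extra container arg (standard v1 flag; DEBUG output is verbose):

```yaml
# Appended to the existing container args in the deployment above.
# At DEBUG level, Traefik v1 logs the configurations it receives from
# its providers, so a stalled Kubernetes watch should become visible.
- --logLevel=DEBUG
```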