Hi.
I raised the following issue, but the traefiker bot closed it as not a bug. It definitely is a bug.
I'll repeat the details of the issue below:
I have been testing disaster recovery (DR) scenarios for a Kubernetes cluster, which involved tearing the cluster down completely and recreating it.
After doing that, the Traefik Kubernetes Service would get stuck in the pending state.
Describing the Traefik Kubernetes Service resource showed events like so:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning CreatingLoadBalancerFailed 13m service-controller Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: Error adding tags after creating Load Balancer: "ValidationError: Only one resource can be tagged at a time\n\tstatus code: 400, request id: ac350d6d-1267-40e5-8e53-06f285a6cc85"
Warning CreatingLoadBalancerFailed 13m service-controller Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: 5f6c1198-e83c-4120-8399-bff08b36bc37"
Warning CreatingLoadBalancerFailed 13m service-controller Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: 9a4b6308-7ac1-4731-bd88-b5ddf3a687db"
Warning CreatingLoadBalancerFailed 13m service-controller Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: b672c5ed-8724-4e0c-be7e-ab90faebdc38"
Warning CreatingLoadBalancerFailed 12m service-controller Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: 7a8b0541-63c4-4c5b-abde-27a953dee98d"
Warning CreatingLoadBalancerFailed 11m service-controller Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: 74cf3b4f-3cce-4c77-8283-b3f007b62cc6"
Warning CreatingLoadBalancerFailed 8m30s service-controller Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: a7ca4a3a-2daa-4511-92eb-46d0a2d8a74b"
Normal EnsuringLoadBalancer 3m30s (x8 over 13m) service-controller Ensuring load balancer
Warning CreatingLoadBalancerFailed 3m29s service-controller Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: 8b359cc1-431e-49f5-99e8-740b0ec20263"
When I looked at the details of one of the requests, its parameters looked like this:
"requestParameters": {
"groupId": "sg-05d0100770a4040d1",
"ipPermissions": {
"items": [
{
"ipProtocol": "tcp",
"fromPort": 31363,
"toPort": 31363,
"groups": {},
"ipRanges": {
"items": [
{
"cidrIp": "10.200.0.0/16",
"description": "kubernetes.io/rule/nlb/health=a94ac80b3f0a011e9adbb06a83b63b7b"
}
]
},
"ipv6Ranges": {},
"prefixListIds": {}
},
{
"ipProtocol": "tcp",
"fromPort": 31363,
"toPort": 31363,
"groups": {},
"ipRanges": {
"items": [
{
"cidrIp": "10.192.0.0/16",
"description": "kubernetes.io/rule/nlb/health=a94ac80b3f0a011e9adbb06a83b63b7b"
}
]
},
"ipv6Ranges": {},
"prefixListIds": {}
},
{
"ipProtocol": "tcp",
"fromPort": 31363,
"toPort": 31363,
"groups": {},
"ipRanges": {
"items": [
{
"cidrIp": "10.224.0.0/16",
"description": "kubernetes.io/rule/nlb/health=a94ac80b3f0a011e9adbb06a83b63b7b"
}
]
},
"ipv6Ranges": {},
"prefixListIds": {}
},
{
"ipProtocol": "tcp",
"fromPort": 31363,
"toPort": 31363,
"groups": {},
"ipRanges": {
"items": [
{
"cidrIp": "10.224.0.0/16",
"description": "kubernetes.io/rule/nlb/health=a94ac80b3f0a011e9adbb06a83b63b7b"
}
]
},
"ipv6Ranges": {},
"prefixListIds": {}
}
]
}
},
As you can see, the request tries to add two identical rules for the 10.224.0.0/16 CIDR block, which is the one I was using for the testing cluster.
After I gave up and tore the cluster down again, I thought of describing the CIDR block associations for the VPC, and I saw:
$ aws ec2 describe-vpcs --query "Vpcs[].CidrBlockAssociationSet[]" --vpc-ids $VPC_ID
[
{
"AssociationId": "vpc-cidr-assoc-2f83dd46",
"CidrBlock": "10.200.0.0/16",
"CidrBlockState": {
"State": "associated"
}
},
{
"AssociationId": "vpc-cidr-assoc-0fb443f44a5bb8fdc",
"CidrBlock": "10.192.0.0/16",
"CidrBlockState": {
"State": "associated"
}
},
{
"AssociationId": "vpc-cidr-assoc-0ebb44b68a8aeb0db",
"CidrBlock": "10.224.0.0/16",
"CidrBlockState": {
"State": "disassociated"
}
},
{
"AssociationId": "vpc-cidr-assoc-0872f424aa32e3389",
"CidrBlock": "10.224.0.0/16",
"CidrBlockState": {
"State": "disassociated"
}
}
]
The 10.224.0.0/16 CIDR block appears twice here.
Both entries are disassociated because I tore the cluster down, but I've checked again and it is also possible to have one associated entry alongside multiple disassociated ones.
Over time (an hour or more), these disassociated CIDR blocks eventually go away, at which point a Traefik install works fine.
This is just a guess, but it seems that the code Traefik uses to build the list of CIDRs to add to the Security Group ingress rules in AWS should filter on associated blocks only.
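To illustrate the kind of filtering I mean, here is a minimal sketch in Python. It is not the actual code that builds the ingress rules, which I have not looked at. The function name `associated_cidrs` is hypothetical, and the input is simply the `CidrBlockAssociationSet` output from the `describe-vpcs` call above. It keeps only blocks whose state is `associated` and drops repeats.

```python
def associated_cidrs(association_set):
    """Return unique CIDR blocks whose state is "associated", in original order."""
    seen = set()
    result = []
    for assoc in association_set:
        # Skip stale entries left behind by earlier teardowns.
        if assoc["CidrBlockState"]["State"] != "associated":
            continue
        cidr = assoc["CidrBlock"]
        if cidr not in seen:
            seen.add(cidr)
            result.append(cidr)
    return result


# The association set as returned by describe-vpcs above.
associations = [
    {"AssociationId": "vpc-cidr-assoc-2f83dd46", "CidrBlock": "10.200.0.0/16",
     "CidrBlockState": {"State": "associated"}},
    {"AssociationId": "vpc-cidr-assoc-0fb443f44a5bb8fdc", "CidrBlock": "10.192.0.0/16",
     "CidrBlockState": {"State": "associated"}},
    {"AssociationId": "vpc-cidr-assoc-0ebb44b68a8aeb0db", "CidrBlock": "10.224.0.0/16",
     "CidrBlockState": {"State": "disassociated"}},
    {"AssociationId": "vpc-cidr-assoc-0872f424aa32e3389", "CidrBlock": "10.224.0.0/16",
     "CidrBlockState": {"State": "disassociated"}},
]

print(associated_cidrs(associations))
```

With the output above, this leaves only 10.200.0.0/16 and 10.192.0.0/16. In the case where one 10.224.0.0/16 entry is associated and the others are disassociated, the block would be included exactly once, so the security group request would no longer contain the same permission twice.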