Ingress Controller Service trying to add duplicate rules to AWS Security Group

Hi.

I raised the following issue, but the traefiker bot closed it as not being a bug. I believe it definitely is one.

I'll repeat the details of the issue below:

I have been testing DR scenarios for a Kubernetes cluster, which involved tearing it down completely and recreating it.
After doing that, the Traefik Kubernetes Service would get stuck in the Pending state.

Describing the Traefik Kubernetes Service resource showed events like so:

Events:
  Type     Reason                      Age                  From                Message
  ----     ------                      ----                 ----                -------
  Warning  CreatingLoadBalancerFailed  13m                  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: Error adding tags after creating Load Balancer: "ValidationError: Only one resource can be tagged at a time\n\tstatus code: 400, request id: ac350d6d-1267-40e5-8e53-06f285a6cc85"
  Warning  CreatingLoadBalancerFailed  13m                  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: 5f6c1198-e83c-4120-8399-bff08b36bc37"
  Warning  CreatingLoadBalancerFailed  13m                  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: 9a4b6308-7ac1-4731-bd88-b5ddf3a687db"
  Warning  CreatingLoadBalancerFailed  13m                  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: b672c5ed-8724-4e0c-be7e-ab90faebdc38"
  Warning  CreatingLoadBalancerFailed  12m                  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: 7a8b0541-63c4-4c5b-abde-27a953dee98d"
  Warning  CreatingLoadBalancerFailed  11m                  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: 74cf3b4f-3cce-4c77-8283-b3f007b62cc6"
  Warning  CreatingLoadBalancerFailed  8m30s                service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: a7ca4a3a-2daa-4511-92eb-46d0a2d8a74b"
  Normal   EnsuringLoadBalancer        3m30s (x8 over 13m)  service-controller  Ensuring load balancer
  Warning  CreatingLoadBalancerFailed  3m29s                service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service traefik/traefik: error authorizing security group ingress: "InvalidParameterValue: The same permission must not appear multiple times\n\tstatus code: 400, request id: 8b359cc1-431e-49f5-99e8-740b0ec20263"

Looking at the details of one of the failing requests, its parameters were:

    "requestParameters": {
        "groupId": "sg-05d0100770a4040d1",
        "ipPermissions": {
            "items": [
                {
                    "ipProtocol": "tcp",
                    "fromPort": 31363,
                    "toPort": 31363,
                    "groups": {},
                    "ipRanges": {
                        "items": [
                            {
                                "cidrIp": "10.200.0.0/16",
                                "description": "kubernetes.io/rule/nlb/health=a94ac80b3f0a011e9adbb06a83b63b7b"
                            }
                        ]
                    },
                    "ipv6Ranges": {},
                    "prefixListIds": {}
                },
                {
                    "ipProtocol": "tcp",
                    "fromPort": 31363,
                    "toPort": 31363,
                    "groups": {},
                    "ipRanges": {
                        "items": [
                            {
                                "cidrIp": "10.192.0.0/16",
                                "description": "kubernetes.io/rule/nlb/health=a94ac80b3f0a011e9adbb06a83b63b7b"
                            }
                        ]
                    },
                    "ipv6Ranges": {},
                    "prefixListIds": {}
                },
                {
                    "ipProtocol": "tcp",
                    "fromPort": 31363,
                    "toPort": 31363,
                    "groups": {},
                    "ipRanges": {
                        "items": [
                            {
                                "cidrIp": "10.224.0.0/16",
                                "description": "kubernetes.io/rule/nlb/health=a94ac80b3f0a011e9adbb06a83b63b7b"
                            }
                        ]
                    },
                    "ipv6Ranges": {},
                    "prefixListIds": {}
                },
                {
                    "ipProtocol": "tcp",
                    "fromPort": 31363,
                    "toPort": 31363,
                    "groups": {},
                    "ipRanges": {
                        "items": [
                            {
                                "cidrIp": "10.224.0.0/16",
                                "description": "kubernetes.io/rule/nlb/health=a94ac80b3f0a011e9adbb06a83b63b7b"
                            }
                        ]
                    },
                    "ipv6Ranges": {},
                    "prefixListIds": {}
                }
            ]
        }
    },

As you can see, the request tries to add two identical rules for the 10.224.0.0/16 CIDR block, which is the one I was using for the testing cluster.
After I gave up and tore the cluster down again, I thought of describing the CIDR blocks associated with the VPC, and I saw:
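
For anyone who wants to check their own request for the same problem without reading the whole payload, the permissions can be reduced to (protocol, port range, CIDR) tuples and counted. This is only a rough sketch: `find_duplicate_permissions` is a made-up helper name, and it assumes the `requestParameters` shape shown in the request above.

```python
from collections import Counter


def find_duplicate_permissions(request_parameters):
    """Return the (protocol, from_port, to_port, cidr) tuples that occur more than once."""
    seen = Counter()
    for perm in request_parameters["ipPermissions"]["items"]:
        # "ipRanges" may be an empty object, so fall back to an empty list.
        for ip_range in perm.get("ipRanges", {}).get("items", []):
            key = (perm["ipProtocol"], perm["fromPort"], perm["toPort"], ip_range["cidrIp"])
            seen[key] += 1
    return [key for key, count in seen.items() if count > 1]


def make_permission(cidr):
    """Build one permission entry like those in the request above (descriptions omitted)."""
    return {
        "ipProtocol": "tcp",
        "fromPort": 31363,
        "toPort": 31363,
        "ipRanges": {"items": [{"cidrIp": cidr}]},
    }


# Trimmed copy of the request parameters shown above.
params = {
    "groupId": "sg-05d0100770a4040d1",
    "ipPermissions": {
        "items": [
            make_permission("10.200.0.0/16"),
            make_permission("10.192.0.0/16"),
            make_permission("10.224.0.0/16"),
            make_permission("10.224.0.0/16"),
        ]
    },
}

print(find_duplicate_permissions(params))
# [('tcp', 31363, 31363, '10.224.0.0/16')]
```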

$ aws ec2 describe-vpcs --query "Vpcs[].CidrBlockAssociationSet[]" --vpc-ids $VPC_ID
[
    {
        "AssociationId": "vpc-cidr-assoc-2f83dd46",
        "CidrBlock": "10.200.0.0/16",
        "CidrBlockState": {
            "State": "associated"
        }
    },
    {
        "AssociationId": "vpc-cidr-assoc-0fb443f44a5bb8fdc",
        "CidrBlock": "10.192.0.0/16",
        "CidrBlockState": {
            "State": "associated"
        }
    },
    {
        "AssociationId": "vpc-cidr-assoc-0ebb44b68a8aeb0db",
        "CidrBlock": "10.224.0.0/16",
        "CidrBlockState": {
            "State": "disassociated"
        }
    },
    {
        "AssociationId": "vpc-cidr-assoc-0872f424aa32e3389",
        "CidrBlock": "10.224.0.0/16",
        "CidrBlockState": {
            "State": "disassociated"
        }
    }
]

The 10.224.0.0/16 CIDR block appears twice here.
Both are disassociated because I tore the cluster down, but I've checked again and it is also possible to have one associated entry alongside multiple disassociated ones.
Over time (an hour or more) these disassociated CIDR blocks eventually go away, at which point a Traefik install works fine.

This is just a guess, but it seems that the code Traefik uses to build the list of CIDRs for the Security Group ingress rules in AWS should filter on associated blocks only.
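
To illustrate the kind of filtering I mean, here is a minimal sketch. It is not taken from any project's code: `associated_cidrs` is a hypothetical helper, and it assumes the input has the shape of the `CidrBlockAssociationSet` output above. It keeps only blocks in the `associated` state and drops repeats.

```python
def associated_cidrs(associations):
    """Return unique CIDR blocks whose state is 'associated', in first-seen order."""
    unique = []
    for assoc in associations:
        if assoc["CidrBlockState"]["State"] != "associated":
            continue  # skip disassociated (or otherwise inactive) entries
        if assoc["CidrBlock"] not in unique:
            unique.append(assoc["CidrBlock"])
    return unique


# The association set from the describe-vpcs output above.
vpc_associations = [
    {"AssociationId": "vpc-cidr-assoc-2f83dd46", "CidrBlock": "10.200.0.0/16",
     "CidrBlockState": {"State": "associated"}},
    {"AssociationId": "vpc-cidr-assoc-0fb443f44a5bb8fdc", "CidrBlock": "10.192.0.0/16",
     "CidrBlockState": {"State": "associated"}},
    {"AssociationId": "vpc-cidr-assoc-0ebb44b68a8aeb0db", "CidrBlock": "10.224.0.0/16",
     "CidrBlockState": {"State": "disassociated"}},
    {"AssociationId": "vpc-cidr-assoc-0872f424aa32e3389", "CidrBlock": "10.224.0.0/16",
     "CidrBlockState": {"State": "disassociated"}},
]

print(associated_cidrs(vpc_associations))
# ['10.200.0.0/16', '10.192.0.0/16']
```

Deduplicating as well as filtering would also cover the case of one associated entry plus stale disassociated ones for the same block.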

Hello @jim.barber-he,

Traefik does not manage or interact with LoadBalancer services.

Those are implemented and managed by your cloud provider (AWS).

Traefik also does not interact with the AWS infrastructure beyond connecting to the Kubernetes API service.

That is why the GitHub issue was closed.

Oh I see. Thanks Daniel.
So it's code within Kubernetes that is configuring these ingress rules, and therefore the problem I encountered could potentially happen with any ingress controller.
I'll investigate that path.

@jim.barber-he,

No worries. Feel free to report your findings here, as other users may encounter the same issue.

Thanks!