Containers Cannot Reach Each Other

Do you want to request a feature or report a bug?

Bug, likely in my configuration.

What did you do?

Using docker-compose on a DigitalOcean host, I'm setting up a GitLab server that connects to a Registry running in a separate container. I can reach both remotely, but when I test the connection between GitLab and the Registry, it's clear the containers cannot communicate.

I've deliberately disabled inter-container communication ("icc": false) in my daemon.json, but I've also added the necessary containers to the same network. I can also ping the GitLab container from the Registry container, so I seem to have a route...
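A successful ping only shows that ICMP gets through; it says nothing about whether a TCP connection to the Registry port can be opened. As a rough sketch (the function name `probe_tcp` and the command-line handling are illustrative assumptions, not part of my setup), something like the following could be run from inside one container against the other to tell a timeout apart from a refused connection:

```python
import socket
import sys


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Try to open a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets are probably being dropped on the way.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or no route to host.
        return f"error: {exc}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python probe.py <host> <port>
    print(probe_tcp(sys.argv[1], int(sys.argv[2])))
```

A "timeout" result would suggest packets are being dropped (for example by a firewall rule), whereas "refused" would suggest the route works but nothing is listening on that port.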

My docker-compose configuration for Traefik:

$ cat ./docker-compose.yml 
version: "3.7"


networks:
  traefik:
  repositories:
    ipam:
      config:
        - subnet: 172.17.217.192/26


services:
  traefik:
    container_name: traefik
    image: traefik:latest
    restart: always
    hostname: "traefik.${DOMAIN}"
    ports:
      - "80:80"
      - "443:443"
      - "5000:5000"
    networks:
      traefik:
      repositories:
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "./traefik:/config:ro"
      - "/content/certs/letsencrypt/acme.json:/letsencrypt/acme.json:z"
      - "/logs/traefik:/logs"
      - "/etc/localtime:/etc/localtime:ro"
    command: --configFile=/config/traefik.toml
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik.rule=host(`traefik.${DOMAIN}`)"
      - "traefik.http.routers.traefik.service=api@internal"
      - "traefik.http.routers.traefik.tls=true"
      - "traefik.http.routers.traefik.tls.certresolver=letsencrypt"
      - "traefik.http.routers.traefik.entrypoints=websecure"

      # Global redirect
      - "traefik.http.routers.http-catchall.rule=hostregexp(`{any:.+}`)"
      - "traefik.http.routers.http-catchall.entrypoints=web"
      - "traefik.http.routers.http-catchall.service=api@internal"
      - "traefik.http.routers.http-catchall.middlewares=redirect-to-https"
      - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"

      # Providing middlewares
      - "traefik.http.routers.traefik.middlewares=traefikauth"
      - "traefik.http.middlewares.traefikauth.basicauth.users=${TRAEFIK_AUTH}"

And for my repositories:

$ cat ./docker-compose.repositories.yml 
version: "3.7"


services:
  registry:
    container_name: registry
    image: registry:2.7
    restart: always
    hostname: "registry.${DOMAIN}"
    depends_on:
      - traefik
    networks:
      repositories:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.registry.tls=true"
      - "traefik.http.routers.registry.tls.certresolver=letsencrypt"
      - "traefik.http.routers.registry.entrypoints=registry"
      - "traefik.http.routers.registry.rule=host(`registry.${DOMAIN}`)"
      - "traefik.http.routers.registry.service=registry"
      - "traefik.http.services.registry.loadbalancer.server.port=5000"
    volumes:
      - "/content/certs/registry/registry-auth.crt:/certs/gitlab-registry.crt"
      - "/content/repositories/registry:/registry"
      - "/etc/localtime:/etc/localtime:ro"
    environment:
      REGISTRY_LOG_LEVEL: debug
      REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY: /registry
      REGISTRY_STORAGE_DELETE_ENABLED: "true"
      REGISTRY_AUTH_TOKEN_REALM: "https://git.${DOMAIN}/jwt/auth"
      REGISTRY_AUTH_TOKEN_SERVICE: container_registry
      REGISTRY_AUTH_TOKEN_ISSUER: gitlab-issuer
      REGISTRY_AUTH_TOKEN_ROOTCERTBUNDLE: /certs/gitlab-registry.crt


  gitlab:
    container_name: gitlab
    image: gitlab/gitlab-ce:latest
    restart: always
    hostname: "git.${DOMAIN}"
    depends_on:
      - traefik
    ports:
      - "22:22"
    networks:
      repositories:
        ipv4_address: 172.17.217.222
    volumes:
      - "/content/certs/registry/registry-auth.key:/mnt/gitlab-registry.key:ro"
      - "/content/repositories/gitlab:/var/opt/gitlab"
      - "/content/certs/gitlab:/etc/gitlab"
      - "/logs/gitlab:/var/log/gitlab"
      - "/etc/localtime:/etc/localtime:ro"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.git.tls=true"
      - "traefik.http.routers.git.tls.certresolver=letsencrypt"
      - "traefik.http.routers.git.entrypoints=websecure"
      - "traefik.http.routers.git.rule=host(`git.${DOMAIN}`)"
      - "traefik.http.routers.git.service=git"
      - "traefik.http.services.git.loadbalancer.server.port=80"

    environment:
      GITLAB_OMNIBUS_CONFIG: |
        gitlab_rails['gitlab_shell_ssh_port'] = 22

        external_url "https://git.${DOMAIN}"
        nginx['listen_port'] = 80
        nginx['listen_https'] = false
        nginx['http2_enabled'] = false
        nginx['proxy_set_headers'] = {
          "Host" => "$$http_host",
          "X-Real-IP" => "$$remote_addr",
          "X-Forwarded-For" => "$$proxy_add_x_forwarded_for",
          "X-Forwarded-Proto" => "https",
          "X-Forwarded-Ssl" => "on"
        }

        registry['enable'] = false
        registry['internal_key'] = File.read("/mnt/gitlab-registry.key")
        gitlab_rails['registry_enabled'] = true
        gitlab_rails['registry_api_url'] = "https://registry.${DOMAIN}:5000"
        gitlab_rails['registry_issuer'] = "gitlab-issuer"
        gitlab_rails['registry_host'] = "registry.${DOMAIN}"
        gitlab_rails['registry_port'] = "5000"

What did you expect to see?

I expect a 200 response when I visit the Container Registry URL for a GitLab project. For example, this endpoint: https://git.bytecache.io/cryptanalysis/mtp-attacker/container_registry

This is live and public. You can reach this now without authentication. The homepage of this project is https://git.bytecache.io/cryptanalysis/mtp-attacker

The Registry is also live and publicly reachable here: https://registry.bytecache.io:5000/v2/. It requires authentication, which I provide to GitLab with a self-signed certificate, but you can still reach it to verify networking.

What did you see instead?

A 500 response from https://git.bytecache.io/cryptanalysis/mtp-attacker/container_registry

The GitLab logs make it explicit that this is a timeout rather than an authentication problem.

==> /var/log/gitlab/nginx/gitlab_access.log <==
172.17.217.196 - - [07/Jun/2020:11:01:00 -0700] "GET /cryptanalysis/mtp-attacker/container_registry HTTP/1.1" 500 2926 "https://git.bytecache.io/cryptanalysis/mtp-attacker" "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0" -

...

==> /var/log/gitlab/gitlab-rails/production.log <==
Completed 500 Internal Server Error in 31588ms (ActiveRecord: 30.9ms | Elasticsearch: 0.0ms | Allocations: 43855)
  
Faraday::TimeoutError (Failed to open TCP connection to registry.bytecache.io:5000 (Connection timed out - connect(2) for "registry.bytecache.io" port 5000)):
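
Since the error names registry.bytecache.io:5000, one thing that may be worth checking from inside the GitLab container is which address that name resolves to. If it is the host's public address rather than an address on the repositories subnet, the connection has to leave the container network and come back in through the published port. A minimal sketch of such a check (the helper names `resolved_addresses` and `on_subnet` are made up for illustration):

```python
import ipaddress
import socket
import sys
from typing import List


def resolved_addresses(host: str, port: int) -> List[str]:
    """Return the distinct IP addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # info[4] is the socket address tuple; its first element is the IP string.
    return sorted({info[4][0].split("%")[0] for info in infos})


def on_subnet(address: str, cidr: str) -> bool:
    """True if address lies inside the network given in CIDR notation."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python resolve.py <host> <port> <cidr>
    for addr in resolved_addresses(sys.argv[1], int(sys.argv[2])):
        print(addr, "on subnet" if on_subnet(addr, sys.argv[3]) else "outside subnet")
```

With the values from this report, the subnet argument would be 172.17.217.192/26.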

Output of traefik version: (What version of Traefik are you using?)

$ sudo docker run traefik version
Version:      2.2.1
Codename:     chevrotin
Go version:   go1.14.2
Built:        2020-04-29T18:02:09Z
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, ...)?

Docker daemon configuration:

$ cat /etc/docker/daemon.json 
{
    "storage-driver": "overlay2",
    "log-driver": "json-file",
    "log-opts": {
      "max-size": "25m",
      "max-file": "3"
    },

    "bip": "172.17.217.1/28",

    "icc": false,
    "live-restore": true,
    "userland-proxy": false,
    "no-new-privileges": true,

    "tls": true,
    "tlsverify": true,
    "tlscacert": "/certs/docker/ca.pem",
    "tlscert": "/certs/docker/server-cert.pem",
    "tlskey": "/certs/docker/server-key.pem",
    
    "authorization-plugins": ["openpolicyagent/opa-docker-authz-v2:0.6"]
}
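
As a sanity check on the addressing, the bridge range set by "bip" here, the repositories subnet from the compose file, and the static address assigned to GitLab can be compared with Python's ipaddress module. This is only a sketch; the function name `check_layout` is an illustrative assumption:

```python
import ipaddress


def check_layout(bridge_cidr: str, subnet_cidr: str, static_ip: str) -> dict:
    """Compare the default bridge range, a compose subnet and a static address."""
    # "bip" is written as a host address with a prefix, so take its network.
    bridge = ipaddress.ip_interface(bridge_cidr).network
    subnet = ipaddress.ip_network(subnet_cidr)
    return {
        "overlap": bridge.overlaps(subnet),
        "static_ip_in_subnet": ipaddress.ip_address(static_ip) in subnet,
    }


if __name__ == "__main__":
    # Values copied from daemon.json and the compose files in this report.
    print(check_layout("172.17.217.1/28", "172.17.217.192/26", "172.17.217.222"))
```

For the values posted here, the two ranges do not overlap and the static address falls inside the repositories subnet, so the addressing itself looks consistent.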

Traefik configuration:

$ cat ./traefik/traefik.toml 
[global]
  checkNewVersion = true

[entryPoints]
  [entryPoints.web]
    address = ":80"
  [entryPoints.websecure]
    address = ":443"
  [entryPoints.registry]
    address = ":5000"

[providers]
  [providers.docker]
    exposedByDefault = false
  [providers.file]
    filename = "/config/dynamic.toml"
    watch = true

[api]
  insecure = false
  dashboard = true
  debug = true

[log]
  level = "DEBUG"
  filePath = "logs/traefik.log"
  format = "json"

[accessLog]
  filePath = "logs/access.log"
  format = "json"

[certificatesResolvers]
  [certificatesResolvers.letsencrypt.acme]
    email = "sentry@bytecache.io"
    caServer = "https://acme-v02.api.letsencrypt.org/directory"
    storage = "/letsencrypt/acme.json"
    [certificatesResolvers.letsencrypt.acme.tlsChallenge]

$ cat ./traefik/dynamic.toml 
[tls]
  [tls.options]
    [tls.options.default]
      minVersion = "VersionTLS12"
      sniStrict = true
      cipherSuites = [
        # TLS 1.2 cipher suites.
        "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",

        # TLS 1.3 cipher suites.
        "TLS_CHACHA20_POLY1305_SHA256",
        "TLS_AES_256_GCM_SHA384",
        "TLS_AES_128_GCM_SHA256"]

If applicable, please paste the log output in DEBUG level (--log.level=DEBUG switch)

No errors in this log.

Should this be working? Or does it indicate an issue with GitLab?

This does not look related to Traefik. I have not read your whole post, but from the first few paragraphs it appears that your non-Traefik containers cannot talk to each other. Traefik has no influence on that. You might get better results in Docker-specific communities.