Traefik + Docker + LetsEncrypt | Migration v1.7 => v2.0 | epic fail!

Dear Traefik Team,

I deploy Docker with Wordpress, Matomo and Nextcloud Containers on a Server. The reverse-proxy I used until now was "nginx-proxy" from jwilder. But I wanted to get away from the nginx-proxy because it has some weird config problems plus what happens if jwilder abandons his Project?

So I stumbled upon Traefik v1.7 and after two weeks I got everything working super fine and was very happy!

Then I saw that v2 is just around the corner. So I thought "hey, give it a try, what could possibly go wrong?". Well, in one word: Everything!

Look, I am aware that this is your baby and I mustn't complain in any way, since you give away such a fantastic product for free. But nevertheless I don't get it... I can't understand why you break a superior product, such a great piece of software, beyond repair..?

One thing first of all: Why are you using such obscure apostrophes? " ' " this one is the good one, right? That is at least the one that I used in every single Config until now. Why do you use " ` " that one for traefik v2? That alone caused one day of major frustration until I looked at it very, very closely and solved the issue.

I don't want to offend anyone, but sorry, there is just no way to sugarcoat this: Your documentation is pretty useless for a person completely new to Traefik. It is more like a reference for very experienced folks who already have a vast knowledge of Traefik. But for someone who knows nothing about it, it is impossible - well, at least for me - to get a Config out of it.
Together with the v1.7 Config, all the Youtube Tutorials about Traefik and a lot of research in Blog-Posts by people who got this running, I finally managed to understand how this all comes together in v1.7. But the v2.0 Documentation? Alone? No way to get a working Config out of this one. Some examples use "toml", some prefer "label"; the ones with toml lack the label sections and vice versa. None of them tell "the whole story". And if I mix things in the Configs, they break. This is so extremely frustrating...

See, here are my v1.7 Configs and some docker-compose.yml I use for spinning up the Containers:

/root/docker/traefik-17/acme/acme.json (touched and chmodded 600)
/root/docker/traefik-17/traefik.toml

##START
logLevel = "INFO" #DEBUG, INFO, WARN, ERROR, FATAL, PANIC
InsecureSkipVerify = true
sendAnonymousUsage = true
defaultEntryPoints = ["http", "https"]

# WEB interface of Traefik - Dashboard
[api]
  entryPoint = "dashboard"
  dashboard = true

# Entrypoints HTTP / Force HTTPS + Auth Dashboard
[entryPoints]
  [entryPoints.http]
    address = ":80"
    [entryPoints.http.redirect]
      entryPoint = "https"

  [entryPoints.https]
    address = ":443"

  [entryPoints.https.tls]

  [entryPoints.dashboard]
    address = ":8080"
    [entryPoints.dashboard.auth.basic]
# Use the output of "cat ~/docker/traefik-17/.htpasswd"
      users = ["some-user:$2y$05$N4BQ.qAVsUn8/f6Wct/ZNuBbiIlT9Z82GA3SkUXmRZKyw7PCAY4P6"]

# Let's Encrypt BASIC Settings
# https://docs.traefik.io/configuration/acme/
[acme]
  acmeLogging = true
  email = "bitterbolt@posteo.de"
  storage = "/etc/traefik/acme/acme.json"

# Let's Encrypt STAGING server for testing. COMMENT OUT when we are ready for the real deal :)
#caServer = "https://acme-staging-v02.api.letsencrypt.org/directory"

onHostRule = true

# Let's Encrypt >>TLS<< Challenge
[acme.tlsChallenge]
  entryPoint = "https"

# Connection to docker host system (docker.sock)
[docker]
  endpoint = "unix:///var/run/docker.sock"
  watch = true
  exposedByDefault = false
##EOF

/root/docker/traefik-17/docker-compose.yml

##START
version: '3'

services:
 traefik:
    image: traefik:1.7
    container_name: traefik
    restart: always
    networks:
      - web
    ports:
      - "80:80"
      - "443:443"
    labels:
      - "traefik.enable=true"
      - "traefik.frontend.rule=Host:traefik.v22018083922571964.megasrv.de"
      - "traefik.backend=traefik"
      - "traefik.port=8080"
      - "traefik.docker.network=web"
      - "traefik.frontend.headers.STSSeconds=315360000"
      - "traefik.frontend.headers.browserXSSFilter=true"
      - "traefik.frontend.headers.contentTypeNosniff=true"
      - "traefik.frontend.headers.forceSTSHeader=true"
      - "traefik.frontend.headers.STSIncludeSubdomains=true"
      - "traefik.frontend.headers.STSPreload=true"
      - "traefik.frontend.headers.frameDeny=true"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /root/docker/traefik-17:/etc/traefik
networks:
  web:
    external: true
##EOF

/root/docker/maria104master/docker-compose.yml

##Start
version: '3'
services:

 maria104master:
  image: mariadb:10.4
  restart: always
  container_name: mariadb
  networks:
    - intern
  environment:
   MYSQL_ROOT_PASSWORD: 'some-pw'
  volumes:
   - /var/xp.maria104masterdata:/var/lib/mysql
   - /root/docker/maria104master/conf:/etc/mysql/mariadb.conf.d
  ports:
   - "127.0.0.1:3306:3306"

networks:
  intern:
    external: true
##EOF

/var/www/wordpress/docker-compose.yml

##Start
version: '3'
services:

 wordpress:
  image: wordpress-php73-apache:latest
  container_name: wordpress
  restart: always
  networks:
   - intern
   - web
  ports:
   - 80
  environment:
   WORDPRESS_DB_NAME: some-database
   WORDPRESS_DB_USER: some-username
   WORDPRESS_DB_PASSWORD: 'some-pw'
   WORDPRESS_TABLE_PREFIX: some-prefix_
   WORDPRESS_DB_HOST: maria104master:3306
  labels:
   - "traefik.enable=true"
   - "traefik.backend=wordpress"
   - "traefik.frontend.rule=Host:wordpress.xp-server.de"  
   - "traefik.docker.network=web"
   - "traefik.frontend.headers.STSSeconds=315360000"
   - "traefik.frontend.headers.browserXSSFilter=true"
   - "traefik.frontend.headers.contentTypeNosniff=true"
   - "traefik.frontend.headers.forceSTSHeader=true"
   - "traefik.frontend.headers.STSIncludeSubdomains=true"
   - "traefik.frontend.headers.STSPreload=true"
   - "traefik.frontend.headers.frameDeny=true"
  volumes:
   - ./wpdata:/var/www/html
  external_links:
   - maria104master

networks:
  intern:
    external: true
  web:
    external: true
##EOF

After firing those up, everything works wonderfully! The ACME challenge works, the http-https redirect works, and the whole reverse-proxy-thing works fine as well.
In fact, this is the fastest running setup I ever had. Traefik seems to make something better than the nginx-proxy, and I was very glad with all applications running smoothly.

So, I migrated this to v2.0. Well. Tried.
I can only post the Traefik-Config, since that one doesn't even work on its own, so I can't test the other docker-compose.yml files yet...

/root/docker/traefik-20/acme/acme.json (touched and chmodded 600)
/root/docker/traefik-20/traefik.toml

[global]
  checkNewVersion = true
  sendAnonymousUsage = true

[serversTransport]
  insecureSkipVerify = true

[entryPoints]
  [entryPoints.web]
    address = ":80"

  [entryPoints.web-secure]
    address = ":443"

#  [entryPoints.dashboard]
#    address = ":8080"
#    [entryPoints.dashboard.auth.basic]
#      users = ["traff:$2y$05$N4BQ.qAVsUn8/f6Wct/ZNuBbpIlT9Z82GA3SkUXmRZKyw7PCAY4P6"]
# "Port already in use" error if used...

[certificatesResolvers.letsencrypt.acme]
#  acmeLogging = true
  email = "bitterbolt@posteo.de"
  storage = "/etc/traefik/acme/acme.json"
  caServer = "https://acme-staging-v02.api.letsencrypt.org/directory"

  [certificatesResolvers.letsencrypt.acme.tlsChallenge]

[providers]
  [providers.docker]
    watch = true
    endpoint = "unix:///var/run/docker.sock"
    exposedByDefault = false
#    useBindPortIP=true
#    network = "web"
    swarmMode = false

[api]
  dashboard = true
  debug = true

[log]
  level = "DEBUG"

/root/docker/traefik-20/docker-compose.yml

version: '3'

services:
 traefik:
    image: traefik:2.0
    container_name: traefik
#    restart: always
    networks:
      - web
    ports:
      - "80:80"
      - "443:443"
#      - "8080:8080"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.traefik.rule=Host(`traefik.v22018083922571964.megasrv.de`)"
      - "traefik.http.routers.traefik.entrypoints=web-secure"
      - "traefik.http.routers.whoami.tls.certresolver=letsencrypt"
#      - "traefik.backend=traefik"
      - "traefik.port=8080"
      - "traefik.docker.network=web"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /root/docker/traefik-20:/etc/traefik
networks:
  web:
    external: true

So when I fire the docker-compose up, this is what I get as response:

- level=warning msg="Error checking new version: BaseURL must have a trailing slash, but \"https://update.traefik.io\" does not"

- level=info msg="Starting provider *acme.Provider {\"email\":\"bitterbolt@posteo.de\",\"caServer\":\"https://acme-staging-v02.api.letsencrypt.org/directory\",\"storage\":\"/etc/traefik/acme/acme.json\",\"keyType\":\"RSA4096\",\"tlsChallenge\":{},\"ResolverName\":\"letsencrypt\",\"store\":{},\"ChallengeStore\":{}}"
- level=info msg="Testing certificate renew..." providerName=letsencrypt.acme
- level=debug msg="Configuration received from provider letsencrypt.acme: {\"http\":{},\"tls\":{}}" providerName=letsencrypt.acme
- level=debug msg="No default certificate, generating one"

= So acme should work? But it doesn't. Traefik self-signs a certificate for traefik.v22018083922571964.megasrv.de. When I open it in the Browser, Traefik gives me a "404 page not found". The Log says "server.go:3055: http: TLS handshake error from 84.183.65.150:56868: remote error: tls: bad certificate" the moment I open the URL.

= A moment later the acme kicks in, but with a lot of errors. It seems the Host(traefik.v22018083922571964.megasrv.de)-Rule is ignored and somehow replaced with rule="Host(traefik-traefik-20)", which Let's Encrypt declines because of course this is not a valid Domain Name. But I haven't set this rule anywhere in the config:

Try to challenge certificate for domain [traefik-traefik-20] founded in HostSNI rule" providerName=letsencrypt.acme routerName=whoami rule="Host(`traefik-traefik-20`)"
level=debug msg="Looking for provided certificate(s) to validate [\"traefik-traefik-20\"]..." rule="Host(`traefik-traefik-20`)" providerName=letsencrypt.acme routerName=whoami
level=debug msg="Domains [\"traefik-traefik-20\"] need ACME certificates generation for domains \"traefik-traefik-20\"." routerName=whoami rule="Host(`traefik-traefik-20`)" providerName=letsencrypt.acme
level=debug msg="Loading ACME certificates [traefik-traefik-20]..." providerName=letsencrypt.acme routerName=whoami rule="Host(`traefik-traefik-20`)"
level=debug msg="Building ACME client..." providerName=letsencrypt.acme
level=debug msg="https://acme-staging-v02.api.letsencrypt.org/directory" providerName=letsencrypt.acme
level=debug msg="Using TLS Challenge provider." providerName=letsencrypt.acme
level=debug msg="legolog: [INFO] [traefik-traefik-20] acme: Obtaining bundled SAN certificate"
level=error msg="Unable to obtain ACME certificate for domains \"traefik-traefik-20\": unable to generate a certificate for the domains [traefik-traefik-20]: acme: error: 400 :: POST :: https://acme-staging-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:rejectedIdentifier :: Error creating new order :: Cannot issue for \"traefik-traefik-20\": DNS name does not have enough labels, url: " routerName=whoami rule="Host(`traefik-traefik-20`)" providerName=letsencrypt.acme

At the end of the log, Traefik sends statistics data and throws another error:

level=info msg="Anonymous stats sent to https://collect.traefik.io/9vxmmkcdmalbdi635d4jgc5p5rx0h7h8: {\"global\":{\"checkNewVersion\":true,\"sendAnonymousUsage\":true},\"serversTransport\":{\"insecureSkipVerify\":true,\"maxIdleConnsPerHost\":200},\"entryPoints\":{\"traefik\":{\"address\":\"xxxx\"},\"web\":{\"address\":\"xxxx\"},\"web-secure\":{\"address\":\"xxxx\"}},\"providers\":{\"providersThrottleDuration\":2000000000,\"docker\":{\"watch\":true,\"endpoint\":\"xxxx\",\"defaultRule\":\"xxxx\",\"swarmModeRefreshSeconds\":15000000000}},\"api\":{\"dashboard\":true,\"debug\":true},\"log\":{\"level\":\"DEBUG\",\"format\":\"xxxx\"},\"certificatesResolvers\":{\"letsencrypt\":{\"acme\":{\"email\":\"xxxx\",\"caServer\":\"xxxx\",\"storage\":\"xxxx\",\"keyType\":\"xxxx\"}}}}"
level=debug msg="unknown kind to hash: func"

Basically, Traefik is running, but not doing what I want it to do.
And now I am out of options, because I don't see any obvious errors, and even if I did, there is no info whatsoever on how to correct them.

What I'd like to know, please: is this behaviour caused by the "beta"-nature of Traefik 2.0, or is it really intended?
And if it is intentional: How long do you plan to support Version 1.x, until everyone is forced to switch to 2.0?

I thank you so much for the 1.7 Traefik experience. I saw with my own eyes, how blazing fast and smooth v1.7 of Traefik worked on my Test-Machine. So I'd really deeply regret if I have to move back to nginx-proxy.

Yours
Peter.

In the Traefik v2 documentation:

The reference section contains dedicated pages with all possible options for every kind of configuration.

The user guides contain several complete configurations for Docker and Let's Encrypt.

Your configuration for Traefik v1 contains errors; the valid configuration is:

[acme]
  acmeLogging = true
  email = "bitterbolt@posteo.de"
  storage = "/etc/traefik/acme/acme.json"
  entryPoint = "https"
#caServer = "https://acme-staging-v02.api.letsencrypt.org/directory"
  onHostRule = true
[acme.tlsChallenge]

I don't understand the problem with the quotes, because we only use standard ones:

  • " double quote (TOML, YAML, labels)
  • ' single quote (YAML, labels)
  • ` backtick (only for rule values, because the whole rule is already contained inside double quotes)
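
As a small illustration of the quoting rules above, a Docker label in a Compose file might look like the following sketch. The service and router name `myapp`, the image, and the host name `app.example.com` are placeholders and do not come from this thread.

```yaml
services:
  myapp:
    image: traefik/whoami   # placeholder image, for illustration only
    labels:
      # The double quotes delimit the whole YAML string (the label).
      # The backticks delimit the host name inside the v2 rule expression.
      - "traefik.http.routers.myapp.rule=Host(`app.example.com`)"
      # Single quotes work just as well as the outer YAML quotes.
      - 'traefik.http.routers.myapp.entrypoints=web-secure'
```

The backticks are there so that the value inside `Host(...)` does not need escaped double quotes within an already double-quoted label.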

The trailing-slash warning from the version check will be fixed in the next version: "Fix trailing slash with check new version" by mmatur (Pull Request #5266, traefik/traefik on GitHub).


The "unknown kind to hash: func" message is not an error but a debug log saying that a field is skipped during serialization.


Traefik v2 is in the RC stage, so there may be some bugs.
v1 will be supported (bug fixes only) for about 1 year after the GA of v2.0.
We do not force anyone to migrate to v2.

Thank you very much for the insight, Idez!

I will root out the error in my v1 Config, thanks for pointing this out.

With my v2 adventure though... I played around a little and tried to understand the documentation, but there was no way to get this to work. So I will wait until v2 is finalized and there are plenty of tutorials out in the wild. Maybe I will understand how this works by following those examples step by step, like I did with v1.

One year is plenty of time to learn all the tricks that are needed in order to upgrade to v2.

Thanks again for your time!
Peter.

P.S.: I will have a close look at the new "Migration" Section of the Documentation:
https://docs.traefik.io/v2.0/migration/v1-to-v2/
Maybe this will solve my issues over time :)

P.P.S.: I am stupid and got the acme-config wrong in the v2 example using "...whoami..." instead of "...traefik...".
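
For readers following along, the fix described in the P.P.S. would presumably look like the sketch below: all router labels use the same router name (`traefik`). With `whoami` in the certresolver label, Traefik sees a second router named `whoami` that has no rule of its own, so it falls back to the Docker provider's default rule, which is built from the container name. That is where `Host(`traefik-traefik-20`)` in the log came from. Only the labels section is shown; everything else is assumed to stay as posted.

```yaml
services:
  traefik:
    image: traefik:2.0
    labels:
      - "traefik.enable=true"
      # All three labels configure the SAME router, named "traefik".
      - "traefik.http.routers.traefik.rule=Host(`traefik.v22018083922571964.megasrv.de`)"
      - "traefik.http.routers.traefik.entrypoints=web-secure"
      - "traefik.http.routers.traefik.tls.certresolver=letsencrypt"
      - "traefik.docker.network=web"
      # Note: "traefik.port=8080" from the original file is a v1-style label;
      # as far as I can tell, v2 only reads labels under traefik.http / traefik.tcp.
```

This only addresses the router-name mismatch; it is not a complete dashboard setup.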

The beauty of v1 was that I only needed to change two lines for each traefik-label-section: (traefik.frontend.rule=Host...) and (traefik.backend=...). With v2 it seems that there are a lot more labels that will need unique adjustments.
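
One way to cut down the per-container labels, assuming the v2 middleware mechanism works as documented, is to define the security headers once as a middleware and reference it from each router. The sketch below shows both services in one fragment for brevity (in this thread they live in separate compose files, and networks, ports and volumes are omitted). The middleware name `secure-headers` is made up for the example; the host name, entry point and resolver names are the ones used earlier in the thread.

```yaml
services:
  traefik:
    image: traefik:2.0
    labels:
      - "traefik.enable=true"
      # Define the headers once, as a middleware named "secure-headers" (hypothetical name).
      - "traefik.http.middlewares.secure-headers.headers.stsSeconds=315360000"
      - "traefik.http.middlewares.secure-headers.headers.stsIncludeSubdomains=true"
      - "traefik.http.middlewares.secure-headers.headers.stsPreload=true"
      - "traefik.http.middlewares.secure-headers.headers.forceSTSHeader=true"
      - "traefik.http.middlewares.secure-headers.headers.browserXssFilter=true"
      - "traefik.http.middlewares.secure-headers.headers.contentTypeNosniff=true"
      - "traefik.http.middlewares.secure-headers.headers.frameDeny=true"

  wordpress:
    image: wordpress-php73-apache:latest
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=web"
      - "traefik.http.routers.wordpress.rule=Host(`wordpress.xp-server.de`)"
      - "traefik.http.routers.wordpress.entrypoints=web-secure"
      - "traefik.http.routers.wordpress.tls.certresolver=letsencrypt"
      # Reuse the shared middleware; "@docker" names the provider that defined it.
      - "traefik.http.routers.wordpress.middlewares=secure-headers@docker"
```

With this layout, the parts that differ per application are the router name and the host name, which is at least closer to the two-line change described for v1.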

+1 on too many labels, with no ability to extract a common set of configurations into 1 location and re-use.

It seems like the configuration is an "all-or-nothing" game. When using Marathon as the provider, the only difference between my staging and prod setup is the endpoint URL. I cannot, say, bake most of the Marathon provider config into the image and then override only the endpoint URL via a CLI flag or an environment file; I have to specify the whole set of configs all over again at run time.

The easiest solution for me to run Traefik 2.0 in both staging and prod is to bake separate images with different provider configs built in. Or to keep different config files inside the Traefik Docker image and pick one at load time depending on an environment variable.

The whole workflow is very un-containerized. Basically in order for me to de-duplicate everything, I have to run in a very un-containerized way.

With v2, if you want to use the API as explained in the docs, you'll run into an issue I had: you have to expose some kind of port when using labels. I'd say it's pretty much a bug, because you cannot use "@" in the definition of a service, but the service for the API is called "api@internal". So if you add the label

traefik.http.services.api@internal.loadbalancer.server.port = 8080

it won't work... But regardless of that, you have to expose at least one port, or Traefik will discard the service/routers with the error "port missing on controller...blah."

So I had to create a service called "api-service" pointing to a non-existent port. Traefik was then happy, even though the newly created service isn't used anywhere.
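
The workaround described above could be sketched roughly as follows. The host name `traefik.example.com` and the port `9999` are placeholders; the service name `api-service` is the one mentioned in the post, and no router refers to it.

```yaml
services:
  traefik:
    image: traefik:2.0
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.api.rule=Host(`traefik.example.com`)"
      # The router points at the built-in API service.
      - "traefik.http.routers.api.service=api@internal"
      # Dummy service: exists only so the Docker provider finds a port.
      # Nothing listens on 9999 and no router uses this service.
      - "traefik.http.services.api-service.loadbalancer.server.port=9999"
```

The "@" appears only in the router's reference to the service, never in a `traefik.http.services.<name>` label key, which matches the limitation described in the post.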