Docker IPv6 Pitfalls, or "Why does my Traefik not see the Source IP for this service?"

Introduction

Recently, I deployed IPv6 connectivity for a medium-sized network—1,500 devices across 180 users. The logical next step was enabling IPv6 connectivity for services hosted within the network, so users could access these internal services using IPv6 as well.

We use Traefik as our central reverse proxy, which tunnels requests to various web services. Due to the nature of our network, some services need to be restricted to internal IP ranges that devices are assigned through NAT (Network Address Translation).

This story starts when I adapted the firewall rules in Traefik over a VPN connection. I added our IPv6 range to the Traefik firewall middleware and believed everything was set up correctly. However, a couple of days later, reports started coming in that one of the services was returning 403 Forbidden to clients on the internal IP ranges.

At first, I was puzzled. The firewall rules looked correct, and every other service was working fine. So why were internal users being denied access?

Have you ever encountered a situation where everything looks right on paper, but in practice, something just doesn't add up? That's where I found myself.

Our Setup

Our Traefik setup relies exclusively on file-based configuration (the file provider) rather than Docker-based service discovery. This approach suits our environment because many of our legacy services run on their own virtual machines, not inside Docker containers. Here's how we define the services:

http:
  routers:
    unifi:
      entryPoints:
        - "https"
      rule: "Host(`unifi.mynetwork`)"
      middlewares:
        - mynetwork-internal
      service: unifi

  services:
    unifi:
      loadBalancer:
        servers:
          - url: "https://100.137.172.45:8443/"
        passHostHeader: true

Let's break down this configuration:

  • Router unifi: Listens on the https entry point and matches requests to unifi.mynetwork.
  • Middlewares: Applies the mynetwork-internal middleware chain to incoming requests.
  • Service unifi: Uses a load balancer to forward requests to the specified server URL. The passHostHeader: true setting ensures the original Host header is preserved when forwarding requests, which is important for services that rely on it for virtual hosting or SSL certificates.
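
For completeness, this dynamic configuration is picked up by Traefik's file provider; the static configuration is not shown above, but assuming a fairly standard setup it might look roughly like this (the https entry point name matches the router above, while the directory path is only illustrative):

entryPoints:
  https:
    address: ":443"

providers:
  file:
    directory: "/etc/traefik/dynamic"
    watch: true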

The middleware is defined as follows:

middlewares:
  # Middleware that allows requests only from the Mynetwork IP ranges
  mynetwork-ipallowlist:
    ipAllowList:
      sourceRange:
        - "100.137.172.0/24"
        - "100.137.173.0/24"
        - "fd67:a758:0ca9:7aa9::/48"

  mynetwork-internal:
    chain:
      middlewares:
        - mynetwork-ipallowlist
        - default-headers
        - noindex-header

Here's what each middleware does:

  • mynetwork-ipallowlist: Restricts access to the listed source IP ranges (sidenote: the IP addresses shown have been adapted for this blog post; in our actual setup, they are globally routable IPv4 and IPv6 subnets provided by our ISP), allowing only devices within our network to access the service.
  • mynetwork-internal: Chains together multiple middlewares:
    • mynetwork-ipallowlist: As described above.
    • default-headers: Adds standard security headers to responses (e.g., HSTS, X-Content-Type-Options) to enhance security.
    • noindex-header: Adds headers to prevent indexing by search engines (e.g., X-Robots-Tag: noindex), ensuring internal services aren't accidentally exposed in search results. A rough sketch of these two header middlewares follows below.
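
The default-headers and noindex-header middlewares are not shown in the snippet above. As a minimal sketch, assuming they are built on Traefik's standard headers middleware (the exact options in our setup may differ), they could look like this:

middlewares:
  default-headers:
    headers:
      stsSeconds: 31536000
      stsIncludeSubdomains: true
      contentTypeNosniff: true
      frameDeny: true

  noindex-header:
    headers:
      customResponseHeaders:
        X-Robots-Tag: "noindex, nofollow"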

The Problem

While checking the access logs, I noticed something peculiar: Traefik was logging the correct client source IP for every service except the one that was returning the 403 Forbidden error.

traefik  | 100.137.173.216 - - [28/Nov/2024:22:11:11 +0000] "GET /api2/json/nodes/pvenode1/storage/backupserver/status HTTP/2.0" 200 143 "-" "-" 37 "pve-pvenode1@file" "https://100.137.172.82:8006/" 940ms
traefik  | 172.19.0.1 - - [28/Nov/2024:22:12:34 +0000] "GET / HTTP/2.0" 403 9 "-" "-" 40 "brokenservice@file" "-" 0ms

In the logs above, the first entry shows the source IP 100.137.173.216, an internal IP from our network. The second entry, however, shows 172.19.0.1 as the source IP for the "brokenservice", which is the gateway address of the Docker bridge network.

This was odd and led me to believe that I had somehow misconfigured Traefik. I double-checked all the settings, but everything seemed in order.

Only later, after discussing with a colleague, did it occur to me that IPv6 might be the culprit. I had recently enabled IPv6 only for this particular service. Since I was accessing the network via a VPN that supports only IPv4, I hadn't noticed any issues when I initially adapted the firewall rules.

As it turns out, Docker does not enable IPv6 by default. When IPv6 requests come in, Docker's bridge network will NAT the IPv6 traffic, causing the original source IP to be lost and replaced with the bridge IP (172.19.0.1). This means that Traefik sees all incoming IPv6 requests as coming from the Docker bridge IP, not from the client's actual IP address.
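
One way to observe this on the Docker host: with IPv6 disabled, it is typically Docker's userland proxy (docker-proxy) that accepts the IPv6 connection on the published port and opens a new IPv4 connection to the container, which is why only the bridge gateway address shows up. Assuming the default userland proxy is in use, you can spot it listening with:

sudo ss -ltnp | grep docker-proxy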

Solution

After pinpointing Docker's handling of IPv6 as the root of the problem, the fix became clear. Initially, I hadn't considered Docker's missing IPv6 support a concern because I wasn't connecting to my services via IPv6 internally; I was only terminating IPv6 at Traefik. However, without IPv6 support enabled in Docker, incoming IPv6 requests are NATed through the Docker bridge network, so the original source address never reaches Traefik, and none of the requests matched the internal IP ranges specified in the ipAllowList middleware.

To resolve this issue, I needed to enable IPv6 support in Docker. Here's how I did it:

1. Update Docker's Daemon Configuration

I edited the Docker daemon configuration file at /etc/docker/daemon.json to enable experimental features and IPv6 support:

/etc/docker/daemon.json
{
  "experimental": true,
  "ip6tables": true
}
  • experimental: Enables Docker's experimental features, which include IPv6 support in some versions.
  • ip6tables: Allows Docker to manipulate ip6tables rules for IPv6 traffic.

Note: Depending on your Docker version, enabling IPv6 might not require the experimental flag. Be sure to check the Docker documentation relevant to your version.
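
As an aside, and purely as an assumption about setups that also want IPv6 on Docker's default bridge network (not needed here, since Traefik runs on a user-defined network), daemon.json can additionally enable IPv6 there with a ULA subnet of your choosing, for example:

{
  "experimental": true,
  "ip6tables": true,
  "ipv6": true,
  "fixed-cidr-v6": "fd00:d0c:ce55::/64"
}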

2. Restart the Docker Service

After saving the changes to daemon.json, I restarted the Docker service to apply the new configuration:

sudo systemctl restart docker
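
To double-check that the daemon picked up the new settings, docker info lists the experimental flag (the exact wording varies between Docker versions):

docker info | grep -i experimental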

3. Recreate the Docker Network with IPv6 Support

Next, I needed to recreate the Docker network used by Traefik to ensure it supports IPv6:

docker network rm traefik
docker network create --ipv6 traefik
  • Remove the Existing Network: Deleting the existing traefik network ensures that it's recreated with the new IPv6 settings.
  • Create a New Network with IPv6: The --ipv6 flag enables IPv6 support on the new network (see the note below regarding older Docker versions).
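
Note that, depending on your Docker version, creating an IPv6-enabled network without a subnet may fail because no default IPv6 address pool is configured. In that case (an assumption about older engines; the subnet below is only an example ULA range), pass one explicitly:

docker network create --ipv6 --subnet "fd00:cafe:42::/64" traefik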

4. Verify the Network Configuration

I checked the details of the newly created network to confirm that IPv6 was enabled:

docker network inspect traefik

The output showed that IPv6 was indeed enabled, and an IPv6 subnet was assigned.
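
If you only care about the flag itself, the inspect output can also be filtered with a Go template (EnableIPv6 is the field name in the network's JSON):

docker network inspect -f '{{ .EnableIPv6 }}' traefik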

5. Restart Traefik and Connected Services

Finally, I restarted Traefik and any services connected to the traefik network to ensure they picked up the new network configuration.
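
Assuming the stack is managed with Docker Compose and attached to the recreated traefik network, forcing a recreate of the containers is enough for them to pick up the new network; your exact layout may differ:

docker compose up -d --force-recreate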

Final Thoughts

This experience underscored the importance of considering all aspects of network configuration when deploying IPv6, especially in a mixed IPv4/IPv6 environment. Even though I wasn't using IPv6 internally between services, the lack of IPv6 support in Docker affected how Traefik perceived incoming requests and applied access controls.

By sharing this troubleshooting journey, I hope to help others navigate the nuances of IPv6 deployment in Dockerized environments. If you have questions or need further clarification on any steps, feel free to reach out!