Misusing Træfɪk for accessing internal services on Kubernetes

October 17, 2016

What’s the problem?

At JustWatch we are always striving to adhere to industry best-practices and keep on the forefront of important technology paradigms. For us, Kubernetes solves a lot of problems we would otherwise have to solve ourselves, so we are currently migrating parts of our infrastructure to hosted Kubernetes in the Google Cloud.

Many of our services are meant for the users of our site, and those are accessed through the existing ingress options Kubernetes provides for public access.

But we also run several services which are for internal use only, for example various dashboards and similar services. Unfortunately, hosted Kubernetes (GKE) does not seem to provide an easy way to expose these services to internal users. Internal users are, in this case, defined as being from our private RFC1918 IP ranges.

What are the options?

There are several ways of accessing services running in Kubernetes:

  • ExternalName - Maps the service name to an external DNS name via a CNAME record, only usable inside the cluster
  • ClusterIP - Allocates a cluster-internal IP for the service, only usable inside the cluster
  • NodePort - Allocates a port from the cluster's NodePort range on every worker node
  • LoadBalancer - Allocates a NodePort and creates a managed LoadBalancer with the cloud provider (if supported)

Building on top of these options are the so-called Layer 7 Ingress resources, like the nginx ingress or the GCE L7 LoadBalancer ingress, which supports the full stack of GCP networking features (SSL termination, URL maps, Cloud CDN, …).

Unfortunately, none of these really matched our use case. The targeted users are neither within the Kubernetes cluster nor on the internet. We could have made the services available through an external IP, e.g. using type LoadBalancer or any of the L7 ingress resources - with appropriate access control, of course - but that just didn’t feel right.

Public load balancers are exposed to the internet, and firewalling them properly takes expertise and time. Also, load balancers on GCP aren’t free, and we don’t need their paid features for our internal services anyway.

We had a look at common ways to access internal Kubernetes resources, but most of them struck us as hacks.

We believe in clean and sustainable solutions instead, so we tried to find a better approach to this requirement.

What is our solution?

First, in order to provide a stable endpoint, we needed a static IP; otherwise DNS would not work well.

The second requirement was that this stable endpoint would need to proxy requests into Kubernetes - and in order to do that, it would need to know where the actual service would be running.

Third and lastly, the endpoint would need to keep up with any changes in the underlying cluster, preferably auto-registering services as they are created. For that, it would need to be connected to our service discovery layer.

We essentially wanted to abstract the fast changing world inside of Kubernetes with a proxy on the outside.

Working backwards

One very helpful service we use a lot is Hashicorp’s fantastic Consul. We mainly use it for its CP (see CAP) Key-Value store, but also for service discovery.
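To give a flavour of what that looks like from Go, here is a tiny, hedged sketch using Hashicorp’s official client library, github.com/hashicorp/consul/api; the key, value and service name below are made up for illustration:

package main

import (
	"fmt"
	"log"

	consulapi "github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent (the default config points at 127.0.0.1:8500).
	client, err := consulapi.NewClient(consulapi.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Key-Value store: write a value and read it back.
	if _, err := client.KV().Put(&consulapi.KVPair{Key: "config/example", Value: []byte("hello")}, nil); err != nil {
		log.Fatal(err)
	}
	pair, _, err := client.KV().Get("config/example", nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s = %s\n", pair.Key, pair.Value)

	// Service discovery: list every instance registered under a service name.
	instances, _, err := client.Catalog().Service("my-service", "", nil)
	if err != nil {
		log.Fatal(err)
	}
	for _, i := range instances {
		fmt.Println(i.ServiceAddress, i.ServicePort, i.ServiceTags)
	}
}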

Introducing Træfɪk

The missing link for us proved to be Træfɪk. If you don’t know it yet, think of it as the lesser-known cousin of Caddy that concerns itself only with the reverse-proxying side of things.

Træfɪk can natively talk to Docker, Swarm, Kubernetes, Consul, Etcd and many more. Additionally, Kubernetes has a nice and easy API, so we just needed some glue to connect the components.

We created a small application we call the Consul Publisher. There are already several projects with similar goals, but none was quite right for us. We neither wanted to run a sidecar container in each Pod, nor could we force every application to register itself with Consul. We build a lot of our stack in-house with Go, but of course we also use third-party applications, and those usually don’t support Consul service registration.

The core of the publisher is presented below. Of course this Go code is missing a lot of details and error checking, but it should give you a rough understanding of what we do.

for {
	// Ask the Kubernetes API for all NodePort services and the GCE API for
	// the internal IPs of our worker nodes.
	services := listK8sServices()
	ips := listGCEInstanceIPs()

	for name, svc := range services {
		for _, ip := range ips {
			port := svc.Port
			serviceID := fmt.Sprintf("%s-%s:%d", name, ip, port)
			// Tag every synthesized service so stale entries can be cleaned up later.
			tags := append(svc.Labels, "autogenerated-"+cfg.Namespace)
			consul.RegisterCatalog(serviceID, name, fmt.Sprintf("%s:%d", ip, port), ip, int(port), tags)
		}
	}

	time.Sleep(30 * time.Second) // poll instead of hammering the APIs
}
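The listK8sServices and listGCEInstanceIPs helpers are elided above. To make the loop a bit more concrete, here is a hedged sketch of how listK8sServices could be written against the official client-go library; the k8sService struct and the labelsFromAnnotations helper (which wraps the annotation handling shown further below) are our illustration, not the exact production code:

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// k8sService is the minimal view of a Kubernetes service the publisher needs.
type k8sService struct {
	Port   int32
	Labels []string
}

func listK8sServices() map[string]k8sService {
	out := map[string]k8sService{}

	// Assumes the publisher runs inside the cluster; outside, a kubeconfig would be loaded instead.
	config, err := rest.InClusterConfig()
	if err != nil {
		log.Println(err)
		return out
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Println(err)
		return out
	}

	svcs, err := clientset.CoreV1().Services(cfg.Namespace).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Println(err)
		return out
	}
	for _, svc := range svcs.Items {
		// We only bridge services that expose a NodePort.
		if svc.Spec.Type != corev1.ServiceTypeNodePort || len(svc.Spec.Ports) == 0 {
			continue
		}
		out[svc.Name] = k8sService{
			Port:   svc.Spec.Ports[0].NodePort,             // the port Kubernetes opened on every worker node
			Labels: labelsFromAnnotations(svc.Annotations), // turns traefik.io/* annotations into Consul tags, see below
		}
	}
	return out
}

listGCEInstanceIPs does the equivalent against the GCE instances API and returns the internal IPs of the worker nodes.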

The general idea is to fetch the list of Kubernetes services that expose a NodePort and join it with the list of GKE worker nodes running within our GCE network. We then synthesize a Consul service for each combination and register it with Consul. The autogenerated tag helps with deregistering services that are gone. As you may have guessed, we run one Consul Publisher per Kubernetes namespace.
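The consul.RegisterCatalog helper is not shown in full either. Under the assumption that it simply wraps Hashicorp’s github.com/hashicorp/consul/api client, it could boil down to something like this sketch:

import consulapi "github.com/hashicorp/consul/api"

// RegisterCatalog writes one synthesized service entry into the Consul catalog.
// In the real publisher the client would be created once and reused; error
// handling and deregistration of stale entries are left out of this sketch.
func RegisterCatalog(serviceID, name, node, address string, port int, tags []string) error {
	client, err := consulapi.NewClient(consulapi.DefaultConfig())
	if err != nil {
		return err
	}

	reg := &consulapi.CatalogRegistration{
		Node:    node,    // one synthetic node per worker IP and port
		Address: address, // the worker node's internal IP
		Service: &consulapi.AgentService{
			ID:      serviceID,
			Service: name,
			Address: address,
			Port:    port,
			Tags:    tags, // includes the autogenerated-<namespace> tag and any traefik.* tags
		},
	}
	_, err = client.Catalog().Register(reg, nil)
	return err
}

Registering via the catalog API (rather than the local agent) is what allows the publisher to create entries for nodes that don’t run a Consul agent themselves.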

Once these services are in Consul, Træfɪk will pick them up almost instantly and create virtual hosts for them. We then only need to point the appropriate internal subdomains to the IP(s) of Træfɪk (if they don’t point there already).

Our Træfɪk config is actually very simple:

defaultEntryPoints = ["http"]

[entryPoints]
  [entryPoints.http]
  address = ":80"

[web]
address = ":8080"

[consulCatalog]
endpoint = "consul.server.internal:8500"
domain = "internal.example.org"

The important part here is the hostname of each virtual host: Træfɪk does name-based virtual hosting, so it needs to know which Host headers to match on.

When using Træfɪk’s consulCatalog provider, it will look for tags of the form traefik.frontend.rule=Host:db.internal.example.org.

To keep things simple and avoid additional mappings between Kubernetes and Consul, we just use the handy annotations feature of Kubernetes. Take this Kubernetes service as an example:

---
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    traefik.io/frontend.rule: "Host:db.internal.example.org"
spec:
  type: NodePort
  selector:
    app: my-service   # placeholder selector for the backing Pods
  ports:
    - port: 80        # placeholder port; Kubernetes assigns the NodePort itself
      targetPort: 8080

The publisher code above will take this annotation and turn it into a Consul tag which Træfɪk can then pick up to create a virtual host:

// Translate Kubernetes annotations of the form traefik.io/<key> into
// Consul tags of the form traefik.<key>=<value>.
labels := []string{}
for k, v := range service.Annotations {
	if strings.HasPrefix(k, "traefik.io/") {
		label := strings.Replace(k, "traefik.io/", "traefik.", 1) + "=" + v
		labels = append(labels, label)
	}
}

Wrapping up

We’ve shown how one can easily create a bridge that exposes HTTP services running within Kubernetes to “internal” users. The whole setup took one developer less than a day from inception to production.

At JustWatch we use a lot of open source software, but we also try to give back contributions whenever possible. We’re not ready to release this project yet, but we’ll try to do so soon. It still depends on some internal abstractions and incorporates some business logic that doesn’t make sense for external users, but once we sort that out we plan to put it on GitHub. In case you have a dire need for this tool, please contact us and we’ll bump the priority.

This post was brought to you by Dominik Schulz, Head of Infrastructure, and Matthias Loibl, Junior SRE. If you’d like to work with Go, Kubernetes and the Google Cloud Platform, we’re currently hiring Data Engineers and SREs.

Stay tuned.
