So you want to deploy Kubernetes in your lab environment. You have your cluster stood up, and now it's time to deploy your ingress controllers. You get a warning in your Helm deployment that says "Waiting for Service IP". You can go in and modify the values to make the ingress a NodePort rather than a LoadBalancer service. This is less than ideal for a couple of reasons. First and foremost, you are shifting your failure layer to an outside source that you have no control over. If you run keepalived you might be able to get past this hurdle, but you are still not embracing the true methodology of DevOps: treating your nodes as cattle rather than pets.
How do we solve this issue? If you are with a cloud provider, you might just throw an Application Load Balancer in front of a service in EKS. Unfortunately this is the state of Kubernetes: it's amazing for managing all these self-hosted applications you want to deploy, but unless you are on a cloud provider, you are a second-class citizen.
Enter MetalLB, but with some caveats.
Most of the deployments I've seen on the internet use the Layer 2 functionality, but this has some limitations; namely, it's not dynamic on host changes, since a single elected node answers ARP for each service IP and traffic only moves after a new leader is elected.
Enter MetalLB with BGP. This lets you have a truly dynamic setup, just like the big boy cloud providers. Unfortunately, in my research to get it deployed, I saw lots of guides on getting it to work with Unifi Edge Routers, but nothing really on getting it to work with a Cisco Nexus or an Arista switch.
In this section, I am making some assumptions: You have an existing k8s cluster, you have a basic understanding of routing, and you know how to pull logs and pod statuses.
Deploying MetalLB to your Kubernetes Cluster is relatively simple. In my case I’m using the kubectl method of install with a config map.
In this setup the speakers, which are doing the actual work, pull their config off a config map that is applied to the cluster.
This is the map I'm using:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 10.4.4.1
      peer-asn: 65101
      my-asn: 65101
    address-pools:
    - name: default
      protocol: bgp
      addresses:
      - 10.8.4.0/23
```
Breaking it down, there are five fields that matter. First is peer-address: this is the router ID detailed later, in this case 10.4.4.1. Next are the two ASNs (peer-asn and my-asn); it is critical that we use ASNs delegated for private use (64512–65534 in the 16-bit range), so 65101 will work fine here. Finally there is the definition of the address pool itself. In my setup I have 10.4.0.0/14 allocated for my site, and to make life easier I carved a range out of it: 10.8.0.0/16 as the supernet and 10.8.4.0/23 (512 addresses) as the pool for MetalLB to issue to Services and Ingresses.
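Once the pool is up, any Service of type LoadBalancer will be handed an address out of it and announced over BGP. A minimal sketch of such a service (the name, selector, and port here are illustrative, not part of the MetalLB install):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-nginx        # illustrative name
  namespace: default
spec:
  type: LoadBalancer      # MetalLB assigns an IP out of 10.8.4.0/23
  selector:
    app: demo-nginx
  ports:
  - port: 80
    targetPort: 80
```

After applying this, `kubectl get svc demo-nginx` should show an EXTERNAL-IP from the pool rather than sitting at `<pending>`.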
Once you have tweaked the settings to your liking, apply it with your method of choice:
```
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.11.0/manifests/namespace.yaml
kubectl apply -f metallb-configmap.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.11.0/manifests/metallb.yaml
```
Breaking down what's happening here: we create the metallb-system namespace, apply the config map, and then deploy MetalLB itself (the controller and the speaker DaemonSet).
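Before moving on to the switch, it's worth confirming the pods came up cleanly:

```
kubectl get pods -n metallb-system
```

You should see one controller pod plus one speaker pod per node, all in the Running state.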
On to setting up the switch…
Switch I'm Using: Cisco Nexus 9000 C9396PX
I'm going to document the Nexus setup here, as in my attempts to get this working it was the biggest hurdle to getting the deployment to come together. Again, I'm making some assumptions: you know the NX-OS command line well enough to break down config snippets, you are using your Nexus as your core router, and you have a basic understanding of routing as a whole.
If you haven't already, enable the BGP feature:

```
feature bgp
```

This will enable BGP on the switch.
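You can confirm the feature took effect with:

```
show feature | include bgp
```

The bgp line should read "enabled".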
Now create a loopback interface. In my case it will be loopback1 with an address of 10.4.4.1; this is the peer-address we set in the config map earlier.
```
interface loopback1
  description router_ID
  ip address 10.4.4.1/32
```
Now time to configure the actual BGP router on the switch...
```
router bgp 65101
  router-id 10.4.4.1
  timers bgp 15 45
  log-neighbor-changes
  address-family ipv4 unicast
    redistribute direct route-map allow
    maximum-paths 2
  neighbor 10.6.64.0/24 remote-as 65101
    update-source loopback1
    address-family ipv4 unicast
```
This is the config I found that gets MetalLB to play ball with the Nexus. Let's break down what's happening. We declare a BGP router with an ASN of 65101, with the router-id coming from the address on the loopback interface we made earlier. We then set the keepalive and hold timers, enable neighbor-change logging, allow directly connected routes to be redistributed into BGP via the allow route-map, and permit up to two equal-cost paths per destination. Then comes the neighbor config, and this is the piece that makes the setup truly dynamic: instead of listing each peer individually, we declare the whole 10.6.64.0/24 block that my Kubernetes worker nodes live in as the neighbor. This lets me change the size of my cluster without ever touching the switch config. The other critical takeaway is that our peer ASN is the same as the router's own ASN; this configuration is called iBGP, meaning all the announcements take place inside a single ASN. Finally, update-source loopback1 tells the switch to source its BGP sessions from our loopback interface.
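One gotcha: the redistribute direct line references a route-map named allow, and NX-OS expects that route-map to exist. If you don't already have one defined, a minimal permit-all sketch would be (tighten the match criteria to taste; a permit entry with no match clause matches everything):

```
route-map allow permit 10
```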
If everything applied correctly, running

```
sh ip bgp sum
```

should give you an output similar to this:
```
# sh ip bgp sum
BGP summary information for VRF default, address family IPv4 Unicast
BGP router identifier 10.4.4.1, local AS number 65101
BGP table version is 55, IPv4 Unicast config peers 10, capable peers 5
18 network entries and 26 paths using 4952 bytes of memory
BGP attribute entries [2/312], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor    V    AS MsgRcvd MsgSent  TblVer InQ OutQ Up/Down  State/PfxRcd
10.6.64.11  4 65101    4201    4200      55   0    0 17:29:26 2
10.6.64.12  4 65101    4207    4205      55   0    0 17:30:40 3
10.6.64.13  4 65101    4205    4204      55   0    0 17:30:27 2
10.6.64.14  4 65101    4208    4204      55   0    0 17:30:28 2
10.6.64.15  4 65101    4205    4204      55   0    0 17:30:26 2
```
If instead you are getting any hosts stuck in "IDLE" then you should go back and check for any typos in your configuration.
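Once the peers are established and you have at least one LoadBalancer service deployed, you can also confirm the switch is actually installing the advertised routes:

```
sh ip route bgp
```

Each service IP from the pool should appear as a /32 with the announcing worker nodes as next hops (up to two of them, thanks to maximum-paths 2).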
While BGP "load balancing" is not the same as what the big cloud providers are rolling out, it gets us close enough to enjoy the benefits of Kubernetes in a non-pet fashion. When I deploy MetalLB to Colo, I will update this to include configs for Vyos and Arista.