My Kubernetes Journey Part 4 – Deploying the application

This part of the journey marks over two months of reading my previously linked book and learning how to set up my own cluster and storage/network components. While I wanted to continue adding applications I use, like Pihole, Uptime Kuma, and Nextcloud, I thought a first “easy” deployment would be Librespeed. While I could have run this all in one container, I wanted to explore the micro-service architecture idea and have multiple pods running. Seeing this all finally work was super rewarding!!

Kubernetes concepts covered

While I know I have barely scratched the surface of all the possible Kubernetes components, I tried to take the main concepts I learned and use them here. I will go over the few that I needed. The full Kubernetes MySQL database manifest is posted here, which I reference in the sections below.

Services

Remember that container IPs, and even the containers themselves, are ephemeral. A container could spin up on one host and get one IP, then crash and be recreated on another host with a different pod IP. So, I asked myself, how would my web app have a consistent backend database to send to?

This is where services come into play: they give a consistent IP and DNS name that other pods can use to communicate. In my head at least, this operates similar to a load balancer, with a front end IP (that stays the same for the duration of the deployment) and a back end which maps to different backend containers. Below is part of the full manifest which I worked through, linked here.

apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  ports:
  - port: 3306
  selector:
    app: speedtest-db

This simply tells Kubernetes to create a service named “mysql” on port 3306, and to choose back ends which have a label of app: speedtest-db. This will make more sense in the Pod Definition section.
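As an aside, any pod in the same namespace can reach this service by its name alone, since Kubernetes creates a DNS record for it. A quick way to confirm the record resolves is a throwaway test pod (a sketch; the busybox image here is my assumption, not part of my manifests):

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup mysql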

Config Map

Config maps have many uses. They can hold environment variables and, in my case, init configuration commands. As part of the Librespeed package, a MySQL template is published, which I used to create a table within a database so results from speed tests can be stored. The challenge was that when the MySQL container first deployed, I needed this template to be applied so the database was ready to go on startup. This was accomplished via a config map and an init.sql definition. I’ll only post part of the config map here, as the full file is in the repository linked above:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-initdb-config
data:
  init.sql: |
    use kptspeedtest;
    SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
    SET AUTOCOMMIT = 0;
    START TRANSACTION;
    SET time_zone = "+00:00";

The only addition from Librespeed’s template was to first select the database “kptspeedtest”. The rest is just a copy and paste of their template.
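Once applied, it never hurts to double-check that the rendered data made it into the cluster before the database pod starts (a quick sanity check, nothing more):

kubectl get configmap mysql-initdb-config -o yaml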

Persistent Volume Claim

In a previous post, I went over my setup for persistent storage in Kubernetes. I needed this so that when the mysql container is restarted, moved, or deleted, the data is still there. The PVC’s job is to request a Persistent Volume for a container from a Storage Class. In my example the SC is already defined, so I create a PVC for a 20Gi storage block:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: speedtest-db-pvc
  annotations:
    volume.beta.kubernetes.io/storage-class: "freenas-nfs-csi"
spec:
  storageClassName: freenas-nfs-csi
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi

Pod Definition

Here is where all the components came together for the pod definition. I’ll step through this as it is a longer manifest:

apiVersion: v1
kind: Pod
metadata:
  name: speedtest-db
  labels:
    app: speedtest-db

Here, I made a pod named “speedtest-db” and applied a label of app: speedtest-db. Remember that the service definition selects on this same label? This is how the service knows how to target this pod.

spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: speedtest-db-pvc
  - name: mysql-initdb
    configMap:
      name: mysql-initdb-config

Next, under spec.volumes, I first associated the PVC, referencing the PVC name from above. Then I attached the config map by its name:

 containers:
  - name: speedtest-db
    image: docker.io/mysql:latest
    ports:
    - containerPort: 3306
    env:
    - name: MYSQL_DATABASE
      value: "kptspeedtest"
    - name: MYSQL_USER
      value: "speedtest"
    - name: MYSQL_PASSWORD
      value: "speedtest"
    - name: MYSQL_ROOT_PASSWORD
      value: "speedtest"
    volumeMounts:
      - mountPath: /var/lib/mysql
        name: data
      - name: mysql-initdb
        mountPath: /docker-entrypoint-initdb.d

Then, I defined the image, ports, environment variables, and volumeMounts. Note that for a more secure/traditional setup, you would most likely supply these environment variables via Secrets or ConfigMaps rather than in plain text.

The volumeMounts are what I used to mount the PV under /var/lib/mysql using the data volume name, and then to provide the initdb config map created earlier to prepare the database.
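If the init step ever misbehaves, one way to confirm both mounts landed where expected is to exec into the running pod and list them (a sketch, assuming the pod is up):

kubectl exec speedtest-db -- ls /docker-entrypoint-initdb.d
kubectl exec speedtest-db -- ls /var/lib/mysql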

Speedtest Web Server Deployment

Again, the full manifest is linked here. This example is a Deployment, which controls the lifecycle and scaling up/down of pods. Technically it’s not needed here, but I was throwing in some concepts I had previously learned.
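For reference, the general shape of a Deployment is small; below is a minimal sketch built around the same pod template shown later in this post (the replica count is illustrative, not my exact manifest):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: speedtest-deploy
spec:
  replicas: 2                # how many copies of the pod to keep running
  selector:
    matchLabels:
      app: speedtest-app     # must match the labels in the pod template below
  template:
    metadata:
      labels:
        app: speedtest-app
    spec:
      containers:
      - name: speedtest-app
        image: git.internal.keepingpacetech.com/kweevuss/speedtest:latest
        ports:
        - containerPort: 80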

Load Balancer Service

Just like I needed a consistent IP to reach the back end MySQL, I also needed a consistent, externally accessible entry point into the pods.

apiVersion: v1
kind: Service
metadata:
  name: speedtest-lb
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: speedtest-app
  externalIPs:
  - 192.168.66.251
---

This creates a service of type LoadBalancer on port 80, using a defined external IP. That IP is then advertised by Kube Router via BGP to my network to allow routing to the pod. I manually specified it for now, but I hope to add logic in the future to tie into my IPAM system and assign the next available IP automatically.
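Once the manifest is applied, the external IP shows up alongside the cluster IP, which is a quick way to confirm the service took the address (a simple check):

kubectl get svc speedtest-lb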

I will not go over the deployment part of the file, as those concepts I am still testing and learning about.

For the container, I simply defined the image and a label, and exposed it on port 80:

template:
    metadata:
      labels:
        app: speedtest-app
    spec:
      containers:
      - name: speedtest-app
        image: git.internal.keepingpacetech.com/kweevuss/speedtest:latest
        ports:
        - containerPort: 80

Deployment

Now it was time to finally give this a go!

Creating the speedtest-db container:

kubectl apply -f speedtest-db-storage-pod.yaml 

persistentvolumeclaim/speedtest-db-pvc created
pod/speedtest-db created
configmap/mysql-initdb-config created
service/mysql created

Several components were created: a PVC, the pod itself, a config map, and the mysql service. I verified with some show commands:

kubectl get pvc
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
speedtest-db-pvc   Bound   pvc-54245a26-9fbe-4a8f-952e-fcdd6a25488b   20Gi   RWO   freenas-nfs-csi   63s

kubectl get pv | grep Bound
pvc-54245a26-9fbe-4a8f-952e-fcdd6a25488b   20Gi       RWO            Retain           Bound      default/speedtest-db-pvc   freenas-nfs-csi   <unset>                          90s
kubectl get svc
NAME         TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)    AGE
mysql        ClusterIP   192.168.100.145   <none>        3306/TCP   115s

Above, I saw everything I expected. The important piece is the mysql service, which in my case received a cluster IP of 192.168.100.145.

To view the container’s progress, I ran kubectl get pods and watched it start. One thing I have learned is that the init config can take some time to process, which you can follow with kubectl describe or kubectl logs.

kubectl describe pod speedtest-db

Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Warning  FailedScheduling        2m32s (x2 over 2m42s)  default-scheduler        0/4 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
  Normal   Scheduled               2m30s                  default-scheduler        Successfully assigned default/speedtest-db to prdkptkubeworker04
  Normal   SuccessfulAttachVolume  2m29s                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-54245a26-9fbe-4a8f-952e-fcdd6a25488b"
  Normal   Pulling                 2m17s                  kubelet                  Pulling image "docker.io/mysql:latest"
  Normal   Pulled                  2m16s                  kubelet                  Successfully pulled image "docker.io/mysql:latest" in 780ms (780ms including waiting). Image size: 601728779 bytes.
  Normal   Created                 2m15s                  kubelet                  Created container speedtest-db
  Normal   Started                 2m15s                  kubelet                  Started container speedtest-db

Viewing the logs, you can see the different stages of the startup. The entrypoint first brings up a temporary server, runs the init commands we passed in via the configMap, and then restarts for normal operation.

2024-11-04 00:14:47+00:00 [Note] [Entrypoint]: Creating database kptspeedtest
2024-11-04 00:14:47+00:00 [Note] [Entrypoint]: Creating user speedtest
2024-11-04 00:14:47+00:00 [Note] [Entrypoint]: Giving user speedtest access to schema kptspeedtest

2024-11-04 00:14:47+00:00 [Note] [Entrypoint]: /usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/init.sql

Finally:

2024-11-04 00:14:51+00:00 [Note] [Entrypoint]: MySQL init process done. Ready for start up.

2024-11-04T00:14:58.424295Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '9.1.0'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server - GPL.
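A related trick I have since picked up: rather than re-running kubectl get pods, kubectl can block until the pod reports Ready, which is handy in scripts (a sketch; the timeout value is arbitrary):

kubectl wait --for=condition=Ready pod/speedtest-db --timeout=300s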

I needed to verify that the service is bound to this container:

kubectl describe svc mysql
Name:                     mysql
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=speedtest-db
Type:                     ClusterIP
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       192.168.100.145
IPs:                      192.168.100.145
Port:                     <unset>  3306/TCP
TargetPort:               3306/TCP
Endpoints:                10.200.1.34:3306
Session Affinity:         None
Internal Traffic Policy:  Cluster
Events:                   <none>

Seeing “Endpoints” filled in with the IP of this pod is a good sign: traffic sent to the DNS name “mysql” would be forwarded to that back end endpoint.
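The endpoint list can also be pulled on its own, which is a quick way to confirm the selector is matching the pod (a simple check):

kubectl get endpoints mysql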

I created the web server container:

kubectl apply -f speedtest-deploy.yaml 
service/speedtest-lb created
deployment.apps/speedtest-deploy created

This one was a little simpler, with only two components created: the load balancer service and the deployment.

Now I could go to the load balancer’s external IP and see it was working!

From my router, I saw the external IP advertised via BGP. Since my router does not support ECMP in overlay VPRN services, only one next hop is active :( ; otherwise it could load balance to any of the three worker nodes’ load balancer services.

*A:KPTPE01# show router 101 bgp routes 192.168.66.251/32 
===============================================================================
 BGP Router ID:10.11.0.2        AS:64601       Local AS:64601      
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP IPv4 Routes
===============================================================================
Flag  Network                                            LocalPref   MED
      Nexthop (Router)                                   Path-Id     Label
      As-Path                                                        
-------------------------------------------------------------------------------
u*>i  192.168.66.251/32                                  None        None
      192.168.66.171                                     None        -
      65170                                                           
*i    192.168.66.251/32                                  None        None
      192.168.66.172                                     None        -
      65170                                                           
*i    192.168.66.251/32                                  None        None
      192.168.66.173                                     None        -
      65170                                                           

Exploring the database connection and data

Then, I attached to the web server pod to try to connect to the database and watch results get loaded. First, I attached to the pod directly:

kubectl exec -it "pod_name" -- /bin/bash

My pod was named speedtest-deploy-6bcbdfc5b7-5b8l5

As part of the Docker image build, I installed mysql-client, which I used here to connect to the database.

mysql -u speedtest -pspeedtest -h mysql


Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> 

I was in! I connected using the login details that were passed in via the environment variables of the speedtest database pod, and by using the service name “mysql” as the host, just like the web server’s config does!
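Since the web server image is plain Ubuntu, the same name resolution can be checked from inside the pod as well (a small sanity check; getent ships with the base image):

getent hosts mysql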

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| kptspeedtest       |
| performance_schema |
+--------------------+
3 rows in set (0.04 sec)

mysql> use kptspeedtest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+------------------------+
| Tables_in_kptspeedtest |
+------------------------+
| speedtest_users        |
+------------------------+

mysql> select * from speedtest_users;
Empty set (0.00 sec)

At this point, I saw the database named “kptspeedtest” with the table created from the MySQL template from Librespeed. Since I had not run any tests yet, there was no data.

After running a speed test, the results are displayed on screen. The idea from the application is that you could copy the results URL and send it to someone else to view in their own browser. When I ran the same query again, I saw data in the database!

mysql> select * from speedtest_users;
+----+---------------------+----------------+------------------------------------------------------------------------+-------+----------------------------------------------------------------------------------+----------------+---------+---------+------+--------+------+
| id | timestamp           | ip             | ispinfo                                                                | extra | ua                                                                               | lang           | dl      | ul      | ping | jitter | log  |
+----+---------------------+----------------+------------------------------------------------------------------------+-------+----------------------------------------------------------------------------------+----------------+---------+---------+------+--------+------+
|  1 | 2024-11-04 00:26:38 | 192.168.66.171 | {"processedString":"10.200.1.1 - private IPv4 access","rawIspInfo":""} |       | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:132.0) Gecko/20100101 Firefox/132.0 | en-US,en;q=0.5 | 1198.82 | 1976.27 | 2.00 | 0.88   |      |
+----+---------------------+----------------+------------------------------------------------------------------------+-------+----------------------------------------------------------------------------------+----------------+---------+---------+------+--------+------+

I know it will be hard to read, but you can see id=1, and the client information along with the upload/download/jitter/ping etc.

Seeing this work for the first time felt like a great accomplishment. Like I have said throughout this journey, I know there are enhancements to be made, like using Secrets. My latest idea is to use init containers to check that the speedtest-db pod has started and the init commands have all run successfully before the web server pod starts.
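As a rough idea of what that init container could look like, something like the below could sit in the web server Deployment’s pod spec and block startup until the database accepts connections (a sketch only; it reuses the mysql image for its mysqladmin client and assumes the service name mysql, and I have not deployed it yet):

      initContainers:
      - name: wait-for-db
        image: docker.io/mysql:latest
        command:
        - sh
        - -c
        - |
          # keep probing the mysql service until the server accepts connections
          until mysqladmin ping -h mysql --silent; do
            echo "waiting for mysql..."
            sleep 5
          done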

I hope if you stumbled upon this, you found it useful and that it gave you hope to build your own cluster!

My Kubernetes Journey Part 3- Building Docker Containers

For many years, I have run several containers within my Homelab, such as Uptime Kuma, Librespeed, and Pihole. One thing that always intrigued me was building my own container. Now that I was attempting my first Kubernetes deployment, I thought this would be a good time to understand how to do the process from the beginning.

First, I installed the Docker client for my OS. For desktop OSes, this is branded Docker Desktop.

Container Registry

Before you create a container image, you must decide where to store it, because Docker/Kubernetes expect images to come from a container registry. For my learning and own development, I use a self-hosted Gitea instance.

There were not really any prerequisites, other than the Gitea instance needing HTTPS enabled for all the docker commands to work. Being lazy and using it only locally, I had mine on HTTP, but this finally pushed me to put Gitea behind my HAProxy instance with a valid SSL front end. Docs for Gitea and its container registry are here; once the image is built below, I will show the instructions to push it to the registry.

Dockerfile

The Dockerfile is a list of commands/instructions to build the Docker image. It is executed top down and builds layers to make a full image. Of course, Docker themselves can explain it best.

Fair warning: I am sure my example is neither efficient nor lightweight, but I wanted to start with something I was most familiar with. As time goes on and my learning grows, I’m interested in better shaping these with smaller images and more efficient commands.

My Dockerfile is linked on my personal GitHub. I will step through the important parts:

First I chose Ubuntu 22.04 as the base, as it is the distribution I am most familiar with.

FROM ubuntu:22.04

Next, I refreshed the package repositories and installed the various packages needed for Librespeed to work with a MySQL backend. DEBIAN_FRONTEND=noninteractive instructs apt to install non-interactively; otherwise tzdata asks for a timezone during install and the command never finishes:

RUN apt-get -y update
RUN DEBIAN_FRONTEND=noninteractive apt-get -y install tzdata apache2 curl php php-image-text php-gd git mysql-client inetutils-ping php-mysql

Then, per the Librespeed package maintainers, there were several configuration changes to be made to PHP, and the package is cloned locally into the web root:

#Set post max size to 0
RUN sed -i 's/post_max_size = 8M/post_max_size = 0/' /etc/php/8.1/apache2/php.ini 

#Enable Extensions
RUN sed -i 's/;extension=gd/extension=gd/' /etc/php/8.1/apache2/php.ini

RUN sed -i 's/;extension=pdo_mysql/extension=pdo_mysql/' /etc/php/8.1/apache2/php.ini

#Prep directory for web root
RUN rm -rf /var/www/html/index.html 

#Clone
RUN git clone https://github.com/librespeed/speedtest.git /var/www/html

Next, as I wanted to save results to permanent storage, I changed the telemetry_settings.php file with the various db username/password, db name:

RUN sed -i 's/USERNAME/speedtest/' /var/www/html/results/telemetry_settings.php

RUN sed -i 's/PASSWORD/speedtest/' /var/www/html/results/telemetry_settings.php

RUN sed -i 's/DB_HOSTNAME/mysql/' /var/www/html/results/telemetry_settings.php

RUN sed -i 's/DB_NAME/kptspeedtest/' /var/www/html/results/telemetry_settings.php

If not clear, my values are: username = speedtest, password = speedtest, db hostname = mysql (important note: this is the Kubernetes service name I create later!), and db name = kptspeedtest.

Skipping some of the basic cleanup/permission settings, which you can see in the file, in the end I instructed the container to listen on port 80 and to run the Apache server in the foreground so the container does not just stop after it starts. There is also a health check that simply verifies the Apache service is responding.

EXPOSE 80
CMD ["apachectl", "-D", "FOREGROUND"]
HEALTHCHECK --timeout=3s \
  CMD curl -f http://localhost/ || exit 1

Building the image

With the file saved as “Dockerfile” I could build the image. First, I logged in to the container registry (in my case my Gitea instance):

docker login git."your_domain_name".com

Next, it would normally be as simple as running docker build -t with the registry, owner, and image name, such as:

# build an image with tag
docker build -t {registry}/{owner}/{image}:{tag} .

I ran into some errors with the next step of uploading, and after searching around found that adding the flag --provenance=false excludes certain metadata about the build and makes it compatible with Gitea. Depending on the container registry, it may or may not be needed. Full example:

docker build -t git."domain_name"/kweevuss/speedtest:latest . --provenance=false

Since the file is named Dockerfile, docker build finds it automatically in the current directory. My user is “kweevuss”, the image is called speedtest, and it uses the “latest” tag.

Then I pushed the image to the registry:

docker push git.internal.keepingpacetech.com/kweevuss/speedtest:latest

Within Gitea then, I could see the image related to my user:

Now on any docker install, I simply pulled the image as:

docker pull git.internal.keepingpacetech.com/kweevuss/speedtest:latest

Or, instead, in a kubernetes deployment/pod manifest simply point to the image:

spec:
  containers:
  - name: prdkptspeedtestkube
    image: git."domain_name"/kweevuss/speedtest:latest
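One note if the registry is private: the cluster needs credentials to pull the image, which is normally handled with an image pull secret referenced from the pod spec (a sketch with placeholder values, not my exact setup):

docker-registry secret:

kubectl create secret docker-registry gitea-creds \
  --docker-server=git."domain_name".com \
  --docker-username=kweevuss \
  --docker-password="your_token"

Then referenced in the pod spec:

spec:
  imagePullSecrets:
  - name: gitea-creds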

Next installment will finally be setting up this custom image along with another container, mysql, to store the data!

My Kubernetes Journey – Part 2 – Storage Setup

In this installment, I will go over how I set up external storage for persistent data in Kubernetes. As with any container platform, containers are meant to be stateless and not store data inside themselves, so any persistent data needs to be stored externally.

As mentioned in my first post, my reference book for learning Kubernetes did touch on storage, but limits it to cloud providers, which wasn’t as helpful for me because, of course, in the spirit of Homelab we want to self-host! Luckily there are some great projects out there that we can utilize.

I use TrueNAS Core for my bulk data and VM-based iSCSI storage. I was pleasantly surprised to find that TrueNAS can be used with the Container Storage Interface (CSI), so I cover this method below.

Installing CSI plugin

I found this great project, democratic-csi. The readme has great steps and examples using TrueNAS, so I will not duplicate them here just to re-write the documentation. I am personally using NFS for storage, as it seemed more straightforward; iSCSI is already the back end for all my VM storage from Proxmox, and I would rather not extensively modify that config and risk such an important piece of the lab.

First, I configured TrueNAS with the necessary SSH/API and NFS settings and ran the helm install:

helm upgrade   --install   --create-namespace   --values freenas-nfs.yaml   --namespace democratic-csi   zfs-nfs democratic-csi/democratic-csi

My example freenas-nfs.yaml file is below:

csiDriver:
  name: "org.democratic-csi.nfs"

storageClasses:
- name: freenas-nfs-csi
  defaultClass: false
  reclaimPolicy: Retain
  volumeBindingMode: Immediate
  allowVolumeExpansion: true
  parameters:
    fsType: nfs
      
  mountOptions:
  - noatime
  - nfsvers=4
  secrets:
    provisioner-secret:
    controller-publish-secret:
    node-stage-secret:
    node-publish-secret:
    controller-expand-secret:

driver:
  config:
    driver: freenas-nfs
    instance_id:
    httpConnection:
      protocol: http
      host: 192.168.19.3
      port: 80
      # This is the API key that we generated previously
      apiKey: 1-KEY HERE
      username: k8stg
      allowInsecure: true
      apiVersion: 2
    sshConnection:
      host: 192.168.19.3
      port: 22
      username: root
      # This is the SSH key that we generated for passwordless authentication
      privateKey: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        KEY HERE
        -----END OPENSSH PRIVATE KEY-----
    zfs:
      # Make sure to use the storage pool that was created previously
      datasetParentName: ZFS_POOL/k8-hdd-storage/vols
      detachedSnapshotsDatasetParentName: ZFS_POOL/k8-hdd-storage/snaps
      datasetEnableQuotas: true
      datasetEnableReservation: false
      datasetPermissionsMode: "0777"
      datasetPermissionsUser: root
      datasetPermissionsGroup: wheel
    nfs:
      shareHost: 192.168.19.3
      shareAlldirs: false
      shareAllowedHosts: []
      shareAllowedNetworks: []
      shareMaprootUser: root
      shareMaprootGroup: wheel
      shareMapallUser: ""
      shareMapallGroup: ""

The above file, I felt, is very well documented by the package maintainers. I had to input the API/SSH keys, and the important piece is the dataset information:

datasetParentName: ZFS_POOL/k8-hdd-storage/vols AND
detachedSnapshotsDatasetParentName: ZFS_POOL/k8-hdd-storage/snaps

This depends on what I had set up in TrueNAS. My pool looks like this:

Testing Storage

With the helm command run, I saw a “Storage Class” created in Kubernetes:

The name comes from the yaml file above. The Kubernetes Storage Class is the foundation, and its job is to help automate storage provisioning for containers that request it. These requests have specific names, like Persistent Volume Claims and Persistent Volumes, which I will get to. The Storage Class essentially uses a CSI plugin (which is an API) to talk to external storage systems and provision storage automatically. This way Kubernetes has a consistent way to create storage no matter the platform; it could be Azure/AWS/TrueNAS etc.
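The same can be confirmed from the command line (a quick check):

kubectl get storageclass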

Now, to get to the interesting part: we can first create a “Persistent Volume Claim” (PVC). The PVC’s job is to request a new Persistent Volume from a Storage Class. Hopefully an example will help:

cat create_pvc.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc-test
  annotations:
    volume.beta.kubernetes.io/storage-class: "freenas-nfs-csi"
spec:
  storageClassName: freenas-nfs-test
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi

The above yaml file is essentially asking for 500Mi of storage from the storage class named “freenas-nfs-test”.

I applied this with the normal “kubectl apply -f create_pvc.yaml”

With this applied, I saw it created, and once completed it was in a bound state:
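The claim and the automatically provisioned volume can also be listed from the CLI (a quick check):

kubectl get pvc nfs-pvc-test
kubectl get pv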

Now to use this:

cat test-pv.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: volpod
spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: nfs-pvc-test
  containers:
  - name: ubuntu-ctr
    image: ubuntu:latest
    command:
    - /bin/bash
    - "-c"
    - "sleep 60m"
    volumeMounts:
    - mountPath: /data
      name: data

In this pod, I defined a volume under spec.volumes that references the PVC created in the last step. Then, under spec.containers.volumeMounts, I mounted that volume at a directory inside the container.

In TrueNAS, at this point, I saw a volume created in the dataset that matches the volume ID displayed in Kubernetes.

Attach to the pod:

kubectl exec -it volpod   -- bash

Inside the container now, I navigated to the /data directory and created a test file:

root@volpod:/# cd /data/
root@volpod:/data# ls
root@volpod:/data# touch test.txt
root@volpod:/data# ls -l
total 1
-rw-r--r-- 1 root root 0 Nov  3 04:11 test.txt

Just to see this in action, I SSHed into the TrueNAS server and browsed to the dataset, where we can see the file!

freenas# pwd
/mnt/ZFS_POOL/k8-hdd-storage/vols/pvc-dede93ea-d0cf-4bd7-8500-d052ce336c39
freenas# ls -l
total 1
-rw-r--r--  1 root  wheel  0 Nov  3 00:11 test.txt

In the next installment, I will go over how I created my container images. My initial idea was to use Librespeed plus a second container running MySQL for persistent storage of results, so having the above completed gives a starting point for any persistent data storage needs!

My Kubernetes Journey – Part 1 – Cluster Setup

In this first part, my goal is to piece together the bits and pieces of documentation I found for the cluster setup and networking.

A must-read is the official documentation. I thought it was great, but be prepared for a lot of reading; it’s not a one-command install by any means. I’m sure there are plenty of ways to automate this turn-up, which is what all the big cloud providers do for you, but I wasn’t interested in that, so it won’t be covered in these Kubernetes posts.

I ran into two confusing topics on my first install of Kubernetes: the container runtime and the network plugin install. I will mostly cover the network plugin install below in case it helps others.

First I started out using my automated VM builds (you can find that post here) to build four Ubuntu 22.04 VMs: one controller and three worker nodes.

As you’ll see if you dive into the prerequisites for kubeadm, you have to install a container runtime. I’ll blame my first failure with containerd on simply not knowing what I was doing; on my second attempt I tried Docker Engine, and that worked. With the instructions from Kubernetes, I was able to follow along without issue.

Once the Kubernetes instructions were followed for the container runtime, the Kubernetes packages could be installed on the control node:

sudo apt-get install -y kubelet kubeadm kubectl
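The official install docs also suggest pinning these packages so an unattended upgrade does not move the cluster version out from under you:

sudo apt-mark hold kubelet kubeadm kubectl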

Now it’s time to bootstrap the cluster. If you’re using this post as a step-by-step guide, I would suggest coming back once the install is done and the cluster is up to read more about what kubeadm init does, as it is intriguing.



kubeadm init --pod-network-cidr 10.200.0.0/16 --cri-socket unix:///var/run/cri-dockerd.sock --service-cidr 192.168.100.0/22

Dissecting what is above:

  • --pod-network-cidr – this is an unused CIDR range that I have available. By default it is not exposed outside the worker nodes, but it can be. Kubernetes will assume a /24 per worker node out of this space. The size is something I would like to investigate changing, but I ran into complications, and instead of debugging I just accepted a bigger space for growth.
  • --cri-socket – this instructs the setup process to use Docker Engine. My understanding is that Kubernetes now defaults to containerd, and if you use that CRI this flag is not needed.
  • --service-cidr – I again decided to pick a dedicated range, as the network plugin I used can announce these via BGP and I wanted a range that was free on my network. I cover the networking piece more below.
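After kubeadm init finishes, the admin kubeconfig needs to be copied so kubectl can talk to the new cluster; the init output prints these steps (reproduced from the official docs):

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config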

At the end of the init process, it gave me a kubeadm join command, which includes a token and other info the worker nodes need to join the controller.

kubeadm join 192.168.66.165:6443 --token <token> --discovery-token-ca-cert-hash <cert> --cri-socket unix:///var/run/cri-dockerd

At this point, running kubectl get nodes showed the nodes, but none were ready until a network plugin was running and configured.

Network Plugin

I tried several network plugin projects and ended up landing on Kube-router. It really seemed to meet my end goal of being able to advertise different services or pods via BGP into my network.

I used this example yaml file from the project page and only had to make slight modifications to define the router to peer to.

For the container “kube-router” in the spec.args section, I defined the peer router ips, and ASN information. For example:

 containers:
      - name: kube-router
        image: docker.io/cloudnativelabs/kube-router
        imagePullPolicy: Always
        args:
        - --run-router=true
        - --run-firewall=true
        - --run-service-proxy=true
        - --bgp-graceful-restart=true
        - --kubeconfig=/var/lib/kube-router/kubeconfig
        - --advertise-cluster-ip=true
        - --advertise-external-ip=true
        - --cluster-asn=65170
        - --peer-router-ips=192.168.66.129
        - --peer-router-asns=64601

I made sure to adjust these settings to fit my environment. You can decide whether you want cluster IPs and external IPs advertised. I enabled both, but with more understanding now, I only envision needing external IPs advertised, for load balancer services for example.

I ran the deployment with:

kubectl apply -f "path_to_yaml_file"

After this, I saw several kube-router containers being created:
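A quick way to check on them is to filter on the daemonset’s label in the kube-system namespace (a sketch; the label may differ slightly depending on the manifest version):

kubectl get pods -n kube-system -l k8s-app=kube-router -o wide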

Once those containers were all running, I looked at the route table and saw routes installed to the pod network CIDRs on each host:

ip route
default via 192.168.66.129 dev eth0 proto static 
10.200.1.0/24 via 192.168.66.171 dev eth0 proto 17 
10.200.2.0/24 via 192.168.66.172 dev eth0 proto 17 
10.200.3.0/24 via 192.168.66.173 dev eth0 proto 17 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
192.168.66.128/25 dev eth0 proto kernel scope link src 192.168.66.170

Check out kube-router’s documentation for more info, but essentially the kube-router containers form BGP peerings as worker nodes join the cluster, and create routes so pods on different worker nodes can communicate.

I also noticed the coredns containers start, and “kubectl get nodes” showed all the nodes in a ready state:

kubectl get nodes
NAME                  STATUS   ROLES           AGE   VERSION
prdkptkubecontrol02   Ready    control-plane   16d   v1.31.1
prdkptkubeworker04    Ready    <none>          16d   v1.31.1
prdkptkubeworker05    Ready    <none>          16d   v1.31.1
prdkptkubeworker06    Ready    <none>          16d   v1.31.1

At this point I had a working Kubernetes cluster!

My Kubernetes Journey

In this post, I want to document my learning from knowing nothing to deploying a multi-node Kubernetes cluster that is currently running a Librespeed test server with a mysql back end.

A recent product release from Nokia, Event Driven Automation (EDA), which is built around many Kubernetes concepts for network automation, piqued my interest in exploring Kubernetes itself. Although the tool does not require any Kubernetes knowledge (the software does run on a K8s cluster), I wondered whether learning it would be beneficial. In short, my purpose in learning Kubernetes was to understand the design philosophy and how to support a deployment of EDA.

Disclaimer: I am a network engineer, not a DevOps master, so I only slightly know what I’m doing in this space and I know I have a lot to learn to be more efficient, but maybe others just entering the space can find some ideas in my journey.

I would not be where I am today without The Kubernetes Book: 2024 Edition by Nigel Poulton. I thought it was a great introduction to and explanation of the concepts. The only thing it didn’t cover that I wanted was setting up a Kubernetes cluster from scratch, but the official Kubernetes documentation helped where it lacked.

The following parts are listed below, where I try to go from 0 to a deployed Kubernetes cluster, all self-hosted in my homelab!

Part 1

Part 2

Part 3

Part 4

Networking VMs with Proxmox SDN Features

In my last post, I went over the configuration of Proxmox, Vyos, and SROS. Here I want to show how the setup looks and works with real VMs attached to it.

Above, I have tried to show the topology. The VM in the bottom left runs on compute2 (a Proxmox hypervisor) and is attached to the VNET “Vlan31” with the IP 172.16.0.69. This VM’s gateway exists on the pair of Vyos instances, which tie the VXLAN tunnel into a VRF. The other VM, 172.16.1.69, is attached to a normal Proxmox bridge/VLAN that is routed on the 7210 within the VPRN service. These VMs are just to show connectivity between these systems.

With the VNET configured in Proxmox, it is possible to assign it to a VM’s interface, which simply looks like this:

For the other VM, I’m simply using a traditional Linux bridge and a VLAN.

The VM above is the 172.16.0.69 VM, showing connectivity to the other VM on 1.69. Let’s dive into how it works.

First I will lay out the IPs and macs for reference.

172.16.0.69 (testclone15): BC:24:11:F9:4D:AE
172.16.1.69 (testclone16): BC:24:11:9E:9A:41

First we can look at the EVPN type 2 route (mac address) for the first host.

A:KPTPE01# show router bgp routes evpn mac mac-address BC:24:11:F9:4D:AE 
===============================================================================
 BGP Router ID:10.11.0.2        AS:65000       Local AS:65000      
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP EVPN MAC Routes
===============================================================================
Flag  Route Dist.         MacAddr           ESI
      Tag                 Mac Mobility      Ip Address
                                            NextHop
                                            Label1
-------------------------------------------------------------------------------
*>i   192.168.254.18:2    bc:24:11:f9:4d:ae ESI-0
      0                   Seq:0             N/A
                                            192.168.254.18
                                            VNI 31

And we see it as we expect on the SROS router: the mac address, the VNI it’s attached to, and the next hop, which is the interface on compute2 (the Proxmox hypervisor).

vyos@devkptvyos02:~$ show evpn mac vni all

VNI 31 #MACs (local and remote) 4

Flags: N=sync-neighs, I=local-inactive, P=peer-active, X=peer-proxy
MAC               Type   Flags Intf/Remote ES/VTEP            VLAN  Seq #'s
d0:99:d5:5a:c8:ec remote       192.168.254.137                      0/0
ea:89:3c:f0:86:82 local        br1                            1     0/0
86:a8:39:b4:ec:43 remote       192.168.254.137                      0/0
bc:24:11:f9:4d:ae remote       192.168.254.18                       0/0

Then on Vyos, we can see the mac address of the VM is learned. This is because SROS is sending the EVPN route to the Vyos instances, which install it in their tables. The gateway, 172.16.0.1, which exists within our VRF called “test-vprn”, has a mac of ea:89:3c:f0:86:82.

Now the packet routes to the gateway, and Vyos looks in its route table for how to reach 172.16.1.0/24.

vyos@devkptvyos02:~$ show ip route vrf test-vprn 
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF test-vprn:
C>* 172.16.0.0/24 is directly connected, br1, 6d12h41m
B>  172.16.1.0/24 [200/0] via 10.11.0.2 (vrf default) (recursive), label 131056, weight 1, 01:48:05
  *                         via 10.0.0.8, eth1.34 (vrf default), label 131071/131056, weight 1, 01:48:05

Now Vyos has a route to the SROS router using an MPLS transport and service label.

Lastly on the Nokia router, it will simply deliver it over vlan 31.

A:KPTPE01# show router 231 route-table 

===============================================================================
Route Table (Service: 231)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric   
-------------------------------------------------------------------------------
172.16.0.0/24                                 Remote  BGP VPN   01h54m38s  170
       10.11.0.140 (tunneled)                                       0
172.16.1.0/24                                 Local   Local     11d08h37m  0
       vlan31                                                       0
-------------------------------------------------------------------------------
No. of Routes: 3
Flags: n = Number of times nexthop is repeated
       B = BGP backup route available
       L = LFA nexthop available
       S = Sticky ECMP requested
===============================================================================
A:KPTPE01# 

At this point, we can see some of what the traffic looks like on the network. I am capturing from “compute2” for anything with the port 4789 for VXLAN.

tcpdump -i any port 4789

13:43:29.824872 enp4s0 Out IP 192.168.254.18.53966 > 192.168.254.140.4789: VXLAN, flags I, vni 31
IP 172.16.0.69 > 172.16.1.69: ICMP echo request, id 3, seq 9, length 64 <-- Inner

13:43:30.827019 enp4s0 In IP 192.168.254.140.52047 > 192.168.254.18.4789: VXLAN, flags I, vni 31
IP 172.16.1.69 > 172.16.0.69: ICMP echo reply, id 3, seq 10, length 64 <-- Inner

So we can see the VM 172.16.0.69 sending a ping request to 172.16.1.69. That packet is encapsulated in VXLAN and sent from the Proxmox IP to Vyos. We also see the return, also encapsulated in VXLAN.

Of course, doing all this with two hosts in a single network isn’t all that useful per se. But imagine having many hosts, possibly not even in the same area, that can be on the same L2 network with mobility. Stretching L2 across WANs is not always the best idea, but I’ll leave that up to your discretion 🙂

Exploring Proxmox SDN with EVPN and VXLAN

With Proxmox 8, and production support of software defined networking, I started to take a harder look at what is possible.

From my understanding, this feature is built on FRR. In this blog, I am looking to take an unconventional approach to using it. From the Proxmox documentation, this feature mostly seems to be used for networking nodes together, which could be over a WAN etc. There are limited options available in the GUI, which of course is all you need for the feature to work. I wanted to see what was happening under the hood and expand on it by having networking features outside of just the hosts too.

Ideally, my setup would consist of VXLAN tunnels terminating on a router where I have a layer 3 gateway within a VRF. The problem is that while the 7210 supports EVPN, it only supports transport over EVPN-MPLS, and nearly every open-source project I have found only supports VXLAN. So my idea was to use Vyos in the middle, which can terminate the VXLAN tunnels from the Proxmox nodes and then route from a VRF to another VRF on the 7210 via MPLS.

Topology

The first picture depicts the control plane of this setup. I am using a Nokia 7210 as a route reflector with several different address families: EVPN for the Proxmox nodes, and EVPN/VPN-IPv4/v6 towards Vyos. More on VPN-IPv6 later, as I sadly ran into some unresolved issues.

The second picture shows the transport tunnels. Simply, VXLAN from Proxmox to Vyos (it would be a full mesh; I neglected to draw it here) and LDP tunnels to the 7210.

Proxmox Configuration

There are some prerequisites that are all covered in the Proxmox documentation in chapter 12. All of these settings are under Datacenter > SDN. First, let’s set up a “controller”, which is really just defining an external router that will run the EVPN protocol. Under Options, add an EVPN controller.

Here, simply give it a name, ASN, and peer IP. Then in my router config, I defined a peer group and a cluster ID, which means this peer group will act as a route reflector:

A:KPTPE01>config>router>bgp# info 
----------------------------------------------

            group "iBGP-RR"
                family ipv4 vpn-ipv4 evpn
                cluster 10.11.0.2
                peer-as 65000
                advertise-inactive
                bfd-enable
                neighbor 192.168.254.13
                    description "compute3"
                exit
                neighbor 192.168.254.18
                    description "compute2"
                exit
            exit

If all goes well, then the peer should come up:

192.168.254.13
compute3
                65000  138929    0 04d19h46m 1/0/35 (Evpn)
                       139268    0           
192.168.254.18
compute2
                65000  138099    0 04d19h04m 1/0/35 (Evpn)
                       138432    0           

Next, I needed to define a zone. Proxmox defines it as: “A zone defines a virtually separated network. Zones are restricted to specific nodes and assigned permissions, in order to restrict users to a certain zone and its contained VNets.” In my case, I’m using it to define the VXLAN tunnel and its endpoints, which end up being my two Vyos instances.

The last thing is to create a VNET. The VNET is what VMs are actually attached to, and it creates a broadcast domain. This config is simple, basically tying together a zone and giving it a name and a tag, where the tag ends up being the VNI.

Exploring the details

Looking at the Nokia, we can examine the inclusive multicast routes. As a quick overview, inclusive multicast routes are advertised between routers to announce the service. If two peers have matching RTs and VNI, they can build a VXLAN tunnel and exchange data.

A:KPTPE01# show router bgp routes evpn inclusive-mcast rd 192.168.254.18:2 hunt 
===============================================================================
 BGP Router ID:10.11.0.2        AS:65000       Local AS:65000      
===============================================================================
 Legend -
 Status codes  : u - used, s - suppressed, h - history, d - decayed, * - valid
                 l - leaked, x - stale, > - best, b - backup, p - purge
 Origin codes  : i - IGP, e - EGP, ? - incomplete

===============================================================================
BGP EVPN Inclusive-Mcast Routes
===============================================================================
-------------------------------------------------------------------------------
RIB In Entries
-------------------------------------------------------------------------------
Network        : N/A
Nexthop        : 192.168.254.18
From           : 192.168.254.18
Res. Nexthop   : 192.168.254.18
Local Pref.    : 100                    Interface Name : vlan1
Aggregator AS  : None                   Aggregator     : None
Atomic Aggr.   : Not Atomic             MED            : None
AIGP Metric    : None                   
Connector      : None
Community      : bgp-tunnel-encap:VXLAN target:65000:31
Cluster        : No Cluster Members
Originator Id  : None                   Peer Router Id : 192.168.254.18
Flags          : Valid  Best  IGP  
Route Source   : Internal
AS-Path        : No As-Path
EVPN type      : INCL-MCAST             
ESI            : N/A
Tag            : 0                      
Originator IP  : 192.168.254.18         Route Dist.    : 192.168.254.18:2
Route Tag      : 0                      
Neighbor-AS    : N/A
Orig Validation: N/A                    
Add Paths Send : Default                
Last Modified  : 04d19h19m              
-------------------------------------------------------------------------------
PMSI Tunnel Attribute : 
Tunnel-type    : Ingress Replication    Flags          : Leaf not required
MPLS Label     : VNI 31                 
Tunnel-Endpoint: 192.168.254.18         

First, we can see from the communities that the encapsulation is VXLAN, along with the route target. Also the VNI, which is the tag we picked, “31” in this case. If this router supported VXLAN, we would be good to go configuring a VPLS service with VXLAN encapsulation. But it doesn’t, so there is some more work to do here; and where would the fun be otherwise?

Vyos Configuration

I will keep expanding on this, but for now here is the relevant configuration for the VXLAN side of this communication.

#This is the VXLAN termination interface in the same L2 as the hypervisors
    ethernet eth1 {
        address 192.168.254.137/24
        hw-id bc:24:11:51:9c:48
        mtu 9000
#Routed interface towards 7210
        vif 32 {
            address 10.0.0.7/31
            mtu 9000
        }
    }
    loopback lo {
        address 10.11.0.137/32
    }
    vxlan vxlan31 {
        ip {
            enable-directed-broadcast
        }
        mtu 9000
        parameters {
            nolearning
        }
        port 4789
        source-address 192.168.8.167
        vni 31
    }
}

protocols {
    bgp {
        address-family {
            l2vpn-evpn {
                advertise-all-vni
                advertise-svi-ip
                vni 31 {
                    route-target {
                        both 65000:31
                    }
                }
            }
        }
#7210 RR client
        neighbor 10.11.0.2 {
            address-family {
                ipv4-vpn {
                }
                l2vpn-evpn {
                }
            }
            peer-group iBGP-RR-PE
        }
      #7210 RR Client
        neighbor fc00::2 {
            address-family {
                ipv6-vpn {
                }
            }
            remote-as 65000
        }
        parameters {
            router-id 10.11.0.137
        }
        peer-group iBGP-RR-PE {
            remote-as 65000
        }
        system-as 65000
    }
#IGP towards 7210 and enable LDP
    isis {
        interface eth1.32 {
            network {
                point-to-point
            }
        }
        interface lo {
            passive
        }
        level level-1-2
        metric-style wide
        net 49.6901.1921.6800.0167.00
    }
    mpls {
        interface eth1.32
        ldp {
            discovery {
                transport-ipv4-address 10.11.0.137
            }
            interface eth1.32
            router-id 10.11.0.137
        }
    }
}

I tried to comment the important parts of the config and provide it all as examples. At a high level, ethernet eth1 is the interface in the same VLAN as the hypervisors, used to send VXLAN traffic between the two. eth1.32 is a point-to-point interface (just transported over a VLAN in my network) to the 7210, to run LDP over.

VPN-IPv4 Configuration

Now, turning focus to the other side of Vyos: the MPLS/VPN-IPv4 configuration. To recap, the traffic flow will look as such:

Proxmox hypervisor <--VXLAN--> Vyos (with L3 gateway for VXLAN service) <--MPLS/LDP--> 7210 with a VPRN service

high-availability {
    vrrp {
        group vxlan31 {
            address 172.16.0.1/24 {
            }
            hello-source-address 172.16.0.2
            interface br1
            peer-address 172.16.0.3
            priority 200
            vrid 31
        }
    }
}
interfaces {
    bridge br1 {
        address 172.16.0.2/24
        member {
            interface vxlan31 {
            }
        }
        mtu 9000
        vrf test-vprn
    }
 vxlan vxlan31 {
        ip {
            enable-directed-broadcast
        }
        mtu 9000
        parameters {
            nolearning
        }
        port 4789
        source-address 192.168.254.137
        vni 31
    }
vrf {
    name test-vprn {
        protocols {
            bgp {
                address-family {
                    ipv4-unicast {
                        export {
                            vpn
                        }
                        import {
                            vpn
                        }
                        label {
                            vpn {
                                export auto
                            }
                        }
                        rd {
                            vpn {
                                export 10.11.0.137:231
                            }
                        }
                        redistribute {
                            connected {
                            }
                        }
                        route-map {
                            vpn {
                                export export-lp200
                            }
                        }
                        route-target {
                            vpn {
                                both 65000:231
                            }
                        }
                    }
                }
                system-as 65000
            }
        }
        table 231
    }

A few comments on the Vyos config:

  • VRRP provides gateway redundancy between the two Vyos instances; if one instance goes down, the other still provides a valid gateway.
  • A bridge interface is created with the vxlan31 interface as a member, which is what defines the VNI parameters.
  • Then the VRF configuration:
    • Most important is defining the route target, redistributing the local interfaces, setting the route distinguisher, and importing/exporting VPN routes.

Now the 7210:

A:KPTPE01# configure service vprn 231 
A:KPTPE01>config>service>vprn# info 
----------------------------------------------
            route-distinguisher 65000:231
            auto-bind-tunnel
                resolution any
            exit
            vrf-target target:65000:231
            interface "vlan31" create
                address 172.16.1.1/24
                sap 1/1/25:31 create
                    ingress
                    exit
                    egress
                    exit
                exit
            exit
            no shutdown

On the Nokia side it is a little simpler, as we are only concerned with the VPN-IPv4 side of things here:

  • The RD and RT are defined.
  • I created an L3 interface here just to have something to route to beyond the interface in Vyos.

Review and Checks

To close out this part, let’s run some show commands and make sure the environment is ready to support some clients.


*A:KPTPE01# show router bgp summary 
===============================================================================
 BGP Router ID:10.11.0.2        AS:65000       Local AS:65000      
===============================================================================

BGP Summary
===============================================================================
Neighbor
Description
                   AS PktRcvd InQ  Up/Down   State|Rcv/Act/Sent (Addr Family)
                      PktSent OutQ
-------------------------------------------------------------------------------

10.11.0.137
prdkptvyos01
                65000   21466    0 06d09h28m 1/1/312 (VpnIPv4)
                        21947    0           4/0/40 (Evpn)
10.11.0.140
prdkptvyos02
                65000      10    0 00h02m24s 1/1/313 (VpnIPv4)
                          213    0           5/0/45 (Evpn)
192.168.254.13
compute3
                65000  323110    0 11d05h15m 1/0/40 (Evpn)
                       326755    0           
192.168.254.18
compute2
                65000  322278    0 11d04h34m 1/0/40 (Evpn)
                       325924    0           
       
-------------------------------------------------------------------------------
*A:KPTPE01#   show router ldp discovery 

===============================================================================
LDP IPv4 Hello Adjacencies
===============================================================================
Interface Name                   Local Addr                              State
AdjType                          Peer Addr                               
-------------------------------------------------------------------------------

to-devkptvyos01                  10.11.0.2:0                             Estab
link                             10.11.0.137:0                           
                                                                         
to-devkptvyos02                  10.11.0.2:0                             Estab
link                             10.11.0.140:0                           
                                                                         
-------------------------------------------------------------------------------


*A:KPTPE01# show router tunnel-table 

===============================================================================
IPv4 Tunnel Table (Router: Base)
===============================================================================
Destination       Owner     Encap TunnelId  Pref     Nexthop        Metric
-------------------------------------------------------------------------------
10.11.0.137/32    ldp       MPLS  65542     9        10.0.0.7       20
10.11.0.140/32    ldp       MPLS  65541     9        10.0.0.9       20
-------------------------------------------------------------------------------
Flags: B = BGP backup route available
       E = inactive best-external BGP route
===============================================================================
A:KPTPE01>config>service>vprn# show router 231 route-table protocol bgp-vpn 

===============================================================================
Route Table (Service: 231)
===============================================================================
Dest Prefix[Flags]                            Type    Proto     Age        Pref
      Next Hop[Interface Name]                                    Metric   
-------------------------------------------------------------------------------
172.16.0.0/24                                 Remote  BGP VPN   00h26m19s  170
       10.11.0.140 (tunneled)                                       0
-------------------------------------------------------------------------------
vyos@devkptvyos01:~$ show mpls ldp binding 
AF   Destination          Nexthop         Local Label Remote Label  In Use
ipv4 10.11.0.2/32         10.11.0.2       16          131071           yes
ipv4 10.11.0.137/32       0.0.0.0         imp-null    -                 no
ipv4 10.11.0.140/32       10.11.0.2       18          131045           yes

vyos@devkptvyos01:~$ show mpls ldp neighbor 
AF   ID              State       Remote Address    Uptime
ipv4 10.11.0.2       OPERATIONAL 10.11.0.2       6d10h56m

#View EVPN Type 3 Inclusive Multicast routes
vyos@devkptvyos01:~$ show bgp vni all

VNI: 31

BGP table version is 1041, local router ID is 10.11.0.137
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
 #192.168.254.13 and .18 are proxmox hosts
 *>i[3]:[0]:[32]:[192.168.254.13]
                    192.168.254.13                100      0 i
                    RT:65000:31 ET:8
 *>i[3]:[0]:[32]:[192.168.254.18]
                    192.168.254.18                100      0 i
                    RT:65000:31 ET:8
#The two vyos routers
 *> [3]:[0]:[32]:[192.168.254.137]
                    192.168.254.137                    32768 i
                    ET:8 RT:65000:31
 *>i[3]:[0]:[32]:[192.168.254.140]
                    192.168.254.140               100      0 i
                    RT:65000:31 ET:8

vyos@devkptvyos01:~$    show ip route vrf test-vprn 
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF test-vprn:
C>* 172.16.0.0/24 is directly connected, br1, 6d11h23m
B>  172.16.1.0/24 [200/0] via 10.11.0.2 (vrf default) (recursive), label 131056, weight 1, 6d11h22m
  *                         via 10.0.0.6, eth1.32 (vrf default), label 131071/131056, weight 1, 6d11h22m

I know this is a ton of data above, but what I am trying to show is that we have the expected inclusive multicast EVPN routes from Proxmox and the VyOS instances, and that there is MPLS/LDP connectivity between VyOS and the 7210. Finally, the route tables show routes in both directions, so in theory, if hosts on the two sides were to communicate, it should work.

This is all I have for this post. In the next post I will actually enable VMs on these networks and show the connectivity.

Automating VM Builds

A few years ago, after installing Ubuntu Server for the 100th time, I thought how cool it would be to automate the build. I also had some inspiration from my workplace and the tooling that was used there.

I personally build mostly Ubuntu servers, so that is what this will be focused on. My other systems, like Windows or other distros, are infrequent, but I hope to add them to my catalog.

The code

I created this script which is on GitHub here. It is somewhat customized to my situation, but I hope to grow it into something more modular.

First, I built this around the idea of having IPAM data in phpIPAM, plus a supplemental MySQL database. Also, once the VM is built, I have the option to add it to LibreNMS.

To run the code, you first need to fill in the secret file here. This is imported by the various Python files to use as credentials.

Then to run the script there are a few command line inputs:

--template_id TEMPLATE_ID    Proxmox template ID to clone
--new_vmid NEW_VMID          VM ID of the new cloned VM
--new_vm_name NEW_VM_NAME    Name of the new VM
--ipam_section IPAM_SECTION  IPAM section, enter ipv4 or ipv6
--disk_size DISK_SIZE        Disk size of the disk after the clone
--add_librenms ADD_LIBRENMS  Add the device to LibreNMS after the VM is booted (requires new_vm_name to be set)
--vlan_id VLAN_ID            VLAN ID assignment for the VM and DB lookup

The vlan_id field is used to query the MySQL database to get the correct IPv4 and IPv6 address assignments. The VLAN is then also updated on the cloned VM with this information.

As an example, the database table looks like this:

vlan_id | subnet | subnet_gateway | subnet_mask
---|---|---|---
2 | 192.168.2.0 | 192.168.2.1 | 24
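
As a rough sketch of how that lookup could work in Python, something like the following would do it. The table name "subnets", database name "ipam", and credentials are placeholders for this example (not necessarily what my script uses); the column names match the table above, and the host matches the one shown in the script output further down.

import pymysql

def lookup_vlan(vlan_id):
    # Placeholder credentials and database name for illustration only
    conn = pymysql.connect(host="prdkptdb01.internal.keepingpacetech.com",
                           user="vmbuild", password="changeme", database="ipam",
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT subnet, subnet_gateway, subnet_mask "
                "FROM subnets WHERE vlan_id = %s",
                (vlan_id,))
            return cur.fetchone()  # e.g. {'subnet': '192.168.2.0', ...}
    finally:
        conn.close()

print(lookup_vlan(2))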

Creating the image

To start, Ubuntu offers images built specifically for cloud-init here. Find the release and architecture you need. In my example, I used wget to download directly to Proxmox:

wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img

Next, create the VM that we will use as a template. The memory and network are just placeholders; my script updates these parameters when the clone happens:

qm create 302 --memory 2048 --name kptdc-ubuntu-22.04 --net0 virtio,bridge=vmbr4,tag=10

Now import the downloaded img file as a disk on the VM. Note this will now show up as an unused disk in the GUI:

#Syntax: qm importdisk "vmID" "image-file-name" "storage-lvm-name"

#My example:

qm importdisk 302 ubuntu-22.04-server-cloudimg-amd64.img compute2-zvol-vm1

<abbreviated output>
transferred 2.2 GiB of 2.2 GiB (100.00%)
Successfully imported disk as 'unused0:compute2-zvol-vm1:vm-302-disk-0'

With the image imported, now attach it to the VM as a usable SCSI disk:

#Syntax: qm set "vmID" --scsihw virtio-scsi-pci --scsi0 "storage-lvm-name:vm-disk-name"
#(the vm-disk-name was created by the previous importdisk command)

#My example:
qm set 302 --scsihw virtio-scsi-pci --scsi0 compute2-zvol-vm1:vm-302-disk-0

Now adding a cloud init drive:

qm set 302 --ide2 compute2-zvol-vm1:cloudinit
update VM 302: -ide2 compute2-zvol-vm1:cloudinit

And some final settings:

#Make scsi drive bootable:

qm set 302 --boot c --bootdisk scsi0

#Attach a display:

qm set 302 --serial0 socket --vga serial0

Now it’s time to utilize the custom cloud-init configuration. Cloud-init supports three different types of configuration files:

  • user
  • network
  • meta

These allow you to expand the configuration beyond what is available within the Proxmox GUI. A lot of good information on all the possibilities is available here.

My main goals are to install the packages I apply to all VMs, add some iptables statements for QoS, and finally copy over configuration files for several of the services. This is my example file.
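
To give a feel for the format, here is a minimal user file along those lines. This is only a sketch, not my actual file; the package names, file path, and iptables rule are placeholders:

#cloud-config
package_update: true
packages:
  - qemu-guest-agent
  - vim
  - iptables-persistent
write_files:
  # Placeholder config file dropped onto every VM
  - path: /etc/example/app.conf
    content: |
      setting=value
runcmd:
  # Illustrative QoS marking rule only
  - iptables -t mangle -A OUTPUT -p tcp --dport 443 -j DSCP --set-dscp-class AF21
  - netfilter-persistent save
  - systemctl enable --now qemu-guest-agent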

Any custom cloud-init files need to exist on a datastore that is enabled for snippets. By default the local storage is enabled for this, and the location is /var/lib/vz/snippets.
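
If the datastore you want to use does not list snippets as a content type, something like the following should enable it. This is a generic example rather than a step from my build, and the content list must include everything the storage should keep serving:

#Enable snippets on the default "local" storage, keeping its existing content types
pvesm set local --content snippets,iso,vztmpl,backup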

The last step is to associate the custom file with the VM via the cicustom option. For example:

qm set <VM_ID> --cicustom "user=local:snippets/user.yaml"

Example:

qm set 302 --cicustom "user=local:snippets/user.yaml"

Running the code

python3 build_vm_.py --template_id 302 --new_vmid 223 --new_vm_name testclone16 --disk_size 20G --vlan_id 66
Connecting to database prdkptdb01.internal.keepingpacetech.com

Running Query on database for vlan 66
IPv4 network found:  {'network': '192.168.66.128', 'gateway': '192.168.66.129', 'subnet_mask': '25'}
IPv6 network found:  {'network': 'fd00:0:0:66b::', 'gateway': 'fc00:0:0:66b::1', 'subnet_mask': '64'}

Running clone vm
checking vm status 223
clone
VM is still locked
checking vm status 223
VM is still locked
checking vm status 223
VM is still locked
checking vm status 223
VM is still locked
checking vm status 223
VM is still locked
checking vm status 223
VM is still locked
checking vm status 223
VM is still locked
checking vm status 223
VM is still locked
checking vm status 223
vm must be cloned
was able to write IP to IPAM

A few details on what the script does. First, there seem to be documented cases where, when a custom file is used, Proxmox does not update the hostname automatically. From a great post on the Proxmox forums, I found that you can set the hostname via the SMBIOS settings available in the VM options in Proxmox.

The script reads the UUID of the cloned VM and sets it again with a REST call, along with the serial, which is where we can set the hostname. Then, of course, it sets the cloud-init config with the network settings chosen earlier.
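
As a rough illustration of that idea, here is a sketch using the proxmoxer library. The node name and credentials are placeholders, the VM ID and name come from the run above, and newer Proxmox releases may expect the SMBIOS serial to be base64 encoded, so treat this as illustrative rather than exactly what my script does:

from proxmoxer import ProxmoxAPI

# Placeholder connection details
proxmox = ProxmoxAPI("proxmox.example.internal", user="root@pam",
                     password="changeme", verify_ssl=False)

node, vmid, hostname = "compute2", 223, "testclone16"

# Read the current smbios1 string and keep its UUID
current = proxmox.nodes(node).qemu(vmid).config.get()
uuid = next(p for p in current["smbios1"].split(",") if p.startswith("uuid="))

# Re-apply the UUID and carry the hostname in the serial field
proxmox.nodes(node).qemu(vmid).config.post(smbios1=f"{uuid},serial={hostname}")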

Now the fun part! Time to boot the VM, and if you watch the console it will eventually stop scrolling and show a line like “cloud-init reached target” if all was successful. Enjoy your freshly built and customized VM!

Logical Homelab Network – Part Two

If you missed part one, here is a link to read that first. It will give a good overview of how I divided up my network. In this part, I wanted to dive into how I set up BGP peering between the two systems. Even though this is specific to my needs, it should be a good overview of how to accomplish any BGP peering, so I wanted to bring everyone along!

I am using eBGP between pfSense and each VPRN service on the Nokia router. I will assume, in this post, that you already know how to configure additional VLAN interfaces on pfSense, but if it helps to see that config I can always add it; just leave a comment!

pfSense FRR Package

Installing the FRR package is as simple as installing any other package in pfSense. Navigate to System > Package Manager and install “frr”.

Now, under Services, you should notice a number of new FRR options. These are broken up by protocol, so you will see options for OSPF, RIP, and BGP. Technically, in this setup, you can use whichever protocol you prefer, but I like BGP here because it lets me build nice import/export policies and even gives me options to keep reachability between VPRNs when pfSense is down.

Now, under FRR > BGP, we will start with some global parameters.

In my setup, pfSense takes the first AS in my numbering plan, which is 64500. Then I give it a router ID, which is the LAN interface IP.

The newer FRR version that pfSense uses respects RFC 8212, where eBGP routes are not imported/exported unless a policy is applied. For now, to keep it simple, I have a policy which simply accepts everything. This is under Services > FRR Global > Route Map. Here, I simply created one named “Allow-All” and set the action to “accept.”

Building First BGP Peer

Under FRR BGP > Neighbor > Add we can now define the first neighbor that we want to establish an IPv4 BGP session with.

If you take a look at my topology diagram from part 1, or the close-up picture of the topology below, I use 10.1.100.0/24 for my IPv4 point-to-point links. Each VLAN, which represents a routed link from pfSense to the Nokia router, is a /29. I went with a /29 because I am using two pfSense VMs, and that allows for the three CARP addresses needed on pfSense. CARP is a way to have a virtual floating IP (more on that in another post to come). This allows a failover between the pfSense VMs without the Nokia needing to update any BGP peer; the session will simply re-establish if there is a failure.

In any case, for a simple IPv4 eBGP peer, you need three things (a configuration sketch follows this list):

  • IP address of the neighbor
  • ASN (autonomous system number)
  • Route Map
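
Putting those three together, the FRR configuration that pfSense ends up generating looks roughly like this. The router ID below is illustrative, the neighbor address and ASNs are the ones used in this post, and the exact generated config may differ slightly by package version:

router bgp 64500
 bgp router-id 192.168.1.1
 neighbor 10.1.100.12 remote-as 64601
 !
 address-family ipv4 unicast
  neighbor 10.1.100.12 route-map Allow-All in
  neighbor 10.1.100.12 route-map Allow-All out
 exit-address-family
!
route-map Allow-All permit 10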

The rest should be able to stay at the defaults. There are, of course, a lot of options in BGP. Not that I want to skip over these, but personally I like to keep pfSense simple; I’m not an expert at FRR and its GUI options, and I prefer to put the complexity on the Nokia since I am more used to it.

Lastly, make sure to go to Services > FRR Global and check the “Enable” box to turn the package up.

Building peer on the Nokia router

To start out, I have simply printed the interfaces. I have standardized on naming my interfaces “to-pfsense-01-vlxxx”, where the VLAN corresponds to the VPRN number I use. Notice here the IPv4 address is 10.1.100.12, which we defined in pfSense as the neighbor earlier.

Some of the basic parameters, like the ASN and router ID, are defined at the root level of the VPRN service, which is “/configure service vprn 101”.

Now I have created a BGP group in the VPRN service. Here I simply defined the neighbor, its ASN, an address family, and an export policy.
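
In classic CLI terms, that portion of the VPRN looks roughly like the sketch below; the group and export-policy names are placeholders, while the ASNs, router ID, and neighbor address are the values discussed in this post:

configure service vprn 101
    autonomous-system 64601
    router-id 10.11.0.2
    bgp
        group "to-pfsense"
            family ipv4
            export "vprn101-export"
            neighbor 10.1.100.9
                peer-as 64500
            exit
        exit
        no shutdown
    exit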

For completeness, here is the policy:

I wish I could talk about all of this, but I will be skipping a few things to cover later in another blog post. If you already know the other families and extensions of BGP, you probably understand some of the above. To keep this from being a book, I will go over the basics.

Each prefix in this service, VPRN 101, will be evaluated top-down against this policy. I have several entries that perform drops and accepts, and finally, if a route does not match any of the other entries, the default action is to drop it.

Prefix lists are just groups of networks, for example the “Routed-Links” list.

In an effort to keep the routing tables as clean as possible, I block, for example, any of the /29 networks in the 10.1.100.0/24 block which I have set up between pfSense and this Nokia router. The “longer” keyword just means that anything falling within the 10.1.100.0/24 range with a longer subnet mask is included here; my /29s are longer. I could define every /29 here, but that’s a lot more work, and then every time I turned up a new VPRN I would have to update policies (which is not ideal).

The main points of interest for this post are the “direct” networks. Direct means any of the Layer 3 interfaces that are configured on this router. In my first picture, where I looked at the L3 interfaces, you can see VL66 and VL666. These are direct interfaces, and since I want other hosts on my network to be able to reach them, and conversely those networks to be able to access other hosts, I am exporting them to pfSense.
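
Conceptually, the policy described above boils down to something like the sketch below. The names and entry numbers are illustrative, and the exact action keywords vary by SR OS release (some spell the drop action “reject”), so check the syntax against your own platform:

configure router policy-options
    begin
    prefix-list "Routed-Links"
        prefix 10.1.100.0/24 longer
    exit
    policy-statement "vprn101-export"
        entry 10
            from
                prefix-list "Routed-Links"
            exit
            action drop
        exit
        entry 20
            from
                protocol direct
            exit
            action accept
            exit
        exit
        default-action drop
        exit
    exit
    commit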

Validation

If everything goes right, the peer should be established.

There is a ton of information to digest in this image:

  1. We can see the BGP information that is local to this VPRN service
    • The AS is 64601 (we defined this in pfSense for the neighbor, and local in the Nokia VPRN service)
    • BGP is operationally up
    • The router ID for this service is 10.11.0.2
  2. The first neighbor, we can see is the IP of pfSense, 10.1.100.9.
    • It has been up for 6+ days (when it is in the up state, an address family will be shown)
    • Also we can see that we have received 36 routes, 31 are active in the route table, and 5 are advertised (this is from the heading Rcv/Act/Sent)
    • Finally, the address family is IPv4

Now let’s look at the received and advertised routes. This is called the RIB (Routing Information Base); I cut this down as we do not need to see everything:

You can think of the received RIB as a “dumping ground.” Routes are populated here from whatever the peer has sent, but that does not mean a route will be the active one in the route table: an import policy may reject it, or another protocol advertising the same prefix to this router may be preferred. If the router determines a route should be installed in the route table because it is best, it will carry a status code. Here, the status codes u*>i mean used, valid, best, and IGP origin (mostly just an old carry-over from when EGP was a protocol before BGP). If we look in the route table, we can see the default route exists via the protocol BGP with a next hop of pfSense.

Not forgetting the advertised routes, we can see the two /25 networks (192.168.66.0/25 and 192.168.66.128/25) are there, along with a few others that match other accept entries in the policy, all being sent to pfSense.
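
For anyone following along in the CLI rather than the screenshots, these views come from commands along these lines on the Nokia side (substitute your own service ID and neighbor address):

show router 101 bgp neighbor 10.1.100.9 received-routes
show router 101 bgp neighbor 10.1.100.9 advertised-routes
show router 101 route-table protocol bgp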

Similarly, we can check pfSense for the information it has. First, under Status > FRR > BGP, you will see the similar RIB information. Here we can see the two /25s (among others, of course) that pfSense learns from the various eBGP peers.

Status > Routes shows the actual FIB (routing table) in pfSense. Again, we can see the routes are installed; the RIB gave us the clue that they would be, because the status code is *>.
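
The same information is also available from the pfSense shell through FRR’s vtysh, if you prefer the CLI over the status pages:

vtysh -c "show ip bgp"
vtysh -c "show ip bgp neighbors"
vtysh -c "show ip route bgp"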

This closes out this part of the discussion. At this point we have an eBGP peer up, and routes are being advertised between the routers. From here, firewall rules can be assigned to allow/block traffic, and NAT rules need to be created to allow the hosts to access the internet since they sit in private RFC 1918 space. In the next part, I will cover the firewall and NAT rules I have applied for this. Eventually, we will get to IPv6, which will be very similar but with a few slight changes.

Logical Homelab Network Overview – Part One

My homelab’s logical network was a lot more work to put together than I thought, but it has been a great exercise to finally put it down on paper.

In part one, I am trying to give some background on the services I run and how I set up my network. I will have a follow-up with some more hardware-based details and more specifics on the VMs/containers.

To start, I have some EOL Nokia network gear for my core routing and use pfSense for edge firewall/VPN/HA. I have logically broken these services up into IP VRFs (Virtual Routing and Forwarding instances), or in Nokia-specific terms, VPRNs (Virtual Private Routed Networks).

This allows me to control access between the VPRN services through a firewall (pfSense), instead of relying on ACLs on the Nokia router or moving all my routing to pfSense, which would hurt performance. This, in my opinion, is the best of both worlds: routers route, and firewalls inspect. When I have services on multiple networks within the same VPRN, I get the benefit of wire-rate bandwidth since it routes like a normal router. Otherwise, if traffic needs to cross VPRNs, it goes through pfSense, where firewall rules are evaluated.

Without further ado here is the drawing!

Here is a quick run down of the services from top down:

  • Offsite:
    • I run Truenas scale on an old server over a Wireguard site-to-site VPN running on a Raspberry Pi. This is not powered on all the time, and I use it to replicate snapshots from my main Truenas core’s SMB shares.
    • The other offsite location I have is a Synology NAS, to which I also replicate Truenas core’s SMB shares using rsync.
  • I have Comcast Business service with several static IPv4 addresses and an IPv6 block.
  • On my Edge, I have two pfSense VMs running on two hypervisors which provide HA failover for themselves and my internal network.
  • Below is a list of the VPRN services:
VPRNVPRN NamePurposeNotes
100Inband/AD Hypervisor gui mgmt, and AD ServicesCritical AD services (DNS/DHCP/NPS)
101Infrastructure servicesNetwork hardware mgmt, and VMs running critical softwareSyslog/snmp monitoring/stats
102Unrestricted clientsClients that have no rules blocking to other zonesTLS/PEAP WLAN clients or domain joined machines
103Guest/IOTGuest SSID and IOT devices
104DMZMix of internal DMZ and inbound services allowedWeb/mail/NTP pool members
105Restricted DevicesDevices that have access blocked to AD/infrastructure servicesclients that do not need full access to the internal network
106File ServicesMostly for SMB accessTruenas scale SMB shares
108CamerasNVR and camerasBlueiris NVR and IP cameras
109StorjStorj node operator VM

Hopefully that layout makes sense so you can understand how I went about separating these. Feel free to ask me questions! Now, let’s spend a little bit of time on the routing and how it looks:

I am using VPRN service 100 as an example. The default route is received via the eBGP neighbor (pfSense). Then, if I look up a prefix that lives in another VPRN (in this case 102), it does not route directly to that network; instead, it goes to the firewall first, even though both VPRNs exist on the same router. A cleanup I want to do is to send only the default route from pfSense to these VPRNs, rather than the specific routes, but that will be a future blog post!

We can view the locally configured subnets in the VPRN. These are exported to pfSense so that the other VPRNs, and pfSense itself, know how to reach these hosts.

Finally, here is a quick shot of how the route table looks in pfSense. First there is 192.168.8.0/25, which is in VPRN 100, and the 192.168.9.0/24 prefix, which is in VPRN 102. Notice how they have two different eBGP next-hop IPs, which correspond to the interfaces on the Nokia router.

In Part 2, I’ll dive into the FRR package in pfSense and the BGP setup on the Nokia router.