Quick Take
After building a Kubernetes admission controller, I found that simple shell scripts were more effective than complex Go code for integration testing. This approach provides better debuggability, easier maintenance, and seamless integration with CI/CD pipelines.
Introduction
In the previous parts of this series, we’ve built a Kubernetes admission controller, secured it, set up releases, and implemented robust configuration. Today, I’ll tackle a critical but often neglected aspect of software development: integration testing.
Integration testing ensures that all the pieces of our controller work together correctly in a real-world environment. While unit tests are valuable for verifying individual components, integration tests give me confidence that our controller functions properly in an actual Kubernetes cluster with all its dependencies and integrations.
The Integration Testing Challenge
When I started working on this project, I knew I wanted integration tests from the beginning. Too often, they’re added as an afterthought or, worse, never implemented at all. The challenge was finding a testing approach that was:
- Simple enough to set up quickly
- Robust enough to give us confidence in our code
- Easy to debug when things went wrong
- Quick to run as part of our CI/CD pipeline
My first instinct was to implement tests in Go, following typical Kubernetes testing patterns. However, I quickly ran into complications. The tests were mostly running shell commands via Go’s exec package, making them hard to debug and maintain. When I stepped back and considered what I was really trying to accomplish, a simpler solution emerged.
Embracing Shell Scripts
Sometimes the best solution is the simplest one. Instead of wrapping shell commands in Go code, I decided to use shell scripts directly. This approach offered several advantages:
- Scripts are easy to debug (I can run parts of them manually)
- They’re straightforward to maintain
- The scripts themselves can be reused outside of testing
- No need for additional dependencies or complicated test frameworks
The foundation of my testing strategy is a simple Make target that orchestrates the entire test process:
integration-test:
	trap './scripts/delete-kind-cluster.sh' EXIT; \
	./scripts/create-kind-cluster.sh && \
	$(MAKE) build && \
	./scripts/kind-deploy.sh && \
	./scripts/integ-test.sh
This target:
- Sets up a trap to clean up resources even if tests fail
- Creates a kind (Kubernetes in Docker) cluster
- Builds our admission controller
- Deploys it to the cluster
- Runs the integration tests
The beauty of this approach is its simplicity and reliability. Each step is contained in a separate script, making it easy to debug any particular stage.
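The delete-kind-cluster.sh script referenced in the trap isn’t shown in this post; here is a minimal sketch of what it needs to do, assuming the same KIND_CLUSTER_NAME convention as the other scripts:
#!/bin/bash
set -euo pipefail

# Use the same default cluster name as the create script
CLUSTER_NAME=${KIND_CLUSTER_NAME:-webhook-test}

echo "Deleting kind cluster: ${CLUSTER_NAME}"
# Tear down the cluster created by create-kind-cluster.sh
kind delete cluster --name "${CLUSTER_NAME}"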
The Testing Scripts
Let’s take a closer look at each of the scripts that make up my testing framework:
Creating a Kind Cluster
The first step is setting up a local Kubernetes environment. I use kind (Kubernetes in Docker), which provides a lightweight way to run Kubernetes clusters locally:
#!/bin/bash
set -euo pipefail
# Set default cluster name and namespace if not provided
CLUSTER_NAME=${KIND_CLUSTER_NAME:-webhook-test}
NAMESPACE=${NAMESPACE:-webhook-test}
echo "Creating kind cluster: ${CLUSTER_NAME}"
# Create kind cluster using external config
kind create cluster --name "${CLUSTER_NAME}" --config test/e2e/manifests/kind-config.yaml
echo "Installing cert-manager..."
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml
echo "Waiting for cert-manager deployments to be available..."
kubectl wait --for=condition=Available --timeout=300s -n cert-manager deployment/cert-manager
kubectl wait --for=condition=Available --timeout=300s -n cert-manager deployment/cert-manager-cainjector
kubectl wait --for=condition=Available --timeout=300s -n cert-manager deployment/cert-manager-webhook
kubectl create namespace ${NAMESPACE} --dry-run=client -o yaml | kubectl apply -f -
echo "Kind cluster ${CLUSTER_NAME} is ready with cert-manager installed!"
This script:
- Creates a kind cluster with a predefined configuration
- Installs cert-manager, which is required for managing TLS certificates
- Sets up the namespace for our admission controller
- Waits for all dependencies to be available before continuing
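Once the script finishes, the cluster can be inspected with ordinary tooling. kind names the kubeconfig context after the cluster, so a quick sanity check (assuming the default webhook-test name) looks like this:
# kind contexts are always named "kind-<cluster name>"
kubectl cluster-info --context kind-webhook-test

# cert-manager should show three Available deployments
kubectl get deployments -n cert-manager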
Deploying the Controller
Once the cluster is ready, I need to deploy the admission controller:
#!/bin/bash
set -euo pipefail
# Set default values if not provided
CLUSTER_NAME=${KIND_CLUSTER_NAME:-webhook-test}
NAMESPACE=${NAMESPACE:-webhook-test}
IMAGE_NAME=${IMAGE_NAME:-ghcr.io/jjshanks/pod-label-webhook}
VERSION=${VERSION:-latest}
kind export kubeconfig --name ${CLUSTER_NAME}
kind load docker-image ${IMAGE_NAME}:${VERSION} --name ${CLUSTER_NAME}
kubectl apply -f test/e2e/manifests/webhook.yaml
kubectl wait --for=condition=Ready --timeout=60s -n ${NAMESPACE} certificate/pod-label-webhook-cert
kubectl apply -f test/e2e/manifests/deployment.yaml
kubectl wait --for=condition=Available --timeout=60s -n ${NAMESPACE} deployment/pod-label-webhook
This script:
- Sets up environment-specific variables for flexibility
- Exports the kubeconfig for cluster access
- Loads the controller image into the kind cluster
- Applies the webhook configuration and deployment manifests
- Waits for the controller to be fully deployed and ready
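Because everything is parameterized through environment variables, the same script works for ad-hoc local builds as well as CI. A hedged example, where the dev tag and the docker build invocation are illustrative rather than the project’s actual build target:
# Build and tag a local image (tag and build command are illustrative)
docker build -t ghcr.io/jjshanks/pod-label-webhook:dev .

# Load and deploy that tag instead of latest
VERSION=dev ./scripts/kind-deploy.sh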
Running the Tests
Now for the heart of the testing framework - the actual integration tests:
#!/bin/bash
set -euo pipefail
# Source common test utilities
source "$(dirname "$0")/utils.sh"
# Use a high port number that doesn't require root
LOCAL_PORT=18443
# Set up cleanup on script exit
trap cleanup EXIT
echo "Applying test deployments..."
kubectl apply -f test/e2e/manifests/test-deployment.yaml
echo "Waiting for deployments to be available..."
kubectl wait --for=condition=Available --timeout=60s -n default deployment/integ-test
kubectl wait --for=condition=Available --timeout=60s -n default deployment/integ-test-no-label
echo "Checking webhook pod health status..."
WEBHOOK_POD=$(kubectl get pods -n webhook-test -l app=pod-label-webhook -o jsonpath='{.items[0].metadata.name}')
# Wait for pod to be ready
echo "Waiting for webhook pod to be ready..."
kubectl wait --for=condition=Ready --timeout=60s -n webhook-test pod/$WEBHOOK_POD
# Get the service port
WEBHOOK_PORT=$(kubectl get service pod-label-webhook -n webhook-test -o jsonpath='{.spec.ports[0].port}')
# Setup port forwarding
echo "Setting up port forwarding..."
kubectl port-forward -n webhook-test service/pod-label-webhook $LOCAL_PORT:$WEBHOOK_PORT &
PORT_FORWARD_PID=$!
# Wait for port forwarding to be established
if ! wait_for_port $LOCAL_PORT; then
    echo "ERROR: Port forwarding failed to establish"
    exit 1
fi
# Verify the webhook is labeling pods correctly
echo "Checking for expected label presence..."
if ! kubectl get pods -n default -l app=integ-test,hello=world --no-headers 2>/dev/null | grep -q .; then
    echo "ERROR: Label 'hello=world' not found when it should be present"
    exit 1
fi
echo "Checking for expected label absence..."
if kubectl get pods -n default -l app=integ-test-no-label,hello=world --no-headers 2>/dev/null | grep -q .; then
    echo "ERROR: Label 'hello=world' found when it should not be present"
    exit 1
fi
echo "Testing metrics endpoint..."
# Guard the assignment directly; with set -e a bare $? check after the substitution would never run
if ! metrics_output=$(curl -sk "https://localhost:$LOCAL_PORT/metrics"); then
    echo "ERROR: Failed to fetch metrics"
    exit 1
fi
echo "Initial metrics check..."
echo "Got $(echo "$metrics_output" | wc -l) lines of metrics"
# Verify metrics
echo "Verifying metrics..."
# Wait a moment for metrics to be available
sleep 5
# Check readiness status first
if ! check_metric "pod_label_webhook_readiness_status" "" "1" "Webhook readiness" "$LOCAL_PORT"; then
    exit 1
fi
# Check request metrics
if ! check_metric "pod_label_webhook_requests_total" 'method="POST",path="/mutate",status="200"' "1" "Successful mutate requests" "$LOCAL_PORT"; then
    exit 1
fi
# Check label operations
if ! check_metric "pod_label_webhook_label_operations_total" 'namespace="default",operation="success"' "1" "Successful label operations" "$LOCAL_PORT"; then
    exit 1
fi
if ! check_metric "pod_label_webhook_label_operations_total" 'namespace="default",operation="skipped"' "1" "Skipped label operations" "$LOCAL_PORT"; then
    exit 1
fi
echo "All tests passed successfully!"
This script performs a comprehensive test of the admission controller:
- It applies test deployments that will trigger the controller
- It verifies that the controller’s pod is healthy
- It sets up port forwarding to access the controller’s metrics endpoint
- It checks that the hello=world label was correctly added to the appropriate pods
- It verifies that pods with the proper annotation to opt out of labeling don’t receive the label
- It tests the metrics endpoint to ensure monitoring is working
The beauty of this approach is that it tests the entire system as a unit, just as it would function in production.
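This is also where the debuggability argument pays off: when a check fails, the same commands can be replayed by hand, either while the failing step is still up or by running the individual scripts without the Make target’s cleanup trap. A sketch of a typical manual debugging session, reusing the selectors and local port from the script above:
# Re-check the label by hand
kubectl get pods -n default -l app=integ-test --show-labels

# Inspect the webhook logs
kubectl logs -n webhook-test deployment/pod-label-webhook

# Port-forward the service and pull raw metrics
WEBHOOK_PORT=$(kubectl get service pod-label-webhook -n webhook-test -o jsonpath='{.spec.ports[0].port}')
kubectl port-forward -n webhook-test service/pod-label-webhook 18443:$WEBHOOK_PORT &
curl -sk https://localhost:18443/metrics | grep pod_label_webhook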
Common Test Utilities
To keep my test scripts clean and reusable, I extracted common testing functions into a utilities file:
#!/bin/bash
# test/integration/utils.sh
# Common utility functions for integration tests
# Extract metric value using grep and awk
get_metric_value() {
    local metric_name=$1
    local labels=$2
    local metrics_output=$3
    local value

    if [ -z "$labels" ]; then
        # For metrics without labels (like gauges)
        value=$(echo "$metrics_output" | grep "^$metric_name " | awk '{print $2}')
    else
        # For metrics with labels
        value=$(echo "$metrics_output" | grep "^$metric_name{${labels}}" | awk '{print $2}')
    fi

    # If no value found, print the matching lines for debugging
    if [ -z "$value" ]; then
        echo "DEBUG: Lines matching ${metric_name}:" >&2
        echo "$metrics_output" | grep "^${metric_name}" >&2
    fi

    echo "$value"
}
# Check metric with retries
check_metric() {
    local metric_name=$1
    local labels=$2
    local expected_min=$3
    local description=$4
    local metrics_port=$5
    local max_attempts=10
    local attempt=1
    local value
    local metrics_output

    while [ $attempt -le $max_attempts ]; do
        metrics_output=$(curl -sk https://localhost:${metrics_port}/metrics 2>/dev/null)
        value=$(get_metric_value "$metric_name" "$labels" "$metrics_output")
        if [ -n "$value" ]; then
            if awk "BEGIN {exit !($value >= $expected_min)}"; then
                echo "✓ $description verified (value: $value)"
                return 0
            fi
        fi
        echo "Attempt $attempt: Waiting for $description (current: ${value:-none}, expected min: $expected_min)"
        sleep 5
        attempt=$((attempt + 1))
    done

    echo "ERROR: Failed to verify $description after $max_attempts attempts"
    echo "Current metrics output:"
    echo "$metrics_output"
    return 1
}
# Wait for port to be available
wait_for_port() {
    local port=$1
    local max_attempts=${2:-10}
    local attempt=1

    echo "Waiting for port $port to be available..."
    while [ $attempt -le $max_attempts ]; do
        if nc -z localhost "$port"; then
            echo "Port $port is available"
            return 0
        fi
        echo "Attempt $attempt: Port $port not available yet"
        sleep 2
        attempt=$((attempt + 1))
    done

    echo "ERROR: Port $port not available after $max_attempts attempts"
    return 1
}
# Clean up resources and handle interrupts
cleanup() {
    echo "Cleaning up test resources..."

    # Kill port forwarding if it exists
    if [ -n "${PORT_FORWARD_PID:-}" ]; then
        echo "Stopping port forwarding process..."
        kill $PORT_FORWARD_PID || true
        wait $PORT_FORWARD_PID 2>/dev/null || true
    fi

    # Delete test resources
    echo "Deleting test deployments..."
    kubectl delete -f test/e2e/manifests/test-deployment.yaml --ignore-not-found
}
These utilities provide:
- Functions for checking metrics with retry logic
- Port availability checks
- Cleanup functions to ensure tests don’t leave resources behind
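Because the utilities live in a plain shell file, they are just as useful outside the test harness. For instance, with the service port-forwarded as above, a single metric can be checked interactively (the path comes from the file header; adjust it to wherever utils.sh lives in your checkout):
source test/integration/utils.sh

# Verify the readiness gauge directly, with the helper's built-in retries
check_metric "pod_label_webhook_readiness_status" "" "1" "Webhook readiness" "18443"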
Evolving from Simple to Comprehensive
My testing approach started simple but grew more sophisticated as the controller matured. The initial test script was straightforward:
#!/bin/bash
set -euo pipefail
kubectl apply -f tests/manifests/test-deployment.yaml
kubectl wait --for=condition=Available --timeout=60s -n default deployment/integ-test
if kubectl get pods -n default -l app=integ-test,hello=world --no-headers 2>/dev/null | grep -q .; then
    echo "Label exists"
    exit 0
else
    echo "Label not found"
    exit 1
fi
This minimal version tested the core functionality - adding the hello=world label to pods. But as I added features like metrics and configuration options, the tests evolved to cover these aspects as well.
Another advantage of shell scripts is how easily they grow with the project. When I added metrics to the controller, extending the tests to verify those metrics was trivial.
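For example, when a new counter or outcome is added to the controller, extending coverage is usually one more call to check_metric appended to integ-test.sh. The operation value below is hypothetical, purely to show the shape of the change:
# Hypothetical example: if the controller grew an "overridden" outcome, coverage is one more block
if ! check_metric "pod_label_webhook_label_operations_total" 'namespace="default",operation="overridden"' "1" "Overridden label operations" "$LOCAL_PORT"; then
    exit 1
fi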
Benefits for Automated Updates
One of the most significant benefits of these integration tests has been their role in automating dependency updates. With Dependabot configured to update our dependencies, the integration tests provide confidence that these updates don’t break the controller.
The workflow is now completely automated:
- Dependabot identifies an update and creates a pull request
- GitHub Actions runs our integration tests on the PR
- If tests pass, the PR can be automatically merged
- If tests fail, we’re alerted to investigate the issue
This has saved me countless hours of manual verification and ensured the controller stays up-to-date with security patches and bug fixes.
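When a Dependabot PR does fail, CI runs the exact same Make target as local development, so the failure can usually be reproduced with nothing more than Docker installed (the branch name below is hypothetical):
# Check out the failing PR branch (hypothetical name) and run the same suite CI runs
git fetch origin
git checkout dependabot/go_modules/sigs.k8s.io/controller-runtime-0.17.0
make integration-test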
Lessons Learned
This journey has taught me several valuable lessons about integration testing:
- Start with the simplest approach: My initial instinct to use Go for testing made things unnecessarily complex. Starting with simple shell scripts allowed me to get testing in place quickly.
- Build incrementally: The tests started minimal and grew as the controller added features. This incremental approach kept testing aligned with development.
- Prioritize debuggability: Being able to run individual scripts or parts of scripts made debugging much easier than with a more monolithic testing framework.
- Reuse components: Breaking testing into modular scripts allowed me to reuse components like cluster creation and deployment in other contexts.
- Consider cleanup from the start: Using trap to ensure cleanup even when tests fail prevented resource leakage and made the testing more robust.
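The same pattern works inside any standalone script, not just the Make target; a minimal sketch of the idiom:
#!/bin/bash
set -euo pipefail

cleanup() {
    # Delete whatever the script created; --ignore-not-found keeps cleanup idempotent
    kubectl delete -f test/e2e/manifests/test-deployment.yaml --ignore-not-found
}

# Runs whenever the script exits, whether the tests passed or failed
trap cleanup EXIT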
Looking Forward
While my shell-based testing approach has served me well, there’s always room for improvement. Future enhancements could include:
- Parallel test execution for faster CI/CD pipelines
- More comprehensive testing of edge cases and error conditions
- Integration with performance testing frameworks
- Automated compatibility testing across multiple Kubernetes versions
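For the last item, kind makes multi-version testing fairly approachable because node images are published per Kubernetes release. A hedged sketch, assuming the create script were extended to pass an optional node image through to kind (the KIND_NODE_IMAGE variable does not exist in the scripts today):
# Hypothetical loop over kind node images; the image tags are examples
for image in kindest/node:v1.29.2 kindest/node:v1.30.0; do
    KIND_NODE_IMAGE="$image" make integration-test
done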
I’m particularly interested in exploring how I might supplement these integration tests with performance benchmarks to detect potential performance regressions during development.
Conclusion
Integration testing is a critical but often overlooked aspect of software development. For my Kubernetes admission controller, a simple, shell-based approach has proven remarkably effective. It gives me confidence that the controller works correctly in real-world scenarios while remaining easy to maintain and debug.
The key takeaway is that effective testing doesn’t have to be complex. Sometimes, the simplest approach is the most sustainable. By starting with basic tests and evolving them as the controller matured, I’ve built a robust testing framework that ensures the controller remains reliable even as dependencies and features change.
For anyone building Kubernetes controllers or similar infrastructure components, I highly recommend prioritizing integration testing early in your development process. The confidence it provides is invaluable, and the infrastructure you build for testing can often be repurposed for other aspects of your development workflow.
In the next part of this series, I’ll explore monitoring and observability, ensuring the controller not only works correctly but can be effectively operated in production environments.