Quick Take
After building a Kubernetes admission controller, I found that simple shell scripts were more effective than complex Go code for integration testing. This approach provides better debuggability, easier maintenance, and seamless integration with CI/CD pipelines.
Introduction
In the previous parts of this series, we’ve built a Kubernetes admission controller, secured it, set up releases, and implemented robust configuration. Today, I’ll tackle a critical but often neglected aspect of software development: integration testing.
Integration testing ensures that all the pieces of our controller work together correctly in a real-world environment. While unit tests are valuable for verifying individual components, integration tests give me confidence that our controller functions properly in an actual Kubernetes cluster with all its dependencies and integrations.
The Integration Testing Challenge
When I started working on this project, I knew I wanted integration tests from the beginning. Too often, they’re added as an afterthought or, worse, never implemented at all. The challenge was finding a testing approach that was:
- Simple enough to set up quickly
- Robust enough to give us confidence in our code
- Easy to debug when things went wrong
- Quick to run as part of our CI/CD pipeline
My first instinct was to implement tests in Go, following typical Kubernetes testing patterns. However, I quickly ran into complications. The tests were mostly running shell commands via Go’s exec package, making them hard to debug and maintain. When I stepped back and considered what I was really trying to accomplish, a simpler solution emerged.
Embracing Shell Scripts
Sometimes the best solution is the simplest one. Instead of wrapping shell commands in Go code, I decided to use shell scripts directly. This approach offered several advantages:
- Scripts are easy to debug (I can run parts of them manually)
- They’re straightforward to maintain
- The scripts themselves can be reused outside of testing
- No need for additional dependencies or complicated test frameworks
The foundation of my testing strategy is a simple Make target that orchestrates the entire test process:
integration-test:
	trap './scripts/delete-kind-cluster.sh' EXIT; \
	./scripts/create-kind-cluster.sh && \
	$(MAKE) build && \
	./scripts/kind-deploy.sh && \
	./scripts/integ-test.sh
This target:
- Sets up a trap to clean up resources even if tests fail
- Creates a kind (Kubernetes in Docker) cluster
- Builds our admission controller
- Deploys it to the cluster
- Runs the integration tests
The beauty of this approach is its simplicity and reliability. Each step is contained in a separate script, making it easy to debug any particular stage.
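The delete-kind-cluster.sh script referenced in the trap isn’t shown in this post; here is a minimal sketch of what it needs to do, assuming the same KIND_CLUSTER_NAME convention as the other scripts:
#!/bin/bash
set -euo pipefail

# Use the same default cluster name as the create script
CLUSTER_NAME=${KIND_CLUSTER_NAME:-webhook-test}

echo "Deleting kind cluster: ${CLUSTER_NAME}"
# Tear down the cluster created by create-kind-cluster.sh
kind delete cluster --name "${CLUSTER_NAME}"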
The Testing Scripts
Let’s take a closer look at each of the scripts that make up my testing framework:
Creating a Kind Cluster
The first step is setting up a local Kubernetes environment. I use kind (Kubernetes in Docker), which provides a lightweight way to run Kubernetes clusters locally:
#!/bin/bash
set -euo pipefail
# Set default cluster name and namespace if not provided
CLUSTER_NAME=${KIND_CLUSTER_NAME:-webhook-test}
NAMESPACE=${NAMESPACE:-webhook-test}
echo "Creating kind cluster: ${CLUSTER_NAME}"
# Create kind cluster using external config
kind create cluster --name "${CLUSTER_NAME}" --config test/e2e/manifests/kind-config.yaml
echo "Installing cert-manager..."
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml
echo "Waiting for cert-manager deployments to be available..."
kubectl wait --for=condition=Available --timeout=300s -n cert-manager deployment/cert-manager
kubectl wait --for=condition=Available --timeout=300s -n cert-manager deployment/cert-manager-cainjector
kubectl wait --for=condition=Available --timeout=300s -n cert-manager deployment/cert-manager-webhook
kubectl create namespace ${NAMESPACE} --dry-run=client -o yaml | kubectl apply -f -
echo "Kind cluster ${CLUSTER_NAME} is ready with cert-manager installed!"
This script:
- Creates a kind cluster with a predefined configuration
- Installs cert-manager, which is required for managing TLS certificates
- Sets up the namespace for our admission controller
- Waits for all dependencies to be available before continuing
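Once the script finishes, the cluster can be inspected with ordinary tooling. kind names the kubeconfig context after the cluster, so a quick sanity check (assuming the default webhook-test name) looks like this:
# kind contexts are always named "kind-<cluster name>"
kubectl cluster-info --context kind-webhook-test

# cert-manager should show three Available deployments
kubectl get deployments -n cert-manager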
Deploying the Controller
Once the cluster is ready, I need to deploy the admission controller:
#!/bin/bash
set -euo pipefail
# Set default values if not provided
CLUSTER_NAME=${KIND_CLUSTER_NAME:-webhook-test}
NAMESPACE=${NAMESPACE:-webhook-test}
IMAGE_NAME=${IMAGE_NAME:-ghcr.io/jjshanks/pod-label-webhook}
VERSION=${VERSION:-latest}
kind export kubeconfig --name ${CLUSTER_NAME}
kind load docker-image ${IMAGE_NAME}:${VERSION} --name ${CLUSTER_NAME}
kubectl apply -f test/e2e/manifests/webhook.yaml
kubectl wait --for=condition=Ready --timeout=60s -n ${NAMESPACE} certificate/pod-label-webhook-cert
kubectl apply -f test/e2e/manifests/deployment.yaml
kubectl wait --for=condition=Available --timeout=60s -n ${NAMESPACE} deployment/pod-label-webhook
This script:
- Sets up environment-specific variables for flexibility
- Exports the kubeconfig for cluster access
- Loads the controller image into the kind cluster
- Applies the webhook configuration and deployment manifests
- Waits for the controller to be fully deployed and ready
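Because everything is parameterized through environment variables, the same script works for ad-hoc local builds as well as CI. A hedged example, where the dev tag and the docker build invocation are illustrative rather than the project’s actual build target:
# Build and tag a local image (tag and build command are illustrative)
docker build -t ghcr.io/jjshanks/pod-label-webhook:dev .

# Load and deploy that tag instead of latest
VERSION=dev ./scripts/kind-deploy.sh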
Running the Tests
Now for the heart of the testing framework - the actual integration tests:
#!/bin/bash
set -euo pipefail
# Source common test utilities
source "$(dirname "$0")/utils.sh"
# Use a high port number that doesn't require root
LOCAL_PORT=18443
# Set up cleanup on script exit
trap cleanup EXIT
echo "Applying test deployments..."
kubectl apply -f test/e2e/manifests/test-deployment.yaml
echo "Waiting for deployments to be available..."
kubectl wait --for=condition=Available --timeout=60s -n default deployment/integ-test
kubectl wait --for=condition=Available --timeout=60s -n default deployment/integ-test-no-label
echo "Checking webhook pod health status..."
WEBHOOK_POD=$(kubectl get pods -n webhook-test -l app=pod-label-webhook -o jsonpath='{.items[0].metadata.name}')
# Wait for pod to be ready
echo "Waiting for webhook pod to be ready..."
kubectl wait --for=condition=Ready --timeout=60s -n webhook-test pod/$WEBHOOK_POD
# Get the service port
WEBHOOK_PORT=$(kubectl get service pod-label-webhook -n webhook-test -o jsonpath='{.spec.ports[0].port}')
# Setup port forwarding
echo "Setting up port forwarding..."
kubectl port-forward -n webhook-test service/pod-label-webhook $LOCAL_PORT:$WEBHOOK_PORT &
PORT_FORWARD_PID=$!
# Wait for port forwarding to be established
if ! wait_for_port $LOCAL_PORT; then
    echo "ERROR: Port forwarding failed to establish"
    exit 1
fi
# Verify the webhook is labeling pods correctly
echo "Checking for expected label presence..."
if ! kubectl get pods -n default -l app=integ-test,hello=world --no-headers 2>/dev/null | grep -q .; then
    echo "ERROR: Label 'hello=world' not found when it should be present"
    exit 1
fi
echo "Checking for expected label absence..."
if kubectl get pods -n default -l app=integ-test-no-label,hello=world --no-headers 2>/dev/null | grep -q .; then
    echo "ERROR: Label 'hello=world' found when it should not be present"
    exit 1
fi
echo "Testing metrics endpoint..."
# Guard the assignment directly; with set -e a bare $? check after the substitution would never run
if ! metrics_output=$(curl -sk "https://localhost:$LOCAL_PORT/metrics"); then
    echo "ERROR: Failed to fetch metrics"
    exit 1
fi
echo "Initial metrics check..."
echo "Got $(echo "$metrics_output" | wc -l) lines of metrics"
# Verify metrics
echo "Verifying metrics..."
# Wait a moment for metrics to be available
sleep 5
# Check readiness status first
if ! check_metric "pod_label_webhook_readiness_status" "" "1" "Webhook readiness" "$LOCAL_PORT"; then
    exit 1
fi
# Check request metrics
if ! check_metric "pod_label_webhook_requests_total" 'method="POST",path="/mutate",status="200"' "1" "Successful mutate requests" "$LOCAL_PORT"; then
    exit 1
fi
# Check label operations
if ! check_metric "pod_label_webhook_label_operations_total" 'namespace="default",operation="success"' "1" "Successful label operations" "$LOCAL_PORT"; then
    exit 1
fi
if ! check_metric "pod_label_webhook_label_operations_total" 'namespace="default",operation="skipped"' "1" "Skipped label operations" "$LOCAL_PORT"; then
    exit 1
fi
echo "All tests passed successfully!"
This script performs a comprehensive test of the admission controller:
- It applies test deployments that will trigger the controller
- It verifies that the controller’s pod is healthy
- It sets up port forwarding to access the controller’s metrics endpoint
- It checks that the hello=world label was correctly added to the appropriate pods
- It verifies that pods with the proper annotation to opt out of labeling don’t receive the label
- It tests the metrics endpoint to ensure monitoring is working
The beauty of this approach is that it tests the entire system as a unit, just as it would function in production.
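This is also where the debuggability argument pays off: when a check fails, the same commands can be replayed by hand, either while the failing step is still up or by running the individual scripts without the Make target’s cleanup trap. A sketch of a typical manual debugging session, reusing the selectors and local port from the script above:
# Re-check the label by hand
kubectl get pods -n default -l app=integ-test --show-labels

# Inspect the webhook logs
kubectl logs -n webhook-test deployment/pod-label-webhook

# Port-forward the service and pull raw metrics
WEBHOOK_PORT=$(kubectl get service pod-label-webhook -n webhook-test -o jsonpath='{.spec.ports[0].port}')
kubectl port-forward -n webhook-test service/pod-label-webhook 18443:$WEBHOOK_PORT &
curl -sk https://localhost:18443/metrics | grep pod_label_webhook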
Common Test Utilities
To keep my test scripts clean and reusable, I extracted common testing functions into a utilities file:
#!/bin/bash
# test/integration/utils.sh
# Common utility functions for integration tests
# Extract metric value using grep and awk
get_metric_value() {
    local metric_name=$1
    local labels=$2
    local metrics_output=$3
    local value

    if [ -z "$labels" ]; then
        # For metrics without labels (like gauges)
        value=$(echo "$metrics_output" | grep "^$metric_name " | awk '{print $2}')
    else
        # For metrics with labels
        value=$(echo "$metrics_output" | grep "^$metric_name{${labels}}" | awk '{print $2}')
    fi

    # If no value found, print the matching lines for debugging
    if [ -z "$value" ]; then
        echo "DEBUG: Lines matching ${metric_name}:" >&2
        echo "$metrics_output" | grep "^${metric_name}" >&2
    fi

    echo "$value"
}
# Check metric with retries
check_metric() {
    local metric_name=$1
    local labels=$2
    local expected_min=$3
    local description=$4
    local metrics_port=$5
    local max_attempts=10
    local attempt=1
    local value
    local metrics_output

    while [ $attempt -le $max_attempts ]; do
        metrics_output=$(curl -sk https://localhost:${metrics_port}/metrics 2>/dev/null)
        value=$(get_metric_value "$metric_name" "$labels" "$metrics_output")
        if [ -n "$value" ]; then
            if awk "BEGIN {exit !($value >= $expected_min)}"; then
                echo "✓ $description verified (value: $value)"
                return 0
            fi
        fi
        echo "Attempt $attempt: Waiting for $description (current: ${value:-none}, expected min: $expected_min)"
        sleep 5
        attempt=$((attempt + 1))
    done

    echo "ERROR: Failed to verify $description after $max_attempts attempts"
    echo "Current metrics output:"
    echo "$metrics_output"
    return 1
}
# Wait for port to be available
wait_for_port() {
    local port=$1
    local max_attempts=${2:-10}
    local attempt=1

    echo "Waiting for port $port to be available..."
    while [ $attempt -le $max_attempts ]; do
        if nc -z localhost "$port"; then
            echo "Port $port is available"
            return 0
        fi
        echo "Attempt $attempt: Port $port not available yet"
        sleep 2
        attempt=$((attempt + 1))
    done

    echo "ERROR: Port $port not available after $max_attempts attempts"
    return 1
}
# Clean up resources and handle interrupts
cleanup() {
    echo "Cleaning up test resources..."

    # Kill port forwarding if it exists
    if [ -n "${PORT_FORWARD_PID:-}" ]; then
        echo "Stopping port forwarding process..."
        kill $PORT_FORWARD_PID || true
        wait $PORT_FORWARD_PID 2>/dev/null || true
    fi

    # Delete test resources
    echo "Deleting test deployments..."
    kubectl delete -f test/e2e/manifests/test-deployment.yaml --ignore-not-found
}
These utilities provide:
- Functions for checking metrics with retry logic
- Port availability checks
- Cleanup functions to ensure tests don’t leave resources behind
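Because the utilities live in a plain shell file, they are just as useful outside the test harness. For instance, with the service port-forwarded as above, a single metric can be checked interactively (the path comes from the file header; adjust it to wherever utils.sh lives in your checkout):
source test/integration/utils.sh

# Verify the readiness gauge directly, with the helper's built-in retries
check_metric "pod_label_webhook_readiness_status" "" "1" "Webhook readiness" "18443"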
Evolving from Simple to Comprehensive
My testing approach started simple but grew more sophisticated as the controller matured. The initial test script was straightforward:
#!/bin/bash
set -euo pipefail
kubectl apply -f tests/manifests/test-deployment.yaml
kubectl wait --for=condition=Available --timeout=60s -n default deployment/integ-test
if kubectl get pods -n default -l app=integ-test,hello=world --no-headers 2>/dev/null | grep -q .; then
    echo "Label exists"
    exit 0
else
    echo "Label not found"
    exit 1
fi
This minimal version tested the core functionality - adding the hello=world label to pods. But as I added features like metrics and configuration options, the tests evolved to cover these aspects as well.
Another advantage of shell scripts is how easily they grow with the project. When I added metrics to the controller, extending the tests to verify those metrics was trivial.
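For example, when a new counter or outcome is added to the controller, extending coverage is usually one more call to check_metric appended to integ-test.sh. The operation value below is hypothetical, purely to show the shape of the change:
# Hypothetical example: if the controller grew an "overridden" outcome, coverage is one more block
if ! check_metric "pod_label_webhook_label_operations_total" 'namespace="default",operation="overridden"' "1" "Overridden label operations" "$LOCAL_PORT"; then
    exit 1
fi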
Benefits for Automated Updates
One of the most significant benefits of these integration tests has been their role in automating dependency updates. With Dependabot configured to update our dependencies, the integration tests provide confidence that these updates don’t break the controller.
The workflow is now completely automated:
- Dependabot identifies an update and creates a pull request
- GitHub Actions runs our integration tests on the PR
- If tests pass, the PR can be automatically merged
- If tests fail, we’re alerted to investigate the issue
This has saved me countless hours of manual verification and ensured the controller stays up-to-date with security patches and bug fixes.
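When a Dependabot PR does fail, CI runs the exact same Make target as local development, so the failure can usually be reproduced with nothing more than Docker installed (the branch name below is hypothetical):
# Check out the failing PR branch (hypothetical name) and run the same suite CI runs
git fetch origin
git checkout dependabot/go_modules/sigs.k8s.io/controller-runtime-0.17.0
make integration-test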
Lessons Learned
This journey has taught me several valuable lessons about integration testing:
- Start with the simplest approach: My initial instinct to use Go for testing made things unnecessarily complex. Starting with simple shell scripts allowed me to get testing in place quickly.
- Build incrementally: The tests started minimal and grew as the controller added features. This incremental approach kept testing aligned with development.
- Prioritize debuggability: Being able to run individual scripts or parts of scripts made debugging much easier than with a more monolithic testing framework.
- Reuse components: Breaking testing into modular scripts allowed me to reuse components like cluster creation and deployment in other contexts.
- Consider cleanup from the start: Using trap to ensure cleanup even when tests fail prevented resource leakage and made the testing more robust.
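The same pattern works inside any standalone script, not just the Make target; a minimal sketch of the idiom:
#!/bin/bash
set -euo pipefail

cleanup() {
    # Delete whatever the script created; --ignore-not-found keeps cleanup idempotent
    kubectl delete -f test/e2e/manifests/test-deployment.yaml --ignore-not-found
}

# Runs whenever the script exits, whether the tests passed or failed
trap cleanup EXIT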
Looking Forward
While my shell-based testing approach has served me well, there’s always room for improvement. Future enhancements could include:
- Parallel test execution for faster CI/CD pipelines
- More comprehensive testing of edge cases and error conditions
- Integration with performance testing frameworks
- Automated compatibility testing across multiple Kubernetes versions
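For the last item, kind makes multi-version testing fairly approachable because node images are published per Kubernetes release. A hedged sketch, assuming the create script were extended to pass an optional node image through to kind (the KIND_NODE_IMAGE variable does not exist in the scripts today):
# Hypothetical loop over kind node images; the image tags are examples
for image in kindest/node:v1.29.2 kindest/node:v1.30.0; do
    KIND_NODE_IMAGE="$image" make integration-test
done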
I’m particularly interested in exploring how I might supplement these integration tests with performance benchmarks to detect potential performance regressions during development.
Conclusion
Integration testing is a critical but often overlooked aspect of software development. For my Kubernetes admission controller, a simple, shell-based approach has proven remarkably effective. It gives me confidence that the controller works correctly in real-world scenarios while remaining easy to maintain and debug.
The key takeaway is that effective testing doesn’t have to be complex. Sometimes, the simplest approach is the most sustainable. By starting with basic tests and evolving them as the controller matured, I’ve built a robust testing framework that ensures the controller remains reliable even as dependencies and features change.
For anyone building Kubernetes controllers or similar infrastructure components, I highly recommend prioritizing integration testing early in your development process. The confidence it provides is invaluable, and the infrastructure you build for testing can often be repurposed for other aspects of your development workflow.
In the next part of this series, I’ll explore monitoring and observability, ensuring the controller not only works correctly but can be effectively operated in production environments.