Quick Take
I used AI to implement security measures for my Kubernetes admission controller, from TLS configuration to certificate handling. The journey revealed both strengths and limitations of AI-assisted security development, resulting in a robustly secured controller with multiple validation layers.
Introduction
In my previous post, I shared how AI assistance helped me build a working Kubernetes admission controller. While getting the basic functionality working was exciting, I quickly realized that moving from a functioning prototype to a production-ready controller required much more attention to security. I decided to continue my experiment of letting AI drive the development process, having it write almost all of the security-related code and tests.
This decision raised an interesting question: how well could AI handle implementing secure configurations, proper certificate handling, and comprehensive security testing? I maintained control over the security requirements and verification, but wanted to see just how far AI could take me in implementing robust security measures.
The Security Problem
Admission controllers occupy a privileged position in Kubernetes - they can modify or reject any request to the API server. This power comes with significant responsibility, as a compromised controller could:
- Inject malicious containers or configurations into workloads
- Leak sensitive information through labels or annotations
- Deny legitimate requests, causing service disruptions
- Consume excessive resources, impacting cluster performance
What I found particularly challenging was that TLS isn't just a best practice for admission controllers - it's required. The Kubernetes API server will only connect to webhook endpoints over HTTPS, making proper TLS configuration essential.
Starting with Security Scanning
Before diving into specific security measures, I wanted automated security scanning in place from the start. With AI's help, I added several security-focused GitHub Actions workflows; here's the core scanning job:
name: Security

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: "0 0 * * 0" # Weekly on Sundays

jobs:
  security:
    name: Security Scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Gosec
        uses: securego/gosec@master
        with:
          args: "-no-fail -fmt sarif -out results/gosec.sarif ./..."
      - name: Run Trivy filesystem scan
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: "fs"
          format: "sarif"
          output: "results/trivy-fs.sarif"
          severity: "CRITICAL,HIGH"
      - name: Run Trivy image scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: "pod-label-webhook:latest"
          format: "sarif"
          output: "results/trivy-image.sarif"
          severity: "CRITICAL,HIGH"
This workflow runs multiple security tools:
- Gosec for Go-specific security issues
- Trivy for both filesystem and container image vulnerabilities
Results are written in SARIF format and uploaded to GitHub's security dashboard (the upload step is omitted here for brevity).
Having these tools in place early proved invaluable - they caught several issues during development that I might have missed otherwise.
Implementing TLS Configuration
The first security layer I tackled was proper TLS configuration. This wasn’t optional - admission webhooks must serve HTTPS endpoints. Working with AI to implement this was interesting; while it was excellent at generating secure TLS configurations, getting these to pass our security scanning tools required several iterations.
I started with a basic TLS setup, but security scanners flagged multiple issues. After a few rounds of refinement with the AI, we landed on this hardened configuration:
server := &http.Server{
    Addr:    cfg.Address,
    Handler: mux,
    TLSConfig: &tls.Config{
        MinVersion: tls.VersionTLS13,
        CipherSuites: []uint16{
            tls.TLS_AES_128_GCM_SHA256,
            tls.TLS_AES_256_GCM_SHA384,
            tls.TLS_CHACHA20_POLY1305_SHA256,
        },
        CurvePreferences: []tls.CurveID{
            tls.X25519,
            tls.CurveP384,
        },
        SessionTicketsDisabled: true,
        Renegotiation:          tls.RenegotiateNever,
        InsecureSkipVerify:     false,
        ClientAuth:             tls.VerifyClientCertIfGiven,
    },
    ReadHeaderTimeout: readHeaderTimeout,
    WriteTimeout:      writeTimeout,
    ReadTimeout:       readTimeout,
    IdleTimeout:       idleTimeout,
}
I was impressed with how the AI helped craft a configuration that:
- Enforces TLS 1.3 only (removing vulnerabilities in earlier TLS versions)
- Specifies only the strongest cipher suites
- Disables session tickets and TLS renegotiation
- Sets proper timeouts to prevent certain denial-of-service attacks
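For context, here's a minimal sketch of how a server like this might be started. The wiring below is my own illustration rather than the project's actual startup code; it assumes the certificate and key paths come from the same Config struct validated later in this post.

package webhook // illustrative package, not the project's actual layout

import (
    "errors"
    "net/http"

    "github.com/rs/zerolog/log"
)

// serveTLS starts the webhook over HTTPS. ListenAndServeTLS loads the
// certificate and key from disk, and the hardened TLSConfig shown above
// still governs every handshake.
func serveTLS(server *http.Server, certFile, keyFile string) {
    if err := server.ListenAndServeTLS(certFile, keyFile); err != nil && !errors.Is(err, http.ErrServerClosed) {
        log.Fatal().Err(err).Msg("webhook server failed")
    }
}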
Certificate Management Challenges
While cert-manager handles certificate generation and rotation in the cluster, proper certificate handling within the application itself proved challenging. The AI's help was particularly valuable here in getting the code to satisfy the security scanners.
The initial certificate validation code was simplistic, but security scanners flagged several issues around file permissions and validation. After multiple iterations, we arrived at this implementation:
func (c *Config) ValidateCertPaths() error {
    certInfo, err := os.Stat(c.CertFile)
    if err != nil {
        return fmt.Errorf("certificate file error: %v", err)
    }
    if !certInfo.Mode().IsRegular() {
        return fmt.Errorf("certificate path is not a regular file")
    }

    keyInfo, err := os.Stat(c.KeyFile)
    if err != nil {
        return fmt.Errorf("key file error: %v", err)
    }
    if !keyInfo.Mode().IsRegular() {
        return fmt.Errorf("key path is not a regular file")
    }

    // Reject keys readable or writable by group/other (any bit in 0o077).
    keyMode := keyInfo.Mode().Perm()
    if keyMode&0o077 != 0 {
        return fmt.Errorf("key file %s has excessive permissions %v", c.KeyFile, keyMode)
    }
    // Warn on owner bits beyond read/write (e.g. the execute bit in 0o700).
    if keyMode > 0o600 {
        log.Warn().Str("key_file", c.KeyFile).Msgf("key file has permissive mode %v", keyMode)
    }
    return nil
}
This code checks not just that certificates exist, but that they have appropriate file permissions - a critical security consideration I hadn’t initially thought about. Private keys with overly permissive file modes are a common security oversight that our validators now catch.
Testing Security Features
What particularly impressed me about the AI-assisted development process was how it helped develop comprehensive test cases for security-critical code. For example, the certificate validation tests covered multiple edge cases:
func TestConfig_ValidateCertPaths(t *testing.T) {
    tests := []struct {
        name    string
        config  *Config
        setup   func() error
        wantErr bool
        errMsg  string
    }{
        {
            name: "key too permissive",
            config: &Config{
                CertFile: certFile,
                KeyFile:  keyFile,
            },
            setup: func() error {
                return os.Chmod(keyFile, 0o644)
            },
            wantErr: true,
            errMsg:  "has excessive permissions",
        },
        // More test cases...
    }
    // Test implementation...
}
These tests catch issues like overly permissive file permissions, missing files, and other security-critical edge cases. Having this comprehensive test coverage has already prevented a couple of regressions during development.
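The elided runner follows Go's standard table-driven pattern; here's a sketch of what that loop plausibly looks like (the repo's actual implementation may differ, and it assumes the strings package is imported):

for _, tt := range tests {
    t.Run(tt.name, func(t *testing.T) {
        // Apply per-case setup, such as chmod-ing the key file.
        if tt.setup != nil {
            if err := tt.setup(); err != nil {
                t.Fatalf("setup failed: %v", err)
            }
        }
        err := tt.config.ValidateCertPaths()
        if (err != nil) != tt.wantErr {
            t.Fatalf("ValidateCertPaths() error = %v, wantErr %v", err, tt.wantErr)
        }
        if tt.wantErr && err != nil && !strings.Contains(err.Error(), tt.errMsg) {
            t.Errorf("error %q does not contain %q", err, tt.errMsg)
        }
    })
}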
Input Validation and Security Monitoring
Input validation is critical for any admission controller since we’re processing untrusted input from API requests. I implemented multiple layers of validation to ensure safe operation:
func (s *Server) shouldAddLabel(pod *corev1.Pod) bool {
    val, ok := pod.Annotations[annotationKey]
    if !ok {
        return true
    }
    parsed, err := strconv.ParseBool(val)
    if err != nil {
        s.logger.Warn().
            Str("value", val).
            Str("pod", pod.Name).
            Str("namespace", pod.Namespace).
            Msg("Invalid annotation value, defaulting to true")
        return true
    }
    return parsed
}
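The boolean from shouldAddLabel ultimately drives a JSON patch in the admission response. Here's a sketch of that step; patchOperation, labelKey, and labelValue are illustrative stand-ins, not the project's actual identifiers:

// Hypothetical constants; the real webhook's key and value differ. Keys
// containing "/" would need escaping as "~1" in JSON pointer paths (RFC 6901).
const (
    labelKey   = "managed-by"
    labelValue = "pod-label-webhook"
)

// patchOperation mirrors a single RFC 6902 JSON patch entry.
type patchOperation struct {
    Op    string      `json:"op"`
    Path  string      `json:"path"`
    Value interface{} `json:"value"`
}

func (s *Server) buildPatch(pod *corev1.Pod) ([]byte, error) {
    if !s.shouldAddLabel(pod) {
        return nil, nil // pod opted out via annotation; admit unchanged
    }
    var patch []patchOperation
    if pod.Labels == nil {
        // /metadata/labels may not exist yet; create the map first.
        patch = append(patch, patchOperation{Op: "add", Path: "/metadata/labels", Value: map[string]string{}})
    }
    patch = append(patch, patchOperation{Op: "add", Path: "/metadata/labels/" + labelKey, Value: labelValue})
    return json.Marshal(patch)
}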
I also added security metrics tracking to monitor potentially suspicious activity:
func (s *Server) recordAnnotationMetrics(pod *corev1.Pod) {
    if pod.Annotations == nil {
        s.metrics.recordAnnotationValidation(annotationMissing, pod.Namespace)
        return
    }
    if val, ok := pod.Annotations[annotationKey]; ok {
        if _, err := strconv.ParseBool(val); err != nil {
            s.metrics.recordAnnotationValidation(annotationInvalid, pod.Namespace)
        } else {
            s.metrics.recordAnnotationValidation(annotationValid, pod.Namespace)
        }
    } else {
        s.metrics.recordAnnotationValidation(annotationMissing, pod.Namespace)
    }
}
These metrics help identify patterns of invalid input that might indicate attempted exploitation. During testing, we’ve already caught a few unexpected inputs that would have been hard to spot without this monitoring.
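Under the hood, recordAnnotationValidation is a thin wrapper around the metrics backend. Here's a minimal sketch assuming prometheus/client_golang; the metric name and the metrics struct are illustrative:

import "github.com/prometheus/client_golang/prometheus"

type metrics struct{} // stand-in for the project's metrics holder

// annotationValidations counts validation outcomes (valid, invalid, missing)
// per namespace, so a spike in "invalid" stands out on a dashboard. It would
// be registered once at startup with prometheus.MustRegister.
var annotationValidations = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "webhook_annotation_validations_total", // illustrative name
        Help: "Annotation validation results observed by the webhook.",
    },
    []string{"status", "namespace"},
)

func (m *metrics) recordAnnotationValidation(status, namespace string) {
    annotationValidations.WithLabelValues(status, namespace).Inc()
}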
The Security Challenges I Faced
Implementing these security measures wasn’t straightforward. Some of the key challenges I encountered were:
1. TLS Configuration Complexity
Getting TLS right is harder than it looks. While the AI gave me solid starting configurations, the security scanners kept finding issues with subtle details like cipher suite selection and TLS version requirements. I spent several iterations fine-tuning the TLS configuration to satisfy both security best practices and operational requirements.
2. Certificate Handling Edge Cases
Certificate handling introduced several edge cases I hadn’t considered initially. What happens if certificates are rotated while the server is running? What about permissions on certificate files? The AI helped me think through these scenarios and implement proper validation.
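One common way to handle in-place rotation, sketched here as a general pattern rather than the project's confirmed approach, is to resolve the certificate on every handshake via tls.Config.GetCertificate:

import (
    "crypto/tls"
    "fmt"
)

// certReloader returns a GetCertificate callback that re-reads the key pair
// from disk on each handshake, so cert-manager rotations take effect without
// a restart. Production code would cache and re-read only when the file changes.
func certReloader(certFile, keyFile string) func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
    return func(_ *tls.ClientHelloInfo) (*tls.Certificate, error) {
        cert, err := tls.LoadX509KeyPair(certFile, keyFile)
        if err != nil {
            return nil, fmt.Errorf("reloading certificate: %w", err)
        }
        return &cert, nil
    }
}

// Usage: tlsConfig.GetCertificate = certReloader(cfg.CertFile, cfg.KeyFile)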
3. Error Handling Without Information Leakage
One subtle issue was ensuring that error messages didn’t leak sensitive information while still being useful for debugging. This led to the creation of our custom error types:
type WebhookError struct {
    Op   string
    Path string
    Err  error
}

func (e *WebhookError) Error() string {
    if e.Path != "" {
        return fmt.Sprintf("webhook %s failed for %s: %v", e.Op, e.Path, e.Err)
    }
    return fmt.Sprintf("webhook %s failed: %v", e.Op, e.Err)
}
This approach gives us detailed internal errors for logging while presenting sanitized errors externally.
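In the handler, that split looks roughly like the sketch below; denyWithSanitizedError is my illustrative name for the pattern, and admissionv1 refers to k8s.io/api/admission/v1:

import (
    admissionv1 "k8s.io/api/admission/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// denyWithSanitizedError logs the full WebhookError internally but returns
// only a generic message to the API server, so operation names and file
// paths never leave our logs.
func (s *Server) denyWithSanitizedError(err error) *admissionv1.AdmissionResponse {
    s.logger.Error().Err(err).Msg("admission request failed") // detail stays internal
    return &admissionv1.AdmissionResponse{
        Allowed: false,
        Result: &metav1.Status{
            Message: "internal webhook error", // sanitized external view
        },
    }
}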
What I Learned About AI-Assisted Security
Working with AI on security-critical code was an eye-opening experience. Here are my key takeaways:
- AI excels at generating secure configurations - the TLS and certificate handling code the AI produced was robust and followed best practices.
- Verification is still human work - while AI wrote great security code, I still needed security scanners and manual review to verify everything was actually secure.
- AI generates comprehensive test cases - one of the most valuable contributions was the thorough security test coverage, which caught several edge cases I might have missed.
- Iteration is key - getting security right required multiple rounds of refinement as scanners and testing identified subtle issues.
Conclusion
Securing a Kubernetes admission controller requires attention to multiple security layers - from TLS configuration to certificate handling to input validation. Using AI to help implement these measures proved surprisingly effective, though the process still required human oversight and several iterations.
The combination of AI assistance and automated security scanning provided a robust foundation for building these security controls. While AI excelled at generating secure configurations and comprehensive test cases, the practical implementation often required iteration to satisfy all requirements.
The end result is a controller with multiple layers of security, validated both through automated scanning and extensive testing. In the next part of this series, I’ll explore the release process and some interesting challenges around AI’s limitations with newer tools like goreleaser. Stay tuned!