Introduction

In the rapidly evolving world of container orchestration and cloud-native computing, VMware’s Tanzu Kubernetes Grid (TKG) stands out as a robust solution for deploying and managing Kubernetes clusters across various infrastructures, including vSphere, AWS, and Azure. TKG simplifies the process of running production-grade Kubernetes, but like any complex system, it occasionally requires direct access to the underlying nodes for troubleshooting, maintenance, or configuration tweaks. One common method to achieve this is through Secure Shell (SSH) access.
However, in environments like vSphere with Tanzu, direct SSH access to TKG cluster nodes isn’t always straightforward due to network segmentation, security policies, and the virtualized nature of the nodes. This is where the concept of a PodVM comes into play. A PodVM, in this context, refers to a lightweight pod deployed as a virtual machine (VM) within the same vSphere namespace as the TKG cluster. It acts as a “jumpbox” or bastion host, providing a secure intermediary for SSH connections. This approach leverages Kubernetes’ pod architecture to mount sensitive credentials, such as SSH private keys, without exposing them directly to external systems.
This article delves into the intricacies of gaining SSH access to TKG cluster nodes via a PodVM. We’ll explore the background, prerequisites, a detailed step-by-step guide, potential pitfalls and troubleshooting tips, best practices for security, and real-world applications. By the end, you’ll have a thorough understanding of this technique, enabling you to apply it confidently in your VMware Tanzu environments. Whether you’re a DevOps engineer, system administrator, or Kubernetes enthusiast, mastering this method can significantly enhance your cluster management capabilities.
Understanding TKG and PodVM in vSphere with Tanzu
Before diving into the technical steps, it’s essential to grasp the foundational elements. Tanzu Kubernetes Grid is VMware’s distribution of upstream Kubernetes, designed for enterprise-grade deployments. In vSphere with Tanzu, TKG operates in two primary modes: the Supervisor Cluster, which is the control plane managed by vSphere, and TKG Service Clusters (also known as guest or workload clusters), which are user-provisioned Kubernetes clusters running on top of the Supervisor.
The nodes in these TKG Service Clusters are essentially virtual machines provisioned by vSphere. Each node runs a lightweight operating system like Photon OS, optimized for Kubernetes workloads. SSH access to these nodes is restricted to a system user account, typically “vmware-system-user,” to prevent unauthorized root access and maintain security compliance.
Direct SSH from an external machine might be blocked by network policies, especially in setups using NSX-T for networking, where clusters are isolated in logical segments. This isolation enhances security but complicates access. Enter the PodVM: In vSphere with Tanzu, pods can be deployed directly on the hypervisor using vSphere Pods, which are VM-like entities that run containers without a full guest OS overhead. However, for our purpose, we use a standard pod (often called a “podVM” in documentation due to its VM-backed nature) as a jumpbox.
This jumpbox pod is created in the same namespace as the TKG cluster. It mounts a Kubernetes secret containing the SSH private key, which is automatically generated during cluster provisioning. The secret, named something like “<cluster-name>-ssh,” holds the key pair needed for authentication. By exec-ing into this pod and initiating SSH from there, you bypass external network restrictions, as the pod resides within the same secure namespace and network segment.
This method is particularly useful in air-gapped or highly segmented environments, where external bastions aren’t feasible. It aligns with Kubernetes’ declarative model, allowing for ephemeral jumpboxes that can be spun up and torn down as needed, minimizing security risks.
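To make the secret-mount mechanism concrete, the key material can also be inspected directly from the Supervisor before any jumpbox exists. The following is a minimal sketch; the helper name `show_ssh_secret` and the cluster name are illustrative, and it assumes an authenticated kubectl session in the cluster's namespace.

```shell
# Sketch: inspect the auto-generated SSH secret for a TKG cluster.
# The function name and example cluster name are illustrative.
show_ssh_secret() {
  cluster="$1"
  # The private key is stored base64-encoded under the "ssh-privatekey"
  # data key of the "<cluster-name>-ssh" secret; decode it for inspection.
  kubectl get secret "${cluster}-ssh" -o jsonpath='{.data.ssh-privatekey}' \
    | base64 -d
}

# Usage (requires an authenticated Supervisor session):
#   show_ssh_secret tkg-cluster-1 | head -1
```

Avoid writing the decoded key to disk on untrusted machines; the jumpbox approach exists precisely so the key never has to leave the namespace.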
Prerequisites for SSH Access
To successfully implement SSH access via a PodVM, several prerequisites must be in place. First, ensure your environment is set up with vSphere with Tanzu using NSX networking. Note that vDS networking isn’t supported for this specific jumpbox method due to differences in pod networking.
You’ll need administrative access to the vSphere Supervisor Cluster via kubectl. This typically involves logging in as a vCenter Single Sign-On (SSO) user with appropriate privileges. Install the vSphere Plugin for kubectl if you haven’t already, as it facilitates authentication.
The target TKG Service Cluster must be provisioned in a vSphere Namespace. During cluster creation (via Tanzu CLI or YAML manifests), an SSH key secret is automatically generated. Verify its existence beforehand.
Additionally, prepare a machine with kubectl installed and configured to connect to the Supervisor. If your environment uses a private container registry for images like Photon OS, create a registry credential secret (e.g., “regcred”) to pull images securely.
Familiarity with basic Kubernetes commands, YAML manifests, and SSH concepts is assumed. Ensure the cluster nodes are healthy and accessible within the namespace—check this with “kubectl get nodes -o wide” to retrieve IP addresses.
Finally, for security, perform these operations from a trusted workstation, and always clean up resources post-use to avoid lingering vulnerabilities.
Step-by-Step Guide to SSH Access via PodVM
Now, let’s walk through the process in detail. This guide is based on official VMware documentation and community best practices.
Step 1: Connect to the Supervisor Cluster
Start by authenticating to the vSphere Supervisor Cluster using kubectl. Run:
```shell
kubectl vsphere login --server=<SUPERVISOR-IP> --vsphere-username=<YOUR-USERNAME> --insecure-skip-tls-verify
```
Replace <SUPERVISOR-IP> with the IP of the Supervisor control plane and <YOUR-USERNAME> with your vSphere SSO account. The --insecure-skip-tls-verify flag skips certificate validation; use it only in labs or environments with self-signed certificates.
Once logged in, list available contexts:
```shell
kubectl config get-contexts
```
Switch to the namespace where your TKG cluster resides:
```shell
kubectl config use-context <NAMESPACE>
```
Set an environment variable for convenience:
```shell
export NAMESPACE=<YOUR-NAMESPACE>
```
Step 2: Verify the SSH Secret
Confirm the presence of the SSH private key secret:
```shell
kubectl get secrets
```
Look for “<CLUSTER-NAME>-ssh.” If it’s missing, the cluster might not have been provisioned with SSH enabled—recreate it if necessary.
Step 3: Create a Registry Credential Secret (If Needed)
If pulling the Photon OS image requires authentication (e.g., from a private registry), create a secret:
```shell
kubectl create secret docker-registry regcred --docker-server=<REGISTRY-URL> --docker-username=<USERNAME> --docker-password=<PASSWORD> --docker-email=<EMAIL>
```
Step 4: Deploy the Jumpbox PodVM
Create a YAML file named “jumpbox.yaml” with the following content:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jumpbox
  namespace: <YOUR-NAMESPACE>
spec:
  containers:
  - image: "photon:3.0"
    name: jumpbox
    command: ["/bin/bash", "-c", "--"]
    args: ["yum install -y openssh-server; mkdir /root/.ssh; cp /root/ssh/ssh-privatekey /root/.ssh/id_rsa; chmod 600 /root/.ssh/id_rsa; while true; do sleep 30; done;"]
    volumeMounts:
    - mountPath: "/root/ssh"
      name: ssh-key
      readOnly: true
    resources:
      requests:
        memory: 2Gi
  volumes:
  - name: ssh-key
    secret:
      secretName: <CLUSTER-NAME>-ssh
  imagePullSecrets:
  - name: regcred
```
Apply it:
```shell
kubectl apply -f jumpbox.yaml
```
This pod pulls the Photon OS image, installs OpenSSH, mounts the SSH key from the secret, sets permissions, and runs an infinite loop to keep it alive.
Step 5: Verify the Pod is Running
Check the pod status:
```shell
kubectl get pods
```
Wait until it’s “Running.” This may take a minute as it installs packages. The pod will appear as a VM in vCenter under the namespace.
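The wait can be scripted rather than polled by hand. This is a sketch assuming the pod is named jumpbox as above; note that kubectl wait only covers pod readiness, so the helper also checks that the OpenSSH install inside the container has actually finished.

```shell
# Sketch: block until the jumpbox pod is Ready AND its in-container
# OpenSSH install has completed. Function name is illustrative.
wait_for_jumpbox() {
  kubectl wait --for=condition=Ready pod/jumpbox --timeout=5m &&
    until kubectl exec jumpbox -- test -x /usr/bin/ssh 2>/dev/null; do
      sleep 5
    done
}

# Usage: wait_for_jumpbox && echo "jumpbox ready"
```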
Step 6: Obtain the Target Node IP
Get the node IPs:
```shell
kubectl get nodes -o wide
```
Alternatively, for virtual machine details:
```shell
kubectl get virtualmachines
export VMNAME=<VM-NAME>
export VMIP=$(kubectl get virtualmachine/$VMNAME -o jsonpath='{.status.vmIp}')
```
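When you need IPs for several nodes, the jsonpath query above is worth wrapping in a small helper. This is a sketch; the function name and the example VM name are illustrative, and it assumes the VirtualMachine objects expose their address under .status.vmIp as in the command above.

```shell
# Sketch: resolve a node's IP from its VirtualMachine object.
# Mirrors the jsonpath query used in the manual step.
get_node_ip() {
  kubectl get "virtualmachine/$1" -o jsonpath='{.status.vmIp}'
}

# Usage (authenticated Supervisor session required; VM name is illustrative):
#   VMIP="$(get_node_ip tkg-cluster-1-control-plane-abcde)"
```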
Step 7: SSH into the Node via the PodVM
Exec into the jumpbox and SSH:
```shell
kubectl exec -it jumpbox -- /usr/bin/ssh vmware-system-user@$VMIP
```
Accept the host key if prompted:
```text
The authenticity of host '<VMIP>' can't be established.
ECDSA key fingerprint is SHA256:<FINGERPRINT>.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
```
You’ll land in the node shell as vmware-system-user. Use sudo for elevated privileges:
```shell
sudo su
```
Step 8: Perform Operations and Exit
Execute your tasks, such as checking logs (/var/log/), restarting services (e.g., systemctl restart kubelet), or debugging. When done:
```shell
exit
```
This exits the SSH session, returning to the pod. Exit the pod exec with another “exit.”
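For scripted health checks, an interactive session isn't necessary: a single command can be pushed through the jumpbox in one shot. The sketch below assumes the jumpbox pod from Step 4; the function name is illustrative, and StrictHostKeyChecking=no (which suppresses the host-key prompt) is acceptable only for short-lived, in-namespace jumpboxes like this one.

```shell
# Sketch: run one command on a node via the jumpbox, non-interactively.
# Function name is illustrative; assumes the "jumpbox" pod from Step 4.
node_exec() {
  vmip="$1"; shift
  kubectl exec jumpbox -- /usr/bin/ssh -o StrictHostKeyChecking=no \
    "vmware-system-user@${vmip}" "$@"
}

# Usage:
#   node_exec "$VMIP" 'sudo systemctl status kubelet --no-pager'
```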
Step 9: Clean Up
Delete the jumpbox for security:
```shell
kubectl delete pod jumpbox
```
Troubleshooting Common Issues
If the jumpbox pod fails to start, check its logs with “kubectl logs jumpbox.” Common errors include image pull failures—verify your registry credentials. If the SSH connection is refused, ensure the node IP is correct and the key is properly mounted (check its path and permissions inside the pod).
Network issues in NSX-T might block intra-namespace traffic; verify firewall rules. If you encounter “no such file or directory” for /usr/bin/ssh, wait longer for the installation to complete and retry.
For older TKG versions, the secret format might differ—consult version-specific docs. If using vDS instead of NSX, alternative methods like direct SSH with port forwarding may be needed.
Best Practices and Security Considerations
Security is paramount when dealing with SSH access. Always use ephemeral jumpboxes—create them only when needed and delete immediately after. Avoid storing keys externally; rely on Kubernetes secrets.
Implement role-based access control (RBAC) to limit who can deploy such pods. Monitor pod logs and cluster events for unauthorized access attempts. Use multi-factor authentication for vSphere SSO.
For production, consider auditing tools like Falco for runtime security. Regularly rotate SSH keys by redeploying clusters if possible.
In terms of best practices, document your procedures, automate with scripts, and test in non-production environments first. This method scales well for multiple clusters, as you can parameterize the YAML.
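Parameterizing the manifest can be as simple as a heredoc-based render script. The following is a minimal sketch under stated assumptions: the default cluster and namespace names are illustrative placeholders, and the manifest body mirrors the jumpbox.yaml from Step 4.

```shell
#!/bin/sh
# Sketch: render the jumpbox manifest for any cluster/namespace pair.
# CLUSTER and NS defaults are illustrative -- override via environment.
CLUSTER="${CLUSTER:-tkg-cluster-1}"
NS="${NS:-my-namespace}"

render_jumpbox() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: jumpbox
  namespace: ${NS}
spec:
  containers:
  - image: "photon:3.0"
    name: jumpbox
    command: ["/bin/bash", "-c", "--"]
    args: ["yum install -y openssh-server; mkdir /root/.ssh; cp /root/ssh/ssh-privatekey /root/.ssh/id_rsa; chmod 600 /root/.ssh/id_rsa; while true; do sleep 30; done;"]
    volumeMounts:
    - mountPath: "/root/ssh"
      name: ssh-key
      readOnly: true
  volumes:
  - name: ssh-key
    secret:
      secretName: ${CLUSTER}-ssh
EOF
}

render_jumpbox
# Pipe straight into kubectl for any cluster, e.g.:
#   CLUSTER=prod-cluster NS=prod-ns ./render-jumpbox.sh | kubectl apply -f -
```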
Real-World Applications and Conclusion
In practice, this technique is invaluable for debugging issues like node not ready states, storage attachments, or network misconfigurations in TKG clusters. For instance, during a recent outage in a financial services firm, admins used a PodVM to SSH and identify a misconfigured etcd, restoring service quickly.
In conclusion, SSH access to TKG cluster nodes via a PodVM exemplifies the power of integrating Kubernetes with virtualization. It provides secure, efficient access without compromising isolation. As cloud-native adoption grows, mastering such hybrid techniques will be key to operational excellence. With the steps outlined here, you’re equipped to handle advanced TKG management tasks effectively.
