Istio resources

Adding an external workload to the mesh involves two Istio resources:

  • A WorkloadGroup must be created in the namespace that the machine will be attached to. This object represents a group of machines serving the same service, and is analogous to a Kubernetes Deployment.
  • Each virtual machine attached to the mesh is represented by a WorkloadEntry object in the workload’s namespace. This is analogous to a Kubernetes Pod.

The VM attachment flow used in Service Mesh Manager relies on the PILOT_ENABLE_WORKLOAD_ENTRY_AUTOREGISTRATION and PILOT_ENABLE_WORKLOAD_ENTRY_HEALTHCHECKS features.

Autoregistration

To understand the autoregistration feature, first take a look at a WorkloadGroup resource:

apiVersion: networking.istio.io/v1alpha3
kind: WorkloadGroup
metadata:
  labels:
    app: analytics
    version: v1
  name: analytics-v1
  namespace: smm-demo
spec:
  metadata:
    labels:
      app: analytics
      version: v1
  template:
    ports:
      http: 8080
    serviceAccount: default

If autoregistration is enabled, the Istio pilot-agent running on the virtual machine connects to the istio-meshexpansion-gateway in the istio-system namespace. To authenticate itself to the Istio control plane, it presents the bearer token of the specified ServiceAccount, along with some registration details that Service Mesh Manager sets automatically. If the authentication succeeds, the Istio control plane creates a WorkloadEntry in the cluster, like this:

apiVersion: networking.istio.io/v1beta1
kind: WorkloadEntry
metadata:
  annotations:
    istio.io/autoRegistrationGroup: analytics-v1
    istio.io/connectedAt: "2022-03-31T06:52:14.739292073Z"
    istio.io/workloadController: istiod-cp-v115x-df9f5d556-9kvqs
  labels:
    app: analytics
    hostname: ip-172-31-22-226
    istio.io/rev: cp-v115x.istio-system
    service.istio.io/canonical-name: analytics
    service.istio.io/canonical-revision: v1
    topology.istio.io/network: vm-network-1
  name: analytics-v1-3.67.91.181-vm-network-1
  namespace: smm-demo
  ownerReferences:
  - apiVersion: networking.istio.io/v1alpha3
    controller: true
    kind: WorkloadGroup
    name: analytics-v1
    uid: d01777d5-4294-44e7-a311-3596c2f63bb1
spec:
  address: 1.2.3.4
  labels:
    app: analytics
    hostname: ip-172-31-22-226
    istio.io/rev: cp-v115x.istio-system
    service.istio.io/canonical-name: analytics
    service.istio.io/canonical-revision: v1
    topology.istio.io/network: vm-network-1
  locality: eu-central-1/eu-central-1a
  network: vm-network-1
  serviceAccount: default

Any attached machine that has a corresponding WorkloadEntry resource behaves like a Kubernetes workload, and carries a set of labels that Services can use to select the machine.

For example, the following Service routes traffic to the virtual machine because its .spec.selector matches the WorkloadEntry’s labels (.metadata.labels):

apiVersion: v1
kind: Service
metadata:
  name: analytics
  namespace: smm-demo
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: analytics
  sessionAffinity: None
  type: ClusterIP
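
The matching rule behind this example can be sketched in a few lines of Python. This is only an illustration of equality-based label selection, not Istio's or Kubernetes's actual implementation. The function name selector_matches is hypothetical, and the label values are copied from the WorkloadEntry shown earlier.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Return True if every key/value pair of the selector appears in labels."""
    # Assumption of this sketch: an empty selector selects nothing.
    if not selector:
        return False
    return all(labels.get(key) == value for key, value in selector.items())


# .spec.selector of the analytics Service
service_selector = {"app": "analytics"}

# Labels of the autoregistered WorkloadEntry (subset of the example above)
workload_entry_labels = {
    "app": "analytics",
    "hostname": "ip-172-31-22-226",
    "service.istio.io/canonical-name": "analytics",
    "service.istio.io/canonical-revision": "v1",
    "topology.istio.io/network": "vm-network-1",
}

print(selector_matches(service_selector, workload_entry_labels))  # True
# A selector asking for a label the machine does not carry does not match.
print(selector_matches({"app": "analytics", "version": "v2"}, workload_entry_labels))  # False
```

Extra labels on the WorkloadEntry do not prevent a match. Only the keys listed in the selector are compared.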

Autoregistration is also crucial when removing workloads from the mesh. If the Istio sidecar process is stopped on the host, the Istio control plane automatically removes the related WorkloadEntry custom resource. You can use this to temporarily remove a VM from the mesh for maintenance or troubleshooting. It also ensures that if Istio is uninstalled from the node, the node is de-registered automatically, without manual updates to any Kubernetes resources.

Health checks

The PILOT_ENABLE_WORKLOAD_ENTRY_HEALTHCHECKS setting provided by Istio allows health checks to be defined for VMs. If the health check fails, Istio will not route any traffic to the workload.

In Service Mesh Manager, the health checks are defined in the WorkloadGroup resource, and our agent running on the VM ensures that Istio uses that setting. For example, the following WorkloadGroup defines an HTTP health check:

apiVersion: networking.istio.io/v1alpha3
kind: WorkloadGroup
metadata:
  labels:
    app: analytics
    version: v1
  name: analytics-v1
  namespace: smm-demo
spec:
  metadata:
    labels:
      app: analytics
      version: v1
  probe:
    httpGet:
      host: 127.0.0.1
      path: /
      port: 8080
      scheme: HTTP
  template:
    network: vm-network-1
    serviceAccount: default

The definition of .spec.probe is the same as the Probe object of the official Kubernetes API. The defined probe is analogous to the liveness probe of a Pod: it is checked continuously while Istio is running on the machine. The only difference is that Istio does not restart the VM if the probe fails. Instead, it stops routing traffic to the WorkloadEntry.
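
To make the probe behavior more concrete, here is a minimal Python sketch of how consecutive probe results could translate into a healthy or unhealthy state. It assumes the Kubernetes Probe defaults of successThreshold 1 and failureThreshold 3. The ProbeState class is hypothetical, and whether the Istio agent counts results in exactly this way is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Tracks consecutive probe results (illustrative, not Istio's code)."""
    success_threshold: int = 1   # Kubernetes Probe default
    failure_threshold: int = 3   # Kubernetes Probe default
    healthy: bool = False        # assumption: unhealthy until the first success
    _successes: int = 0
    _failures: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if probe_ok:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.failure_threshold:
                # The VM is not restarted; it only stops receiving traffic.
                self.healthy = False
        return self.healthy


state = ProbeState()
results = [state.record(ok) for ok in (True, False, False, False, True)]
print(results)  # [True, True, True, False, True]
```

In this sketch a single failed probe does not remove the machine from load balancing. Only the third consecutive failure does, and the next successful probe restores it.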

You can query the status of the health checks from Kubernetes by checking the machine’s WorkloadEntry:

apiVersion: networking.istio.io/v1beta1
kind: WorkloadEntry
metadata:
  annotations:
    istio.io/autoRegistrationGroup: analytics-v1
    istio.io/connectedAt: "2022-03-31T06:52:14.739292073Z"
    istio.io/workloadController: istiod-cp-v115x-df9f5d556-9kvqs
  labels:
    app: analytics
    hostname: ip-172-31-22-226
    istio.io/rev: cp-v115x.istio-system
    service.istio.io/canonical-name: analytics
    service.istio.io/canonical-revision: v1
    topology.istio.io/network: vm-network-1
  name: analytics-v1-3.67.91.181-vm-network-1
  namespace: smm-demo
  ownerReferences:
  - apiVersion: networking.istio.io/v1alpha3
    controller: true
    kind: WorkloadGroup
    name: analytics-v1
    uid: d01777d5-4294-44e7-a311-3596c2f63bb1
spec:
  address: 1.2.3.4
  labels:
    app: analytics
    hostname: ip-172-31-22-226
    istio.io/rev: cp-v115x.istio-system
    service.istio.io/canonical-name: analytics
    service.istio.io/canonical-revision: v1
    topology.istio.io/network: vm-network-1
  locality: eu-central-1/eu-central-1a
  network: vm-network-1
  serviceAccount: default
status:
  conditions:
  - lastProbeTime: "2022-03-31T07:23:07.236758604Z"
    lastTransitionTime: "2022-03-31T07:23:07.236759090Z"
    status: "True"
    type: Healthy

In the status field of the custom resource, the conditions array contains an entry with the type field set to Healthy. If the status field of that entry is set to True, the machine is considered healthy and receives traffic.
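
The same check can be automated. The following Python sketch reads the Healthy condition from a WorkloadEntry that has already been loaded into a dictionary, for example from the YAML above. The function name health_condition is hypothetical, and how the resource is fetched from the cluster is left out on purpose.

```python
from typing import Optional


def health_condition(workload_entry: dict) -> Optional[bool]:
    """Return True/False from the Healthy condition, or None if it is absent."""
    status = workload_entry.get("status") or {}
    for condition in status.get("conditions") or []:
        if condition.get("type") == "Healthy":
            # The condition status is the string "True" or "False", not a boolean.
            return condition.get("status") == "True"
    # No Healthy condition reported, for example when no probe is defined.
    return None


workload_entry = {
    "kind": "WorkloadEntry",
    "metadata": {"name": "analytics-v1-3.67.91.181-vm-network-1"},
    "status": {
        "conditions": [
            {
                "lastProbeTime": "2022-03-31T07:23:07.236758604Z",
                "lastTransitionTime": "2022-03-31T07:23:07.236759090Z",
                "status": "True",
                "type": "Healthy",
            }
        ]
    },
}

print(health_condition(workload_entry))  # True
```

Returning None instead of False when the condition is missing keeps "no health information reported" separate from "the probe failed".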