Canary control plane upgrades

Overview

Upgrading between Istio minor or major releases (for example, from Istio 1.13.x to 1.15.x) is a high-risk operation. The official Istio distribution is designed so that the upgrade happens as a single, one-time step, which makes recovery difficult if unexpected errors occur.

To address this issue, Service Mesh Manager runs both versions of the Istio control plane on the upgraded cluster and lets you migrate your workloads to the new Istio version gradually. To list the Istio control planes running on the cluster, run:

kubectl get istiocontrolplanes -n istio-system

The output should be similar to:

NAME       MODE     NETWORK    STATUS      MESH EXPANSION   EXPANSION GW IPS                   ERROR   AGE
cp-v113x   ACTIVE   network1   Available   true             ["3.122.28.53","3.122.43.249"]             87m
cp-v115x   ACTIVE   network1   Available   true             ["3.122.31.252","18.195.79.209"]           66m

Here cp-v113x is running Istio 1.13.x, while cp-v115x is running Istio 1.15.x.
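If you script the migration, it can help to turn this listing into the values that the istio.io/rev namespace label expects (control plane name, a dot, then the control plane's namespace, as described in the next paragraph). The following is a minimal POSIX shell sketch, not part of Service Mesh Manager: the function name list_cp_revisions is hypothetical, and it assumes the column layout of the sample output above, with NAME in column 1 and STATUS in column 4.

```shell
# Hypothetical helper: print the istio.io/rev label value for every Istio
# control plane in the given namespace whose STATUS column is "Available".
# ASSUMPTION: NAME is column 1 and STATUS is column 4, as in the sample output.
list_cp_revisions() {
    _cp_ns=${1:-istio-system}
    # --no-headers drops the NAME/MODE/NETWORK/... header row.
    kubectl get istiocontrolplanes -n "$_cp_ns" --no-headers |
        awk -v ns="$_cp_ns" '$4 == "Available" { print $1 "." ns }'
}

# Usage (requires cluster access):
#   list_cp_revisions istio-system
```

Filtering on Available keeps control planes that are still being reconciled out of the list, so they are not offered as migration targets.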

A special label on each namespace, istio.io/rev, specifies which Istio control plane the proxies in that namespace should use. In the following example, the smm-demo namespace is attached to the cp-v113x.istio-system control plane (where .istio-system is the namespace of the Istio control plane).

kubectl get ns smm-demo -o yaml

The output should be similar to:

apiVersion: v1
kind: Namespace
metadata:
  ...
  labels:
    istio.io/rev: cp-v113x.istio-system
  name: smm-demo
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
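To check a single namespace without reading the whole YAML, kubectl's JSONPath output can extract just the label (the dot in istio.io needs a backslash escape). The sketch below uses the hypothetical function name ns_control_plane and assumes the <control-plane>.<namespace> label format shown above; it strips the namespace suffix to recover the control plane name. To see the label for every namespace at once, kubectl get ns -L istio.io/rev adds it as a column.

```shell
# Hypothetical helper: print the control plane a namespace is attached to,
# based on its istio.io/rev label (format: <control-plane>.<cp-namespace>).
ns_control_plane() {
    _rev=$(kubectl get ns "$1" -o jsonpath='{.metadata.labels.istio\.io/rev}') || return 1
    if [ -z "$_rev" ]; then
        echo "namespace $1 has no istio.io/rev label" >&2
        return 1
    fi
    # Drop everything from the first dot: cp-v113x.istio-system -> cp-v113x
    echo "${_rev%%.*}"
}

# Usage (requires cluster access):
#   ns_control_plane smm-demo
```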

Both cp-v113x and cp-v115x can discover services in all namespaces. This means that:

  • Workloads can communicate with each other regardless of which Istio control plane they are attached to.
  • In case of an error, any namespace can be rolled back to the previous version of the Istio control plane by changing the istio.io/rev label and restarting the workloads in that namespace.

Upgrading between major/minor Istio versions

  1. To upgrade Istio, first upgrade Service Mesh Manager. The upgrade also updates the validation rules, which detect possible issues with your existing Istio custom resources.

  2. Before starting the migration of the workloads to the new Istio control plane, check the Validation UI and fix any errors with your configuration.

  3. After the upgrade has been completed, find the name of the new Istio control plane by running the following command:

    kubectl get istiocontrolplanes -n istio-system
    

    The output should be similar to:

    NAME       MODE     NETWORK    STATUS      MESH EXPANSION   EXPANSION GW IPS                   ERROR   AGE
    cp-v113x   ACTIVE   network1   Available   true             ["3.122.28.53","3.122.43.249"]             87m
    cp-v115x   ACTIVE   network1   Available   true             ["3.122.31.252","18.195.79.209"]           66m
    

    In this case, the new Istio control plane is called cp-v115x, and it runs Istio 1.15.x.

  4. Migrate a namespace to the new Istio control plane. Complete the following steps.

    1. Select a namespace, preferably one with the least impact on production traffic. Edit the istio.io/rev label on the namespace by running:

      kubectl label ns <your-namespace> istio.io/rev=cp-v115x.istio-system --overwrite
      

      Expected output:

      namespace/<your-namespace> labeled
      
    2. Restart all controllers (Deployments, StatefulSets, and so on) in the namespace. After the restart, the workloads in the namespace are attached to the new Istio control plane. For example, to restart all Deployments in the namespace, run:

      kubectl rollout restart deployment -n <your-namespace>
      
    3. Test your application to verify that it works with the new control plane as expected. In case of any issues, refer to the rollback section to roll back to the original Istio control plane.

  5. Migrate your other namespaces.

  6. After all of the applications have been migrated to the new control plane and you have verified that they work as expected, you can delete the old Istio control plane.
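The label and restart sub-steps of step 4 can be wrapped in a small shell function so that every namespace is migrated the same way. This is a minimal sketch, not a Service Mesh Manager tool: the function name migrate_namespace is hypothetical, and it assumes that restarting Deployments, StatefulSets, and DaemonSets covers the controllers in your namespaces (bare Pods, Jobs, and other controller types need separate handling). It also assumes StatefulSets use the RollingUpdate strategy, because kubectl rollout status does not track OnDelete StatefulSets.

```shell
# Hypothetical helper: attach a namespace to a control plane revision and
# restart its workloads so they pick up proxies from that control plane.
#   $1 = namespace, $2 = revision label value, e.g. cp-v115x.istio-system
migrate_namespace() {
    _ns=$1
    _rev=$2
    # --overwrite is required because the namespace already has an istio.io/rev label.
    kubectl label ns "$_ns" "istio.io/rev=$_rev" --overwrite || return 1
    for _kind in deployment statefulset daemonset; do
        # -o name prints e.g. deployment.apps/web; empty if none of this kind exist.
        _objs=$(kubectl get "$_kind" -n "$_ns" -o name) || return 1
        for _obj in $_objs; do
            kubectl rollout restart "$_obj" -n "$_ns" || return 1
            # Wait until the restarted pods are ready (or the timeout expires).
            kubectl rollout status "$_obj" -n "$_ns" --timeout=5m || return 1
        done
    done
}

# Usage (requires cluster access):
#   migrate_namespace <your-namespace> cp-v115x.istio-system
```

Restarting objects one at a time and waiting for each rollout is slower than restarting everything at once, but it keeps the blast radius small: if a rollout does not complete, the function stops and the remaining workloads keep running with their old proxies.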

Delete the old Istio control plane

After you have verified that your applications work as expected with the new Istio control plane, you can delete the old Istio control plane by completing the following steps.

  1. Open the Service Mesh Manager dashboard and navigate to the MAIN MENU > MESH page.

  2. Verify that no Pods are attached to the old Istio control plane (the number of proxies for the old control plane should be 0).

  3. Delete the old Istio control plane:

    kubectl delete istiocontrolplanes -n istio-system cp-v113x
    

Note: Deleting the prometheus-smm-prometheus-x pod erases historical timeline data. To persist timeline data across Prometheus rollouts, see Set up Persistent Volumes for Prometheus.

Roll back the data plane to the old control plane in case of issues

CAUTION:

Perform this step only if you have issues with your data plane pods, which were working with the old Istio control plane, and you deliberately want to move your workloads back to that control plane!

  1. If there is a problem and you want to roll the namespace back to the old control plane, set the istio.io/rev label on the namespace to point to the old Istio control plane, and restart its pods, for example with the kubectl rollout restart deployment command. The --overwrite flag is required because the label already has a value:

    kubectl label ns <name-of-your-namespace-with-issues> istio.io/rev=cp-v113x.istio-system --overwrite
    kubectl rollout restart deployment -n <name-of-your-namespace-with-issues>
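The rollback can also be scripted with a readback check, so that nothing is restarted unless the label really points at the old control plane. In this minimal sketch, the function name rollback_namespace is hypothetical; like the restart command above, it restarts only Deployments, so extend it if the namespace also runs StatefulSets or other controllers.

```shell
# Hypothetical helper: move a namespace back to the old control plane.
#   $1 = namespace, $2 = old revision label value, e.g. cp-v113x.istio-system
rollback_namespace() {
    _ns=$1
    _old_rev=$2
    _jp='{.metadata.labels.istio\.io/rev}'
    echo "current revision: $(kubectl get ns "$_ns" -o jsonpath="$_jp")"
    # --overwrite is needed because istio.io/rev already holds the new revision.
    kubectl label ns "$_ns" "istio.io/rev=$_old_rev" --overwrite || return 1
    # Read the label back before restarting anything.
    _now=$(kubectl get ns "$_ns" -o jsonpath="$_jp") || return 1
    if [ "$_now" != "$_old_rev" ]; then
        echo "label is '$_now', expected '$_old_rev'; not restarting" >&2
        return 1
    fi
    # Same restart as the command above (Deployments only).
    kubectl rollout restart deployment -n "$_ns"
}

# Usage (requires cluster access):
#   rollback_namespace <name-of-your-namespace-with-issues> cp-v113x.istio-system
```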