Canary control plane upgrades
Overview
Upgrading between Istio minor or major releases (for example, from Istio 1.13.x to 1.15.x) is a high-risk operation. The official Istio distribution performs the upgrade as a single, all-at-once step, which makes recovery difficult if unexpected errors occur.
To address this issue, Service Mesh Manager runs both versions of the Istio control plane on the upgraded cluster, and allows you to migrate your workloads gradually to the new Istio version.
To list the Istio control planes running on the cluster, run the following command:
kubectl get istiocontrolplanes -n istio-system
The output should be similar to:
NAME       MODE     NETWORK    STATUS      MESH EXPANSION   EXPANSION GW IPS                   ERROR   AGE
cp-v113x   ACTIVE   network1   Available   true             ["3.122.28.53","3.122.43.249"]             87m
cp-v115x   ACTIVE   network1   Available   true             ["3.122.31.252","18.195.79.209"]           66m
Here cp-v113x is running Istio 1.13.x, while cp-v115x is running Istio 1.15.x.
A special label on each namespace specifies which Istio control plane the proxies in that namespace use. In the following example, the smm-demo namespace is attached to the cp-v113x.istio-system control plane (where istio-system is the name of the namespace of the Istio control plane).
kubectl get ns smm-demo -o yaml
The output should be similar to:
apiVersion: v1
kind: Namespace
metadata:
  ...
  labels:
    istio.io/rev: cp-v113x.istio-system
  name: smm-demo
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
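If you only need the revision label rather than the whole namespace manifest, a JSONPath query can extract it. The following is a minimal sketch: `ns_revision` is a hypothetical helper name (not part of Service Mesh Manager), and it assumes that your current kubectl context points at the cluster.

```shell
# Print the control plane revision that a namespace is attached to.
# ns_revision is a hypothetical helper name used only for illustration.
ns_revision() {
  # The dots inside the label key must be escaped in a JSONPath expression.
  kubectl get ns "$1" -o jsonpath='{.metadata.labels.istio\.io/rev}'
}

# Usage: ns_revision smm-demo
```

Recording this value before a migration tells you which control plane to point the namespace back to if you need to roll back.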
Both cp-v113x and cp-v115x can discover services in all namespaces. This means that:
- Workloads can communicate with each other regardless of which Istio control plane they are attached to.
- In case of an error, any namespace can be rolled back to the previous version of the Istio control plane by changing the istio.io/rev label on the namespace.
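Because the attachment is expressed purely through the namespace label, you can get an overview of how far a migration has progressed by counting namespaces per revision. The sketch below assumes a POSIX shell with grep, sort, uniq, and awk available; `revision_summary` is a hypothetical helper name.

```shell
# Count how many namespaces are attached to each control plane revision.
# revision_summary is a hypothetical helper name used only for illustration.
revision_summary() {
  # Print one istio.io/rev value per namespace (empty line if the label is unset),
  # drop the empty lines, then count the occurrences of each revision.
  kubectl get ns -o jsonpath='{range .items[*]}{.metadata.labels.istio\.io/rev}{"\n"}{end}' \
    | grep -v '^$' \
    | sort \
    | uniq -c \
    | awk '{print $2 ": " $1}'
}

# Usage: revision_summary
```

Each output line has the form `<revision>: <number of namespaces>`. Namespaces without the label are not counted.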
Upgrading between major/minor Istio versions
-
To upgrade Istio, first upgrade Service Mesh Manager. The upgrade will also update the validation rules, detecting any possible issues with the existing Istio Custom Resources.
-
Before starting the migration of the workloads to the new Istio control plane, check the Validation UI and fix any errors with your configuration.
-
After the upgrade has been completed, find the name of the new Istio control plane by running the following command:
kubectl get istiocontrolplanes -n istio-system
The output should be similar to:
NAME       MODE     NETWORK    STATUS      MESH EXPANSION   EXPANSION GW IPS                   ERROR   AGE
cp-v113x   ACTIVE   network1   Available   true             ["3.122.28.53","3.122.43.249"]             87m
cp-v115x   ACTIVE   network1   Available   true             ["3.122.31.252","18.195.79.209"]           66m
In this case, the new Istio control plane is called cp-v115x, which is running Istio 1.15.x.
-
Migrate a namespace to the new Istio control plane. Complete the following steps.
-
Select a namespace, preferably one with the least impact on production traffic. Edit the istio.io/rev label on the namespace by running:
kubectl label ns <your-namespace> istio.io/rev=cp-v115x.istio-system --overwrite
Expected output:
namespace/<your-namespace> labeled
-
Restart all controllers (Deployments, StatefulSets, and so on) in the namespace. After the restart, the workloads in the namespace are attached to the new Istio control plane. For example, to restart the deployments in a namespace, you can run:
kubectl rollout restart deployment -n <name-of-your-namespace>
-
Test your application to verify that it works with the new control plane as expected. In case of any issues, refer to the rollback section to roll back to the original Istio control plane.
-
Migrate your other namespaces.
-
After all of the applications have been migrated to the new control plane and you have verified that they work as expected, you can delete the old Istio control plane.
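The per-namespace migration steps above (relabel, then restart the controllers) can be sketched as a small shell function. This is an illustration under stated assumptions, not an official Service Mesh Manager tool: `migrate_namespace` is a hypothetical helper name, the 5-minute timeout is an arbitrary choice, and the function only handles Deployments, StatefulSets, and DaemonSets. Controllers that use an OnDelete update strategy are not covered, because `kubectl rollout status` does not support them.

```shell
# Attach a namespace to a control plane revision and restart its workloads.
# migrate_namespace is a hypothetical helper name used only for illustration.
migrate_namespace() {
  ns="$1"   # namespace to migrate
  rev="$2"  # target revision, for example cp-v115x.istio-system

  # Point the namespace at the target control plane.
  kubectl label ns "$ns" "istio.io/rev=$rev" --overwrite || return 1

  # Restart the controllers one at a time and wait for each rollout,
  # so that a failing workload stops the migration early.
  for kind in deployment statefulset daemonset; do
    for obj in $(kubectl get "$kind" -n "$ns" -o name); do
      kubectl rollout restart "$obj" -n "$ns" || return 1
      kubectl rollout status "$obj" -n "$ns" --timeout=5m || return 1
    done
  done
}

# Usage: migrate_namespace <your-namespace> cp-v115x.istio-system
```

Restarting and waiting sequentially is slower than restarting everything at once, but it keeps the blast radius small, which matches the gradual approach of a canary upgrade. You still need to test the application yourself after the function returns.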
Delete the old Istio Control Plane
After you have verified that your applications work as expected with the new Istio control plane, you can delete the old Istio control plane by completing the following steps.
-
Open the Service Mesh Manager dashboard and navigate to the MAIN MENU > MESH page.
-
Verify that no Pods are attached to the old Istio control plane (the number of proxies for the old control plane should be 0).
-
Delete the old Istio control plane:
kubectl delete istiocontrolplanes -n istio-system cp-v113x
Note: Deleting the prometheus-smm-prometheus-x pod erases historic timeline data. To persist timeline data for Prometheus rollout, see Set up Persistent Volumes for Prometheus.
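As an additional command-line safeguard, you can refuse to delete a control plane while any namespace is still labeled for it. The sketch below is an assumption-laden illustration: `delete_if_unused` is a hypothetical helper name, and it checks only namespace labels. It cannot detect pods that were never restarted after relabeling, so the proxy count on the dashboard remains the check to rely on.

```shell
# Delete a control plane only if no namespace is still labeled for it.
# delete_if_unused is a hypothetical helper name used only for illustration.
delete_if_unused() {
  cp="$1"                      # control plane name, for example cp-v113x
  cp_ns="${2:-istio-system}"   # namespace of the control plane

  # List the namespaces whose istio.io/rev label still points at this control plane.
  attached="$(kubectl get ns -l "istio.io/rev=$cp.$cp_ns" -o name)"
  if [ -n "$attached" ]; then
    echo "Namespaces still attached to $cp.$cp_ns:" >&2
    echo "$attached" >&2
    return 1
  fi

  kubectl delete istiocontrolplanes -n "$cp_ns" "$cp"
}

# Usage: delete_if_unused cp-v113x
```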
Roll back the data plane to the old control plane in case of issues
CAUTION:
Perform this step only if you have issues with your data plane pods that were working with the old Istio control plane, and you deliberately want to move your workloads back to that control plane!
-
If there is a problem and you want to roll the namespace back to the old control plane, set the istio.io/rev label on the namespace to point to the old Istio control plane, then restart the workloads using the kubectl rollout restart deployment command:
kubectl label ns <name-of-your-namespace-with-issues> istio.io/rev=cp-v113x.istio-system --overwrite
kubectl rollout restart deployment -n <name-of-your-namespace-with-issues>