Upgrade SMM - GitOps - multi-cluster
This document describes how to upgrade SMM and a business application.
CAUTION:
Do not push secrets directly into the Git repository, especially if it is a public repository. Argo CD provides solutions for keeping secrets safe.

Prerequisites
To complete this procedure, you need:
- A free registration for the Service Mesh Manager download page.
- A Kubernetes cluster running Argo CD (called `management-cluster` in the examples).
- Two Kubernetes clusters running the previous version of Service Mesh Manager (called `workload-cluster-1` and `workload-cluster-2` in the examples). It is assumed that Service Mesh Manager has been installed on these clusters as described in the Service Mesh Manager 1.10.0 documentation, and that the clusters meet the resource requirements of Service Mesh Manager version 1.11.0.
CAUTION:
Supported providers and Kubernetes versions
The cluster must run a Kubernetes version that Service Mesh Manager supports: Kubernetes 1.21, 1.22, 1.23, 1.24.
Service Mesh Manager is tested and known to work on the following Kubernetes providers:
- Amazon Elastic Kubernetes Service (Amazon EKS)
- Google Kubernetes Engine (GKE)
- Azure Kubernetes Service (AKS)
- On-premises installation of stock Kubernetes with load balancer support (and optionally PVCs for persistence)
Resource requirements
Make sure that your Kubernetes cluster has sufficient resources. The default installation (with Service Mesh Manager and demo application) requires the following amount of resources on the cluster:
| | Only Service Mesh Manager | Service Mesh Manager and Streaming Data Manager |
|---|---|---|
| CPU | 12 vCPU in total; 4 vCPU available for allocation per worker node (if you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS) | 24 vCPU in total; 4 vCPU available for allocation per worker node (if you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS) |
| Memory | 16 GB in total; 2 GB available for allocation per worker node | 36 GB in total; 2 GB available for allocation per worker node |
| Storage | 12 GB of ephemeral storage on the Kubernetes worker nodes (for traces and metrics) | 12 GB of ephemeral storage on the Kubernetes worker nodes (for traces and metrics) |
These minimum requirements need to be available for allocation within your cluster, in addition to the requirements of any other loads running in your cluster (for example, DaemonSets and Kubernetes node-agents). If Kubernetes cannot allocate sufficient resources to Service Mesh Manager, some pods will remain in Pending state, and Service Mesh Manager will not function properly.
Enabling additional features, such as High Availability, increases these resource requirements.

The default installation, when enough headroom is available in the cluster, should be able to support at least 150 running Pods and the same number of Services. For setting up Service Mesh Manager for bigger workloads, see scaling Service Mesh Manager.
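Before starting the upgrade, you can quickly confirm that the worker nodes have enough allocatable capacity. The following query is a minimal sketch of such a check (the column names are only display labels, not part of any API):

```bash
# List allocatable CPU and memory per node. Verify that the totals meet the
# requirements above and that each worker node has enough headroom for allocation.
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
```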
This document describes how to upgrade Service Mesh Manager from version 1.10.0 to version 1.11.0.
Set up the environment
- Set the KUBECONFIG location and context name for the `management-cluster` cluster.

```bash
MANAGEMENT_CLUSTER_KUBECONFIG=management_cluster_kubeconfig.yaml
MANAGEMENT_CLUSTER_CONTEXT=management-cluster

kubectl config --kubeconfig "${MANAGEMENT_CLUSTER_KUBECONFIG}" get-contexts "${MANAGEMENT_CLUSTER_CONTEXT}"
```

Expected output:

```
CURRENT   NAME                 CLUSTER              AUTHINFO   NAMESPACE
*         management-cluster   management-cluster
```
- Set the KUBECONFIG location and context name for the `workload-cluster-1` cluster.

```bash
WORKLOAD_CLUSTER_1_KUBECONFIG=workload_cluster_1_kubeconfig.yaml
WORKLOAD_CLUSTER_1_CONTEXT=workload-cluster-1

kubectl config --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" get-contexts "${WORKLOAD_CLUSTER_1_CONTEXT}"
```

Expected output:

```
CURRENT   NAME                 CLUSTER              AUTHINFO   NAMESPACE
*         workload-cluster-1   workload-cluster-1
```

Repeat this step for any additional workload clusters you want to use.
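For example, for `workload-cluster-2` (the kubeconfig file name below is only an assumption, adjust it to your environment):

```bash
WORKLOAD_CLUSTER_2_KUBECONFIG=workload_cluster_2_kubeconfig.yaml
WORKLOAD_CLUSTER_2_CONTEXT=workload-cluster-2

kubectl config --kubeconfig "${WORKLOAD_CLUSTER_2_KUBECONFIG}" get-contexts "${WORKLOAD_CLUSTER_2_CONTEXT}"
```

These variables are used by the `workload-cluster-2` commands later in this procedure.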
- Make sure the `management-cluster` Kubernetes context is the current context.

```bash
kubectl config --kubeconfig "${MANAGEMENT_CLUSTER_KUBECONFIG}" use-context "${MANAGEMENT_CLUSTER_CONTEXT}"
```

Expected output:

```
Switched to context "management-cluster".
```
Upgrade Service Mesh Manager
The high-level steps of the upgrade process are:

- Install the new IstioControlPlane (`istio-cp-v115x`) on the workload clusters.
- Upgrade the `smm-operator` and the `smm-controlplane`. The `smm-controlplane` will use the new `istio-cp-v115x` IstioControlPlane, but the business applications (for example, `demo-app`) will still use the old `istio-cp-v113x` control plane.
- Upgrade the business applications (`demo-app`) to use the new control plane.
- Remove the old version (1.10.0) of the `smm-operator` Helm chart.

```bash
rm -rf charts/smm-operator
```
- Pull the new version (1.11.0) of the `smm-operator` Helm chart and extract it into the `charts` folder.

```bash
helm pull oci://registry.eticloud.io/smm-charts/smm-operator --destination ./charts/ --untar --version 1.11.0
```
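To confirm that the expected chart version was extracted, you can inspect the chart metadata (an optional sanity check, not a required step):

```bash
# Print the Chart.yaml of the freshly pulled chart; the version field should be 1.11.0.
helm show chart ./charts/smm-operator
```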
- Create the `istio-cp-v115x.yaml` file, and its overlays for the workload clusters.

```bash
cat > manifests/smm-controlplane/base/istio-cp-v115x.yaml << EOF
apiVersion: servicemesh.cisco.com/v1alpha1
kind: IstioControlPlane
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "5"
  name: cp-v115x
  namespace: istio-system
spec:
  containerImageConfiguration:
    imagePullPolicy: Always
    imagePullSecrets:
    - name: smm-pull-secret
  distribution: cisco
  istiod:
    deployment:
      env:
      - name: ISTIO_MULTIROOT_MESH
        value: "true"
      image: registry.eticloud.io/smm/istio-pilot:v1.15.3-bzc.0
  k8sResourceOverlays:
  - groupVersionKind:
      group: apps
      kind: Deployment
      version: v1
    objectKey:
      name: istiod-cp-v115x
      namespace: istio-system
    patches:
    - path: /spec/template/spec/containers/0/args/-
      type: replace
      value: --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_AES_256_GCM_SHA384,TLS_AES_128_GCM_SHA256
  meshConfig:
    defaultConfig:
      envoyAccessLogService:
        address: smm-als.smm-system.svc.cluster.local:50600
        tcpKeepalive:
          interval: 10s
          probes: 3
          time: 10s
        tlsSettings:
          mode: ISTIO_MUTUAL
      holdApplicationUntilProxyStarts: true
      proxyMetadata:
        ISTIO_META_ALS_ENABLED: "true"
        PROXY_CONFIG_XDS_AGENT: "true"
      tracing:
        tlsSettings:
          mode: ISTIO_MUTUAL
        zipkin:
          address: smm-zipkin.smm-system.svc.cluster.local:59411
    enableEnvoyAccessLogService: true
    enableTracing: true
  meshExpansion:
    enabled: true
    gateway:
      deployment:
        podMetadata:
          labels:
            app: istio-meshexpansion-gateway
            istio: meshexpansiongateway
      service:
        ports:
        - name: tcp-smm-als-tls
          port: 50600
          protocol: TCP
          targetPort: 50600
        - name: tcp-smm-zipkin-tls
          port: 59411
          protocol: TCP
          targetPort: 59411
  meshID: mesh1
  mode: ACTIVE
  networkName: network1
  proxy:
    image: registry.eticloud.io/smm/istio-proxyv2:v1.15.3-bzc.0
  proxyInit:
    cni:
      daemonset:
        image: registry.eticloud.io/smm/istio-install-cni:v1.15.3-bzc.0
    image: registry.eticloud.io/smm/istio-proxyv2:v1.15.3-bzc.0
  sidecarInjector:
    deployment:
      image: registry.eticloud.io/smm/istio-sidecar-injector:v1.15.3-bzc.0
  version: 1.15.3
EOF
```
For `workload-cluster-1`:

```bash
cat > manifests/smm-controlplane/overlays/workload-cluster-1/istio-cp-v115x.yaml <<EOF
apiVersion: servicemesh.cisco.com/v1alpha1
kind: IstioControlPlane
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "5"
  name: cp-v115x
  namespace: istio-system
spec:
  meshID: mesh1
  mode: ACTIVE
  networkName: network1
EOF
```
For `workload-cluster-2`:

```bash
cat > manifests/smm-controlplane/overlays/workload-cluster-2/istio-cp-v115x.yaml <<EOF
apiVersion: servicemesh.cisco.com/v1alpha1
kind: IstioControlPlane
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "5"
  name: cp-v115x
  namespace: istio-system
spec:
  meshID: mesh1
  mode: PASSIVE
  networkName: workload-cluster-2
EOF
```
- Update the `control-plane.yaml` files to use the `istio-cp-v115x` IstioControlPlane.

```bash
cat > manifests/smm-controlplane/base/control-plane.yaml << EOF
apiVersion: smm.cisco.com/v1alpha1
kind: ControlPlane
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "10"
  name: smm
spec:
  certManager:
    namespace: cert-manager
  clusterName: CLUSTER-NAME
  clusterRegistry:
    enabled: true
    namespace: cluster-registry
  log: {}
  meshManager:
    enabled: true
    istio:
      enabled: true
      istioCRRef:
        name: cp-v115x
        namespace: istio-system
      operators:
        namespace: smm-system
    namespace: smm-system
  nodeExporter:
    enabled: true
    namespace: smm-system
    psp:
      enabled: false
    rbac:
      enabled: true
  oneEye: {}
  registryAccess:
    enabled: true
    imagePullSecretsController: {}
    namespace: smm-registry-access
    pullSecrets:
    - name: smm-registry.eticloud.io-pull-secret
      namespace: smm-registry-access
    repositoryOverride:
      host: registry.eticloud.io
      prefix: smm
  role: active
  smm:
    als:
      enabled: true
      log: {}
    application:
      enabled: true
      log: {}
    auth:
      mode: impersonation
    certManager:
      enabled: true
    enabled: true
    federationGateway:
      enabled: true
      name: smm
      service:
        enabled: true
        name: smm-federation-gateway
        port: 80
    federationGatewayOperator:
      enabled: true
    impersonation:
      enabled: true
    istio:
      revision: cp-v115x.istio-system
    leo:
      enabled: true
      log: {}
    log: {}
    namespace: smm-system
    prometheus:
      enabled: true
      replicas: 1
    prometheusOperator: {}
    releaseName: smm
    role: active
    sdm:
      enabled: false
    sre:
      enabled: true
    useIstioResources: true
EOF
```
For `workload-cluster-1`:

```bash
cat > manifests/smm-controlplane/overlays/workload-cluster-1/control-plane.yaml << EOF
apiVersion: smm.cisco.com/v1alpha1
kind: ControlPlane
metadata:
  name: smm
spec:
  clusterName: workload-cluster-1
  certManager:
    enabled: true
  smm:
    exposeDashboard:
      meshGateway:
        enabled: true
    auth:
      forceUnsecureCookies: true
      mode: anonymous
EOF
```
For `workload-cluster-2`:

```bash
cat > manifests/smm-controlplane/overlays/workload-cluster-2/control-plane.yaml << EOF
apiVersion: smm.cisco.com/v1alpha1
kind: ControlPlane
metadata:
  name: smm
spec:
  clusterName: workload-cluster-2
  role: passive
  smm:
    als:
      enabled: true
      log: {}
    application:
      enabled: false
      log: {}
    auth:
      mode: impersonation
    certManager:
      enabled: false
    enabled: true
    federationGateway:
      enabled: false
      name: smm
      service:
        enabled: true
        name: smm-federation-gateway
        port: 80
    federationGatewayOperator:
      enabled: true
    grafana:
      enabled: false
    impersonation:
      enabled: true
    istio:
      revision: cp-v115x.istio-system
    kubestatemetrics:
      enabled: true
    leo:
      enabled: false
      log: {}
    log: {}
    namespace: smm-system
    prometheus:
      enabled: true
      replicas: 1
      retentionTime: 8h
    prometheusOperator: {}
    releaseName: smm
    role: passive
    sdm:
      enabled: false
    sre:
      enabled: false
    tracing:
      enabled: true
    useIstioResources: false
    web:
      enabled: false
EOF
```
- Create the kustomization files.

```bash
cat > manifests/smm-controlplane/base/kustomization.yaml << EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
  name: cluster-secrets
resources:
- cert-manager-namespace.yaml
- istio-system-namespace.yaml
- istio-cp-v113x.yaml
- istio-cp-v115x.yaml
- control-plane.yaml
EOF
```

```bash
cat > manifests/smm-controlplane/overlays/workload-cluster-1/kustomization.yaml << EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
patches:
- istio-cp-v113x.yaml
- istio-cp-v115x.yaml
- control-plane.yaml
EOF
```

```bash
cat > manifests/smm-controlplane/overlays/workload-cluster-2/kustomization.yaml << EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
patches:
- istio-cp-v113x.yaml
- istio-cp-v115x.yaml
- control-plane.yaml
EOF
```
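Before committing, you can render the overlays locally to verify that the new IstioControlPlane and the updated ControlPlane resources are produced as expected (this uses the kustomize support built into kubectl):

```bash
# Render the workload-cluster-1 overlay and inspect the generated resources.
kubectl kustomize manifests/smm-controlplane/overlays/workload-cluster-1 | less
```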
- If you are upgrading Service Mesh Manager 1.10.0 to 1.11.0, complete this step; otherwise, skip to the next step.

Apply the following patch to your Service Mesh Manager 1.10.0 clusters to set the `ttlSecondsAfterFinished` field of the `cert-manager-startupapicheck` job, so that the job is cleaned up 100 seconds after it completes. If you skip this step, you might see a `cert-manager-startupapicheck` related error during the upgrade.

For `workload-cluster-1`:

```bash
kubectl patch jobs.batch -n cert-manager cert-manager-startupapicheck -p '{"spec":{"ttlSecondsAfterFinished":100}}' --type=merge --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
```

For `workload-cluster-2`:

```bash
kubectl patch jobs.batch -n cert-manager cert-manager-startupapicheck -p '{"spec":{"ttlSecondsAfterFinished":100}}' --type=merge --kubeconfig "${WORKLOAD_CLUSTER_2_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_2_CONTEXT}"
```
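If you want to confirm that the patch was applied, you can read back the field (an optional check; repeat it for the other workload cluster as needed). Note that once the TTL elapses the job is removed, so a NotFound error afterwards also means the cleanup worked:

```bash
# Should print 100 while the job still exists; a NotFound error later means
# the job has already been cleaned up.
kubectl get jobs.batch -n cert-manager cert-manager-startupapicheck -o jsonpath='{.spec.ttlSecondsAfterFinished}' --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
```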
- Commit and push the changes to the Git repository.

```bash
git add .
git commit -m "upgrade SMM to 1.11.0"
git push
```
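Argo CD applies the new manifests on its next refresh of the affected Applications. To watch the sync status from the management cluster (this assumes Argo CD runs in the `argocd` namespace, as in a default installation):

```bash
# List the Argo CD Applications and their sync and health status.
kubectl get applications.argoproj.io -n argocd \
  --kubeconfig "${MANAGEMENT_CLUSTER_KUBECONFIG}" --context "${MANAGEMENT_CLUSTER_CONTEXT}"
```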
- Wait a few minutes, then check the new IstioControlPlane.

```bash
kubectl -n istio-system get istiocontrolplanes.servicemesh.cisco.com --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
```

Expected output:

```
NAME       MODE     NETWORK    STATUS      MESH EXPANSION   EXPANSION GW IPS                      ERROR   AGE
cp-v113x   ACTIVE   network1   Available   true             ["52.208.63.154","54.155.81.181"]             61m
cp-v115x   ACTIVE   network1   Available   true             ["52.211.44.215","63.32.253.55"]              11m
```

For `workload-cluster-2`:

```bash
kubectl -n istio-system get istiocontrolplanes.servicemesh.cisco.com --kubeconfig "${WORKLOAD_CLUSTER_2_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_2_CONTEXT}"
```
- Open the Service Mesh Manager dashboard.

On the MENU > OVERVIEW page everything should be fine, except for some validation issues. The validation issues show that the business application (`demo-app`) is behind the `smm-control-plane`, and that the business application should be updated to use the latest IstioControlPlane.

You can see the two IstioControlPlanes on the MENU > MESH page. The `smm-control-plane` is using the `cp-v115x.istio-system` IstioControlPlane, and the `demo-app` is using the `cp-v113x.istio-system` IstioControlPlane.
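One way to open the dashboard is through the `smm` CLI, assuming it is installed on your machine and your kubeconfig points to the cluster you want to inspect:

```bash
# Open the Service Mesh Manager dashboard of workload-cluster-1 in your browser.
KUBECONFIG="${WORKLOAD_CLUSTER_1_KUBECONFIG}" smm dashboard
```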
Upgrade Demo application
The `demo-app` application is just for demonstration purposes, but it represents your business applications. To upgrade a business application, set the `istio.io/rev` label of the business application's namespace to the target IstioControlPlane. Complete the following steps to update the `demo-app` to use the `cp-v115x.istio-system` IstioControlPlane by setting the `istio.io/rev` label accordingly.
- Update the label on your application namespace (for the demo application, that's the `demo-app-namespace.yaml` file).

```bash
cat > manifests/demo-app/base/demo-app-namespace.yaml << EOF
apiVersion: v1
kind: Namespace
metadata:
  labels:
    app.kubernetes.io/instance: smm-demo
    app.kubernetes.io/name: smm-demo
    app.kubernetes.io/part-of: smm-demo
    app.kubernetes.io/version: 0.1.4
    istio.io/rev: cp-v115x.istio-system
  name: smm-demo
EOF
```
- Update the `demo-app.yaml` files.

For `workload-cluster-1`:

```bash
cat > manifests/demo-app/overlays/workload-cluster-1/demo-app.yaml << EOF
apiVersion: smm.cisco.com/v1alpha1
kind: DemoApplication
metadata:
  name: smm-demo
  namespace: smm-demo
spec:
  autoscaling:
    enabled: true
  controlPlaneRef:
    name: smm
  deployIstioResources: true
  deploySLOResources: true
  enabled: true
  enabledComponents:
  - frontpage
  - catalog
  - bookings
  - postgresql
  istio:
    revision: cp-v115x.istio-system
  load:
    enabled: true
    maxRPS: 30
    minRPS: 10
    swingPeriod: 1380000000000
  replicas: 1
  resources:
    limits:
      cpu: "2"
      memory: 192Mi
    requests:
      cpu: 40m
      memory: 64Mi
EOF
```

For `workload-cluster-2`:

```bash
cat > manifests/demo-app/overlays/workload-cluster-2/demo-app.yaml << EOF
apiVersion: smm.cisco.com/v1alpha1
kind: DemoApplication
metadata:
  name: smm-demo
  namespace: smm-demo
spec:
  autoscaling:
    enabled: true
  controlPlaneRef:
    name: smm
  deployIstioResources: false
  deploySLOResources: false
  enabled: true
  enabledComponents:
  - movies
  - payments
  - notifications
  - analytics
  - database
  - mysql
  istio:
    revision: cp-v115x.istio-system
  replicas: 1
  resources:
    limits:
      cpu: "2"
      memory: 192Mi
    requests:
      cpu: 40m
      memory: 64Mi
EOF
```
- Commit and push the changes to the Git repository.

```bash
git add .
git commit -m "upgrade demo-app to istio-cp-v115x"
git push
```
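Changing the `istio.io/rev` label of a namespace only affects pods created after the change. If your business application's pods are not redeployed automatically, restart them so that their sidecars are re-injected by the new control plane. A minimal sketch, assuming the application runs as Deployments in the `smm-demo` namespace:

```bash
# Restart all Deployments in the namespace so their pods are re-injected
# with sidecars from the cp-v115x control plane.
kubectl rollout restart deployment -n smm-demo \
  --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
```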
- Wait a few minutes, then check the IstioControlPlane of the `demo-app`.

```bash
kubectl get ns smm-demo -o=jsonpath='{.metadata.labels.istio\.io/rev}' --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
```

Expected output: the `demo-app` is using the new `istio-cp-v115x` IstioControlPlane.

```
cp-v115x.istio-system
```

For `workload-cluster-2`:

```bash
kubectl get ns smm-demo -o=jsonpath='{.metadata.labels.istio\.io/rev}' --kubeconfig "${WORKLOAD_CLUSTER_2_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_2_CONTEXT}"
```

Expected output: the `demo-app` is using the new `istio-cp-v115x` IstioControlPlane.

```
cp-v115x.istio-system
```
- Check the dashboard.
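To double-check from the command line that the workload pods were re-injected by the new control plane, you can also list the revision label that the sidecar injector adds to each pod (a sketch; adjust the namespace for your own applications):

```bash
# Show the istio.io/rev label of the demo-app pods; after the upgrade every pod
# should report cp-v115x.istio-system.
kubectl get pods -n smm-demo -L istio.io/rev \
  --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
```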