Upgrade SMM - GitOps - multi-cluster

This document describes how to upgrade Service Mesh Manager (SMM) and a business application in a GitOps-managed multi-cluster setup.

CAUTION:

Do not push secrets directly into the Git repository, especially if it is a public repository. Argo CD provides solutions for keeping secrets safe.
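
For example, one common pattern (shown here only as an illustration; not the only option Argo CD supports) is to commit an encrypted SealedSecret instead of the Secret itself. This sketch assumes the Bitnami Sealed Secrets controller is running in the cluster and the kubeseal CLI is installed; all names and values are illustrative:

    # Generate the Secret manifest locally without applying it.
    kubectl create secret generic example-pull-secret \
      --namespace istio-system \
      --from-literal=password=REDACTED \
      --dry-run=client -o yaml > example-secret.yaml

    # Encrypt it: only the in-cluster controller can decrypt the result,
    # so the sealed manifest is safe to commit to Git.
    kubeseal --format yaml < example-secret.yaml > example-sealedsecret.yaml
    rm example-secret.yaml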

Prerequisites

To complete this procedure, you need:

  • A free registration for the Service Mesh Manager download page
  • A Kubernetes cluster running Argo CD (called management-cluster in the examples).
  • Two Kubernetes clusters running the previous version of Service Mesh Manager (called workload-cluster-1 and workload-cluster-2 in the examples). It is assumed that Service Mesh Manager has been installed on these clusters as described in the Service Mesh Manager 1.10.0 documentation, and that the clusters meet the resource requirements of Service Mesh Manager version 1.11.0.

CAUTION:

Supported providers and Kubernetes versions

The cluster must run a Kubernetes version that Service Mesh Manager supports: Kubernetes 1.21, 1.22, 1.23, 1.24.
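
To confirm that a cluster runs a supported version, you can check the Server Version reported by kubectl (shown here for workload-cluster-1, using the kubeconfig variables set in the environment setup below):

    kubectl version --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"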

Service Mesh Manager is tested and known to work on the following Kubernetes providers:

  • Amazon Elastic Kubernetes Service (Amazon EKS)
  • Google Kubernetes Engine (GKE)
  • Azure Kubernetes Service (AKS)
  • On-premises installation of stock Kubernetes with load balancer support (and optionally PVCs for persistence)

Resource requirements

Make sure that your Kubernetes cluster has sufficient resources. The default installation (Service Mesh Manager with the demo application) requires the following resources on the cluster:

|         | Only Service Mesh Manager | Service Mesh Manager and Streaming Data Manager |
|---------|---------------------------|--------------------------------------------------|
| CPU     | 12 vCPU in total; 4 vCPU available for allocation per worker node | 24 vCPU in total; 4 vCPU available for allocation per worker node |
| Memory  | 16 GB in total; 2 GB available for allocation per worker node | 36 GB in total; 2 GB available for allocation per worker node |
| Storage | 12 GB of ephemeral storage on the Kubernetes worker nodes (for Traces and Metrics) | 12 GB of ephemeral storage on the Kubernetes worker nodes (for Traces and Metrics) |

If you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS.

These minimum requirements need to be available for allocation within your cluster, in addition to the requirements of any other loads running in your cluster (for example, DaemonSets and Kubernetes node-agents). If Kubernetes cannot allocate sufficient resources to Service Mesh Manager, some pods will remain in Pending state, and Service Mesh Manager will not function properly.
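
If some pods are stuck, a quick way to spot resource starvation is to list the Pending pods and inspect why the scheduler could not place them (a diagnostic sketch; POD-NAME is a placeholder):

    # List pods the scheduler could not place, across all namespaces.
    kubectl get pods --all-namespaces --field-selector status.phase=Pending

    # The Events section usually explains why, for example "Insufficient cpu".
    kubectl describe pod -n smm-system POD-NAME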

Enabling additional features, such as High Availability, increases these requirements.

The default installation, when enough headroom is available in the cluster, should be able to support at least 150 running Pods and the same number of Services. To set up Service Mesh Manager for bigger workloads, see scaling Service Mesh Manager.

The following procedure upgrades Service Mesh Manager from version 1.10.0 to version 1.11.0.

Set up the environment

  1. Set the KUBECONFIG location and context name for the management-cluster cluster.

    MANAGEMENT_CLUSTER_KUBECONFIG=management_cluster_kubeconfig.yaml
    MANAGEMENT_CLUSTER_CONTEXT=management-cluster
    kubectl config --kubeconfig "${MANAGEMENT_CLUSTER_KUBECONFIG}" get-contexts "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO   NAMESPACE
    *         management-cluster   management-cluster
    
  2. Set the KUBECONFIG location and context name for the workload-cluster-1 cluster.

    WORKLOAD_CLUSTER_1_KUBECONFIG=workload_cluster_1_kubeconfig.yaml
    WORKLOAD_CLUSTER_1_CONTEXT=workload-cluster-1
    kubectl config --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" get-contexts "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO                                          NAMESPACE
    *         workload-cluster-1   workload-cluster-1
    

    Repeat this step for any additional workload clusters you want to use.

  3. Make sure the management-cluster Kubernetes context is the current context.

    kubectl config --kubeconfig "${MANAGEMENT_CLUSTER_KUBECONFIG}" use-context "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    Switched to context "management-cluster".
    

Upgrade Service Mesh Manager

The high-level steps of the upgrade process are:

  • Install the new IstioControlPlane (cp-v115x) on the workload clusters.
  • Upgrade the smm-operator and the smm-controlplane. The smm-controlplane will use the new cp-v115x IstioControlPlane, while the business applications (for example, demo-app) still use the old cp-v113x control plane.
  • Upgrade the business applications (demo-app) to use the new control plane.
  1. Remove the old version (1.10.0) of the smm-operator Helm chart.

    rm -rf charts/smm-operator
    
  2. Pull the new version (1.11.0) of the smm-operator Helm chart and extract it into the charts folder.

    helm pull oci://registry.eticloud.io/smm-charts/smm-operator --destination ./charts/ --untar --version 1.11.0
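
    Optionally, verify that the chart was pulled and extracted correctly (a quick sanity check, not a required step; the output should report version: 1.11.0):

    helm show chart ./charts/smm-operator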
    
  3. Create the istio-cp-v115x.yaml file and its overlays for the workload clusters.

    cat > manifests/smm-controlplane/base/istio-cp-v115x.yaml << EOF
    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioControlPlane
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "5"
      name: cp-v115x
      namespace: istio-system
    spec:
      containerImageConfiguration:
        imagePullPolicy: Always
        imagePullSecrets:
        - name: smm-pull-secret
      distribution: cisco
      istiod:
        deployment:
          env:
          - name: ISTIO_MULTIROOT_MESH
            value: "true"
          image: registry.eticloud.io/smm/istio-pilot:v1.15.3-bzc.0
      k8sResourceOverlays:
      - groupVersionKind:
          group: apps
          kind: Deployment
          version: v1
        objectKey:
          name: istiod-cp-v115x
          namespace: istio-system
        patches:
        - path: /spec/template/spec/containers/0/args/-
          type: replace
          value: --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_AES_256_GCM_SHA384,TLS_AES_128_GCM_SHA256
      meshConfig:
        defaultConfig:
          envoyAccessLogService:
            address: smm-als.smm-system.svc.cluster.local:50600
            tcpKeepalive:
              interval: 10s
              probes: 3
              time: 10s
            tlsSettings:
              mode: ISTIO_MUTUAL
          holdApplicationUntilProxyStarts: true
          proxyMetadata:
            ISTIO_META_ALS_ENABLED: "true"
            PROXY_CONFIG_XDS_AGENT: "true"
          tracing:
            tlsSettings:
              mode: ISTIO_MUTUAL
            zipkin:
              address: smm-zipkin.smm-system.svc.cluster.local:59411
        enableEnvoyAccessLogService: true
        enableTracing: true
      meshExpansion:
        enabled: true
        gateway:
          deployment:
            podMetadata:
              labels:
                app: istio-meshexpansion-gateway
                istio: meshexpansiongateway
          service:
            ports:
            - name: tcp-smm-als-tls
              port: 50600
              protocol: TCP
              targetPort: 50600
            - name: tcp-smm-zipkin-tls
              port: 59411
              protocol: TCP
              targetPort: 59411
      meshID: mesh1
      mode: ACTIVE
      networkName: network1
      proxy:
        image: registry.eticloud.io/smm/istio-proxyv2:v1.15.3-bzc.0
      proxyInit:
        cni:
          daemonset:
            image: registry.eticloud.io/smm/istio-install-cni:v1.15.3-bzc.0
        image: registry.eticloud.io/smm/istio-proxyv2:v1.15.3-bzc.0
      sidecarInjector:
        deployment:
          image: registry.eticloud.io/smm/istio-sidecar-injector:v1.15.3-bzc.0
      version: 1.15.3
    EOF
    

    For workload-cluster-1:

    cat > manifests/smm-controlplane/overlays/workload-cluster-1/istio-cp-v115x.yaml <<EOF
    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioControlPlane
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "5"
      name: cp-v115x
      namespace: istio-system
    spec:
      meshID: mesh1
      mode: ACTIVE
      networkName: network1
    EOF
    

    For workload-cluster-2:

    cat > manifests/smm-controlplane/overlays/workload-cluster-2/istio-cp-v115x.yaml <<EOF
    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioControlPlane
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "5"
      name: cp-v115x
      namespace: istio-system
    spec:
      meshID: mesh1
      mode: PASSIVE
      networkName: workload-cluster-2
    EOF
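
    Optionally, you can validate the new manifest with a server-side dry run against the IstioControlPlane CRD that the previous Service Mesh Manager version already installed (nothing is persisted by the dry run; this is only a sanity check):

    kubectl apply --dry-run=server -f manifests/smm-controlplane/base/istio-cp-v115x.yaml --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"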
    
  4. Update the control-plane.yaml files to use the cp-v115x IstioControlPlane.

    cat > manifests/smm-controlplane/base/control-plane.yaml << EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: ControlPlane
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "10"
      name: smm
    spec:
      certManager:
        namespace: cert-manager
      clusterName: CLUSTER-NAME
      clusterRegistry:
        enabled: true
        namespace: cluster-registry
      log: {}
      meshManager:
        enabled: true
        istio:
          enabled: true
          istioCRRef:
            name: cp-v115x
            namespace: istio-system
          operators:
            namespace: smm-system
        namespace: smm-system
      nodeExporter:
        enabled: true
        namespace: smm-system
        psp:
          enabled: false
        rbac:
          enabled: true
      oneEye: {}
      registryAccess:
        enabled: true
        imagePullSecretsController: {}
        namespace: smm-registry-access
        pullSecrets:
        - name: smm-registry.eticloud.io-pull-secret
          namespace: smm-registry-access
      repositoryOverride:
        host: registry.eticloud.io
        prefix: smm
      role: active
      smm:
        als:
          enabled: true
          log: {}
        application:
          enabled: true
          log: {}
        auth:
          mode: impersonation
        certManager:
          enabled: true
        enabled: true
        federationGateway:
          enabled: true
          name: smm
          service:
            enabled: true
            name: smm-federation-gateway
            port: 80
        federationGatewayOperator:
          enabled: true
        impersonation:
          enabled: true
        istio:
          revision: cp-v115x.istio-system
        leo:
          enabled: true
          log: {}
        log: {}
        namespace: smm-system
        prometheus:
          enabled: true
          replicas: 1
        prometheusOperator: {}
        releaseName: smm
        role: active
        sdm:
          enabled: false
        sre:
          enabled: true
        useIstioResources: true
    EOF
    

    For workload-cluster-1 (note that the overlay replaces the CLUSTER-NAME placeholder of the base manifest):

    cat > manifests/smm-controlplane/overlays/workload-cluster-1/control-plane.yaml << EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: ControlPlane
    metadata:
      name: smm
    spec:
      clusterName: workload-cluster-1
      certManager:
        enabled: true
      smm:
        exposeDashboard:
          meshGateway:
            enabled: true
        auth:
          forceUnsecureCookies: true
          mode: anonymous
    EOF
    

    For workload-cluster-2:

    cat > manifests/smm-controlplane/overlays/workload-cluster-2/control-plane.yaml << EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: ControlPlane
    metadata:
      name: smm
    spec:
      clusterName: workload-cluster-2
      role: passive
      smm:
        als:
          enabled: true
          log: {}
        application:
          enabled: false
          log: {}
        auth:
          mode: impersonation
        certManager:
          enabled: false
        enabled: true
        federationGateway:
          enabled: false
          name: smm
          service:
            enabled: true
            name: smm-federation-gateway
            port: 80
        federationGatewayOperator:
          enabled: true
        grafana:
          enabled: false
        impersonation:
          enabled: true
        istio:
          revision: cp-v115x.istio-system
        kubestatemetrics:
          enabled: true
        leo:
          enabled: false
          log: {}
        log: {}
        namespace: smm-system
        prometheus:
          enabled: true
          replicas: 1
          retentionTime: 8h
        prometheusOperator: {}
        releaseName: smm
        role: passive
        sdm:
          enabled: false
        sre:
          enabled: false
        tracing:
          enabled: true
        useIstioResources: false
        web:
          enabled: false
    EOF
    
  5. Create the kustomization files.

    cat > manifests/smm-controlplane/base/kustomization.yaml << EOF
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    metadata:
      name: cluster-secrets
    
    
    resources:
    - cert-manager-namespace.yaml
    - istio-system-namespace.yaml
    - istio-cp-v113x.yaml
    - istio-cp-v115x.yaml
    - control-plane.yaml
    EOF
    
    cat > manifests/smm-controlplane/overlays/workload-cluster-1/kustomization.yaml << EOF
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    bases:
      - ../../base
    
    patches:
      - istio-cp-v113x.yaml
      - istio-cp-v115x.yaml
      - control-plane.yaml
    EOF
    
    cat > manifests/smm-controlplane/overlays/workload-cluster-2/kustomization.yaml << EOF
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    bases:
      - ../../base
    
    patches:
      - istio-cp-v113x.yaml
      - istio-cp-v115x.yaml
      - control-plane.yaml
    EOF
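
    To preview the manifests that Argo CD will render for each cluster, you can build the overlays locally (optional; requires a kubectl version with built-in kustomize support):

    kubectl kustomize manifests/smm-controlplane/overlays/workload-cluster-1
    kubectl kustomize manifests/smm-controlplane/overlays/workload-cluster-2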
    
  6. If you are upgrading from Service Mesh Manager 1.10.0 to 1.11.0, complete this step; otherwise, skip to the next step.

    Apply the following patch to your Service Mesh Manager 1.10.0 clusters. It sets the ttlSecondsAfterFinished field of the cert-manager-startupapicheck job, so that the completed job is cleaned up after 100 seconds. If you skip this step, you might see a cert-manager-startupapicheck related error during the upgrade.

    For workload-cluster-1:

    kubectl patch jobs.batch -n cert-manager cert-manager-startupapicheck -p '{"spec":{"ttlSecondsAfterFinished":100}}' --type=merge --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    For workload-cluster-2:

    kubectl patch jobs.batch -n cert-manager cert-manager-startupapicheck -p '{"spec":{"ttlSecondsAfterFinished":100}}' --type=merge --kubeconfig "${WORKLOAD_CLUSTER_2_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_2_CONTEXT}"
    
  7. Commit and push the changes to the Git repository.

    git add .
    
    git commit -m "upgrade SMM to 1.11.0"
    
    git push
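
    If you use the Argo CD CLI, you can follow the synchronization instead of waiting blindly. The application name smm-controlplane is an assumption based on your Argo CD setup; adjust it to the name you used when creating the applications:

    argocd app get smm-controlplane
    argocd app wait smm-controlplane --health --timeout 600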
    
  8. Wait a few minutes, then check the new IstioControlPlane.

    kubectl -n istio-system get istiocontrolplanes.servicemesh.cisco.com --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}" 
    

    Expected output:

    NAME       MODE     NETWORK    STATUS      MESH EXPANSION   EXPANSION GW IPS                    ERROR   AGE
    cp-v113x   ACTIVE   network1   Available   true             ["52.208.63.154","54.155.81.181"]           61m
    cp-v115x   ACTIVE   network1   Available   true             ["52.211.44.215","63.32.253.55"]            11m
    

    For workload-cluster-2:

    kubectl -n istio-system get istiocontrolplanes.servicemesh.cisco.com --kubeconfig "${WORKLOAD_CLUSTER_2_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_2_CONTEXT}" 
    
  9. Open the Service Mesh Manager dashboard.

    On the MENU > OVERVIEW page everything should be fine, except for some validation issues. These validation issues indicate that the business application (demo-app) is still running on an older control plane than the smm-controlplane, and should be updated to use the latest IstioControlPlane.


    You can see the two IstioControlPlanes on the MENU > MESH page. The smm-controlplane is using the cp-v115x.istio-system IstioControlPlane and the demo-app is using the cp-v113x.istio-system IstioControlPlane.


Upgrade the demo application

The demo-app application is included only for demonstration purposes, but it stands in for your business applications. To upgrade a business application, set the istio.io/rev label on the business application's namespace to the target IstioControlPlane. Complete the following steps to update the demo-app to use the cp-v115x.istio-system IstioControlPlane and set the istio.io/rev label accordingly.
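
In the GitOps flow the label is changed through the Git repository (as in the following steps), but the imperative equivalent shows what actually changes on the cluster. Note that sidecars are injected at pod creation, so existing pods keep the old proxy until they are recreated. This sketch is an illustration only, not part of the procedure:

    # Imperative equivalent of the GitOps change made in the steps below.
    kubectl label namespace smm-demo istio.io/rev=cp-v115x.istio-system --overwrite

    # Restart the workloads so the pods are re-injected with the new proxy.
    kubectl rollout restart deployment -n smm-demo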

  1. Update the label on your application namespace (for the demo application, that's the demo-app-namespace.yaml file).

    cat > manifests/demo-app/base/demo-app-namespace.yaml << EOF
    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        app.kubernetes.io/instance: smm-demo
        app.kubernetes.io/name: smm-demo
        app.kubernetes.io/part-of: smm-demo
        app.kubernetes.io/version: 0.1.4
        istio.io/rev: cp-v115x.istio-system
      name: smm-demo
    EOF
    
  2. Update the demo-app.yaml files.

    For workload-cluster-1:

    cat > manifests/demo-app/overlays/workload-cluster-1/demo-app.yaml << EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: DemoApplication
    metadata:
      name: smm-demo
      namespace: smm-demo
    spec:
      autoscaling:
        enabled: true
      controlPlaneRef:
        name: smm
      deployIstioResources: true
      deploySLOResources: true
      enabled: true
      enabledComponents:
      - frontpage
      - catalog
      - bookings
      - postgresql
      istio:
        revision: cp-v115x.istio-system
      load:
        enabled: true
        maxRPS: 30
        minRPS: 10
        swingPeriod: 1380000000000
      replicas: 1
      resources:
        limits:
          cpu: "2"
          memory: 192Mi
        requests:
          cpu: 40m
          memory: 64Mi
    EOF
    

    For workload-cluster-2:

    cat > manifests/demo-app/overlays/workload-cluster-2/demo-app.yaml << EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: DemoApplication
    metadata:
      name: smm-demo
      namespace: smm-demo
    spec:
      autoscaling:
        enabled: true
      controlPlaneRef:
        name: smm
      deployIstioResources: false
      deploySLOResources: false
      enabled: true
      enabledComponents:
      - movies
      - payments
      - notifications
      - analytics
      - database
      - mysql
      istio:
        revision: cp-v115x.istio-system
      replicas: 1
      resources:
        limits:
          cpu: "2"
          memory: 192Mi
        requests:
          cpu: 40m
          memory: 64Mi
    EOF
    
  3. Commit and push the changes to the Git repository.

    git add .
    
    git commit -m "upgrade demo-app to istio-cp-v115x"
    
    git push
    
  4. Wait a few minutes, then check the IstioControlPlane of the demo-app.

    kubectl get ns smm-demo  -o=jsonpath='{.metadata.labels.istio\.io/rev}' --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}" 
    

    Expected output: the demo-app is using the new cp-v115x IstioControlPlane.

    cp-v115x.istio-system
    

    For workload-cluster-2:

    kubectl get ns smm-demo  -o=jsonpath='{.metadata.labels.istio\.io/rev}' --kubeconfig "${WORKLOAD_CLUSTER_2_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_2_CONTEXT}" 
    

    Expected output: the demo-app is using the new cp-v115x IstioControlPlane.

    cp-v115x.istio-system
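
    The namespace label alone does not prove that the pods have been re-injected. Once the pods have been recreated, you can check which proxy image they actually run; the istio-proxyv2 image tag should match the new control plane (v1.15.3-bzc.0):

    kubectl get pods -n smm-demo -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}' --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"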
    
  5. Check the dashboard.

    • On the MENU > OVERVIEW page, everything should be fine.


    • On the MENU > MESH page, you can see both the old and the new IstioControlPlane, but both smm-controlplane and demo-app are using the new one.


    • Check the MENU > TOPOLOGY page.
