Install SDM - GitOps

This guide details how to set up a GitOps environment for Streaming Data Manager using Argo CD. The same principles can be used for other tools as well.

Prerequisites

  • Service Mesh Manager is already installed using GitOps (for details, see Install SMM - GitOps - single cluster).

    CAUTION:

    To install Streaming Data Manager on an existing Service Mesh Manager installation, the cluster must run Service Mesh Manager version 1.11.0 or later. If your cluster is running an earlier Service Mesh Manager version, you must upgrade it first.

    CAUTION:

    When using Streaming Data Manager on Amazon EKS, you must install the EBS CSI driver add-on on your cluster.

  • The cluster meets the resource requirements to run Service Mesh Manager and Streaming Data Manager:

    CAUTION:

    Supported providers and Kubernetes versions

    The cluster must run a Kubernetes version that Service Mesh Manager supports: Kubernetes 1.21, 1.22, 1.23, 1.24.

    Service Mesh Manager is tested and known to work on the following Kubernetes providers:

    • Amazon Elastic Kubernetes Service (Amazon EKS)
    • Google Kubernetes Engine (GKE)
    • Azure Kubernetes Service (AKS)
    • On-premises installation of stock Kubernetes with load balancer support (and optionally PVCs for persistence)

    Resource requirements

    Make sure that your Kubernetes cluster has sufficient resources. The default installation (with Service Mesh Manager and demo application) requires the following amount of resources on the cluster:

    Only Service Mesh Manager:

    • CPU: 12 vCPU in total, with 4 vCPU available for allocation per worker node. (If you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS.)
    • Memory: 16 GB in total, with 2 GB available for allocation per worker node.
    • Storage: 12 GB of ephemeral storage on the Kubernetes worker nodes (for Traces and Metrics).

    Service Mesh Manager and Streaming Data Manager:

    • CPU: 24 vCPU in total, with 4 vCPU available for allocation per worker node. (If you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS.)
    • Memory: 36 GB in total, with 2 GB available for allocation per worker node.
    • Storage: 12 GB of ephemeral storage on the Kubernetes worker nodes (for Traces and Metrics).

    These minimum requirements need to be available for allocation within your cluster, in addition to the requirements of any other loads running in your cluster (for example, DaemonSets and Kubernetes node-agents). If Kubernetes cannot allocate sufficient resources to Service Mesh Manager, some pods will remain in Pending state, and Service Mesh Manager will not function properly.

    Enabling additional features, such as High Availability, increases these requirements.

    The default installation, when enough headroom is available in the cluster, should be able to support at least 150 running Pods and the same number of Services. To set up Service Mesh Manager for larger workloads, see scaling Service Mesh Manager.

  • Argo CD is already installed.
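For the combined installation, the resource totals above imply a minimum worker-node count. The following sketch uses plain shell arithmetic with ceiling division; the per-node memory figure is an example value, so substitute your node pool's actual allocatable resources:

```shell
# Estimate the minimum number of worker nodes for the SMM + SDM totals above.
# All figures are illustrative; replace them with your node type's allocatable values.
total_vcpu=24        # total vCPU required (Service Mesh Manager and Streaming Data Manager)
total_mem_gb=36      # total memory required, in GB
node_vcpu=4          # vCPU available for allocation per worker node
node_mem_gb=8        # example: GB available for allocation per worker node

# Ceiling division per resource dimension.
nodes_for_cpu=$(( (total_vcpu + node_vcpu - 1) / node_vcpu ))
nodes_for_mem=$(( (total_mem_gb + node_mem_gb - 1) / node_mem_gb ))

# The binding dimension (the larger count) determines the node pool size.
min_nodes=$(( nodes_for_cpu > nodes_for_mem ? nodes_for_cpu : nodes_for_mem ))
echo "minimum worker nodes: ${min_nodes}"   # prints 6 with these example figures
```

Remember that these minimums come on top of whatever else runs in the cluster (DaemonSets, node agents, your own workloads).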

Procedure overview

The high-level steps of the procedure are:

  1. Prepare the Git repository
  2. Deploy Service Mesh Manager
  3. Extend the trust between the meshes
  4. Deploy other required resources
  5. Deploy the Kafka cluster application

Set up the environment

  1. Set the KUBECONFIG location and context name for the management-cluster cluster.

    MANAGEMENT_CLUSTER_KUBECONFIG=management_cluster_kubeconfig.yaml
    MANAGEMENT_CLUSTER_CONTEXT=management-cluster
    kubectl config --kubeconfig "${MANAGEMENT_CLUSTER_KUBECONFIG}" get-contexts "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO   NAMESPACE
    *         management-cluster   management-cluster
    
  2. Set the KUBECONFIG location and context name for the workload-cluster-1 cluster.

    WORKLOAD_CLUSTER_1_KUBECONFIG=workload_cluster_1_kubeconfig.yaml
    WORKLOAD_CLUSTER_1_CONTEXT=workload-cluster-1
    kubectl config --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" get-contexts "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO                                          NAMESPACE
    *         workload-cluster-1   workload-cluster-1
    

    Repeat this step for any additional workload clusters you want to use.

  3. Make sure the management-cluster Kubernetes context is the current context.

    kubectl config --kubeconfig "${MANAGEMENT_CLUSTER_KUBECONFIG}" use-context "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    Switched to context "management-cluster".
    
  4. Get the password for the Argo CD admin user.

    kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo
    

    Expected output:

    argocd-admin-password
    
  5. Check the external-ip-or-hostname address of the argocd-server service.

    kubectl get service -n argocd argocd-server
    

    Expected output:

    NAME                                      TYPE           CLUSTER-IP      EXTERNAL-IP               PORT(S)                      AGE
    argocd-server                             LoadBalancer   10.108.14.130   external-ip-or-hostname   80:31306/TCP,443:30063/TCP   7d13h
    
  6. Log in to the Argo CD server using the https://external-ip-or-hostname URL.

    open https://$(kubectl get service -n argocd argocd-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
    
  7. Log in using the CLI.

    argocd login $(kubectl get service -n argocd argocd-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}') --insecure --username admin --password $(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
    

    Expected output:

    'admin:login' logged in successfully
    Context 'external-ip-or-hostname' updated
    
  8. List the Argo CD clusters and verify that your clusters are registered in Argo CD.

    argocd cluster list
    

    Expected output:

    SERVER                                     NAME                VERSION  STATUS   MESSAGE                                                  PROJECT
    https://kubernetes.default.svc             in-cluster                   Unknown  Cluster has no applications and is not being monitored.
    https://workload-cluster-1-ip-or-hostname  workload-cluster-1           Unknown  Cluster has no applications and is not being monitored.
    

Prepare Git repository

  1. Create an empty repository on GitHub (or another provider that Argo CD supports) and initialize it with a README.md file so that you can clone the repository. Alternatively, you can use the repository you have used for the Service Mesh Manager GitOps installation. Because credentials will be stored in this repository, make it a private repository.

    GITHUB_ID="github-id"
    
    GITHUB_REPOSITORY_NAME="calisti-gitops"
    
  2. Obtain a personal access token for the repository (on GitHub, see Creating a personal access token) that has the following permissions:

    • admin:org_hook
    • admin:repo_hook
    • read:org
    • read:public_key
    • repo
  3. Log in with your personal access token with git.

    export GH_TOKEN="github-personal-access-token" # Note: this variable must be exported so that the `git` binary automatically uses it for authentication.
    
  4. Clone the repository into your local workspace.

    git clone "https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git"
    

    Expected output:

    Cloning into 'calisti-gitops'...
    remote: Enumerating objects: 144, done.
    remote: Counting objects: 100% (144/144), done.
    remote: Compressing objects: 100% (93/93), done.
    remote: Total 144 (delta 53), reused 135 (delta 47), pack-reused 0
    Receiving objects: 100% (144/144), 320.08 KiB | 746.00 KiB/s, done.
    Resolving deltas: 100% (53/53), done.
    
  5. Add the repository to Argo CD by running the following command. Alternatively, you can add it on Argo CD Web UI.

    argocd repo add "https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git" --name "${GITHUB_REPOSITORY_NAME}" --username "${GITHUB_ID}" --password "${GH_TOKEN}"
    

    Expected output:

    Repository 'https://github.com/github-id/calisti-gitops.git' added
    
  6. Verify that the repository is connected by running:

    argocd repo list
    

    In the output, Status should be Successful:

    TYPE  NAME            REPO                                             INSECURE  OCI    LFS    CREDS  STATUS      MESSAGE  PROJECT
    git   calisti-gitops  https://github.com/github-id/calisti-gitops.git  false     false  false  true   Successful
    
  7. Change into the directory of the cloned repository and create the charts, apps and manifests directories.

    cd "${GITHUB_REPOSITORY_NAME}"
    
    mkdir charts apps manifests
    

    The final structure of the repository will look like this:

    calisti-gitops
    ├── apps
    │   ├── sdm-applicationmanifest
    │   │   └── sdm-applicationmanifest-app.yaml
    │   ├── sdm-csr-operator-ca-certs
    │   │   └── sdm-csr-operator-ca-certs-app.yaml
    │   ├── sdm-istiocontrolplane
    │   │   └── sdm-istiocontrolplane-app.yaml
    │   ├── sdm-istiomesh-ca-trust-extension
    │   │   └── sdm-istiomesh-ca-trust-extension-app.yaml
    │   ├── sdm-kafka-cluster
    │   │   └── sdm-kafka-cluster-app.yaml
    │   ├── sdm-operator
    │   │   └── sdm-operator-app.yaml
    │   └── sdm-zookeeper-cluster
    │       └── sdm-zookeeper-cluster-app.yaml
    ├── charts
    │   └── supertubes-control-plane
    │       ├── Chart.yaml
    │       ├── README.md
    │       ├── templates
    │       │   └── ...
    │       └── values.yaml
    └── manifests
        ├── sdm-applicationmanifest
        │   └── sdm-applicationmanifest.yaml
        ├── sdm-csr-operator-ca-certs
        │   └── sdm-csr-operator-ca-certs-secret.yaml
        ├── sdm-istiocontrolplane
        │   ├── kustomization.yaml
        │   ├── sdm-istio-external-ca-cert-secret.yaml
        │   └── sdm-icp-v115x.yaml
        ├── sdm-istiomesh-ca-trust-extension
        │   ├── kustomization.yaml
        │   ├── istiomesh-ca-trust-extension-job.yaml
        │   └── istiomesh-ca-trust-extension-script-cm.yaml
        ├── sdm-kafka-cluster
        │   └── sdm-kafka-cluster.yaml
        └── sdm-zookeeper-cluster
           └── sdm-zookeeper-cluster.yaml
    
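If you prefer, the full skeleton above can also be created in one pass up front. This is an optional convenience sketch; the steps below create each directory with mkdir -p as they go, so pre-existing directories are harmless:

```shell
# Create the directory skeleton of the calisti-gitops repository in one pass.
# Run from the repository root; mkdir -p is idempotent.
mkdir -p \
  apps/sdm-applicationmanifest \
  apps/sdm-csr-operator-ca-certs \
  apps/sdm-istiocontrolplane \
  apps/sdm-istiomesh-ca-trust-extension \
  apps/sdm-kafka-cluster \
  apps/sdm-operator \
  apps/sdm-zookeeper-cluster \
  charts \
  manifests/sdm-applicationmanifest \
  manifests/sdm-csr-operator-ca-certs \
  manifests/sdm-istiocontrolplane \
  manifests/sdm-istiomesh-ca-trust-extension \
  manifests/sdm-kafka-cluster \
  manifests/sdm-zookeeper-cluster
```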

Prepare the helm charts

  1. You need an active Service Mesh Manager registration to download the Streaming Data Manager charts and images. You can sign up for free, or obtain Enterprise credentials on the official Cisco Service Mesh Manager page. After registration, you can obtain your username and password in the Download Center. Set them as environment variables.

    CALISTI_USERNAME="calisti-username"
    
    CALISTI_PASSWORD="calisti-password"
    
  2. Download the supertubes-control-plane chart from registry.eticloud.io into the charts directory of your Streaming Data Manager GitOps repository and unpack it. Run the following commands:

    export HELM_EXPERIMENTAL_OCI=1 # Needed prior to Helm version 3.8.0
    
    echo "${CALISTI_PASSWORD}" | helm registry login registry.eticloud.io -u "${CALISTI_USERNAME}" --password-stdin
    

    Expected output:

    Login Succeeded
    
    helm pull oci://registry.eticloud.io/sdm-charts/supertubes-control-plane --destination charts --untar
    

    Expected output:

    Pulled: registry.eticloud.io/sdm-charts/supertubes-control-plane:latest-stable-version
    Digest: sha256:someshadigest
    
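As the comment in the commands above notes, HELM_EXPERIMENTAL_OCI=1 is only needed on Helm versions older than 3.8.0, where OCI registry support was still experimental. If you script these steps, a small sketch for deciding whether to export it (the needs_oci_flag helper is hypothetical, not part of Helm):

```shell
# Returns success (0) when the given Helm version predates 3.8.0,
# i.e. when HELM_EXPERIMENTAL_OCI=1 must be exported before `helm registry login`.
needs_oci_flag() {
  ver="$1"
  [ "$ver" = "3.8.0" ] && return 1
  # sort -V orders version strings numerically; if $ver sorts first, it is older than 3.8.0.
  [ "$(printf '%s\n3.8.0\n' "$ver" | sort -V | head -n1)" = "$ver" ]
}

# Example: export the flag only for an old Helm version.
if needs_oci_flag "3.7.2"; then
  export HELM_EXPERIMENTAL_OCI=1
fi
```

In practice you would pass the output of `helm version --template '{{.Version}}'` (stripped of its leading `v`) instead of a hard-coded string.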

(Optional) Deploy CA certificate and private key secret

This step is optional because the CA secret is created automatically by default. If you want to use your custom CA certificate and private key, complete the following steps.

The CSR-operator uses this CA secret to sign Certificate Signing Requests for workloads in the Istio mesh and for KafkaUsers CR (Kafka clients).

CAUTION:

Do not push the secrets directly into the git repository, especially when it is a public repository. Argo CD provides solutions to keep secrets safe.

  1. Create the sdm-csr-operator-ca-certs Secret in the manifests/sdm-csr-operator-ca-certs directory.

    mkdir -p manifests/sdm-csr-operator-ca-certs
    
    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" CA_CRT_B64="$(base64 < ca_crt.pem | tr -d '\n')" CA_KEY_B64="$(base64 < ca_key.pem | tr -d '\n')" CHAIN_CRT_B64="$(cat intermediate.pem root.pem | base64 | tr -d '\n')" ; cat > "manifests/sdm-csr-operator-ca-certs/sdm-csr-operator-ca-certs-secret.yaml" << EOF
    # manifests/sdm-csr-operator-ca-certs/sdm-csr-operator-ca-certs-secret.yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: sdm-csr-operator-ca-certs
      namespace: csr-operator-system
    data:
      ca_crt.pem: ${CA_CRT_B64}
      ca_key.pem: ${CA_KEY_B64}
      # chain_crt.pem is optional. Only needed when intermediate CA is used (root CA -> .. -> intermediate CA)
      chain_crt.pem: ${CHAIN_CRT_B64}
    EOF
    
  2. Create the sdm-csr-operator-ca-certs Application CR in the apps/sdm-csr-operator-ca-certs directory.

    mkdir -p apps/sdm-csr-operator-ca-certs
    
    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" ; cat > "apps/sdm-csr-operator-ca-certs/sdm-csr-operator-ca-certs-app.yaml" << EOF
    # apps/sdm-csr-operator-ca-certs/sdm-csr-operator-ca-certs-app.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: sdm-csr-operator-ca-certs
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      source:
        repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
        targetRevision: HEAD
        path: manifests/sdm-csr-operator-ca-certs
      destination:
        name: ${ARGOCD_CLUSTER_NAME}
        namespace: csr-operator-system
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - Validate=false
        - CreateNamespace=true
        - PrunePropagationPolicy=foreground
        - PruneLast=true
    EOF
    
  3. Commit and push the calisti-gitops repository.

    git add .
    git commit -m "add sdm-csr-operator-ca-certs-secret"
    
    git push
    

    Expected output:

    Enumerating objects: 13, done.
    Counting objects: 100% (13/13), done.
    Delta compression using up to 12 threads
    Compressing objects: 100% (10/10), done.
    Writing objects: 100% (10/10), 6.31 KiB | 6.31 MiB/s, done.
    Total 10 (delta 4), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (4/4), completed with 2 local objects.
    To github.com:github-id/calisti-gitops.git
      af4e16f..8a81019  main -> main
    
  4. Apply the Application CR.

    kubectl apply -f "apps/sdm-csr-operator-ca-certs/sdm-csr-operator-ca-certs-app.yaml"
    

    Expected output:

    application.argoproj.io/sdm-csr-operator-ca-certs created
    
  5. Verify that the application has been added to Argo CD and is healthy.

    argocd app list
    

    Expected output:

    NAME                       CLUSTER             NAMESPACE            PROJECT  STATUS  HEALTH   SYNCPOLICY  CONDITIONS  REPO                                             PATH                                 TARGET
    ...
    sdm-csr-operator-ca-certs  workload-cluster-1  csr-operator-system  default  Synced  Healthy  Auto-Prune  <none>      https://github.com/github-id/calisti-gitops.git  manifests/sdm-csr-operator-ca-certs  HEAD
    ...
    

    Note: You must configure the ApplicationManifest CR so that the CSR-operator uses the previously created secret. To do this, change:

    • the ApplicationManifest CR's csr-operator/valuesOverride/.../issuer/autoGenerated field to false, and
    • the issuer/secretName field to the name of your CA secret.
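Note that the base64 values placed in the Secret's data fields must be single-line strings: on Linux, GNU base64 wraps its output at 76 characters by default, which corrupts the generated YAML. A portable sketch of the encoding used above (the dummy PEM content is for illustration only):

```shell
# Encode a PEM file to a single-line base64 string suitable for a Secret's data field.
# tr -d '\n' removes the line wrapping that GNU base64 inserts every 76 characters
# (BSD/macOS base64 does not wrap, so the tr is a harmless no-op there).
printf 'dummy CA material for illustration only\n' > ca_crt.pem
CA_CRT_B64="$(base64 < ca_crt.pem | tr -d '\n')"

# Sanity check: the value decodes back to the original bytes.
printf '%s' "$CA_CRT_B64" | base64 -d | cmp -s - ca_crt.pem && echo "round trip OK"
```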

Deploy Streaming Data Manager

Deploy the sdm-operator application

  1. Create the sdm-operator Application CR in the apps/sdm-operator directory.

    mkdir -p apps/sdm-operator
    
    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" ; cat > "apps/sdm-operator/sdm-operator-app.yaml" << EOF
    # apps/sdm-operator/sdm-operator-app.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: sdm-operator
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      source:
        repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
        targetRevision: HEAD
        path: charts/supertubes-control-plane
        helm:
          releaseName: sdm-operator
          values: |
            imagePullSecrets: ["smm-registry.eticloud.io-pull-secret"]
            operator:
              image:
                repository: registry.eticloud.io/sdm/supertubes-control-plane
      destination:
        name: ${ARGOCD_CLUSTER_NAME}
        namespace: smm-registry-access
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - Validate=false
          - PruneLast=true
          - CreateNamespace=true
    EOF
    
  2. Commit and push the calisti-gitops repository.

    git add .
    git commit -m "add sdm-operator app"
    

    Expected output:

    [main 3f57c62] add sdm-operator app
    11 files changed, 842 insertions(+)
    create mode 100644 apps/sdm-operator/sdm-operator-app.yaml
    create mode 100644 charts/supertubes-control-plane/.helmignore
    create mode 100644 charts/supertubes-control-plane/Chart.yaml
    create mode 100644 charts/supertubes-control-plane/README.md
    create mode 100644 charts/supertubes-control-plane/templates/_helpers.tpl
    create mode 100644 charts/supertubes-control-plane/templates/supertubes-crd.yaml
    create mode 100644 charts/supertubes-control-plane/templates/supertubes-deployment.yaml
    create mode 100644 charts/supertubes-control-plane/templates/supertubes-rbac.yaml
    create mode 100644 charts/supertubes-control-plane/templates/supertubes-service.yaml
    create mode 100644 charts/supertubes-control-plane/templates/supertubes-webhooks.yaml
    create mode 100644 charts/supertubes-control-plane/values.yaml
    
    git push
    

    Expected output:

    Enumerating objects: 33, done.
    Counting objects: 100% (33/33), done.
    Delta compression using up to 12 threads
    Compressing objects: 100% (29/29), done.
    Writing objects: 100% (29/29), 14.38 KiB | 2.88 MiB/s, done.
    Total 29 (delta 7), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (7/7), completed with 2 local objects.
    To github.com:github-id/calisti-gitops.git
      8a81019..3f57c62  main -> main
    
  3. Apply the Application manifest.

    kubectl apply -f "apps/sdm-operator/sdm-operator-app.yaml"
    

    Expected output:

    application.argoproj.io/sdm-operator created
    
  4. Verify that the application has been added to Argo CD and is healthy.

    argocd app list
    

    Expected output:

    NAME          CLUSTER             NAMESPACE            PROJECT  STATUS  HEALTH   SYNCPOLICY  CONDITIONS  REPO                                             PATH                             TARGET
    ...
    sdm-operator  workload-cluster-1  smm-registry-access  default  Synced  Healthy  Auto-Prune  <none>      https://github.com/github-id/calisti-gitops.git  charts/supertubes-control-plane  HEAD
    ...
    

    You can check the sdm-operator application on the Argo CD Web UI as well.

    open https://$(kubectl get service -n argocd argocd-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
    

Deploy the sdm-applicationmanifest application

  1. Create the applicationmanifest ApplicationManifest CR in the manifests/sdm-applicationmanifest directory.

    mkdir -p manifests/sdm-applicationmanifest
    
    ISSUER_SECRET_NAME="sdm-csr-operator-ca-certs" ISSUER_AUTOGENERATED="false" ; cat > "manifests/sdm-applicationmanifest/sdm-applicationmanifest.yaml" << EOF
    # manifests/sdm-applicationmanifest/sdm-applicationmanifest.yaml
    apiVersion: supertubes.banzaicloud.io/v1beta1
    kind: ApplicationManifest
    metadata:
      name: applicationmanifest
    spec:
      clusterRegistry:
        enabled: false
        namespace: cluster-registry
      csrOperator:
        enabled: true
        namespace: csr-operator-system
        valuesOverride: |-
          image:
            repository: "registry.eticloud.io/csro/csr-operator"
          csroperator:
            config:
              privateCASigner:
                issuer:
                  secretName: "${ISSUER_SECRET_NAME}"
                  autoGenerated: "${ISSUER_AUTOGENERATED}"
      imagePullSecretsOperator:
        enabled: false
        namespace: supertubes-system
      istioOperator:
        enabled: false
        namespace: istio-system
      kafkaMinion:
        enabled: false
      kafkaOperator:
        enabled: false
        namespace: kafka
        valuesOverride: |
          alertManager:
            permissivePeerAuthentication:
              create: true
      monitoring:
        grafanaDashboards:
          enabled: false
          label: app.kubernetes.io/supertubes_managed_grafana_dashboard
        prometheusOperator:
          enabled: false
          namespace: supertubes-system
          valuesOverride: |
            prometheus:
              prometheusSpec:
                resources:
                  limits:
                    cpu: 2
                    memory: 2Gi
                  requests:
                    cpu: 1
                    memory: 1Gi
      supertubes:
        enabled: false
        namespace: supertubes-system
        valuesOverride: |
          ui-backend:
            image:
              repository: "registry.eticloud.io/sdm/supertubes-ui"
            podLabels:
              smm.cisco.com/jwt-auth-from-ingress: "true"
          operator:
            image:
              repository: "registry.eticloud.io/sdm/supertubes"
            cruiseControlModules:
              image:
                repository: "registry.eticloud.io/sdm/cruisecontrol-modules"
            kafkaAuthAgent:
              image:
                repository: "registry.eticloud.io/sdm/kafka-authn-agent"
            kafkaModules:
              image:
                repository: "registry.eticloud.io/sdm/kafka-modules"
      zookeeperOperator:
        enabled: false
        namespace: zookeeper
    EOF
    
  2. If you want to use a custom CA and secret created by the CSR-operator, modify the ApplicationManifest CR:

    • Set the csr-operator/valuesOverride/…/issuer/autoGenerated field to false.
    • Set the csr-operator/valuesOverride/…/issuer/secretName field to the name of your CA secret.
  3. Create the sdm-applicationmanifest Application CR in the apps/sdm-applicationmanifest directory.

    mkdir -p apps/sdm-applicationmanifest
    
    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" ; cat > "apps/sdm-applicationmanifest/sdm-applicationmanifest-app.yaml" <<EOF
    # apps/sdm-applicationmanifest/sdm-applicationmanifest-app.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: sdm-applicationmanifest
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      source:
        repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
        targetRevision: HEAD
        path: manifests/sdm-applicationmanifest
      destination:
        name: ${ARGOCD_CLUSTER_NAME}
        namespace: smm-registry-access
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - Validate=false
        - CreateNamespace=true
        - PrunePropagationPolicy=foreground
        - PruneLast=true
    EOF
    
  4. Commit and push the calisti-gitops repository.

    git add .
    git commit -m "add sdm-applicationmanifest app"
    

    Expected output:

    [main 0eae6a5] add sdm-applicationmanifest app
    2 files changed, 206 insertions(+)
    create mode 100644 apps/sdm-applicationmanifest/sdm-applicationmanifest-app.yaml
    create mode 100644 manifests/sdm-applicationmanifest/sdm-applicationmanifest.yaml
    
    git push origin
    

    Expected output:

    Enumerating objects: 13, done.
    Counting objects: 100% (13/13), done.
    Delta compression using up to 12 threads
    Compressing objects: 100% (10/10), done.
    Writing objects: 100% (10/10), 1.90 KiB | 1.90 MiB/s, done.
    Total 10 (delta 5), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (5/5), completed with 3 local objects.
    To github.com:github-id/calisti-gitops.git
      3f57c62..0eae6a5  main -> main
    
  5. Apply the Application manifest.

    kubectl apply -f "apps/sdm-applicationmanifest/sdm-applicationmanifest-app.yaml"
    

    Expected output:

    application.argoproj.io/sdm-applicationmanifest created
    
  6. Verify that the application has been added to Argo CD and is healthy.

    argocd app list
    

    Expected output:

    NAME                     CLUSTER             NAMESPACE            PROJECT  STATUS  HEALTH   SYNCPOLICY  CONDITIONS  REPO                                             PATH                               TARGET
    ...
    sdm-applicationmanifest  workload-cluster-1  smm-registry-access  default  Synced  Healthy  Auto-Prune  <none>      https://github.com/github-id/calisti-gitops.git  manifests/sdm-applicationmanifest  HEAD
    ...
    
  7. Check that all pods are healthy and running in the csr-operator-system namespace on workload-cluster-1.

    kubectl get pods -n csr-operator-system --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                            READY   STATUS    RESTARTS   AGE
    csr-operator-587df64776-5bs2p   2/2     Running   0          59s
    

Deploy the sdm-istiocontrolplane application

  1. Create the kustomization.yaml file in the manifests/sdm-istiocontrolplane directory.

    mkdir -p manifests/sdm-istiocontrolplane
    
    ISTIO_MINOR_VERSION="1.15" ; cat > "manifests/sdm-istiocontrolplane/kustomization.yaml" <<EOF
    # manifests/sdm-istiocontrolplane/kustomization.yaml
    resources:
    - sdm-istio-external-ca-cert-secret.yaml
    - sdm-icp-v${ISTIO_MINOR_VERSION/.}x.yaml
    EOF
    
  2. Set the sdm-istio-external-ca-cert secret.

    • When there is no intermediate CA, the sdm-istio-external-ca-cert secret has to contain the CA certificate from the sdm-csr-operator-ca-certs secret with the data key ca_crt.pem.
    • Otherwise, the sdm-istio-external-ca-cert secret has to contain the CA certificate chain from the sdm-csr-operator-ca-certs secret with the data key chain_crt.pem.

    ISSUER_SECRET_NAME="sdm-csr-operator-ca-certs" ; cat > "manifests/sdm-istiocontrolplane/sdm-istio-external-ca-cert-secret.yaml" <<EOF
    # manifests/sdm-istiocontrolplane/sdm-istio-external-ca-cert-secret.yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: sdm-istio-external-ca-cert
      namespace: istio-system
    data:
      # When there is intermediate CA:
      # root-cert.pem: $(kubectl --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}" --namespace csr-operator-system get secret ${ISSUER_SECRET_NAME} -o 'jsonpath={.data.chain_crt\.pem}')
      # When there is no intermediate CA
      root-cert.pem: $(kubectl --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}" --namespace csr-operator-system get secret ${ISSUER_SECRET_NAME} -o 'jsonpath={.data.ca_crt\.pem}')
    EOF
    
  3. Create the SDM IstioControlPlane CR in the manifests/sdm-istiocontrolplane directory.

    ISTIO_VERSION="1.15.3" ISTIO_MINOR_VERSION="1.15" ISTIO_PILOT_VERSION="v1.15.3-bzc.0" ISTIO_PROXY_VERSION="v1.15.3-bzc-kafka.0" ; cat > "manifests/sdm-istiocontrolplane/sdm-icp-v${ISTIO_MINOR_VERSION/.}x.yaml" <<EOF
    # manifests/sdm-istiocontrolplane/sdm-icp-v${ISTIO_MINOR_VERSION/.}x.yaml
    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioControlPlane
    metadata:
      labels:
        banzaicloud.io/managed-by: supertubes
      name: sdm-icp-v${ISTIO_MINOR_VERSION/.}x
      namespace: istio-system
    spec:
      containerImageConfiguration:
        imagePullPolicy: Always
        imagePullSecrets:
          - name: smm-pull-secret
      distribution: cisco
      istiod:
        deployment:
          env:
            - name: PILOT_SKIP_VALIDATE_TRUST_DOMAIN
              value: "true"
            - name: EXTERNAL_CA
              value: ISTIOD_RA_KUBERNETES_API
            - name: K8S_SIGNER
              value: csr.banzaicloud.io/privateca
            - name: ISTIO_MULTIROOT_MESH
              value: "true"
          image: registry.eticloud.io/sdm/istio-pilot:${ISTIO_PILOT_VERSION}
      k8sResourceOverlays:
        - groupVersionKind:
            group: apps
            kind: Deployment
            version: v1
          objectKey:
            name: istiod-sdm-icp-v${ISTIO_MINOR_VERSION/.}x
          patches:
            - parseValue: true
              path: /spec/template/spec/volumes/-
              type: replace
              value: |
                name: external-ca-cert
                secret:
                  secretName: sdm-istio-external-ca-cert
                  optional: true
            - parseValue: true
              path: /spec/template/spec/containers/name=discovery/volumeMounts/-
              type: replace
              value: |
                name: external-ca-cert
                mountPath: /etc/external-ca-cert
                readOnly: true
        - groupVersionKind:
            group: rbac.authorization.k8s.io
            kind: ClusterRole
            version: v1
          objectKey:
            name: istiod-sdm-icp-v${ISTIO_MINOR_VERSION/.}x-istio-system
          patches:
            - parseValue: true
              path: /rules/-
              type: replace
              value: |
                apiGroups:
                - certificates.k8s.io
                resourceNames:
                - csr.banzaicloud.io/privateca
                resources:
                - signers
                verbs:
                - approve
      meshConfig:
        defaultConfig:
          proxyMetadata:
            PROXY_CONFIG_XDS_AGENT: "true"
        enableAutoMtls: true
        protocolDetectionTimeout: 5s
      meshID: sdm
      mode: ACTIVE
      networkName: network1
      proxy:
        image: registry.eticloud.io/sdm/istio-proxyv2:v1.15.3-bzc-kafka.0
      telemetryV2:
        enabled: true
      version: 1.15.3
    EOF
    
  4. Create sdm-istiocontrolplane Application CR in the apps/sdm-istiocontrolplane directory.

    mkdir -p apps/sdm-istiocontrolplane
    
    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" ; cat > apps/sdm-istiocontrolplane/sdm-istiocontrolplane-app.yaml <<EOF
    # apps/sdm-istiocontrolplane/sdm-istiocontrolplane-app.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: sdm-istiocontrolplane
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      source:
        repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
        targetRevision: HEAD
        path: manifests/sdm-istiocontrolplane
      destination:
        name: ${ARGOCD_CLUSTER_NAME}
        namespace: istio-system
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - Validate=false
        - CreateNamespace=true
        - PrunePropagationPolicy=foreground
        - PruneLast=true
    EOF
    
  5. Commit and push the calisti-gitops repository.

    git add apps/sdm-istiocontrolplane manifests/sdm-istiocontrolplane
    
    git commit -m "add sdm-istiocontrolplane app"
    

    Expected output:

    [main 0d8959e] add sdm-istiocontrolplane app
    4 files changed, 120 insertions(+)
    create mode 100644 apps/sdm-istiocontrolplane/sdm-istiocontrolplane-app.yaml
    create mode 100644 manifests/sdm-istiocontrolplane/kustomization.yaml
    create mode 100644 manifests/sdm-istiocontrolplane/sdm-icp-v115x.yaml
    create mode 100644 manifests/sdm-istiocontrolplane/sdm-istio-external-ca-cert-secret.yaml
    
    git push origin
    

    Expected output:

    Enumerating objects: 13, done.
    Counting objects: 100% (13/13), done.
    Delta compression using up to 12 threads
    Compressing objects: 100% (10/10), done.
    Writing objects: 100% (10/10), 3.63 KiB | 1.81 MiB/s, done.
    Total 10 (delta 3), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
    To github.com:github-id/calisti-gitops.git
      63600fc..0d8959e  main -> main
    
  6. Apply the sdm-istiocontrolplane Application CR.

    kubectl apply -f apps/sdm-istiocontrolplane/sdm-istiocontrolplane-app.yaml
    

    Expected output:

    application.argoproj.io/sdm-istiocontrolplane created
    
  7. Verify that the application has been added to Argo CD and is healthy.

    argocd app list
    

    Expected output:

    NAME                     CLUSTER             NAMESPACE      PROJECT  STATUS  HEALTH   SYNCPOLICY  CONDITIONS  REPO                                             PATH                               TARGET
    ...
    sdm-istiocontrolplane    workload-cluster-1  istio-system   default  Synced  Healthy  Auto-Prune  <none>      https://github.com/github-id/calisti-gitops.git  manifests/sdm-istiocontrolplane    HEAD
    ...
    
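
The control plane names used in this guide (for example, sdm-icp-v115x) embed the Istio minor version with the dot removed. The heredocs rely on the bash substitution `${ISTIO_MINOR_VERSION/.}`, which deletes the first `.` from the value; the following is a standalone sketch of that expansion, not part of the install:

```shell
# Bash pattern substitution: ${VAR/pattern/replacement} with an empty
# replacement removes the first match, so "1.15" becomes "115".
ISTIO_MINOR_VERSION="1.15"
echo "sdm-icp-v${ISTIO_MINOR_VERSION/.}x"
```

This prints sdm-icp-v115x, the name referenced by both the IstioControlPlane CR above and the KafkaCluster CR later in this guide. Note that this substitution is a bash feature, so run these snippets in bash rather than a plain POSIX sh.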

Extend the trust between the Istio meshes

Extend the trusted CA certificates of the Service Mesh Manager Istio mesh with the external CA certificate of Streaming Data Manager.

  1. Create the kustomization.yaml file in the manifests/sdm-istiomesh-ca-trust-extension directory.

    mkdir -p manifests/sdm-istiomesh-ca-trust-extension
    
    cat > manifests/sdm-istiomesh-ca-trust-extension/kustomization.yaml <<EOF
    # manifests/sdm-istiomesh-ca-trust-extension/kustomization.yaml
    resources:
    - istiomesh-ca-trust-extension-job.yaml
    - istiomesh-ca-trust-extension-script-cm.yaml
    EOF
    
  2. Create the istiomesh-ca-trust-extension Job in the manifests/sdm-istiomesh-ca-trust-extension directory.

    cat > manifests/sdm-istiomesh-ca-trust-extension/istiomesh-ca-trust-extension-job.yaml <<EOF
    # manifests/sdm-istiomesh-ca-trust-extension/istiomesh-ca-trust-extension-job.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: istiomesh-ca-trust-extension
      namespace: smm-registry-access
    spec:
      completions: 1
      template:
        metadata:
          name: istiomesh-ca-trust-extension
        spec:
          containers:
          - command:
            - /scripts/run.sh
            image: lachlanevenson/k8s-kubectl:v1.16.10
            imagePullPolicy: IfNotPresent
            name: istio-trust-extension-job
            volumeMounts:
            - mountPath: /scripts
              name: run
              readOnly: false
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          serviceAccount: sdm-operator-supertubes-control-plane
          serviceAccountName: sdm-operator-supertubes-control-plane
          volumes:
          - configMap:
              defaultMode: 365
              name: istiomesh-ca-trust-extension-script
            name: run
    EOF
    
  3. Create the istiomesh-ca-trust-extension-script ConfigMap in the manifests/sdm-istiomesh-ca-trust-extension directory.

    ISTIO_MINOR_VERSION="1.15" ; cat > manifests/sdm-istiomesh-ca-trust-extension/istiomesh-ca-trust-extension-script-cm.yaml <<EOF
    # manifests/sdm-istiomesh-ca-trust-extension/istiomesh-ca-trust-extension-script-cm.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: istiomesh-ca-trust-extension-script
      namespace: smm-registry-access
    data:
      run.sh: |-
        #!/bin/sh
        # Fill these fields properly----------------------
        export CA_SECRET_NAMESPACE="istio-system"
        export CA_SECRET_NAME="sdm-istio-external-ca-cert"
        # ------------------------------------------------
        export ICP_NAME="cp-v${ISTIO_MINOR_VERSION/.}x"
        export ICP_NAMESPACE="istio-system"
        export CA_CERT=\$(kubectl get secret -n \$CA_SECRET_NAMESPACE \$CA_SECRET_NAME -o jsonpath='{.data.root-cert\.pem}' | base64 -d | sed '\$ ! s/\$/\\n/' | tr -d '\n')
        read -r -d '' PATCH << EOF
        {"spec": {"meshConfig": {"caCertificates": [{"pem": "\$CA_CERT"}]}}}
        EOF
        read -r -d '' INSERT_PATCH << EOF
        [{"op": "add", "path": "/spec/meshConfig/caCertificates/-", "value": {"pem": "\$CA_CERT"}}]
        EOF
        kubectl patch istiocontrolplanes.servicemesh.cisco.com \$ICP_NAME -n \$ICP_NAMESPACE --type json --patch="\$INSERT_PATCH" || kubectl patch istiocontrolplanes.servicemesh.cisco.com \$ICP_NAME -n \$ICP_NAMESPACE --type merge --patch="\$PATCH"
    EOF
    
  4. Create the sdm-istiomesh-ca-trust-extension Application CR in the apps/sdm-istiomesh-ca-trust-extension directory.

    mkdir -p apps/sdm-istiomesh-ca-trust-extension
    
    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" ; cat > "apps/sdm-istiomesh-ca-trust-extension/sdm-istiomesh-ca-trust-extension-app.yaml" <<EOF
    # apps/sdm-istiomesh-ca-trust-extension/sdm-istiomesh-ca-trust-extension-app.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: sdm-istiomesh-ca-trust-extension
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      source:
        repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
        targetRevision: HEAD
        path: manifests/sdm-istiomesh-ca-trust-extension
      destination:
        name: ${ARGOCD_CLUSTER_NAME}
        namespace: smm-registry-access
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - Validate=false
        - CreateNamespace=true
        - PrunePropagationPolicy=foreground
        - PruneLast=true
    EOF
    
  5. Commit and push the calisti-gitops repository.

    git add apps/sdm-istiomesh-ca-trust-extension manifests/sdm-istiomesh-ca-trust-extension
    git commit -m "add sdm-istiomesh-ca-trust-extension app"
    

    Expected output:

    [main 1ff5fae] add sdm-istiomesh-ca-trust-extension app
    4 files changed, 84 insertions(+)
    create mode 100644 apps/sdm-istiomesh-ca-trust-extension/sdm-istiomesh-ca-trust-extension-app.yaml
    create mode 100644 manifests/sdm-istiomesh-ca-trust-extension/istiomesh-ca-trust-extension-job.yaml
    create mode 100644 manifests/sdm-istiomesh-ca-trust-extension/istiomesh-ca-trust-extension-script-cm.yaml
    create mode 100644 manifests/sdm-istiomesh-ca-trust-extension/kustomization.yaml
    
    git push origin
    

    Expected output:

    Enumerating objects: 13, done.
    Counting objects: 100% (13/13), done.
    Delta compression using up to 12 threads
    Compressing objects: 100% (10/10), done.
    Writing objects: 100% (10/10), 2.21 KiB | 2.21 MiB/s, done.
    Total 10 (delta 3), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
    To github.com:github-id/calisti-gitops.git
      98d53c4..1ff5fae  main -> main
    
  6. Apply the sdm-istiomesh-ca-trust-extension Application CR.

    kubectl apply -f apps/sdm-istiomesh-ca-trust-extension/sdm-istiomesh-ca-trust-extension-app.yaml
    

    Expected output:

    application.argoproj.io/sdm-istiomesh-ca-trust-extension created
    
  7. Verify that the application has been added to Argo CD and is healthy.

    argocd app list
    

    Expected output:

    NAME                              CLUSTER             NAMESPACE            PROJECT  STATUS  HEALTH   SYNCPOLICY  CONDITIONS  REPO                                             PATH                                        TARGET
    ...
    sdm-istiomesh-ca-trust-extension  workload-cluster-1  smm-registry-access  default  Synced  Healthy  Auto-Prune  <none>      https://github.com/github-id/calisti-gitops.git  manifests/sdm-istiomesh-ca-trust-extension  HEAD
    ...
    
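
The trickiest line in run.sh above is the CA_CERT extraction: it base64-decodes the certificate from the secret, then uses sed and tr to flatten the multi-line PEM into a single line containing literal \n escapes, so the value can be embedded in the one-line JSON patch strings. A standalone sketch of that flattening idea, with sample lines standing in for the certificate (the ConfigMap above carries an extra layer of heredoc escaping on top of this):

```shell
# Append a literal backslash-n to every line except the last ('$ !'
# addresses all lines but the final one), then delete the real
# newlines: three lines collapse into one JSON-safe line.
printf 'CERT-LINE-1\nCERT-LINE-2\nCERT-LINE-3\n' \
  | sed '$ ! s/$/\\n/' \
  | tr -d '\n'
```

The result is the single line CERT-LINE-1\nCERT-LINE-2\nCERT-LINE-3; the JSON parser turns the \n escapes back into real newlines when the patch is applied.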

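One easy-to-misread detail in the Job above is `defaultMode: 365` on the script volume: Kubernetes serializes file modes as decimal integers, and 365 decimal is 0555 octal (r-xr-xr-x), which is what makes /scripts/run.sh executable. A quick local check of the conversion:

```shell
# Print decimal 365 in octal: 555, i.e. file mode r-xr-xr-x.
printf '%o\n' 365
```
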
Deploy the remaining Streaming Data Manager resources

Modify the ApplicationManifest CR to deploy the remaining Streaming Data Manager resources.

  1. Edit the ApplicationManifest CR.

    ISSUER_SECRET_NAME="sdm-csr-operator-ca-certs" ISSUER_AUTOGENERATED="false" ; cat > "manifests/sdm-applicationmanifest/sdm-applicationmanifest.yaml" << EOF
    # manifests/sdm-applicationmanifest/sdm-applicationmanifest.yaml
    apiVersion: supertubes.banzaicloud.io/v1beta1
    kind: ApplicationManifest
    metadata:
      name: applicationmanifest
    spec:
      clusterRegistry:
        enabled: false
        namespace: cluster-registry
      csrOperator:
        enabled: true
        namespace: csr-operator-system
        valuesOverride: |-
          image:
            repository: "registry.eticloud.io/csro/csr-operator"
          csroperator:
            config:
              privateCASigner:
                issuer:
                  secretName: "${ISSUER_SECRET_NAME}"
                  autoGenerated: "${ISSUER_AUTOGENERATED}"
      imagePullSecretsOperator:
        enabled: false
        namespace: supertubes-system
      istioOperator:
        enabled: false
        namespace: istio-system
      kafkaMinion:
        enabled: true
      kafkaOperator:
        enabled: true
        namespace: kafka
        valuesOverride: |
          alertManager:
            permissivePeerAuthentication:
              create: true
      monitoring:
        grafanaDashboards:
          enabled: true
          label: app.kubernetes.io/supertubes_managed_grafana_dashboard
        prometheusOperator:
          enabled: true
          namespace: supertubes-system
          valuesOverride: |
            prometheus:
              prometheusSpec:
                resources:
                  limits:
                    cpu: 2
                    memory: 2Gi
                  requests:
                    cpu: 1
                    memory: 1Gi
      supertubes:
        enabled: true
        namespace: supertubes-system
        valuesOverride: |
          ui-backend:
            image:
              repository: "registry.eticloud.io/sdm/supertubes-ui"
            podLabels:
              smm.cisco.com/jwt-auth-from-ingress: "true"
          operator:
            image:
              repository: "registry.eticloud.io/sdm/supertubes"
            cruiseControlModules:
              image:
                repository: "registry.eticloud.io/sdm/cruisecontrol-modules"
            kafkaAuthAgent:
              image:
                repository: "registry.eticloud.io/sdm/kafka-authn-agent"
            kafkaModules:
              image:
                repository: "registry.eticloud.io/sdm/kafka-modules"
      zookeeperOperator:
        enabled: true
        namespace: zookeeper
    EOF
    
  2. If you want to use a custom CA certificate instead of one generated automatically by the CSR operator, modify the ApplicationManifest CR:

    • Set the csr-operator/valuesOverride/…/issuer/autoGenerated field to false.
    • Set the csr-operator/valuesOverride/…/issuer/secretName field to the name of your CA secret.
  3. Commit and push the calisti-gitops repository.

    git add manifests/sdm-applicationmanifest
    git commit -m "update sdm-applicationmanifest app"
    

    Expected output:

    [main 18df6de] update sdm-applicationmanifest app
    1 file changed, 6 insertions(+), 6 deletions(-)
    
    git push
    

    Expected output:

    Enumerating objects: 9, done.
    Counting objects: 100% (9/9), done.
    Delta compression using up to 12 threads
    Compressing objects: 100% (5/5), done.
    Writing objects: 100% (5/5), 662 bytes | 662.00 KiB/s, done.
    Total 5 (delta 3), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
    To github.com:github-id/calisti-gitops.git
      1ff5fae..18df6de  main -> main
    
  4. Sync the changes in Argo CD.

    argocd app sync sdm-applicationmanifest
    

    Expected output:

    TIMESTAMP                  GROUP                            KIND            NAMESPACE                           NAME    STATUS    HEALTH        HOOK  MESSAGE
    2022-10-31T13:07:07+01:00  supertubes.banzaicloud.io  ApplicationManifest  smm-registry-access   applicationmanifest  OutOfSync
    
    Name:               sdm-applicationmanifest
    Project:            default
    Server:             workload-cluster-1
    Namespace:          smm-registry-access
    URL:                https://a90baf9b2fd8e42ff8bbcbfb60ba59b0-779918959.us-east-1.elb.amazonaws.com/applications/sdm-applicationmanifest
    Repo:               https://github.com/github-id/calisti-gitops.git
    Target:             HEAD
    Path:               manifests/sdm-applicationmanifest
    SyncWindow:         Sync Allowed
    Sync Policy:        Automated (Prune)
    Sync Status:        Synced to HEAD (18df6de)
    Health Status:      Healthy
    
    Operation:          Sync
    Sync Revision:      18df6de9a4b64863496e666d7e9217a1b10f351d
    Phase:              Succeeded
    Start:              2022-10-31 13:07:07 +0100 CET
    Finished:           2022-10-31 13:07:08 +0100 CET
    Duration:           1s
    Message:            successfully synced (all tasks run)
    
    GROUP                      KIND                 NAMESPACE            NAME                 STATUS  HEALTH  HOOK  MESSAGE
    supertubes.banzaicloud.io  ApplicationManifest  smm-registry-access  applicationmanifest  Synced                applicationmanifest.supertubes.banzaicloud.io/applicationmanifest configured
    
  5. Wait about 5 minutes, then check that the workloads in the supertubes-system, kafka, and zookeeper namespaces on workload-cluster-1 are healthy and running.

    kubectl get pods -n supertubes-system --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                                                      READY   STATUS    RESTARTS      AGE
    prometheus-operator-grafana-6c48dc7db9-qvz4r              4/4     Running   0             32m
    prometheus-operator-kube-state-metrics-6cfbf4ff4c-zkm7t   2/2     Running   2 (32m ago)   32m
    prometheus-operator-operator-785464cdb5-fpckj             2/2     Running   2 (32m ago)   32m
    prometheus-operator-prometheus-node-exporter-85f4h        1/1     Running   0             32m
    prometheus-operator-prometheus-node-exporter-bgnd2        1/1     Running   0             32m
    prometheus-operator-prometheus-node-exporter-dn22n        1/1     Running   0             32m
    prometheus-operator-prometheus-node-exporter-sk2nc        1/1     Running   0             32m
    prometheus-prometheus-operator-prometheus-0               3/3     Running   0             32m
    supertubes-6d4f68655b-jrgml                               3/3     Running   2 (31m ago)   31m
    supertubes-ui-backend-55976fc6fc-mc9t5                    2/2     Running   2 (31m ago)   31m
    
    kubectl get deployments -n kafka --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
    kafka-operator-operator   1/1     1            1           39m
    
    kubectl get pods -n zookeeper --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                                            READY   STATUS      RESTARTS      AGE
    zookeeper-operator-77df49fd-tv7dl               2/2     Running     1 (38m ago)   38m
    zookeeper-operator-post-install-upgrade-5vzkp   0/1     Completed   0             38m
    
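
For reference, with the values exported in step 1 above (ISSUER_SECRET_NAME="sdm-csr-operator-ca-certs", ISSUER_AUTOGENERATED="false"), the heredoc renders the csrOperator section of sdm-applicationmanifest.yaml as follows; note that valuesOverride is itself a YAML document embedded as a string:

```yaml
csrOperator:
  enabled: true
  namespace: csr-operator-system
  valuesOverride: |-
    image:
      repository: "registry.eticloud.io/csro/csr-operator"
    csroperator:
      config:
        privateCASigner:
          issuer:
            secretName: "sdm-csr-operator-ca-certs"
            autoGenerated: "false"
```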

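A note on the shell pattern used for the heredocs in this guide (VAR="value" ; cat > file <<EOF): the semicolon matters. Heredoc bodies are expanded by the shell itself, so the variable must be set in the current shell before cat runs; a plain prefix assignment (VAR="value" cat <<EOF) only lands in cat's environment, and the heredoc would expand an empty value. A minimal sketch with a hypothetical GREETING variable:

```shell
# Prefix assignment: GREETING goes into cat's environment only, but the
# heredoc is expanded by the shell, where GREETING is still unset.
unset GREETING
GREETING="hello" cat <<EOF
prefix assignment: ${GREETING:-(empty)}
EOF

# Separate statement, as in this guide: the assignment takes effect in
# the current shell before the heredoc is expanded.
GREETING="hello" ; cat <<EOF
separate statement: ${GREETING:-(empty)}
EOF
```

The first heredoc prints "(empty)", the second prints "hello".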
Deploy ZooKeeper cluster application

  1. Create the ZookeeperCluster CR in the manifests/sdm-zookeeper-cluster directory.

    mkdir -p manifests/sdm-zookeeper-cluster
    
    cat > manifests/sdm-zookeeper-cluster/sdm-zookeeper-cluster.yaml <<EOF
    # manifests/sdm-zookeeper-cluster/sdm-zookeeper-cluster.yaml
    apiVersion: zookeeper.pravega.io/v1beta1
    kind: ZookeeperCluster
    metadata:
      name: zookeeper
      namespace: zookeeper
    spec:
      replicas: 3
      persistence:
        reclaimPolicy: Delete
    EOF
    
  2. Create the sdm-zookeeper-cluster Application CR in the apps/sdm-zookeeper-cluster directory.

    mkdir -p apps/sdm-zookeeper-cluster
    
    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" ; cat > apps/sdm-zookeeper-cluster/sdm-zookeeper-cluster-app.yaml <<EOF
    # apps/sdm-zookeeper-cluster/sdm-zookeeper-cluster-app.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: sdm-zookeeper-cluster
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      source:
        repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
        targetRevision: HEAD
        path: manifests/sdm-zookeeper-cluster
      destination:
        name: ${ARGOCD_CLUSTER_NAME}
        namespace: zookeeper
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - Validate=false
        - CreateNamespace=true
        - PrunePropagationPolicy=foreground
        - PruneLast=true
    EOF
    
  3. Commit and push the calisti-gitops repository.

    git add apps/sdm-zookeeper-cluster manifests/sdm-zookeeper-cluster
    
    git commit -m "add sdm-zookeeper-cluster app"
    

    Expected output:

    [main d3e8b60] add sdm-zookeeper-cluster app
    2 files changed, 36 insertions(+)
    create mode 100644 apps/sdm-zookeeper-cluster/sdm-zookeeper-cluster-app.yaml
    create mode 100644 manifests/sdm-zookeeper-cluster/sdm-zookeeper-cluster.yaml
    
    git push
    

    Expected output:

    Enumerating objects: 11, done.
    Counting objects: 100% (11/11), done.
    Delta compression using up to 12 threads
    Compressing objects: 100% (8/8), done.
    Writing objects: 100% (8/8), 1.24 KiB | 1.24 MiB/s, done.
    Total 8 (delta 3), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
    To github.com:github-id/calisti-gitops.git
      18df6de..d3e8b60  main -> main
    
  4. Apply the sdm-zookeeper-cluster Application CR.

    kubectl apply -f apps/sdm-zookeeper-cluster/sdm-zookeeper-cluster-app.yaml
    

    Expected output:

    application.argoproj.io/sdm-zookeeper-cluster created
    
  5. Verify that the application has been added to Argo CD and is healthy.

    argocd app list
    

    Expected output:

    NAME                   CLUSTER             NAMESPACE  PROJECT  STATUS  HEALTH   SYNCPOLICY  CONDITIONS  REPO                                             PATH                                        TARGET
    ...
    sdm-zookeeper-cluster  workload-cluster-1  zookeeper  default  Synced  Healthy  Auto-Prune  <none>      https://github.com/github-id/calisti-gitops.git  manifests/sdm-zookeeper-cluster             HEAD
    ...
    
  6. Wait about 5 minutes, then check that all pods are healthy and running in the zookeeper namespace on workload-cluster-1.

    kubectl get pods --namespace zookeeper --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                                            READY   STATUS      RESTARTS      AGE
    zookeeper-0                                     2/2     Running     0             113s
    zookeeper-1                                     2/2     Running     0             79s
    zookeeper-2                                     2/2     Running     0             42s
    zookeeper-operator-5698f684bb-b7lf6             2/2     Running     2 (12m ago)   12m
    zookeeper-operator-post-install-upgrade-xbj8d   0/1     Completed   0             12m
    
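
The ZookeeperCluster above is named zookeeper, and the zookeeper-operator exposes a client service for it named <cluster-name>-client in the same namespace. The KafkaCluster CR in the next section points its zkAddresses field at that service; the address can be derived as:

```shell
# Client service address for the ZookeeperCluster named "zookeeper" in
# the "zookeeper" namespace, on the default client port 2181.
ZOOKEEPER_CLUSTER_NAME="zookeeper"
echo "${ZOOKEEPER_CLUSTER_NAME}-client.zookeeper:2181"
```

This prints zookeeper-client.zookeeper:2181, the value used in the .spec.zkAddresses field.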

Deploy the Kafka cluster application

  1. Create the manifests/sdm-kafka-cluster directory for the KafkaCluster CR.

    mkdir -p manifests/sdm-kafka-cluster
    
  2. Create your custom KafkaCluster CR in the manifests/sdm-kafka-cluster directory based on the samples and the Koperator documentation.

    Edit your KafkaCluster CR and set the .spec.zkAddresses field to $ZOOKEEPER_CLUSTER_NAME-client.zookeeper:2181

    ISTIO_MINOR_VERSION="1.15" BANZAI_KAFKA_VERSION="2.13-3.1.0" ZOOKEEPER_CLUSTER_NAME="zookeeper" ; cat > "manifests/sdm-kafka-cluster/sdm-kafka-cluster.yaml" <<EOF
    # manifests/sdm-kafka-cluster/sdm-kafka-cluster.yaml
    apiVersion: kafka.banzaicloud.io/v1beta1
    kind: KafkaCluster
    metadata:
      labels:
        controller-tools.k8s.io: "1.0"
      name: kafka
      namespace: kafka
    spec:
      headlessServiceEnabled: false
      ingressController: "istioingress"
      istioIngressConfig:
        gatewayConfig:
          mode: ISTIO_MUTUAL
      istioControlPlane:
        name: sdm-icp-v${ISTIO_MINOR_VERSION/.}x
        namespace: istio-system
      zkAddresses:
        - "${ZOOKEEPER_CLUSTER_NAME}-client.zookeeper:2181"
      # monitoringConfig:
      #   jmxImage: "ghcr.io/banzaicloud/jmx-javaagent:0.16.1"
      #   pathToJar: "/jmx_prometheus_javaagent.jar"
      #   kafkaJMXExporterConfig: |
      #     lowercaseOutputName: true
      #     lowercaseOutputLabelNames: true
      #     ssl: false
      #     whitelistObjectNames:
      #     - kafka.cluster:type=Partition,name=UnderMinIsr,*
      #     - kafka.cluster:type=Partition,name=UnderReplicated,*
      #     - kafka.controller:type=ControllerChannelManager,name=QueueSize,*
      #     - kafka.controller:type=ControllerChannelManager,name=TotalQueueSize
      #     - kafka.controller:type=ControllerStats,*
      #     - kafka.controller:type=KafkaController,*
      #     - kafka.log:type=Log,name=LogEndOffset,*
      #     - kafka.log:type=Log,name=LogStartOffset,*
      #     - kafka.log:type=Log,name=Size,*
      #     - kafka.log:type=LogManager,*
      #     - kafka.network:type=Processor,name=IdlePercent,*
      #     - kafka.network:type=RequestChannel,name=RequestQueueSize
      #     - kafka.network:type=RequestChannel,name=ResponseQueueSize,*
      #     - kafka.network:type=RequestMetrics,name=ErrorsPerSec,*
      #     - kafka.network:type=RequestMetrics,name=RequestsPerSec,*
      #     - kafka.network:type=RequestMetrics,name=LocalTimeMs,request=Produce
      #     - kafka.network:type=RequestMetrics,name=LocalTimeMs,request=FetchConsumer
      #     - kafka.network:type=RequestMetrics,name=LocalTimeMs,request=FetchFollower
      #     - kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=Produce
      #     - kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=FetchConsumer
      #     - kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=FetchFollower
      #     - kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request=Produce
      #     - kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request=FetchConsumer
      #     - kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request=FetchFollower
      #     - kafka.network:type=RequestMetrics,name=ResponseQueueTimeMs,request=Produce
      #     - kafka.network:type=RequestMetrics,name=ResponseQueueTimeMs,request=FetchConsumer
      #     - kafka.network:type=RequestMetrics,name=ResponseQueueTimeMs,request=FetchFollower
      #     - kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce
      #     - kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer
      #     - kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower
      #     - kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce
      #     - kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer
      #     - kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower
      #     - kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent
      #     - kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,*
      #     - kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,*
      #     - kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec
      #     - kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec
      #     - kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,*
      #     - kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesInPerSec
      #     - kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesOutPerSec
      #     - kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec
      #     - kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesOutPerSec
      #     - kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec,*
      #     - kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,*
      #     - kafka.server:type=BrokerTopicMetrics,name=FetchMessageConversionsPerSec
      #     - kafka.server:type=BrokerTopicMetrics,name=ProduceMessageConversionsPerSec
      #     - kafka.server:type=DelayedOperationPurgatory,*
      #     - kafka.server:type=FetcherLagMetrics,name=ConsumerLag,*
      #     - kafka.server:type=FetcherStats,name=BytesPerSec,*
      #     - kafka.server:type=KafkaRequestHandlerPool,*
      #     - kafka.server:type=KafkaServer,name=BrokerState
      #     - kafka.server:type=Fetch
      #     - kafka.server:type=Produce
      #     - kafka.server:type=Request
      #     - kafka.server:type=app-info,*
      #     - kafka.server:type=ReplicaFetcherManager,*
      #     - kafka.server:type=ReplicaManager,*
      #     - kafka.server:type=SessionExpireListener,*
      #     - kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs
      #     - kafka.server:type=socket-server-metrics,listener=*,*
      #     - java.lang:type=*
      #     - java.nio:*
      #     rules:
      #     - pattern: kafka.cluster<type=(Partition), name=(UnderMinIsr|UnderReplicated), topic=([-.\w]+), partition=(\d+)><>(Value)
      #       name: kafka_controller_$1_$2_$5
      #       type: GAUGE
      #       labels:
      #         topic: $3
      #         partition: $4
      #       cache: true
      #     - pattern: kafka.controller<type=(ControllerChannelManager), name=(QueueSize), broker-id=(\d+)><>(Value)
      #       name: kafka_controller_$1_$2_$4
      #       type: GAUGE
      #       labels:
      #         broker_id: $3
      #       cache: true
      #     - pattern: kafka.controller<type=(ControllerChannelManager), name=(TotalQueueSize)><>(Value)
      #       name: kafka_controller_$1_$2_$3
      #       type: GAUGE
      #       cache: true
      #     - pattern: kafka.controller<type=(ControllerStats), name=([-.\w]+)><>(Count)
      #       name: kafka_controller_$1_$2_$3
      #       type: COUNTER
      #       cache: true
      #     - pattern: kafka.controller<type=(KafkaController), name=([-.\w]+)><>(Value)
      #       name: kafka_controller_$1_$2_$3
      #       type: GAUGE
      #       cache: true
      #     - pattern: kafka.log<type=Log, name=(LogEndOffset|LogStartOffset|Size), topic=([-.\w]+), partition=(\d+)><>(Value)
      #       name: kafka_log_$1_$4
      #       type: GAUGE
      #       labels:
      #         topic: $2
      #         partition: $3
      #       cache: true
      #     - pattern: kafka.log<type=(LogManager), name=(LogDirectoryOffline), logDirectory="(.+)"><>(Value)
      #       name: kafka_log_$1_$2_$4
      #       labels:
      #         log_directory: $3
      #       type: GAUGE
      #       cache: true
      #     - pattern: kafka.log<type=(LogManager), name=(OfflineLogDirectoryCount)><>(Value)
      #       name: kafka_log_$1_$2_$3
      #       type: GAUGE
      #       cache: true
      #     - pattern : kafka.network<type=(Processor), name=(IdlePercent), networkProcessor=(\d+)><>(Value)
      #       name: kafka_network_$1_$2_$4
      #       labels:
      #         network_processor: $3
      #       type: GAUGE
      #       cache: true
      #     - pattern: kafka.network<type=(RequestChannel), name=(RequestQueueSize)><>(Value)
      #       name: kafka_network_$1_$2_$3
      #       type: GAUGE
      #       cache: true
      #     - pattern : kafka.network<type=(RequestChannel), name=(ResponseQueueSize), processor=(\d+)><>(Value)
      #       name: kafka_network_$1_$2_$4
      #       type: GAUGE
      #       labels:
      #         processor: $3
      #       cache: true
      #     - pattern : kafka.network<type=(RequestMetrics), name=(ErrorsPerSec), request=(\w+), error=(\w+)><>(Count)
      #       name: kafka_network_$1_$2_$5
      #       type: COUNTER
      #       labels:
      #         request: $3
      #         error: $4
      #       cache: true
      #     - pattern : kafka.network<type=(RequestMetrics), name=(RequestsPerSec), request=(\w+), version=(\d+)><>(Count)
      #       name: kafka_network_$1_$2_$5
      #       type: COUNTER
      #       labels:
      #         request: $3
      #         version: $4
      #       cache: true
      #     - pattern : kafka.network<type=(RequestMetrics), name=(\w+TimeMs), request=(\w+)><>(Count)
      #       name: kafka_network_$1_$2_$4
      #       type: COUNTER
      #       labels:
      #         request: $3
      #       cache: true
      #     - pattern : kafka.network<type=(SocketServer), name=(NetworkProcessorAvgIdlePercent)><>(Value)
      #       name: kafka_network_$1_$2_$3
      #       type: GAUGE
      #       cache: true
      #     - pattern: kafka.server<type=(BrokerTopicMetrics), name=(\w+PerSec), topic=([-.\w]+)><>(Count)
      #       name: kafka_server_$1_$2_$4
      #       type: COUNTER
      #       labels:
      #         topic: $3
      #       cache: true
      #     - pattern: kafka.server<type=(BrokerTopicMetrics), name=(\w+PerSec)><>(Count)
      #       name: kafka_server_$1_$2_total_$3
      #       type: COUNTER
      #       cache: true
      #     - pattern: kafka.server<type=(DelayedOperationPurgatory), name=(\w+), delayedOperation=(\w+)><>(Value)
      #       name: kafka_server_$1_$2_$4
      #       type: GAUGE
      #       labels:
      #         delayed_operation: $3
      #       cache: true
      #     - pattern: kafka.server<type=(FetcherLagMetrics), name=(ConsumerLag), clientId=([-.\w]+), topic=([-.\w]+), partition=(\d+)><>(Value)
      #       name: kafka_server_$1_$2_$6
      #       type: GAUGE
      #       labels:
      #         client_id: $3
      #         topic: $4
      #         partition: $5
      #       cache: true
      #     - pattern: kafka.server<type=(FetcherStats), name=(\w+PerSec), clientId=([-.\w]+), brokerHost=([-.\w]+), brokerPort=(\d+)><>(Count)
      #       name: kafka_server_$1_$2_$6
      #       type: COUNTER
      #       labels:
      #         client_id: $3
      #         broker_host: $4
      #         broker_port: $5
      #       cache: true
      #     - pattern: kafka.server<type=(KafkaRequestHandlerPool), name=(\w+)><>(Count)
      #       name: kafka_server_$1_$2_$3
      #       type: COUNTER
      #       cache: true
      #     - pattern: kafka.server<type=(KafkaRequestHandlerPool), name=(\w+)><>(OneMinuteRate)
      #       name: kafka_server_$1_$2_$3
      #       type: GAUGE
      #       cache: true
      #     - pattern: kafka.server<type=(KafkaServer), name=(\w+)><>(Value)
      #       name: kafka_server_$1_$2_$3
      #       type: GAUGE
      #       cache: true
      #     - pattern: kafka.server<type=(Fetch|Produce|Request)><>(queue-size)
      #       name: kafka_server_$1_$2
      #       type: GAUGE
      #       cache: true
      #     - pattern: kafka.server<type=(app-info), id=(\d+)><>(StartTimeMs)
      #       name: kafka_server_$1_$3
      #       type: COUNTER
      #       labels:
      #         broker_id: $2
      #       cache: true
      #     - pattern: 'kafka.server<type=(app-info), id=(\d+)><>(Version): ([-.~+\w\d]+)'
      #       name: kafka_server_$1_$3
      #       type: COUNTER
      #       labels:
      #         broker_id: $2
      #         version: $4
      #       value: 1.0
      #       cache: false
      #     - pattern: kafka.server<type=(ReplicaFetcherManager), name=(\w+), clientId=([-.\w]+)><>(Value)
      #       name: kafka_server_$1_$2_$4
      #       type: GAUGE
      #       labels:
      #         client_id: $3
      #       cache: true
      #     - pattern: kafka.server<type=(ReplicaManager), name=(\w+)><>(Value)
      #       name: kafka_server_$1_$2_$3
      #       type: GAUGE
      #       cache: true
      #     - pattern: kafka.server<type=(ReplicaManager), name=(\w+)><>(Count)
      #       name: kafka_server_$1_$2_$3
      #       type: COUNTER
      #       cache: true
      #     - pattern: kafka.server<type=(SessionExpireListener), name=(\w+)><>(Value)
      #       name: kafka_server_$1_$2_$3
      #       type: GAUGE
      #       cache: true
      #     - pattern: kafka.server<type=(SessionExpireListener), name=(\w+)><>(Count)
      #       name: kafka_server_$1_$2_$3
      #       type: COUNTER
      #       cache: true
      #     - pattern: kafka.server<type=(ZooKeeperClientMetrics), name=(\w+)><>(\d+)thPercentile
      #       name: kafka_server_$1_$2
      #       type: GAUGE
      #       labels:
      #         quantile: 0.$3
      #       cache: true
      #     - pattern: kafka.server<type=(ZooKeeperClientMetrics), name=(\w+)><>(Mean|Max|StdDev)
      #       name: kafka_server_$1_$2_$3
      #       type: GAUGE
      #       cache: true
      #     - pattern: kafka.server<type=(socket-server-metrics), listener=([-.\w]+), networkProcessor=(\d+)><>([-\w]+(?:-count|-rate))
      #       name: kafka_server_$1_$4
      #       type: GAUGE
      #       labels:
      #         listener: $2
      #         network_processor: $3
      #       cache: true
      #     - pattern: java.lang.+
      #       cache: true
      #     - pattern: java.nio.+
      #       cache: true
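      #     # Illustrative example (not part of the upstream config): the
      #     # BrokerTopicMetrics pattern above maps the JMX bean
      #     #   kafka.server<type=BrokerTopicMetrics, name=MessagesInPerSec, topic=orders><>Count
      #     # (with a hypothetical topic named "orders") to the Prometheus series
      #     #   kafka_server_BrokerTopicMetrics_MessagesInPerSec_Count{topic="orders"}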
      oneBrokerPerNode: false
      clusterImage: "ghcr.io/banzaicloud/kafka:${BANZAI_KAFKA_VERSION}"
      readOnlyConfig: |
        auto.create.topics.enable=false
        offsets.topic.replication.factor=2
        cruise.control.metrics.topic.auto.create=true
        cruise.control.metrics.topic.num.partitions=12
        cruise.control.metrics.topic.replication.factor=2
        cruise.control.metrics.topic.min.insync.replicas=1
        super.users=User:CN=kafka-default;User:CN=kafka-kafka-operator;User:CN=supertubes-system-supertubes;User:CN=supertubes-system-supertubes-ui
      brokerConfigGroups:
        default:
          serviceAccountName: default
          brokerAnnotations:
            prometheus.istio.io/merge-metrics: "false"
          storageConfigs:
            - mountPath: "/kafka-logs"
              pvcSpec:
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 10Gi
      brokers:
        - id: 0
          brokerConfigGroup: "default"
        - id: 1
          brokerConfigGroup: "default"
      rollingUpgradeConfig:
        failureThreshold: 1
      listenersConfig:
        internalListeners:
          - type: "plaintext"
            name: "internal"
            containerPort: 29092
            usedForInnerBrokerCommunication: true
          - type: "plaintext"
            name: "controller"
            containerPort: 29093
            usedForInnerBrokerCommunication: false
            usedForControllerCommunication: true
      cruiseControlConfig:
        serviceAccountName: default
        config: |
          metadata.max.age.ms=60000
          client.id=kafka-cruise-control
          send.buffer.bytes=131072
          receive.buffer.bytes=131072
          connections.max.idle.ms=540000
          reconnect.backoff.ms=50
          request.timeout.ms=30000
          logdir.response.timeout.ms=10000
          num.metric.fetchers=1
          metric.sampler.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler
          sampling.allow.cpu.capacity.estimation=true
          metric.reporter.topic=__CruiseControlMetrics
          sample.store.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.KafkaSampleStore
          partition.metric.sample.store.topic=__KafkaCruiseControlPartitionMetricSamples
          broker.metric.sample.store.topic=__KafkaCruiseControlModelTrainingSamples
          sample.store.topic.replication.factor=2
          num.sample.loading.threads=8
          metric.sampler.partition.assignor.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.DefaultMetricSamplerPartitionAssignor
          metric.sampling.interval.ms=120000
          partition.metrics.window.ms=300000
          num.partition.metrics.windows=12
          min.samples.per.partition.metrics.window=1
          broker.metrics.window.ms=300000
          num.broker.metrics.windows=20
          min.samples.per.broker.metrics.window=1
          capacity.config.file=config/capacity.json
          prometheus.server.endpoint=http://localhost:9090
          prometheus.query.resolution.step.ms=60000
          prometheus.query.supplier=com.linkedin.kafka.cruisecontrol.monitor.sampling.prometheus.DefaultPrometheusQuerySupplier
          default.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal
          goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerDiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.kafkaassigner.KafkaAssignerEvenRackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal
          intra.broker.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.IntraBrokerDiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.IntraBrokerDiskUsageDistributionGoal
          hard.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal
          min.valid.partition.ratio=0.95
          cpu.balance.threshold=1.1
          disk.balance.threshold=1.1
          network.inbound.balance.threshold=1.1
          network.outbound.balance.threshold=1.1
          replica.count.balance.threshold=1.1
          cpu.capacity.threshold=0.7
          disk.capacity.threshold=0.8
          network.inbound.capacity.threshold=0.8
          network.outbound.capacity.threshold=0.8
          cpu.low.utilization.threshold=0.15
          disk.low.utilization.threshold=0.15
          network.inbound.low.utilization.threshold=0.15
          network.outbound.low.utilization.threshold=0.15
          metric.anomaly.percentile.upper.threshold=90.0
          metric.anomaly.percentile.lower.threshold=10.0
          proposal.expiration.ms=60000
          max.replicas.per.broker=10000
          num.proposal.precompute.threads=1
          topics.excluded.from.partition.movement=
          leader.replica.count.balance.threshold=1.1
          topic.replica.count.balance.threshold=3.0
          goal.balancedness.priority.weight=1.1
          goal.balancedness.strictness.weight=1.5
          default.replica.movement.strategies=com.linkedin.kafka.cruisecontrol.executor.strategy.BaseReplicaMovementStrategy
          topic.replica.count.balance.min.gap=2
          topic.replica.count.balance.max.gap=40
          maintenance.event.class=com.linkedin.kafka.cruisecontrol.detector.MaintenanceEvent
          maintenance.event.reader.class=com.linkedin.kafka.cruisecontrol.detector.NoopMaintenanceEventReader
          maintenance.event.enable.idempotence=true
          maintenance.event.idempotence.retention.ms=180000
          maintenance.event.max.idempotence.cache.size=25
          maintenance.event.stop.ongoing.execution=true
          zookeeper.security.enabled=false
          num.concurrent.partition.movements.per.broker=10
          num.concurrent.intra.broker.partition.movements=2
          num.concurrent.leader.movements=1000
          execution.progress.check.interval.ms=10000
          anomaly.notifier.class=com.linkedin.kafka.cruisecontrol.detector.notifier.SelfHealingNotifier
          metric.anomaly.finder.class=com.linkedin.kafka.cruisecontrol.detector.KafkaMetricAnomalyFinder
          anomaly.detection.interval.ms=300000
          goal.violation.detection.interval.ms=300000
          metric.anomaly.detection.interval.ms=300000
          disk.failure.detection.interval.ms=300000
          topic.anomaly.detection.interval.ms=300000
          anomaly.detection.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal
          metric.anomaly.analyzer.metrics=BROKER_PRODUCE_LOCAL_TIME_MS_50TH,BROKER_PRODUCE_LOCAL_TIME_MS_999TH,BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_50TH,BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_999TH,BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_50TH,BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_999TH,BROKER_LOG_FLUSH_TIME_MS_50TH,BROKER_LOG_FLUSH_TIME_MS_999TH
          broker.failure.exclude.recently.demoted.brokers=true
          broker.failure.exclude.recently.removed.brokers=true
          goal.violation.exclude.recently.demoted.brokers=true
          goal.violation.exclude.recently.removed.brokers=true
          failed.brokers.zk.path=/CruiseControlBrokerList
          topic.config.provider.class=com.linkedin.kafka.cruisecontrol.config.KafkaTopicConfigProvider
          cluster.configs.file=config/clusterConfigs.json
          completed.kafka.monitor.user.task.retention.time.ms=86400000
          completed.cruise.control.monitor.user.task.retention.time.ms=86400000
          completed.kafka.admin.user.task.retention.time.ms=604800000
          completed.cruise.control.admin.user.task.retention.time.ms=604800000
          completed.user.task.retention.time.ms=86400000
          demotion.history.retention.time.ms=1209600000
          removal.history.retention.time.ms=1209600000
          max.cached.completed.kafka.monitor.user.tasks=20
          max.cached.completed.cruise.control.monitor.user.tasks=20
          max.cached.completed.kafka.admin.user.tasks=30
          max.cached.completed.cruise.control.admin.user.tasks=30
          max.cached.completed.user.tasks=500
          max.active.user.tasks=25
          self.healing.enabled=true
          self.healing.broker.failure.enabled=true
          self.healing.goal.violation.enabled=true
          self.healing.metric.anomaly.enabled=true
          self.healing.disk.failure.enabled=true
          self.healing.topic.anomaly.enabled=false
          self.healing.exclude.recently.demoted.brokers=true
          self.healing.exclude.recently.removed.brokers=true
          topic.anomaly.finder.class=com.linkedin.kafka.cruisecontrol.detector.NoopTopicAnomalyFinder
          self.healing.maintenance.event.enabled=true
          goal.violation.distribution.threshold.multiplier=1.0
          self.healing.target.topic.replication.factor=1
          topic.excluded.from.replication.factor.check=
          topic.replication.topic.anomaly.class=com.linkedin.kafka.cruisecontrol.detector.TopicReplicationFactorAnomaly
          topic.replication.factor.margin=1
          topic.min.isr.record.retention.time.ms=43200000
          webserver.http.port=9090
          webserver.http.address=0.0.0.0
          webserver.http.cors.enabled=false
          webserver.http.cors.origin=http://localhost:8080/
          webserver.http.cors.allowmethods=OPTIONS,GET,POST
          webserver.http.cors.exposeheaders=User-Task-ID
          webserver.api.urlprefix=/kafkacruisecontrol/*
          webserver.ui.diskpath=./cruise-control-ui/dist/
          webserver.ui.urlprefix=/*
          webserver.request.maxBlockTimeMs=10000
          webserver.session.maxExpiryTimeMs=60000
          webserver.session.path=/
          webserver.accesslog.enabled=true
          webserver.accesslog.path=access.log
          webserver.accesslog.retention.days=14
          two.step.verification.enabled=false
          two.step.purgatory.retention.time.ms=1209600000
          two.step.purgatory.max.requests=25
        clusterConfig: |
          {
            "min.insync.replicas": 1
          }
    EOF
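    Note that the heredoc above uses an unquoted EOF delimiter, so the shell expands variables such as ${BANZAI_KAFKA_VERSION} while writing the file. Make sure the variable is set beforehand, otherwise the clusterImage field ends up with an empty tag. A minimal sketch of the expansion (the version string is an illustrative value, not a recommendation):

    ```shell
    # Unquoted heredoc delimiters expand shell variables at generation time.
    # "3.1.0" is an illustrative value; use the Kafka version your Calisti
    # release documents.
    export BANZAI_KAFKA_VERSION="3.1.0"
    echo "clusterImage: \"ghcr.io/banzaicloud/kafka:${BANZAI_KAFKA_VERSION}\""
    # → clusterImage: "ghcr.io/banzaicloud/kafka:3.1.0"
    ```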
    
  3. Create the sdm-kafka-cluster Application CR in the apps/sdm-kafka-cluster directory.

    mkdir -p apps/sdm-kafka-cluster
    
    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" ; cat > apps/sdm-kafka-cluster/sdm-kafka-cluster-app.yaml <<EOF
    # apps/sdm-kafka-cluster/sdm-kafka-cluster-app.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: sdm-kafka-cluster
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      source:
        repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
        targetRevision: HEAD
        path: manifests/sdm-kafka-cluster
      destination:
        name: ${ARGOCD_CLUSTER_NAME}
        namespace: kafka
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - Validate=false
        - CreateNamespace=true
        - PrunePropagationPolicy=foreground
        - PruneLast=true
    EOF
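    Because both heredocs expand shell variables, a forgotten or empty variable silently produces a broken manifest. A quick way to catch this before committing is to scan the generated files for literal ${...} placeholders. A self-contained sketch of the check (demonstrated on a throwaway sample file; point the grep at apps/sdm-kafka-cluster and manifests/sdm-kafka-cluster in practice):

    ```shell
    # Scan a generated manifest for unexpanded "${...}" placeholders.
    # "$sample" is a temporary stand-in for the real manifest paths.
    sample=$(mktemp)
    printf 'destination:\n  name: workload-cluster-1\n' > "$sample"
    if grep -q '\${' "$sample"; then
      echo "unexpanded variables found"
    else
      echo "manifests fully expanded"
    fi
    # → manifests fully expanded
    rm -f "$sample"
    ```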
    
  4. Commit and push the changes to the calisti-gitops repository.

    git add apps/sdm-kafka-cluster manifests/sdm-kafka-cluster
    git commit -m "add sdm-kafka-cluster app"
    

    Expected output:

    [main 13402f4] add sdm-kafka-cluster app
    2 files changed, 712 insertions(+)
    create mode 100644 apps/sdm-kafka-cluster/sdm-kafka-cluster-app.yaml
    create mode 100644 manifests/sdm-kafka-cluster/sdm-kafka-cluster.yaml
    
    git push
    

    Expected output:

    Enumerating objects: 11, done.
    Counting objects: 100% (11/11), done.
    Delta compression using up to 12 threads
    Compressing objects: 100% (8/8), done.
    Writing objects: 100% (8/8), 8.93 KiB | 4.47 MiB/s, done.
    Total 8 (delta 3), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
    To github.com:github-id/calisti-gitops.git
      d3e8b60..13402f4  main -> main
    
  5. Apply the sdm-kafka-cluster Application CR.

    kubectl apply -f apps/sdm-kafka-cluster/sdm-kafka-cluster-app.yaml
    

    Expected output:

    application.argoproj.io/sdm-kafka-cluster created
    
  6. Verify that the application has been added to Argo CD and is healthy.

    argocd app list
    

    Expected output:

    NAME               CLUSTER             NAMESPACE  PROJECT  STATUS  HEALTH   SYNCPOLICY  CONDITIONS  REPO                                             PATH                                        TARGET
    ...
    sdm-kafka-cluster  workload-cluster-1  kafka      default  Synced  Healthy  Auto-Prune  <none>      https://github.com/github-id/calisti-gitops.git  manifests/sdm-kafka-cluster                 HEAD
    ...
    
  7. Wait about 5 minutes, then verify that all pods in the kafka namespace on workload-cluster-1 are healthy and running.

    kubectl get pods -n kafka --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                                       READY   STATUS    RESTARTS   AGE
    kafka-0-j5vq4                              2/2     Running   0          5d
    kafka-1-dv9p7                              2/2     Running   0          5d
    kafka-2-wx55p                              2/2     Running   0          5d
    ...
    kafka-cruisecontrol-775fd5f6fb-d5tml       2/2     Running   0          5d
    kafka-kminion-59cf847cdb-hc7zt             2/2     Running   3          5d
    kafka-operator-operator-6cb66c5dbd-7mwxv   3/3     Running   2          5d
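
    If you prefer a scripted check over reading the list by eye, you can filter the output for pods whose STATUS column is not Running. A sketch, demonstrated here against a captured sample (in practice, pipe the real output of kubectl get pods -n kafka through the same filter):

    ```shell
    # Count pods whose STATUS column is not "Running".
    # "sample" stands in for real "kubectl get pods -n kafka" output;
    # the pod names are illustrative.
    sample='NAME                READY   STATUS    RESTARTS   AGE
    kafka-0-j5vq4       2/2     Running   0          5d
    kafka-1-dv9p7       2/2     Running   0          5d'
    not_running=$(printf '%s\n' "$sample" | awk 'NR > 1 && $3 != "Running" { n++ } END { print n + 0 }')
    echo "pods not Running: $not_running"
    # → pods not Running: 0
    ```

    A result of 0 means every listed pod reports Running; any other value tells you how many pods still need attention.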