Alerting

Streaming Data Manager can use Prometheus Alertmanager for alerting. To enable Alertmanager in your Streaming Data Manager deployment, complete the following steps.

Prerequisites

Streaming Data Manager version 0.6.3 or later.

Steps

  1. Edit your ApplicationManifest custom resource. To download the ApplicationManifest custom resource that is currently used on your cluster, run:

    kubectl get applicationmanifests.supertubes.banzaicloud.io applicationmanifest -o yaml > appmanifest.yaml
    
  2. Update the monitoring.prometheusOperator.valuesOverride section of the custom resource to enable Prometheus Alertmanager. For example:

    prometheusOperator:
     ...
     valuesOverride: |-
       prometheus:
         prometheusSpec:
           alertingEndpoints:
             - namespace: supertubes-system
               name: prometheus-operator-alertmanager
               port: http-web
               pathPrefix: "/"
               apiVersion: v2
       defaultRules:
         rules:
           alertmanager: true
       alertmanager:
         enabled: true   
    

    Note: This configuration does not enable any Prometheus alerting rules. You must define alerting rules separately using PrometheusRule custom resources.

    For all the configuration settings that can be provided through the monitoring.prometheusOperator.valuesOverride field, see https://hub.helm.sh/charts/stable/prometheus-operator/8.15.8

  3. Reconcile the deployed Streaming Data Manager components, for example:

    supertubes reconcile --from-file appmanifest.yaml -c <path-to-kubeconfig-file>
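
After the reconcile completes, you may want to confirm that Alertmanager was actually deployed. The following is a minimal sketch, not part of the supertubes CLI: check_alertmanager is a hypothetical helper name, and it assumes that Alertmanager runs in the supertubes-system namespace behind the prometheus-operator-alertmanager service, as referenced in the alertingEndpoints settings of step 2.

```shell
# Hypothetical helper (not part of supertubes): lists the Alertmanager
# custom resources and the service that Prometheus sends alerts to.
# Assumes the namespace and service name from the alertingEndpoints example.
check_alertmanager() {
  ns="${1:-supertubes-system}"
  kubectl get alertmanagers.monitoring.coreos.com -n "$ns" &&
    kubectl get service prometheus-operator-alertmanager -n "$ns"
}
```

Run check_alertmanager, optionally passing a different namespace as the first argument. If the first command reports no resources or the second returns a NotFound error, the override was probably not applied.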
    
  4. Create and apply Prometheus alerting rules:

    kubectl apply -n kafka -f- <<'EOF'
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: kafka-rules
        banzaicloud.io/managed-by: supertubes
      name: kafka-prometheus-rules
    spec:
      # Alerting rules
    EOF
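
Once a PrometheusRule has been applied, you can check that the resource exists in the cluster. The following is a minimal sketch, assuming the kafka namespace and the kafka-prometheus-rules name used in this step; check_kafka_rules is a hypothetical helper name.

```shell
# Hypothetical helper: prints the PrometheusRule created in this step.
# The defaults mirror the namespace and resource name used in this guide.
check_kafka_rules() {
  ns="${1:-kafka}"
  name="${2:-kafka-prometheus-rules}"
  kubectl get prometheusrules.monitoring.coreos.com "$name" -n "$ns" -o yaml
}
```

Calling check_kafka_rules prints the stored resource so that you can compare its rule groups with what you applied. This only confirms that the resource exists; whether Prometheus loads it depends on the rule selector configured for the Prometheus instance.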
    

    The following example defines alerting rules for ZooKeeper. For details on alerting rules, see the official Prometheus documentation.

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: kafka-rules
        banzaicloud.io/managed-by: supertubes
      name: kafka-prometheus-rules
    spec:
      groups:
      - name: ZookeeperAlerts
        rules:
        - alert: QuorumDown
          expr: max by(namespace,label_app) (zookeeper_QuorumSize{kind="ZookeeperMember"}) < on(namespace,label_app) ceil((kube_statefulset_replicas * on(namespace,statefulset) group_left(label_app) kube_statefulset_labels) / 2)
          for: 5m
          labels:
            severity: critical
          annotations:
            message: The ZooKeeper cluster {{ $labels.namespace }}/{{ $labels.label_app }} is not operational because there are fewer ZooKeeper servers than needed to form a quorum
        - alert: MinQuorum
          expr: max by(namespace,label_app) (zookeeper_QuorumSize{kind="ZookeeperMember"}) > 1 and max by(namespace,label_app) (zookeeper_QuorumSize{kind="ZookeeperMember"}) == on(namespace,label_app) ceil((kube_statefulset_replicas * on(namespace,statefulset) group_left(label_app) kube_statefulset_labels) / 2)