Overview

Calisti’s Streaming Data Manager is the deployment tool for setting up and operating production-ready Apache Kafka clusters on Kubernetes, leveraging a Cloud Native technology stack. Streaming Data Manager includes ZooKeeper, Koperator, Envoy, and many other components hosted in a managed service mesh. All components are automatically installed, configured, and managed to operate a production-ready Kafka cluster on Kubernetes.

Key features

Some of the key features of Streaming Data Manager are:

  • Fine-grained broker configuration support for heterogeneous cluster layouts.
  • Declarative topic and user management through custom resources (CRs).
  • Automatic, mTLS-based encrypted and authenticated communication between all Streaming Data Manager components.
  • Advanced Grafana dashboards to monitor all Streaming Data Manager components.
  • Automatic reaction and self-healing based on Prometheus alerts.
  • Alert-based reactions for graceful upscaling and downscaling, and for adding volumes to brokers.
  • Disaster recovery using volume snapshots and cross-cluster replication using MirrorMaker2.
  • Data migration from Kafka environments.
  • Rolling upgrades for continuous operations.

Koperator (formerly called Banzai Cloud Kafka operator) is a core part of Calisti’s Streaming Data Manager that helps you create production-ready Apache Kafka clusters on Kubernetes, with scaling, rebalancing, and alert-based self-healing. While the Koperator itself is an open-source project, Streaming Data Manager extends the functionality of the Koperator with commercial features (for example, declarative ACL handling, built-in monitoring, and multiple ways of disaster recovery). Read a detailed comparison of Streaming Data Manager and the Koperator.

Architecture

Streaming Data Manager architecture

What makes Streaming Data Manager unique?

Streaming Data Manager is specifically built to run and manage Apache Kafka on Kubernetes. Other solutions use Kubernetes StatefulSets to run Apache Kafka, but this approach is a poor fit for Kafka. Streaming Data Manager is based on simple Kubernetes resources (Pods, ConfigMaps, and PersistentVolumeClaims), allowing a much more flexible approach that makes it possible to:

  • Modify the configuration of unique Brokers: you can configure every Broker individually.
  • Remove specific Brokers from clusters.
  • Use multiple Persistent Volumes for each Broker.
  • Do rolling upgrades without data loss or service disruption.
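
As a rough illustration of what individually configurable Brokers mean in practice, the following Python sketch merges a shared default with per-Broker overrides. The field names (`cpu`, `storage_gib`, `log_dirs`) and the `effective_broker_configs` helper are hypothetical and are not the Streaming Data Manager custom resource schema.

```python
from copy import deepcopy


def effective_broker_configs(defaults, overrides):
    """Return one config per broker id: shared defaults plus that broker's overrides."""
    configs = {}
    for broker_id, override in overrides.items():
        merged = deepcopy(defaults)  # never mutate the shared defaults
        merged.update(override)      # broker-specific values win
        configs[broker_id] = merged
    return configs


# Hypothetical shared defaults for every broker.
defaults = {"cpu": 2, "storage_gib": 100, "log_dirs": ["/kafka-logs"]}

# Hypothetical overrides: broker 1 gets more CPU, broker 2 a second volume.
overrides = {
    0: {},
    1: {"cpu": 4},
    2: {"log_dirs": ["/kafka-logs", "/kafka-logs-2"]},
}

configs = effective_broker_configs(defaults, overrides)
for broker_id, config in sorted(configs.items()):
    print(broker_id, config)
```

Removing a specific Broker in this model amounts to deleting its entry, which a replica-count-based approach cannot express.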

On-prem, multi-cloud, and hybrid-cloud support

Streaming Data Manager allows you to run large Apache Kafka clusters not only in the cloud, but also in on-premises, multi-cloud, and hybrid-cloud environments. Streaming Data Manager helps you run Kafka over Istio and automate the creation of Kafka clusters across single-cloud multi-AZ, multi-cloud, and especially hybrid-cloud environments.

Apache Kafka in on-prem, multi-cloud, and hybrid-cloud environments

Running Apache Kafka over Istio

Streaming Data Manager makes running Apache Kafka over Istio extremely easy, bringing additional security benefits, scalability and durability, locality based load balancing, and more.

Apache Kafka protocol support in Envoy

Envoy is a next-generation network proxy, built for the cloud native era. It supports a wide variety of application protocols, including the client protocol for Kafka. A network proxy that understands higher-level protocols brings significant benefits. In the case of Kafka, these include:

  • Out of the box tracing and monitoring within a Kafka mesh
  • Consumer group metrics
  • Information about apps and their version of the client libraries
  • Request validation
  • Protocol version translations
  • Automatic topic name conversions without having to modify the clients
  • Mirroring topics to other clusters (we run many hybrid Kubernetes clusters)
  • Functional parity across runtimes

External Access via LoadBalancer

Streaming Data Manager externalizes access to Apache Kafka using a single LoadBalancer, so there’s no need for a LoadBalancer for each Broker.

Kafka External
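
One common way to reach several Brokers through a single load balancer is to give each Broker its own port on the shared address. The sketch below assumes that scheme purely for illustration; the hostname, the starting port, and the `external_listeners` helper are hypothetical and do not describe how Streaming Data Manager assigns ports.

```python
def external_listeners(lb_host, starting_port, broker_ids):
    """Map each broker id to a distinct port on one shared load balancer address."""
    return {broker_id: f"{lb_host}:{starting_port + broker_id}" for broker_id in broker_ids}


# Hypothetical values: one load balancer, three brokers.
listeners = external_listeners("kafka.example.com", 19090, [0, 1, 2])

# Clients list the per-broker addresses as their bootstrap servers.
bootstrap_servers = ",".join(listeners[broker_id] for broker_id in sorted(listeners))
print(bootstrap_servers)
```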

Event-based scaling and self-healing

Streaming Data Manager exposes Cruise Control and Kafka JMX metrics to Prometheus, and acts as a Prometheus Alert Manager. It receives alerts defined in Prometheus, and creates actions based on Prometheus alert annotations, so it can handle and react to alerts automatically, without having to involve human operators.

Graceful Apache Kafka Cluster Scaling

To scale Kafka clusters both up and down gracefully, Streaming Data Manager integrates LinkedIn’s Cruise Control, and is configured to react to events. The three default actions are:

  • upscale cluster (add a new Broker)
  • downscale cluster (remove a Broker)
  • add additional disk to a Broker

You can also define custom actions.
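
The alert-to-action flow can be sketched as a small dispatcher. The payload shape follows the Prometheus Alertmanager webhook format (an `alerts` list whose items carry `status`, `labels`, and `annotations`). The `command` annotation key, the command names, the label names, and the handlers are assumptions made for this example, not the actual Streaming Data Manager configuration.

```python
def plan_actions(payload, handlers):
    """Return (command, labels) pairs for firing alerts that name a known command."""
    planned = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts need no reaction
        command = alert.get("annotations", {}).get("command")
        if command in handlers:
            planned.append((command, alert.get("labels", {})))
    return planned


# Hypothetical handlers for the three default actions; a custom action
# would simply be one more entry in this dictionary.
handlers = {
    "upscale": lambda labels: f"add a broker to {labels['kafka_cluster']}",
    "downscale": lambda labels: f"remove a broker from {labels['kafka_cluster']}",
    "add_disk": lambda labels: f"add a disk to broker {labels['broker_id']}",
}

# Hypothetical webhook payload with one actionable alert.
payload = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "BrokerDiskAlmostFull", "kafka_cluster": "kafka", "broker_id": "2"},
            "annotations": {"command": "add_disk"},
        },
        {
            "status": "resolved",
            "labels": {"alertname": "BrokerOverloaded", "kafka_cluster": "kafka"},
            "annotations": {"command": "upscale"},
        },
        {
            "status": "firing",
            "labels": {"alertname": "Informational"},
            "annotations": {"summary": "no command annotation"},
        },
    ],
}

results = [handlers[command](labels) for command, labels in plan_actions(payload, handlers)]
print(results)
```

Keeping the decision in the alert annotation means the reaction is declared next to the alert rule, so no human operator has to translate an alert into an action.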

Vertical capacity scaling

There are many situations in which the horizontal scaling of a cluster is impossible. When only one Broker is throttling and needs more CPU or requires additional disks (because it handles the most partitions), a StatefulSet-based solution is useless, since it does not distinguish between replicas' specifications. The handling of such a case requires unique Broker configurations. If we need to add a new disk to a unique Broker, with a StatefulSet-based solution, we waste a lot of disk space (and money). Since it can’t add a disk to a specific Broker, the StatefulSet adds one to each replica.

With Streaming Data Manager, adding a new disk to any Broker is as easy as changing a CR configuration. Similarly, any Broker-specific configuration can be done on a Broker by Broker basis.
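
The cost difference is simple arithmetic. The sketch below compares the storage provisioned when one Broker needs an extra disk; the cluster size and disk size are illustrative numbers, and `extra_storage_gib` is a hypothetical helper.

```python
def extra_storage_gib(brokers, disk_gib, brokers_needing_disk, per_broker):
    """Storage added when some brokers need one more disk of size disk_gib."""
    if per_broker:
        # Only the brokers that need a disk get one.
        return disk_gib * brokers_needing_disk
    # Identical replicas: every broker gets the disk.
    return disk_gib * brokers


# Illustrative numbers: 6 brokers, one of them needs an extra 500 GiB disk.
statefulset_style = extra_storage_gib(6, 500, 1, per_broker=False)
per_broker_style = extra_storage_gib(6, 500, 1, per_broker=True)
wasted = statefulset_style - per_broker_style

print(statefulset_style, per_broker_style, wasted)  # 3000 500 2500
```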

SSL-encrypted communication

Streaming Data Manager fully automates managed mutual TLS (mTLS) encryption and authentication. You don’t need to configure your brokers to use SSL, as Streaming Data Manager provides mTLS out of the box at the network layer (implemented through a lightweight, managed Istio mesh). All services deployed by Streaming Data Manager (Apache ZooKeeper, the Apache Kafka cluster, Cruise Control, MirrorMaker2, and so on) interact with each other using mTLS.

Kafka SSL

  • If the client application is included in the managed lightweight Istio mesh used by Streaming Data Manager, then the client application interacts with Kafka over mTLS out of the box. No SSL configuration is needed in the client application.
  • If the client application runs outside the managed mesh, the client application needs to be configured to use a client certificate that was signed by the same CA that the mesh uses. Such a client certificate can be obtained by following the steps in Client applications are in the same Kubernetes cluster as the Kafka cluster, but outside the Istio mesh.
  • If the client application runs outside the Kubernetes cluster that hosts the Kafka cluster, the scenario is essentially the same as when the client application runs outside the managed mesh. The only difference is that the client application must connect to the external endpoint of the Kafka cluster instead of the internal endpoint, which is only reachable from within the Kubernetes cluster. Detailed steps can be found in Client applications are outside the Kubernetes cluster.
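
The three cases can be summarized as a client configuration sketch. The property names are standard librdkafka settings (as used by clients such as confluent-kafka). The endpoints, certificate paths, location labels, and the `client_config` helper are hypothetical placeholders.

```python
def client_config(location, internal_endpoint, external_endpoint, cert_dir="/etc/kafka-client"):
    """Build a client configuration for one of the three client locations."""
    if location == "in_mesh":
        # The mesh handles mTLS, so the client needs no SSL settings.
        return {"bootstrap.servers": internal_endpoint}
    if location not in ("in_cluster_outside_mesh", "outside_cluster"):
        raise ValueError(f"unknown client location: {location}")
    # Outside the mesh the client presents a certificate signed by the mesh CA.
    # Only clients outside the Kubernetes cluster use the external endpoint.
    endpoint = internal_endpoint if location == "in_cluster_outside_mesh" else external_endpoint
    return {
        "bootstrap.servers": endpoint,
        "security.protocol": "SSL",
        "ssl.ca.location": f"{cert_dir}/ca.crt",
        "ssl.certificate.location": f"{cert_dir}/tls.crt",
        "ssl.key.location": f"{cert_dir}/tls.key",
    }


# Hypothetical endpoints for illustration only.
internal = "kafka-all-broker.kafka.svc:29092"
external = "kafka.example.com:19090"

for location in ("in_mesh", "in_cluster_outside_mesh", "outside_cluster"):
    print(location, client_config(location, internal, external))
```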

Disaster recovery

CSI

Kafka disaster recovery on Kubernetes with CSI allows you to back up and restore your clusters using Kubernetes volume snapshots. While this solution provides a workable disaster recovery option with very quick recovery, it doesn’t help when the entire Kubernetes cluster hosting the Kafka cluster is lost.

MM2

MirrorMaker2 leverages the Connect framework to replicate topics between Kafka clusters, and has the following benefits:

  • Both topics and consumer groups are replicated.
  • Topic configuration and ACLs are replicated.
  • Cross-cluster offsets are synchronized.
  • Partitioning is preserved.

Streaming Data Manager uses MirrorMaker2 to set up cross-cluster replication between remote Kafka clusters and to recover a lost Kafka cluster from a remote Kafka cluster. It deploys a MirrorMaker2 instance for each Kafka cluster into the same namespace where the Kafka cluster resides.

Cross-cluster replication

Read more about disaster recovery with Streaming Data Manager.

Data migration

Data migration leverages MirrorMaker2 to support moving Kafka topics and their data into Streaming Data Manager. This solution has the benefits of being easy to deploy and use and being usable in any existing Kafka environment, while having very few drawbacks.

Read more about data migration.

Kafka Connect and connector support

Streaming Data Manager deploys Kafka Connect with Confluent’s Community Connectors included, and supports the use of schemas via Schema Registry.

Streaming Data Manager automates the deployment of Kafka Connect clusters and the creation of connectors declaratively through KafkaConnect custom resource instances and KafkaConnector custom resource instances.

Streaming Data Manager Kafka connect structure

Read more about using Kafka Connect with Streaming Data Manager.

ksqlDB support

Streaming Data Manager supports integration with Confluent’s ksqlDB event streaming database.

ksqlDB can be optionally installed through the Streaming Data Manager CLI. The server automatically connects to the existing Kafka Connect and Schema Registry instances managed by Streaming Data Manager. You can create ksqlDB server deployments declaratively through ksqlDB custom resource instances.

Streaming Data Manager interactive ksqlDB

Read more about using ksqlDB with Streaming Data Manager.

Observability dashboard

Streaming Data Manager gives you an observability dashboard to help you monitor all the key metrics of your Apache Kafka cluster at a glance and in a single place. It indicates potential issues and what you should keep an eye on, so you can prepare for anomalies and then act accordingly. You can observe the core components of your cluster under the labels of Topics, Brokers, and Consumer Groups, and dig deeper into each of these for further information, such as the number of messages, partition sizes, low and high watermark offsets, in-sync replicas, and so on.

Streaming Data Manager Dashboard

Read more about the Streaming Data Manager dashboard.

Koperator and Streaming Data Manager

Our solution to run Apache Kafka on Kubernetes comes in the following flavours:

  • The Koperator is an open source project that delivers the basic functionality of our solution.
  • Calisti with Streaming Data Manager is a commercial product that includes all the features mentioned in this guide, commercial support, and optionally integration support.

Read the detailed comparison.

Next steps