Cisco Streaming Data Manager (Streaming Data Manager) is the deployment tool for setting up and operating production-ready Apache Kafka clusters on Kubernetes, leveraging a cloud-native technology stack. Streaming Data Manager includes ZooKeeper, Koperator, Envoy, and many other components hosted in a managed service mesh. All components are automatically installed, configured, and managed in order to operate a production-ready Kafka cluster on Kubernetes.
Some of the key features of Streaming Data Manager are:
- Fine-grained broker configuration support for heterogeneous cluster layouts.
- Declarative topic and user management through custom resources (CRs).
- Automatic, mTLS-based encrypted and authenticated communication between all Streaming Data Manager components.
- Advanced Grafana dashboards to monitor all Streaming Data Manager components.
- Automatic reaction and self-healing based on Prometheus alerts.
- Alert-based reactions for graceful upscaling and downscaling, and for adding volumes to brokers.
- Disaster recovery using volume snapshots and cross-cluster replication using MirrorMaker2.
- Rolling upgrades for continuous operations.
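To illustrate the declarative topic management listed above, here is a minimal Python sketch that builds a KafkaTopic custom resource as a plain dictionary and prints it as JSON, which kubectl can apply. The apiVersion, kind, and spec fields follow the KafkaTopic resource of the open-source Koperator; the cluster name kafka, the topic name, and the kafka_topic_manifest helper are hypothetical, and the schema shipped with your Streaming Data Manager release may differ.

```python
import json


def kafka_topic_manifest(name, cluster, partitions, replication_factor, config=None):
    """Build a KafkaTopic custom resource as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "kafka.banzaicloud.io/v1alpha1",  # assumed from Koperator
        "kind": "KafkaTopic",
        "metadata": {"name": name},
        "spec": {
            "clusterRef": {"name": cluster},  # the KafkaCluster that owns the topic
            "name": name,
            "partitions": partitions,
            "replicationFactor": replication_factor,
            "config": dict(config or {}),  # ordinary Kafka topic-level settings
        },
    }


# Hypothetical topic on a hypothetical cluster named "kafka".
topic = kafka_topic_manifest(
    "example-topic",
    "kafka",
    partitions=3,
    replication_factor=2,
    config={"retention.ms": "604800000", "cleanup.policy": "delete"},
)

if __name__ == "__main__":
    # JSON is a valid manifest format, so the output can be piped to kubectl apply -f -
    print(json.dumps(topic, indent=2))
```

Because the topic is just data, it can be kept in version control and reviewed like any other Kubernetes manifest.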
Koperator (formerly called Banzai Cloud Kafka operator) is a core part of Cisco Streaming Data Manager that helps you create production-ready Apache Kafka clusters on Kubernetes, with scaling, rebalancing, and alert-based self-healing. While Koperator itself is an open-source project, the Cisco Streaming Data Manager product extends its functionality with commercial features (for example, declarative ACL handling, built-in monitoring, and multiple ways of disaster recovery). Read a detailed comparison of Streaming Data Manager and the Koperator.
What makes Streaming Data Manager unique?
Cisco Streaming Data Manager is specifically built to run and manage Apache Kafka on Kubernetes. Other solutions use Kubernetes StatefulSets to run Apache Kafka, but this approach is not really suitable for Apache Kafka. Streaming Data Manager is based on simple Kubernetes resources (Pods, ConfigMaps, and PersistentVolumeClaims), allowing a much more flexible approach that makes it possible to:
- Modify the configuration of unique Brokers: you can configure every Broker individually.
- Remove specific Brokers from clusters.
- Use multiple Persistent Volumes for each Broker.
- Do rolling upgrades without data loss or service disruption.
On-prem, multi-cloud, and hybrid-cloud support
Streaming Data Manager allows you to run large Apache Kafka clusters not only in the cloud, but also in on-premises, multi-cloud, and hybrid-cloud environments. Streaming Data Manager helps you run Kafka over Istio, and automate the creation of Kafka clusters across single-cloud multi-AZ, multi-cloud, and especially hybrid-cloud environments.
Running Apache Kafka over Istio
Streaming Data Manager makes running Apache Kafka over Istio extremely easy, bringing additional security benefits, scalability and durability, locality based load balancing, and more.
Apache Kafka protocol support in Envoy
Envoy is a next-generation network proxy, built for the cloud-native era. It supports a wide variety of application protocols, including the client protocol for Kafka. A network proxy that understands higher-level protocols brings substantial benefits. In the case of Kafka, the list of benefits includes:
- Out of the box tracing and monitoring within a Kafka mesh
- Consumer group metrics
- Information about apps and their version of the client libraries
- Request validation
- Protocol version translations
- Automatic topic name conversions without having to modify the clients
- Mirroring topics to other clusters (we run many hybrid Kubernetes clusters)
- Functional parity across runtimes
External Access via LoadBalancer
Streaming Data Manager externalizes access to Apache Kafka using a single LoadBalancer, so there’s no need for a LoadBalancer for each Broker.
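One way a single LoadBalancer can front every Broker is to give each Broker its own port on the shared address. The sketch below assumes the convention used by the open-source Koperator, where a Broker's external port is the listener's starting port plus the Broker ID. The hostname, the starting port 19090, and the external_endpoints helper are illustrative, not values taken from this guide.

```python
def external_endpoints(lb_host, external_starting_port, broker_ids):
    """Map each Broker ID to host:port on one shared load balancer address.

    Assumption: external port = starting port + Broker ID.
    """
    return {
        broker_id: f"{lb_host}:{external_starting_port + broker_id}"
        for broker_id in broker_ids
    }


# Hypothetical load balancer hostname and starting port.
endpoints = external_endpoints("kafka.example.com", 19090, [0, 1, 2])

if __name__ == "__main__":
    for broker_id, endpoint in sorted(endpoints.items()):
        print(f"broker {broker_id} -> {endpoint}")
```

With this scheme, adding a Broker only adds a port on the existing address instead of provisioning another LoadBalancer.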
Event-based scaling and self-healing
Streaming Data Manager exposes Cruise Control and Kafka JMX metrics to Prometheus, and acts as a Prometheus Alert Manager. It receives alerts defined in Prometheus, and creates actions based on Prometheus alert annotations, so it can handle and react to alerts automatically, without having to involve human operators.
Graceful Apache Kafka Cluster Scaling
To scale Kafka clusters both up and down gracefully, Streaming Data Manager integrates LinkedIn’s Cruise Control, and is configured to react to events. The three default actions are:
- upscale cluster (add a new Broker)
- downscale cluster (remove a Broker)
- add additional disk to a Broker
You can also define custom actions.
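The alert-driven flow described above can be pictured as a small dispatcher that reads an annotation from each firing alert and picks an action. This is an illustrative sketch only, not the product's implementation: the command annotation and its values (upScale, downScale, addPvc) mirror the alert examples of the open-source Koperator and should be treated as assumptions, as should the alert names and the plan_actions helper.

```python
# Assumed mapping from an alert's "command" annotation to one of the
# three default actions listed above.
ACTIONS = {
    "upScale": "add a new Broker",
    "downScale": "remove a Broker",
    "addPvc": "add a disk to a Broker",
}


def plan_actions(alerts):
    """Return (alert name, action) pairs for alerts that carry a known command."""
    planned = []
    for alert in alerts:
        command = alert.get("annotations", {}).get("command")
        if command in ACTIONS:
            name = alert.get("labels", {}).get("alertname", "<unnamed>")
            planned.append((name, ACTIONS[command]))
    return planned


# Hypothetical alerts, shaped like the label/annotation pairs Prometheus emits.
alerts = [
    {"labels": {"alertname": "BrokerOverLoaded"}, "annotations": {"command": "upScale"}},
    {"labels": {"alertname": "PvcAlmostFull"}, "annotations": {"command": "addPvc"}},
    {"labels": {"alertname": "Informational"}, "annotations": {}},
]

if __name__ == "__main__":
    for name, action in plan_actions(alerts):
        print(f"{name}: {action}")
```

Alerts without a recognized command are ignored, which keeps purely informational alerts from triggering any change to the cluster.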
Vertical capacity scaling
There are many situations in which horizontal scaling of a cluster is not an option. When only one Broker is throttling and needs more CPU or additional disks (because it handles the most partitions), a StatefulSet-based solution does not help, since it cannot distinguish between the specifications of individual replicas. Handling such a case requires per-Broker configuration. With a StatefulSet-based solution, adding a disk to a single Broker wastes a lot of disk space (and money): because the StatefulSet can’t add a disk to one specific Broker, it adds one to every replica.
With Streaming Data Manager, adding a new disk to any Broker is as easy as changing a CR configuration. Similarly, any Broker-specific configuration can be done on a Broker by Broker basis.
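As a rough illustration of a per-Broker change, the Python sketch below adds one volume to a single Broker in a KafkaCluster resource represented as a dictionary. The field names (brokers, brokerConfig, storageConfigs, mountPath, pvcSpec) follow the KafkaCluster resource of the open-source Koperator; the cluster name, mount path, size, and the add_disk helper are hypothetical.

```python
import copy


def add_disk(cluster, broker_id, mount_path, size):
    """Return a copy of the KafkaCluster dict with one extra volume on one Broker."""
    updated = copy.deepcopy(cluster)  # leave the caller's manifest untouched
    for broker in updated["spec"]["brokers"]:
        if broker["id"] == broker_id:
            storage = broker.setdefault("brokerConfig", {}).setdefault("storageConfigs", [])
            storage.append({
                "mountPath": mount_path,
                "pvcSpec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": size}},
                },
            })
            return updated
    raise ValueError(f"no Broker with id {broker_id}")


# Hypothetical three-Broker cluster; only Broker 1 needs more disk.
cluster = {
    "apiVersion": "kafka.banzaicloud.io/v1beta1",  # assumed from Koperator
    "kind": "KafkaCluster",
    "metadata": {"name": "kafka"},
    "spec": {"brokers": [{"id": 0}, {"id": 1}, {"id": 2}]},
}
updated = add_disk(cluster, 1, "/kafka-logs-extra", "100Gi")

if __name__ == "__main__":
    for broker in updated["spec"]["brokers"]:
        volumes = broker.get("brokerConfig", {}).get("storageConfigs", [])
        print(f"broker {broker['id']}: {len(volumes)} extra volume(s)")
```

Only the entry for Broker 1 changes; the other Brokers keep their original specification, which is the contrast with a StatefulSet drawn above.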
Streaming Data Manager fully automates managed mutual TLS (mTLS) encryption and authentication. You don’t need to configure your brokers to use SSL, as Streaming Data Manager provides mTLS out of the box at the network layer (implemented through a lightweight, managed Istio mesh). All services deployed by Streaming Data Manager (Apache ZooKeeper, the Apache Kafka cluster, Cruise Control, MirrorMaker2, and so on) interact with each other using mTLS.
- If the client application is included in the managed lightweight Istio mesh used by Streaming Data Manager, then the client application interacts with Kafka over mTLS out of the box, and no SSL configuration is needed in the client application.
- If the client application runs outside the managed mesh, the client application needs to be configured to use a client certificate that was signed by the same CA that the mesh uses. Such a client certificate can be issued by running the smm sdm istio certificate generate-client-certificate command. Note that this scenario needs some additional configuration that is use-case specific.
- If the client application runs outside the Kubernetes cluster that hosts the Kafka cluster, the scenario is essentially the same as when the client application runs outside the managed mesh. The only difference is that the client application must connect to the external endpoint of the Kafka cluster instead of the internal endpoint which is only reachable from within the Kubernetes cluster.
Kafka disaster recovery on Kubernetes with CSI allows you to back up and restore your clusters using Kubernetes volume snapshots. While this solution provides a good enough disaster recovery option (and a very quick recovery), it doesn’t help when the entire Kubernetes cluster hosting the Kafka cluster is lost.
MirrorMaker2 leverages the Connect framework to replicate topics between Kafka clusters, and has the following benefits:
- Both topics and consumer groups are replicated.
- Topic configuration and ACLs are replicated.
- Cross-cluster offsets are synchronized.
- Partitioning is preserved.
Streaming Data Manager uses MirrorMaker2 to set up cross-cluster replication between remote Kafka clusters and to recover a lost Kafka cluster from a remote Kafka cluster. It deploys a MirrorMaker2 instance for each Kafka cluster into the same namespace where the Kafka cluster resides.
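For orientation, a MirrorMaker2 replication flow can be described by a handful of properties. The sketch below renders a minimal one-way configuration using property names from upstream Apache Kafka MirrorMaker2. The cluster aliases, bootstrap addresses, and the mm2_properties helper are hypothetical, and Streaming Data Manager generates its own configuration, which may look different.

```python
def mm2_properties(source, target, bootstrap, topics=".*"):
    """Render a minimal one-way MirrorMaker2 configuration as properties text."""
    flow = f"{source}->{target}"
    lines = [
        f"clusters = {source}, {target}",
        f"{source}.bootstrap.servers = {bootstrap[source]}",
        f"{target}.bootstrap.servers = {bootstrap[target]}",
        f"{flow}.enabled = true",              # replicate from source to target
        f"{flow}.topics = {topics}",           # regex of topics to replicate
        f"{flow}.groups = .*",                 # consumer groups to replicate
        "sync.topic.configs.enabled = true",   # copy topic configuration
        "sync.topic.acls.enabled = true",      # copy topic ACLs
    ]
    return "\n".join(lines)


# Hypothetical aliases and addresses for a primary and a backup cluster.
props = mm2_properties(
    "primary",
    "backup",
    {"primary": "kafka-primary.example.com:9092", "backup": "kafka-backup.example.com:9092"},
)

if __name__ == "__main__":
    print(props)
```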
Kafka Connect and connector support
Streaming Data Manager deploys Kafka Connect with Confluent’s Community Connectors included, and supports the use of schemas via Schema Registry.
Streaming Data Manager automates the deployment of Kafka Connect clusters and the creation of connectors declaratively through KafkaConnect custom resource instances and KafkaConnector custom resource instances.
Streaming Data Manager supports integration with Confluent’s ksqlDB event streaming database.
ksqlDB can be optionally installed through the Streaming Data Manager CLI. The server automatically connects to the existing Kafka Connect and Schema Registry instances managed by Streaming Data Manager. You can create ksqlDB server deployments declaratively through ksqlDB custom resource instances.
Streaming Data Manager gives you an observability dashboard to help you monitor all the key metrics of your Apache Kafka cluster at a glance and in a single place. It indicates potential issues and what you should keep an eye on, so you can prepare for anomalies and then act accordingly. You can observe the core components of your cluster under the labels of Topics, Brokers, and Consumer Groups, and dig deeper into each of these for further information, such as the number of messages, partition sizes, low and high watermark offsets, in-sync replicas, and so on.
Koperator and Streaming Data Manager
Our solution to run Apache Kafka on Kubernetes comes in the following flavours:
- The Koperator is an open source project that delivers the basic functionality of our solution.
- Cisco Streaming Data Manager is a commercial product that includes all the features mentioned in this guide, commercial support, and optionally integration support.