Streaming Data Manager automates the deployment of the ksqlDB event streaming database by introducing a new custom resource called KsqlDB. Streaming Data Manager provides two modes to manage ksqlDB instances: imperative, using the Streaming Data Manager CLI, and declarative, using the KsqlDB custom resource.
Both methods use the KsqlDB Custom Resource Definition under the hood to manage ksqlDB instances.
Imperative management of ksqlDB instances
The Streaming Data Manager CLI provides commands to deploy ksqlDB instances with either default or custom settings with ease.
Note: To deploy ksqlDB instances or manage existing ones, run the smm sdm cluster ksql create and smm sdm cluster ksql update commands.
Declarative management of ksqlDB instances
Managing ksqlDB instances with Streaming Data Manager is as simple as creating and updating the KsqlDB custom resource. Streaming Data Manager automatically monitors the ksqlDB deployment and configuration settings specified using the KsqlDB custom resource. For details on the custom resource, see the description of the custom resource.
Streaming Data Manager then performs the necessary steps to spin up new ksqlDB instances or reconfigure existing ones with the desired configuration.
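As an illustration of the declarative workflow, the following is a minimal sketch of a KsqlDB resource that changes the heap and resource settings. The field names come from the custom resource reference later in this document. The values are illustrative assumptions, not recommendations, and the resources block is assumed to follow the standard Kubernetes requests/limits layout.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  # Raised from the documented default of -Xms512M -Xmx2G (illustrative value)
  heapOpts: -Xms1G -Xmx3G
  # Assumed to follow the usual Kubernetes requests/limits structure
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 2
      memory: 4Gi
```

Applying the edited manifest again, for example with kubectl apply -f, should be enough for the change to be picked up and the instance reconfigured.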
Introduction to ksqlDB
For a detailed description of how to manage ksqlDB with Streaming Data Manager, see our Managing ksqlDB with Streaming Data Manager blog post.
Modes of operation
The ksqlDB server has two modes of operation: interactive and non-interactive (or headless) mode. For details, see the official ksqlDB documentation.
Streaming Data Manager supports both modes, and uses the interactive mode by default. To enable and configure ksqlDB in headless mode, see Running ksqlDB in headless mode.
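As a rough sketch of what a headless setup might look like, the manifests below use the headless and queryConfigMapName fields and the queries.sql key described in the custom resource reference later in this document. The query itself, the stream and topic names, and placing the ConfigMap in the same namespace as the KsqlDB resource are assumptions made for illustration.

```yaml
# ConfigMap holding the queries; the key is named queries.sql,
# as the custom resource reference requires.
apiVersion: v1
kind: ConfigMap
metadata:
  # Follows the documented default name: <ksqldb cr name>-ksql-queries-configmap
  name: ksqldb-sample-ksql-queries-configmap
  # Assumption: same namespace as the KsqlDB resource
  namespace: kafka
data:
  queries.sql: |
    -- Hypothetical query; the stream and topic names are placeholders.
    CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
      WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
---
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  # Switch from the default interactive mode to headless mode
  headless: true
  # Optional in this sketch, because it equals the default name
  queryConfigMapName: ksqldb-sample-ksql-queries-configmap
```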
Scaling by HPA
Streaming Data Manager takes care of scaling ksqlDB using a Horizontal Pod Autoscaler (HPA). The twist here is that by default, HPAs only support scaling through basic CPU or memory usage. While that’s generally enough for most workloads, in the case of ksqlDB it’s much better to scale by consumer lag.
When ksqlDB cannot keep up with the rate of messages produced on your Kafka topics, it can fall behind in its processing of incoming data. Scaling by consumer lag helps solve this issue far better than scaling by any traditional metric. In the Streaming Data Manager ecosystem, we already track consumer lag in our Prometheus instance.
To enable the HPA to understand the consumer lag metrics, deploy the kube-metrics-adapter Helm chart. An already deployed and configured HPA will do the rest for you.
# Default HPA configuration
scaling:
  prometheusUrl: http://prometheus-operator-prometheus.supertubes-system.svc:9090
  # Name of the ksqlDB streams that the PrometheusMetric will be filtered by
  streams:
  # Minimum number of replicas
  minValue: 1
  # Maximum number of replicas
  maxValue: 5
  # Threshold for the hpa to activate
  threshold: 30
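To adjust these defaults for a particular instance, the scaling block can presumably be set in the KsqlDB custom resource, which lists scaling among its spec fields. The following is a minimal sketch, assuming the field names of the default configuration carry over unchanged. The values are illustrative only.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  scaling:
    # Illustrative values, not recommendations
    minValue: 2
    maxValue: 10
    # Assumption: the threshold is compared against the tracked consumer lag
    threshold: 100
```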
Streaming Data Manager security features (like Kafka ACLs) apply to the ksqlDB deployment as well. The following sections detail the additional options that allow you to configure security for ksqlDB.
You can configure the authorization policy through the authorizations field of the KsqlDB custom resource. Only the listed principals can access the ksqlDB server.
You can list an arbitrary number of ServiceAccount entities in the specification.
Example authorization settings
Here’s an example authorization spec that allows traffic to the ksqlDB server for the user-1 user and the default service account.
Authorizations:
  - Principal:
      Kind: KafkaUser
      Namespace: kafka
      Name: user-1
  - Principal:
      Kind: ServiceAccount
      Namespace: kafka
      Name: default
Access ksqlDB from outside the service mesh
To access ksqlDB from a CLI instance outside the service mesh, you have to configure the certificates manually.
- Extract the certificates from Istio as described in Client applications outside the Istio mesh.
- Use those certificates to configure the CLI as described in the ksqlDB documentation.
Access ksqlDB from outside the Kubernetes cluster
To access ksqlDB from a client application outside the Kubernetes cluster, do the following:
- Configure the KsqlDB custom resource with the following fields: set externalEndpoint.enabled to true and set the istioControlPlane field. The istioControlPlane field is a reference to the IstioControlPlane custom resource and is mandatory from SDM version 1.7.0, because SDM 1.7.0 and later use Istio operator v2, which supports multiple Istio control planes on the same cluster. This is why you must specify the control plane that corresponds to the Istio ingress gateway.
spec:
  ...
  externalEndpoint:
    enabled: true
    istioIngressConfig:
      ...
      istioControlPlane:
        name: <name of the IstioControlPlane custom resource>
        namespace: <namespace of the IstioControlPlane custom resource>
  ...
- (Optional) You can set MTLS to false if you don’t need client authentication and secure communication between the client application and the external endpoint.
spec:
  ...
  MTLS: false
  ...
Streaming Data Manager manages ACLs for ksqlDB and also lets you fine-tune your configuration through the KsqlDB Custom Resource Definition. For example:
...
spec:
  # Input topics to be used in ksql queries for reading
  inputTopics:
  # Output topics to be used in ksql queries for write and create
  outputTopics:
...
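The fragment above leaves both fields empty. As a sketch of how they might be filled in, assuming each field takes a plain list of topic names (the value shape is not shown in this document, and the topic names are hypothetical):

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  # Assumption: a list of topic names that the queries read from
  inputTopics:
    - pageviews
  # Assumption: a list of topic names that the queries write to or create
  outputTopics:
    - pageviews-enriched
```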
The KsqlDB custom resource definition
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  # Name of the KafkaCluster custom resource that represents the Kafka cluster this ksqlDB instance to connect to
  clusterRef:
    name: kafka
  # Name of the SchemaRegistry custom resource that represents the Schema registry to be made available for ksqlDB
  schemaRegistryRef:
  # Name of the KafkaConnect custom resource that represents the Kafka Connect to be made available for ksqlDB
  kafkaConnectRef:
  # Controls whether mTLS is enforced between ksqlDB and client applications (default: true)
  MTLS: true
  # Affinity settings for ksqlDB pods
  # see https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity
  affinity:
  # Controls the list of principals who are authorized to access the ksqlDB REST API
  authorizations:
  # Settings for exposing ksqlDB REST API outside the Kubernetes cluster when running in interactive mode
  externalEndpoint:
    # Controls whether to expose KsqlDB outside the Kubernetes cluster (default: false)
    enabled:
    # Configuration for the ingress controller when KsqlDB running in interactive mode is exposed outside the Kubernetes cluster
    istioIngressConfig:
      # IstioControlPlane specifies the namespace and name of the IstioControlPlane custom resource
      # which represents the Istio control plane. Starting from SDM 1.7.0 this field is required if KsqlDB is exposed
      # outside of the Kubernetes cluster.
      istioControlPlane:
        name:
        namespace:
  # Controls whether the ksqlDB is running in headless or interactive mode (default: false)
  headless: false
  # Heap settings for ksqlDB (default: -Xms512M -Xmx2G)
  heapOpts: -Xms512M -Xmx2G
  image:
  # PullPolicy describes a policy for if/when to pull a container image
  imagePullPolicy:
  imagePullSecrets:
  # Input topics to be used in ksql queries for reading
  inputTopics:
  # Output topics to be used in ksql queries for write and create
  outputTopics:
  # JmxExporterSpec defines the configuration for jmx exporter
  jmxExporter:
  # Defines the config values for ksqlDB
  ksqlDBConfig:
  # Node selector setting for ksqlDB pods
  # https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
  nodeSelector:
  # Annotations to be applied to ksqlDB pod
  podAnnotations:
  # Labels to be applied to ksqlDB pod
  podLabels:
  # Controls the name of the configmap which contains the ksqldb queries executed in headless mode. (default: <ksqldb cr name>-ksql-queries-configmap)
  # Inside the configmap the query should be named as `queries.sql`
  queryConfigMapName:
  # Resources describes the compute resource requirements
  # default:
  #   requests:
  #     cpu: 1
  #     memory: 1.5Gi
  #   limits:
  #     cpu: 2
  #     memory: 2.5Gi
  resources:
  # Defines HPA configurations
  scaling:
  # Service account for ksqlDB pod
  serviceAccountName:
  # Annotations to be applied on the service that exposes ksqlDB API on port `ServicePort`
  serviceAnnotations:
  # Labels to be applied to the service that exposes ksqlDB API on port `ServicePort`
  serviceLabels:
  # The port ksqlDB listens for REST API requests
  servicePort:
  # Toleration settings for ksqlDB pods
  # see (https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/)
  tolerations:
  # Volume mounts for ksqlDB pods
  # see (https://kubernetes.io/docs/concepts/storage/volumes/)
  volumeMounts:
  # Volumes for ksqlDB pods
  # see (https://kubernetes.io/docs/concepts/storage/volumes/)
  volumes:
The following ksqlDB configurations are computed and maintained by Streaming Data Manager, and cannot be overridden:
- ksql.schema.registry.url (if schemaRegistryRef is set)
- ksql.connect.url (if kafkaConnectRef is set)
The default KsqlDB custom resource
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
spec:
  clusterRef:
    name: "kafka"