Using ksqlDB
Streaming Data Manager automates the deployment of the ksqlDB event streaming database by introducing a new custom resource called KsqlDB. Streaming Data Manager provides two modes to manage ksqlDB instances: imperative, using the Streaming Data Manager CLI, and declarative, using the KsqlDB custom resource.
Both methods use the KsqlDB Custom Resource Definition under the hood to manage ksqlDB instances.
Imperative management of ksqlDB instances
The Streaming Data Manager CLI provides commands to deploy ksqlDB instances with either default or custom settings with ease.
Note: To deploy ksqlDB instances or manage existing ones, run the smm sdm cluster ksql create and smm sdm cluster ksql update commands.
Declarative management of ksqlDB instances
Managing ksqlDB instances with Streaming Data Manager is as simple as creating and updating the KsqlDB custom resource. Streaming Data Manager automatically monitors the ksqlDB deployment and the configuration settings specified in the KsqlDB custom resource. For details on the custom resource, see The KsqlDB custom resource definition.
When you create or update the custom resource, Streaming Data Manager performs the necessary steps to spin up new ksqlDB instances or reconfigure existing ones with the desired configuration.
Introduction to ksqlDB
For a detailed description of how to manage ksqlDB with Streaming Data Manager, see our Managing ksqlDB with Streaming Data Manager blog post.
Modes of operation
The ksqlDB server has two modes of operation: interactive and non-interactive (or headless) mode. For details, see the official ksqlDB documentation.
Streaming Data Manager supports both modes, and uses the interactive mode by default. To enable and configure ksqlDB in headless mode, see Running ksqlDB in headless mode.
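As a minimal sketch of what a headless setup could look like, the following manifests combine the headless and queryConfigMapName fields of the KsqlDB custom resource with a ConfigMap that stores the queries under the queries.sql key. The stream, topic, and column names are hypothetical, and the ConfigMap name simply follows the documented default pattern (<ksqldb cr name>-ksql-queries-configmap). Treat it as an illustration under these assumptions, not a verified configuration.

```yaml
# Hypothetical ConfigMap holding the queries executed in headless mode.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ksqldb-sample-ksql-queries-configmap
  namespace: kafka
data:
  # The KsqlDB resource expects the queries under the queries.sql key
  queries.sql: |
    CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
      WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
    CREATE STREAM pageviews_home
      WITH (KAFKA_TOPIC='pageviews-home') AS
      SELECT user_id, page FROM pageviews WHERE page = '/home';
---
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  # Run the server in headless (non-interactive) mode
  headless: true
  # Optional here, because this name matches the default pattern;
  # set explicitly only to make the link to the ConfigMap visible
  queryConfigMapName: ksqldb-sample-ksql-queries-configmap
```

In headless mode the server runs only the queries from this file, so changing them means updating the ConfigMap rather than issuing statements through the REST API.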
Scaling by HPA
Streaming Data Manager takes care of scaling ksqlDB using a Horizontal Pod Autoscaler (HPA). By default, HPAs only support scaling based on basic CPU or memory usage. While that is enough for most workloads, ksqlDB scales much better by consumer lag.
When ksqlDB cannot keep up with the rate of messages produced on your Kafka topics, it can fall behind in its processing of incoming data. Scaling by consumer lag helps solve this issue far better than scaling by any traditional metric. In the Streaming Data Manager ecosystem, we already track consumer lag in our Prometheus instance.
To enable the HPA to understand the consumer lag metrics, deploy the kube-metrics-adapter Helm chart. The HPA that Streaming Data Manager has already deployed and configured does the rest for you.
# Default HPA configuration
scaling:
  prometheusUrl: http://prometheus-operator-prometheus.supertubes-system.svc:9090
  # Name of the ksqlDB streams that the PrometheusMetric will be filtered by
  streams: []
  # Minimum number of replicas
  minValue: 1
  # Maximum number of replicas
  maxValue: 5
  # Threshold for the hpa to activate
  threshold: 30
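To override these defaults, you can set the same fields under scaling in the spec of the KsqlDB custom resource. The following is a sketch only: the stream name is hypothetical, and the replica counts and threshold are arbitrary values chosen for illustration, not tuning recommendations.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  scaling:
    prometheusUrl: http://prometheus-operator-prometheus.supertubes-system.svc:9090
    # Hypothetical stream name; the PrometheusMetric is filtered by the listed streams
    streams:
      - PAGEVIEWS_HOME
    # Keep at least 2 replicas and never scale beyond 8
    minValue: 2
    maxValue: 8
    # Threshold for the HPA to activate
    threshold: 100
```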
Security
Streaming Data Manager security features (like Kafka ACLs) apply to the ksqlDB deployment as well. The following sections detail the additional options that allow you to configure security for ksqlDB.
Authorization
You can configure the authorization policy through the authorizations field of the KsqlDB custom resource. Only the listed principals can access the ksqlDB server.
You can list an arbitrary number of KafkaUser or ServiceAccount entities in the specification.
Example authorization settings
Here’s an example authorization spec that allows traffic to the ksqlDB server for the user-1 user and the default service account.
Authorizations:
  - Principal:
      Kind: KafkaUser
      Namespace: kafka
      Name: user-1
  - Principal:
      Kind: ServiceAccount
      Namespace: kafka
      Name: default
Access ksqlDB from outside the service mesh
To access ksqlDB from a CLI instance outside the service mesh, you have to configure the certificates manually.
- Extract the certificates from Istio as described in Client applications outside the Istio mesh.
- Use those certificates to configure the CLI as described in the ksqlDB documentation.
Access ksqlDB from outside the Kubernetes cluster
To access ksqlDB from a client application outside the Kubernetes cluster, do the following:
- Configure the externalEndpoint in the KsqlDB custom resource with the following fields: set the enabled field to true, and set the name and namespace of the istioControlPlane field. The istioControlPlane field is a reference to the IstioControlPlane custom resource. It is mandatory starting from SDM version 1.7.0, because SDM 1.7.0 and later use Istio operator v2, which supports multiple Istio control planes on the same cluster, so you must specify the control plane that belongs to the Istio ingress gateway.

    spec:
      ...
      externalEndpoint:
        enabled: true
        istioIngressConfig:
          ...
        istioControlPlane:
          name: <name of the IstioControlPlane custom resource>
          namespace: <namespace of the IstioControlPlane custom resource>
      ...

- (Optional) Set MTLS to false if you don’t need client authentication and secure communication between the client application and the external endpoint.

    spec:
      ...
      MTLS: false
      ...
Access control
Streaming Data Manager manages ACLs for ksqlDB and lets you fine-tune the configuration through the KsqlDB Custom Resource Definition. For example:
...
spec:
  # Input topics to be used in ksql queries for reading
  inputTopics: []
  # Output topics to be used in ksql queries for write and create
  outputTopics: []
...
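For instance, a ksqlDB instance whose queries read from one topic and write to another could declare them as follows. This is an illustrative sketch: the topic names are hypothetical, and it assumes both fields take a plain list of topic names, as the empty-list defaults above suggest.

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  clusterRef:
    name: kafka
  # Topics the ksql queries read from (hypothetical name)
  inputTopics:
    - pageviews
  # Topics the ksql queries write to or create (hypothetical name)
  outputTopics:
    - pageviews-home
```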
The KsqlDB custom resource definition
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
  namespace: kafka
spec:
  # Name of the KafkaCluster custom resource that represents the Kafka cluster this ksqlDB instance connects to
  clusterRef:
    name: kafka
  # Name of the SchemaRegistry custom resource that represents the Schema registry to be made available for ksqlDB
  schemaRegistryRef:
  # Name of the KafkaConnect custom resource that represents the Kafka Connect to be made available for ksqlDB
  kafkaConnectRef:
  # Controls whether mTLS is enforced between ksqlDB and client applications (default: true)
  MTLS: true
  # Affinity settings for ksqlDB pods
  # see https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity
  affinity:
  # Controls the list of principals who are authorized to access the ksqlDB REST API
  authorizations:
  # Settings for exposing ksqlDB REST API outside the Kubernetes cluster when running in interactive mode
  externalEndpoint:
    # Controls whether to expose KsqlDB outside the Kubernetes cluster (default: false)
    enabled:
    # Configuration for the ingress controller when KsqlDB running in interactive mode is exposed outside the Kubernetes cluster
    istioIngressConfig:
    # IstioControlPlane specifies the namespace and name of the IstioControlPlane custom resource
    # which represents the Istio control plane. Starting from SDM 1.7.0 this field is required if KsqlDB is exposed
    # outside of the Kubernetes cluster.
    istioControlPlane:
      name:
      namespace:
  # Controls whether the ksqlDB is running in headless or interactive mode (default: false)
  headless: false
  # Heap settings for ksqlDB (default: -Xms512M -Xmx2G)
  heapOpts: -Xms512M -Xmx2G
  image:
  # PullPolicy describes a policy for if/when to pull a container image
  imagePullPolicy:
  imagePullSecrets:
  # Input topics to be used in ksql queries for reading
  inputTopics:
  # Output topics to be used in ksql queries for write and create
  outputTopics:
  # JmxExporterSpec defines the configuration for jmx exporter
  jmxExporter:
  # Defines the config values for ksqlDB
  ksqlDBConfig:
  # Node selector setting for ksqlDB pods
  # https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
  nodeSelector:
  # Annotations to be applied to ksqlDB pod
  podAnnotations:
  # Labels to be applied to ksqlDB pod
  podLabels:
  # Controls the name of the configmap which contains the ksqldb queries executed in headless mode. (default: <ksqldb cr name>-ksql-queries-configmap) Inside the configmap the query should be named as `queries.sql`
  queryConfigMapName:
  # Resources describes the compute resource requirements
  # default:
  #   requests:
  #     cpu: 1
  #     memory: 1.5Gi
  #   limits:
  #     cpu: 2
  #     memory: 2.5Gi
  resources:
  # Defines HPA configurations
  scaling:
  # Service account for ksqlDB pod
  serviceAccountName:
  # Annotations to be applied on the service that exposes ksqlDB API on port `ServicePort`
  serviceAnnotations:
  # Labels to be applied to the service that exposes ksqlDB API on port `ServicePort`
  serviceLabels:
  # The port ksqlDB listens for REST API requests
  servicePort:
  # Toleration settings for ksqlDB pods
  # see (https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/)
  tolerations:
  # Volume mounts for ksqlDB pods
  # see (https://kubernetes.io/docs/concepts/storage/volumes/)
  volumeMounts:
  # Volumes for ksqlDB pods
  # see (https://kubernetes.io/docs/concepts/storage/volumes/)
  volumes:
The following KsqlDB configurations are computed and maintained by Streaming Data Manager, and cannot be overridden:
- bootstrap.servers
- listeners
- ksql.schema.registry.url (if schemaRegistryRef is provided)
- ksql.connect.url (if kafkaConnectRef is provided)
The default KsqlDB custom resource
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KsqlDB
metadata:
  name: ksqldb-sample
spec:
  clusterRef:
    name: "kafka"