Known Issues

This page describes issues that have been observed previously and their workarounds. For high-level questions about Service Mesh Manager, see the generic FAQ.

SMM CLI-driven installation with SDM fails at the sdm-appmanifest step due to a webhook failure

During testing, webhook errors occasionally occurred at the sdm-appmanifest stage when running the smm install CLI command with --install-sdm enabled.

Example error message:

✗ sdm-applicationmanifest ❯ skip purging results due to previous errors: failed to create resource: creating resource failed: Internal error occurred: failed calling webhook "singleton-vapplicationmanifest.supertubes-control-plane.admission.banzaicloud.io": failed to call webhook: Post "https://supertubes-control-plane.supertubes-control-plane.svc:443/singleton-validate-supertubes-banzaicloud-io-v1beta1-applicationmanifest?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "svc-cat-ca") (name=sdm-applicationmanifest, namespace=supertubes-control-plane, apiVersion=supertubes.banzaicloud.io/v1beta1, kind=ApplicationManifest, name=sdm-applicationmanifest, namespace=supertubes-control-plane, apiVersion=supertubes.banzaicloud.io/v1beta1, kind=ApplicationManifest)
✗ error during operator reconcile: failed to create resource: creating resource failed: Internal error occurred: failed calling webhook "singleton-vapplicationmanifest.supertubes-control-plane.admission.banzaicloud.io": failed to call webhook: Post "https://supertubes-control-plane.supertubes-control-plane.svc:443/singleton-validate-supertubes-banzaicloud-io-v1beta1-applicationmanifest?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "svc-cat-ca") (name=sdm-applicationmanifest, namespace=supertubes-control-plane, apiVersion=supertubes.banzaicloud.io/v1beta1, kind=ApplicationManifest, name=sdm-applicationmanifest, namespace=supertubes-control-plane, apiVersion=supertubes.banzaicloud.io/v1beta1, kind=ApplicationManifest)

Workarounds

Rerun the same smm install command. This reapplies the ValidatingWebhook configuration and credentials, and allows the installation to proceed.
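
For example, if SDM installation was enabled originally, rerunning the same command is sufficient (the flag shown is the one from the failed installation attempt; adjust it to match your environment):

  smm install --install-sdm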

Some uninstall workflows leave resources in the cluster

The Calisti CLI has several options to uninstall portions of the Calisti system, with the smm uninstall -a command being the most comprehensive. However, to maintain compatibility with upgrade and reinstall workflows without loss of user configuration, Calisti custom resource definitions (CRDs), some persistent volume claims (PVCs), and some namespaces are not removed. This is by design.

In some cases, other Calisti system objects might remain in the cluster, such as secrets, configmaps, or admission webhook configurations.

Workarounds

To completely remove the Calisti system, check the remaining Calisti system namespaces for leftover objects and remove them via Kubernetes API operations.
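
A minimal sketch of such a manual cleanup, assuming the leftover namespace is named smm-system (the actual namespace names in your cluster may differ):

  # List remaining objects in a leftover Calisti namespace (namespace name is an example)
  kubectl get all,secrets,configmaps -n smm-system
  # Check for leftover admission webhook configurations
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
  # Delete the namespace and its contents once you have confirmed nothing else needs them
  kubectl delete namespace smm-system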

If you are using the Calisti demoapp (smm demoapp install), especially with generated traffic loads (smm demoapp load start), then use the following steps during the uninstallation:

  1. Stop the traffic load generation: smm demoapp load stop
  2. Uninstall the demo application first: smm demoapp uninstall
  3. Run the uninstall command: smm uninstall

If you are using Streaming Data Manager, remove the user-created Kafka configuration custom resource instances (such as kafkatopics, kafkaroles, and kafkaacls) before running the Calisti uninstallation (smm uninstall).
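
A minimal sketch of this cleanup, assuming the Kafka custom resources live in a namespace named kafka (the resource types are those listed above; adjust the namespace to your deployment):

  # List the user-created Kafka configuration resources
  kubectl get kafkatopics,kafkaroles,kafkaacls -n kafka
  # Remove them before uninstalling Calisti
  kubectl delete kafkatopics,kafkaroles,kafkaacls --all -n kafka
  # Then run the uninstallation
  smm uninstall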

Validation errors on UI when Istio configuration refers to services that don’t exist on the cluster

In some cases, your Istio configuration may refer to services or endpoints that do not exist on the cluster where the configuration is applied. Calisti detects these as validation errors and displays them in the Calisti UI. There are situations where such configuration is desirable, for example, in multicluster topologies with passive peer clusters.

Workarounds

These validation errors have no functional impact. They serve as notifications of configuration inconsistencies.

KafkaTopic resource instances can only be created when the KafkaCluster has enough brokers running

The Streaming Data Manager configuration interface for managing Kafka resources currently has a limitation: creating or modifying KafkaTopic resources fails until the corresponding KafkaCluster has enough running brokers to host the topic's partitions.

Workarounds

KafkaClusters and KafkaTopics are expected to be configured by different administrative workflows, likely with different personas. Currently, these workflows must be sequenced so that the KafkaCluster has enough brokers running before KafkaTopic resources are created or modified.
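
A minimal sketch of this sequencing, assuming the KafkaCluster is named kafka in the kafka namespace and the topic definition is stored in a local file (all names are examples):

  # Confirm the KafkaCluster and its broker pods are running before creating topics
  kubectl get kafkacluster kafka -n kafka
  kubectl get pods -n kafka
  # Once enough brokers are up to host the topic's partitions, create the KafkaTopic
  kubectl apply -f my-kafkatopic.yaml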

Under heavy load, the CruiseControl transition to async request processing can cause koperator to panic

The CruiseControl setting webserver.request.maxBlockTimeMs controls the threshold at which requests are converted to async processing. When CruiseControl enters this condition, koperator may panic due to mishandling the async request processing scheme. The overall condition is caused by heavy request load where CruiseControl can’t process all requests within the maxBlockTimeMs threshold.

This issue maps to upstream issue 828.

Workarounds

This situation is expected to be very rare, and koperator pods auto-restart to recover. Tuning webserver.request.maxBlockTimeMs decreases the probability that this issue occurs.
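
As a sketch, the property can be adjusted through the Cruise Control configuration carried in the KafkaCluster custom resource (the resource name, namespace, and exact location of the setting depend on your deployment and version):

  # Open the KafkaCluster resource for editing (names are examples)
  kubectl edit kafkacluster kafka -n kafka
  # In the Cruise Control configuration section, tune the threshold, for example:
  #   webserver.request.maxBlockTimeMs=<milliseconds>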

Multicluster topic reachability issues when using MirrorMaker2

When MirrorMaker2 is used with multicluster Calisti and Kafka brokers and topics are spread across clusters, MirrorMaker2 can be unable to access topics on remote clusters.

Workarounds

Restarting the MirrorMaker2 pod fixes the issue. This must be done every time a new topic is created. The issue is under active investigation.
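
A minimal sketch of the restart, assuming MirrorMaker2 runs as a Deployment named mirrormaker2 in the kafka namespace (both names are examples; use the names from your deployment):

  # Restart the MirrorMaker2 workload after creating a new topic
  kubectl rollout restart deployment mirrormaker2 -n kafka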

Koperator cannot delete terminating and stuck broker pods

Although very rare, broker pods stuck in the Terminating state have been observed in some Calisti Kafka broker management testing.

Workarounds

Force-remove the stuck broker pods via external means, for example, with kubectl delete --force.
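
A minimal sketch, assuming the stuck broker pod is named kafka-broker-0 in the kafka namespace (both names are examples):

  # Force-delete the stuck broker pod; this skips graceful termination, so use with care
  kubectl delete pod kafka-broker-0 -n kafka --force --grace-period=0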

Kafka broker log directory removal not supported when brokers use multiple log directories

There is an upstream issue that prevents removing a Kafka broker log directory when brokers use multiple log directories.

Workarounds

Currently, no workaround is possible in Streaming Data Manager. The use of CruiseControl enhancements to provide a solution is under active investigation.

User-specified JKSPasswordName is not used with a user-provided Kafka listener SSL/TLS certificate

In Streaming Data Manager, when a user wants to configure the Kafka cluster listener SSL/TLS certificates with a user-provided certificate, and they

  1. create an SSL/TLS secret (as a Kubernetes secret) with caCrt and caKey fields, and
  2. create a JKS password (as a Kubernetes secret) with a password field, and
  3. in KafkaCluster.spec.listenersConfig.sslSecrets, set the tlsSecretName and jksPasswordName fields to the names of the created secrets, and set the create field to false (the certificate keys are provided, not generated),

then Streaming Data Manager currently does not use the password provided through the pre-created secret referenced in jksPasswordName. Instead, it generates a new password value for the Java KeyStore file, and the provided secret is not used at all.
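
As a sketch, steps 1 and 2 can be performed with kubectl as follows, assuming the secrets are created in the kafka namespace and the certificate files are available locally (all names and file paths are examples):

  # Step 1: SSL/TLS secret with caCrt and caKey fields
  kubectl create secret generic my-kafka-tls -n kafka \
    --from-file=caCrt=./ca.crt \
    --from-file=caKey=./ca.key
  # Step 2: JKS password secret with a password field
  kubectl create secret generic my-kafka-jks-password -n kafka \
    --from-literal=password=example-password
  # Step 3: reference my-kafka-tls and my-kafka-jks-password from
  # KafkaCluster.spec.listenersConfig.sslSecrets (tlsSecretName, jksPasswordName)
  # with the create field set to false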

Adding and removing Kafka brokers within the same KafkaCluster CR update leads to non-deterministic behavior

Updates to an existing KafkaCluster CR instance that add and remove Kafka brokers in the same update cause unpredictable behavior.

Workarounds

Ensure that updates to existing KafkaCluster CR instances do not add and remove entries in the broker list within the same update operation; apply additions and removals as separate updates.
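
A minimal sketch of keeping the operations separate, assuming the KafkaCluster changes are maintained in local manifest files (file names are examples):

  # First update: apply the manifest that only adds the new broker entry
  kubectl apply -f kafkacluster-add-broker.yaml
  # Wait for the operator to reconcile the addition, then apply the
  # manifest that only removes the old broker entry in a separate update
  kubectl apply -f kafkacluster-remove-broker.yaml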