Calisti

Calisti includes the following main components:

  • Service Mesh Manager is a multi and hybrid-cloud enabled service mesh platform for constructing modern applications. Built on Kubernetes and our Istio distribution, Service Mesh Manager enables flexibility, portability and consistency across on-premises datacenters and cloud environments.

  • Calisti’s Streaming Data Manager is the deployment tool for setting up and operating production-ready Apache Kafka clusters on Kubernetes, leveraging a Cloud Native technology stack. Streaming Data Manager includes Zookeeper, Koperator, Envoy, and many other components hosted in a managed service mesh. All components are automatically installed, configured, and managed in order to operate a production-ready Kafka cluster on Kubernetes.

Calisti architecture

Service Mesh Manager

Service Mesh Manager helps you to confidently scale your microservices over single- and multi-cluster environments and to make daily operational routines standardized and more efficient. The componentization and scaling of modern applications inevitably leads to a number of optimization and management issues:

  • How do you spot bottlenecks? Are all components functioning correctly?
  • How are connections between components secured?
  • How does one reliably upgrade service components?

Service Mesh Manager helps you accomplish these tasks and many others in a simple and scalable way, by leveraging the Istio service mesh and building many automations around it. Our tag-line for the product captures this succinctly:

Service Mesh Manager operationalizes the service mesh to bring deep observability, convenient management, and policy-based security to modern container-based applications.

Learn more about Service Mesh Manager.

Streaming Data Manager

Streaming Data Manager is specifically built to run and manage Apache Kafka on Kubernetes. Other solutions use Kubernetes StatefulSets to run Apache Kafka, but StatefulSets are not well suited to Kafka's operational needs. Streaming Data Manager is instead based on simple Kubernetes resources (Pods, ConfigMaps, and PersistentVolumeClaims), a much more flexible approach that makes it possible to:

  • Modify the configuration of unique Brokers: you can configure every Broker individually.
  • Remove specific Brokers from clusters.
  • Use multiple Persistent Volumes for each Broker.
  • Do rolling upgrades without data loss or service disruption.
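Per-broker configuration is expressed directly in the cluster resource. The following sketch uses field names from the open-source Koperator KafkaCluster CRD; the broker IDs, storage size, and retention override are illustrative only:

```yaml
apiVersion: kafka.banzaicloud.io/v1beta1
kind: KafkaCluster
metadata:
  name: kafka
spec:
  brokerConfigGroups:
    default:
      storageConfigs:
        - mountPath: /kafka-logs
          pvcSpec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi
  brokers:
    - id: 0
      brokerConfigGroup: default
    - id: 1
      brokerConfigGroup: default
      # This broker alone overrides a read-only Kafka setting.
      readOnlyConfig: |
        log.retention.hours=168
```

Because brokers are listed individually rather than generated by a StatefulSet, removing a specific broker is just a matter of deleting its entry from the list.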

Learn more about Streaming Data Manager.

1 - What's new

Release 1.12.0 (2023-05-11)

OpenShift support and certification

Red Hat OpenShift provides scalable and reliable solutions to monitor microservices and offers full-fledged container security. With a Red Hat OpenShift Service on AWS (ROSA) version 4.11 setup, you can seamlessly install Calisti on Red Hat OpenShift clusters. Calisti avoids vendor lock-in, so you can mix and match cluster cloud providers and run mixed multi-cluster environments (GKE, EKS, AKS, OpenShift).

We have implemented OpenShift support for Calisti in a way that is ready for OpenShift certification.

Starting with this release, Calisti runs on Red Hat OpenShift, and can run and manage Apache Kafka clusters and Istio meshes on OpenShift. For details, see the detailed instructions for the installation method you want to use.

Migrate your Kafka cluster data into Calisti

You can now migrate data from your existing Apache Kafka clusters to Kafka clusters managed by Calisti, without causing downtime for the existing clusters. That way you can get the benefits of running Kafka clusters on Calisti without losing any data already collected in your running Kafka clusters.

Migrating Kafka topics into Calisti

For details, see Data migration using MirrorMaker2.
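MirrorMaker2 replicates topics between the two clusters while both stay online. As a hedged sketch (the cluster aliases and bootstrap addresses below are placeholders, and the exact settings depend on your environment), a minimal MirrorMaker2 configuration might look like this:

```properties
# Aliases for the two clusters involved in the migration
clusters = legacy, calisti

legacy.bootstrap.servers = old-kafka-broker:9092
calisti.bootstrap.servers = kafka-all-broker.kafka.svc.cluster.local:29092

# Replicate everything from the legacy cluster into the Calisti-managed one
legacy->calisti.enabled = true
legacy->calisti.topics = .*

# Keep topic configuration in sync during the migration window
sync.topic.configs.enabled = true
```

Once consumers have been switched over to the Calisti-managed cluster, the replication flow can be disabled and the legacy cluster decommissioned.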

Dashboard improvements

You can now directly create and edit Istio resources on the dashboard. The new YAML editor of Calisti provides ready-to-use templates, as well as syntactic and semantic validation.

  • Navigate to MENU > ISTIO RESOURCES > CREATE NEW, then select the Istio resource you want to create from the list.

  • The MENU > GATEWAYS > ROUTES page has been renamed to VIRTUAL SERVICE.

  • The MENU > SERVICES > TRAFFIC MANAGEMENT and MENU > TOPOLOGY > TRAFFIC MANAGEMENT pages have been renamed to VIRTUAL SERVICE. Calisti provides various templates to configure virtual services depending on your use case.

  • The MENU > SERVICES > CIRCUIT BREAKER and MENU > TOPOLOGY > CIRCUIT BREAKER pages have been renamed to DESTINATION RULES. Calisti provides various templates to configure destination rules depending on your use case.

    Destination rules

  • You can now create ingress and egress gateways, and configure other Istio traffic management components, on the Calisti dashboard using the new YAML editor.

  • You can now create VirtualService resources using the new editor.

  • You can now create a DestinationRule and use Outlier Detection and Connection Pool to set circuit breaking parameters. To learn more, see Circuit Breaking.
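As a sketch of what such a resource looks like (the resource name, host, and thresholds below are placeholders, not Calisti defaults), a DestinationRule combining Outlier Detection and Connection Pool settings uses the standard Istio API:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-circuit-breaker
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
    outlierDetection:
      # Eject an endpoint after 5 consecutive 5xx responses
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
```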

Other changes

  • You can now install Istio into custom namespaces.
  • You can now configure custom sidecar injection templates for Calisti to help you integrate Service Mesh Manager with other applications.
  • Calisti can now generate a support bundle to help troubleshoot support cases. For details, see Support-bundle.

Release 1.11.0 (2022-11-07)

Streaming Data Manager

Calisti now has a new component called Streaming Data Manager. Streaming Data Manager is a cloud-native, turnkey solution for deploying and managing Apache Kafka over Istio, providing:

  • Security and encryption
  • Out-of-the-box observability
  • RBAC integration
  • Scaling

Kafka traffic on the UI

For details, see Overview.

Note: When using Streaming Data Manager on Amazon EKS, you must install the EBS CSI driver add-on on your cluster.

GitOps support

Service Mesh Manager and Streaming Data Manager can be used in GitOps environments as well. For details, see Install SMM - GitOps - single cluster, Install SMM - GitOps - multi-cluster, and Install SDM - GitOps.

Istio 1.15 support

Service Mesh Manager now supports Istio 1.15 and provides our Istio distribution based on that codebase.

This also means that Service Mesh Manager is fully compatible with Kubernetes v1.24.x.

Other changes

  • The health views of the Services and Workloads pages now have fixed URLs to make sharing easier.

  • If the name of your cluster doesn’t comply with the RFC 1123 DNS labels/subdomain restrictions, Service Mesh Manager now automatically converts it to a compliant format and sets it as the name of the cluster. In earlier versions, you had to manually set a compliant name for clusters with non-compliant names, otherwise certain operations (like smm install and smm attach) failed. Service Mesh Manager now automatically applies the following conversions if needed:

    • Replace ‘_’ characters with ‘-’
    • Replace ‘.’ characters with ‘-’
    • Replace ‘:’ characters with ‘-’
    • Truncate the name to 63 characters
  • The Service Mesh Manager CLI now returns an error message when trying to run a command on a cluster that’s running an unsupported Kubernetes version.

  • In Kubernetes 1.24 or newer, token secrets for service accounts aren’t created automatically. If Service Mesh Manager is running on a Kubernetes 1.24 (or newer) cluster, then when adding virtual machines to the mesh, you must create the token secrets manually. For details, see Add a virtual machine to the mesh.
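The name conversion described above is a simple string operation; here is a minimal Python sketch (the function name is ours, not the actual Service Mesh Manager implementation):

```python
def sanitize_cluster_name(name: str) -> str:
    """Convert a cluster name to an RFC 1123-compliant DNS label,
    mirroring the documented conversions: '_', '.' and ':' become '-',
    and the result is truncated to 63 characters."""
    for ch in "_.:":
        name = name.replace(ch, "-")
    return name[:63]

# For example, "prod_cluster.eu:1" becomes "prod-cluster-eu-1".
```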

Release 1.10.0 (2022-08-09)

Red Hat-based virtual machines

Service Mesh Manager now supports attaching virtual machines running Red Hat Enterprise Linux 8 to the mesh. For details, see Integrating Virtual Machines into the mesh.

Istio 1.13 support

Service Mesh Manager now supports Istio 1.13 and provides our Istio distribution based on that codebase.

Enterprise licenses

Paid-tier and enterprise licenses are now available for Service Mesh Manager.

  • If you are interested in purchasing a license, contact us.
  • If you have already purchased a license, apply it to your Service Mesh Manager deployments. For details, see Licensing options.

Other changes

  • The smm CLI tool now supports macOS running on M1 chips.
  • The Prometheus node exporter service now uses port 19101 instead of 19100. That way, the Prometheus deployment of Service Mesh Manager can work side-by-side with a pre-existing Prometheus deployment. For details on other ports used by Service Mesh Manager, see Open Port Inventory.

Release 1.9.1 (2022-04-11)

Service Mesh Manager now supports attaching virtual machines to the mesh. After a virtual machine has been integrated into the mesh, Service Mesh Manager automatically updates the configuration of the virtual machine to ensure that it remains a part of the mesh and receives every configuration update it needs to operate in the mesh. In addition, the observability features available for Kubernetes pods are available for the virtual machines as well, for example:

For details, see Integrating Virtual Machines into the mesh.

Release 1.9.0 (2022-03-08)

Free tier

From now on, after a free registration, you can use Service Mesh Manager to manage your mesh of up to ten nodes. For details, see Licensing options and Getting started with the Free Tier.

Istio 1.12 support

Service Mesh Manager now supports Istio 1.12 and provides our Istio distribution based on that codebase.

Other changes

This release includes the following fixes:

  • All custom resources used by Service Mesh Manager have been moved to the smm.cisco.com group. The CLI can migrate existing objects to the new group.
  • Topology:
    • Mesh gateways are now fully visible on the topology page even in timeline mode
    • Topology view now shows pod counts in timeline mode
  • Fix an issue causing new SLOs not to start calculating on creation
  • IstioControlPlane settings can be overridden from Service Mesh Manager’s ControlPlane resource using the .spec.meshManager.istio.istioCRDOverrides key (which contains a YAML string).
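As an illustration of such an override (only the .spec.meshManager.istio.istioCRDOverrides key is taken from the release note; the apiVersion and the meshConfig override below are assumptions for the sketch):

```yaml
apiVersion: smm.cisco.com/v1alpha1  # assumed version
kind: ControlPlane
metadata:
  name: smm
spec:
  meshManager:
    istio:
      # A YAML string merged into the generated IstioControlPlane resource
      istioCRDOverrides: |
        spec:
          meshConfig:
            accessLogFile: /dev/stdout
```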

Removed features

The following commands and features have been removed from the Service Mesh Manager command-line tool. You can configure the related features from the dashboard.

  • smm sidecar-proxy egress get
  • smm sidecar-proxy egress set
  • smm sidecar-proxy egress delete
  • smm routing
  • smm mtls
  • Integrated support for canary deployments. You can use the Flagger operator instead.

Release 1.8.2 (2021-12-14)

This release includes the following fixes:

Active-active fixes

  • Fix secret cleanup for Istio in active-active setups.
  • Update istio-operator to latest.
  • Multiple active Istio control-plane support.
  • Cluster name is now visible in istio status command.
  • Control plane list now shows clusters as well.

Mesh view

  • Stabilize the ordering of Istio clusters to prevent changed ordering on the UI.

cert-manager

  • Update to v1 API.

Auth

  • Fix an issue where 1.7 specific authentication tokens were generated during upgrade scenarios.

UI

  • Fix an issue which caused topology to crash for ingress gateways.

Operators

  • Add RBAC for Coordination resources so that operator leader election can use the resources.
  • If there is a merge conflict during reconciliation, the smm operator retries the reconciliation without failing.
  • 1.7 Istio operators will be properly removed during uninstall.

Let’s Encrypt

  • Validate DNS records on Let’s Encrypt-enabled ingresses to ensure that the ingress and DNS records match.

Registry access

  • Sort secret names to prevent spurious changes during reconciliation.

Release 1.8 (2021-10-26)

The primary goal of this release was to provide a modern way to orchestrate Istio and the multi-cluster topologies Service Mesh Manager supports. As part of this work, the Cisco Istio Operator has been restructured from the ground up so that you can benefit from an API adjusted to modern Istio versions.

As this new version of the operator supports not just the Primary-Remote cluster topology, but also Multi-Primary on the same or different networks, this change paves the way for subsequent releases to add support in Service Mesh Manager for meshes with any number of Primary and Remote clusters.

Istio 1.11 support

Service Mesh Manager now supports Istio 1.11 and provides our Istio distribution based on that codebase.

This also means that Service Mesh Manager is fully compatible with Kubernetes v1.22.x.

OIDC and external dashboard access support

This release adds support for exposing the Service Mesh Manager dashboard via a public HTTPS URL. For the required configuration, see Exposing the Dashboard.

To entirely remove the need for downloading the Service Mesh Manager CLI, and to better integrate with existing OIDC-enabled Kubernetes deployments, Service Mesh Manager now also supports OIDC authentication.

Release 1.7 (2021-07-28)

Release 1.7 focuses on compliance, integrations, tech debt, and reusability.

GraphQL federation

The Service Mesh Manager GraphQL API is now broken down into separate components to increase reusability, and to provide the ability to switch components on/off in Service Mesh Manager in the future.

Protocol-specific observability

Istio provides several useful metrics for the TCP, HTTP, and gRPC protocols. To give you better observability and more insight into the traffic of your services, Service Mesh Manager displays protocol-specific metrics normally not available in Istio for MySQL and PostgreSQL traffic. Support for more protocols is planned in future releases.

Protocol specific observability

Istio 1.10 support

Service Mesh Manager now supports Istio 1.10.

Cluster registry

A generic, distributed Kubernetes cluster registry is now serving as the base for keeping multi-cluster metadata. Cluster metadata is replicated across clusters using a gossip-like protocol.

Unified Istio distribution with SecureCN

SecureCN and Service Mesh Manager are now using the same Istio distribution that enables better integration between the two products.

CSDL Compliance

Service Mesh Manager has now reached CSDL “Planned” status.

DevNet Sandbox

Service Mesh Manager is now available on DevNet sandbox for design partners for solution testing.

Release 1.6.1 (2021-05-06)

This release is a security and bugfix release.

Included changes are:

  • Add support for Istio 1.8.5 for customers still using the old version of Istio instead of 1.9
  • Fix an issue in the Istio operator that required permissions for the authentication.istio.io and config.istio.io groups, while those are only needed for Istio versions < 1.8
  • The smm activate command now resets all of the user’s registry settings, making it easier to change IAM credentials. Previously, the end user needed to remove the registry access credentials manually using the smm registry remove command

Release 1.6 (2021-04-09)

Group your clusters into networks to optimize your mesh topology using a mix of gateway-based and flat-network connections between your clusters, decreasing cross-cluster latencies and transfer costs. Clusters belonging to the same network can access each other directly, without using the cluster gateway. For details, see Cluster network and Attach a new cluster to the mesh.

UI improvements

Istio 1.9 support

Service Mesh Manager now supports Istio 1.9.

1.1 - Version Matrix

Release 1.12.0 (2023-05-11)

Calisti v1.12.0 is validated on:

  • EKS, GKE, AKS, KinD, on-premises (requires a LoadBalancer) - Kubernetes versions 1.21, 1.22, 1.23, 1.24
  • OpenShift ROSA (AWS) - 4.11

Component                          Version
Cert Manager                       v1.11.0
Cluster Registry Controller        v0.2.10
ImagePullSecrets Controller        v0.3.12
Istio                              v1.13.5, v1.15.3
Istio-operator                     v2.13.6 (istio-v1.13.5), v2.15.11 (istio-v1.15.3)
Grafana                            v7.5.11
Jaeger                             v1.28.0
Prometheus Operator                v0.63.0
Prometheus                         v2.39.1
Prometheus-node-exporter           v1.2.2
Thanos                             v0.28.1
Koperator                          v0.24.1
Kafka                              v3.1.0
Cruise Control for Apache Kafka    v2.5.101
Zookeeper Operator                 v0.2.14

Release 1.11.0 (2022-11-07)

Calisti v1.11.0 is validated on:

  • EKS, GKE, AKS, KinD, on-premises (requires a LoadBalancer) - Kubernetes versions 1.21, 1.22, 1.23, 1.24

Component                          Version
Cert Manager                       v1.9.1
Cluster Registry Controller        v0.2.4
ImagePullSecrets Controller        v0.3.5
Istio-operator                     v2.13.5 (istio-v1.13.5), v2.15.4 (istio-v1.15.3)
Istio                              v1.13.5, v1.15.3
Grafana                            v7.5.11
Jaeger                             v1.28.0
Prometheus Operator                v0.60.1
Prometheus                         v2.38.0
Prometheus-node-exporter           v1.2.2
Thanos                             v0.28.1
Koperator                          v0.22.0
Kafka                              v3.1.0
Cruise Control for Apache Kafka    v2.5.101
Zookeeper Operator                 v0.2.13

Release 1.10.0 (2022-08-09)

Calisti v1.10.0 is validated on:

  • EKS, GKE, AKS, KinD, on-premises (requires a LoadBalancer) - Kubernetes versions 1.19, 1.20, 1.21, 1.22, 1.23

Component                          Version
Cert Manager                       v1.6.2
Cluster Registry Controller        v0.2.2
ImagePullSecrets Controller        v0.3.5
Istio-operator                     v2.12.0 (istio-v1.12.7), v2.13.5 (istio-v1.13.5)
Istio                              v1.12.5, v1.13.5
Grafana                            v7.5.11
Jaeger                             v1.28.0
Prometheus Operator                v0.52.1
Prometheus                         v2.30.2
Prometheus-node-exporter           v1.2.2
Thanos                             v0.23.1

Release 1.9.1 (2022-04-11)

Calisti v1.9.1 is validated on:

  • EKS, GKE, AKS, KinD, on-premises (requires a LoadBalancer) - Kubernetes versions 1.19, 1.20, 1.21, 1.22, 1.23

Component                          Version
Cert Manager                       v1.6.2
Cluster Registry Controller        v0.2.2
ImagePullSecrets Controller        v0.3.5
Istio-operator                     v2.11.7 (istio-v1.11.4), v2.12.0 (istio-v1.12.5)
Istio                              v1.11.4, v1.12.5
Grafana                            v7.5.11
Jaeger                             v1.28.0
Prometheus Operator                v0.52.1
Prometheus                         v2.30.2
Prometheus-node-exporter           v1.2.2
Thanos                             v0.23.1

2 - Service Mesh Manager

Service Mesh Manager is a multi and hybrid-cloud enabled service mesh platform for constructing modern applications. Built on Kubernetes and our Istio distribution, Service Mesh Manager enables flexibility, portability and consistency across on-premises datacenters and cloud environments.


2.1 - Overview

Service Mesh Manager is a multi and hybrid-cloud enabled service mesh platform for constructing modern applications. Built on Kubernetes and our Istio distribution, Service Mesh Manager enables flexibility, portability and consistency across on-premises datacenters and cloud environments.


Key features

Service Mesh Manager takes the pain out of Istio by offering great UX from installation and mesh management to runtime diagnostics and more.

Istio distribution

Service Mesh Manager is built on Istio, but offers enhanced functionality, for example, operator-based Istio management, a full-featured CLI tool, and an intuitive and easy to use UI. It is not a new abstraction layer on top of Istio, and stays fully compatible with the upstream. Service Mesh Manager is designed for enterprise users and comes with commercial support.

For a detailed list of changes compared to upstream Istio, see Istio distribution.

Observability

The Service Mesh Manager UI gives you insight into the operation of your services. It not only shows the service topology with real-time and historical metrics, but also allows you to drill-down and analyze the metrics in context. Service Mesh Manager automatically calculates the health of your services and workloads based on the available metrics. If you still need additional details, you can access the related Grafana dashboards with a single click.

You can also monitor communications with services that are external to your mesh.

Root cause diagnostics

Root cause diagnostics help you efficiently isolate and solve operational issues related to your services. Service Mesh Manager offers:

Control

You can manage Istio through the Service Mesh Manager UI and the CLI. Service Mesh Manager gives you easy access to the configuration of the Istio service mesh and its underlying traffic-management features, including:

With Service Mesh Manager, you can manage service-updates using automated, industry-standard upgrade strategies, like canary releases.

Multi-cluster

With Service Mesh Manager, you can monitor and manage your hybrid multi-cloud service infrastructure from a single pane of glass. You can easily attach and detach clusters using the CLI, and take advantage of enhanced multi-cluster telemetry.

Service Mesh Manager supports multiple mesh topologies, so you can use the one that best fits your use cases. In multi-cluster configurations, it provides automatic locality load-balancing.

Service Level Objectives and burn-rate alerts

Service Mesh Manager helps SREs and operation engineers to observe and control the health of their services and applications. You can create and track service level objectives and corresponding alerting rules on the Service Mesh Manager dashboard.
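The burn-rate arithmetic behind such alerts is standard SRE practice rather than anything Calisti-specific; a minimal sketch:

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    error_ratio: observed fraction of failed requests over some window.
    slo_target:  the SLO, e.g. 0.999 for 'three nines'.

    A burn rate of 1.0 means the budget would be spent exactly over
    the full SLO period; higher values justify faster-firing alerts.
    """
    error_budget = 1.0 - slo_target
    return error_ratio / error_budget

# A 1% error ratio against a 99.9% SLO burns the error budget
# roughly 10x faster than is sustainable.
```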

Security & Compliance

Service Mesh Manager helps you secure your services through industry-standard authorization and authentication practices, including:

High-level architecture

Calisti Service Mesh Manager consists of the following components:

Service Mesh Manager architecture overview

  • Service mesh management: The open source Cisco Istio operator helps to install/upgrade/manage Istio deployments. Its unique features include managing multiple ingress/egress gateways in a declarative fashion, and automated and fine-tuned multi-cluster management.

  • The core components of Service Mesh Manager are:

    • the Service Mesh Manager backend (exposing a GraphQL API)
    • the Service Mesh Manager UI, a web interface
    • the Service Mesh Manager CLI
    • the Service Mesh Manager operator

    At the heart of Service Mesh Manager is its backend, which exposes a GraphQL API. The Service Mesh Manager UI (dashboard) and CLI interact with this API. The Service Mesh Manager operator is an optional component that provides a declarative installation method to support GitOps workflows.

  • External out-of-the-box integrations:

    These components are automatically installed and configured by Service Mesh Manager by default to work with Istio. You can also integrate Service Mesh Manager with your own Prometheus, Grafana, Jaeger, or cert-manager deployment - Service Mesh Manager follows the “batteries included, but replaceable” paradigm.

Istio-operator and Service Mesh Manager

The Calisti team actively maintains its fully upstream-compatible Istio distribution and several open-source projects and integrations that serve as the basis for Cisco Service Mesh Manager. From the perspective of Istio management, the Calisti team maintains the following:

  • The Istio operator is an open source project, which is the core component involved in Istio control plane and gateway lifecycle management for Cisco Service Mesh Manager.
  • Calisti Service Mesh Manager is a commercial product that includes all the features mentioned in this guide, enterprise support, and optionally integration support for customer environments.

Read the detailed comparison.

Next steps

2.1.1 - Features

Service Mesh Manager addresses the whole cloud-native lifecycle of a service mesh-based solution by providing tools that cover day 0 through day 2 operations. Because such a solution requires many components to provide the core service mesh functionality, tracing, metrics, or safe canary-based deployments (just to name a few), we divide Service Mesh Manager into the following layers:


Now let’s see how these layers add up to a complete solution over the whole lifecycle of the product:

Day 0

Day 0, in software development, represents the design phase, during which project requirements are specified and the architecture of the solution is decided. Service meshes, even though they offload the burden of security and traffic routing from the individual microservices, are complex in nature.

Service Mesh Manager is designed with Day 0 experimentation in mind: we provide a CLI tool that lets you install Service Mesh Manager without prior experience, so you can have an Istio-based service mesh up and running in 15 minutes - with monitoring and tracing included. The user interface allows for rapid experimentation with Istio features via an intuitive dashboard, so during the design phase engineers can focus on what matters the most: finding the right architecture.

Day 1

Day 1 involves developing and deploying software that was designed in the Day 0 phase. In this phase you create not only the application itself, but also its infrastructure, network, and external services, then implement the initial configuration of it all.

After the initial experimentation, Service Mesh Manager aids this process by not just providing facilities for configuring the service mesh, but also by providing validations to check for any issues in the deployed settings, and integrated metrics and outlier-detection information to pinpoint any issues with the freshly changed services.

In case of interoperability issues, the traffic tap and automated tracing features provide more detailed insight into the real-time traffic.

Day 2

Day 2 is the time when the product is shipped or made available to the customer. Here, most of the effort is focused on maintaining, monitoring, and optimizing the system. Analyzing the behavior of the system and reacting correctly are of crucial importance, as the resulting feedback loop is applied until the end of the application’s life.

Service meshes, and Istio in particular, are developing fast. This is reflected in Istio's N-1 support model: a new Istio version is released every 3 months, and only the last two are supported. Service Mesh Manager helps decrease the risk of these upgrades by providing canary-like control plane upgrades: SREs can gradually upgrade their services to the new version even at the Workload level, and if an issue occurs, the old version of Istio is always available in the cluster to fall back to.

Service Mesh Manager provides a Service Level Objective feature that helps ensure the solution works within its expected operational parameters. In case of an issue, the automated outlier detection system detects failures and shows them on the topology view. Postmortems are aided by the timeline feature, which lets you inspect the past state of the deployment, including health data.

2.1.1.1 - Istio distribution

Service Mesh Manager is built on Istio, but offers enhanced functionality, for example, operator-based Istio management, a full-featured CLI tool, and an intuitive and easy to use UI. It is not a new abstraction layer on top of Istio, and stays fully compatible with the upstream.

Service Mesh Manager is designed for enterprise users and comes with commercial support.

Notable changes compared to upstream Istio

FIPS 140-2 Level 1 compliant build

The FIPS build uses Google’s BoringCrypto for the Go-based components and Envoy. All components are recompiled with the necessary configuration to provide Level 1 compliance. In addition, the set of allowed ciphers is restricted even further than FIPS requires. For details, see FIPS-compliant service mesh.

Multiple control plane support

Upstream Istio does not properly support multiple isolated control planes within one cluster. Various changes (ENV name overrides, ConfigMap name overrides, and so on) were made to support proper isolation between control planes.

Protocol specific observability

Istio uses Envoy proxy under the hood, which has support for various data protocols and provides protocol-specific metrics for them. The upstream Istio can enable those metrics if a supported protocol is detected. That list has been extended with PostgreSQL, and other protocols are coming soon.

Direct connect through gateways

Direct connect means that a workload can be exposed through an Istio ingress gateway in a way that the internal mTLS is not terminated, but rather the workload proxy port is directly accessible through the gateway. This allows communication to a workload with mTLS from an external client. This feature is mainly used in the Streaming Data Manager (formerly called Supertubes) product.

DNS capture and report

With this feature the Istio proxy is able to capture DNS requests and responses and report them to an API endpoint. This feature is used in the SecureCN product.

TLS interception

The mesh Certificate Authority (CA) can issue TLS certificates for arbitrary domain names, to be able to look into TLS encrypted traffic. This feature is used in the SecureCN product.

Store arbitrary key/value information in certificates

The certificates issued by Istio CA can store arbitrary, workload-specific key-value attributes in the certificates' subject directory attribute property. This is used in Panoptica, the Cisco Secure Application Cloud to propagate workload-specific information between clusters, without the need for a central database.

Standalone sidecar injector component

The standalone sidecar injector is used in multi-cluster topologies on the peer clusters to have the sidecar-injection functionality of Istiod with much smaller resource requirements.

2.1.1.2 - Mesh lifecycle management

Operating a service mesh at scale is hard due to the inherent complexity of mesh configurations. To ensure the most optimal operation of Service Mesh Manager-based solutions, we provide validations that highlight common issues with the existing configuration.

When it comes to supporting a service mesh-based solution in the long run, Service Mesh Manager lets you dynamically extend the existing mesh to additional clusters with zero downtime.

Given that Istio is a fast-moving project (a new release is available every 3 months), Service Mesh Manager needs to bridge the gap between a rapidly changing Cloud Native solution and the requirements of an enterprise deployment. To decrease the blast radius of these upgrades we have introduced canary control plane upgrades based on Cisco’s open source Istio operator.

2.1.1.3 - Observability toolbox

Service Mesh Manager includes integrated observability by default. You can make the most out of your service mesh deployments:

  • By leveraging the security and traffic management features of Istio.
  • By having access to all the monitoring and tracing features Istio is capable of.

Integrated monitoring

Service Mesh Manager includes Prometheus to ensure faster troubleshooting and recovery. For further information on our monitoring capabilities, see the Dashboard guide.

Topology view

Integrated tracing

Distributed tracing is the process of tracking individual requests throughout their whole call stack in the system.

With distributed tracing in place, you can visualize full call stacks, to see:

  • which service called which service,
  • how long each call took, and
  • the network latencies between them.

These insights can tell you where a request failed, or which service took too long to respond.

To access traces and see real-time traffic flowing through the cluster, see the traffic tap feature.

Traffic Tap

Automated outlier detection system

Complex systems might be hard to understand. Service Mesh Manager provides an automated (zero configuration required) outlier detection system available on the topology, workloads, and services pages of the Service Mesh Manager UI.

Outlier detection

Service Level Objectives

To ensure, and most importantly to measure, the availability of your deployed workloads, you can use the integrated Service Level Objective (SLO) based alerting system.

For defining SLOs, see Tracking Service Level Objectives (SLOs).

For alerting settings, see SLO-based alerting in Production.
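
The alerting behind SLOs rests on simple error-budget arithmetic. As a generic illustration (not specific to Service Mesh Manager's implementation), a 99.9% availability objective over a 30-day window leaves roughly 43 minutes of error budget:

```shell
# Generic SLO error-budget arithmetic (illustrative only, not SMM-specific):
# a 99.9% objective over 30 days allows (1 - 0.999) * 30 * 24 * 60 minutes of downtime.
awk 'BEGIN {
  slo = 0.999
  window_minutes = 30 * 24 * 60            # 43200 minutes in the window
  printf "error budget: %.1f minutes\n", (1 - slo) * window_minutes
}'
# prints: error budget: 43.2 minutes
```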

2.1.1.4 - Multi-cluster topologies

Multi-cluster topologies overview

Service Mesh Manager is able to construct an Istio service mesh that spans multiple clusters. In this scenario you combine multiple clusters into a single service mesh that you can manage from either a single or from multiple Istio control planes.

multi-cluster

Single mesh scenarios are best suited to use cases where clusters are configured together, sharing resources, and are generally treated as one infrastructural component within an organization.

Service Mesh Manager not only automates setting up multi-cluster topologies, but also:

  • Updates resources to keep everything running in case a cluster changes (for example, the IP address of a load balancer changes, and so on)
  • Keeps the Istio CRs in sync between the clusters
  • Creates federated trust between the clusters (this is a difference compared to Istio)
  • Provides observability, tracing, traffic tapping and other features over multiple clusters

Supported multi-cluster topologies

Istio supports a variety of mesh topologies, as detailed in the official documentation.

Service Mesh Manager implements the Primary-Remote model, using either the same or different networks. Service Mesh Manager also supports the Primary-Primary model, again with either the same or different networks.

Moreover, these topologies can be combined: a single mesh can include multiple primary and multiple remote clusters when using Service Mesh Manager.

Creating a multi-cluster mesh

Read the multi-cluster installation guide for details on how to set up a multi-cluster mesh.

2.1.2 - Modes of operation

To support the different use-cases from Day 0 to Day 2 operations, Service Mesh Manager has different modes of operation. The same binary can act as:

  • an imperative CLI tool,
  • a declarative CLI tool (reconciler mode), or
  • an operator running in the cluster (operator mode).

You can also use the operator in GitOps scenarios.

Imperative mode

The main purpose of the imperative mode is to install Service Mesh Manager, get you started, and help you experiment with the various components. You can access only a small subset of the available configuration options and features (mostly just the default settings and some of the most important configuration flags) to avoid getting overloaded with command line flags.

Most notably, you can install and delete Service Mesh Manager from the command line. Internally, the install and delete commands change the component-specific parts of the main configuration, then trigger the reconciliation of the affected components.

Other commands do not necessarily change the main configuration, nor trigger reconciliation of any component. Such commands create dynamic resources which are out of scope for the reconcilers, but are convenient for getting started without having to leave the CLI.

Once you are finished experimenting with Service Mesh Manager, the recommended way forward is to start using the reconcile command, and apply all configuration through the custom resource directly. This is analogous to how you use kubectl create and then switch to using kubectl apply when you already have a full configuration and just want to apply changes incrementally. If you are an experienced Kubernetes user, you will probably skip the imperative mode and start using the reconcile command from the beginning.

The drawback of the imperative mode is that there is no overall state of components, so it can’t tell what has already been installed.

Also, it is not suitable for automation. CD systems typically require Helm charts, Kustomize, or pure YAML resources to operate with. Although the imperative commands of Service Mesh Manager have a --dump-resources flag that generates YAML files instead of applying them, you would still have to run the install command locally for each component, and commit the generated resources into version control. The CD workflow would then have to specify sequential steps for each component separately, making the whole flow difficult to extend over time.

Using the imperative mode

To use Service Mesh Manager in imperative mode, install the smm-cli command-line tool, then use its commands to install Service Mesh Manager and perform other actions. For a list of available commands, see the CLI reference.

Note: You can also configure many aspects of your service mesh using the Service Mesh Manager web interface. To access the web interface, run the smm dashboard command (if your KUBECONFIG file is set properly, the smm dashboard command automatically performs the login).

Install/Uninstall components

The following components can be installed/uninstalled individually. The -a flag installs/uninstalls them all. For details on installing and uninstalling the Service Mesh Manager operator, see Operator mode.

Reconciler mode

Reconciler mode is a declarative CLI mode. The reconcile command is a one-shot version of an operator’s reconcile flow. It executes the component reconcilers in order, and can decide whether they require another reconciliation round, or are already finished. Reconciling can apply new configuration, and remove disabled components from the system.

Note: In this mode, the operator is not installed on the cluster. The controller code runs from the CLI on the client side.

A component can be anything that receives the whole configuration, understands its own part from it to configure itself, and is able to delete its managed resources when disabled or removed. Service Mesh Manager uses two different implementations:

  • The native reconciler triggers a “resource builder” to create Kubernetes resources along with their desired state (present or absent) based on the configuration of the component. Such resource builders create CRDs, RBAC, and a Deployment resource to be able to run an operator.

  • The other implementation is the Helm reconciler that basically acts as a Helm operator. It installs and upgrades an embedded chart if it has changed, or uninstalls it if it has been removed from the main configuration.

Compared to kubectl apply, these solutions add ordering, and allow executing custom logic if required. Also, they remove resources that are not present in the config anymore. The CLI in this case executes the control logic as well.

Compared to Terraform, dependencies are managed in a predefined execution order with static binding, using deterministic names. This has lower performance, but is easier to follow. The remote state is the custom resource saved to the API server.

Using the reconciler mode

To use Service Mesh Manager in reconciler mode, complete the following steps. In this scenario, the manifest is read from a file, allowing you to declaratively provide custom configuration for the various components.

  1. Login to your Service Mesh Manager installation.

  2. Prepare the configuration settings you want to apply in a YAML file, and run the following command. For details on the configuration settings, see the ControlPlane Custom Resource.

    smm reconcile --from-file <path-to-file>
    
  3. The settings applied to the components are the result of merging the default settings, the valuesOverride values, and the managed settings. To avoid misconfiguration and possible malfunction, you cannot change the managed settings.
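
As a sketch of this workflow, the snippet below prepares such a manifest and applies it; the apiVersion and the fields under spec are illustrative assumptions, so check the ControlPlane Custom Resource reference for the actual schema.

```shell
# Hypothetical sketch of preparing a ControlPlane manifest for smm reconcile.
# The apiVersion and the fields under spec are assumptions, not the verified schema.
cat > controlplane.yaml <<'EOF'
apiVersion: smm.cisco.com/v1alpha1   # assumed API group/version
kind: ControlPlane
metadata:
  name: smm
spec:
  smm:
    valuesOverride: {}               # overrides merged onto the default settings
EOF

# Then apply it declaratively (requires the activated smm CLI):
# smm reconcile --from-file controlplane.yaml
```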

Operator mode

The operator mode follows the familiar operator pattern. In operator mode, Service Mesh Manager watches events on the ControlPlane Custom Resource, and triggers a reconciliation for all components in order, the same way you can trigger the reconcile command locally.

Note: Unlike in the declarative CLI mode, in operator mode the Service Mesh Manager operator runs inside Kubernetes, not on a client machine. This naturally means that this mode is mutually exclusive with the install, delete, and reconcile commands.

Using the operator mode is the recommended way to integrate the Service Mesh Manager installer into a Kubernetes-native continuous delivery solution, for example, Argo, where the integration boils down to applying YAML files to get the installer deployed as an operator.

Existing configurations managed using the reconcile command work out-of-the box after switching to the operator mode.

Using the operator mode

To use Service Mesh Manager in operator mode, install Service Mesh Manager in operator mode. In this scenario, the reconcile flow runs on the Kubernetes cluster as an operator that watches the ControlPlane custom resources. Any change made to the watched custom resource triggers the reconcile flow.

GitOps

GitOps is a way of implementing Continuous Deployment for cloud native applications. Based on Git and Continuous Deployment tools, GitOps provides a declarative way to store the desired state of your infrastructure and automated processes to realize the desired state in your production environment.

For example, to deploy a new application you update the repository, and the automated processes perform the actual deployment steps.

When used in operator mode, Service Mesh Manager works flawlessly with GitOps solutions such as Argo CD, and can be used to declaratively manage your service mesh. For a detailed tutorial on setting up Argo CD with Service Mesh Manager, see Install SMM - GitOps - single cluster.
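
For instance, an Argo CD Application can track a Git repository that holds the ControlPlane manifest, so commits to the repository are rolled out automatically. In the sketch below, the repository URL, path, and namespaces are hypothetical placeholders:

```shell
# Hedged sketch of an Argo CD Application for GitOps-managed Service Mesh Manager.
# Repository URL, path, and namespaces are hypothetical placeholders.
cat > smm-application.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: service-mesh-manager
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/your-org/smm-config.git   # hypothetical repository
    targetRevision: main
    path: manifests                                        # directory holding the ControlPlane CR
  destination:
    server: https://kubernetes.default.svc
    namespace: smm-system
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from the repository
      selfHeal: true   # revert manual drift in the cluster
EOF
```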

2.1.3 - Istio-operator feature comparison

The Calisti team actively maintains its fully upstream-compatible Istio distribution and several open-source projects and integrations that serve as the basis for Cisco Service Mesh Manager. From the perspective of Istio management, the Calisti team maintains the following:

  • The Istio operator is an open source project, which is the core component involved in Istio control plane and gateway lifecycle management for Cisco Service Mesh Manager.
  • Calisti Service Mesh Manager is a commercial product that includes all the features mentioned in this guide, enterprise support, and optionally integration support for customer environments.
Feature                       Istio operator            Calisti Service Mesh Manager
Install Istio                 ✓                         ✓
Manage Istio                  ✓                         ✓
Upgrade Istio                 ✓                         ✓
Uninstall Istio               ✓                         ✓
Multiple gateways support     ✓                         ✓
Multi cluster support         needs some manual steps   fully automatic
Prometheus                    -                         ✓
Grafana                       -                         ✓
Jaeger                        -                         ✓
Cert manager                  -                         ✓
Dashboard                     -                         ✓
CLI                           -                         ✓
OIDC authentication           -                         ✓
VM integration                -                         ✓
Topology graph                -                         ✓
Outlier detection             -                         ✓
Service Level Objectives      -                         ✓
Live access logs              -                         ✓
mTLS management               -                         ✓
Gateway management            -                         ✓
Istio traffic management      -                         ✓
Validations                   -                         ✓
Support                       Community                 Enterprise

2.2 - Getting started with the Free Tier

This Getting Started guide helps you access and install the free version of Service Mesh Manager. If you are a paying customer, see Installation for installation options.

To get started with Service Mesh Manager, you will install Service Mesh Manager and a demo application on a single cluster. After that, you can attach other clusters to the mesh and redeploy the demo application to run on multiple clusters.

Free tier limitations

  • The free tier of Service Mesh Manager allows you to use Service Mesh Manager on a maximum of two Kubernetes clusters, with a total of at most 10 worker nodes across the clusters. For details, see Licensing options.
  • The SMM Operator Helm chart is not supported.

To buy an enterprise license, contact your Cisco sales representative, or contact Cisco Emerging Technologies and Incubation directly.

Prerequisites

You need a Kubernetes cluster to run Service Mesh Manager. If you don’t already have a Kubernetes cluster to work with, then:

  1. Create a cluster that meets the following resource requirements with your favorite provider.

    CAUTION:

    Supported providers and Kubernetes versions

    The cluster must run a Kubernetes version that Service Mesh Manager supports: Kubernetes 1.21, 1.22, 1.23, 1.24.

    Service Mesh Manager is tested and known to work on the following Kubernetes providers:

    • Amazon Elastic Kubernetes Service (Amazon EKS)
    • Google Kubernetes Engine (GKE)
    • Azure Kubernetes Service (AKS)
    • Red Hat OpenShift 4.11
    • On-premises installation of stock Kubernetes with load balancer support (and optionally PVCs for persistence)

    Calisti resource requirements

    Make sure that your Kubernetes or OpenShift cluster has sufficient resources to install Calisti. The cluster needs the following resources:

    • CPU: 32 vCPU in total, with 4 vCPU available for allocation per worker node. (If you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS.)
    • Memory: 64 GiB in total, with 4 GiB available for allocation per worker node (8 GiB in case of an OpenShift cluster).
    • Storage: 12 GB of ephemeral storage on the Kubernetes worker nodes (for traces and metrics).

    These minimum requirements need to be available for allocation within your cluster, in addition to the requirements of any other loads running in your cluster (for example, DaemonSets and Kubernetes node-agents). If Kubernetes cannot allocate sufficient resources to Service Mesh Manager, some pods will remain in Pending state, and Service Mesh Manager will not function properly.

    Enabling additional features, such as High Availability, increases these values.

    The default installation, when enough headroom is available in the cluster, should be able to support at least 150 running Pods and the same number of Services. For setting up Service Mesh Manager for bigger workloads, see scaling Service Mesh Manager.

  2. Set Kubernetes configuration and context.

    The Service Mesh Manager command-line tool uses your current Kubernetes context, as set in the KUBECONFIG environment variable (~/.kube/config by default). Check if the cluster is the same as the one you plan to deploy the Service Mesh Manager. Run the following command:

    kubectl config get-contexts
    

    If there are multiple contexts in the Kubeconfig file, specify the one you want to use with the use-context parameter, for example:

    kubectl config use-context <context-to-use>
    

Preparation

To access and install Calisti, complete the following steps.

  1. You’ll need a Cisco Customer account to download Calisti. If you don’t already have one, here’s how to sign up:

    1. Visit the Cisco Account registration page and complete the registration form.
    2. Look out for an email from no-reply@mail-id.cisco.com titled Activate Account and click on the Activate Account button to activate your account.
  2. Download the Calisti command-line tools.

    1. Visit the Calisti download center.
    2. If you’re redirected to the home page, check the upper right-hand corner to see if you’re signed in. If you see a login button, go ahead and log in using your Cisco Customer account credentials. If, instead, you see “welcome, ” then you are already logged in.
    3. Once you have logged in, navigate to the Calisti download center again.
    4. Read and accept the End-User License Agreement (EULA).
    5. Download the Service Mesh Manager command-line tool (CLI) suitable for your system. The CLI supports macOS and Linux (x86_64). On Windows, install the Windows Subsystem for Linux (WSL) and use the Linux binary.
    6. Extract the archive. The archive contains two binaries, smm for Service Mesh Manager, and supertubes for Streaming Data Manager.
    7. Navigate to the directory where you have extracted the CLI.

    Note: For information on how to download the CLI using ORAS, see Download the CLI using ORAS.

  3. The Calisti download page shows your credentials that you can use to access the Service Mesh Manager and Streaming Data Manager docker images.

    Open a terminal and login to the image registries of Calisti by running the following command. The <your-password> and <your-username> parts contain the access credentials to the registries.

    SMM_REGISTRY_PASSWORD=<your-password> ./smm activate \
      --host=registry.eticloud.io \
      --prefix=smm \
      --user='<your-username>'
    

Install Service Mesh Manager on a single cluster

  1. Run the following command. This will install the main Service Mesh Manager components.

    • On Kubernetes:

      smm install -a
      
    • On OpenShift (for details, see OpenShift integration):

      smm install -a --platform=openshift
      

    Note: If you are installing Service Mesh Manager on a managed Kubernetes solution of a public cloud provider (for example, Amazon EKS, AKS, or GKE) or kOps, the cluster name auto-discovered by Service Mesh Manager is incompatible with Kubernetes resource naming restrictions and Istio’s method of identifying clusters in a multicluster mesh.

    In earlier Service Mesh Manager versions, you had to manually use the --cluster-name parameter to set a cluster name that complies with the RFC 1123 DNS subdomain/label format (alphanumeric string without “_” or “.” characters). Starting with Service Mesh Manager version 1.11, non-compliant names are automatically converted using the following rules:

    • Replace ‘_’ characters with ‘-’
    • Replace ‘.’ characters with ‘-’
    • Replace ‘:’ characters with ‘-’
    • Truncate the name to 63 characters
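
These four rules amount to a simple string transformation; a minimal shell illustration (not the actual Service Mesh Manager implementation):

```shell
# Illustrative sketch of the cluster-name normalization rules above
# (not the actual SMM code): map '_', '.', ':' to '-', then keep 63 characters.
sanitize_cluster_name() {
  printf '%s' "$1" | tr '_.:' '---' | cut -c1-63
}

sanitize_cluster_name 'my_cluster.us-east-1:prod'
# prints: my-cluster-us-east-1-prod
```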

    Calisti supports KUBECONFIG contexts having the following authentication methods:

    • certfile and keyfile
    • certdata and keydata
    • bearer token
    • exec/auth provider

    Username-password pairs are not supported.

    If you are installing Service Mesh Manager in a test environment, you can install it without requiring authentication by running:

    smm install --anonymous-auth -a
    

    If you experience errors during the installation, try running the installation in verbose mode: smm install -v

  2. Wait until the installation is completed. This can take a few minutes.

  3. (Optional) If you don’t already have Istio workload and traffic on this cluster, install the demo application:

    smm demoapp install
    
  4. Run the following command to open the dashboard. If you don’t already have Istio workload and traffic, the dashboard will be empty.

    smm dashboard
    

    The Service Mesh Manager Dashboard for your Istio service mesh

  5. (Optional)

    If you are installing Service Mesh Manager on a managed Kubernetes solution of a public cloud provider (for example, AWS, Azure, or Google Cloud), assign admin roles, so that you can tail the logs of your containers from the Service Mesh Manager UI, use Service Level Objectives and perform various tasks from the CLI that require custom permissions. Run the following command:

    kubectl create clusterrolebinding user-cluster-admin --clusterrole=cluster-admin --user=<gcp/aws/azure username>
    

    CAUTION:

    Assigning administrator roles might be very dangerous because it gives wide access to your infrastructure. Be careful and do that only when you’re confident in what you’re doing.

  6. At this point, Service Mesh Manager is up and running. On the dashboard select MENU > TOPOLOGY to see how the traffic flows through your mesh, and experiment with any of the available features described in the documentation.

    The Service Mesh Manager demo application topology The Service Mesh Manager demo application topology

  7. To evaluate Streaming Data Manager, see Getting started with Streaming Data Manager.

Next steps

To install applications into the Calisti service mesh, see Deploy custom application into the mesh.

Get help

If you run into errors, experience problems, or just have a question or feedback while using the Free Tier of Service Mesh Manager, visit our Application Networking and Observability community site.

Support details for the Pro and Enterprise Tiers are provided in the purchased plan.

2.3 - Installation

To evaluate the services Service Mesh Manager offers, we recommend using the free edition of Service Mesh Manager in a test environment with our demo application. This way you can start over at any time, and try all the options you are interested in without having to worry about changing your existing environment, even if that environment is not used in production.

Production installation is very similar, but you won’t need to deploy the demo application, and you must specify exactly which components you want to use.

2.3.1 - Prerequisites

Before deploying Service Mesh Manager on your cluster, complete the following tasks.

Create a cluster

You need a Kubernetes cluster to run Service Mesh Manager (and optionally, Streaming Data Manager). If you don’t already have a Kubernetes cluster to work with, create one with one of the methods described in Create a test cluster.

CAUTION:

Supported providers and Kubernetes versions

The cluster must run a Kubernetes version that Service Mesh Manager supports: Kubernetes 1.21, 1.22, 1.23, 1.24.

Service Mesh Manager is tested and known to work on the following Kubernetes providers:

  • Amazon Elastic Kubernetes Service (Amazon EKS)
  • Google Kubernetes Engine (GKE)
  • Azure Kubernetes Service (AKS)
  • Red Hat OpenShift 4.11
  • On-premises installation of stock Kubernetes with load balancer support (and optionally PVCs for persistence)

Calisti resource requirements

Make sure that your Kubernetes or OpenShift cluster has sufficient resources to install Calisti. The cluster needs the following resources:

  • CPU: 32 vCPU in total, with 4 vCPU available for allocation per worker node. (If you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS.)
  • Memory: 64 GiB in total, with 4 GiB available for allocation per worker node (8 GiB in case of an OpenShift cluster).
  • Storage: 12 GB of ephemeral storage on the Kubernetes worker nodes (for traces and metrics).

These minimum requirements need to be available for allocation within your cluster, in addition to the requirements of any other loads running in your cluster (for example, DaemonSets and Kubernetes node-agents). If Kubernetes cannot allocate sufficient resources to Service Mesh Manager, some pods will remain in Pending state, and Service Mesh Manager will not function properly.

Enabling additional features, such as High Availability, increases these values.

The default installation, when enough headroom is available in the cluster, should be able to support at least 150 running Pods and the same number of Services. For setting up Service Mesh Manager for bigger workloads, see scaling Service Mesh Manager.

Install the Service Mesh Manager tool

Install the Service Mesh Manager command-line tool. You can use the Service Mesh Manager CLI tool to install Service Mesh Manager and other components on your cluster.

Note: The Service Mesh Manager CLI supports macOS and Linux (x86_64). On Windows, install the Windows Subsystem for Linux (WSL) and use the Linux binary.

  1. Install the Service Mesh Manager CLI for your environment.

  2. Set Kubernetes configuration and context.

    The Service Mesh Manager command-line tool uses your current Kubernetes context, as set in the KUBECONFIG environment variable (~/.kube/config by default). Check if the cluster is the same as the one you plan to deploy the Service Mesh Manager. Run the following command:

    kubectl config get-contexts
    

    If there are multiple contexts in the Kubeconfig file, specify the one you want to use with the use-context parameter, for example:

    kubectl config use-context <context-to-use>
    

Deploy Service Mesh Manager

After you have completed the previous steps, you can install Service Mesh Manager on a single cluster, or you can form a multi-cluster mesh right away.

Note: The default version of Service Mesh Manager is built with the standard SSL libraries. To use a FIPS-compliant version of Istio, see Install FIPS images.

Select the installation method you want to use:

You can install Service Mesh Manager on a single cluster first, and attach additional clusters later to form a multi-cluster mesh.

2.3.1.1 - Accessing the Service Mesh Manager binaries

To evaluate Service Mesh Manager we recommend using the free tier option.

If you don’t already have a Cisco Customer Identity (CCI) account, you’ll also have to complete a brief sign-up procedure.

To access the CLI binaries, you can either download them from the Service Mesh Manager download page or pull them from registry.eticloud.io using ORAS.

Download the CLI

  1. Visit the Calisti download center.
  2. If you’re redirected to the home page, check the upper right-hand corner to see if you’re signed in. If you see a login button, go ahead and log in using your Cisco Customer account credentials. If, instead, you see “welcome, ” then you are already logged in.
  3. Once you have logged in, navigate to the Calisti download center again.
  4. Read and accept the End-User License Agreement (EULA).
  5. Download the Service Mesh Manager command-line tool (CLI) suitable for your system. The CLI supports macOS and Linux (x86_64). On Windows, install the Windows Subsystem for Linux (WSL) and use the Linux binary.
  6. Extract the archive. The archive contains two binaries, smm for Service Mesh Manager, and supertubes for Streaming Data Manager.
  7. Navigate to the directory where you have extracted the CLI.

Download the CLI using ORAS

To install the Service Mesh Manager CLI using ORAS, complete the following steps.

  1. Install OCI Registry As Storage (ORAS). For details, see the ORAS installation guide for your operating system. For example, on macOS you can run brew install oras

  2. Log in to registry.eticloud.io using ORAS. You can find your credentials and the activation command on the Service Mesh Manager download page. (If you haven’t registered yet, sign up on the Service Mesh Manager page).

    Run the following command to log in, then enter your username and password.

    oras login registry.eticloud.io
    
  3. Download the Service Mesh Manager CLI by running:

    oras pull registry.eticloud.io/smm/smm-cli:v1.12.0
    
  4. To manage Apache Kafka installations using Streaming Data Manager, download the Streaming Data Manager command-line tool (called supertubes-cli) as well.

    oras pull registry.eticloud.io/sdm/supertubes-cli:v1.12.0
    
  5. Extract the archive for your operating system.

  6. Navigate to the directory where you have extracted the CLI.

Activate the CLI

Due to legal requirements, the docker images for Service Mesh Manager are stored in a docker registry that requires authentication. Service Mesh Manager has built-in support for transparently performing this authentication. For this feature to work, you must “activate” the CLI on every workstation that is used to install, upgrade, or change the Service Mesh Manager deployment. You can skip this activation step if you only use the dashboard or other CLI commands.

You can find your credentials and the activation command on the Service Mesh Manager download page.

Open a terminal and login to the image registries of Calisti by running the following command. The <your-password> and <your-username> parts contain the access credentials to the registries.

SMM_REGISTRY_PASSWORD=<your-password> ./smm activate \
  --host=registry.eticloud.io \
  --prefix=smm \
  --user='<your-username>'

After the activation, you can install Service Mesh Manager on a single cluster or multiple clusters, or manage an existing installation.

2.3.1.2 - Create a test cluster

You need a Kubernetes cluster to run Service Mesh Manager. If you don’t already have a Kubernetes cluster to work with, create one with one of the following methods.

  • Run locally (~5 minutes): Deploy Service Mesh Manager to a single-node Kubernetes cluster running on your development machine.
  • Run on a Kubernetes cluster (~10 minutes): Deploy Service Mesh Manager to a Kubernetes cluster of your choice.

Run Service Mesh Manager locally

Recommended if you don’t have or don’t want to create a Kubernetes cluster, but want to try out Service Mesh Manager quickly.

  1. Install one of the following tools to run a Kubernetes cluster locally:

  2. Ensure that the local Kubernetes cluster meets the following requirements:

CAUTION:

Supported providers and Kubernetes versions

The cluster must run a Kubernetes version that Service Mesh Manager supports: Kubernetes 1.21, 1.22, 1.23, 1.24.

Service Mesh Manager is tested and known to work on the following Kubernetes providers:

  • Amazon Elastic Kubernetes Service (Amazon EKS)
  • Google Kubernetes Engine (GKE)
  • Azure Kubernetes Service (AKS)
  • Red Hat OpenShift 4.11
  • On-premises installation of stock Kubernetes with load balancer support (and optionally PVCs for persistence)

Calisti resource requirements

Make sure that your Kubernetes or OpenShift cluster has sufficient resources to install Calisti. The cluster needs the following resources:

  • CPU: 32 vCPU in total, with 4 vCPU available for allocation per worker node. (If you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS.)
  • Memory: 64 GiB in total, with 4 GiB available for allocation per worker node (8 GiB in case of an OpenShift cluster).
  • Storage: 12 GB of ephemeral storage on the Kubernetes worker nodes (for traces and metrics).

These minimum requirements need to be available for allocation within your cluster, in addition to the requirements of any other loads running in your cluster (for example, DaemonSets and Kubernetes node-agents). If Kubernetes cannot allocate sufficient resources to Service Mesh Manager, some pods will remain in Pending state, and Service Mesh Manager will not function properly.

Enabling additional features, such as High Availability, increases these values.

The default installation, when enough headroom is available in the cluster, should be able to support at least 150 running Pods and the same number of Services. For setting up Service Mesh Manager for bigger workloads, see scaling Service Mesh Manager.

  1. Launch the local Kubernetes cluster with one of the following tools:

    • minikube:

      minikube start
      
    • Docker for Desktop: In preferences, choose Enable Kubernetes.

    • kind:

      kind create cluster
      
  2. Proceed to Install Service Mesh Manager.

  3. When you’re done experimenting, you can remove the demo application, Service Mesh Manager, and Istio from your cluster with the following command, which removes all of these components in the correct order:

    smm uninstall -a
    

    Note: Uninstalling Service Mesh Manager does not remove the Custom Resource Definitions (CRDs) from the cluster, because removing a CRD removes all related resources. Since Service Mesh Manager uses several external components, this could remove things not belonging to Service Mesh Manager.

Run on a Kubernetes cluster

Recommended if you have a Kubernetes cluster and want to try out Service Mesh Manager quickly.

  1. Create a cluster that meets the following resource requirements with your favorite provider.

    CAUTION:

    Supported providers and Kubernetes versions

    The cluster must run a Kubernetes version that Service Mesh Manager supports: Kubernetes 1.21, 1.22, 1.23, 1.24.

    Service Mesh Manager is tested and known to work on the following Kubernetes providers:

    • Amazon Elastic Kubernetes Service (Amazon EKS)
    • Google Kubernetes Engine (GKE)
    • Azure Kubernetes Service (AKS)
    • Red Hat OpenShift 4.11
    • On-premises installation of stock Kubernetes with load balancer support (and optionally PVCs for persistence)

    Calisti resource requirements

    Make sure that your Kubernetes or OpenShift cluster has sufficient resources to install Calisti. The following table shows the number of resources needed on the cluster:

    Resource   Required
    CPU        32 vCPU in total; 4 vCPU available for allocation per worker node (if you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS)
    Memory     64 GiB in total; 4 GiB available for allocation per worker node for a Kubernetes cluster (8 GiB for an OpenShift cluster)
    Storage    12 GB of ephemeral storage on the Kubernetes worker nodes (for traces and metrics)

    These minimum requirements need to be available for allocation within your cluster, in addition to the requirements of any other loads running in your cluster (for example, DaemonSets and Kubernetes node-agents). If Kubernetes cannot allocate sufficient resources to Service Mesh Manager, some pods will remain in Pending state, and Service Mesh Manager will not function properly.

    Enabling additional features, such as high availability, increases these requirements.

    The default installation, when enough headroom is available in the cluster, should be able to support at least 150 running Pods and the same number of Services. To set up Service Mesh Manager for bigger workloads, see scaling Service Mesh Manager.

  2. Set Kubernetes configuration and context.

    The Service Mesh Manager command-line tool uses your current Kubernetes context, as set in the KUBECONFIG environment variable (~/.kube/config by default). Verify that the current context points to the cluster where you plan to deploy Service Mesh Manager. Run the following command:

    kubectl config get-contexts
    

    If there are multiple contexts in the Kubeconfig file, specify the one you want to use with the use-context parameter, for example:

    kubectl config use-context <context-to-use>
    
  3. Proceed to Install Service Mesh Manager.

2.3.2 - Licensing options

Service Mesh Manager is available in the following editions:

Tier       Capacity                                               Support
Free       Maximum of 10 nodes total in at most 2 clusters.       Community
Pro        Maximum of 25 nodes total in any number of clusters.   Cisco
Enterprise Unlimited, per-node pricing.                           Cisco

Free tier

The free tier allows using up to 10 nodes across at most 2 clusters inside the same mesh (active/active and active/passive setups are both allowed).

For example, you can use a single cluster with 10 worker nodes (counting the active nodes too), or you can have an active cluster with 6 worker nodes and a passive cluster attached with 4 additional nodes.

If you exceed these limits, the management functionalities of Service Mesh Manager become restricted until the number of nodes and clusters is back within the allowed limits:

  • CLI commands for installing Service Mesh Manager and attaching clusters to the mesh will fail until the node count has been decreased.

    Note: The uninstall command works regardless of the node count.

  • The dashboard shows an error detailing the license violations when over limits. As Kubernetes is dynamic in nature, you can exceed the node limit by 1 for one day, so you can keep using the Service Mesh Manager dashboard during node rotations.

The Istio data plane remains available regardless of the number of nodes, ensuring that no production outage happens due to a license violation.

To register for free-tier access, see Getting started with the Free Tier.

To buy a Pro (paid-tier) license, visit the Service Mesh Manager website. If you have purchased a Pro license for Service Mesh Manager, you have to apply the license to your Service Mesh Manager installations. To achieve that, complete the following steps.

  1. Copy your license token into a file (for example, license.key).

  2. Apply the license to your Service Mesh Manager installation. If you have a multi-cluster setup, apply it to the primary cluster (where the Service Mesh Manager control plane is running). Run the following command:

    smm license apply --licenseKeyPath <license.key>
    

    Note: If you are using Service Mesh Manager with a commercial license in a multi-cluster scenario, Service Mesh Manager automatically synchronizes the license to the attached clusters. If the peer cluster already has a license, it is automatically deleted and replaced with the license of the primary Service Mesh Manager cluster. Detaching a peer cluster automatically deletes the license from the peer cluster.

  3. Run the following command to verify that the new license has been added to the Service Mesh Manager installation.

    smm license list
    

    Alternatively, you can open the Service Mesh Manager dashboard, open the user account in the top-right, then select License to display the details of the license.

    The details of the license include:

    • the number of permitted clusters (MaxClusters) in the mesh,
    • the total number of permitted nodes in the mesh (MaxNodes), and
    • the number of maximum permitted nodes for a cluster (MaxNodesPerCluster).

Exceeding the paid-tier license limit

In case you exceed the license limit for a paid-tier license, you lose access to the Service Mesh Manager dashboard until you decrease the size of the mesh to comply with the license limits.

The Istio data plane remains available regardless of the number of nodes, ensuring that no production outage happens due to a license violation.

Enterprise license

To buy an enterprise license, contact your Cisco sales representative, or directly the Service Mesh Manager sales team.

To apply an enterprise license to your Service Mesh Manager installations, follow the steps described for the Paid tier.

2.3.3 - Create single cluster mesh

Prerequisites

You need the Service Mesh Manager CLI tool installed on your computer and a Kubernetes cluster as described in the Prerequisites section.

Install Service Mesh Manager

For a quick demo or evaluation, complete the following steps to install Service Mesh Manager with every component, including the demo application. If you prefer a more interactive installation, see Installing Service Mesh Manager interactively.

Note: If you are installing Service Mesh Manager on a managed Kubernetes solution of a public cloud provider (for example, Amazon EKS, AKS, or GKE) or kOps, the cluster name auto-discovered by Service Mesh Manager is incompatible with Kubernetes resource naming restrictions and Istio’s method of identifying clusters in a multicluster mesh.

In earlier Service Mesh Manager versions, you had to manually use the --cluster-name parameter to set a cluster name that complies with the RFC 1123 DNS subdomain/label format (alphanumeric string without “_” or “.” characters). Starting with Service Mesh Manager version 1.11, non-compliant names are automatically converted using the following rules:

  • Replace ‘_’ characters with ‘-’
  • Replace ‘.’ characters with ‘-’
  • Replace ‘:’ characters with ‘-’
  • Truncate the name to 63 characters
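The conversion rules above can be reproduced with standard shell tools. This is only an illustrative sketch of the rules (the input name is a hypothetical GKE-style name), not part of the smm CLI:

```shell
# Apply the documented conversion: map '_', '.', and ':' to '-',
# then truncate to 63 characters. The input name is hypothetical.
name="gke_my-project.us-central1:demo-cluster"
sanitized=$(printf '%s' "$name" | tr '_.:' '---' | cut -c1-63)
echo "$sanitized"   # prints: gke-my-project-us-central1-demo-cluster
```

The result complies with the RFC 1123 label format expected by Kubernetes resource names.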
  1. Run the following command. This will install the main Service Mesh Manager components.

    • On OpenShift (for details, see OpenShift integration):

      smm install -a --platform=openshift
      
    • Otherwise, run

      smm install -a
      

    Calisti supports KUBECONFIG contexts having the following authentication methods:

    • certfile and keyfile
    • certdata and keydata
    • bearer token
    • exec/auth provider

    Username-password pairs are not supported.

    If you are installing Service Mesh Manager in a test environment, you can install it without requiring authentication by running:

    smm install --anonymous-auth -a
    

    If you experience errors during the installation, try running the installation in verbose mode: smm install -v

    Note: If you are installing Service Mesh Manager on a local cluster (for example, using minikube) and you don’t have a local LoadBalancer set up, disable the mesh expansion gateway support. To do that, create a file called local_icp_cr.yaml with the following content:

    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioControlPlane
    metadata:
      name: mesh
      namespace: istio-system
    spec:
      meshExpansion:
        enabled: false
    

    Then, run the following command: smm install --istio-cr-file local_icp_cr.yaml

  2. Wait until the installation is completed. This can take a few minutes. Run the following command to open the dashboard.

    smm dashboard
    

    The Service Mesh Manager Dashboard for your Istio service mesh

    If you don’t already have Istio workload and traffic, the dashboard will be empty. To install the demo application, run:

    smm demoapp install
    

    After installation, the demo application automatically starts generating traffic, and the dashboard draws a picture of the data flow. (If it doesn’t, run the smm demoapp load start command, or Generate load on the UI. If you want to stop generating traffic, run smm demoapp load stop.)

  3. If you are installing Service Mesh Manager on a managed Kubernetes solution of a public cloud provider (for example, AWS, Azure, or Google Cloud), assign admin roles, so that you can tail the logs of your containers from the Service Mesh Manager UI, use Service Level Objectives and perform various tasks from the CLI that require custom permissions. Run the following command:

    kubectl create clusterrolebinding user-cluster-admin --clusterrole=cluster-admin --user=<gcp/aws/azure username>
    

    CAUTION:

    Assigning administrator roles might be very dangerous because it gives wide access to your infrastructure. Be careful and do that only when you’re confident in what you’re doing.
  4. At this point, Service Mesh Manager is up and running. On the dashboard select MENU > TOPOLOGY to see how the traffic flows through your mesh, and experiment with any of the available features described in the documentation.

  5. If you have purchased a commercial license for Service Mesh Manager, apply the license. For details, see Paid tier.

Install Service Mesh Manager interactively

With the interactive installation, you can:

  • Install the Service Mesh Manager core, which provides a dashboard and an internal API for handling the service mesh.
  • Install and execute the Istio operator.
  • Install a demo application (optional).

Complete the following steps.

  1. Start the installation.

    smm install
    

    If you experience errors during the installation, try running the installation in verbose mode: smm install -v

    During installation, answer the interactive questions in the terminal.

    ? Install istio-operator (recommended). Press enter to accept Yes
    ? Install cert-manager (recommended). Press enter to accept Yes
    ? Install Streaming Data Manager (optional). Press enter to skip Yes
    ? Install and run demo application (optional). Press enter to skip Yes
    

    Note: If you don’t need the demo application, you can simply accept the defaults by pressing enter for each question; this installs only the core components. You can install additional components later.

  2. Wait until the installation is completed. This can take a few minutes. If you have selected to install the demo application, the Service Mesh Manager dashboard automatically opens in your browser. Otherwise, run the following command to open the dashboard.

    smm dashboard
    

    If you don’t already have Istio workload and traffic, the dashboard will be empty. To install the demo application, run:

    smm demoapp install
    

    After installation, the demo application automatically starts generating traffic, and the dashboard draws a picture of the data flow. (If it doesn’t, run the smm demoapp load start command, or Generate load on the UI. If you want to stop generating traffic, run smm demoapp load stop.)

  3. If you are installing Service Mesh Manager on a managed Kubernetes solution of a public cloud provider (for example, AWS, Azure, or Google Cloud), assign admin roles, so that you can tail the logs of your containers from the Service Mesh Manager UI, use Service Level Objectives and perform various tasks from the CLI that require custom permissions. Run the following command:

    kubectl create clusterrolebinding user-cluster-admin --clusterrole=cluster-admin --user=<gcp/aws/azure username>
    

    CAUTION:

    Assigning administrator roles might be very dangerous because it gives wide access to your infrastructure. Be careful and do that only when you’re confident in what you’re doing.
  4. At this point, Service Mesh Manager is up and running. On the dashboard select MENU > TOPOLOGY to see how the traffic flows through your mesh, and experiment with any of the available features described in the documentation.

  5. If you have purchased a commercial license for Service Mesh Manager, apply the license. For details, see Paid tier.

2.3.4 - Create multi-cluster mesh

Prerequisites

To create a multi-cluster mesh with Service Mesh Manager, you need:

  • At least two Kubernetes clusters, with access to their kubeconfig files.
  • The Service Mesh Manager CLI tool installed on your computer.
  • Network connectivity properly configured between the participating clusters.

Create a multi-cluster mesh

To create a multi-cluster mesh with Service Mesh Manager, complete the following steps.

  1. Install Service Mesh Manager to the primary cluster using the following command. This will install all Service Mesh Manager components to the cluster. Run smm install -a

    Note: If you are installing Service Mesh Manager on a managed Kubernetes solution of a public cloud provider (for example, Amazon EKS, AKS, or GKE) or kOps, the cluster name auto-discovered by Service Mesh Manager is incompatible with Kubernetes resource naming restrictions and Istio’s method of identifying clusters in a multicluster mesh.

    In earlier Service Mesh Manager versions, you had to manually use the --cluster-name parameter to set a cluster name that complies with the RFC 1123 DNS subdomain/label format (alphanumeric string without “_” or “.” characters). Starting with Service Mesh Manager version 1.11, non-compliant names are automatically converted using the following rules:

    • Replace ‘_’ characters with ‘-’
    • Replace ‘.’ characters with ‘-’
    • Replace ‘:’ characters with ‘-’
    • Truncate the name to 63 characters

    If you experience errors during the installation, try running the installation in verbose mode: smm install -v

    Calisti supports KUBECONFIG contexts having the following authentication methods:

    • certfile and keyfile
    • certdata and keydata
    • bearer token
    • exec/auth provider

    Username-password pairs are not supported.

    If you are installing Service Mesh Manager in a test environment, you can install it without requiring authentication by running:

    smm install --anonymous-auth -a
    
  2. On the primary Service Mesh Manager cluster, attach the peer cluster to the mesh using one of the following commands.

    Note: To understand the difference between the remote Istio and primary Istio clusters, see the Istio control plane models section in the official Istio documentation. The short summary is that remote Istio clusters do not have a separate Istio control plane, while primary Istio clusters do.

    The following commands automate creating the resources necessary for the peer cluster, generating and setting up the kubeconfig for that cluster, and attaching the cluster to the mesh.

    • To attach a remote Istio cluster with the default options, run:

      smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE>
      
    • To attach a primary Istio cluster (one that has an active Istio control plane installed), run:

      smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE> --active-istio-control-plane
      

      Note: If the name of the cluster cannot be used as a Kubernetes resource name (for example, because it contains the underscore, colon, or another special character), you must manually specify a name to use when you are attaching the cluster to the service mesh. For example:

      smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE> --name <KUBERNETES_COMPLIANT_CLUSTER_NAME> --active-istio-control-plane
      

      Otherwise, the following error occurs when you try to attach the cluster:

      could not attach peer cluster: graphql: Secret "example-secret" is invalid: metadata.name: Invalid value: "gke_gcp-cluster_region": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.'
      
  3. Verify the name that will be used to refer to the cluster in the mesh. To use the name of the cluster, press Enter.

    Note: If you are installing Service Mesh Manager on a managed Kubernetes solution of a public cloud provider (for example, Amazon EKS, AKS, or GKE) or kOps, the cluster name auto-discovered by Service Mesh Manager is incompatible with Kubernetes resource naming restrictions and Istio’s method of identifying clusters in a multicluster mesh.

    In earlier Service Mesh Manager versions, you had to manually use the --cluster-name parameter to set a cluster name that complies with the RFC 1123 DNS subdomain/label format (alphanumeric string without “_” or “.” characters). Starting with Service Mesh Manager version 1.11, non-compliant names are automatically converted using the following rules:

    • Replace ‘_’ characters with ‘-’
    • Replace ‘.’ characters with ‘-’
    • Replace ‘:’ characters with ‘-’
    • Truncate the name to 63 characters
    ? Cluster must be registered. Please enter the name of the cluster (<current-name-of-the-cluster>)
    
  4. Wait until the peer cluster is attached. Attaching the peer cluster takes some time, because it can be completed only after the ingress gateway address becomes available. You can verify that the peer cluster is attached successfully with the following command:

    smm istio cluster status
    

    The process is finished when you see Available in the Status field of all clusters.

    To attach other clusters, or to customize the network settings of the cluster, see Attach a new cluster to the mesh.

  5. Deploy the demo application. You can deploy the demo application in a distributed way to multiple clusters with the following commands:

    smm demoapp install -s frontpage,catalog,bookings,postgresql
    smm -c <PEER_CLUSTER_KUBECONFIG_FILE> demoapp install -s movies,payments,notifications,analytics,database,mysql --peer
    

    After installation, the demo application automatically starts generating traffic, and the dashboard draws a picture of the data flow. (If it doesn’t, run the smm demoapp load start command, or Generate load on the UI. If you want to stop generating traffic, run smm demoapp load stop.)

    If you are looking to deploy your own application, check out Deploy custom application for some guidelines.

  6. If you are installing Service Mesh Manager on a managed Kubernetes solution of a public cloud provider (for example, AWS, Azure, or Google Cloud), assign admin roles, so that you can tail the logs of your containers from the Service Mesh Manager UI, use Service Level Objectives and perform various tasks from the CLI that require custom permissions. Run the following command:

    kubectl create clusterrolebinding user-cluster-admin --clusterrole=cluster-admin --user=<gcp/aws/azure username>
    

    CAUTION:

    Assigning administrator roles might be very dangerous because it gives wide access to your infrastructure. Be careful and do that only when you’re confident in what you’re doing.
  7. Open the dashboard and look around.

    smm dashboard
    
  8. If you have purchased a commercial license for Service Mesh Manager, apply the license. For details, see Paid tier.

Kafka in the multi-cluster service mesh

You can install Streaming Data Manager on the primary Calisti cluster. After the installation, you can make the Kafka brokers on the primary cluster accessible from the peer clusters by setting up DNS resolution for them, as shown in the following section.

Kafka Broker Service DNS resolution

The Kafka brokers are accessible from any cluster in the service mesh. However, workloads in the peer clusters need DNS resolution for the broker services so that traffic can reach the sidecar proxies. To achieve this, you can either:

  • use Istio Proxy DNS (this must be set up in the global service mesh configuration), or
  • add Kubernetes services for the Kafka brokers in the peer clusters.

The following solution uses the cluster-registry-controller to synchronize the services between clusters.

  1. On the primary Calisti cluster with the Kafka brokers, run:

    cat <<EOF | kubectl apply -f -
    apiVersion: clusterregistry.k8s.cisco.com/v1alpha1
    kind: ClusterFeature
    metadata:
      name: kafka-source
    spec:
      featureName: smm.k8s.cisco.com/kafka-source
    EOF
    
  2. On the peer clusters that need access to the Kafka brokers, run:

    kubectl create ns kafka --kubeconfig <PEER_CLUSTER_KUBECONFIG_FILE>
    cat <<EOF | kubectl apply --kubeconfig <PEER_CLUSTER_KUBECONFIG_FILE> -f -
    apiVersion: clusterregistry.k8s.cisco.com/v1alpha1
    kind: ResourceSyncRule
    metadata:
      name: kafka-service-sink
    spec:
      clusterFeatureMatch:
      - featureName: smm.k8s.cisco.com/kafka-source
      groupVersionKind:
        kind: Service
        version: v1
      rules:
      - match:
        - labels:
          - matchLabels:
              app: kafka
              kafka_cr: kafka
          objectKey: {}
        mutations:
          overrides:
          - path: /spec/clusterIP
            type: remove
          - path: /spec/clusterIPs?
            type: remove
    EOF
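
The two remove mutations strip the cluster-scoped IP fields before the Service is created in the peer cluster, so the peer’s API server assigns a fresh ClusterIP from its own service CIDR. Conceptually, a broker Service matched by the rule looks like this on the primary cluster (abridged; the IP value is illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-all-broker
  namespace: kafka
  labels:
    app: kafka          # matched by the ResourceSyncRule selector
    kafka_cr: kafka
spec:
  clusterIP: 10.96.0.42        # removed by the /spec/clusterIP mutation
  clusterIPs: ["10.96.0.42"]   # removed by the /spec/clusterIPs mutation
  ports:
    - port: 29092
```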
    

Turn on Kafka client functionality in demoapp workloads

The demo application is built from a general-purpose data traffic generator called allspark, whose services can be configured to act as Kafka clients. The following command sets up the bookings service to produce to the KafkaTopic recommendations-topic. This topic is automatically created by the demoapp install command when it is run while Streaming Data Manager is installed.

kubectl set env deploy/bookings -c bookings REQUESTS="http://analytics:8080/#1 http://payments:8080/#1 kafka-produce://kafka-all-broker.kafka.svc.cluster.local:29092/recommendations-topic?message=bookings-message#1"

Set up the movies-v3 service in the peer cluster to consume from the recommendations-topic topic.

kubectl set env deploy/movies-v3 -c movies --kubeconfig <PEER_CLUSTER_KUBECONFIG_FILE> KAFKASERVER_BOOTSTRAP_SERVER=kafka-all-broker.kafka.svc.cluster.local:29092 KAFKASERVER_TOPIC=recommendations-topic KAFKASERVER_CONSUMER_GROUP=recommendations-group

After the pods restart, enable the smm-demo and kafka namespaces on the Calisti dashboard to see the Kafka client traffic.

Kafka traffic with the demo application in a multi-cluster service mesh

Cleanup

  1. To remove the demo application from a peer cluster, run the following command:

    smm -c <PEER_CLUSTER_KUBECONFIG_FILE> demoapp uninstall
    
  2. To remove a peer cluster from the mesh, run the following command:

    smm istio cluster detach <PEER_CLUSTER_KUBECONFIG_FILE>
    

For details, see Detach a cluster from the mesh.

2.3.5 - SMM Operator helm charts

You can deploy Service Mesh Manager by using Helm with the SMM Operator chart.

SMM Operator is a Kubernetes operator to deploy and manage Service Mesh Manager. In this chart the CRDs are not managed by the operator; CI/CD tooling is expected to keep the CRDs up to date.

CAUTION:

Installing Service Mesh Manager by using the smm-operator is recommended only for advanced users. In general, the recommended method is to install Service Mesh Manager by using the smm CLI tool, as the tool handles all the integration and setup tasks of the ControlPlane resource.

2.3.5.1 - Install SMM with the SMM Operator chart

SMM Operator is a Kubernetes operator to deploy and manage Service Mesh Manager. In this chart the CRDs are not managed by the operator; CI/CD tooling is expected to keep the CRDs up to date.

If your cluster is already deployed and you are authorized to fetch images from the Cisco-provided repositories, you can use basic authentication (URL, username, password) to pull the required images.

You can get a Username and Password by signing up for the Free tier version of Service Mesh Manager.

Prerequisites

Helm version 3.7 or newer.

Steps

  1. Create the cert-manager namespace:

    kubectl create ns cert-manager
    
  2. Run the following helm commands. Replace <your-username> and <your-password> with the ones shown on your Service Mesh Manager download page.

    export HELM_EXPERIMENTAL_OCI=1
    echo <your-password> | helm registry login registry.eticloud.io -u '<your-username>' --password-stdin
    
    helm pull oci://registry.eticloud.io/smm-charts/smm-operator --version 1.12.0
    
    helm install \
      --create-namespace \
      --namespace=smm-registry-access \
      --set "global.ecr.enabled=false" \
      --set "global.basicAuth.username=<your-username>" \
      --set "global.basicAuth.password=<your-password>" \
      smm-operator \
      oci://registry.eticloud.io/smm-charts/smm-operator --version 1.12.0
    

    For multi-cluster setups, the Kubernetes API server address of one cluster must be reachable from other clusters. The API server addresses are private for certain clusters (for example, OpenShift) and not reachable by default from other clusters. In such cases, use the --set "apiServerEndpointAddress=<PUBLIC_API_SERVER_ENDPOINT_ADDRESS>" flag to provide an address that’s reachable from the other clusters. This can be a public address, or one that’s routable from the other clusters.

    Expected output:

      Pulled: registry.eticloud.io/smm-charts/smm-operator:1.12.0
      Digest: sha256:c67150bca937103db8831d73574d695aace034590c55569bdc60c58d400f7a5b
      NAME: smm-operator
      LAST DEPLOYED: Thu May  4 09:23:14 2023
      NAMESPACE: smm-registry-access
      STATUS: deployed
      REVISION: 1
      TEST SUITE: None
    

    (The smm-registry-access namespace is used because smm-operator should be in the same namespace as the imagepullsecrets-controller.)

    Verify the helm install

    helm list -n smm-registry-access
    

    Expected output:

    NAME        	NAMESPACE          	REVISION	UPDATED                              	STATUS  	CHART                   	APP VERSION
    smm-operator	smm-registry-access	1       	2023-05-04 09:23:14.681227 +0200 CEST	deployed	smm-operator-1.12.0	v1.12.0
    

    Verify the operator pod is up and running

    kubectl get pods -n smm-registry-access
    

    Expected output:

    NAME             READY   STATUS    RESTARTS   AGE
    smm-operator-0   2/2     Running   0          5m5s
    
  3. Install Service Mesh Manager by creating a ControlPlane resource. We recommend that you start with the following ControlPlane resource. This CR assumes that you are using docker-registry authentication; the secret referenced in .spec.registryAccess is used to pull the smm-operator image and is synchronized to the other namespaces created by the smm-operator chart.

    For OpenShift 4.11 installations, set the spec.platform field to openshift.

    Replace <cluster-name> with the name of your cluster. The cluster name format must comply with the RFC 1123 DNS subdomain/label format (alphanumeric string without “_” or “.” characters). Otherwise, you get an error message starting with: Reconciler error: cannot determine cluster name controller=controlplane, controllerGroup=smm.cisco.com, controllerKind=ControlPlane

    kubectl apply -f - << EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: ControlPlane
    metadata:
      name: smm
    spec:
      clusterName: <cluster-name>
      certManager:
        enabled: true
        namespace: cert-manager
      clusterRegistry:
        enabled: true
        namespace: cluster-registry
      log: {}
      meshManager:
        enabled: true
        istio:
          enabled: true
          istioCRRef:
            name: cp-v115x
            namespace: istio-system
          operators:
            namespace: smm-system
        namespace: smm-system
      nodeExporter:
        enabled: true
        namespace: smm-system
        psp:
          enabled: false
        rbac:
          enabled: true
      oneEye: {}
      registryAccess:
        enabled: true
        imagePullSecretsController: {}
        namespace: smm-registry-access
        pullSecrets:
          - name: smm-registry.eticloud.io-pull-secret
            namespace: smm-registry-access
      repositoryOverride:
        host: registry.eticloud.io
        prefix: smm
      role: active
      smm:
        als:
          enabled: true
          log: {}
        application:
          enabled: true
          log: {}
        auth:
          mode: impersonation
        certManager:
          enabled: true
        enabled: true
        federationGateway:
          enabled: true
          name: smm
          service:
            enabled: true
            name: smm-federation-gateway
            port: 80
        federationGatewayOperator:
          enabled: true
        impersonation:
          enabled: true
        istio:
          revision: cp-v115x.istio-system
        leo:
          enabled: true
          log: {}
        log: {}
        namespace: smm-system
        prometheus:
          enabled: true
          replicas: 1
        prometheusOperator: {}
        releaseName: smm
        role: active
        sre:
          enabled: true
        useIstioResources: true
    EOF
    
  4. Verify that all pods in the smm-system namespace are up and running:

    kubectl get pods -n smm-system
    

    Expected output:

    NAME                                               READY   STATUS    RESTARTS      AGE
    istio-operator-v113x-7fd87bcd79-7g6wz              2/2     Running   0             26m
    istio-operator-v115x-657f9f58b8-mg8pw              2/2     Running   0             26m
    mesh-manager-0                                     2/2     Running   0             26m
    prometheus-node-exporter-9hrpr                     1/1     Running   0             23m
    prometheus-node-exporter-h866t                     1/1     Running   0             23m
    prometheus-node-exporter-j9ljd                     1/1     Running   0             23m
    prometheus-node-exporter-qsc2r                     1/1     Running   0             23m
    prometheus-smm-prometheus-0                        4/4     Running   0             24m
    smm-77c6d4fd6-czdbg                                2/2     Running   0             25m
    smm-77c6d4fd6-vrzqp                                2/2     Running   0             25m
    smm-als-8698db887b-vr96g                           2/2     Running   0             25m
    smm-authentication-57b44b8d94-spcbh                2/2     Running   0             25m
    smm-federation-gateway-6698684fb9-4l5q9            2/2     Running   0             24m
    smm-federation-gateway-operator-5f7868448c-59d6l   2/2     Running   0             25m
    smm-grafana-5c5bf778fb-zlg4s                       3/3     Running   0             25m
    smm-health-5994fcb477-n6f8v                        2/2     Running   0             25m
    smm-health-api-5d49fd6c84-v2jml                    2/2     Running   0             25m
    smm-ingressgateway-988b74656-cgb9x                 1/1     Running   0             25m
    smm-kubestatemetrics-58ff74d48c-mwvz5              2/2     Running   0             25m
    smm-leo-6f4dfccdbc-wm5gb                           2/2     Running   0             25m
    smm-prometheus-operator-b5dd94cc-7bg5z             3/3     Running   1 (24m ago)   25m
    smm-sre-alert-exporter-759547d77f-jffr4            2/2     Running   0             25m
    smm-sre-api-84cb7974c5-vhjcf                       2/2     Running   0             25m
    smm-sre-controller-6c999f7dfc-9szqq                2/2     Running   0             25m
    smm-tracing-6b9f9cdd74-gj9j5                       2/2     Running   0             25m
    smm-vm-integration-5b66db6c9c-xtlcv                2/2     Running   0             25m
    smm-web-75994644d8-cm996                           3/3     Running   0             25m
    

Uninstalling the chart

To uninstall/delete the ControlPlane resource and the smm-operator release, complete the following steps.

  1. Run:

    kubectl delete controlplanes.smm.cisco.com smm
    
  2. Wait until all pods are deleted. This can take a couple of minutes. Then run:

    helm uninstall --namespace=smm-registry-access smm-operator
    
  3. Delete the following namespaces:

    kubectl delete namespaces cert-manager cluster-registry istio-system smm-registry-access smm-system
    
  4. Delete the Cluster CR:

    kubectl delete clusters.clusterregistry.k8s.cisco.com <cluster-name>
    

Chart configuration

The following table lists the configurable parameters of the Service Mesh Manager chart and their default values.

Parameter                                     Description                                                Default
operator.image.repository                     Operator container image repository                        registry.eticloud.io/smm/smm-operator
operator.image.tag                            Operator container image tag                               Same as chart version
operator.image.pullPolicy                     Operator container image pull policy                       IfNotPresent
operator.resources                            CPU/Memory resource requests/limits (YAML)                 Memory: 256Mi, CPU: 200m
prometheusMetrics.enabled                     If true, use direct access for Prometheus metrics          false
prometheusMetrics.authProxy.enabled           If true, use auth proxy for Prometheus metrics             true
prometheusMetrics.authProxy.image.repository  Auth proxy container image repository                      gcr.io/kubebuilder/kube-rbac-proxy
prometheusMetrics.authProxy.image.tag         Auth proxy container image tag                             v0.5.0
prometheusMetrics.authProxy.image.pullPolicy  Auth proxy container image pull policy                     IfNotPresent
rbac.enabled                                  Create RBAC service account and roles                      true
rbac.psp.enabled                              Create pod security policy and binding                     false
ecr.enabled                                   Whether the smm-operator chart handles the ECR login       true
ecr.accessKeyID                               Access Key ID to be used for ECR logins                    Empty
ecr.secretAccessKey                           Secret Access Key to be used for ECR logins                Empty
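The parameters above can also be collected into a values file instead of being passed with individual `--set` flags. A minimal sketch, assuming a hypothetical file name of `smm-operator-values.yaml` and example override values (the parameter names come from the table above):

```shell
# Write a hypothetical values override file for the smm-operator chart.
# The keys mirror the parameter table; the chosen values are examples only.
cat > smm-operator-values.yaml <<'EOF'
operator:
  image:
    pullPolicy: Always
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
ecr:
  enabled: false
EOF

# Such a file would then be passed to Helm, for example:
#   helm install smm-operator oci://registry.eticloud.io/smm-charts/smm-operator \
#     --namespace smm-registry-access --values smm-operator-values.yaml

# Sanity check: confirm the override landed in the file.
grep -c 'pullPolicy: Always' smm-operator-values.yaml   # prints 1
```

Adjust the release name and namespace to your environment; the commented `helm install` line is only an illustration of where the values file is used.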

2.3.5.2 - The ControlPlane Custom Resource

Service Mesh Manager installs the ControlPlane Custom Resource with the following default values.


To understand how Service Mesh Manager can be customized through its CRs, see Customize Installation.

2.3.6 - Install SMM - GitOps - single cluster

This guide details how to set up a GitOps environment for Service Mesh Manager using Argo CD. The same principles can be used for other tools as well.

CAUTION:

Do not push the secrets directly into the git repository, especially when it is a public repository. Argo CD provides solutions to keep secrets safe.

Architecture

The high level architecture for Argo CD with a single-cluster Service Mesh Manager consists of the following components:

  • A git repository that stores the various charts and manifests,
  • a management cluster that runs the Argo CD server, and
  • the Service Mesh Manager cluster managed by Argo CD.

Service Mesh Manager GitOps architecture

Prerequisites

To complete this procedure, you need:

  • A free registration for the Service Mesh Manager download page
  • A Kubernetes or OpenShift cluster to deploy Argo CD on (called management-cluster in the examples).
  • A Kubernetes or OpenShift cluster to deploy Service Mesh Manager on (called workload-cluster-1 in the examples).

CAUTION:

Supported providers and Kubernetes versions

The cluster must run a Kubernetes version that Service Mesh Manager supports: Kubernetes 1.21, 1.22, 1.23, 1.24.

Service Mesh Manager is tested and known to work on the following Kubernetes providers:

  • Amazon Elastic Kubernetes Service (Amazon EKS)
  • Google Kubernetes Engine (GKE)
  • Azure Kubernetes Service (AKS)
  • Red Hat OpenShift 4.11
  • On-premises installation of stock Kubernetes with load balancer support (and optionally PVCs for persistence)

Calisti resource requirements

Make sure that your Kubernetes or OpenShift cluster has sufficient resources to install Calisti. The following table shows the number of resources needed on the cluster:

Resource  Required
CPU       32 vCPU in total; 4 vCPU available for allocation per worker node. (If you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS.)
Memory    64 GiB in total; 4 GiB available for allocation per worker node for Kubernetes clusters (8 GiB in case of OpenShift clusters).
Storage   12 GB of ephemeral storage on the Kubernetes worker nodes (for traces and metrics).

These minimum requirements need to be available for allocation within your cluster, in addition to the requirements of any other loads running in your cluster (for example, DaemonSets and Kubernetes node-agents). If Kubernetes cannot allocate sufficient resources to Service Mesh Manager, some pods will remain in Pending state, and Service Mesh Manager will not function properly.

Enabling additional features, such as high availability, increases these resource requirements.

The default installation, when enough headroom is available in the cluster, should be able to support at least 150 running Pods with the same amount of Services. For setting up Service Mesh Manager for bigger workloads, see scaling Service Mesh Manager.
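As a rough illustration of how the totals above translate into cluster size, the following sketch computes lower bounds on the worker-node count from the table's figures. Note that the per-node numbers are the minimum allocatable amounts from the table, not node capacities, so real nodes (which typically carry more memory per vCPU) may satisfy both bounds with fewer machines:

```shell
# Back-of-the-envelope worker-node bounds from the requirements table:
# 32 vCPU total with at least 4 allocatable vCPU per worker node, and
# 64 GiB total with at least 4 GiB allocatable per worker node.
TOTAL_VCPU=32;    VCPU_PER_NODE=4
TOTAL_MEM_GIB=64; MEM_PER_NODE_GIB=4

NODES_BY_CPU=$((TOTAL_VCPU / VCPU_PER_NODE))
NODES_BY_MEM=$((TOTAL_MEM_GIB / MEM_PER_NODE_GIB))

echo "lower bound from CPU:    ${NODES_BY_CPU} workers"   # prints 8
echo "lower bound from memory: ${NODES_BY_MEM} workers"   # prints 16
```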

Procedure overview

The high-level steps of the procedure are:

  1. Install Argo CD and register the clusters
  2. Prepare the Git repository
  3. Deploy Service Mesh Manager

Install Argo CD

Complete the following steps to install Argo CD on the management cluster.

Set up the environment

  1. Set the KUBECONFIG location and context name for the management-cluster cluster.

    MANAGEMENT_CLUSTER_KUBECONFIG=management_cluster_kubeconfig.yaml
    MANAGEMENT_CLUSTER_CONTEXT=management-cluster
    kubectl config --kubeconfig "${MANAGEMENT_CLUSTER_KUBECONFIG}" get-contexts "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO   NAMESPACE
    *         management-cluster   management-cluster
    
  2. Set the KUBECONFIG location and context name for the workload-cluster-1 cluster.

    WORKLOAD_CLUSTER_1_KUBECONFIG=workload_cluster_1_kubeconfig.yaml
    WORKLOAD_CLUSTER_1_CONTEXT=workload-cluster-1
    kubectl config --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" get-contexts "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO                                          NAMESPACE
    *         workload-cluster-1   workload-cluster-1
    

    Repeat this step for any additional workload clusters you want to use.

  3. Add the cluster configurations to KUBECONFIG. Include any additional workload clusters you want to use.

    KUBECONFIG=$KUBECONFIG:$MANAGEMENT_CLUSTER_KUBECONFIG:$WORKLOAD_CLUSTER_1_KUBECONFIG
    
  4. Make sure the management-cluster Kubernetes context is the current context.

    kubectl config use-context "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    Switched to context "management-cluster".
    

Install Argo CD Server

  1. Create the argocd namespace.

    kubectl create namespace argocd
    

    Expected output:

    namespace/argocd created
    
  2. On OpenShift: Run the following command to grant the service accounts access to the argocd namespace.

    oc adm policy add-scc-to-group privileged system:serviceaccounts:argocd
    

    Expected output:

    clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "system:serviceaccounts:argocd"
    
  3. Deploy Argo CD.

    kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
    

    Expected output:

    customresourcedefinition.apiextensions.k8s.io/applications.argoproj.io created
    customresourcedefinition.apiextensions.k8s.io/applicationsets.argoproj.io created
    customresourcedefinition.apiextensions.k8s.io/appprojects.argoproj.io created
    serviceaccount/argocd-application-controller created
    serviceaccount/argocd-applicationset-controller created
    serviceaccount/argocd-dex-server created
    serviceaccount/argocd-notifications-controller created
    serviceaccount/argocd-redis created
    serviceaccount/argocd-repo-server created
    serviceaccount/argocd-server created
    role.rbac.authorization.k8s.io/argocd-application-controller created
    role.rbac.authorization.k8s.io/argocd-applicationset-controller created
    role.rbac.authorization.k8s.io/argocd-dex-server created
    role.rbac.authorization.k8s.io/argocd-notifications-controller created
    role.rbac.authorization.k8s.io/argocd-server created
    clusterrole.rbac.authorization.k8s.io/argocd-application-controller created
    clusterrole.rbac.authorization.k8s.io/argocd-server created
    rolebinding.rbac.authorization.k8s.io/argocd-application-controller created
    rolebinding.rbac.authorization.k8s.io/argocd-applicationset-controller created
    rolebinding.rbac.authorization.k8s.io/argocd-dex-server created
    rolebinding.rbac.authorization.k8s.io/argocd-notifications-controller created
    rolebinding.rbac.authorization.k8s.io/argocd-redis created
    rolebinding.rbac.authorization.k8s.io/argocd-server created
    clusterrolebinding.rbac.authorization.k8s.io/argocd-application-controller created
    clusterrolebinding.rbac.authorization.k8s.io/argocd-server created
    configmap/argocd-cm created
    configmap/argocd-cmd-params-cm created
    configmap/argocd-gpg-keys-cm created
    configmap/argocd-notifications-cm created
    configmap/argocd-rbac-cm created
    configmap/argocd-ssh-known-hosts-cm created
    configmap/argocd-tls-certs-cm created
    secret/argocd-notifications-secret created
    secret/argocd-secret created
    service/argocd-applicationset-controller created
    service/argocd-dex-server created
    service/argocd-metrics created
    service/argocd-notifications-controller-metrics created
    service/argocd-redis created
    service/argocd-repo-server created
    service/argocd-server created
    service/argocd-server-metrics created
    deployment.apps/argocd-applicationset-controller created
    deployment.apps/argocd-dex-server created
    deployment.apps/argocd-notifications-controller created
    deployment.apps/argocd-redis created
    deployment.apps/argocd-repo-server created
    deployment.apps/argocd-server created
    statefulset.apps/argocd-application-controller created
    networkpolicy.networking.k8s.io/argocd-application-controller-network-policy created
    networkpolicy.networking.k8s.io/argocd-applicationset-controller-network-policy created
    networkpolicy.networking.k8s.io/argocd-dex-server-network-policy created
    networkpolicy.networking.k8s.io/argocd-notifications-controller-network-policy created
    networkpolicy.networking.k8s.io/argocd-redis-network-policy created
    networkpolicy.networking.k8s.io/argocd-repo-server-network-policy created
    networkpolicy.networking.k8s.io/argocd-server-network-policy created
    
  4. Wait until the installation is complete, then check that the Argo CD pods are up and running.

    kubectl get pods -n argocd
    

    The output should be similar to:

    NAME                                                READY   STATUS    RESTARTS   AGE
    argocd-application-controller-0                     1/1     Running   0          7h59m
    argocd-applicationset-controller-78b8b554f9-pgwbl   1/1     Running   0          7h59m
    argocd-dex-server-6bbc85c688-8p7zf                  1/1     Running   0          16h
    argocd-notifications-controller-75847756c5-dbbm5    1/1     Running   0          16h
    argocd-redis-f4cdbff57-wcpxh                        1/1     Running   0          7h59m
    argocd-repo-server-d5c7f7ffb-c8962                  1/1     Running   0          7h59m
    argocd-server-76497676b-pnvf4                       1/1     Running   0          7h59m
    
  5. To access the Argo CD UI, set the argocd-server service type to LoadBalancer.

    kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'
    

    Expected output:

    service/argocd-server patched
    
  6. Patch the App of Apps health check in Argo CD configuration to ignore diffs of controller/operator managed fields. For details about this patch, see the Argo CD documentation sections Resource Health and Diffing Customization.

    Apply the new Argo CD health check configurations:

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: argocd-cm
      namespace: argocd
      labels:
        app.kubernetes.io/name: argocd-cm
        app.kubernetes.io/part-of: argocd
    data:
      # App of Apps health check
      resource.customizations.health.argoproj.io_Application: |
        hs = {}
        hs.status = "Progressing"
        hs.message = ""
        if obj.status ~= nil then
          if obj.status.health ~= nil then
            hs.status = obj.status.health.status
            if obj.status.health.message ~= nil then
              hs.message = obj.status.health.message
            end
          end
        end
        return hs
      # Ignoring RBAC changes made by aggregated roles
      resource.compareoptions: |
        # ignores differences in aggregated ClusterRoles, whose rules are filled in by the RBAC aggregation controller
        ignoreAggregatedRoles: true
    
        # disables status field diffing in specified resource types
        # 'crd' - CustomResourceDefinition-s (default)
        # 'all' - all resources
        # 'none' - disabled
        ignoreResourceStatusField: all
    EOF
    

    Expected output:

    configmap/argocd-cm configured
    
  7. Get the initial password for the admin user.

    kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo
    

    Expected output:

    argocd-admin-password
    
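The `| base64 -d` part of the command above is needed because Kubernetes stores Secret data fields base64-encoded. A standalone illustration, using the placeholder password from this guide instead of a live cluster:

```shell
# Secret data fields are base64-encoded; the jsonpath query returns the
# encoded form, which must be piped through `base64 -d` to recover the
# plain text. 'argocd-admin-password' is the placeholder from this guide.
ENCODED=$(printf '%s' 'argocd-admin-password' | base64)
echo "${ENCODED}"                            # the form stored in the Secret
printf '%s' "${ENCODED}" | base64 -d; echo   # prints argocd-admin-password
```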
  8. Look up the external IP address or hostname of the argocd-server service (shown as external-ip-or-hostname in the example output).

    kubectl get service -n argocd argocd-server
    

    The output should be similar to:

    NAME                                      TYPE           CLUSTER-IP      EXTERNAL-IP               PORT(S)                      AGE
    argocd-server                             LoadBalancer   10.108.14.130   external-ip-or-hostname   80:31306/TCP,443:30063/TCP   7d13h
    
  9. Open the https://external-ip-or-hostname URL and log in to the Argo CD server using the password received in the previous step.

    # Exactly one of hostname or IP will be available and used for the remote URL.
    open https://$(kubectl get service -n argocd argocd-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}')
    

Install Argo CD CLI

  1. Install Argo CD CLI on your computer. For details, see the Argo CD documentation.

  2. Log in with the CLI:

    # Exactly one of hostname or IP will be available and used for the remote URL.
    argocd login $(kubectl get service -n argocd argocd-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}') --insecure --username admin --password $(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
    

    Expected output:

    'admin:login' logged in successfully
    

For more details about Argo CD installation, see the Argo CD getting started guide.

Register clusters

  1. Register the clusters that will run Service Mesh Manager in Argo CD. In this example, register workload-cluster-1 using one of the following methods.

    • Register the cluster from the command line by running:

      argocd cluster add --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" "${WORKLOAD_CLUSTER_1_CONTEXT}"
      

      Expected output:

      WARNING: This will create a service account `argocd-manager` on the cluster referenced by context `workload-cluster-1` with full cluster level privileges. Do you want to continue [y/N]? y
      INFO[0005] ServiceAccount "argocd-manager" created in namespace "kube-system"
      INFO[0005] ClusterRole "argocd-manager-role" created
      INFO[0005] ClusterRoleBinding "argocd-manager-role-binding" created
      INFO[0011] Created bearer token secret for ServiceAccount "argocd-manager"
      Cluster 'https://workload-cluster-1-ip-or-hostname' added
      
    • Alternatively, you can register clusters declaratively as Kubernetes secrets. Modify the following command for your environment and apply it. For details, see the Argo CD documentation.

      WORKLOAD_CLUSTER_1_IP="https://workload-cluster-1-IP" ARGOCD_BEARER_TOKEN="authentication-token" ARGOCD_CA_B64="base64 encoded certificate" ; kubectl apply -f - <<EOF
      apiVersion: v1
      kind: Secret
      metadata:
        name: workload-cluster-1-secret
        labels:
          argocd.argoproj.io/secret-type: cluster
      type: Opaque
      stringData:
        name: workload-cluster-1
        server: "${WORKLOAD_CLUSTER_1_IP}"
        config: |
          {
            "bearerToken": "${ARGOCD_BEARER_TOKEN}",
            "tlsClientConfig": {
              "insecure": false,
              "caData": "${ARGOCD_CA_B64}"
            }
          }
      EOF
      
  2. Make sure that the cluster is registered in Argo CD by running the following command:

    argocd cluster list
    

    The output should be similar to:

    SERVER                                      NAME                VERSION  STATUS   MESSAGE                                                  PROJECT
    https://kubernetes.default.svc              in-cluster                   Unknown  Cluster has no applications and is not being monitored.
    https://workload-cluster-1-ip-or-hostname   workload-cluster-1           Unknown  Cluster has no applications and is not being monitored.
    

Prepare Git repository

  1. Create an empty repository called calisti-gitops on GitHub (or another provider that Argo CD supports) and initialize it with a README.md file so that you can clone the repository. Because Service Mesh Manager credentials will be stored in this repository, make it a private repository.

    GITHUB_ID="github-id"
    GITHUB_REPOSITORY_NAME="calisti-gitops"
    
  2. Obtain a personal access token for the repository (on GitHub, see Creating a personal access token) that has the following permissions:

    • admin:org_hook
    • admin:repo_hook
    • read:org
    • read:public_key
    • repo
  3. Log in to git with your personal access token.

    export GH_TOKEN="github-personal-access-token" # Note: this environment variable needs to be exported so the `git` binary is going to use it automatically for authentication.
    
  4. Clone the repository into your local workspace.

    git clone "https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git"
    

    Expected output:

    Cloning into 'calisti-gitops'...
    remote: Enumerating objects: 144, done.
    remote: Counting objects: 100% (144/144), done.
    remote: Compressing objects: 100% (93/93), done.
    remote: Total 144 (delta 53), reused 135 (delta 47), pack-reused 0
    Receiving objects: 100% (144/144), 320.08 KiB | 746.00 KiB/s, done.
    Resolving deltas: 100% (53/53), done.
    
  5. Add the repository to Argo CD by running the following command. Alternatively, you can add it on the Argo CD Web UI.

    argocd repo add "https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git" --name "${GITHUB_REPOSITORY_NAME}" --username "${GITHUB_ID}" --password "${GH_TOKEN}"
    

    Expected output:

    Repository 'https://github.com/github-id/calisti-gitops.git' added
    
  6. Verify that the repository is connected by running:

    argocd repo list
    

    In the output, Status should be Successful:

    TYPE  NAME            REPO                                             INSECURE  OCI    LFS    CREDS  STATUS      MESSAGE  PROJECT
    git   calisti-gitops  https://github.com/github-id/calisti-gitops.git  false     false  false  true   Successful
    
  7. Change into the root directory of the cloned repository and create the following directories.

    cd "${GITHUB_REPOSITORY_NAME}"
    
    mkdir -p apps/demo-app apps/smm-controlplane apps/smm-operator charts demo-app manifests
    

    The final structure of the repository will look like this:

    .
    ├── apps
    │   ├── demo-app
    │   │   └── demo-app.yaml
    │   ├── smm-controlplane
    │   │   └── smm-controlplane.yaml
    │   └── smm-operator
    │       └── smm-operator.yaml
    ├── charts
    │   └── smm-operator
    │       └── ...
    ├── demo-app
    │   ├── demo-app-ns.yaml
    │   └── demo-app.yaml
    └── manifests
        ├── cert-manager-namespace.yaml
        └── smm-controlplane.yaml
    
    • The apps folder contains the Argo CD Application resources for the smm-operator, the smm-controlplane, and the demo-app.
    • The charts folder contains the Helm chart of the smm-operator.
    • The demo-app folder contains the manifest files of the demo application that represents your business application.
    • The manifests folder contains the smm-controlplane file and the cert-manager namespace file.
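The skeleton above can be recreated and checked locally before committing anything. The following sketch uses a hypothetical scratch directory name (calisti-gitops-demo) so that it does not touch your actual clone:

```shell
# Recreate the repository skeleton in a scratch directory and verify it.
mkdir -p calisti-gitops-demo/apps/demo-app \
         calisti-gitops-demo/apps/smm-controlplane \
         calisti-gitops-demo/apps/smm-operator \
         calisti-gitops-demo/charts \
         calisti-gitops-demo/demo-app \
         calisti-gitops-demo/manifests

# Lists the eight directory paths (root plus seven subdirectories).
find calisti-gitops-demo -type d | sort
```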

Prepare the Helm charts

  1. You need an active Service Mesh Manager registration to download the Service Mesh Manager charts and images. You can sign up for free, or obtain Enterprise credentials on the official Cisco Service Mesh Manager page. After registration, you can obtain your username and password from the Download Center. Set them as environment variables.

    CALISTI_USERNAME="<your-calisti-username>"
    
    CALISTI_PASSWORD="<your-calisti-password>"
    
  2. Download the smm-operator chart from registry.eticloud.io into the charts directory of your Service Mesh Manager GitOps repository and extract it. Run the following commands:

    export HELM_EXPERIMENTAL_OCI=1 # Needed prior to Helm version 3.8.0
    
    echo "${CALISTI_PASSWORD}" | helm registry login registry.eticloud.io -u "${CALISTI_USERNAME}" --password-stdin
    

    Expected output:

    Login Succeeded
    
    helm pull oci://registry.eticloud.io/smm-charts/smm-operator --destination ./charts/ --untar --version 1.12.0
    

    Expected output:

    Pulled: registry.eticloud.io/smm-charts/smm-operator:1.12.0
    Digest: sha256:someshadigest
    

Deploy Service Mesh Manager

Deploy the smm-operator application

Complete the following steps to deploy the smm-operator chart using Argo CD.

  1. Create an Argo CD Application CR for smm-operator.

    Before running the following command, edit it if needed:

    • If you are not using a GitHub repository, set the repoURL field to your repository.
    • For multi-cluster setups, the Kubernetes API server address of one cluster must be reachable from the other clusters. On certain providers (for example, OpenShift), the API server address is private and not reachable from other clusters by default. In such cases, use the PUBLIC_API_SERVER_ENDPOINT_ADDRESS variable to provide an address that’s reachable from the other clusters. This can be a public address, or one that’s routable from the other clusters.

    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" PUBLIC_API_SERVER_ENDPOINT_ADDRESS="" ; cat > "apps/smm-operator/smm-operator-app.yaml" <<EOF
    # apps/smm-operator/smm-operator-app.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: smm-operator
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      source:
        repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
        targetRevision: HEAD
        path: charts/smm-operator
        helm:
          parameters:
          - name: "global.ecr.enabled"
            value: 'false'
          - name: "global.basicAuth.username"
            value: "${CALISTI_USERNAME}"
          - name: "global.basicAuth.password"
            value: "${CALISTI_PASSWORD}"
          - name: "apiServerEndpointAddress"
            value: "${PUBLIC_API_SERVER_ENDPOINT_ADDRESS}" # The publicly accessible address of the k8s api server. Some Cloud providers have different API Server endpoint for internal and for public access. In that case the public endpoint needs to be specified here.
      destination:
        name: ${ARGOCD_CLUSTER_NAME}
        namespace: smm-registry-access
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - Validate=false
          - PruneLast=true
          - CreateNamespace=true
          - Replace=true
    EOF
    
  2. Commit and push the calisti-gitops repository.

    git add apps/smm-operator charts/smm-operator
    git commit -m "add smm-operator app"
    

    Expected output:

    [main e6c4b4a] add smm-operator app
    36 files changed, 80324 insertions(+)
    create mode 100644 apps/smm-operator/smm-operator-app.yaml
    create mode 100644 charts/smm-operator/.helmignore
    create mode 100644 charts/smm-operator/Chart.yaml
    create mode 100644 charts/smm-operator/README.md
    create mode 100644 charts/smm-operator/crds/clusterfeature-crd.yaml
    create mode 100644 charts/smm-operator/crds/clusters-crd.yaml
    create mode 100644 charts/smm-operator/crds/crd-alertmanagerconfigs.yaml
    create mode 100644 charts/smm-operator/crds/crd-alertmanagers.yaml
    create mode 100644 charts/smm-operator/crds/crd-podmonitors.yaml
    create mode 100644 charts/smm-operator/crds/crd-probes.yaml
    create mode 100644 charts/smm-operator/crds/crd-prometheuses.yaml
    create mode 100644 charts/smm-operator/crds/crd-prometheusrules.yaml
    create mode 100644 charts/smm-operator/crds/crd-servicemonitors.yaml
    create mode 100644 charts/smm-operator/crds/crd-thanosrulers.yaml
    create mode 100644 charts/smm-operator/crds/crds.yaml
    create mode 100644 charts/smm-operator/crds/health.yaml
    create mode 100644 charts/smm-operator/crds/istio-operator-v1-crds.yaml
    create mode 100644 charts/smm-operator/crds/istio-operator-v2-crds.gen.yaml
    create mode 100644 charts/smm-operator/crds/istiooperator-crd.yaml
    create mode 100644 charts/smm-operator/crds/koperator-crds.yaml
    create mode 100644 charts/smm-operator/crds/metadata-crd.yaml
    create mode 100644 charts/smm-operator/crds/resourcesyncrules-crd.yaml
    create mode 100644 charts/smm-operator/crds/sre.yaml
    create mode 100644 charts/smm-operator/templates/_helpers.tpl
    create mode 100644 charts/smm-operator/templates/authproxy-rbac.yaml
    create mode 100644 charts/smm-operator/templates/authproxy-service.yaml
    create mode 100644 charts/smm-operator/templates/cert-manager-namespace.yaml
    create mode 100644 charts/smm-operator/templates/ecr.deployment.yaml
    create mode 100644 charts/smm-operator/templates/ecr.secret.yaml
    create mode 100644 charts/smm-operator/templates/ecr.service-account.yaml
    create mode 100644 charts/smm-operator/templates/namespace.yaml
    create mode 100644 charts/smm-operator/templates/operator-psp-basic.yaml
    create mode 100644 charts/smm-operator/templates/operator-rbac.yaml
    create mode 100644 charts/smm-operator/templates/operator-service.yaml
    create mode 100644 charts/smm-operator/templates/operator-statefulset.yaml
    create mode 100644 charts/smm-operator/values.yaml
    
    git push
    

    Expected output:

    Enumerating objects: 48, done.
    Counting objects: 100% (48/48), done.
    Delta compression using up to 12 threads
    Compressing objects: 100% (44/44), done.
    Writing objects: 100% (47/47), 282.18 KiB | 1.99 MiB/s, done.
    Total 47 (delta 20), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (20/20), done.
    To https://github.com/github-id/calisti-gitops.git
       8dd47c2..db9e7af  main -> main
    
  3. Apply the Application manifest.

    kubectl apply -f "apps/smm-operator/smm-operator-app.yaml"
    

    Expected output:

    application.argoproj.io/smm-operator created
    
  4. Verify that the application has been added to Argo CD and is healthy.

    argocd app list
    

    Expected output:

    NAME          CLUSTER             NAMESPACE            PROJECT  STATUS  HEALTH   SYNCPOLICY  CONDITIONS  REPO                                             PATH                 TARGET
    smm-operator  workload-cluster-1  smm-registry-access  default  Synced  Healthy  Auto-Prune  <none>      https://github.com/github-id/calisti-gitops.git  charts/smm-operator  HEAD
    
  5. Check the smm-operator application on the Argo CD Web UI.

    SMM Operator

Deploy the smm-controlplane application

  1. Create the following namespace for the Service Mesh Manager ControlPlane.

    cat > manifests/cert-manager-namespace.yaml <<EOF
    # manifests/cert-manager-namespace.yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "1"
      name: cert-manager
    EOF
    
  2. Create the smm-controlplane CR for the ControlPlane. For OpenShift installations, add platform: openshift to the spec section.

    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" ISTIO_MINOR_VERSION="1.15" ; cat > "manifests/smm-controlplane.yaml" <<EOF
    # manifests/smm-controlplane.yaml
    apiVersion: smm.cisco.com/v1alpha1
    kind: ControlPlane
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "10"
      name: smm
    spec:
      # platform: openshift # Uncomment for OpenShift installations
      certManager:
        enabled: true
        namespace: cert-manager
      clusterName: ${ARGOCD_CLUSTER_NAME}
      clusterRegistry:
        enabled: true
        namespace: cluster-registry
      log: {}
      meshManager:
        enabled: true
        istio:
          enabled: true
          istioCRRef:
            name: cp-v${ISTIO_MINOR_VERSION/.}x
            namespace: istio-system
          operators:
            namespace: smm-system
        namespace: smm-system
      nodeExporter:
        enabled: true
        namespace: smm-system
        psp:
          enabled: false
        rbac:
          enabled: true
      oneEye: {}
      registryAccess:
        enabled: true
        imagePullSecretsController: {}
        namespace: smm-registry-access
        pullSecrets:
        - name: smm-registry.eticloud.io-pull-secret
          namespace: smm-registry-access
      repositoryOverride:
        host: registry.eticloud.io
        prefix: smm
      role: active
      smm:
        exposeDashboard:
          meshGateway:
            enabled: true
        als:
          enabled: true
          log: {}
        application:
          enabled: true
          log: {}
        auth:
          forceUnsecureCookies: true
          mode: anonymous
        certManager:
          enabled: true
        enabled: true
        federationGateway:
          enabled: true
          name: smm
          service:
            enabled: true
            name: smm-federation-gateway
            port: 80
        federationGatewayOperator:
          enabled: true
        impersonation:
          enabled: true
        istio:
          revision: cp-v${ISTIO_MINOR_VERSION/.}x.istio-system
        leo:
          enabled: true
          log: {}
        log: {}
        namespace: smm-system
        prometheus:
          enabled: true
          replicas: 1
        prometheusOperator: {}
        releaseName: smm
        role: active
        sre:
          enabled: true
        useIstioResources: true
    EOF
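The `cp-v${ISTIO_MINOR_VERSION/.}x` expressions above rely on bash pattern substitution, which removes the first `.` from the minor version. A quick sketch of the expansion (bash-specific, not POSIX sh):

```shell
# ${VAR/pattern} deletes the first occurrence of the pattern, so the
# dot is stripped from the minor version: "1.15" -> "115".
ISTIO_MINOR_VERSION="1.15"
ISTIO_CR_NAME="cp-v${ISTIO_MINOR_VERSION/.}x"
echo "${ISTIO_CR_NAME}"   # prints cp-v115x
```

This is why both the istioCRRef name and the smm.istio.revision resolve to cp-v115x for Istio 1.15.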
    
  3. Create the Argo CD Application CR for the smm-controlplane.

    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" ; cat > "apps/smm-controlplane/smm-controlplane-app.yaml" <<EOF
    # apps/smm-controlplane/smm-controlplane-app.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: smm-controlplane
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      source:
        repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
        targetRevision: HEAD
        path: manifests
      destination:
        name: ${ARGOCD_CLUSTER_NAME}
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - Validate=false
        - CreateNamespace=true
        - PrunePropagationPolicy=foreground
        - PruneLast=true
        - Replace=true
    EOF
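This Application syncs everything under manifests/, and Argo CD applies those resources in ascending sync-wave order: the cert-manager namespace (wave "1") is created before the ControlPlane (wave "10"). A numeric sort over illustrative wave/name pairs mirrors that ordering:

```shell
# Argo CD sorts resources by the argocd.argoproj.io/sync-wave annotation,
# lowest wave first; the pairs below are illustrative stand-ins for the
# two manifests created above.
printf '%s\n' '10 smm-controlplane' '1 cert-manager-namespace' | sort -n -k1,1
# prints:
# 1 cert-manager-namespace
# 10 smm-controlplane
```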
    
  4. Commit the changes and push them to the calisti-gitops repository.

    git add apps/smm-controlplane manifests
    git commit -m "add smm-controlplane app"
    

    Expected output:

    [main 25ba7e8] add smm-controlplane app
    3 files changed, 212 insertions(+)
    create mode 100644 apps/smm-controlplane/smm-controlplane-app.yaml
    create mode 100644 manifests/cert-manager-namespace.yaml
    create mode 100644 manifests/smm-controlplane.yaml
    
    git push
    

    Expected output:

    Enumerating objects: 12, done.
    Counting objects: 100% (12/12), done.
    Delta compression using up to 10 threads
    Compressing objects: 100% (10/10), done.
    Writing objects: 100% (10/10), 2.70 KiB | 2.70 MiB/s, done.
    Total 10 (delta 1), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (1/1), done.
    To github.com:<username>/calisti-gitops.git
      529545a..25ba7e8  main -> main
    
  5. Apply the Application manifest.

    kubectl apply -f "apps/smm-controlplane/smm-controlplane-app.yaml"
    

    Expected output:

    application.argoproj.io/smm-controlplane created
    
  6. Verify that the application has been added to Argo CD and is healthy.

    argocd app list
    

    Expected output:

    NAME              CLUSTER             NAMESPACE            PROJECT  STATUS     HEALTH   SYNCPOLICY  CONDITIONS  REPO                                             PATH                 TARGET
    smm-controlplane  workload-cluster-1                       default  Synced     Healthy  Auto-Prune  <none>      https://github.com/github-id/calisti-gitops.git  manifests            HEAD
    smm-operator      workload-cluster-1  smm-registry-access  default  Synced     Healthy  Auto-Prune  <none>      https://github.com/github-id/calisti-gitops.git  charts/smm-operator  HEAD
    
  7. Check that all pods are healthy and running in the smm-system namespace of workload-cluster-1.

    kubectl get pods -n smm-system --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                                               READY   STATUS    RESTARTS        AGE
    istio-operator-v115x-85495cd76f-q7n22              2/2     Running   4 (7m36s ago)   17m
    mesh-manager-0                                     2/2     Running   4 (7m35s ago)   18m
    prometheus-smm-prometheus-0                        4/4     Running   0               15m
    smm-7f95479ff7-rzh2g                               2/2     Running   0               16m
    smm-7f95479ff7-v52vp                               2/2     Running   0               16m
    smm-als-8487fdf4f7-ddklg                           2/2     Running   0               16m
    smm-authentication-7888dfc6d7-w7tdq                2/2     Running   0               16m
    smm-federation-gateway-84f9fbf54d-7glvp            2/2     Running   0               16m
    smm-federation-gateway-operator-6cb99c5798-9fj25   2/2     Running   4 (7m36s ago)   16m
    smm-grafana-95ff96dd9-m6rx6                        3/3     Running   0               16m
    smm-health-86dc8c98d6-pv7bk                        2/2     Running   3 (7m35s ago)   16m
    smm-health-api-5df5b76bf5-lvbsp                    2/2     Running   0               16m
    smm-ingressgateway-7d59684cf7-jsj7f                1/1     Running   0               16m
    smm-ingressgateway-external-59f9874787-p55wr       1/1     Running   0               16m
    smm-kubestatemetrics-f4766d7b8-9mc9f               2/2     Running   0               16m
    smm-leo-9fc8db6db-vlzpw                            2/2     Running   0               16m
    smm-prometheus-operator-6558dbddc8-bgdh9           3/3     Running   1 (16m ago)     16m
    smm-sre-alert-exporter-6656f98dd8-8wvdx            2/2     Running   0               16m
    smm-sre-api-77b65ff6bd-spzk2                       2/2     Running   0               16m
    smm-sre-controller-59d6cdd588-7cvbk                2/2     Running   3 (7m35s ago)   16m
    smm-tracing-6c85986bfd-xjjqw                       2/2     Running   0               16m
    smm-vm-integration-cdd8d8688-sk79s                 2/2     Running   3 (7m35s ago)   16m
    smm-web-84d697fdb4-2fbkm                           3/3     Running   0               16m
    
  8. Check the application on the Argo CD Web UI.

    # Exactly one of hostname or IP will be available and used for the remote URL.
    open https://$(kubectl get service -n argocd argocd-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}')
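The jsonpath expression above concatenates the load balancer's .hostname and .ip fields; since exactly one of them is populated, the result is always the single non-empty value. A pure-shell sketch with example values:

```shell
# Exactly one of the two fields is set by the load balancer, so
# concatenating both yields the populated one. The values here are
# illustrative, not real cluster output.
lb_hostname=""              # e.g. set on AWS ELB instead of an IP
lb_ip="203.0.113.10"        # e.g. set on GKE or AKS
echo "https://${lb_hostname}${lb_ip}"   # prints https://203.0.113.10
```

Note that `open` is the macOS URL opener; on Linux, `xdg-open` serves the same purpose.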
    

    Argo CD Web UI

At this point, you have successfully installed smm-operator and smm-controlplane on workload-cluster-1.

Deploy an application

If you want to deploy an application into the service mesh, complete the following steps. The examples use the Service Mesh Manager demo application.

  1. Create a namespace for the application by creating the demo-app-ns.yaml file.

    cat > demo-app/demo-app-ns.yaml << EOF
    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        app.kubernetes.io/instance: smm-demo
        app.kubernetes.io/name: smm-demo
        app.kubernetes.io/part-of: smm-demo
        app.kubernetes.io/version: 0.1.4
        istio.io/rev: cp-v115x.istio-system
      name: smm-demo
    EOF
    
  2. Create a manifest for the NetworkAttachmentDefinition resource.

    cat > demo-app/smm-demo-nad.yaml << EOF
    apiVersion: k8s.cni.cncf.io/v1
    kind: NetworkAttachmentDefinition
    metadata:
      name: istio-cni-cp-v${ISTIO_MINOR_VERSION/.}x-istio-system
      namespace: smm-demo
      annotations:
        argocd.argoproj.io/sync-wave: "3"
    EOF
    
  3. Create the demo-app.yaml file.

    cat > demo-app/demo-app.yaml << EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: DemoApplication
    metadata:
      name: smm-demo
      namespace: smm-demo
    spec:
      autoscaling:
        enabled: true
      controlPlaneRef:
        name: smm
      deployIstioResources: true
      deploySLOResources: true
      enabled: true
      enabledComponents:
      - frontpage
      - catalog
      - bookings
      - postgresql
      - payments
      - notifications
      - movies
      - analytics
      - database
      - mysql
      istio:
        revision: cp-v115x.istio-system
      load:
        enabled: true
        maxRPS: 30
        minRPS: 10
        swingPeriod: 1380000000000
      replicas: 1
      resources:
        limits:
          cpu: "2"
          memory: 192Mi
        requests:
          cpu: 40m
          memory: 64Mi
    EOF
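The swingPeriod value appears to be a Go-style duration serialized in nanoseconds (an assumption based on its magnitude): 1380000000000 ns is a 23-minute swing of the generated load between minRPS and maxRPS. The conversion:

```shell
# Converting the swingPeriod from nanoseconds (assumed unit) to minutes.
SWING_PERIOD_NS=1380000000000
SWING_PERIOD_S=$((SWING_PERIOD_NS / 1000000000))
echo "${SWING_PERIOD_S} seconds = $((SWING_PERIOD_S / 60)) minutes"   # prints 1380 seconds = 23 minutes
```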
    
  4. Create an Argo CD Application CR for the demo application in the apps/demo-app/demo-app.yaml file.

    ARGOCD_CLUSTER_NAME="${WORKLOAD_CLUSTER_1_CONTEXT}" ; cat > apps/demo-app/demo-app.yaml << EOF
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: demo-app
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      source:
        repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
        targetRevision: HEAD
        path: demo-app
      destination:
        name: ${ARGOCD_CLUSTER_NAME}
        namespace: smm-demo
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - Validate=false
        - CreateNamespace=true
        - PruneLast=true
        - Replace=true
    EOF
    
  5. Commit the changes and push them to the calisti-gitops repository.

    git add apps/demo-app demo-app
    git commit -m "add demo app"
    

    Expected output:

    [main 58a236e] add demo app
    3 files changed, 74 insertions(+)
    create mode 100644 apps/demo-app/demo-app.yaml
    create mode 100644 demo-app/demo-app-ns.yaml
    create mode 100644 demo-app/demo-app.yaml
    
    git push
    

    Expected output:

    Enumerating objects: 10, done.
    Counting objects: 100% (10/10), done.
    Delta compression using up to 10 threads
    Compressing objects: 100% (7/7), done.
    Writing objects: 100% (8/8), 1.37 KiB | 1.37 MiB/s, done.
    Total 8 (delta 0), reused 0 (delta 0), pack-reused 0
    To github.com:<username>/calisti-gitops.git
      e16549e..58a236e  main -> main
    
  6. Deploy the application.

    kubectl apply -f apps/demo-app/demo-app.yaml
    
  7. Wait until all the pods in the application namespace (smm-demo) are up and running.

    kubectl get pods -n smm-demo --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                                READY   STATUS    RESTARTS   AGE
    analytics-v1-7899bd4d4-bnf24        2/2     Running   0          109s
    bombardier-6455fd74f6-jndpv         2/2     Running   0          109s
    bookings-v1-559768454c-7vhzr        2/2     Running   0          109s
    catalog-v1-99b7bb56d-fjvhl          2/2     Running   0          109s
    database-v1-5cb4b4ff67-95ttk        2/2     Running   0          109s
    frontpage-v1-5b4dcbfcb4-djr72       2/2     Running   0          108s
    movies-v1-78fcf666dc-z8c2z          2/2     Running   0          108s
    movies-v2-84d9f5658f-kc65j          2/2     Running   0          108s
    movies-v3-86bbbc9745-r84bl          2/2     Running   0          108s
    mysql-d6b6b78fd-b7dwb               2/2     Running   0          108s
    notifications-v1-794c5dd8f6-lndh4   2/2     Running   0          108s
    payments-v1-858d4b4ffc-vtxxl        2/2     Running   0          108s
    postgresql-555fd55bdb-jn5pq         2/2     Running   0          108s
    
  8. Verify that the application appears in the Argo CD admin view, and that it is Healthy and Synced.

    SMM Operator Argo CD admin

Access the Service Mesh Manager dashboard

  1. You can access the Service Mesh Manager dashboard through the external IP address or hostname of the smm-ingressgateway-external LoadBalancer service. Run the following command to retrieve it:

    kubectl get services -n smm-system smm-ingressgateway-external --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                          TYPE           CLUSTER-IP   EXTERNAL-IP                PORT(S)        AGE
    smm-ingressgateway-external   LoadBalancer   10.0.0.199   external-ip-or-hostname    80:32505/TCP   2m28s
    
  2. Open the Service Mesh Manager dashboard using one of the following methods:

    • Open the http://<external-ip-or-hostname> URL in your browser.

    • Run the following command to open the dashboard with your default browser:

      # Exactly one of hostname or IP will be available and used for the remote URL.
      open http://$(kubectl get services -n smm-system smm-ingressgateway-external -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}' --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}")
      
    • If you have installed the Service Mesh Manager CLI on your machine, run the following command to open the Service Mesh Manager Dashboard in the default browser.

      smm dashboard --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
      

      Expected output:

      ✓ validate-kubeconfig ❯ checking cluster reachability...
      ✓ opening Service Mesh Manager at http://127.0.0.1:50500
      
  3. Check the deployments on the dashboard, for example, on the MENU > Overview, MENU > MESH, and MENU > TOPOLOGY pages.

Service Mesh Manager Overview

Service Mesh Manager Mesh

Service Mesh Manager Topology

2.3.7 - Install SMM - GitOps - multi-cluster

This guide details how to set up Service Mesh Manager in a multi-cluster GitOps environment using Argo CD. The same principles can be applied with other tools as well.

CAUTION:

Do not push secrets directly into the Git repository, especially when it is a public repository. Argo CD provides solutions to keep secrets safe.

Architecture

Service Mesh Manager supports multiple mesh topologies, so you can use the one that best fits your use cases. In multi-cluster configurations, it provides automatic locality-based load balancing.

The high level architecture for Argo CD with a multi-cluster Service Mesh Manager setup consists of the following components:

  • A git repository that stores the various charts and manifests,
  • a management cluster that runs the Argo CD server, and
  • the Service Mesh Manager clusters managed by Argo CD.

Multi-cluster GitOps architecture

Deployment models

When deploying Service Mesh Manager in a multi-cluster scenario, you can use an active-passive model. For details on Service Mesh Manager clusters and their relationship to Istio clusters, see Istio clusters and SMM clusters.

Prerequisites

  • A free registration for the Service Mesh Manager download page
  • A Kubernetes cluster to deploy Argo CD on (called management-cluster in the examples).
  • Two Kubernetes clusters to deploy Service Mesh Manager on (called workload-cluster-1 and workload-cluster-2 in the examples).

CAUTION:

Supported providers and Kubernetes versions

The clusters must run a Kubernetes version that Service Mesh Manager supports: Kubernetes 1.21, 1.22, 1.23, or 1.24.

Service Mesh Manager is tested and known to work on the following Kubernetes providers:

  • Amazon Elastic Kubernetes Service (Amazon EKS)
  • Google Kubernetes Engine (GKE)
  • Azure Kubernetes Service (AKS)
  • Red Hat OpenShift 4.11
  • On-premises installation of stock Kubernetes with load balancer support (and optionally PVCs for persistence)

Calisti resource requirements

Make sure that your Kubernetes or OpenShift cluster has sufficient resources to install Calisti. The following table shows the resources needed on the cluster:

Resource   Required
CPU        - 32 vCPU in total
           - 4 vCPU available for allocation per worker node (if you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS)
Memory     - 64 GiB in total
           - 4 GiB available for allocation per worker node for the Kubernetes cluster (8 GiB in case of an OpenShift cluster)
Storage    12 GB of ephemeral storage on the Kubernetes worker nodes (for Traces and Metrics)
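As a rough sanity check, the table implies minimum worker-node counts when each node offers exactly the listed allocatable amounts (real nodes usually offer more, so treat this as arithmetic, not a sizing recommendation):

```shell
# Minimum node counts implied by the per-node allocatable figures above.
MIN_NODES_CPU=$((32 / 4))   # 32 vCPU total / 4 vCPU per node -> 8
MIN_NODES_MEM=$((64 / 4))   # 64 GiB total / 4 GiB per node   -> 16
echo "CPU implies >= ${MIN_NODES_CPU} nodes; memory implies >= ${MIN_NODES_MEM} nodes"
```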

These minimum requirements need to be available for allocation within your cluster, in addition to the requirements of any other loads running in your cluster (for example, DaemonSets and Kubernetes node-agents). If Kubernetes cannot allocate sufficient resources to Service Mesh Manager, some pods will remain in Pending state, and Service Mesh Manager will not function properly.

Enabling additional features, such as High Availability, increases these requirements.

The default installation, when enough headroom is available in the cluster, should be able to support at least 150 running Pods with the same number of Services. To set up Service Mesh Manager for bigger workloads, see scaling Service Mesh Manager.

Install Argo CD

Complete the following steps to install Argo CD on the management cluster.

Set up the environment

  1. Set the KUBECONFIG location and context name for the management-cluster cluster.

    MANAGEMENT_CLUSTER_KUBECONFIG=management_cluster_kubeconfig.yaml
    MANAGEMENT_CLUSTER_CONTEXT=management-cluster
    kubectl config --kubeconfig "${MANAGEMENT_CLUSTER_KUBECONFIG}" get-contexts "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO   NAMESPACE
    *         management-cluster   management-cluster
    
  2. Set the KUBECONFIG location and context name for the workload-cluster-1 cluster.

    WORKLOAD_CLUSTER_1_KUBECONFIG=workload_cluster_1_kubeconfig.yaml
    WORKLOAD_CLUSTER_1_CONTEXT=workload-cluster-1
    kubectl config --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" get-contexts "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO                                          NAMESPACE
    *         workload-cluster-1   workload-cluster-1
    

    Repeat this step for any additional workload clusters you want to use.

  3. Add the cluster configurations to KUBECONFIG. Include any additional workload clusters you want to use.

    KUBECONFIG=$KUBECONFIG:$MANAGEMENT_CLUSTER_KUBECONFIG:$WORKLOAD_CLUSTER_1_KUBECONFIG
    
  4. Make sure the management-cluster Kubernetes context is the current context.

    kubectl config use-context "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    Switched to context "management-cluster".
    

Install Argo CD Server

  1. Create the argocd namespace.

    kubectl create namespace argocd
    

    Expected output:

    namespace/argocd created
    
  2. On OpenShift: Run the following command to grant the service accounts access to the argocd namespace.

    oc adm policy add-scc-to-group privileged system:serviceaccounts:argocd
    

    Expected output:

    clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "system:serviceaccounts:argocd"
    
  3. Deploy Argo CD.

    kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
    

    Expected output:

    customresourcedefinition.apiextensions.k8s.io/applications.argoproj.io created
    customresourcedefinition.apiextensions.k8s.io/applicationsets.argoproj.io created
    customresourcedefinition.apiextensions.k8s.io/appprojects.argoproj.io created
    serviceaccount/argocd-application-controller created
    serviceaccount/argocd-applicationset-controller created
    serviceaccount/argocd-dex-server created
    serviceaccount/argocd-notifications-controller created
    serviceaccount/argocd-redis created
    serviceaccount/argocd-repo-server created
    serviceaccount/argocd-server created
    role.rbac.authorization.k8s.io/argocd-application-controller created
    role.rbac.authorization.k8s.io/argocd-applicationset-controller created
    role.rbac.authorization.k8s.io/argocd-dex-server created
    role.rbac.authorization.k8s.io/argocd-notifications-controller created
    role.rbac.authorization.k8s.io/argocd-server created
    clusterrole.rbac.authorization.k8s.io/argocd-application-controller created
    clusterrole.rbac.authorization.k8s.io/argocd-server created
    rolebinding.rbac.authorization.k8s.io/argocd-application-controller created
    rolebinding.rbac.authorization.k8s.io/argocd-applicationset-controller created
    rolebinding.rbac.authorization.k8s.io/argocd-dex-server created
    rolebinding.rbac.authorization.k8s.io/argocd-notifications-controller created
    rolebinding.rbac.authorization.k8s.io/argocd-redis created
    rolebinding.rbac.authorization.k8s.io/argocd-server created
    clusterrolebinding.rbac.authorization.k8s.io/argocd-application-controller created
    clusterrolebinding.rbac.authorization.k8s.io/argocd-server created
    configmap/argocd-cm created
    configmap/argocd-cmd-params-cm created
    configmap/argocd-gpg-keys-cm created
    configmap/argocd-notifications-cm created
    configmap/argocd-rbac-cm created
    configmap/argocd-ssh-known-hosts-cm created
    configmap/argocd-tls-certs-cm created
    secret/argocd-notifications-secret created
    secret/argocd-secret created
    service/argocd-applicationset-controller created
    service/argocd-dex-server created
    service/argocd-metrics created
    service/argocd-notifications-controller-metrics created
    service/argocd-redis created
    service/argocd-repo-server created
    service/argocd-server created
    service/argocd-server-metrics created
    deployment.apps/argocd-applicationset-controller created
    deployment.apps/argocd-dex-server created
    deployment.apps/argocd-notifications-controller created
    deployment.apps/argocd-redis created
    deployment.apps/argocd-repo-server created
    deployment.apps/argocd-server created
    statefulset.apps/argocd-application-controller created
    networkpolicy.networking.k8s.io/argocd-application-controller-network-policy created
    networkpolicy.networking.k8s.io/argocd-applicationset-controller-network-policy created
    networkpolicy.networking.k8s.io/argocd-dex-server-network-policy created
    networkpolicy.networking.k8s.io/argocd-notifications-controller-network-policy created
    networkpolicy.networking.k8s.io/argocd-redis-network-policy created
    networkpolicy.networking.k8s.io/argocd-repo-server-network-policy created
    networkpolicy.networking.k8s.io/argocd-server-network-policy created
    
  4. Wait until the installation is complete, then check that the Argo CD pods are up and running.

    kubectl get pods -n argocd
    

    The output should be similar to:

    NAME                                                READY   STATUS    RESTARTS   AGE
    argocd-application-controller-0                     1/1     Running   0          7h59m
    argocd-applicationset-controller-78b8b554f9-pgwbl   1/1     Running   0          7h59m
    argocd-dex-server-6bbc85c688-8p7zf                  1/1     Running   0          16h
    argocd-notifications-controller-75847756c5-dbbm5    1/1     Running   0          16h
    argocd-redis-f4cdbff57-wcpxh                        1/1     Running   0          7h59m
    argocd-repo-server-d5c7f7ffb-c8962                  1/1     Running   0          7h59m
    argocd-server-76497676b-pnvf4                       1/1     Running   0          7h59m
    
  5. To expose the Argo CD UI, set the argocd-server service type to LoadBalancer.

    kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'
    

    Expected output:

    service/argocd-server patched
    
  6. Patch the App of Apps health check in the Argo CD configuration to ignore diffs of fields managed by controllers and operators. For details about this patch, see the Argo CD documentation sections Resource Health and Diffing Customization.

    Apply the new Argo CD health check configurations:

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: argocd-cm
      namespace: argocd
      labels:
        app.kubernetes.io/name: argocd-cm
        app.kubernetes.io/part-of: argocd
    data:
      # App of app health check
      resource.customizations.health.argoproj.io_Application: |
        hs = {}
        hs.status = "Progressing"
        hs.message = ""
        if obj.status ~= nil then
          if obj.status.health ~= nil then
            hs.status = obj.status.health.status
            if obj.status.health.message ~= nil then
              hs.message = obj.status.health.message
            end
          end
        end
        return hs
      # Ignoring RBAC changes made by AggregateRoles
      resource.compareoptions: |
        # ignores the rules of aggregated ClusterRoles when diffing RBAC resources
        ignoreAggregatedRoles: true
    
        # disables status field diffing in specified resource types
        # 'crd' - CustomResourceDefinition-s (default)
        # 'all' - all resources
        # 'none' - disabled
        ignoreResourceStatusField: all
    EOF
    

    Expected output:

    configmap/argocd-cm configured
    
  7. Get the initial password for the admin user.

    kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo
    

    Expected output:

    argocd-admin-password
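Secret values in Kubernetes are stored base64-encoded, and the decoded value has no trailing newline, which is why the command above appends `; echo`. The round-trip, using the placeholder password from the expected output:

```shell
# Round-trip a placeholder value through the encoding Kubernetes uses for
# Secret data; the real value lives in argocd-initial-admin-secret.
ENCODED=$(printf 'argocd-admin-password' | base64)
DECODED=$(printf '%s' "${ENCODED}" | base64 -d)
echo "${DECODED}"   # prints argocd-admin-password
```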
    
  8. Check the external IP address or hostname of the argocd-server service.

    kubectl get service -n argocd argocd-server
    

    The output should be similar to:

    NAME                                      TYPE           CLUSTER-IP      EXTERNAL-IP               PORT(S)                      AGE
    argocd-server                             LoadBalancer   10.108.14.130   external-ip-or-hostname   80:31306/TCP,443:30063/TCP   7d13h
    
  9. Open the https://<external-ip-or-hostname> URL and log in to the Argo CD server using the password retrieved in the previous step.

    # Exactly one of hostname or IP will be available and used for the remote URL.
    open https://$(kubectl get service -n argocd argocd-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}')
    

Install Argo CD CLI

  1. Install Argo CD CLI on your computer. For details, see the Argo CD documentation.

  2. Log in with the CLI:

    # Exactly one of hostname or IP will be available and used for the remote URL.
    argocd login $(kubectl get service -n argocd argocd-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}') --insecure --username admin --password $(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
    

    Expected output:

    'admin:login' logged in successfully
    

For more details about Argo CD installation, see the Argo CD getting started guide.

Register clusters

  1. Register the clusters that will run Service Mesh Manager in Argo CD. In this example, register workload-cluster-1 and workload-cluster-2 using one of the following methods.

    • Register the cluster from the command line by running:

      argocd cluster add --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" "${WORKLOAD_CLUSTER_1_CONTEXT}"
      

      Expected output:

      WARNING: This will create a service account `argocd-manager` on the cluster referenced by context `workload-cluster-1` with full cluster level privileges. Do you want to continue [y/N]? y
      INFO[0005] ServiceAccount "argocd-manager" created in namespace "kube-system"
      INFO[0005] ClusterRole "argocd-manager-role" created
      INFO[0005] ClusterRoleBinding "argocd-manager-role-binding" created
      INFO[0011] Created bearer token secret for ServiceAccount "argocd-manager"
      Cluster 'https://workload-cluster-1-ip-or-hostname' added
      
      argocd cluster add --kubeconfig "${WORKLOAD_CLUSTER_2_KUBECONFIG}" "${WORKLOAD_CLUSTER_2_CONTEXT}"
      

      Expected output:

      WARNING: This will create a service account `argocd-manager` on the cluster referenced by context `workload-cluster-2` with full cluster level privileges. Do you want to continue [y/N]? y
      INFO[0005] ServiceAccount "argocd-manager" created in namespace "kube-system"
      INFO[0005] ClusterRole "argocd-manager-role" created
      INFO[0005] ClusterRoleBinding "argocd-manager-role-binding" created
      INFO[0011] Created bearer token secret for ServiceAccount "argocd-manager"
      Cluster 'https://workload-cluster-2-ip-or-hostname' added
      
    • Alternatively, you can register clusters declaratively as Kubernetes secrets. Modify the following command for your environment and apply it. For details, see the Argo CD documentation.

      WORKLOAD_CLUSTER_1_IP="https://workload-cluster-1-IP" ARGOCD_BEARER_TOKEN="authentication-token" ARGOCD_CA_B64="base64 encoded certificate" ; kubectl apply -f - <<EOF
      apiVersion: v1
      kind: Secret
      metadata:
        name: workload-cluster-1-secret
        labels:
          argocd.argoproj.io/secret-type: cluster
      type: Opaque
      stringData:
        name: workload-cluster-1
        server: "${WORKLOAD_CLUSTER_1_IP}"
        config: |
          {
            "bearerToken": "${ARGOCD_BEARER_TOKEN}",
            "tlsClientConfig": {
              "insecure": false,
              "caData": "${ARGOCD_CA_B64}"
            }
          }
      EOF
      
      WORKLOAD_CLUSTER_2_IP="https://workload-cluster-2-IP" ARGOCD_BEARER_TOKEN="authentication-token" ARGOCD_CA_B64="base64 encoded certificate" ; kubectl apply -f - <<EOF
      apiVersion: v1
      kind: Secret
      metadata:
        name: workload-cluster-2-secret
        labels:
          argocd.argoproj.io/secret-type: cluster
      type: Opaque
      stringData:
        name: workload-cluster-2
        server: "${WORKLOAD_CLUSTER_2_IP}"
        config: |
          {
            "bearerToken": "${ARGOCD_BEARER_TOKEN}",
            "tlsClientConfig": {
              "insecure": false,
              "caData": "${ARGOCD_CA_B64}"
            }
          }
      EOF
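The caData field expects the base64-encoded CA certificate (PEM) of the workload cluster's API server, as a single line. One way to produce it is shown below; the file path and content are placeholders, not a real certificate:

```shell
# Encode a CA certificate for caData; `tr -d '\n'` strips the line
# wrapping that some base64 implementations add. The file written here is
# a stand-in for a real PEM certificate.
printf 'example-ca-pem' > /tmp/ca.crt
ARGOCD_CA_B64=$(base64 < /tmp/ca.crt | tr -d '\n')
echo "${ARGOCD_CA_B64}"
```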
      
  2. Make sure that the clusters are registered in Argo CD by running the following command:

    argocd cluster list
    

    The output should be similar to:

    SERVER                                      NAME                VERSION  STATUS   MESSAGE                                                  PROJECT
    https://kubernetes.default.svc              in-cluster                   Unknown  Cluster has no applications and is not being monitored.
    https://workload-cluster-1-ip-or-hostname   workload-cluster-1           Unknown  Cluster has no applications and is not being monitored.
    https://workload-cluster-2-ip-or-hostname   workload-cluster-2           Unknown  Cluster has no applications and is not being monitored.
    

Prepare Git repository

  1. Create an empty repository called calisti-gitops on GitHub (or another provider that Argo CD supports) and initialize it with a README.md file so that you can clone it. Because Service Mesh Manager credentials will be stored in this repository, make it private. Set your GitHub ID and the repository name in environment variables:

    GITHUB_ID="github-id"
    GITHUB_REPOSITORY_NAME="calisti-gitops"
    
  2. Obtain a personal access token for the repository (on GitHub, see Creating a personal access token) that has the following permissions:

    • admin:org_hook
    • admin:repo_hook
    • read:org
    • read:public_key
    • repo
  3. Set your personal access token in an environment variable so git can use it for authentication.

    export GH_TOKEN="github-personal-access-token" # Note: this environment variable needs to be exported so the `git` binary is going to use it automatically for authentication.
    
  4. Clone the repository into your local workspace, for example:

    git clone "https://${GITHUB_ID}:${GH_TOKEN}@github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git"
    

    Expected output:

    Cloning into 'calisti-gitops'...
    remote: Enumerating objects: 144, done.
    remote: Counting objects: 100% (144/144), done.
    remote: Compressing objects: 100% (93/93), done.
    remote: Total 144 (delta 53), reused 135 (delta 47), pack-reused 0
    Receiving objects: 100% (144/144), 320.08 KiB | 746.00 KiB/s, done.
    Resolving deltas: 100% (53/53), done.
    
  5. Add the repository to Argo CD by running the following command. Alternatively, you can add it on the Argo CD Web UI.

    argocd repo add "https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git" --name "${GITHUB_REPOSITORY_NAME}" --username "${GITHUB_ID}" --password "${GH_TOKEN}"
    

    Expected output:

    Repository 'https://github.com/github-id/calisti-gitops.git' added
    
  6. Verify that the repository is connected by running:

    argocd repo list
    

    In the output, Status should be Successful:

    TYPE  NAME            REPO                                             INSECURE  OCI    LFS    CREDS  STATUS      MESSAGE  PROJECT
    git   calisti-gitops  https://github.com/github-id/calisti-gitops.git  false     false  false  true   Successful
    
  7. Change into the directory of the cloned repository (for example, calisti-gitops) and create the following directories.

    cd "${GITHUB_REPOSITORY_NAME}"
    
    mkdir -p apps/smm-controlplane apps/smm-operator apps/demo-app charts manifests/smm-controlplane/base manifests/smm-controlplane/overlays/workload-cluster-1 manifests/smm-controlplane/overlays/workload-cluster-2 manifests/demo-app/base manifests/demo-app/overlays/workload-cluster-1 manifests/demo-app/overlays/workload-cluster-2
    

    The final structure of the repository will look like this:

    .
    ├── README.md
    ├── apps
    │   ├── smm-controlplane
    │   │   └── app-set.yaml
    │   ├── smm-operator
    │   │   └── app-set.yaml
    │   └── demo-app
    │       └── app-set.yaml
    ├── charts
    │   └── smm-operator
    │       ├── Chart.yaml
    │       └── ...
    ├── export-secrets.sh
    └── manifests
       ├── smm-controlplane
       │   ├── base
       │   │   ├── control-plane.yaml
       │   │   ├── cert-manager-namespace.yaml
       │   │   ├── istio-system-namespace.yaml
       │   │   ├── istio-cp-v115x.yaml
       │   │   └── kustomization.yaml
       │   └── overlays
       │       ├── workload-cluster-1
       │       │   ├── control-plane.yaml
       │       │   ├── istio-cp-v115x.yaml
       │       │   └── kustomization.yaml
       │       └── workload-cluster-2
       │           ├── control-plane.yaml
       │           ├── istio-cp-v115x.yaml
       │           └── kustomization.yaml
       └── demo-app
           ├── base
           │   ├── demo-app-namespace.yaml
           │   ├── demo-app.yaml
           │   └── kustomization.yaml
           └── overlays
               ├── workload-cluster-1
               │   ├── demo-app.yaml
               │   └── kustomization.yaml
               └── workload-cluster-2
                   ├── demo-app.yaml
                   └── kustomization.yaml
    
    • The apps folder contains the Argo CD Application of the smm-operator, the smm-controlplane, and the demo-app.
    • The charts folder contains the Helm chart of the smm-operator.
    • The manifests/demo-app folder contains the manifest files of the demo application that represents your business application.
    • The manifests/smm-controlplane folder contains the manifest files of the SMM ControlPlane.
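
The skeleton above can be created and sanity-checked with a short script. A sketch, assuming you run it from the root of the cloned repository:

```shell
# Sketch: create the GitOps directory skeleton and fail loudly if any
# expected directory is missing afterwards.
dirs="
apps/smm-controlplane
apps/smm-operator
apps/demo-app
charts
manifests/smm-controlplane/base
manifests/smm-controlplane/overlays/workload-cluster-1
manifests/smm-controlplane/overlays/workload-cluster-2
manifests/demo-app/base
manifests/demo-app/overlays/workload-cluster-1
manifests/demo-app/overlays/workload-cluster-2
"

for d in $dirs; do
  mkdir -p "$d"
done

# Verify the skeleton.
for d in $dirs; do
  [ -d "$d" ] || { echo "missing: $d" >&2; exit 1; }
done
echo "repository skeleton OK"
```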

Prepare the Helm charts

  1. You need an active Service Mesh Manager registration to download the Service Mesh Manager charts and images. You can sign up for free, or obtain Enterprise credentials on the official Cisco Service Mesh Manager page. After registration, you can obtain your username and password from the Download Center. Set them as environment variables.

    CALISTI_USERNAME="<your-calisti-username>"
    
    CALISTI_PASSWORD="<your-calisti-password>"
    
  2. Download the smm-operator chart from registry.eticloud.io into the charts directory of your Service Mesh Manager GitOps repository and extract it. Run the following commands:

    export HELM_EXPERIMENTAL_OCI=1 # Needed prior to Helm version 3.8.0
    
    echo "${CALISTI_PASSWORD}" | helm registry login registry.eticloud.io -u "${CALISTI_USERNAME}" --password-stdin
    

    Expected output:

    Login Succeeded
    
    helm pull oci://registry.eticloud.io/smm-charts/smm-operator --destination ./charts/ --untar --version 1.12.0
    

    Expected output:

    Pulled: registry.eticloud.io/smm-charts/smm-operator:1.12.0
    Digest: sha256:someshadigest
    
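
Because the extracted chart is committed to the GitOps repository, it can help to guard against accidental version drift before pushing. A small sketch; the `chart_version` helper and the expected value `1.12.0` are illustrative assumptions mirroring the `--version` flag used above:

```shell
# Sketch: read the pinned chart version and warn if it is not the one
# you intended to commit. The "1.12.0" expectation is an assumption
# matching the --version flag used in the helm pull above.
chart_version() {
  # Print the top-level "version:" field of the given Chart.yaml.
  awk '$1 == "version:" { print $2; exit }' "$1"
}

# Example guard, run from the repository root after `helm pull`:
if [ -f charts/smm-operator/Chart.yaml ]; then
  v="$(chart_version charts/smm-operator/Chart.yaml)"
  [ "$v" = "1.12.0" ] || echo "warning: chart version is $v, expected 1.12.0"
fi
```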

Deploy Service Mesh Manager

Deploy the smm-operator application set

Complete the following steps to deploy the smm-operator chart using Argo CD.

  1. Create the smm-operator’s Argo CD ApplicationSet CR. An Argo CD ApplicationSet is well suited for deploying the same application onto multiple clusters: its list generator carries per-cluster data into the template.

    Before running the following command, edit it if needed:

    • If you are not using a GitHub repository, set the repoURL field to your repository.
    • For multi-cluster setups, the Kubernetes API server address of one cluster must be reachable from the other clusters. On certain clusters (for example, OpenShift), the API server address is private and not reachable by default from other clusters. In such cases, use the PUBLIC_API_SERVER_ENDPOINT_ADDRESS variable to provide an address that the other clusters can reach: either a public address, or one that is routable from them.

    PUBLIC_API_SERVER_ENDPOINT_ADDRESS_1="" PUBLIC_API_SERVER_ENDPOINT_ADDRESS_2="" ;
    cat > "apps/smm-operator/app-set.yaml" <<EOF
    # apps/smm-operator/app-set.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: smm-operator-appset
      namespace: argocd
    spec:
      generators:
      - list:
          elements:
          - cluster: "${WORKLOAD_CLUSTER_1_CONTEXT}"
            apiServerEndpointAddress: "${PUBLIC_API_SERVER_ENDPOINT_ADDRESS_1}"
          - cluster: "${WORKLOAD_CLUSTER_2_CONTEXT}"
            apiServerEndpointAddress: "${PUBLIC_API_SERVER_ENDPOINT_ADDRESS_2}"
      template:
        metadata: 
          name: 'smm-operator-{{cluster}}'
          namespace: argocd
        spec:
          project: default
          source:
            repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
            targetRevision: HEAD
            path: charts/smm-operator
            helm:
              parameters:
              - name: "global.ecr.enabled"
                value: 'false'
              - name: "global.basicAuth.username"
                value: "${CALISTI_USERNAME}"
              - name: "global.basicAuth.password"
                value: "${CALISTI_PASSWORD}"
              - name: "apiServerEndpointAddress"
                value: '{{apiServerEndpointAddress}}'
          destination:
            namespace: smm-registry-access
            name: '{{cluster}}'
          ignoreDifferences:
          - kind: ValidatingWebhookConfiguration
            group: admissionregistration.k8s.io
            jsonPointers:
            - /webhooks
          syncPolicy:
            automated:
              prune: true
              selfHeal: true
            retry:
              limit: 5
              backoff:
                duration: 5s
                maxDuration: 3m0s
                factor: 2
            syncOptions:
              - Validate=false
              - PruneLast=true
              - CreateNamespace=true
              - Replace=true
    EOF
    
  2. Commit and push the calisti-gitops repository.

    git add apps/smm-operator charts/smm-operator
    
    git commit -m "add smm-operator app"

    Expected output:
    
    [main d4f6809] add smm-operator app
    35 files changed, 80310 insertions(+)
    create mode 100644 apps/smm-operator/app-set.yaml
    create mode 100644 charts/smm-operator/.helmignore
    create mode 100644 charts/smm-operator/Chart.yaml
    create mode 100644 charts/smm-operator/README.md
    create mode 100644 charts/smm-operator/crds/clusterfeature-crd.yaml
    create mode 100644 charts/smm-operator/crds/clusters-crd.yaml
    create mode 100644 charts/smm-operator/crds/crd-alertmanagerconfigs.yaml
    create mode 100644 charts/smm-operator/crds/crd-alertmanagers.yaml
    create mode 100644 charts/smm-operator/crds/crd-podmonitors.yaml
    create mode 100644 charts/smm-operator/crds/crd-probes.yaml
    create mode 100644 charts/smm-operator/crds/crd-prometheuses.yaml
    create mode 100644 charts/smm-operator/crds/crd-prometheusrules.yaml
    create mode 100644 charts/smm-operator/crds/crd-servicemonitors.yaml
    create mode 100644 charts/smm-operator/crds/crd-thanosrulers.yaml
    create mode 100644 charts/smm-operator/crds/crds.yaml
    create mode 100644 charts/smm-operator/crds/health.yaml
    create mode 100644 charts/smm-operator/crds/istio-operator-v1-crds.yaml
    create mode 100644 charts/smm-operator/crds/istio-operator-v2-crds.gen.yaml
    create mode 100644 charts/smm-operator/crds/istiooperator-crd.yaml
    create mode 100644 charts/smm-operator/crds/koperator-crds.yaml
    create mode 100644 charts/smm-operator/crds/metadata-crd.yaml
    create mode 100644 charts/smm-operator/crds/resourcesyncrules-crd.yaml
    create mode 100644 charts/smm-operator/crds/sre.yaml
    create mode 100644 charts/smm-operator/templates/_helpers.tpl
    create mode 100644 charts/smm-operator/templates/authproxy-rbac.yaml
    create mode 100644 charts/smm-operator/templates/authproxy-service.yaml
    create mode 100644 charts/smm-operator/templates/ecr.deployment.yaml
    create mode 100644 charts/smm-operator/templates/ecr.secret.yaml
    create mode 100644 charts/smm-operator/templates/ecr.service-account.yaml
    create mode 100644 charts/smm-operator/templates/namespace.yaml
    create mode 100644 charts/smm-operator/templates/operator-psp-basic.yaml
    create mode 100644 charts/smm-operator/templates/operator-rbac.yaml
    create mode 100644 charts/smm-operator/templates/operator-service.yaml
    create mode 100644 charts/smm-operator/templates/operator-statefulset.yaml
    create mode 100644 charts/smm-operator/values.yaml
    
    git push
    
  3. Apply the ApplicationSet manifest.

    kubectl apply -f "apps/smm-operator/app-set.yaml"
    
  4. Verify that the applications have been added to Argo CD and are healthy.

    argocd app list
    

    Expected output:

    NAME                                    CLUSTER             NAMESPACE            PROJECT  STATUS  HEALTH   SYNCPOLICY  CONDITIONS  REPO                                                 PATH                 TARGET
    argocd/smm-operator-workload-cluster-1  workload-cluster-1  smm-registry-access  default  Synced  Healthy  Auto-Prune  <none>      https://github.com/<github-user>/calisti-gitops.git  charts/smm-operator  HEAD
    argocd/smm-operator-workload-cluster-2  workload-cluster-2  smm-registry-access  default  Synced  Healthy  Auto-Prune  <none>      https://github.com/<github-user>/calisti-gitops.git  charts/smm-operator  HEAD
    
  5. Check the smm-operator application on the Argo CD Web UI.

    SMM Operator
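
Behind the scenes, the list generator renders the template once per element, substituting `{{cluster}}` into the Application name and destination; the resulting names appear in the `argocd app list` output above. A minimal sketch of that expansion (the `app_name` helper is illustrative, not part of Argo CD):

```shell
# Sketch: the ApplicationSet list generator yields one Application per
# element; '{{cluster}}' in the template name becomes the element's value.
app_name() {
  echo "smm-operator-$1"
}

for cluster in workload-cluster-1 workload-cluster-2; do
  app_name "$cluster"
done
# prints:
#   smm-operator-workload-cluster-1
#   smm-operator-workload-cluster-2
```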

Deploy the smm-controlplane application

The following steps show you how to deploy the smm-controlplane application as an active-passive deployment. To create an active-active deployment, follow the same steps; an optional step changes the active-passive deployment to active-active. For details, see Deployment models.

Deploy the smm-controlplane application using Kustomize: the active control plane on workload-cluster-1 and the passive one on workload-cluster-2. The active cluster receives every component, while the passive cluster receives only a few required components. This part of the repository will look like this:

└── manifests
    ├── smm-controlplane
    │   ├── base
    │   │   ├── control-plane.yaml
    │   │   ├── cert-manager-namespace.yaml
    │   │   ├── istio-system-namespace.yaml
    │   │   ├── istio-cp-v115x.yaml
    │   │   └── kustomization.yaml
    │   └── overlays
    │       ├── workload-cluster-1
    │       │   ├── control-plane.yaml
    │       │   ├── istio-cp-v115x.yaml
    │       │   └── kustomization.yaml
    │       └── workload-cluster-2
    │           ├── control-plane.yaml
    │           ├── istio-cp-v115x.yaml
    │           └── kustomization.yaml
  1. Create the following namespace files.

    cat > manifests/smm-controlplane/base/cert-manager-namespace.yaml <<EOF
    apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "1"
      name: cert-manager
    EOF
    
    cat > manifests/smm-controlplane/base/istio-system-namespace.yaml << EOF
    apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "2"
      name: istio-system
    EOF
    
  2. Create the IstioControlPlane file.

    cat > manifests/smm-controlplane/base/istio-cp-v115x.yaml << EOF
    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioControlPlane
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "5"
      name: cp-v115x
      namespace: istio-system
    spec:
      containerImageConfiguration:
        imagePullPolicy: Always
        imagePullSecrets:
        - name: smm-pull-secret
      distribution: cisco
      istiod:
        deployment:
          env:
          - name: ISTIO_MULTIROOT_MESH
            value: "true"
          image: registry.eticloud.io/smm/istio-pilot:v1.15.3-bzc.0
      k8sResourceOverlays:
      - groupVersionKind:
          group: apps
          kind: Deployment
          version: v1
        objectKey:
          name: istiod-cp-v115x
          namespace: istio-system
        patches:
        - path: /spec/template/spec/containers/0/args/-
          type: replace
          value: --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_AES_256_GCM_SHA384,TLS_AES_128_GCM_SHA256
      meshConfig:
        defaultConfig:
          envoyAccessLogService:
            address: smm-als.smm-system.svc.cluster.local:50600
            tcpKeepalive:
              interval: 10s
              probes: 3
              time: 10s
            tlsSettings:
              mode: ISTIO_MUTUAL
          holdApplicationUntilProxyStarts: true
          proxyMetadata:
            ISTIO_META_ALS_ENABLED: "true"
            PROXY_CONFIG_XDS_AGENT: "true"
          tracing:
            tlsSettings:
              mode: ISTIO_MUTUAL
            zipkin:
              address: smm-zipkin.smm-system.svc.cluster.local:59411
        enableEnvoyAccessLogService: true
        enableTracing: true
      meshExpansion:
        enabled: true
        gateway:
          deployment:
            podMetadata:
              labels:
                app: istio-meshexpansion-gateway
                istio: meshexpansiongateway
          service:
            ports:
            - name: tcp-smm-als-tls
              port: 50600
              protocol: TCP
              targetPort: 50600
            - name: tcp-smm-zipkin-tls
              port: 59411
              protocol: TCP
              targetPort: 59411
      meshID: mesh1
      mode: ACTIVE
      networkName: network1
      proxy:
        image: registry.eticloud.io/smm/istio-proxyv2:v1.15.3-bzc.0
      proxyInit:
        cni:
          daemonset:
            image: registry.eticloud.io/smm/istio-install-cni:v1.15.3-bzc.0
        image: registry.eticloud.io/smm/istio-proxyv2:v1.15.3-bzc.0
      sidecarInjector:
        deployment:
          image: registry.eticloud.io/smm/istio-sidecar-injector:v1.15.3-bzc.0
      version: 1.15.3
    EOF
    
  3. Create the kustomization.yaml file.

    cat > manifests/smm-controlplane/base/kustomization.yaml <<EOF
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    metadata:
      name: cluster-secrets
    
    
    resources:
    - cert-manager-namespace.yaml
    - istio-system-namespace.yaml
    - istio-cp-v115x.yaml
    - control-plane.yaml
    EOF
    
  4. Create the manifests/smm-controlplane/base/control-plane.yaml file. You don’t need to set the CLUSTER-NAME here; you will set it with the overlay customization.

    cat > manifests/smm-controlplane/base/control-plane.yaml <<EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: ControlPlane
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "10"
      name: smm
    spec:
      certManager:
        namespace: cert-manager
      clusterName: CLUSTER-NAME
      clusterRegistry:
        enabled: true
        namespace: cluster-registry
      log: {}
      meshManager:
        enabled: true
        istio:
          enabled: true
          istioCRRef:
            name: cp-v115x
            namespace: istio-system
          operators:
            namespace: smm-system
        namespace: smm-system
      nodeExporter:
        enabled: true
        namespace: smm-system
        psp:
          enabled: false
        rbac:
          enabled: true
      oneEye: {}
      registryAccess:
        enabled: true
        imagePullSecretsController: {}
        namespace: smm-registry-access
        pullSecrets:
        - name: smm-registry.eticloud.io-pull-secret
          namespace: smm-registry-access
      repositoryOverride:
        host: registry.eticloud.io
        prefix: smm
      role: active
      smm:
        als:
          enabled: true
          log: {}
        application:
          enabled: true
          log: {}
        auth:
          mode: impersonation
        certManager:
          enabled: true
        enabled: true
        federationGateway:
          enabled: true
          name: smm
          service:
            enabled: true
            name: smm-federation-gateway
            port: 80
        federationGatewayOperator:
          enabled: true
        impersonation:
          enabled: true
        istio:
          revision: cp-v115x.istio-system
        leo:
          enabled: true
          log: {}
        log: {}
        namespace: smm-system
        prometheus:
          enabled: true
          replicas: 1
        prometheusOperator: {}
        releaseName: smm
        role: active
        sdm:
          enabled: false
        sre:
          enabled: true
        useIstioResources: true
    EOF
    
  5. Create the kustomization.yaml file for workload-cluster-1.

    cat > manifests/smm-controlplane/overlays/workload-cluster-1/kustomization.yaml <<EOF
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    bases:
      - ../../base
    
    patchesStrategicMerge:
      - istio-cp-v115x.yaml
      - control-plane.yaml
    EOF
    
  6. Set the clusterName by overriding some settings coming from the base configuration. Create the following files.

    cat > manifests/smm-controlplane/overlays/workload-cluster-1/control-plane.yaml <<EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: ControlPlane
    metadata:
      name: smm
    spec:
      clusterName: workload-cluster-1
      certManager:
        enabled: true
      smm:
        exposeDashboard:
          meshGateway:
            enabled: true
        auth:
          forceUnsecureCookies: true
          mode: anonymous
    EOF
    
    cat > manifests/smm-controlplane/overlays/workload-cluster-1/istio-cp-v115x.yaml <<EOF
    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioControlPlane
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "5"
      name: cp-v115x
      namespace: istio-system
    spec:
      meshID: mesh1
      mode: ACTIVE
      networkName: network1
    EOF
    
  7. Create the kustomization.yaml file for workload-cluster-2.

    cat > manifests/smm-controlplane/overlays/workload-cluster-2/kustomization.yaml <<EOF
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    bases:
      - ../../base
    
    patchesStrategicMerge:
      - istio-cp-v115x.yaml
      - control-plane.yaml
    EOF
    
  8. Create the following files for workload-cluster-2. This sets the clusterName, and also overrides some settings of the base configuration.

    cat > manifests/smm-controlplane/overlays/workload-cluster-2/control-plane.yaml <<EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: ControlPlane
    metadata:
      name: smm
    spec:
      clusterName: workload-cluster-2
      role: passive
      smm:
        als:
          enabled: true
          log: {}
        application:
          enabled: false
          log: {}
        auth:
          mode: impersonation
        certManager:
          enabled: false
        enabled: true
        federationGateway:
          enabled: false
          name: smm
          service:
            enabled: true
            name: smm-federation-gateway
            port: 80
        federationGatewayOperator:
          enabled: true
        grafana:
          enabled: false
        impersonation:
          enabled: true
        istio:
          revision: cp-v115x.istio-system
        kubestatemetrics:
          enabled: true
        leo:
          enabled: false
          log: {}
        log: {}
        namespace: smm-system
        prometheus:
          enabled: true
          replicas: 1
          retentionTime: 8h
        prometheusOperator: {}
        releaseName: smm
        role: passive
        sdm:
          enabled: false
        sre:
          enabled: false
        tracing:
          enabled: true
        useIstioResources: false
        web:
          enabled: false
    EOF
    
    cat > manifests/smm-controlplane/overlays/workload-cluster-2/istio-cp-v115x.yaml <<EOF
    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioControlPlane
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "5"
      name: cp-v115x
      namespace: istio-system
    spec:
      meshID: mesh1
      mode: PASSIVE
      networkName: workload-cluster-2
    EOF
    
  9. (Optional) If you want to change your active-passive deployment to active-active, complete this step. Otherwise, continue with the next step. Run the following commands to modify the control planes of workload-cluster-2.

    cat > manifests/smm-controlplane/overlays/workload-cluster-2/control-plane.yaml <<EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: ControlPlane
    metadata:
      name: smm
    spec:
      clusterName: workload-cluster-2
      role: active
    EOF
    
    cat > manifests/smm-controlplane/overlays/workload-cluster-2/istio-cp-v115x.yaml <<EOF
    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioControlPlane
    metadata:
      annotations:
        argocd.argoproj.io/sync-wave: "5"
      name: cp-v115x
      namespace: istio-system
    spec:
      meshID: mesh1
      mode: ACTIVE
      networkName: network1
    EOF
    
  10. Create the smm-controlplane’s Argo CD ApplicationSet CR.

    cat > apps/smm-controlplane/app-set.yaml <<EOF
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: smm-cp-appset
      namespace: argocd
    spec:
      generators:
      - list:
          elements:
          - cluster: "${WORKLOAD_CLUSTER_1_CONTEXT}"
            path: "${WORKLOAD_CLUSTER_1_CONTEXT}"
          - cluster: "${WORKLOAD_CLUSTER_2_CONTEXT}"
            path: "${WORKLOAD_CLUSTER_2_CONTEXT}"
      template:
        metadata: 
          name: 'smm-cp-{{cluster}}'
          namespace: argocd
        spec:
          project: default
          source:
            repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
            targetRevision: HEAD
            path: manifests/smm-controlplane/overlays/{{path}}
          destination:
            name: '{{cluster}}'
          syncPolicy:
            automated:
              prune: true
              selfHeal: true
            retry:
              limit: 5
              backoff:
                duration: 5s
                maxDuration: 3m0s
                factor: 2
            syncOptions:
              - Validate=false
              - PruneLast=true
              - CreateNamespace=true
              - Replace=true
    EOF
    
  11. Commit and push the calisti-gitops repository.

    git add apps/smm-controlplane manifests
    
    git commit -m "add smm-controlplane app"
    
    git push
    
  12. Apply the ApplicationSet manifest.

    kubectl apply -f "apps/smm-controlplane/app-set.yaml"
    
  13. Verify that the applications have been added to Argo CD and are healthy.

    argocd app list
    
  14. To create trust between workload-cluster-1 and workload-cluster-2, you must exchange the Secret CRs of the clusters. The cluster registry controller helps form a group of Kubernetes clusters and synchronize arbitrary resources across those clusters.

    Create the following bash script and run it from an environment that has access to both cluster contexts.

    cat > export-secrets.sh <<EOF
    set -e
    
    kubectl --context workload-cluster-1 get cluster workload-cluster-1 -o yaml | kubectl --context workload-cluster-2 apply -f -
    kubectl --context workload-cluster-1 -n cluster-registry get secrets workload-cluster-1 -o yaml | kubectl --context workload-cluster-2 apply -f -
    
    kubectl --context workload-cluster-2 get cluster workload-cluster-2 -o yaml | kubectl --context workload-cluster-1 apply -f -
    kubectl --context workload-cluster-2 -n cluster-registry get secrets workload-cluster-2 -o yaml | kubectl --context workload-cluster-1 apply -f -
    
    echo "Exporting cluster and secrets CRs successfully."
    EOF
    
    chmod +x export-secrets.sh
    ./export-secrets.sh
    

    Expected output:

    cluster.clusterregistry.k8s.cisco.com/workload-cluster-1 created
    secret/workload-cluster-1 created
    cluster.clusterregistry.k8s.cisco.com/workload-cluster-2 created
    secret/workload-cluster-2 created
    Exporting cluster and secrets CRs successfully.
    
  15. Check that all pods are healthy and running in the smm-system namespace on workload-cluster-1 and workload-cluster-2. Note that it takes some time for the ControlPlane operator to reconcile the resources.

    For workload-cluster-1:

    kubectl get pods -n smm-system --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                                               READY   STATUS    RESTARTS        AGE
    istio-operator-v115x-7d77fc549f-fxmtd              2/2     Running   0               8m15s
    mesh-manager-0                                     2/2     Running   0          19m
    prometheus-node-exporter-9xnmj                     1/1     Running   0          17m
    prometheus-node-exporter-bf7g5                     1/1     Running   0          17m
    prometheus-node-exporter-cl69q                     1/1     Running   0          17m
    prometheus-smm-prometheus-0                        4/4     Running   0          18m
    smm-7f4d5d4fff-4dlcp                               2/2     Running   0          18m
    smm-7f4d5d4fff-59k7g                               2/2     Running   0          18m
    smm-als-7cc4bfb998-wjsr6                           2/2     Running   0          18m
    smm-authentication-569484f748-fj5zk                2/2     Running   0          18m
    smm-federation-gateway-6964fb956f-pb5pv            2/2     Running   0          18m
    smm-federation-gateway-operator-6664774695-9tmzj   2/2     Running   0          18m
    smm-grafana-59c54f67f4-9snc5                       3/3     Running   0          18m
    smm-health-75bf4f49c5-z9tqg                        2/2     Running   0          18m
    smm-health-api-7767d4f46-744wn                     2/2     Running   0          18m
    smm-ingressgateway-6ffdfc6d79-jttjz                1/1     Running   0          11m
    smm-ingressgateway-external-8c9bb9445-kjt8h        1/1     Running   0          11m
    smm-kubestatemetrics-86c6f96789-lp576              2/2     Running   0          18m
    smm-leo-67cd7d49b5-gmcvf                           2/2     Running   0          18m
    smm-prometheus-operator-ffbfb8b67-fwj6g            3/3     Running   0          18m
    smm-sre-alert-exporter-6654968479-fthk6            2/2     Running   0          18m
    smm-sre-api-86c9fb7cd7-mq7cm                       2/2     Running   0          18m
    smm-sre-controller-6889685f9-hxxh5                 2/2     Running   0          18m
    smm-tracing-5886d59dd-v8nb8                        2/2     Running   0          18m
    smm-vm-integration-5b89c4f7c9-wz4bt                2/2     Running   0          18m
    smm-web-d5b49c7f6-jgz7b                            3/3     Running   0          18m
    

    For workload-cluster-2:

    kubectl get pods -n smm-system --kubeconfig "${WORKLOAD_CLUSTER_2_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_2_CONTEXT}"
    

    Expected output:

    NAME                                       READY   STATUS    RESTARTS        AGE
    istio-operator-v115x-7d77fc549f-s5wnz      2/2     Running   0               9m5s
    mesh-manager-0                            2/2     Running   0             21m
    prometheus-node-exporter-fzdn4            1/1     Running   0             5m18s
    prometheus-node-exporter-rkbcl            1/1     Running   0             5m18s
    prometheus-node-exporter-x2mwp            1/1     Running   0             5m18s
    prometheus-smm-prometheus-0               3/3     Running   0             5m20s
    smm-ingressgateway-5db7859d45-6d6ns       1/1     Running   0             12m
    smm-kubestatemetrics-86c6f96789-j64q2     2/2     Running   0             19m
    smm-prometheus-operator-ffbfb8b67-zwqn2   3/3     Running   1 (11m ago)   19m
    
  16. Check the applications on the Argo CD Web UI.

    Argo CD Web UI
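
The `argocd.argoproj.io/sync-wave` annotations in the base manifests above determine the order in which Argo CD applies the resources within a sync: lower waves are applied first, so the namespaces (waves 1 and 2) exist before the IstioControlPlane (wave 5), which in turn precedes the SMM ControlPlane (wave 10). A quick illustration of that ordering:

```shell
# Sketch: Argo CD sorts resources by ascending sync-wave before applying
# them. These are the waves annotated in the base manifests above.
printf '%s\n' \
  '10 ControlPlane/smm' \
  '1 Namespace/cert-manager' \
  '5 IstioControlPlane/cp-v115x' \
  '2 Namespace/istio-system' |
  sort -n
# prints the resources in apply order (waves 1, 2, 5, 10)
```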

At this point, you have successfully installed the smm-operator and smm-controlplane applications on workload-cluster-1 and workload-cluster-2. You can open the Service Mesh Manager dashboard to check them, or deploy an application.
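
If you later add more clusters to the mesh, the pairwise Secret exchange from step 14 generalizes to a loop over every ordered cluster pair. A dry-run sketch that prints the kubectl pipelines instead of executing them (remove the `echo` to run them; cluster names are assumed to double as kubectl context names):

```shell
# Sketch: generalized cluster/secret exchange for N clusters. Prints each
# kubectl pipeline instead of running it (dry run).
exchange_commands() {
  clusters="$*"
  for src in $clusters; do
    for dst in $clusters; do
      [ "$src" = "$dst" ] && continue
      echo "kubectl --context $src get cluster $src -o yaml | kubectl --context $dst apply -f -"
      echo "kubectl --context $src -n cluster-registry get secrets $src -o yaml | kubectl --context $dst apply -f -"
    done
  done
}

exchange_commands workload-cluster-1 workload-cluster-2
```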

Deploy an application

If you want to deploy an application into the service mesh, complete the following steps. The examples use the Service Mesh Manager demo application.

The file structure for the demo application looks like this:

.
├── README.md
├── apps
│   ├── demo-app
│   │   └── app-set.yaml
│   └── ...
├── ...
└── manifests
    └── demo-app
        ├── base
        │   ├── demo-app-namespace.yaml
        │   ├── demo-app.yaml
        │   └── kustomization.yaml
        └── overlays
            ├── workload-cluster-1
            │   ├── demo-app.yaml
            │   └── kustomization.yaml
            └── workload-cluster-2
                ├── demo-app.yaml
                └── kustomization.yaml
  1. Create the application manifest files.

    cat > manifests/demo-app/base/kustomization.yaml <<EOF
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    metadata:
      name: demo-app
    
    resources:
    - demo-app-namespace.yaml
    - demo-app.yaml
    EOF
    
    cat > manifests/demo-app/base/demo-app-namespace.yaml <<EOF
    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        app.kubernetes.io/instance: smm-demo
        app.kubernetes.io/name: smm-demo
        app.kubernetes.io/part-of: smm-demo
        app.kubernetes.io/version: 0.1.4
        istio.io/rev: cp-v115x.istio-system
      name: smm-demo
    EOF
    
    cat > manifests/demo-app/base/demo-app.yaml <<EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: DemoApplication
    metadata:
      name: smm-demo
      namespace: smm-demo
    spec:
      autoscaling:
        enabled: true
      controlPlaneRef:
        name: smm
    EOF
    
    cat > manifests/demo-app/overlays/workload-cluster-1/kustomization.yaml <<EOF
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    bases:
      - ../../base
    
    patchesStrategicMerge:
      - demo-app.yaml
    EOF
    
    cat > manifests/demo-app/overlays/workload-cluster-1/demo-app.yaml <<EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: DemoApplication
    metadata:
      name: smm-demo
      namespace: smm-demo
    spec:
      autoscaling:
        enabled: true
      controlPlaneRef:
        name: smm
      deployIstioResources: true
      deploySLOResources: true
      enabled: true
      enabledComponents:
      - frontpage
      - catalog
      - bookings
      - postgresql
      istio:
        revision: cp-v115x.istio-system
      load:
        enabled: true
        maxRPS: 30
        minRPS: 10
        swingPeriod: 1380000000000
      replicas: 1
      resources:
        limits:
          cpu: "2"
          memory: 192Mi
        requests:
          cpu: 40m
          memory: 64Mi
    EOF
    
    cat > manifests/demo-app/overlays/workload-cluster-2/kustomization.yaml <<EOF
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    bases:
      - ../../base
    
    patchesStrategicMerge:
      - demo-app.yaml
    EOF
    
    cat > manifests/demo-app/overlays/workload-cluster-2/demo-app.yaml <<EOF
    apiVersion: smm.cisco.com/v1alpha1
    kind: DemoApplication
    metadata:
      name: smm-demo
      namespace: smm-demo
    spec:
      autoscaling:
        enabled: true
      controlPlaneRef:
        name: smm
      deployIstioResources: false
      deploySLOResources: false
      enabled: true
      enabledComponents:
      - movies
      - payments
      - notifications
      - analytics
      - database
      - mysql
      istio:
        revision: cp-v115x.istio-system
      replicas: 1
      resources:
        limits:
          cpu: "2"
          memory: 192Mi
        requests:
          cpu: 40m
          memory: 64Mi
    EOF
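
    The load settings in the workload-cluster-1 overlay drive the demo traffic generator. The swingPeriod value appears to be a Go-style duration serialized in nanoseconds (an assumption, as the unit is not stated in the manifest), which would make 1380000000000 a 23-minute swing between minRPS and maxRPS:

    ```shell
    # Assumption: swingPeriod is a Go time.Duration serialized in nanoseconds.
    swing_ns=1380000000000
    seconds=$(( swing_ns / 1000000000 ))   # 1380 seconds
    minutes=$(( seconds / 60 ))            # 23 minutes
    echo "swingPeriod = ${seconds}s (${minutes} minutes)"
    ```

    Adjust minRPS, maxRPS, and swingPeriod in the overlay to shape the generated load.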
    
  2. Create the Demo application ApplicationSet.

    cat > apps/demo-app/app-set.yaml <<EOF
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: demo-app-appset
      namespace: argocd
    spec:
      generators:
      - list:
          elements:
          - cluster: "${WORKLOAD_CLUSTER_1_CONTEXT}"
            path: "${WORKLOAD_CLUSTER_1_CONTEXT}"
          - cluster: "${WORKLOAD_CLUSTER_2_CONTEXT}"
            path: "${WORKLOAD_CLUSTER_2_CONTEXT}"
      template:
        metadata: 
          name: 'demo-app-{{cluster}}'
          namespace: argocd
        spec:
          project: default
          source:
            repoURL: https://github.com/${GITHUB_ID}/${GITHUB_REPOSITORY_NAME}.git
            targetRevision: HEAD
            path: manifests/demo-app/overlays/{{path}}
          destination:
            name: '{{cluster}}'
          syncPolicy:
            automated:
              prune: true
              selfHeal: true
            retry:
              limit: 5
              backoff:
                duration: 5s
                maxDuration: 3m0s
                factor: 2
            syncOptions:
              - Validate=false
              - PruneLast=true
              - CreateNamespace=true
              - Replace=true
    EOF
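
    Note that the heredocs above use an unquoted EOF delimiter, so your shell expands variables such as ${WORKLOAD_CLUSTER_1_CONTEXT} and ${GITHUB_ID} at the moment the files are written. Make sure these variables are set before running the cat commands, otherwise empty strings end up in the manifests. A minimal demonstration of the expansion, with an illustrative value:

    ```shell
    # Unquoted <<EOF expands ${VAR} references at file-creation time.
    export WORKLOAD_CLUSTER_1_CONTEXT="workload-cluster-1"   # illustrative value
    rendered=$(cat <<EOF
    cluster: "${WORKLOAD_CLUSTER_1_CONTEXT}"
    EOF
    )
    echo "$rendered"
    ```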
    
  3. Commit and push the calisti-gitops repository.

    git add apps/demo-app manifests
    git commit -m "add demo application"
    git push origin
    
  4. Deploy the demo application on the clusters.

    kubectl apply -f apps/demo-app/app-set.yaml
    
  5. Wait until all the pods in the smm-demo namespace are up and running.

    kubectl get pods -n smm-demo --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                           READY   STATUS    RESTARTS   AGE
    bombardier-5f59948978-zx99c    2/2     Running   0          3m21s
    bookings-v1-68dd865855-fdcxk   2/2     Running   0          3m21s
    catalog-v1-6d564bbcb8-qmhbx    2/2     Running   0          3m21s
    frontpage-v1-b4686759b-fhfmv   2/2     Running   0          3m21s
    postgresql-7cf55cd596-grs46    2/2     Running   0          3m21s
    
    kubectl get pods -n smm-demo --kubeconfig "${WORKLOAD_CLUSTER_2_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_2_CONTEXT}"
    

    Expected output:

    NAME                                READY   STATUS    RESTARTS   AGE
    analytics-v1-799d668f84-p4nkk       2/2     Running   0          3m58s
    database-v1-6896cd4b59-9xxgg        2/2     Running   0          3m58s
    movies-v1-9594fff5f-8hv9l           2/2     Running   0          3m58s
    movies-v2-5559c5567c-2279n          2/2     Running   0          3m58s
    movies-v3-649b99d977-nkdxc          2/2     Running   0          3m58s
    mysql-669466cc8d-bs4s9              2/2     Running   0          3m58s
    notifications-v1-79bc79c89b-4bbss   2/2     Running   0          3m58s
    payments-v1-547884bfdf-dg2dm        2/2     Running   0          3m58s
    
  6. Check the applications on the Argo CD web UI.

    Argo CD Web UI

  7. Open the Service Mesh Manager web interface, select MENU > TOPOLOGY, then select the smm-demo namespace.

    Demo Application Topology

Access the Service Mesh Manager dashboard

  1. You can access the Service Mesh Manager dashboard through the external IP address (or hostname) of the smm-ingressgateway-external LoadBalancer service. Run the following command to retrieve it:

    kubectl get services -n smm-system smm-ingressgateway-external --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    NAME                          TYPE           CLUSTER-IP   EXTERNAL-IP                PORT(S)        AGE
    smm-ingressgateway-external   LoadBalancer   10.0.0.199   external-ip-or-hostname    80:32505/TCP   2m28s
    
  2. Open the Service Mesh Manager dashboard using one of the following methods:

    • Open the http://<external-ip-or-hostname> URL in your browser.

    • Run the following command to open the dashboard with your default browser:

      # Exactly one of hostname or IP will be available and used for the remote URL.
      open http://$(kubectl get services -n smm-system smm-ingressgateway-external -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}' --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}")
      
    • If you have installed the Service Mesh Manager CLI on your machine, run the following command to open the Service Mesh Manager Dashboard in the default browser.

      smm dashboard --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" --context "${WORKLOAD_CLUSTER_1_CONTEXT}"
      

      Expected output:

      ✓ validate-kubeconfig ❯ checking cluster reachability...
      ✓ opening Service Mesh Manager at http://127.0.0.1:50500
      
  3. Check the deployments on the dashboard, for example, on the MENU > Overview, MENU > MESH, and MENU > TOPOLOGY pages.

Service Mesh Manager Overview

Service Mesh Manager Mesh

Service Mesh Manager Topology

2.3.8 - OpenShift integration

Red Hat OpenShift provides scalable and reliable solutions to monitor microservices and offers full-fledged container security. With a Red Hat OpenShift Service on AWS (ROSA) setup, you can seamlessly install Calisti on Red Hat OpenShift clusters. Calisti avoids vendor lock-in so that you can mix-n-match your cluster cloud providers and transition or continue using mixed multi-clusters (GKE, EKS, AKS, OpenShift).

Supported OpenShift Versions

Cloud Managed Services/Platforms            Calisti 1.12.x
Red Hat OpenShift Service on AWS (ROSA)     4.11

Prerequisites

Install Calisti on OpenShift

  • To install Calisti using the CLI, run the Calisti installation command with --platform=openshift flag:

    smm install --platform=openshift
    
  • To install Calisti as a Kubernetes operator, set the spec.platform field to openshift in the Calisti control plane manifest:

    spec:
      platform: openshift
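
    In operator mode this field lives in the ControlPlane custom resource. A minimal sketch for context (the resource name matches the one used elsewhere in this guide; all other spec fields are omitted):

    ```yaml
    apiVersion: smm.cisco.com/v1alpha1
    kind: ControlPlane
    metadata:
      name: smm
    spec:
      platform: openshift
    ```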
    

Istio Container Network Interface and NetworkAttachmentDefinition

Calisti performs the following two steps to have istio-cni work well in OpenShift.

Install Istio CNI pods

The Container Network Interface (CNI) on OpenShift is managed by Multus, and istio-cni is not enabled in OpenShift by default. Calisti enables it automatically: during the installation process, the following CNI configuration is patched into the existing Istio control plane manifest:

cni:
  binDir: /var/lib/cni/bin
  chained: false
  confDir: /etc/cni/multus/net.d
  confFileName: istio-cni.conf
  daemonset:
    image: registry.eticloud.io/smm/istio-install-cni:v1.15.3-bzc.0
    securityContext:
      privileged: true
  enabled: true

The meaning of the fields is:

  • binDir is the directory path to store CNI binaries.
  • chained determines whether to install CNI plugin as chained or standalone.
  • confDir is the directory path on the host where CNI network plugins are installed.
  • confFileName specifies the name of the CNI configuration file.

With the CNI configuration patched, the istio-operator invokes istio-cni on OpenShift. After the Istio control plane is updated, the istio-operator reconciles and deploys the istio-cni DaemonSet, which creates the related pods on every node in the cluster.

For more details on how istio-cni works, see the Istio CNI plugin documentation.

Install NetworkAttachmentDefinition objects

The network redirect method via istio-cni is also required for OpenShift support. NetworkAttachmentDefinition custom resource objects are required for istio-cni to be invoked. Calisti automatically deploys NetworkAttachmentDefinition objects into the data plane namespaces where the Istio sidecar is enabled (for example, smm-system, supertubes-system).

For details, see the Istio documentation.

SecurityContextConstraints

The Calisti container security configurations align with OpenShift’s pod and container security standards.

Note: Calisti components that are installed on Openshift adhere to the SecurityContextConstraints set forth by the Red Hat SCC standards.

UID/PID 1337 for Istio sidecars

The Istio network redirect layer assumes the istio-proxy is running in the pod with UID/GID 1337. This is not configurable in Istio. The Istio injection templates for sidecars and gateways enable the istio-proxy containers' securityContext field with:

runAsGroup: 1337
runAsNonRoot: true
runAsUser: 1337

The default OpenShift SecurityContextConstraints reject the UID/GID of 1337 as it’s not in the [1001190000, 1001199999] range.

To allow pods with istio-proxy sidecars/gateways to come up, the service account of the pod must be added to the nonroot-v2 SCC policy. One or more of the following approaches must be implemented to attach the service account to the nonroot-v2 SCC policy.

RBAC bindings to SecurityContextConstraints

The Istio control-plane component service accounts require RBAC bindings to SecurityContextConstraints (SCC). Here is the list of SCC bindings Calisti deploys for Istio:

  • istiod and istio meshgateway: nonroot-v2 SCC binding for UID/GID 1337
  • istio-cni-node pods: privileged SCC binding for NET_ADMIN to allow iptables setup and hostnetwork SCC binding for access to the node’s network namespaces (netns)

Calisti deploys Prometheus node-exporter. To have node-exporters functioning properly, Calisti also deploys the following SCC bindings for node-exporters:

  • node-exporter SCC binding for privileged accesses to the node resources
  • hostnetwork SCC binding to allow node-exporter to run with hostNetwork=true
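
Calisti deploys these bindings for you, but as a sketch of what such a binding looks like: an SCC can be granted to a service account through standard RBAC, using the use verb on the securitycontextconstraints resource (all names below are illustrative, not resources Calisti creates):

```yaml
# Illustrative sketch: grant use of the nonroot-v2 SCC to a workload's service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nonroot-v2-scc-user
rules:
- apiGroups: ["security.openshift.io"]
  resources: ["securitycontextconstraints"]
  resourceNames: ["nonroot-v2"]
  verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-nonroot-v2
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nonroot-v2-scc-user
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: my-namespace
```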

For more details regarding SCC policies, see Important OpenShift changes to Pod Security Standards.

Calisti Prometheus OpenShift integration

Calisti leverages its own Prometheus chart to monitor a wide range of metrics. Because the memory requirements differ in the OpenShift environment, the memory limits of the k8sproxy container deployed by the Prometheus operator are increased when installing Calisti on OpenShift:

resources:
  limits:
    cpu: 100m
    memory: 1000Mi
  requests:
    cpu: 50m
    memory: 500Mi

2.3.9 - Install FIPS images

To install the FIPS-compliant build of Service Mesh Manager, complete the following steps.

  1. Download the following YAML file. It contains the list of FIPS-compliant images the installer should use.

    
    
  2. Follow any of the regular installation guides (for example, Create single cluster mesh or Create multi-cluster mesh), but use the following customized YAML file with the initial installation command to use the FIPS-compliant versions of the images. For example, for a non-interactive single-cluster installation, run:

    smm install -a --istio-cr-file istio-fips.yaml
    

2.3.10 - Customize installation

The installation of Service Mesh Manager can be customized through its CRs. This page covers the most frequently used configuration options for Service Mesh Manager.

Configure container images

Service Mesh Manager images

The ControlPlane CR can be configured to set the following container images:


The IstioOperator CR can be configured to set the istio-operator container image:


If you installed Service Mesh Manager in operator mode, the changes in these CRs should be reflected automatically on your cluster.

If you don’t have the Service Mesh Manager operator installed, run the following command so that the changes take effect:

smm operator reconcile

Istio images

The IstioControlPlane CR can be configured to set the following Istio container images:


These changes should be automatically reflected on your cluster after editing the CR.

List of configurable images

Based on the CRs above, you can configure the following components in Service Mesh Manager:

Images Repository Tag
smm-als 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-als v1.12.0
smm 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm v1.12.0
smm-auth 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-authentication v1.12.0
smm-grafana grafana/grafana 7.5.11
smm-federation-gateway 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-federation-gateway v1.12.0
smm-federation-gateway-operator 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-federation-gateway-operator v1.12.0
smm-health 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-health v1.12.0
smm-health-api 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-health-api v1.12.0
smm-kubestatemetrics k8s.gcr.io/kube-state-metrics/kube-state-metrics v2.6.0
smm-leo 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-leo v1.12.0
smm-prometheus prom/prometheus v2.39.1
smm-prometheus-config-reloader quay.io/prometheus-operator/prometheus-config-reloader v0.63.0
smm-thanos quay.io/thanos/thanos v0.28.1
smm-prometheus-operator quay.io/prometheus-operator/prometheus-operator v0.63.0
smm-k8s-proxy 033498657557.dkr.ecr.us-east-2.amazonaws.com/banzaicloud/k8s-proxy v0.0.9
smm-sre 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-sre v1.12.0
smm-sre-api 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-sre-api v1.12.0
smm-sre-alert-exporter 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-sre-alert-exporter v1.12.0
smm-tracing jaegertracing/all-in-one 1.28.0
smm-web 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-web v1.12.0
smm-vm-integration 033498657557.dkr.ecr.us-east-2.amazonaws.com/smm-vm-integration v1.12.0
kube-rbac-proxy gcr.io/kubebuilder/kube-rbac-proxy v0.11.0
cert-manager quay.io/jetstack/cert-manager-controller v1.11.0
cert-manager-cainjector quay.io/jetstack/cert-manager-cainjector v1.11.0
cert-manager-webhook quay.io/jetstack/cert-manager-webhook v1.11.0
cluster-registry 033498657557.dkr.ecr.us-east-2.amazonaws.com/banzaicloud/cluster-registry-controller v0.2.10
imagepullsecrets-controller ghcr.io/banzaicloud/imagepullsecrets v0.3.12
istio-operator ghcr.io/banzaicloud/istio-operator v2.15.3(v115x)
istio-sidecarinjector 033498657557.dkr.ecr.us-east-2.amazonaws.com/banzaicloud/istio-sidecar-injector v1.15.3-bzc.0
istio-pilot 033498657557.dkr.ecr.us-east-2.amazonaws.com/banzaicloud/istio-pilot v1.15.3-bzc.1
istio-proxy 033498657557.dkr.ecr.us-east-2.amazonaws.com/banzaicloud/istio-proxyv2 v1.15.3-bzc.0
istio-init-cni 033498657557.dkr.ecr.us-east-2.amazonaws.com/banzaicloud/istio-install-cni v1.15.3-bzc.0

(Updated as of May 5, 2023)

If there is a Service Mesh Manager related image that you’d like to change and that image is not listed here, contact us!

Customize IstioControlPlane CR

You can customize the ControlPlane CR to change the configuration of the IstioControlPlane CR. Set your custom values under the spec.meshManager.istio.istioCROverrides field of the ControlPlane CR, and Service Mesh Manager merges them into the IstioControlPlane CR.

For example, to enable basic DNS proxying, set the ISTIO_META_DNS_CAPTURE field using a configuration similar to the following:

apiVersion: smm.cisco.com/v1alpha1
kind: ControlPlane
metadata:
  name: smm
spec:
  ...
  meshManager:
    istio:
      ...
      istioCROverrides: |
        spec:
          meshConfig:
            defaultConfig:
              proxyMetadata:
                # Enable basic DNS proxying
                ISTIO_META_DNS_CAPTURE: "true"
                # Enable automatic address allocation, optional
                ISTIO_META_DNS_AUTO_ALLOCATE: "true"

Customize Istio namespace

Istio is installed to the istio-system namespace by default. To configure this namespace during installation, use one of these methods:

  • Use the --istio-namespace CLI flag. For example:

    smm install -a --istio-namespace custom-istio-namespace
    
  • Create a YAML file that contains the istioCRRef.namespace field (see the following example), then use the --additional-cp-settings CLI flag.

    spec:
      meshManager:
        enabled: true
        istio:
          enabled: true
          istioCRRef:
            name: cp-test
            namespace: custom-istio-namespace
    

    An example of the command:

    smm install -a --additional-cp-settings /path/to/file.yaml
    

Note: The --istio-namespace CLI flag has the highest priority. If you specify both flags at the same time, the value from the --istio-namespace flag is used.

2.4 - Upgrade

Calisti provides safe upgrades for its main components and their dependencies:

  • Service Mesh Manager,
  • Streaming Data Manager, and
  • the Istio control plane.

Istio follows a rolling support cycle: only the last few versions are supported by the Istio community. The Cisco Istio Distribution included in Service Mesh Manager follows the same model.

Service Mesh Manager follows semantic versioning. To support a new Istio version, a new minor version is created (for example, Istio 1.11 was introduced in Service Mesh Manager 1.8, Istio 1.12 was introduced in Service Mesh Manager 1.9). Always consult the What’s new page to see if a new version of Istio has been introduced.

CAUTION:

Supported upgrade paths

Service Mesh Manager supports upgrades from the prior minor release and patch releases. The current supported upgrade path is: v1.11.x to v1.12.x

Overview of the upgrade procedure

Upgrading Calisti consists of the following high-level steps. The exact method to perform each step depends on the way you have installed Calisti (from the CLI, in operator mode, or using GitOps methodology).

  1. Upgrading the Service Mesh Manager control plane. This is needed regardless of the target Istio version, and ensures that all Service Mesh Manager components contain the latest features and security fixes.

    This upgrade also brings Istio to the latest patch level. For example, if the cluster had Istio 1.11.0 before the upgrade and the target Service Mesh Manager version contains Istio 1.11.2, this step upgrades Istio to 1.11.2.

    For details on performing this step, see How to upgrade.

  2. If the new version of Service Mesh Manager contains a new minor or major version of Istio (for example, you have Istio 1.11.2 installed, and the new version contains Istio 1.12), complete the Upgrading your business applications procedure after upgrading Service Mesh Manager.

    Service Mesh Manager avoids big changes to the production traffic by running two versions of the Istio control plane in parallel (for example, 1.11.2 and 1.12.0) on the same cluster. After the upgrade, the existing workloads continue using the older version of Istio (in the example, 1.11.2). You can gradually (on a per-namespace basis) move workloads to the new version (1.12.0 in the example). This allows operators to move services with less business value or risk to the new Istio version before moving on to more mission-critical services.

  3. If you have Streaming Data Manager installed, upgrade Streaming Data Manager. For details, see Upgrade.
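
The per-namespace migration in step 2 is controlled by the namespace's istio.io/rev label (the demo application's namespace earlier in this guide uses cp-v115x.istio-system). A sketch of pointing a namespace at a newer control plane revision (the namespace name and revision value below are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    # Illustrative: revision label of the new Istio control plane
    istio.io/rev: cp-v116x.istio-system
```

After changing the label, restart the workloads in the namespace (for example, with kubectl rollout restart) so their sidecars are re-injected by the new control plane.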

2.4.1 - How to upgrade

The procedure to upgrade Calisti depends on how you have installed Calisti.

CAUTION:

Supported upgrade paths

Service Mesh Manager supports upgrades from the prior minor release and patch releases. The current supported upgrade path is: v1.11.x to v1.12.x

2.4.2 - CLI - single cluster upgrade

CAUTION:

Supported upgrade paths

Service Mesh Manager supports upgrades from the prior minor release and patch releases. The current supported upgrade path is: v1.11.x to v1.12.x

If you installed your Calisti deployment using the Calisti CLI, use the CLI to upgrade to the new version.

  • If you are using Calisti on a single cluster, follow this guide.
  • If you are using Calisti in a multi-cluster setup, see CLI - multi-cluster upgrade.
  1. Download the Service Mesh Manager command-line tool for version 1.12.0. The archive contains the smm and supertubes binaries. Extract these binaries and update your local copy on your machine. For details, see Accessing the Service Mesh Manager binaries.

  2. Deploy a new version of Service Mesh Manager.

    The following command upgrades the Service Mesh Manager control plane. The new version has the same Istio control plane (version 1.15.3), so there is no need to restart workloads.

    In the following examples, smm refers to version 1.12.0 of the binary.

    • If you want to upgrade only Service Mesh Manager:

      smm install -a
      
    • If you want to upgrade both Service Mesh Manager and Streaming Data Manager, use the following command:

      smm install -a --install-sdm
      
      • If you want to use custom settings for your Istio control plane, you can provide them during the installation:

        smm install -a --istio-cr-file <custom-istio-cr-file.yaml>
        
  3. Check that the Service Mesh Manager control plane is upgraded and already uses the new Istio control plane.

    • If you are upgrading only Service Mesh Manager, run the following command to verify that the installation is complete.

      kubectl get pods -n=smm-system -L istio.io/rev
      

      The output should be similar to:

      NAME                                              READY   STATUS    RESTARTS   AGE   REV
      istio-operator-v113x-64bc574fdf-mdtwj             2/2     Running   0          21m
      istio-operator-v115x-8558dbb88c-6r6fx             2/2     Running   0          21m
      mesh-manager-0                                    2/2     Running   0          21m
      prometheus-node-exporter-76jwv                    1/1     Running   0          18m
      prometheus-node-exporter-ptbwk                    1/1     Running   0          18m
      prometheus-node-exporter-w86lc                    1/1     Running   0          18m
      prometheus-smm-prometheus-0                       4/4     Running   0          19m   cp-v115x.istio-system
      smm-6b5575474d-l88lg                              2/2     Running   0          19m   cp-v115x.istio-system
      smm-6b5575474d-wp727                              2/2     Running   0          19m   cp-v115x.istio-system
      smm-als-6b995458c-z8jt9                           2/2     Running   0          19m   cp-v115x.istio-system
      smm-authentication-78d96d6fc9-hg89p               2/2     Running   0          19m   cp-v115x.istio-system
      smm-federation-gateway-7c7d9b7fb5-xgv5t           2/2     Running   0          19m   cp-v115x.istio-system
      smm-federation-gateway-operator-ff8598cb7-xj7pk   2/2     Running   0          19m   cp-v115x.istio-system
      smm-grafana-7bcf9f5885-jhwpg                      3/3     Running   0          19m   cp-v115x.istio-system
      smm-health-56896f5b9b-r54w8                       2/2     Running   0          19m   cp-v115x.istio-system
      smm-health-api-665d4787-pw7z4                     2/2     Running   0          19m   cp-v115x.istio-system
      smm-ingressgateway-b6d5b5b84-l5llx                1/1     Running   0          17m   cp-v115x.istio-system
      smm-kubestatemetrics-5455b9697-5tbgq              2/2     Running   0          19m   cp-v115x.istio-system
      smm-leo-7b64559786-2sj4c                          2/2     Running   0          19m   cp-v115x.istio-system
      smm-prometheus-operator-66dbdb499d-sz6t8          3/3     Running   1          19m   cp-v115x.istio-system
      smm-sre-alert-exporter-668d9cbd68-926t5           2/2     Running   0          19m   cp-v115x.istio-system
      smm-sre-api-86cf44fbbb-lxvxd                      2/2     Running   0          19m   cp-v115x.istio-system
      smm-sre-controller-858b984df6-6b5r6               2/2     Running   0          19m   cp-v115x.istio-system
      smm-tracing-76c688ff6f-7ctjk                      2/2     Running   0          19m   cp-v115x.istio-system
      smm-vm-integration-5df64bdb4b-68xgh               2/2     Running   0          19m   cp-v115x.istio-system
      smm-web-677b9f4f5b-ss9zs                          3/3     Running   0          19m   cp-v115x.istio-system
      
    • If you are upgrading both Service Mesh Manager and Streaming Data Manager, run the following command to verify that the installation is complete.

      kubectl get pods -A -L istio.io/rev
      

      The output should be similar to:

      NAMESPACE                  NAME                                                      READY   STATUS      RESTARTS        AGE     REV
      cert-manager               cert-manager-67575448dd-8qbws                             1/1     Running     0               5h56m
      cert-manager               cert-manager-cainjector-79f8d775c7-ww7fw                  1/1     Running     0               5h56m
      cert-manager               cert-manager-webhook-5949cc4b67-gwknv                     1/1     Running     0               5h56m
      cluster-registry           cluster-registry-controller-b86f8857c-44jh8               1/1     Running     0               5h57m
      csr-operator-system        csr-operator-5955b44674-bvl9p                             2/2     Running     0               5h56m
      istio-system               istio-meshexpansion-v115x-d8555488f-btdx6              1/1     Running     0               37m     v115x.istio-system
      istio-system               istiod-v115x-555749b797-dcwwm                          1/1     Running     0               5h55m   v115x.istio-system
      istio-system               istiod-sdm-iv115x-6c8cfb5fc5-85w2d                     1/1     Running     0               5h55m   sdm-iv115x.istio-system
      kafka                      kafka-operator-operator-76df6db8d4-l4kkq                  3/3     Running     2 (5h52m ago)   5h53m   sdm-iv115x.istio-system
      smm-registry-access        imagepullsecrets-controller-6c45b46459-qb9j8              1/1     Running     0               6h1m
      smm-system                 istio-operator-v113x-6fb944b86b-xgpbd                     2/2     Running     0               5h55m
      smm-system                 istio-operator-v115x-68dcbc59c8-vt2mp                     2/2     Running     0               5h55m
      smm-system                 mesh-manager-0                                            2/2     Running     0               5h56m
      smm-system                 prometheus-node-exporter-74dcm                            1/1     Running     0               5h53m
      smm-system                 prometheus-node-exporter-8s458                            1/1     Running     0               5h59m
      smm-system                 prometheus-node-exporter-vmth4                            1/1     Running     0               5h59m
      smm-system                 prometheus-node-exporter-xsk8j                            1/1     Running     0               5h59m
      smm-system                 prometheus-smm-prometheus-0                               4/4     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-656d45f7cc-c2kd6                                      2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-656d45f7cc-xrx9n                                      2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-als-855c6878b7-55gvd                                  2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-authentication-666547f79f-hwt6t                       2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-federation-gateway-fd4bbb4f8-4nql8                    2/2     Running     1 (5h54m ago)   5h55m   v115x.istio-system
      smm-system                 smm-federation-gateway-operator-bd94d8444-nbvjz           2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-grafana-59c54f67f4-tft2h                              3/3     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-health-86b8dbdf68-k8bfr                               2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-health-api-69bc97d89-gkdp5                            2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-ingressgateway-9875bc895-v95m9                        1/1     Running     0               37m     v115x.istio-system
      smm-system                 smm-kubestatemetrics-86c6f96789-cxsrb                     2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-leo-8446486596-2w7fc                                  2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-prometheus-operator-77cd64556d-ghz5r                  3/3     Running     1 (5h55m ago)   5h55m   v115x.istio-system
      smm-system                 smm-sre-alert-exporter-5dd8b64d58-ccrnh                   2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-sre-api-998fc554b-lpvsq                               2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-sre-controller-68c974c9db-grb44                       2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-tracing-5886d59dd-7k6kt                               2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-vm-integration-5cb96cdd78-mh5lh                       2/2     Running     0               5h55m   v115x.istio-system
      smm-system                 smm-web-55f45cc8c5-gd894                                  3/3     Running     0               5h55m   v115x.istio-system
      supertubes-control-plane   supertubes-control-plane-5bdbfcf5b6-85bw7                 2/2     Running     0               5h57m
      supertubes-system          prometheus-operator-grafana-5fd88bcf86-55kgg              4/4     Running     0               5h53m   sdm-iv115x.istio-system
      supertubes-system          prometheus-operator-kube-state-metrics-5dbf8656db-wlzfw   2/2     Running     2 (5h53m ago)   5h53m   sdm-iv115x.istio-system
      supertubes-system          prometheus-operator-operator-7bdc575546-b4n94             2/2     Running     1 (5h53m ago)   5h53m   sdm-iv115x.istio-system
      supertubes-system          prometheus-operator-prometheus-node-exporter-69cmx        1/1     Running     0               5h53m
      supertubes-system          prometheus-operator-prometheus-node-exporter-75b7q        1/1     Running     0               5h53m
      supertubes-system          prometheus-operator-prometheus-node-exporter-skksk        1/1     Running     0               5h53m
      supertubes-system          prometheus-operator-prometheus-node-exporter-v2pll        1/1     Running     0               5h53m
      supertubes-system          prometheus-prometheus-operator-prometheus-0               3/3     Running     0               5h53m   sdm-iv115x.istio-system
      supertubes-system          supertubes-6f6b86b497-c5zqf                               3/3     Running     1 (5h54m ago)   5h54m   sdm-iv115x.istio-system
      supertubes-system          supertubes-ui-backend-c97564f84-c2vd6                     2/2     Running     2 (5h54m ago)   5h54m   sdm-iv115x.istio-system
      zookeeper                  zookeeper-operator-6ff85cf58d-6kxhk                       2/2     Running     1 (5h54m ago)   5h54m   sdm-iv115x.istio-system
      zookeeper                  zookeeper-operator-post-install-upgrade-qq4kf             0/1     Completed   0               5h54m
      

Restarting workloads

After the upgrade has completed, the Pods running in the applications' namespaces are still running the old version of the Istio proxy sidecar.

  1. To obtain the latest security patches, restart these Controllers (Deployments, StatefulSets, and so on) either using the kubectl rollout command, or by instructing the CI/CD systems enabled on the cluster. For example, to restart the deployments in a namespace, you can run:

    kubectl rollout restart deployment --namespace <name-of-your-namespace>
    
  2. If the upgrade also involved a minor or major version upgrade of Istio, the kubectl rollout command will only ensure that the latest patch level is being used on the Pods.

    For example: Service Mesh Manager 1.8.2 comes with Istio 1.11, while Service Mesh Manager 1.9.0 is bundled with Istio 1.12. Upgrading from Service Mesh Manager 1.8.2 to 1.9.0 and then restarting the Controllers only results in the latest 1.11 Istio sidecar proxy being started in the Pods.

    To upgrade to the new minor/major version of Istio on your workloads, complete the Upgrading your business applications procedure.
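If sidecars run in several application namespaces, the restarts in step 1 can be scripted. The following sketch uses placeholder namespace names (`frontend`, `backend` are assumptions, not names from this document) and only prints the restart commands so you can review them first; remove the `echo` to actually run them:

```shell
# Placeholder namespaces; replace with your application namespaces.
NAMESPACES="frontend backend"

# Print one restart command per namespace for review; drop `echo` to execute.
for ns in $NAMESPACES; do
  echo kubectl rollout restart deployment --namespace "$ns"
done
```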

2.4.3 - CLI - multi-cluster upgrade

This document shows you how to upgrade Service Mesh Manager in a multi-cluster mesh scenario. For details on how to set up a multi-cluster mesh, see the multi-cluster installation guide. To access the latest binary files, see Accessing the Service Mesh Manager binaries.

Upgrade from 1.11.0 to 1.12.0

To upgrade Service Mesh Manager from 1.11.0 to 1.12.0 for a multi-cluster setup, complete the following steps.

  1. Download the Service Mesh Manager command-line tool for version 1.12.0. The archive contains the smm and supertubes binaries. Extract these binaries and update your local copy on your machine. For details, see Accessing the Service Mesh Manager binaries.

    smm --version
    

    The output should be similar to:

    Service Mesh Manager CLI version  1.12.0 (4c8509faa) built on 2023-05-05T20:57:58Z
    
  2. Deploy a new version of Service Mesh Manager.

    The following command upgrades the Service Mesh Manager control plane. The new version 1.12.0 uses the same Istio control plane (version 1.15.3) as the previous version, so there is no need to upgrade the Istio version when moving Service Mesh Manager from 1.11.0 to 1.12.0.

    In the following examples, smm refers to version 1.12.0 of the binary.

    • If you want to upgrade only Service Mesh Manager:

      smm install -a
      
      • If you want to use custom settings for your Istio control plane, provide them during the installation:

        smm install -a --istio-cr-file <custom-istio-cr-file.yaml>
        
  3. Rerun the attach command with --force flag to upgrade Service Mesh Manager on the peer cluster:

    smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE> --force
    
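If you have several peer clusters, step 3 must be repeated for each kubeconfig. A minimal sketch (the kubeconfig file names are placeholders) that prints the attach commands for review rather than executing them; drop the `echo` to run them:

```shell
# Placeholder kubeconfig files for the peer clusters.
PEER_KUBECONFIGS="peer1.yaml peer2.yaml"

# Print one attach command per peer cluster; drop `echo` to execute.
for cfg in $PEER_KUBECONFIGS; do
  echo smm istio cluster attach "$cfg" --force
done
```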

Upgrade existing workloads

The new version 1.12.0 has the same Istio control plane (version 1.15.3) as the previous version, so there is no need to upgrade and restart workloads.

2.4.4 - Operator mode

If you have installed your Calisti deployment in operator mode, the upgrade procedure consists only of installing a newer version of the operator Helm chart and allowing it to reconcile the cluster. Complete the following steps.

Service Mesh Manager upgrade

  1. Uninstall the previous version (1.11.0) of the smm-operator chart.

    helm uninstall smm-operator --namespace smm-registry-access
    
  2. Install the new version (1.12.0) of the smm-operator chart.

    helm install \
      --namespace=smm-registry-access \
      --set "global.ecr.enabled=false" \
      --set "global.basicAuth.username=<your-username>" \
      --set "global.basicAuth.password=<your-password>" \
      smm-operator \
      oci://registry.eticloud.io/smm-charts/smm-operator --version 1.12.0
    

    Note: If the system uses helm for deploying the chart (and not some other CI/CD solution such as Argo CD), then the CustomResourceDefinitions (CRDs) will not be automatically upgraded. In this case, fetch the helm chart locally using the helm pull command and apply the CRDs in the crds folder of the helm chart manually.
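The manual CRD upgrade mentioned in the note above could look like the following sketch. The local destination path and chart layout (a `crds` folder inside the extracted chart) follow the note; the commands are printed with `echo` so you can review them before executing (remove the `echo` to run them):

```shell
# Assumed target version for the upgraded chart.
CHART_VERSION="1.12.0"

# Print the manual CRD upgrade steps for review; remove `echo` to execute.
echo helm pull "oci://registry.eticloud.io/smm-charts/smm-operator" \
  --version "$CHART_VERSION" --untar --destination .
echo kubectl apply -f smm-operator/crds/
```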

  3. After the operator has been started, monitor the status of the ControlPlane resource until it finishes the upgrade (reconciliation). Run the following command:

    kubectl describe cp
    

    After the upgrade is finished, the output should be similar to the following. The Status: Succeeded line shows that the deployment has been upgraded. In case of any errors, consult the Kubernetes logs of the operator (installed by Helm) for further information.

    ...
    Status:
      Components:
        Cert Manager:
          Status:  Available
        Cluster Registry:
          Status:  Available
        Mesh Manager:
          Status:  Available
        Node Exporter:
          Status:  Available
        Registry Access:
          Status:  Available
        Smm:
          Status:  Available
      Status:      Succeeded
    
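The aggregate state can also be read programmatically instead of scanning the `kubectl describe cp` output by eye. The snippet below parses a captured (abridged) copy of that output, inlined as sample data so it runs without cluster access; against a live cluster you would pipe `kubectl describe cp` into the same `awk` filter:

```shell
# Captured (abridged) output of `kubectl describe cp`, inlined as sample data.
describe_output='Status:
  Components:
    Smm:
      Status:  Available
  Status:      Succeeded'

# The last two-field "Status:" line is the top-level aggregate status.
top_status=$(printf '%s\n' "$describe_output" |
  awk '$1 == "Status:" && NF == 2 { last = $2 } END { print last }')
echo "$top_status"   # Succeeded
```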

Streaming Data Manager upgrade

  1. Upgrade to the new version (1.9.0) of the supertubes-control-plane chart.

    helm upgrade \
      --namespace supertubes-control-plane \
      --set imagePullSecrets\[0\].name=smm-pull-secret \
      --set operator.image.repository="registry.eticloud.io/sdm/supertubes-control-plane" \
      supertubes-control-plane \
      oci://registry.eticloud.io/sdm-charts/supertubes-control-plane --version 1.9.0
    
  2. After the operator has been started, monitor the status of the applicationmanifest resource until it finishes the upgrade (reconciliation). Run the following command:

    kubectl describe applicationmanifests.supertubes.banzaicloud.io -n supertubes-control-plane sdm-applicationmanifest
    

    The output should be similar to:

    ...
    Status:
      Components:
        Cluster Registry:
          Status:  Removed
        Csr Operator:
          Status:  Available
        Imps Operator:
          Image Pull Secret Status:  Unmanaged
          Status:                    Removed
        Istio Operator:
          Status:  Removed
        Kafka Operator:
          Status:  Available
        Monitoring:
          Status:  Available
        Supertubes:
          Status:  Available
        Zookeeper Operator:
          Status:  Available
      Status:      Succeeded
    
  3. If the following error shows up in the ApplicationManifest under the Message field:

    resource type is not allowed to be recreated: Job.batch "zookeeper-operator-post-install-upgrade" is invalid...
    

    Delete the zookeeper-operator-post-install-upgrade job so it is recreated when ZooKeeper is reconciled:

    kubectl delete job -n zookeeper zookeeper-operator-post-install-upgrade
    

Restarting workloads

After the upgrade has completed, the Pods running in the applications' namespaces are still running the old version of the Istio proxy sidecar.

  1. To obtain the latest security patches, restart these Controllers (Deployments, StatefulSets, and so on) either using the kubectl rollout command, or by instructing the CI/CD systems enabled on the cluster. For example, to restart the deployments in a namespace, you can run:

    kubectl rollout restart deployment --namespace <name-of-your-namespace>
    
  2. If the upgrade also involved a minor or major version upgrade of Istio, the kubectl rollout command will only ensure that the latest patch level is being used on the Pods.

    For example: Service Mesh Manager 1.8.2 comes with Istio 1.11, while Service Mesh Manager 1.9.0 is bundled with Istio 1.12. Upgrading from Service Mesh Manager 1.8.2 to 1.9.0 and then restarting the Controllers only results in the latest 1.11 Istio sidecar proxy being started in the Pods.

    To upgrade to the new minor/major version of Istio on your workloads, complete the Upgrading your business applications procedure.

2.4.5 - Upgrade SMM - GitOps - single cluster

This document describes how to upgrade Calisti and a business application.

CAUTION:

Do not push the secrets directly into the git repository, especially when it is a public repository. Argo CD provides solutions to keep secrets safe.

Prerequisites

To complete this procedure, you need:

  • A free registration for the Calisti download page
  • A Kubernetes cluster running Argo CD (called management-cluster in the examples).
  • A Kubernetes cluster running the previous version of Calisti (called workload-cluster-1 in the examples). It is assumed that Calisti has been installed on this cluster as described in the Calisti 1.11.0 documentation, and that the cluster meets the resource requirements of Calisti version 1.12.0.

CAUTION:

Supported providers and Kubernetes versions

The cluster must run a Kubernetes version that Service Mesh Manager supports: Kubernetes 1.21, 1.22, 1.23, 1.24.

Service Mesh Manager is tested and known to work on the following Kubernetes providers:

  • Amazon Elastic Kubernetes Service (Amazon EKS)
  • Google Kubernetes Engine (GKE)
  • Azure Kubernetes Service (AKS)
  • Red Hat OpenShift 4.11
  • On-premises installation of stock Kubernetes with load balancer support (and optionally PVCs for persistence)

Calisti resource requirements

Make sure that your Kubernetes or OpenShift cluster has sufficient resources to install Calisti. The following table shows the resources needed on the cluster:

Resource   Required
CPU        32 vCPU in total; 4 vCPU available for allocation per worker node (If you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS.)
Memory     64 GiB in total; 4 GiB available for allocation per worker node for the Kubernetes cluster (8 GiB in case of the OpenShift cluster)
Storage    12 GB of ephemeral storage on the Kubernetes worker nodes (for Traces and Metrics)

These minimum requirements need to be available for allocation within your cluster, in addition to the requirements of any other loads running in your cluster (for example, DaemonSets and Kubernetes node-agents). If Kubernetes cannot allocate sufficient resources to Service Mesh Manager, some pods will remain in Pending state, and Service Mesh Manager will not function properly.

Enabling additional features, such as High Availability, increases these requirements.

The default installation, when enough headroom is available in the cluster, should be able to support at least 150 running Pods with the same amount of Services. For setting up Service Mesh Manager for bigger workloads, see scaling Service Mesh Manager.

This document describes how to upgrade Service Mesh Manager version 1.11.0 to Service Mesh Manager version 1.12.0.

Set up the environment

  1. Set the KUBECONFIG location and context name for the management-cluster cluster.

    MANAGEMENT_CLUSTER_KUBECONFIG=management_cluster_kubeconfig.yaml
    MANAGEMENT_CLUSTER_CONTEXT=management-cluster
    kubectl config --kubeconfig "${MANAGEMENT_CLUSTER_KUBECONFIG}" get-contexts "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO   NAMESPACE
    *         management-cluster   management-cluster
    
  2. Set the KUBECONFIG location and context name for the workload-cluster-1 cluster.

    WORKLOAD_CLUSTER_1_KUBECONFIG=workload_cluster_1_kubeconfig.yaml
    WORKLOAD_CLUSTER_1_CONTEXT=workload-cluster-1
    kubectl config --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" get-contexts "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO                                          NAMESPACE
    *         workload-cluster-1   workload-cluster-1
    

    Repeat this step for any additional workload clusters you want to use.

  3. Add the cluster configurations to KUBECONFIG. Include any additional workload clusters you want to use.

    KUBECONFIG=$KUBECONFIG:$MANAGEMENT_CLUSTER_KUBECONFIG:$WORKLOAD_CLUSTER_1_KUBECONFIG
    
  4. Make sure the management-cluster Kubernetes context is the current context.

    kubectl config use-context "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    Switched to context "management-cluster".
    

Upgrade Service Mesh Manager

The high-level steps of the upgrade process are:

  • Upgrade the smm-operator.
  • If you are also running Streaming Data Manager on your Calisti cluster and have installed it using the GitOps guide, upgrade the sdm-operator chart.
  • Upgrade the business applications (demo-app) to use the new control plane.

Upgrade the smm-operator

  1. Clone your calisti-gitops repository.

  2. Remove the old version (1.11.0) of the smm-operator Helm chart.

    rm -rf charts/smm-operator
    
  3. Pull the new version (1.12.0) of the smm-operator Helm chart and extract it into the charts folder.

    helm pull oci://registry.eticloud.io/smm-charts/smm-operator --destination ./charts/ --untar --version 1.12.0
    
  4. Commit and push the changes to the Git repository.

    git add .
    
    git commit -m "upgrade smm to 1.12.0"
    
    git push
    
  5. Wait a few minutes until the upgrade is completed.

  6. Open the Service Mesh Manager dashboard.

    On the Dashboard Overview page everything should be healthy, except for some validation issues. The validation issues show that the business application (demo-app) is lagging behind the smm-controlplane version and should be updated.

    Service Mesh Manager Overview

Upgrade the sdm-operator

If you are also running Streaming Data Manager on your Calisti cluster and have installed it using the GitOps guide, upgrade the sdm-operator chart. Otherwise, skip this section and upgrade your business applications.

  1. Check your username and password on the download page.

  2. Remove the previously installed sdm-operator chart from the GitOps repository with the following command:

    rm -rf charts/supertubes-control-plane
    
  3. Download the sdm-operator chart from registry.eticloud.io into the charts directory of your Streaming Data Manager GitOps repository and extract it. Run the following commands:

    export HELM_EXPERIMENTAL_OCI=1 # Needed prior to Helm version 3.8.0
    
    echo "${CALISTI_PASSWORD}" | helm registry login registry.eticloud.io -u "${CALISTI_USERNAME}" --password-stdin
    

    Expected output:

    Login Succeeded
    
     helm pull oci://registry.eticloud.io/sdm-charts/supertubes-control-plane --destination ./charts/ --untar --version 1.12.0
    

    Expected output:

    Pulled: registry.eticloud.io/sdm-charts/supertubes-control-plane:1.12.0
    Digest: sha256:someshadigest
    
  4. Modify the sdm-operator Application CR by editing the apps/sdm-operator/sdm-operator-app.yaml file from the GitOps repository.

    Note: This is needed because of a change in the sdm-operator Helm chart. The imagePullSecrets Helm value needs to be extended with a name key, as in the following example.

    spec:
    ...
      source:
      ...
        helm:
          values: |
              imagePullSecrets:
                - name: "smm-registry.eticloud.io-pull-secret"
    ...
    
  5. Commit the changes and push the repository.

    git add .
    git commit -m "Update sdm-operator"
    git push origin
    
  6. Apply the modified Application CR.

    kubectl apply -f "apps/sdm-operator/sdm-operator-app.yaml"
    

    Expected output:

    application.argoproj.io/sdm-operator configured
    
  7. Follow the progress of the upgrade by checking the status of the ApplicationManifest. You can do this on the Argo CD UI, or with the following command:

    kubectl describe applicationmanifests.supertubes.banzaicloud.io -n smm-registry-access applicationmanifest
    

    Expected output when the upgrade is finished:

    Status:
      Cluster ID:  ...
      Components:
        Cluster Registry:
          Image:   ghcr.io/cisco-open/cluster-registry-controller:...
          Status:  Removed
        Csr Operator:
          Image:   registry.eticloud.io/csro/csr-operator:...
          Status:  Available
        Imps Operator:
          Image Pull Secret Status:  Unmanaged
          Status:                    Removed
        Istio Operator:
          Status:  Removed
        Kafka Operator:
          Image:   ghcr.io/banzaicloud/kafka-operator:...
          Status:  Available
        Monitoring:
          Status:  Available
        Supertubes:
          Image:   registry.eticloud.io/sdm/supertubes:...
          Status:  Available
        Zookeeper Operator:
          Image:   pravega/zookeeper-operator:...
          Status:  Available
      Status:      Succeeded
    
  8. If the following error shows up in the ApplicationManifest under the Message field:

    resource type is not allowed to be recreated: Job.batch "zookeeper-operator-post-install-upgrade" is invalid...
    

    Delete the zookeeper-operator-post-install-upgrade job so it is recreated when ZooKeeper is reconciled:

    kubectl delete job -n zookeeper zookeeper-operator-post-install-upgrade
    

Upgrade Demo application

  1. Update the demo-app.

    kubectl --context "${WORKLOAD_CLUSTER_1_CONTEXT}" -n smm-demo rollout restart deploy

  2. Check the dashboard.

    Service Mesh Manager Overview

2.4.6 - Upgrade SMM - GitOps - multi-cluster

This document describes how to upgrade Calisti and the business applications.

CAUTION:

Do not push the secrets directly into the git repository, especially when it is a public repository. Argo CD provides solutions to keep secrets safe.

Prerequisites

To complete this procedure, you need:

  • A free registration for the Calisti download page
  • A Kubernetes cluster running Argo CD (called management-cluster in the examples).
  • Two Kubernetes clusters running the previous version of Calisti (called workload-cluster-1 and workload-cluster-2 in the examples). It is assumed that Calisti has been installed on these clusters as described in the Calisti 1.11.0 documentation, and that the clusters meet the resource requirements of Calisti version 1.12.0.
  • If the primary Calisti cluster is running Streaming Data Manager, it is assumed that you have installed it using the GitOps guide.

CAUTION:

Supported providers and Kubernetes versions

The cluster must run a Kubernetes version that Service Mesh Manager supports: Kubernetes 1.21, 1.22, 1.23, 1.24.

Service Mesh Manager is tested and known to work on the following Kubernetes providers:

  • Amazon Elastic Kubernetes Service (Amazon EKS)
  • Google Kubernetes Engine (GKE)
  • Azure Kubernetes Service (AKS)
  • Red Hat OpenShift 4.11
  • On-premises installation of stock Kubernetes with load balancer support (and optionally PVCs for persistence)

Calisti resource requirements

Make sure that your Kubernetes or OpenShift cluster has sufficient resources to install Calisti. The following table shows the resources needed on the cluster:

Resource   Required
CPU        32 vCPU in total; 4 vCPU available for allocation per worker node (If you are testing on a cluster at a cloud provider, use nodes that have at least 4 CPUs, for example, c5.xlarge on AWS.)
Memory     64 GiB in total; 4 GiB available for allocation per worker node for the Kubernetes cluster (8 GiB in case of the OpenShift cluster)
Storage    12 GB of ephemeral storage on the Kubernetes worker nodes (for Traces and Metrics)

These minimum requirements need to be available for allocation within your cluster, in addition to the requirements of any other loads running in your cluster (for example, DaemonSets and Kubernetes node-agents). If Kubernetes cannot allocate sufficient resources to Service Mesh Manager, some pods will remain in Pending state, and Service Mesh Manager will not function properly.

Enabling additional features, such as High Availability, increases these requirements.

The default installation, when enough headroom is available in the cluster, should be able to support at least 150 running Pods with the same amount of Services. For setting up Service Mesh Manager for bigger workloads, see scaling Service Mesh Manager.

This document describes how to upgrade Calisti version 1.11.0 to Calisti version 1.12.0.

Set up the environment

  1. Set the KUBECONFIG location and context name for the management-cluster cluster.

    MANAGEMENT_CLUSTER_KUBECONFIG=management_cluster_kubeconfig.yaml
    MANAGEMENT_CLUSTER_CONTEXT=management-cluster
    kubectl config --kubeconfig "${MANAGEMENT_CLUSTER_KUBECONFIG}" get-contexts "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO   NAMESPACE
    *         management-cluster   management-cluster
    
  2. Set the KUBECONFIG location and context name for the workload-cluster-1 cluster.

    WORKLOAD_CLUSTER_1_KUBECONFIG=workload_cluster_1_kubeconfig.yaml
    WORKLOAD_CLUSTER_1_CONTEXT=workload-cluster-1
    kubectl config --kubeconfig "${WORKLOAD_CLUSTER_1_KUBECONFIG}" get-contexts "${WORKLOAD_CLUSTER_1_CONTEXT}"
    

    Expected output:

    CURRENT   NAME                 CLUSTER              AUTHINFO                                          NAMESPACE
    *         workload-cluster-1   workload-cluster-1
    

    Repeat this step for any additional workload clusters you want to use.

  3. Add the cluster configurations to KUBECONFIG. Include any additional workload clusters you want to use.

    KUBECONFIG=$KUBECONFIG:$MANAGEMENT_CLUSTER_KUBECONFIG:$WORKLOAD_CLUSTER_1_KUBECONFIG
    
  4. Make sure the management-cluster Kubernetes context is the current context.

    kubectl config use-context "${MANAGEMENT_CLUSTER_CONTEXT}"
    

    Expected output:

    Switched to context "management-cluster".
    

Upgrade Calisti

The high-level steps of the upgrade process are:

  • Upgrade the Calisti operator.
  • Note that both 1.11.0 and 1.12.0 Calisti are using istio-cp-v115x on the workload clusters.
  • Upgrade the business applications (the demo applications) to use the new istio-cp-v115x Istio control plane.
  1. Remove the old version (1.11.0) of the smm-operator Helm chart.

    rm -rf charts/smm-operator
    
  2. Pull the new version (1.12.0) of the smm-operator Helm chart and extract it into the charts folder.

    helm pull oci://registry.eticloud.io/smm-charts/smm-operator --destination ./charts/ --untar --version 1.12.0
    
  3. Commit and push the changes to the Git repository.

    git add .
    
    git commit -m "upgrade Calisti to 1.12.0"
    
    git push
    
  4. Open the Calisti dashboard.

    On the MENU > OVERVIEW page, everything should be fine.

    Calisti Overview

    As you can see on the MENU > MESH page, the Calisti control plane is using the cp-v115x.istio-system Istio control plane.

    Calisti Mesh

Upgrade Streaming Data Manager

If the primary Calisti cluster is running Streaming Data Manager, and you have installed it using the GitOps guide, upgrade the sdm-operator as described in Upgrade the sdm-operator.

Restart Demo applications

  1. If you have existing demo application pods on both clusters, after upgrading Calisti you will see validation errors on the MENU > OVERVIEW page. This is due to a mismatch between the previously cached Istio sidecar injector configuration and the new configuration created after the upgrade.

    Calisti Overview

    To solve this, run the following command on both workload clusters to restart the demo application:

    kubectl -n smm-demo rollout restart deploy
    
  2. Check the MENU > TOPOLOGY page.

    Calisti Topology

2.4.7 - Upgrading your business applications

Overview

When a Calisti upgrade includes a new minor or major Istio release, Calisti runs both versions of the Istio control plane on the upgraded cluster, so you can gradually migrate your workloads to the new Istio version.

To list the available control planes, run:

kubectl get istiocontrolplanes -n istio-system

The output should be similar to:

NAME       MODE     NETWORK    STATUS      MESH EXPANSION   EXPANSION GW IPS                   ERROR   AGE
cp-v113x   ACTIVE   network1   Available   true             ["3.122.28.53","3.122.43.249"]             87m
cp-v115x   ACTIVE   network1   Available   true             ["3.122.31.252","18.195.79.209"]           66m

Here cp-v113x is running Istio 1.13.x, while cp-v115x is running Istio 1.15.3.

A label on each namespace specifies which Istio control plane the proxies in that namespace use. In the following example, the smm-demo namespace is attached to the cp-v113x.istio-system control plane (where istio-system is the namespace of the Istio control plane).

kubectl get ns smm-demo -o yaml

The output should be similar to:

apiVersion: v1
kind: Namespace
metadata:
  ...
  labels:
    istio.io/rev: cp-v113x.istio-system
  name: smm-demo
spec:
  finalizers:
  - kubernetes
status:
  phase: Active

Both cp-v113x and cp-v115x are able to discover services in all namespaces. This means that:

  • Workloads can communicate with each other regardless of which Istio control plane they are attached to.
  • In case of an error, any namespace can be rolled back to use the previous version of the Istio control plane by changing the istio.io/rev label.

Migrate workload to a new Istio control plane

After upgrading Calisti to a new version that includes a new minor or major Istio version, you have to modify your workloads to use the new Istio control plane. Complete the following steps.

  1. Before starting the migration of the workloads to the new Istio control plane, check the Validation UI and fix any errors with your configuration.

  2. Find the name of the new Istio control plane by running the following command:

    kubectl get istiocontrolplanes -n istio-system
    

    The output should be similar to:

    NAME       MODE     NETWORK    STATUS      MESH EXPANSION   EXPANSION GW IPS                   ERROR   AGE
    cp-v113x   ACTIVE   network1   Available   true             ["3.122.28.53","3.122.43.249"]             87m
    cp-v115x   ACTIVE   network1   Available   true             ["3.122.31.252","18.195.79.209"]           66m
    

    In this case, the new Istio control plane is called cp-v115x, which runs Istio 1.15.3.

  3. Migrate a namespace to the new Istio control plane. Complete the following steps.

    1. Select a namespace, preferably one with the least impact on production traffic. Edit the istio.io/rev label on the namespace by running:

      kubectl label ns <your-namespace> istio.io/rev=cp-v115x.istio-system --overwrite
      

      Expected output:

      namespace/<your-namespace> labeled
      
    2. Restart all Controllers (Deployments, StatefulSets, and so on) in the namespace. After the restart, the workloads in the namespace are attached to the new Istio control plane. For example, to restart the deployments in a namespace, you can run:

      kubectl rollout restart deployment -n <name-of-your-namespace>
      
    3. Test your application to verify that it works with the new control plane as expected. In case of any issues, refer to the rollback section to roll back to the original Istio control plane.

  4. Migrate your other namespaces.

  5. After all of the applications have been migrated to the new control plane and you have verified that the applications work as expected, you can delete the old Istio control plane.
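When many namespaces must be migrated, steps 3 and 4 can be scripted. The sketch below uses placeholder namespace names (`team-a`, `team-b` are assumptions) and only prints the label and restart commands so you can review them; remove the `echo` to execute them:

```shell
# Revision from this example; placeholder namespaces to migrate.
NEW_REVISION="cp-v115x.istio-system"
NAMESPACES="team-a team-b"

# Print the per-namespace migration commands; drop `echo` to execute.
for ns in $NAMESPACES; do
  echo kubectl label ns "$ns" istio.io/rev="$NEW_REVISION" --overwrite
  echo kubectl rollout restart deployment -n "$ns"
done
```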

Roll back the data plane to the old control plane in case of issues

CAUTION:

Perform this step only if you have issues with your data plane pods, which were working with the old Istio control plane, and you deliberately want to move your workloads back to that control plane!
  1. If there is a problem and you want to roll the namespace back to the old control plane, set the istio.io/rev label on the namespace to point to the old Istio control plane, and restart the pod using the kubectl rollout restart deployment command:

    kubectl label ns <name-of-your-namespace-with-issues> istio.io/rev=cp-v113x.istio-system
    kubectl rollout restart deployment -n <name-of-your-namespace-with-issues>
    

2.4.8 - Delete old Istio control plane

After upgrading Calisti to a new version that includes a new minor or major Istio version, you can delete the old Istio control plane if it’s not needed anymore.

Prerequisites

Before deleting the old Istio control plane, make sure that:

  • The new control plane is working as expected with your applications.

Steps

  1. Open the Service Mesh Manager dashboard and navigate to the MAIN MENU > MESH page.

  2. Verify that you have migrated every application to use the new control plane. This means that no Pods are attached to the old Istio control plane (the number of proxies for the old control plane should be 0).

  3. Delete the old Istio control plane:

    kubectl delete istiocontrolplanes -n istio-system cp-v113x
    

Note: Deleting the prometheus-smm-prometheus-x pod erases historic timeline data. To persist timeline data for Prometheus rollout, see Set up Persistent Volumes for Prometheus.

2.5 - Dashboard

Service Mesh Manager provides a dashboard interface that can be used to diagnose any issues with the underlying deployment. This section provides an introduction to the list of available features on this user interface.

Accessing the dashboard

To access the dashboard, set your KUBECONFIG file to the cluster where the Service Mesh Manager control plane is running, then run the following command to open a browser window to the dashboard.

smm dashboard --kubeconfig <path/to/kubeconfig>

In case you are executing this command on a remote machine, complete the following additional steps.

  1. Check the output of the command and forward the indicated port to your local machine.

  2. Open a browser window and navigate to http://127.0.0.1:50500/.

  3. Service Mesh Manager asks for login credentials. To acquire the credentials, follow the instructions on the user interface.

    Alternatively, you can complete the following steps.

    1. Set your KUBECONFIG file to the cluster where the Service Mesh Manager control plane is running, then run the following command.

      smm login --kubeconfig <path/to/kubeconfig>
      

      A temporary login token is displayed. Now you can perform other actions from the command line.

2.5.1 - Dashboard overview

The MENU > OVERVIEW page on the Service Mesh Manager web interface shows information about the traffic in your mesh and the health of your services and workloads.

overview

If your application hasn’t received any traffic yet, there will be no metrics in the system so you won’t see any visualization yet. To send some traffic to your services as a test, see Generate test load.

The page shows the following information and controls:

Metrics

  • Requests per second
  • Average latency (95th percentile)
  • Error rate (for 5XX errors). Client-side errors (with 4XX status code) are not included.
  • Clusters (number of clusters in the mesh)
  • Services (number of services in the mesh / total number of services)
  • Workloads (number of workloads in the mesh / total number of workloads)
  • Pods (number of pods in the mesh / total number of pods)
  • Validation issues

Dashboards

The OVERVIEW page shows charts about the health of the services and workloads, as well as the aggregated status of your service level objectives (SLOs). Click on the chart to get more information, for example, about the SLO burn rates that display warnings.

The OVERVIEW page also shows the following live Grafana dashboards:

  • Requests per second
  • Average latency (95th percentile)
  • Error rate (for 5XX errors). Client-side errors (with 4XX status code) are not included.

Validations

To check the validation status of your YAML configuration files, select OVERVIEW > VALIDATION ISSUES. For details, see Validation.

2.5.2 - Mesh

The MENU > MESH page on the Service Mesh Manager web interface shows information about your service mesh and the control planes.

Mesh overview

The page shows the following real-time information:

The mesh in numbers

  • CONTROL PLANES: The number of Istio control planes in the mesh.
  • CLUSTERS: The number of clusters in the mesh.
  • ISTIO PROXIES MEMORY USAGE: Current memory usage of the Istio proxies (sidecars).
  • ISTIO PROXIES CPU USAGE: Current CPU usage of the Istio proxies (sidecars).
  • ISTIO PROXIES NOT RUNNING: The Istio proxies (sidecars) that are currently not running.

Clusters

Displays basic status information about the clusters in the mesh.

Clusters

This is mostly useful in the multi-cluster setup when multiple clusters are in the mesh.

Control planes

This section displays information and metrics about the Istio control planes in the mesh, including version and revision information, and validation errors.

Istio control planes in the mesh

Click on a specific control plane to display detailed information about its pods, proxies, trust bundles, validation issues, and metrics, as described in the following sections.

In addition, selecting a control plane also shows the following basic information:

  • CLUSTER NAME: The name of the cluster the control plane is running on.
  • VERSION: The Istio version of the service mesh.
  • ROOT NAMESPACE: The administrative root namespace for Istio configuration of the service mesh.
  • TRUST DOMAIN: The list of trust domains.
  • AUTOMATIC MTLS: Shows whether automatic mutual TLS is enabled for the service mesh.
  • OUTBOUND TRAFFIC POLICY: The default outbound traffic policy for accessing external services set for the mesh. For details, see External Services.
  • PROXIES: The number of sidecar proxies in the mesh.
  • CONFIG: Click the Show YAML configuration icon to display the configuration of the control plane.

Pods

Shows information and status about the pods of the control plane.

Control plane pods

Proxies

Lists the proxies managed by the control plane, and the synchronization status of the cluster discovery service (CDS), listener discovery service (LDS), endpoint discovery service (EDS), and route discovery service (RDS) for each proxy.

Control plane proxies

Trust bundles

Shows the trust bundles defined for the control plane.

Validation issues

Lists the validation issues for the entire control plane.

Control plane validation

Metrics

The timeline charts show the version and revision of the Istio proxies used in the mesh, as well as error metrics from the Istio Pilot agent, for example, rejected CDS and EDS configurations. (Istio Pilot agent runs in the sidecar or gateway container and bootstraps Envoy.)

To display more detailed metrics about the resource usage of Istiod and the proxies, click on a control plane in the Control planes section.

Control plane metrics

2.5.2.1 - Validation

The Service Mesh Manager product:

  • simplifies service mesh configuration and management,
  • guides you through setting up complex traffic routing rules
  • takes care of creating, merging and validating the YAML configuration.

And unlike some other similar products, it works in both directions: you can edit the YAML files manually and still view and manipulate the configuration from Service Mesh Manager. This is possible because there is no intermediate configuration layer in Service Mesh Manager.

To support the bi-directional mesh configuration, Service Mesh Manager provides a validation subsystem for the entire mesh. Istio itself provides some syntactic and semantic validation for the individual Istio resources, but Service Mesh Manager goes even further. Service Mesh Manager performs complex validations which take the whole cluster state and related resources into account to check whether everything is configured correctly within the whole mesh.

Service Mesh Manager performs many syntactic and semantic validation checks on various aspects of the configuration. The checks are constantly curated, and new checks are added with every release. For example:

  • Sidecar injection template validation: Validates whether there are any pods in the environment that run with outdated sidecar proxy image or configuration.
  • Gateway port protocol configuration conflict validation: Detects conflicting port configuration in different Gateway resources.
  • Multiple gateways with the same TLS certificate validation: Configuring multiple gateways to use the same TLS certificate causes most browsers to produce 404 errors when accessing a second host after a connection to another host has already been established.
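As an illustration of the last check, the following sketch creates two Gateway resources that reference the same TLS credential, which the reused-cert validation would flag. This is a hypothetical example: the host names and the shared-tls-cert credential are made up, while the Gateway names follow the example CLI output shown later in this section.

```shell
# Hypothetical example: two Gateways sharing one TLS certificate,
# which triggers the "multiple gateways configured with same TLS
# certificate" validation.
kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-demo1
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port: { number: 443, name: https, protocol: HTTPS }
    tls: { mode: SIMPLE, credentialName: shared-tls-cert }
    hosts: ["demo1.example.com"]
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-demo2
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port: { number: 443, name: https, protocol: HTTPS }
    tls: { mode: SIMPLE, credentialName: shared-tls-cert }
    hosts: ["demo2.example.com"]
EOF
```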

Check validation results on the Service Mesh Manager UI

The validations are constantly running in the background. To display the actual results, navigate to OVERVIEW > VALIDATION ISSUES. You can use the NAMESPACES field to select the namespaces you want to observe.

Show validation results

To display the invalid part of the configuration in the invalid resource, click the Show YAML configuration icon.

Show validation details

To display every validation error of a control plane as a list, navigate to MENU > MESH, and click on the control plane in the Control planes section, then select VALIDATIONS. For details, see Validation issues.

Check validation results from the CLI

To check the results of the validation from the CLI, run the smm analyze command. To show only results affecting a specific namespace, use the --namespace option, for example: smm analyze --namespace smm-demo or smm analyze --namespace istio-system.

The smm analyze command can also produce JSON output, for example:

smm analyze --namespace istio-system -o json

Example output:

{
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo1": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v115x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo1",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ],
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo2": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v115x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo2",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ]
}
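The JSON output is convenient for scripting. For example, a small jq filter (a sketch; the field names follow the example output above) can reduce it to a list of failed checks. Here a trimmed sample of the output is inlined with a heredoc so the pipeline can be tried without a cluster; in practice, pipe the output of smm analyze -o json straight into jq.

```shell
# Reduce `smm analyze -o json` output to "checkID: errorMessage" lines
# for every failed check. Sample output inlined for illustration.
cat <<'EOF' | jq -r 'to_entries[] | .value[] | select(.passed == false) | "\(.checkID): \(.errorMessage)"'
{
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo1": [
    {
      "checkID": "gateway/reused-cert",
      "passed": false,
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ]
}
EOF
# Prints: gateway/reused-cert: multiple gateways configured with same TLS certificate
```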

2.5.3 - Topology view

The MENU > TOPOLOGY page of the Service Mesh Manager web interface displays the topology of services and workloads inside the mesh, and annotates it with real-time information about latency, throughput, and HTTP request failures. You can also display historical data by adjusting the timeline.

The topology view is almost entirely based on metrics: metrics received from Prometheus and enhanced with information from Kubernetes.

The topology page serves as a starting point for diagnosing problems within the mesh. Service Mesh Manager is integrated with Grafana and Jaeger for easy access to in-depth monitoring and distributed traces of various services.

Topology view of the demo cluster

The nodes in the graph are services or workloads, while the arrows represent network connections between different services. This is based on Istio metrics retrieved from Prometheus.

For certain services like MySQL and PostgreSQL, protocol-specific metrics normally not available in Istio are also shown, for example, sessions or transactions per second.

Note: Protocol-specific metrics for MySQL and PostgreSQL are available only for certain versions of MySQL and PostgreSQL. For details, see the documentation of the MySQL and the PostgreSQL Envoy filters.

For example, the following image shows SQL sessions and transactions per second.

Protocol-specific data

The graph serves as a visual monitoring tool, as it displays various errors and metrics in the system. Click the ? icon on the left to show a legend of the graph to better understand what the icons mean in the graph. Virtual machines integrated into the mesh are displayed as workloads with the Virtual machine workload icon in their corner.

Topology view legend

If your application hasn’t received any traffic yet, there will be no metrics in the system so you won’t see any visualization yet. To send some traffic to your services as a test, see Generate test load.

Select the data displayed

You can select and configure what is displayed on the graph in the top bar of the screen. You can also display historical data using the TIMELINE.

Configure the topology view

Namespaces

Display data only for the selected namespaces.

Resources

You can select the type of resources you want to display in the graph: clusters, namespaces, apps, services, and workloads.

Workloads are always shown, they cannot be disabled.

Here’s an example where only apps, services, and workloads are shown:

Show selected resources of the mesh

Showing clusters is important in multi-cloud and hybrid-cloud environments. For details, see Multi-cluster.

Edge labels

The labels on the edges of the graph can display various real-time information about the traffic between services. You can display the following information:

  • the protocol used in the communication (HTTP, gRPC, TCP) and the request rate (or throughput for TCP connections),
  • actual P95 latency, or
  • whether the connection is using mTLS or not.

For certain services like MySQL and PostgreSQL, protocol-specific metrics normally not available in Istio are also shown, for example, sessions or transactions per second.

Note: Protocol-specific metrics for MySQL and PostgreSQL are available only for certain versions of MySQL and PostgreSQL. For details, see the documentation of the MySQL and the PostgreSQL Envoy filters.

Show information on the graph edges

Timeline

By default, the graph displays the current data. The timeline view allows you to select a specific point in time, and then move minutes back and forth to see how your most important metrics have changed. For example, you can use it to check how things changed for a specific service, when the error rates went up, or how your latency values changed as RPS increased. This can be a good indicator of where to look for errors in the logs, or whether something else changed in the cluster that can be related to a specific failure.

  • To display the timeline, select TIMELINE on the left, then use the timeline bar to adjust the date and the time. The date corresponding to the displayed data is shown below the topology graph.
  • To return to the current real-time data, select LIVE.

Show historical topology and metrics

Drill-down to the pods and nodes

You can drill-down from the MENU > TOPOLOGY page by selecting a service or a workload in the Istio service mesh. You can trace back an issue from the top-level service mesh layer by navigating deeper in the stack, and see the status and most important metrics of your Kubernetes controllers, pods, and nodes.

See Drill-down for details.

2.5.3.1 - Drill-down

You can drill-down from the MENU > TOPOLOGY page by selecting a service or a workload in the Istio service mesh. You can trace back an issue from the top-level service mesh layer by navigating deeper in the stack, and see the status and most important metrics of your Kubernetes controllers, pods, and nodes.

For an example on how you can diagnose and solve a real-life problem using the drill-down view, see the Service Mesh Manager drill-down blog post.

Drill-down from the Topology view

The highest level of the Topology view is the service mesh layer. This level contains the most important network-level metrics, and an overview of the corresponding Kubernetes controllers.

Click on a workload or a service to display its details on the right. Workloads running on virtual machines have a blue virtual machine icon in their corner.

From the details overview, you can drill down through the following levels to the underlying resources of the infrastructure:

Details of a workload

To display the metrics-based health of the workload or the service, select the HEALTH tab. You can scroll down to display the charts of the selected metric (for example, saturation, latency, or success rate). Note that for workloads running on virtual machines, the total saturation of the virtual machine is shown.

Application health metrics

Service overview

Details of a service

The following details of the service are displayed:

  • Namespace: The namespace the service belongs to.

  • APP: The application exposed using the service.

  • PORTS: The ports where the service is accessible, for example:

    http         8080/TCP → 8080
    grpc         8082/TCP → 8082
    tcp          8083/TCP → 8083
    
  • Services: The services exposed in this resource. Click on the name of the service to display the details of the service.

  • Metrics: Dashboards of the most important metrics. Click Open metrics in Grafana to open the related dashboards in Grafana.

  • Traces: Click Open Jaeger tracing to run tracing with Jaeger.

CAUTION:

If you have installed Service Mesh Manager in Anonymous mode, you won’t be able to access the Metrics and Traces dashboards from the UI. Clicking the Open metrics in Grafana or Open Jaeger tracing icon in anonymous mode causes the RBAC: access denied error message.

Details of the workload

Details of a workload

The following details of the workload are displayed:

  • Namespace: The namespace the workload belongs to.
  • APP: The application running in the workload.
  • VERSION: The version number of the workload, for example, v2.
  • REPLICAS: The number of replicas for the workload.
  • LABELS: The list of Kubernetes labels assigned to the resource.
  • Controllers: The controllers related to the workload. Click on a controller to display its details.
  • Metrics: Dashboards of the most important metrics. Click Open metrics in Grafana to open the related dashboards in Grafana.

Service details

Select a Service in the SERVICE section of a service overview to display the details of the service.

Details of a service

The following details of the service are displayed:

  • Namespace: The namespace the service belongs to.

  • CLUSTER: The name of the Kubernetes cluster the service belongs to.

  • SELECTOR: The label selector used to select the set of Pods targeted by the Service.

  • PORTS: The ports where the service is accessible, for example:

    http         8080/TCP → 8080
    grpc         8082/TCP → 8082
    tcp          8083/TCP → 8083
    
  • TYPE: The ServiceType indicates how your service is exposed, for example, ClusterIP or LoadBalancer.

  • CLUSTER IP: The IP address corresponding to the ServiceType.

  • CREATED: The date when the service was started.

  • LABELS: The list of Kubernetes labels assigned to the resource.

  • Pods: The list of pods running this service, including their name, number of containers in the pod, and their status. Click on the name of the pod to display the details of the pod.

  • Events: Recent events related to the service resource.

Workload controller details

Select a deployment in the CONTROLLER section of a workload to display the details of the deployment. This view contains detailed information about the Kubernetes controller.

While the service mesh layer displays network level metrics and an aggregated view of the corresponding controllers, this view focuses on CPU and memory metrics, and the Kubernetes resources, like related pods or events. It’s also possible that multiple controllers belong to the same service mesh entity, for example, in a shared control plane multi-cluster scenario, when multiple clusters are running controllers that belong to the same logical workload.

Details of a workload controller

The following details of the workload controller are displayed:

  • Namespace: The namespace the workload belongs to.
  • CLUSTER: The name of the Kubernetes cluster the workload belongs to.
  • Kind: The type of the controller, for example, Deployment. If the workload is running on a virtual machine, the kind of the controller is WorkloadGroup.
  • APP: The application running in the workload.
  • VERSION: The version number of the workload, for example, v2.
  • REPLICAS: The number of replicas for the workload.
  • CREATED: The date when the workload was started.
  • LABELS: The list of Kubernetes labels assigned to the resource.
  • Pods: The list of pods running this workload. Click on the name of the pod to display the details of the pod.
  • WorkloadEntries: The list of virtual machines running this workload. Click on the name of the WorkloadEntry to display the details of the WorkloadEntry.
  • Events: Recent events related to the resource.

Details of the pod

To check the details of a pod, select a pod in the CONTROLLER > POD or the SERVICE > POD section.

Details of a pod

The following details of the pod are displayed:

  • Namespace: The namespace the pod belongs to.
  • CLUSTER: The name of the Kubernetes cluster the pod belongs to.
  • NODE: The hostname of the node the pod is running on, for example, ip-192-168-1-1.us-east-2.compute.internal. Click on the name of the node to display the details of the node.
  • IP: The IP address of the pod.
  • STARTED: The date when the pod was started.
  • LABELS: The list of Kubernetes labels assigned to the resource.
  • Containers: The list of containers in the pod. Also includes the Name, Image, and Status of the container.
  • Events: Recent events related to the resource.
  • Metrics: Dashboards of the most important metrics. Click Open metrics in Grafana to open the related dashboards in Grafana.

    CAUTION:

    If you have installed Service Mesh Manager in Anonymous mode, you won’t be able to access the Metrics and Traces dashboards from the UI. Clicking the Open metrics in Grafana or Open Jaeger tracing icon in anonymous mode causes the RBAC: access denied error message.

To display the logs of the pod, click the Show pod logs icon. The pod logs are displayed at the bottom of the screen.

Note: In multi-cluster scenarios, live log streaming is available only for pods running on the primary cluster.

Show pod logs

Details of the node

To check the health of a node, select a node in the pod details view. The node view is the deepest layer of the drill-down view and shows information about a Kubernetes node.

Details of a node

The following details of the node that the pod is running on are displayed:

  • CLUSTER: The name of the Kubernetes cluster the node belongs to.
  • OS: The operating system running on the node, for example: linux amd64 (Ubuntu 18.04.4 LTS)
  • STARTED: The date when the node was started.
  • TAINTS: The list of Kubernetes taints assigned to the node.
  • LABELS: The list of Kubernetes labels assigned to the node.
  • Conditions: The status of the node, for example, disk and memory pressure, or network and kubelet status.
  • Pods: The list of pods currently running on the node.
  • Events: Recent events related to the node.
  • Metrics: Dashboards of the most important metrics. Click Open metrics in Grafana to open the related dashboards in Grafana.

    CAUTION:

    If you have installed Service Mesh Manager in Anonymous mode, you won’t be able to access the Metrics and Traces dashboards from the UI. Clicking the Open metrics in Grafana or Open Jaeger tracing icon in anonymous mode causes the RBAC: access denied error message.

Details of the WorkloadEntry

Details of a WorkloadEntry

The following details of the WorkloadEntry are displayed:

  • NAMESPACE: The namespace the WorkloadEntry belongs to.
  • CLUSTER: The name of the Kubernetes cluster the WorkloadEntry belongs to.
  • NETWORK: The name of the network the virtual machine running the workload belongs to.
  • ADDRESS: The IP address of the virtual machine.
  • PORTS: The open ports and related protocols of the WorkloadGroup.
  • LABELS: The list of Kubernetes labels assigned to the resource.
  • Events: Recent events related to the resource.
  • Metrics: Dashboards of the most important metrics. Click Open metrics in Grafana to open the related dashboards in Grafana.

2.5.3.2 - Generate test load

There are several places in the Service Mesh Manager interface where you can’t see anything if your application hasn’t received any traffic yet. For example, if there are no metrics in the system you won’t see any visualization on the MENU > OVERVIEW page.

Generate load on the UI

To send generated traffic to an endpoint, complete the following steps.

  1. On the Service Mesh Manager web interface, select MENU > TOPOLOGY.
  2. Click HTTP on the left side of the screen. Generate test traffic
  3. Complete the form with the endpoint details and click SEND to generate test traffic to your services. Set traffic parameters
  4. In a few seconds a graph of your services appears.

Generate load in the CLI

If needed, you can generate constant traffic in the demo application by running: smm demoapp load start
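If you are not running the demo application, a plain shell loop against one of your own endpoints produces similar constant traffic. A minimal sketch; the TARGET_URL value is a placeholder you must replace with a reachable service address:

```shell
# Send one request per second to a service endpoint so that metrics
# start appearing on the dashboard. Stop with Ctrl+C.
TARGET_URL="${TARGET_URL:-http://127.0.0.1:8080/}"   # placeholder address
while true; do
  curl -s -o /dev/null "$TARGET_URL"
  sleep 1
done
```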

2.5.3.3 - Pod logs

You can display the logs of Kubernetes pods on the Service Mesh Manager web interface to help troubleshooting your infrastructure.

If you are installing Service Mesh Manager on a managed Kubernetes solution of a public cloud provider (for example, AWS, Azure, or Google Cloud), assign admin roles so that you can tail the logs of your containers from the Service Mesh Manager UI, use Service Level Objectives, and perform various tasks from the CLI that require custom permissions. Run the following command:

kubectl create clusterrolebinding user-cluster-admin --clusterrole=cluster-admin --user=<gcp/aws/azure username>

CAUTION:

Assigning administrator roles might be very dangerous because it gives wide access to your infrastructure. Be careful and do that only when you’re confident in what you’re doing.

Note: In multi-cluster scenarios, live log streaming is available only for pods running on the primary cluster.

To display the logs of a pod, complete the following steps.

  1. Open the Service Mesh Manager dashboard.
  2. Drill down to the pod you want to inspect. Drill down to the pod level
  3. Click the Show pod logs icon. The pod logs are displayed at the bottom of the screen. Show pod logs
  4. You can also filter the logs to a specific container. Filter pod logs

From the pod level, you can also go up or down in your infrastructure to inspect other components.
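The same container logs are also available with plain kubectl, which can be useful when the dashboard is not reachable. A sketch with placeholder names:

```shell
# Stream (-f) the logs of a single container (-c) of a pod;
# replace the placeholders with your pod, container, and namespace.
kubectl logs -f <pod-name> -c <container-name> -n <namespace>
```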

2.5.4 - Workloads

You can drill-down from the MENU > TOPOLOGY page by selecting a service or a workload in the Istio service mesh. You can trace back an issue from the top-level service mesh layer by navigating deeper in the stack, and see the status and most important metrics of your Kubernetes controllers, pods, and nodes.

List of workloads

The MENU > WORKLOADS page contains information about the workloads in your service mesh. Above the list of workloads, there is a summary dashboard about the state of your workloads, showing the following information:

  • Requests per second: Requests per second for the workloads.
  • Average latency: Average latency for the workloads (95th percentile latency in milliseconds).
  • Error rate: The percentage of requests returning a 5xx status code. Client-side errors (with 4XX status code) are not included.
  • Clusters: The number of clusters in the service mesh.
  • Workloads: The number of workloads in the mesh and the total number of workloads.
  • Pods: The number of running pods and the desired number of pods.

The list displays the workloads (grouped by namespaces), and a timeline of the metrics-based health score of each workload. The Kubernetes workload icon indicates Kubernetes workloads, while the Virtual machine workload icon indicates workloads running on virtual machines. You can filter the list to show only the selected namespaces, and display historical data by adjusting the timeline.

  • To display the details of a workload, click the name of the workload.
  • To open the Grafana dashboards related to the workload, click Open metrics in Grafana .
  • To display the detailed health metrics of a workload, click the health indicator of the workload for the selected period. Health metrics of a workload

From the mesh workload overview, you can drill down through the following levels to the underlying resources of the infrastructure:

Details of a workload

Workload details

Select a Workload from the list to display its details.

Details of a workload

The following details of the workload are displayed:

  • NAMESPACE: The namespace the workload belongs to.
  • APP: The application running in the workload.
  • VERSION: The version number of the workload, for example, v2.
  • REPLICAS: The number of replicas for the workload.
  • LABELS: The list of Kubernetes labels assigned to the resource.
  • HEALTH: Indicates the health score of the workload. Click the chart to display more details.
  • Controller: The controllers related to the workload. Click on a controller to display its details.
  • CLUSTER: The name of the Kubernetes cluster the workload belongs to.
  • KIND: The kind of the workload, for example, DaemonSet, Deployment, ReplicaSet, or StatefulSet.
  • Pods: The list of pods running this workload. Click on the name of the pod to display the details of the pod. You can also display and filter logs of the pod.
  • Events: Recent events related to the resource.
  • Metrics: Dashboards of the most important metrics. Click Open metrics in Grafana to open the related dashboards in Grafana.

Details of a controller

Pod details

To check the details of a pod, click the name of a pod in the Pods section.

Details of a pod

The following details of the pod are displayed:

  • Namespace: The namespace the pod belongs to.
  • CLUSTER: The name of the Kubernetes cluster the pod belongs to.
  • NODE: The hostname of the node the pod is running on, for example, ip-192-168-1-1.us-east-2.compute.internal. Click on the name of the node to display the details of the node.
  • IP: The IP address of the pod.
  • STARTED: The date when the pod was started.
  • LABELS: The list of Kubernetes labels assigned to the resource.
  • Containers: The list of containers in the pod. Also includes the Name, Image, and Status of the container.
  • Events: Recent events related to the resource.
  • Metrics: Dashboards of the most important metrics. Click Open metrics in Grafana to open the related dashboards in Grafana.

    CAUTION:

    If you have installed Service Mesh Manager in Anonymous mode, you won’t be able to access the Metrics and Traces dashboards from the UI. Clicking the Open metrics in Grafana or Open Jaeger tracing icon in anonymous mode causes the RBAC: access denied error message.

To display the logs of the pod, click the Show pod logs icon. The pod logs are displayed at the bottom of the screen.

Note: In multi-cluster scenarios, live log streaming is available only for pods running on the primary cluster.

Show pod logs

Node details

To check the health of a node, select a node in the pod details view. The node view is the deepest layer of the drill-down view and shows information about a Kubernetes node.

Details of a node

The following details of the node that the pod is running on are displayed:

  • CLUSTER: The name of the Kubernetes cluster the node belongs to.
  • OS: The operating system running on the node, for example: linux amd64 (Ubuntu 18.04.4 LTS)
  • STARTED: The date when the node was started.
  • TAINTS: The list of Kubernetes taints assigned to the node.
  • LABELS: The list of Kubernetes labels assigned to the node.
  • Conditions: The status of the node, for example, disk and memory pressure, or network and kubelet status.
  • Pods: The list of pods currently running on the node.
  • Events: Recent events related to the node.
  • Metrics: Dashboards of the most important metrics. Click Open metrics in Grafana to open the related dashboards in Grafana.

    CAUTION:

    If you have installed Service Mesh Manager in Anonymous mode, you won’t be able to access the Metrics and Traces dashboards from the UI. Clicking the Open metrics in Grafana or Open Jaeger tracing icon in anonymous mode causes the RBAC: access denied error message.

WorkloadEntry details

Details of a WorkloadEntry

The following details of the WorkloadEntry are displayed:

  • NAMESPACE: The namespace the WorkloadEntry belongs to.
  • CLUSTER: The name of the Kubernetes cluster the WorkloadEntry belongs to.
  • NETWORK: The name of the network the virtual machine running the workload belongs to.
  • ADDRESS: The IP address of the virtual machine.
  • PORTS: The open ports and related protocols of the WorkloadGroup.
  • LABELS: The list of Kubernetes labels assigned to the resource.
  • Events: Recent events related to the resource.
  • Metrics: Dashboards of the most important metrics. Click Open metrics in Grafana to open the related dashboards in Grafana.

2.5.5 - Services

You can drill-down from the MENU > TOPOLOGY page by selecting a service or a workload in the Istio service mesh. You can trace back an issue from the top-level service mesh layer by navigating deeper in the stack, and see the status and most important metrics of your Kubernetes controllers, pods, and nodes.

List of services

The MENU > SERVICES page contains information about the services in your service mesh. Above the list of services, there is a summary dashboard about the state of your services.

List of services

The list displays the services (grouped by namespaces), and a timeline of the metrics-based health score of each service. You can filter the list to show only the selected namespaces, and display historical data by adjusting the timeline.

  • To display the details of a service, click the name of the service.
  • To open the Grafana dashboards related to the service, click Open metrics in Grafana.
  • To run tracing with Jaeger, click Open Jaeger tracing.

    CAUTION:

    If you have installed Service Mesh Manager in Anonymous mode, you won’t be able to access the Metrics and Traces dashboards from the UI. Clicking the Open metrics in Grafana or Open Jaeger tracing icon in anonymous mode results in an RBAC: access denied error message.
  • To display the detailed health metrics of a service, click the health indicator of the service for the selected period. Health metrics of a service

From the services list, you can drill down through the following levels to the underlying resources of the infrastructure: List of services > Mesh Service > Service > Pod > Node.

Mesh Service details

Details of a mesh service

  • Mesh Service: The multi-cluster mesh service available in the current mesh. Select a service from the SERVICE field to display the details of the service in any of the attached clusters.

  • Namespace: The namespace the service belongs to.

  • APP: The application exposed using the service.

  • PORTS: The ports where the service is accessible, for example:

    http         8080/TCP → 8080
    grpc         8082/TCP → 8082
    tcp          8083/TCP → 8083
    
  • HEALTH: Indicates the health score of the workload. Click the chart to display more details.

  • Service Level Objectives: The details of the Service Level Objectives (SLOs) defined for the service. Click on an SLO to display its details.

  • Services: The list of services belonging to the mesh service. Click on a service to display its details.

  • Metrics: Dashboards of the most important metrics.

    • To open the related Grafana dashboards, click Open metrics in Grafana.
    • To run tracing with Jaeger, click Open Jaeger tracing.

Display service details

Select a Service from the list to display its details.

  • To run tracing with Jaeger, click Open Jaeger tracing. In case of a multi-cluster setup, you can select which cluster’s data to display.

Details of a service

The following details of the service are displayed:

  • Service: The services exposed in this resource. To display the details of the service, click the name of the service.

  • Namespace: The namespace the service belongs to.

  • CLUSTER: The name of the Kubernetes cluster the service belongs to.

  • SELECTOR: The label selector used to select the set of Pods targeted by the Service.

  • PORTS: The ports where the service is accessible, for example:

    http         8080/TCP → 8080
    grpc         8082/TCP → 8082
    tcp          8083/TCP → 8083
    
  • TYPE: The ServiceType indicates how your service is exposed, for example, ClusterIP or LoadBalancer.

  • CLUSTER IP: The IP address corresponding to the ServiceType.

  • CREATED: The date when the service was created.

  • LABELS: The list of Kubernetes labels assigned to the resource.

  • Pods: The list of pods running this service, including their name, number of containers in the pod, and their status. Click on the name of the pod to display the details of the pod.

  • Events: Recent events related to the service resource.

Display pod details

To check the details of a pod, click the name of the pod in the PODS section.

Details of a pod

The following details of the pod are displayed:

  • Namespace: The namespace the pod belongs to.
  • CLUSTER: The name of the Kubernetes cluster the pod belongs to.
  • NODE: The hostname of the node the pod is running on, for example, ip-192-168-1-1.us-east-2.compute.internal. Click on the name of the node to display the details of the node.
  • IP: The IP address of the pod.
  • STARTED: The date when the pod was started.
  • LABELS: The list of Kubernetes labels assigned to the resource.
  • Containers: The list of containers in the pod. Also includes the Name, Image, and Status of the container.
  • Events: Recent events related to the resource.
  • Metrics: Dashboards of the most important metrics. Click Open metrics in Grafana to open the related dashboards in Grafana.

    CAUTION:

    If you have installed Service Mesh Manager in Anonymous mode, you won’t be able to access the Metrics and Traces dashboards from the UI. Clicking the Open metrics in Grafana or Open Jaeger tracing icon in anonymous mode results in an RBAC: access denied error message.

To display the logs of the pod, click the Show pod logs icon. The pod logs are displayed at the bottom of the screen.

Note: In multi-cluster scenarios, live log streaming is available only for pods running on the primary cluster.

Show pod logs

Display node details

To check the health of a node, select a node in the pod details view. The node view is the deepest layer of the drill-down view and shows information about a Kubernetes node.

Details of a node

The following details of the node that the pod is running on are displayed:

  • CLUSTER: The name of the Kubernetes cluster the node belongs to.
  • OS: The operating system running on the node, for example: linux amd64 (Ubuntu 18.04.4 LTS)
  • STARTED: The date when the node was started.
  • TAINTS: The list of Kubernetes taints assigned to the node.
  • LABELS: The list of Kubernetes labels assigned to the node.
  • Conditions: The status of the node, for example, disk and memory pressure, or network and kubelet status.
  • Pods: The list of pods currently running on the node.
  • Events: Recent events related to the node.
  • Metrics: Dashboards of the most important metrics. Click Open metrics in Grafana to open the related dashboards in Grafana.

    CAUTION:

    If you have installed Service Mesh Manager in Anonymous mode, you won’t be able to access the Metrics and Traces dashboards from the UI. Clicking the Open metrics in Grafana or Open Jaeger tracing icon in anonymous mode results in an RBAC: access denied error message.

2.5.6 - Gateways

The MENU > GATEWAYS page of the Service Mesh Manager web interface allows you to list, monitor, and manage the ingress and egress gateways of your service mesh.

Note: Service Mesh Manager uses the concept of IstioMeshGateways, a declarative representation of Istio ingress and egress gateway services and deployments. With the help of IstioMeshGateways, you can set up multiple gateways in a cluster, and use them for different purposes.

To create a new ingress gateway, see Create ingress gateway.

List gateways

To list the gateways of your service mesh, navigate to MENU > GATEWAYS.

List of Istio ingress gateways

For each gateway, the following information is shown:

  • Name: The name of the gateway.
  • Namespace: The namespace the gateway belongs to.
  • Cluster: The cluster the gateway belongs to. Mainly useful in multi-cluster scenarios.
  • Type: Type of the gateway. Ingress gateways define an entry point into your Istio mesh for incoming traffic, while egress gateways define an exit point from your Istio mesh for outgoing traffic.
  • Open ports: The ports the gateway accepts connections on.
  • Hosts: Number of hosts accessible using the gateway.
  • Routes: Number of routing rules configured for the ingress traffic.
  • Error rate: The rate of 5XX errors during the last polling interval. Client-side errors (4XX status codes) are not included.
  • Requests per second: The number of requests per second during the last polling interval.
  • Status: Status of the gateway.

Click the name of a gateway to display the details of the gateway (grouped into several tabs: Overview and host configuration, Routes, Deployment and Service).

To display the YAML configuration of MeshGateways, Gateways, or VirtualServices, click the name of the gateway in the list, then click the Show YAML configuration icon next to their name.

YAML configuration of a gateway

YAML configuration of a gateway

Monitor upstream traffic

Service Mesh Manager collects upstream metrics like latencies, throughput, RPS, or error rate from Prometheus, and provides a summary for each gateway. It also sets up a Grafana dashboard and displays appropriate charts in-place.

To monitor the upstream traffic of your Istio gateways, complete the following steps.

  1. Open the Service Mesh Manager web interface, and navigate to MENU > GATEWAYS.

  2. From the list of gateways, click the gateway you want to monitor.

    List of Istio ingress gateways

  3. On the OVERVIEW tab, scroll down to the METRICS section. The most important metrics of the gateway are displayed on the Service Mesh Manager web interface (for example, upstream requests per second and error rate).

    Note: You can also view the details of the service or the deployment related to the gateway.

    Istio ingress gateway metrics

    Click Open metrics in Grafana to open the related dashboards in Grafana.

    CAUTION:

    If you have installed Service Mesh Manager in Anonymous mode, you won’t be able to access the Metrics and Traces dashboards from the UI. Clicking the Open metrics in Grafana or Open Jaeger tracing icon in anonymous mode results in an RBAC: access denied error message.

Gateway deployment and service details

To display the details, events, and most important metrics of the deployment and service related to a gateway, navigate to MENU > GATEWAYS > <Gateway-to-inspect>, then click SERVICE or DEPLOYMENT.

Details of the gateway service

Details of the gateway deployment

2.5.6.1 - Create ingress gateway

Overview

Ingress gateways define an entry point into your Istio mesh for incoming traffic.

Multiple ingress gateways in Istio

You can configure gateways using the Gateway and VirtualService custom resources of Istio, and the IstioMeshGateway CR of Service Mesh Manager.

  • The Gateway resource describes the port configuration of the gateway deployment that operates at the edge of the mesh and receives incoming or outgoing HTTP/TCP connections. The specification describes a set of ports that should be exposed, the type of protocol to use, TLS configuration – if any – of the exposed ports, and so on. For more information about the gateway resource, see the Istio documentation.
  • The VirtualService resource defines a set of traffic routing rules to apply when a host is addressed. Each routing rule defines matching criteria for the traffic of a specific protocol. If the traffic matches a routing rule, then it is sent to a named destination service defined in the registry. For example, it can route requests to different versions of a service or to a completely different service than was requested. Requests can be routed based on the request source and destination, HTTP paths and header fields, and weights associated with individual service versions. For more information about VirtualServices, see the Istio documentation.
  • Service Mesh Manager provides a custom resource called IstioMeshGateway and uses a separate controller to reconcile gateways, allowing you to use multiple gateways in multiple namespaces. That way you can also control who can manage gateways, without having permissions to modify other parts of the Istio mesh configuration.

Using IstioMeshGateway, you can add Istio ingress or egress gateways in the mesh and configure them. When you create a new IstioMeshGateway CR, Service Mesh Manager takes care of configuring and reconciling the necessary resources, including the Envoy deployment and its related Kubernetes service.

Note: Service Mesh Manager automatically creates an ingress gateway called smm-ingressgateway and a mesh expansion gateway called istio-meshexpansion-cp-v115x. The smm-ingressgateway serves as the main entry point for the services of Service Mesh Manager (for example, the dashboard and the API), while the mesh expansion gateway is used in multi-cluster setups to ensure communication between clusters for the Istio control plane and the user services.

Do not use these gateways for user workloads, because they are managed by Service Mesh Manager, and any change to their port configuration will be overwritten. Instead, create a new mesh gateway using the IstioMeshGateway custom resource.
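For example, a minimal ingress IstioMeshGateway might look like the following sketch (the name, namespace, port numbers, and control plane reference are illustrative; adjust them to your environment):

```yaml
apiVersion: servicemesh.cisco.com/v1alpha1
kind: IstioMeshGateway
metadata:
  name: demo-gw               # illustrative name
  namespace: default
spec:
  type: ingress               # or egress
  istioControlPlane:
    name: cp-v115x            # the control plane the gateway's istio-proxy connects to
    namespace: istio-system
  service:
    type: LoadBalancer
    ports:
      - name: tcp-status-port
        port: 15021
        protocol: TCP
        targetPort: 15021
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8080      # above 1024, so the pod can run without root privileges
```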

Prerequisites

Auto sidecar injection must be enabled for the namespace of the service you want to make accessible.
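Sidecar injection is typically enabled with a namespace label. The exact label depends on your installation; the following sketch assumes Istio's default injection label (revision-based control planes use an istio.io/rev label instead):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo                  # illustrative namespace name
  labels:
    istio-injection: enabled  # assumption: Istio's default injection label
```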

Create ingress gateway

To create a new ingress gateway and expose a service from the command line, complete the following steps. Alternatively, you can create an ingress gateway using the dashboard.

  1. If you haven’t already done so, create and expose the service you want to make accessible through the gateway.

    For testing, you can download and apply the following echo service:

    
    
    kubectl apply -f echo.yaml
    

    Expected output:

    deployment.apps/echo created
    service/echo created
    
  2. Create a new ingress gateway using the IstioMeshGateway resource.

    1. Download the following resource and adjust it as needed for your environment:

      CAUTION:

      By default, the IstioMeshGateway pod runs without root privileges, and therefore cannot use ports under 1024. Either use ports above 1024 as target ports (for example, 8080 instead of 80), or run the gateway pod with root privileges by setting spec.runAsRoot: true in the IstioMeshGateway custom resource.
      
      

      For details on the IstioMeshGateway resource, see the IstioMeshGateway CR reference.

    2. Apply the IstioMeshGateway resource. Service Mesh Manager creates a new ingress gateway deployment and a corresponding service, and automatically labels them with the gateway-name and gateway-type labels and their corresponding values.

      kubectl apply -f meshgw.yaml
      

      Expected output:

      istiomeshgateway.servicemesh.cisco.com/demo-gw created
      
    3. Get the IP address of the gateway. (Adjust the name and namespace of the IstioMeshGateway as needed for your environment.)

      kubectl -n default get istiomeshgateways demo-gw
      

      The output should be similar to:

      NAME      TYPE      SERVICE TYPE   STATUS      INGRESS IPS       ERROR   AGE    CONTROL PLANE
      demo-gw   ingress   LoadBalancer   Available   ["3.10.16.232"]           107s   {"name":"cp-v115x","namespace":"istio-system"}
      
    4. Create the Gateway and VirtualService resources to configure listening ports on the matching gateway deployment. Make sure to adjust the hosts fields to the external hostname of the service. (You should manually set an external hostname that points to these addresses, but for testing purposes you can use, for example, nip.io, a domain that provides wildcard DNS for any IP address.)

      
      
      kubectl apply -f gwvs.yaml
      

      Expected output:

      gateway.networking.istio.io/echo created
      virtualservice.networking.istio.io/echo created
      
  3. Access the service on the external address.

    curl -i echo.3.10.16.232.nip.io
    

    The output should be similar to:

    HTTP/1.1 200 OK
    date: Mon, 07 Mar 2022 19:22:15 GMT
    content-type: text/plain
    server: istio-envoy
    x-envoy-upstream-service-time: 1
    
    Hostname: echo-68578cf9d9-874rz
    ...
    
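For reference, the gwvs.yaml applied in the last step might look like the following sketch. The selector labels are the gateway-name and gateway-type labels that Service Mesh Manager sets on the gateway, and the nip.io hostname matches the ingress IP from the earlier output; adjust both to your environment:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: echo
  namespace: default
spec:
  selector:
    gateway-name: demo-gw     # labels set by Service Mesh Manager on the gateway
    gateway-type: ingress
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "echo.3.10.16.232.nip.io"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: echo
  namespace: default
spec:
  hosts:
    - "echo.3.10.16.232.nip.io"
  gateways:
    - echo
  http:
    - route:
        - destination:
            port:
              number: 80
            host: echo.default.svc.cluster.local
```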

Create ingress gateway using dashboard

  1. To create an ingress gateway from the Calisti dashboard, navigate to the MENU > GATEWAYS page. Click CREATE NEW. An interface to select the Ingress gateway resource template opens.

    Create ingress gateway

    Ingress gateway

  2. Select Ingress in the Template dropdown. Ingress gateway template

  3. Here, you can edit the ingress gateway resource. To validate the correctness of the YAML resource, click Validate. For details on the IstioMeshGateway resource, see the IstioMeshGateway CR reference.

  4. Once the YAML file is validated and there are no errors, to create the ingress gateway, click Create.

Edit ingress gateway using dashboard

  1. To edit a particular ingress gateway in your service mesh, click the Edit icon at the end of the row. Edit ingress gateway

  2. Edit the selected gateway YAML, and validate the YAML. For details on the IstioMeshGateway resource, see the IstioMeshGateway CR reference.

  3. To apply the changes to the YAML, click Apply.

Delete ingress gateway using dashboard

  1. To delete a particular ingress gateway in your service mesh, click the Delete icon at the end of the row.

    Delete

  2. If you are absolutely sure that you want to delete the selected ingress gateway, click Delete on the pop-up.

    CAUTION:

    Deleting the resource cannot be undone, as Calisti doesn’t store the old resource files.

IstioMeshGateway CR reference

This section describes the fields of the IstioMeshGateway custom resource.

apiVersion (string)

Must be servicemesh.cisco.com/v1alpha1

kind (string)

Must be IstioMeshGateway

spec (object)

The configuration and parameters of the IstioMeshGateway.

spec.type (string, required)

Type of the mesh gateway. Ingress gateways define an entry point into your Istio mesh for incoming traffic, while egress gateways define an exit point from your Istio mesh for outgoing traffic. Possible values:

  • ingress
  • egress

spec.istioControlPlane (object, required)

Specifies the IstioControlPlane custom resource that the istio-proxy connects to, by namespaced name. When upgrading to a new Istio version (and thus to a new control plane), update this reference.

For example:

spec:
  istioControlPlane:
    name: cp-v115x
    namespace: istio-system

spec.deployment (object)

Configuration options for the Kubernetes istio-proxy deployment. Metadata like labels and annotations can be set for the deployment or its pods as well, in spec.deployment.metadata or spec.deployment.podMetadata, respectively.
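For example, the following sketch (the annotation keys are illustrative) sets one annotation on the deployment and another on its pods:

```yaml
spec:
  deployment:
    metadata:
      annotations:
        example.com/owner: platform-team     # illustrative deployment annotation
    podMetadata:
      annotations:
        example.com/scrape-metrics: "true"   # illustrative pod annotation
```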

spec.service (object, required)

Configuration options for the Kubernetes service. Annotations can be set in spec.service.metadata.annotations; these are often useful with cloud load balancers, for example, to specify configuration for AWS.

Example for Google Kubernetes Engine (GKE):

This example shows how to create an internal load balancer with a static IP address. The first step is to reserve a static IP address that the load balancer can then use.

For more information on how to reserve a static internal IP address, see: Reserving a static internal IP address

More annotations and their descriptions are available here: LoadBalancer Service parameters

IstioMeshGateway modifications:

  service:
    ports:
      - name: tcp-status-port
        port: 15021
        protocol: TCP
        targetPort: 15021
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8080
    type: LoadBalancer
    loadBalancerIP: "10.100.100.100"
    metadata:
      annotations:
        networking.gke.io/load-balancer-type: "Internal"

Gateway modifications:

spec:
  selector:
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"

VirtualService modifications:

spec:
  hosts:
  - "*"
  gateways:
  - echo
  http:
  - route:
    - destination:
        port:
          number: 80
        host: echo.default.svc.cluster.local

Example for Azure Kubernetes Service (AKS): This example shows how to modify the TCP idle timeout for a loadbalancer.

By default, it’s set to 4 minutes. If a period of inactivity is longer than the timeout value, there’s no guarantee that the TCP or HTTP session is maintained between the client and your cloud service.

More annotations and their descriptions are available here: Azure LoadBalancer annotations

IstioMeshGateway modifications:

  service:
    ports:
      - name: tcp-status-port
        port: 15021
        protocol: TCP
        targetPort: 15021
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8080
    type: LoadBalancer
    metadata:
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-tcp-idle-timeout: "10"

Example for Amazon Web Services (AWS): This example shows how to configure the load balancer to specify which nodes to include in the target group registration when using the instance target type.

More annotations and their descriptions are available here: AWS Load Balancer Controller Annotations

IstioMeshGateway modifications:

  service:
    ports:
      - name: tcp-status-port
        port: 15021
        protocol: TCP
        targetPort: 15021
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8080
    type: LoadBalancer
    metadata:
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
        service.beta.kubernetes.io/aws-load-balancer-target-node-labels: k8s-app=echo

spec.runAsRoot (true | false)

Whether to run the gateway in a privileged container. If not running as root, only ports higher than 1024 can be opened on the container. Default value: false
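For example, to expose port 80 directly as the container's target port (below 1024), the gateway must run as root. A sketch (port values are illustrative):

```yaml
spec:
  type: ingress
  runAsRoot: true             # required because the target port is below 1024
  istioControlPlane:
    name: cp-v115x
    namespace: istio-system
  service:
    type: LoadBalancer
    ports:
      - name: http
        port: 80
        protocol: TCP
        targetPort: 80        # below 1024; needs a privileged container
```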

2.5.6.2 - Create egress gateway

Egress traffic

Traffic that’s outbound from a pod that has an Istio sidecar also passes through that sidecar container (more precisely, through its Envoy proxy). Therefore, the accessibility of external services depends on the configuration of that Envoy proxy.

By default, Istio configures the Envoy proxy to pass through requests to unknown services. Although this provides a convenient way of getting started with Istio, it’s generally a good idea to put stricter controls in place.

Egress traffic without egress gateway

Allow only registered access

You can configure Service Mesh Manager to permit access only to specific external services. For details, see External Services.

Egress gateway

Egress gateways define an exit point from your Istio mesh for outgoing traffic. Egress gateways also allow you to apply Istio features on the traffic that exits the mesh, for example monitoring, routing rules, or retries.

Egress traffic using egress gateway

Why do you need egress gateways? For example:

  • Your organization requires some, or all, outbound traffic to go through dedicated nodes. These nodes could be separated from the rest of the nodes for the purposes of monitoring and policy enforcement.
  • The application nodes of a cluster don’t have public IPs, so the in-mesh services that run on them cannot access the internet directly. Allocating public IPs to the egress gateway nodes and routing egress traffic through the gateway allows for controlled access to external services.

CAUTION:

Using an egress gateway doesn’t restrict outgoing traffic, it only routes it through the egress gateway. To limit access only to selected external services, see External Services.

Create egress gateway

To create an egress gateway from the command line and route egress traffic through it, complete the following steps. Alternatively, you can create an egress gateway using the dashboard.

Note: The YAML samples work with the Service Mesh Manager demo application. Adjust their parameters (for example, namespace, service name, and so on) as needed for your environment.

CAUTION:

Using an egress gateway doesn’t restrict outgoing traffic, it only routes it through the egress gateway. To limit access only to selected external services, see External Services.
  1. Create an egress gateway using the IstioMeshGateway resource. For details on the IstioMeshGateway resource, see the IstioMeshGateway CR reference.

    CAUTION:

    By default, the IstioMeshGateway pod runs without root privileges, and therefore cannot use ports under 1024. Either use ports above 1024 as target ports (for example, 8080 instead of 80), or run the gateway pod with root privileges by setting spec.runAsRoot: true in the IstioMeshGateway custom resource.

    For testing, you can download and apply the following resource to create a new egress gateway deployment and a corresponding service in the smm-demo namespace.

    
    
    kubectl apply -f egress-meshgateway.yaml
    

    Expected output:

    istiomeshgateway.servicemesh.cisco.com/egress-demo created
    
  2. Create a Gateway resource for the egress gateway. The Gateway resource connects the Istio configuration resources and the deployment of a matching gateway. Apply the following Gateway resource to configure the outbound port (80 in the previous example) on the egress gateway that you have defined in the previous step.

    
    
    kubectl apply -f egress-gateway.yaml
    

    Expected output:

    gateway.networking.istio.io/egress-demo created
    
  3. Define a VirtualService resource to direct traffic from the sidecars to the egress gateway.

    Apply the following VirtualService to direct traffic from the sidecars to the egress gateway, and also from the egress gateway to the external service. Edit the VirtualService to match the external service you want to permit access to.

    
    
    kubectl apply -f egress-virtualservice.yaml
    

    Expected output:

    virtualservice.networking.istio.io/httpbin-egress created
    
  4. Test access to the external service.

    If you have installed the Service Mesh Manager demo application and used the examples in the previous steps, you can run the following command to start requests from the notifications-v1 workload to the external httpbin.org service:

    kubectl -n smm-demo set env deployment/notifications-v1 'REQUESTS=http://httpbin.org/get#1'
    
    • If everything is set up correctly, the new egress gateway appears on the MENU > GATEWAYS page. The new egress gateway on the MENU > GATEWAYS page
    • If there is egress traffic, the gateway appears on the MENU > TOPOLOGY page (make sure to select the relevant namespace). Note that the traffic from the gateway to the external service is visible only if you create a ServiceEntry resource for the external service. The new egress gateway on the MENU > TOPOLOGY page
  5. If needed, permit access only to specific external services. For details, see External Services.
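For the httpbin.org service used in this example, such a ServiceEntry might look like the following sketch (the resource name and namespace are illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: httpbin               # illustrative name
  namespace: smm-demo
spec:
  hosts:
    - httpbin.org
  ports:
    - number: 80
      name: http
      protocol: HTTP
  resolution: DNS
  location: MESH_EXTERNAL     # the service lives outside the mesh
```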

Create egress gateway using dashboard

  1. To create an egress gateway from the Calisti dashboard, navigate to the MENU > GATEWAYS page. Click CREATE NEW. An interface to select the Egress gateway resource template opens.

    Create egress gateway

    Egress gateway

  2. Select Egress in the Template dropdown. Egress gateway template

  3. Here, you can edit the egress gateway resource. To validate the correctness of the YAML resource, click Validate. For details on the IstioMeshGateway resource, see the IstioMeshGateway CR reference.

  4. Once the YAML file is validated and there are no errors, to create the egress gateway, click Create.

Edit egress gateway using dashboard

  1. To edit a particular egress gateway in your service mesh, navigate to MENU > GATEWAYS. Click the Edit icon at the end of the row of the gateway you want to edit. Edit egress gateway

  2. Edit the selected gateway YAML, and validate the YAML. For details on the IstioMeshGateway resource, see the IstioMeshGateway CR reference.

  3. To apply the changes to the YAML, click Apply.

Delete egress gateway using dashboard

  1. To delete a particular egress gateway in your service mesh, navigate to MENU > GATEWAYS. Click the Delete icon at the end of the row of the gateway you want to delete.

    Delete

  2. If you are absolutely sure that you want to delete the selected egress gateway, click Delete on the pop-up.

    CAUTION:

    Deleting the resource cannot be undone, as Calisti doesn’t store the old resource files.

IstioMeshGateway CR reference

This section describes the fields of the IstioMeshGateway custom resource.

apiVersion (string)

Must be servicemesh.cisco.com/v1alpha1

kind (string)

Must be IstioMeshGateway

spec (object)

The configuration and parameters of the IstioMeshGateway.

spec.type (string, required)

Type of the mesh gateway. Ingress gateways define an entry point into your Istio mesh for incoming traffic, while egress gateways define an exit point from your Istio mesh for outgoing traffic. Possible values:

  • ingress
  • egress

spec.istioControlPlane (object, required)

Specifies the istiocontrolplane cr the istio-proxy connects to by a namespaced name. When upgrading to a new Istio version (thus to a new control plane), this should be upgraded.

For example:

spec:
  istioControlPlane:
    name: cp-v115x
    namespace: istio-system

spec.deployment (object)

Configuration options for the Kubernetes istio-proxy deployment. Metadata like labels and annotations can be set here for the deployment or pods as well, in spec.deployment.metadata.annotations or spec.deployment.podMetadata.annotations.

spec.service (object, required)

Configuration options for the Kubernetes service. Annotations can be set here as well as in spec.service.metadata.annotations, they are often useful in cloud loadbalancer cases, for example to specify some configuration for AWS.

Example for Google Cloud Engine (GKE):

This example shows how to create an internal load balancer with a static ip address. The first step is to reserve a static ip address that can then be used by the load balancer.

More info about how to reserve a static internal ip address can be found at: Reserving a static internal IP address

More annotations and their description are available here: LoadBalancer Service parameters

IstioMeshGateway modifications:

  service:
    ports:
      - name: tcp-status-port
        port: 15021
        protocol: TCP
        targetPort: 15021
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8080
    type: LoadBalancer
    loadBalancerIP: "10.100.100.100"
    metadata:
      annotations:
        networking.gke.io/load-balancer-type: "Internal"

Gateway modifications:

spec:
  selector:
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"

VirtualService modifications:

  spec:
  hosts:
  - "*"
  gateways:
  - echo 
  http:
  - route:
    - destination:
        port:
          number: 80
        host: echo.default.svc.cluster.local

Example for Azure Kubernetes Service (AKS): This example shows how to modify the TCP idle timeout for a loadbalancer.

By default, it’s set to 4 minutes. If a period of inactivity is longer than the timeout value, there’s no guarantee that the TCP or HTTP session is maintained between the client and your cloud service.

More annotations and their description are available here: Azure LoadBalancer annotations

IstioMeshGateway modifications:

  service:
    ports:
      - name: tcp-status-port
        port: 15021
        protocol: TCP
        targetPort: 15021
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8080
    type: LoadBalancer
    metadata:
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-tcp-idle-timeout: "10"

Example for Amazon Web Services (AWS): This example shows how to create a load balancer to specify which nodes to include in the target group registration for instance target type.

More annotations and their description are available here: AWS Load Balancer Controller Annotations

IstioMeshGateway modifications:

  service:
    ports:
      - name: tcp-status-port
        port: 15021
        protocol: TCP
        targetPort: 15021
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8080
    type: LoadBalancer
    metadata:
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
        service.beta.kubernetes.io/aws-load-balancer-target-node-labels: k8s-app=echo

spec.runAsRoot (true | false)

Whether to run the gateway in a privileged container. If not running as root, only ports higher than 1024 can be opened on the container. Default value: false
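Putting the pieces above together, a minimal ingress IstioMeshGateway that keeps the default non-root behavior might look like the following sketch. The apiVersion and the control plane reference are assumptions based on a typical Calisti deployment; adjust the names to your environment.

```yaml
# Hypothetical example: a minimal ingress IstioMeshGateway.
# The apiVersion and the control plane name/namespace are assumptions.
apiVersion: servicemesh.cisco.com/v1alpha1
kind: IstioMeshGateway
metadata:
  name: demo-gw
  namespace: default
spec:
  istioControlPlane:
    name: cp-v115x          # assumed control plane name
    namespace: istio-system
  type: ingress
  service:
    type: LoadBalancer
    ports:
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8080
  runAsRoot: false          # ports below 1024 cannot be opened on the container
```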

2.5.6.3 - Manage host and port configuration

Service Mesh Manager understands the Gateway CRs of Istio and the gateway’s service configuration in Kubernetes (with the help of the MeshGateway CR), so it can display information about ports, hosts, and protocols that are configured on a specific gateway.

  1. Open the Service Mesh Manager web interface, and navigate to MENU > GATEWAYS.

  2. From the list of gateways, click the gateway you want to monitor.

  3. You can see the host and port configurations on the OVERVIEW tab, in the Ports & Hosts section.

    Istio ingress gateway hosts

    The following information is shown about each entry point.

    • GATEWAY NAME: The name of the gateway.
    • GATEWAY NAMESPACE: The namespace the gateway belongs to.
    • PORT: The list of open ports on the gateway.
    • PROTOCOL: The protocols permitted on the gateway.
    • HOSTS: The host selector that determines which hosts are accessible using the gateway.
    • TLS: The TLS settings applying to the gateway.
  • To modify an existing route, click Edit, change the settings as needed, then click APPLY.
  • To delete a route, click Delete.
  • To create a new entry point, click CREATE NEW.

Create new ingress entry point

You can set up a new entry point for your Istio ingress gateways, and Service Mesh Manager translates your configuration to declarative custom resources.

  1. Navigate to MENU > GATEWAYS > <Gateway-to-modify> > OVERVIEW.

  2. In the Ports & Hosts section, click CREATE NEW.

    Istio ingress gateway create new entry point

  3. Set the parameters of the entry point. As a minimum, you must set the port number for the entry point, and the protocol (for example, HTTP, HTTPS, or GRPC) that is accepted at the entry point.

    Note: DNS resolution is not managed by Service Mesh Manager. Once you’ve configured ingress for a particular service, Service Mesh Manager will display the IP of the ingress gateway service. Do not forget to create the corresponding DNS records that point to this IP.

  4. Click CREATE.

Gateway TLS settings

When setting up a service on a gateway with TLS, you need to configure a certificate for the host(s). You can do that by bringing your own certificate, putting it down in a Kubernetes secret, and configuring it for a gateway server. This works for simple use cases, but involves lots of manual steps when obtaining or renewing a certificate. Automated Certificate Management Environments (ACME) automates these kinds of interactions with the certificate provider.

ACME is most widely used with Let’s Encrypt and, in Kubernetes environments, with cert-manager. Service Mesh Manager helps you set up cert-manager, and you can quickly obtain a valid Let’s Encrypt certificate through the dashboard with a few clicks.

Note: For details on using your own domain name with Let’s Encrypt, see Using Let’s Encrypt with your own domain name.

To set TLS encryption for a gateway, complete the following steps.

  1. Navigate to MENU > GATEWAYS > <Gateway-to-modify> > OVERVIEW.

  2. In the Ports & Hosts section, click Edit in the row of the gateway you want to modify.

  3. Set PORT PROTOCOL to HTTPS.

    TLS on the gateway with Let’s Encrypt

  4. Decide how you want to provide the certificate for the gateway.

    • To use Let’s Encrypt, select USE LET’S ENCRYPT FOR TLS, then enter a valid email address into the CONTACT EMAIL field. The provided email address is used to notify you about expirations and to communicate about any issues specific to your account.
    • To use a custom certificate, upload the certificate as a Kubernetes secret, then set the name of the secret in the TLS SECRET NAME field. Note that currently you cannot upload the certificate from the Service Mesh Manager UI; use regular Kubernetes tools instead.
  5. Click CREATE.
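For the custom-certificate path, the secret is a standard Kubernetes TLS secret. A sketch of what it might look like follows; the secret name and namespace are assumptions (the name must match what you enter in the TLS SECRET NAME field, and the namespace must be the one your gateway runs in).

```yaml
# Hypothetical example: a TLS secret referenced from the TLS SECRET NAME field.
# Replace the placeholders with your base64-encoded PEM data.
apiVersion: v1
kind: Secret
metadata:
  name: my-gateway-cert     # assumed name; use the value entered on the UI
  namespace: istio-system   # assumed namespace; must match the gateway's namespace
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate chain>
  tls.key: <base64-encoded private key>
```

Equivalently, `kubectl create secret tls my-gateway-cert --cert=cert.pem --key=key.pem -n istio-system` creates the same secret from PEM files.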

2.5.6.4 - Routes and traffic management with Virtual Services

Note: This section describes the routing rules of ingress gateways. To configure routing rules for in-mesh services, see Routing.

One of the main reasons to use Istio gateways instead of native Kubernetes ingress is that you can use VirtualService to configure the routing of incoming traffic, just like for in-mesh routes. You can apply Istio concepts to incoming requests, like redirects, rewrites, timeouts, retries, or fault injection.

Calisti displays the VirtualServices and their related configuration on the Gateways page, and gives you the ability to configure routing. Calisti provides various Virtual Service templates for route rules and traffic management.

The MENU > GATEWAYS > <Gateway-to-inspect> > VIRTUAL SERVICES page displays the following information about the VirtualServices of the gateway.

  • VIRTUAL SERVICE: The name of the VirtualService resource for the gateway. To display the YAML configuration of the VirtualService, click the Show YAML configuration icon next to its name.
  • GATEWAYS: The names of gateways and sidecars that apply these routes.
  • HOSTS: The host selector that determines which hosts are accessible using the route.
  • MATCH: The route applies only to requests matching this expression.
  • DESTINATION: The destinations of the routing rule.
  • ACTIONS: Any special action related to the route (for example, rewrite).
  • PROTOCOL: The protocol permitted in the route.

Istio ingress gateway routes

Create a virtual service

To create a new routing rule, create a new VirtualService from the dashboard by completing the following steps. You can also edit or delete VirtualServices, and view the full YAML configuration of the virtual service. The new rule created with the VirtualService resource matches every incoming request.

Note: Rules are evaluated in top-down order. For more details, see Rule precedence.

  1. Navigate to MENU > GATEWAYS > <Gateway-to-inspect> > VIRTUAL SERVICES page.

  2. Select VIRTUAL SERVICE > CREATE NEW.

  3. Select a template based on your need.

    • By default, the new rule matches every incoming request. When you specify multiple match arguments, they have a logical OR relationship: the rule matches any request that fits one of the match rules. Within a match rule, you can specify multiple rules that have an AND relation. That way, you can match requests against a specific URL and an HTTP method, for example.

      For example, using the following template, you can create a rule that matches only requests where the URL path starts with /ratings/v2/ and the request contains a custom end-user header with the value jason. To add custom matches to select only specific traffic for the rule based on scheme, method, URI, host, port, or authority, use HTTP Request Template.

      HTTP request template

    • You can route the requests to a specific service. To route a portion of the traffic to a different destination, select HTTP Route Destination Template and use the weight parameter to split the traffic between multiple destination services.

      HTTP route destination

    • Alternatively, you can use the HTTP Redirect template to redirect the traffic to a specific URI. Redirect rules overwrite the Path portion of the URL with this value. Note that the entire path is replaced, irrespective of the request URI being matched as an exact path or prefix.

      HTTP redirect template

    • Set the timeout and retry options as needed for your environment using HTTP Retry Template.

      HTTP retry template

    • Set the rewrite option to rewrite specific parts of the HTTP request before forwarding the request to the destination using HTTP Rewrite Template.

      HTTP rewrite template

  4. Click Create. The new rule appears on the VIRTUAL SERVICES tab.
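Behind the dashboard, the templates above generate ordinary VirtualService resources. As an illustrative sketch (the resource name, gateway name, and destination service are assumptions based on the examples earlier on this page), a rule matching the /ratings/v2/ prefix with the end-user: jason header, with retries and a timeout, might look like:

```yaml
# Hypothetical example of a VirtualService that the gateway route
# templates could produce. Names are illustrative.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings-ingress     # assumed name
  namespace: default
spec:
  hosts:
    - "*"
  gateways:
    - demo-gw               # assumed gateway name
  http:
    - match:
        - uri:
            prefix: /ratings/v2/
          headers:
            end-user:
              exact: jason
      route:
        - destination:
            host: echo.default.svc.cluster.local
            port:
              number: 80
      retries:
        attempts: 3
        perTryTimeout: 2s
      timeout: 10s
```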

Edit a virtual service

  1. To edit a particular virtual service in your service mesh, click the Edit icon at the end of the row.

  2. Modify the selected virtual service YAML, and validate the YAML.

  3. To apply the modifications to the YAML configuration, click Apply.

Delete a virtual service

  1. To delete a particular virtual service in your service mesh, click the Delete icon at the end of the row.

    Delete

  2. If you are absolutely sure that you want to delete the selected virtual service, click Delete on the pop-up.

    CAUTION:

    Deleting the resource is irreversible, as Calisti doesn’t store the old resource files.

Ingress with your own domain

Once you have completed the steps in Using Let’s Encrypt with your own domain name, create a Tls route virtual service using the templates provided. Complete the following steps.

  1. Navigate to MENU > GATEWAYS > <Gateway-to-inspect> > VIRTUAL SERVICES
  2. To create a new ingress VirtualService using your own domain, click CREATE NEW.
  3. Select the Tls Route Template from the Template dropdown. Use the gateway, host, and port number you provided when setting up an encrypted HTTPS port under your own domain name for your services. Tls route
  4. After entering the host, port number, and other parameters in the template, validate the resource’s correctness.
  5. Click CREATE.
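As a rough sketch of the kind of resource such a template might generate (the domain, gateway, and service names are placeholders, and the exact shape the template produces may differ):

```yaml
# Hypothetical example of a TLS route for your own domain name.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: echo-tls            # assumed name
  namespace: default
spec:
  hosts:
    - "smm.example.org"     # your own domain name
  gateways:
    - demo-gw               # assumed gateway name
  tls:
    - match:
        - port: 443
          sniHosts:
            - "smm.example.org"
      route:
        - destination:
            host: echo.default.svc.cluster.local
            port:
              number: 80
```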

2.5.6.5 - Using Let's Encrypt with your own domain name

The following procedure shows you how to set up an encrypted HTTPS port under your own domain name for your services, and obtain a matching certificate from Let’s Encrypt.

This requires solving the ACME HTTP-01 challenge, which involves routing an HTTP request from the ACME server (the Certificate Authority) to the cert-manager challenge-solver pod.

Complete the following steps.

  1. Open the Service Mesh Manager web interface, and navigate to MENU > GATEWAYS > OVERVIEW.

  2. Select the gateway you want to secure. Note that the SERVICE TYPE of the gateway must be LoadBalancer. The load balancer determines the IP address(es) to be used for the ACME HTTP-01 challenge. In the following example, it’s istio-meshexpansion-gateway-cp-v115x.

    gateways

  3. Point your domain name to the IP address or DNS name found in the ADDRESS field.

  4. Configure the ingress gateway.

    1. In the Ports & Hosts section, click CREATE NEW in the upper right corner.

      create

    2. Select the HTTPS protocol in the ports and protocol dropdown.

    3. Select the port number (typically 443) you want to accept incoming connections on, and enter a name for the port.

    4. Enter your domain name into the HOSTS field. To enter multiple domain names, use Enter.

    5. Select Use Let’s Encrypt for TLS to get a certificate for your domain from Let’s Encrypt.

    6. Enter your email address. This address is forwarded to Let’s Encrypt and is used for ACME account management.

    7. Click CREATE.

    8. Two more items appear in the Ports & Hosts list for your domain name:

      • One on the HTTPS port (for example, 443) for the incoming connection requests.
      • The other on port 80 for solving the ACME HTTP-01 challenge.

      A warning icon shows if the HTTPS port is not valid yet.

      gateways

  5. Wait while the certificate arrives. After a short while, the item with port 80 and protocol HTTP disappears, and a green check mark appears next to HTTPS. This shows that the certificate has been issued and is used to secure your domain:

    gateways

  6. Set up routing for your service. Use the gateway, host, and port number you provided in this procedure. For details, see routing and traffic management.

    gateways

  7. Test that your service can be accessed and that it shows the proper certificate.

2.5.7 - Traffic tap

The traffic tap feature of Service Mesh Manager enables you to monitor live access logs of the Istio sidecar proxies. Each sidecar proxy outputs access information for the individual HTTP requests or HTTP/gRPC streams.

The access logs contain information about the:

  • reporter proxy,
  • source and destination workloads,
  • request,
  • response, as well as the
  • timings.

Note: For workloads that are running on virtual machines, the name of the pod is the hostname of the virtual machine.

Traffic tap using the UI

Traffic tap is also available from the dashboard. To use it, complete the following steps.

  1. Select MENU > TRAFFIC TAP.

  2. Select the reporter (namespace or workload) from the REPORTING SOURCE field.

    Service Mesh Manager tap

  3. Click FILTERS to set additional filters, for example, HTTP method, destination, status code, or HTTP headers.

  4. Click START STREAMING.

  5. Select an individual log to see its details:

    Service Mesh Manager tap

  6. After you are done, click PAUSE STREAMING.

Traffic tap using the CLI

These examples work out of the box with the demo application packaged with Service Mesh Manager. Change the service name and namespace to match your service.

To watch the access logs for an individual namespace, workload or pod, use the smm tap command. For example, to tap the smm-demo namespace, run:

smm tap ns/smm-demo

The output should be similar to:

✓ start tapping max-rps=0
2022-04-25T10:56:47Z outbound frontpage-v1-776d76965-b7w76 catalog-v1-5864c4b7d7-j5cmf "http GET / HTTP11" 200 121.499879ms "tcp://10.10.48.169:8080"
2022-04-25T10:56:47Z outbound frontpage-v1-776d76965-b7w76 catalog-v1-5864c4b7d7-j5cmf "http GET / HTTP11" 200 123.066985ms "tcp://10.10.48.169:8080"
2022-04-25T10:56:47Z inbound bombardier-66786577f7-sgv8z frontpage-v1-776d76965-b7w76 "http GET / HTTP11" 200 145.422013ms "tcp://10.20.2.98:8080"
2022-04-25T10:56:47Z outbound frontpage-v1-776d76965-b7w76 catalog-v1-5864c4b7d7-j5cmf "http GET / HTTP11" 200 129.024302ms "tcp://10.10.48.169:8080"
2022-04-25T10:56:47Z outbound frontpage-v1-776d76965-b7w76 catalog-v1-5864c4b7d7-j5cmf "http GET / HTTP11" 200 125.462172ms "tcp://10.10.48.169:8080"
2022-04-25T10:56:47Z inbound bombardier-66786577f7-sgv8z frontpage-v1-776d76965-b7w76 "http GET / HTTP11" 200 143.590923ms "tcp://10.20.2.98:8080"
2022-04-25T10:56:47Z outbound frontpage-v1-776d76965-b7w76 catalog-v1-5864c4b7d7-j5cmf "http GET / HTTP11" 200 121.868301ms "tcp://10.10.48.169:8080"
2022-04-25T10:56:47Z inbound bombardier-66786577f7-sgv8z frontpage-v1-776d76965-b7w76 "http GET / HTTP11" 200 145.090036ms "tcp://10.20.2.98:8080"
...

Filter on workload or pod

You can tap into specific workloads and pods, for example:

  • Tap the bookings-v1 workload in the smm-demo namespace:

    smm tap --ns smm-demo workload/bookings-v1
    
  • Tap a pod of the bookings app in the smm-demo namespace:

    POD_NAME=$(kubectl get pod -n smm-demo -l app=bookings -o jsonpath="{.items[0].metadata.name}")
    smm tap --ns smm-demo pod/$POD_NAME
    

At large traffic volumes it’s difficult to find the relevant or problematic logs, but you can use filter flags to display only the relevant lines, for example:

# Show only server errors
smm tap ns/smm-demo --method GET --response-code 500,599

The output can be similar to:

2020-02-06T14:00:13Z outbound frontpage-v1-57468c558c-8c9cb bookings:8080 " GET / HTTP11" 503 173.099µs "tcp://10.10.111.111:8080"
2020-02-06T14:00:18Z outbound frontpage-v1-57468c558c-8c9cb bookings:8080 " GET / HTTP11" 503 157.164µs "tcp://10.10.111.111:8080"
2020-02-06T14:00:19Z outbound frontpage-v1-57468c558c-4w26k bookings:8080 " GET / HTTP11" 503 172.541µs "tcp://10.10.111.111:8080"
2020-02-06T14:00:15Z outbound frontpage-v1-57468c558c-8c9cb bookings:8080 " GET / HTTP11" 503 165.05µs "tcp://10.10.111.111:8080"
2020-02-06T14:00:15Z outbound frontpage-v1-57468c558c-8c9cb bookings:8080 " GET / HTTP11" 503 125.671µs "tcp://10.10.111.111:8080"
2020-02-06T14:00:19Z outbound frontpage-v1-57468c558c-8c9cb bookings:8080 " GET / HTTP11" 503 101.701µs "tcp://10.10.111.111:8080"

You can also change the output format to JSON, and use the jq command line tool to further filter or map the log entries, for example:

# Show pods with a specific user-agent
smm tap ns/smm-demo -o json | jq 'select(.request.userAgent=="fasthttp") | .source.name'

The output can be similar to:

"payments-v1-7c955bccdd-vt2pg"
"bookings-v1-7d8d76cd6b-f96tm"
"bookings-v1-7d8d76cd6b-f96tm"
"payments-v1-7c955bccdd-vt2pg"
"bookings-v1-7d8d76cd6b-f96tm"

2.5.8 - Managing Kafka clusters

The Streaming Data Manager dashboard is integrated into the Service Mesh Manager dashboard, allowing you to manage and overview your Apache Kafka deployments, including brokers, topics, ACLs, and more. For details on using the dashboard features related to Apache Kafka, see Dashboard.

2.5.9 - Istio resources

Calisti provides a UI in the dashboard where you can view and configure Istio custom resources with ease. For example, you can use Istio resources to configure traffic routing, manage inbound and outbound mesh traffic, apply authorization policies to workloads in the mesh, and add service entries.

The MENU > ISTIO RESOURCES page of the Service Mesh Manager web interface allows you to list, filter, create, and manage the Istio resources of your service mesh.

List Istio resources

To view the list of the Istio resources of your service mesh, navigate to MENU > ISTIO RESOURCES.

List the Istio resources

For each resource, the following information is shown:

  • Name: The name of the resource.
  • Namespace: The namespace the resource belongs to.
  • Cluster: The cluster the resource belongs to. Mainly useful in multi-cluster scenarios.
  • Type: Type of the Istio custom resources. Here is a list of the resource types:
    • AuthorizationPolicy
    • DestinationRule
    • Gateway
    • IstioMeshGateway
    • PeerAuthentication
    • VirtualService

To view the details of a particular resource, click the edit icon at the end of the row.

Filter Istio resources

To filter the Istio resources of your service mesh, navigate to MENU > ISTIO RESOURCES.

On this page, you can filter the Istio resources by Namespace in which they exist and the Type of the resource. Once you select the fields by which you need to filter the resources, the matching results are displayed.

Filter Istio resources

Note: You can filter the resources by multiple Namespace and resource Type values.

2.5.9.1 - Create Istio resources

To create new Istio resources for your service mesh, navigate to MENU > ISTIO RESOURCES, click CREATE NEW, and select an Istio resource from the listed Type in the resource selector.

Create new resources

Resource selector

Depending on your selection, a YAML editor with a pre-populated resource template is displayed. Here, you can customize the YAML and validate the correctness of the syntax and content before creating the custom resource. For resource-specific details on creating the different resources, see the respective section on this page.

Authorization policy

Using the Authorization policy, you can add access control on workloads in the mesh. This policy supports CUSTOM, DENY, and ALLOW actions for access control. For more information, see the Istio documentation.

  1. To create an Authorization policy in your service mesh, select the Authorization policy from the resource selector. An editor with the pre-populated template for the Authorization policy custom resource is displayed.

    Authorization policy YAML editor

  2. To choose a particular resource template in the Authorization policy, click on the Template dropdown.

    Authorization policy template dropdown

  3. Edit the selected template. To validate the resource’s correctness, click the Validate icon.

  4. To create and apply the Authorization policy, click Create.
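As a sketch of what such a policy can look like (the workload labels, namespace, and service account below are illustrative, borrowed from the demo application elsewhere in this guide):

```yaml
# Hypothetical example: allow only GET requests from the frontpage
# service account to the catalog workload. All names are illustrative.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontpage-get
  namespace: smm-demo
spec:
  selector:
    matchLabels:
      app: catalog
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/smm-demo/sa/frontpage"]
      to:
        - operation:
            methods: ["GET"]
```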

Destination rule

Destination rule defines policies to configure:

  • load balancing
  • connection pool size from sidecar
  • outlier detection
  • load balancing with subset and sticky sessions

To learn more about how to create the destination rules for in-mesh traffic, see Circuit Breaking.

  1. To create a Destination rule in your service mesh, select the Destination rule from the resource selector. An editor with the pre-populated template for the Destination rule custom resource is displayed.

    Destination rule YAML editor

  2. To choose a particular resource template in the Destination rule, click on the Template dropdown.

    Destination rule template dropdown

  3. Edit the selected template. To validate the resource’s correctness, click the Validate icon.

  4. To create and apply the Destination rule, click Create.
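A sketch combining the policies listed above into one DestinationRule (the host, namespace, and subset labels are illustrative assumptions):

```yaml
# Hypothetical example: a DestinationRule with load balancing,
# a connection pool limit, outlier detection, and a subset.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: catalog
  namespace: smm-demo
spec:
  host: catalog.smm-demo.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN      # load balancing policy
    connectionPool:
      tcp:
        maxConnections: 100    # connection pool size from the sidecar
    outlierDetection:          # eject hosts returning consecutive 5xx errors
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
  subsets:
    - name: v1
      labels:
        version: v1
```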

Gateway

The Gateway resource describes the port configuration of the gateway deployment that operates at the edge of the mesh and receives incoming or outgoing HTTP/TCP connections. The specification describes a set of ports that should be exposed, the type of protocol to use, TLS configuration – if any – of the exposed ports, and so on.

  1. To create a Gateway resource in your service mesh, select Gateway from the resource selector. An editor with the pre-populated template for the Gateway resource is displayed.

    Gateway YAML editor

  2. To choose a particular resource template in Gateway, click on the Template dropdown.

    Gateway template dropdown

  3. Edit the selected template. To validate the resource’s correctness, click the Validate icon.

  4. To create and apply the Gateway resource, click Create.

Mesh gateway

Service Mesh Manager provides a custom resource called IstioMeshGateway, and uses a separate controller to reconcile gateways, allowing you to use multiple gateways in multiple namespaces. That way, you can also control who can manage gateways without having permissions to modify other parts of the Istio mesh configuration. To learn more about mesh gateways in Calisti, see Gateways.

  1. To create a Mesh gateway in your service mesh, select Mesh gateway from the resource selector. An editor with the pre-populated template for the Mesh gateway is displayed.

    Mesh gateway YAML editor

    Note: For more information, see the Create ingress and Create egress gateways documentation.

  2. To choose a particular resource template in the Mesh gateway, click the Template dropdown.

    Mesh gateway template dropdown

  3. Edit the selected template. To validate the resource’s correctness, click the Validate icon.

  4. To create and apply the Mesh gateway, click Create.

Peer authentication

Peer authentication determines if and how traffic is routed to the sidecar; in practice, it specifies whether mutual TLS is required, permitted, or disabled for the selected workloads.

  1. To create a Peer authentication in your service mesh, select Peer authentication from the resource selector. An editor with the pre-populated template for the Peer authentication is displayed.

    Peer authentication YAML editor

  2. To choose a particular resource template in Peer authentication, click on the Template dropdown.

    Peer authentication template dropdown

  3. Edit the selected template. To validate the resource’s correctness, click the Validate icon.

  4. To create and apply the Peer authentication resource, click Create.
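For instance, a minimal namespace-wide policy requiring mutual TLS might look like the following sketch (the namespace is illustrative):

```yaml
# Hypothetical example: require mutual TLS for every workload
# in the smm-demo namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: smm-demo
spec:
  mtls:
    mode: STRICT
```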

Virtual services

The VirtualService resource defines a set of traffic routing rules to apply when a host is addressed. Each routing rule defines matching criteria for the traffic of a specific protocol. If the traffic matches a routing rule, then it is sent to a named destination service defined in the registry.

  1. To create a Virtual service in your service mesh, select Virtual services from the resource selector. An editor with the pre-populated template for the Virtual services is displayed.

    Virtual services YAML editor

  2. To choose a particular resource template in Virtual service, click on the Template dropdown.

    Virtual services template dropdown

  3. Edit the selected template. To validate the resource’s correctness, click the Validate icon.

  4. To apply the Virtual services, click Create.

2.5.9.2 - Manage Istio resources

In this section, you can learn how to manage the Istio resources available in your service mesh.

Navigate to MENU > ISTIO RESOURCES to see the Istio resources in your service mesh. You can edit, delete, and validate individual Istio resources on this page.

Edit Istio resources

  1. To edit a particular Istio resource in your service mesh, click the Edit icon at the end of the row.

  2. Edit the selected resource, and validate the YAML.

  3. To apply the resource, click Apply.

Delete Istio resources

  1. To delete a particular Istio resource of your service mesh, click the Delete icon at the end of the row.

    Delete

  2. If you are absolutely sure that you want to delete this resource, click Delete on the pop-up.

    CAUTION:

     Deleting the resource is irreversible, as Calisti doesn't store the old resource files.
    

Validate Istio resources

The Calisti dashboard provides a UI to edit Istio resources. Calisti validates the YAML both syntactically and semantically, checking the syntax and the data types, so you can be sure that the YAML describes a valid Istio resource.

  1. Create or edit an Istio resource.

  2. While editing a resource YAML, or once you are done, you may see some validation errors. Click the Validate icon; if there are any errors, the error and its line number are shown at the top of the editor.

    Validate

    During validation, the lines with errors are underlined. To check an error, hover over the underlined line: an error message is displayed, as shown in the following illustrations. Validate

    Validate

  3. If you try to save the YAML file with errors, a pop-up with a validation failed warning opens.
    Validation error

    To continue with the errors and save the YAML anyway, click Confirm. To go back and fix the errors, click Cancel.

  4. Fix the errors and click Apply or Create to save the resource.

2.5.10 - Configure the dashboard

2.5.10.1 - Exposing the Dashboard

By default, Service Mesh Manager relies on Kubernetes' built-in authentication and proxying capabilities to allow our users to access the Dashboard. In some cases, it makes sense to allow developers to access the Dashboard via a public URL, to make distributing Service Mesh Manager client binaries easier.

You can download the Service Mesh Manager client binaries from the login page:

Download the CLI

Or alternatively, the deployment can use an OIDC-compliant External Provider for authentication so that there’s no need for downloading and installing the CLI binary.

Expose the dashboard

While planning to expose the dashboard, consider the following:

  1. Does the Kubernetes cluster running Service Mesh Manager support LoadBalancer typed services natively? If not, see exposing via NodePort.
  2. Where to terminate the TLS connections? (Should it be terminated by Istio inside the cluster, or should it be terminated by an external LoadBalancer?)
  3. How to manage the TLS certificate for the dashboard? (Do you want to use Let’s Encrypt for certificates, or does your organization have its own certificate authority?)

For some of the examples, we assume that the externalDNS controller is installed and functional on the cluster. If not, make sure to manually set up the required DNS record based on your deployment.

This document covers a few scenarios to address the setups based on the answers to the previous questions.

In this scenario, we are assuming that:

  1. Your Kubernetes cluster supports LoadBalancer typed services to expose services externally.
  2. You use Istio to terminate the TLS connections inside the cluster.
  3. You want to use Let’s Encrypt to manage the certificates.
  4. The externalDNS controller is operational on the cluster.

The dashboard will be exposed on the domain name smm.example.org. To expose Service Mesh Manager on that URL, add the following to the Service Mesh Manager ControlPlane resource:

cat > enable-dashboard-expose.yaml <<EOF
spec:
  smm:
   exposeDashboard:
      meshGateway:
        enabled: true
        service:
          annotations:
            external-dns.alpha.kubernetes.io/hostname: smm.example.org.
        tls:
          enabled: true
          letsEncrypt:
            dnsNames:
            - smm.example.org
            enabled: true
            # server: https://acme-staging-v02.api.letsencrypt.org/directory
EOF
kubectl patch controlplane --type=merge --patch "$(cat enable-dashboard-expose.yaml )" smm
  • If you are using Service Mesh Manager in Operator Mode, then the Istio deployment is updated automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.

The dashboard is now available on the https://smm.example.org/ URL.

Note: When externalDNS is not present on the cluster, make sure that the external name of the MeshGateway service is assigned to the right DNS name. Otherwise, Certificate requests will fail. To check the IP address/name of the service, run the kubectl get service smm-ingressgateway-external --namespace smm-system command. The output should be similar to:

NAME                          TYPE           CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)                                      AGE
smm-ingressgateway-external   LoadBalancer   10.10.157.144   afd8bac546b1e46faab0e284fa0dc5da-580525876.eu-north-1.elb.amazonaws.com   15021:30566/TCP,80:32436/TCP,443:30434/TCP   20h

Terminate TLS on the LoadBalancer

To terminate TLS on the LoadBalancer, in the Service Mesh Manager ControlPlane resource you must set the .spec.smm.exposeDashboard.meshGateway.tls.enabled value to false.

If the Kubernetes Service requires additional annotations to enable TLS, add these annotations to the ControlPlane resource. For example, for AWS/EKS you can use the following settings to terminate TLS with AWS Certificate Manager:

cat > enable-dashboard-expose.yaml <<EOF
spec:
  smm:
   exposeDashboard:
      meshGateway:
        enabled: true
        service:
          annotations:
            service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
            service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:{region}:{user id}:certificate/{id}
            service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
            external-dns.alpha.kubernetes.io/hostname: smm.example.org.
        tls:
          enabled: true
          externalTermination: true
EOF
kubectl patch controlplane --type=merge --patch "$(cat enable-dashboard-expose.yaml )" smm
  • If you are using Service Mesh Manager in Operator Mode, then the Istio deployment is updated automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.

Note: In the previous example, the externalTermination: true instructs Service Mesh Manager to expose a plain HTTP endpoint on port 443 so that the external LoadBalancer can terminate TLS for that port too.

Using NodePort

In this setup the LoadBalancer is managed externally. Each worker node exposes the configured ports, and you can create a LoadBalancer that targets the relevant port on every worker node.

To enable NodePort-based exposure of the Service Mesh Manager service, run the following command. This example exposes HTTP on port 40080 and HTTPS on port 40443 of every worker node.

Note: The HTTPS port is only available if the TLS settings are explicitly enabled; this example omits that part. Either use the TLS settings from the LoadBalancer example, or check the section on user-provided TLS settings.

cat > enable-dashboard-expose.yaml <<EOF
spec:
  smm:
    exposeDashboard:
      meshGateway:
        enabled: true
        service:
          type: NodePort
          nodePorts:
            http: 40080
            https: 40443
EOF
kubectl patch controlplane --type=merge --patch "$(cat enable-dashboard-expose.yaml)" smm

After that, you can access the Calisti dashboard at https://<IP-address-of-the-Kubernetes-node>:40443.

Tip: You can also set up a LoadBalancer and configure your DNS provider to set a domain name that points to the nodeIP:nodePort of the Kubernetes nodes. How to do that depends on the Kubernetes and/or DNS provider you are using; consult their documentation for details.

  • If you are using Service Mesh Manager in Operator Mode, then the Istio deployment is updated automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.

Expose using custom TLS credentials

You can provide custom TLS credentials in a secret, in this example called my-own-secret, in the smm-system namespace. The following command configures the system to use that secret for in-cluster TLS termination:

cat > enable-dashboard-expose.yaml <<EOF
spec:
  smm:
    exposeDashboard:
      meshGateway:
        enabled: true
        tls:
          enabled: true
          credentialName: "my-own-secret"
EOF
kubectl patch controlplane --type=merge --patch "$(cat enable-dashboard-expose.yaml)" smm
  • If you are using Service Mesh Manager in Operator Mode, then the Istio deployment is updated automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.
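The secret referenced by credentialName is typically a standard Kubernetes TLS secret in the smm-system namespace. A minimal sketch of the expected layout (the certificate and key data are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-own-secret
  namespace: smm-system
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate chain>  # placeholder
  tls.key: <base64-encoded private key>        # placeholder
```

You can also create such a secret directly from certificate files with kubectl create secret tls my-own-secret --namespace smm-system --cert=<cert-file> --key=<key-file>.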

Known limitations in HTTP access

As a security measure, Service Mesh Manager operates only over HTTPS when exposed via an external URL. Make sure that somewhere in the traffic chain some component (Istio or the LoadBalancer) terminates the TLS connections; otherwise, every login attempt to the dashboard will fail.

2.5.10.2 - OIDC authentication

Service Mesh Manager allows for authenticating towards an OIDC External Provider instead of relying on the kubeconfig based authentication. This is useful when your organization already has an existing OIDC Provider that is used for user authentication on the Kubernetes clusters.

Since Service Mesh Manager does not require the Kubernetes cluster to rely on OIDC authentication, you (or the operator of the cluster) might need to set up additional Groups in the cluster (for details, see Setting up user permissions).

If your organization uses a central authentication database which is not OIDC compliant, check out Dex. Dex can act as an OIDC provider and supports LDAP, GitHub, or any OAuth2 identity provider as a backend. For an example on setting up Service Mesh Manager to use GitHub authentication using Dex, see Using Dex for authentication.

Note: Even if OIDC is enabled in Service Mesh Manager, you can access Service Mesh Manager from the command line by running smm dashboard. This is a fallback authentication/access method in case the OIDC provider is down.

Prerequisites

Before starting to set up OIDC authentication, make sure that you have already:

Enable OIDC authentication

To enable the OIDC authentication, patch the ControlPlane resource with the following settings:

cat > oidc-enable.yaml <<EOF
spec:
  smm:
    auth:
      oidc:
        enabled: true
        client:
          id: ${OIDC_CLIENT_ID}
          issuerURL: https://${IDENTITY_PROVIDER_EXTERNAL_URL}
          secret: ${OIDC_CLIENT_SECRET}
EOF
kubectl patch controlplane --type=merge --patch "$(cat oidc-enable.yaml)" smm

Where:

  • ${OIDC_CLIENT_ID} is the client id obtained from the External OIDC Provider of your organization.
  • ${OIDC_CLIENT_SECRET} is the client secret obtained from the External OIDC Provider of your organization.
  • ${IDENTITY_PROVIDER_EXTERNAL_URL} is the URL of the External OIDC Provider of your organization.
  • If you are using Service Mesh Manager in Operator Mode, then the Istio deployment is updated automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.

After this change, the dashboard will allow logging in using an External OIDC Provider:

Login using OIDC

Set up user and group mappings

After completing the previous step, the users will be able to authenticate via OIDC. However, Service Mesh Manager needs to map them to Kubernetes users. As Service Mesh Manager uses Kubernetes RBAC for access control, it relies on the same mapping as the Kubernetes API Server’s OIDC authentication backend.

You can use the following settings in the ControlPlane resource:

spec:
  smm:
    auth:
      oidc:
        username:
            claim:  # Claim to take the username from
            prefix: # Append this prefix to all usernames
        groups:
            claim:  # Claim to take the user's groups from
            prefix: # Append this prefix to all group names the user has
        requiredClaims:
            <CLAIM>: "<VALUE>"  # Only allow authentication if the given claim has the specified value
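For example, with the following settings (the claim names are illustrative), a token carrying email: jane@example.org and groups: ["admins"] is mapped to the Kubernetes user oidc:jane@example.org in the group oidc:admins:

```yaml
spec:
  smm:
    auth:
      oidc:
        username:
          claim: email      # username is taken from the "email" claim
          prefix: 'oidc:'   # resulting user: oidc:jane@example.org
        groups:
          claim: groups     # groups are taken from the "groups" claim
          prefix: 'oidc:'   # resulting group: oidc:admins
```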

If the target cluster has OIDC enabled, the following table helps map the OIDC options of the API server to the settings of Service Mesh Manager:

| API Server Setting | Description | ControlPlane setting |
| --- | --- | --- |
| --oidc-issuer-url | URL of the provider which allows the API server to discover public signing keys. Only URLs which use the https:// scheme are accepted. This URL should point to the level below .well-known/openid-configuration. | .spec.smm.auth.oidc.client.issuerURL |
| --oidc-client-id | A client id that all tokens must be issued for. | .spec.smm.auth.oidc.client.id |
| (no API server flag) | A client secret that all tokens must be issued for. | .spec.smm.auth.oidc.client.secret |
| --oidc-username-claim | JWT claim to use as the user name. By default sub, which is expected to be a unique identifier of the end user. | .spec.smm.auth.oidc.username.claim |
| --oidc-username-prefix | Prefix prepended to username claims to prevent clashes with existing names (such as system:users). For example, the value oidc: will create usernames like oidc:jane.doe. If this flag isn't provided and --oidc-username-claim is a value other than email, the prefix defaults to the value of --oidc-issuer-url. Use the - value to disable all prefixing. | .spec.smm.auth.oidc.username.prefix |
| --oidc-groups-claim | JWT claim to use as the user's group. If the claim is present, it must be an array of strings. | .spec.smm.auth.oidc.groups.claim |
| --oidc-groups-prefix | Prefix prepended to group claims to prevent clashes with existing names (such as system:groups). For example, the value oidc: will create group names like oidc:engineering and oidc:infra. | .spec.smm.auth.oidc.groups.prefix |
| --oidc-required-claim | A key=value pair that describes a required claim in the ID Token. If set, the claim is verified to be present in the ID Token with a matching value. | .spec.smm.auth.oidc.requiredClaims |
  • If you are using Service Mesh Manager in Operator Mode, then the Istio deployment is updated automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.

Set up user permissions

Note: This step is only required if the target cluster does not already have OIDC authentication set up. If the Kubernetes cluster’s OIDC authentication settings are matching the ones set in the ControlPlane resource, no further action is needed.

By default, when using OIDC authentication, users and groups cannot modify the resources in the target cluster, so you need to create the appropriate ClusterRoleBinding resources for these groups or users.

The groups a given user belongs to are shown in the right-hand menu on the user interface:

Menu with group information

In this example, the username is oidc:test@example.org and the user belongs to only one group, called oidc:example-org:test.

If the Kubernetes Cluster is not using OIDC for authentication, create the relevant ClusterRoleBindings against these Groups and Users.
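For example, to grant the built-in read-only ClusterRole to the group shown above (a sketch; substitute your own group name and the role appropriate for your environment):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oidc-example-org-test-view
subjects:
- kind: Group
  name: 'oidc:example-org:test'    # group name as displayed on the dashboard
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                       # built-in read-only role
  apiGroup: rbac.authorization.k8s.io
```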

2.5.10.2.1 - Using Dex for authentication

Dex is an identity service that uses OpenID Connect to drive authentication for other apps.

Dex acts as a portal to other identity providers through “connectors.” This lets Dex defer authentication to LDAP servers, SAML providers, or established identity providers like GitHub, Google, and Active Directory. Clients write their authentication logic once to talk to Dex, then Dex handles the protocols for a given backend.

This section shows you how to set up GitHub authentication using Service Mesh Manager. To set up other authentication backends such as Active Directory or LDAP, see the DEX Connectors documentation.

Enable GitHub authentication

As GitHub is an OAuth 2 provider, Service Mesh Manager requires a bridge between OAuth 2 (or any other authentication backend) and OIDC.

Prerequisites

Before starting to set up GitHub authentication to Service Mesh Manager, make sure that you have already:

You need the following information to follow this guide:

  • GITHUB_CLIENT_ID: The Client ID from the GitHub OAuth 2 registration.
  • GITHUB_CLIENT_SECRET: The Client Secret from the GitHub OAuth 2 registration.
  • GITHUB_ORG_NAME: The name of the GitHub organization to authenticate against. If you want to support multiple organizations, consult the Dex manual.
  • GITHUB_ADMIN_TEAM_NAME: The name of the GitHub team that contains the users who receive administrative privileges.
  • DEX_EXTERNAL_URL: The URL where Dex will be exposed. This must be separate from the dashboard URL.
  • SMM_DASHBOARD_URL: The URL where the Service Mesh Manager dashboard is exposed.
  • OIDC_CLIENT_SECRET: The secret to be used between Dex and the Service Mesh Manager authentication backend (can be any random string).

To follow the examples, export these values as environment variables from your terminal, as these will be needed in multiple steps:

export GITHUB_CLIENT_ID=<***>
export GITHUB_CLIENT_SECRET=<***>
export GITHUB_ORG_NAME=my-github-org
export GITHUB_ADMIN_TEAM_NAME=admin
export DEX_EXTERNAL_URL=dex.example.org
export SMM_DASHBOARD_URL=smm.example.org
export OIDC_CLIENT_SECRET=$(openssl rand -base64 32) # or any random string

Create namespace for Dex

Dex will be installed into its own namespace for isolation. Create the namespace for it:

kubectl create ns dex

Dex will be exposed externally using Istio, so enable Istio sidecar injection on the namespace:

kubectl label ns dex istio.io/rev=cp-v115x.istio-system

Create MeshGateway for Dex

GitHub needs to access Dex to invoke the OAuth 2 callback, so that Dex can learn the result of the authentication on the GitHub side.

Create an externally available MeshGateway:

cat > dex-meshgateway.yaml <<EOF
apiVersion: servicemesh.cisco.com/v1alpha1
kind: IstioMeshGateway
metadata:
    labels:
        app.kubernetes.io/instance: dex
        app.kubernetes.io/name: dex-ingress
    name: dex-ingress
    namespace: dex
spec:
    istioControlPlane:
        name: cp-v115x
        namespace: istio-system
    deployment:
      metadata:
        labels:
          app.kubernetes.io/instance: dex
          app.kubernetes.io/name: dex-ingress
          gateway-name: dex-ingress
          gateway-type: ingress
      replicas:
        max: 1
        min: 1
        count: 1
    service:
      metadata:
        annotations:
          external-dns.alpha.kubernetes.io/hostname: ${DEX_EXTERNAL_URL}.
      ports:
      - name: http2
        port: 80
        protocol: TCP
        targetPort: 8080
      - name: https
        port: 443
        protocol: TCP
        targetPort: 8443
      type: LoadBalancer
    type: ingress
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  labels:
    app.kubernetes.io/instance: dex
    app.kubernetes.io/name: dex-ingress
  name: dex-ingress
  namespace: dex
spec:
  selector:
    app.kubernetes.io/instance: dex
    app.kubernetes.io/name: dex-ingress
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP
  - hosts:
    - '*'
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: dex-ingress-tls
      mode: SIMPLE
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  labels:
    app.kubernetes.io/instance: dex
    app.kubernetes.io/name: dex-ingress
  name: dex-ingress
  namespace: dex
spec:
  gateways:
    - dex-ingress
  hosts:
    - '*'
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: dex
        port:
          number: 80
EOF
kubectl apply -f dex-meshgateway.yaml

Get certificates for Dex

The secret referenced in the MeshGateway resource is not yet available. To secure the communication between the end-user’s browser and your Dex installation, enable the Let’s Encrypt support for gateways in Service Mesh Manager:

cat > certs.yaml <<EOF
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: dex-issuer
  namespace: dex
spec:
  acme:
    email: noreply@cisco.com
    preferredChain: ""
    privateKeySecretRef:
      name: smm-letsencrypt-issuer
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: dex-tls
  namespace: dex
  annotations:
    acme.smm.cisco.com/gateway-selector: |
      {
        "app.kubernetes.io/instance": "dex",
        "app.kubernetes.io/name": "dex-ingress"
      }
spec:
  dnsNames:
  - ${DEX_EXTERNAL_URL}
  duration: 2160h0m0s
  issuerRef:
    group: cert-manager.io
    kind: Issuer
    name: dex-issuer
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  renewBefore: 360h0m0s
  secretName: dex-ingress-tls
  usages:
  - server auth
  - client auth
EOF
kubectl apply -f certs.yaml

After executing the previous commands, check that the Certificate has been successfully issued by running the kubectl get certificate --namespace dex command. The output should be similar to:

NAME      READY   SECRET            AGE
dex-tls   True    dex-ingress-tls   24h

If the READY column shows True, then the Certificate has been issued. If not, refer to the Cert Manager documentation for troubleshooting the issue.

Provision Dex

Now you can install Dex into the dex namespace using Helm. First, create a file called dex-values.yaml for the Dex installation:

cat > dex-values.yaml <<EOF
---
config:
  issuer: https://${DEX_EXTERNAL_URL}
  storage:
    type: kubernetes
    config:
      inCluster: true

  connectors:
    - type: github
      id: github
      name: GitHub
      config:
        clientID: $GITHUB_CLIENT_ID
        clientSecret: "$GITHUB_CLIENT_SECRET"
        redirectURI: https://${DEX_EXTERNAL_URL}/callback
        orgs:
        - name: $GITHUB_ORG_NAME
        loadAllGroups: true

  oauth2:
    skipApprovalScreen: true

  staticClients:
    - id: smm-app
      redirectURIs:
        - "https://${SMM_DASHBOARD_URL}/auth/callback"
      name: 'Cisco Service Mesh Manager'
      secret: ${OIDC_CLIENT_SECRET}

service:
  enabled: true
  ports:
    http:
      port: 80

    https:
      port: 443
EOF

Run the following commands to install Dex using these values:

helm repo add dex https://charts.dexidp.io
helm install -n dex dex -f dex-values.yaml dex/dex

Verify that Dex has started successfully by running the kubectl get pods -n dex command. The output should be similar to:

NAME                           READY   STATUS    RESTARTS   AGE
dex-6d879bb86d-pxtvm           2/2     Running   1          20m
dex-ingress-6885b4f747-c5l96   1/1     Running   0          24m

Configure SMM to use OIDC provider

Enable Dex as an OIDC provider to Service Mesh Manager by patching the ControlPlane resource:

cat > smm-oidc-enable.yaml <<EOF
spec:
  smm:
    auth:
      oidc:
        enabled: true
        client:
          id: smm-app
          issuerURL: https://${DEX_EXTERNAL_URL}
          secret: ${OIDC_CLIENT_SECRET}
        groups:
          claim: groups
          prefix: 'oidc:'
        username:
          claim: email
          prefix: 'oidc:'
EOF
kubectl patch --type=merge --patch "$(cat smm-oidc-enable.yaml)" controlplane smm
  • If you are using Service Mesh Manager in Operator Mode, then the Istio deployment is updated automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.

Create user mapping

After logging in, users are mapped to the following identities:

  • A username of oidc:<email-of-the-github-user>, and
  • the groups oidc:$GITHUB_ORG_NAME:<team-name> for each of the GitHub Teams the user is a member of.

By default, these users and groups cannot modify the resources in the target cluster, so you need to create the appropriate ClusterRoleBinding resources for these groups or users. For example, to grant administrative access to the users in the $GITHUB_ADMIN_TEAM_NAME GitHub Team, run the following command:

cat > allow-admin-access.yaml <<EOF
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oidc-admin-access
subjects:
- kind: Group
  name: 'oidc:$GITHUB_ORG_NAME:$GITHUB_ADMIN_TEAM_NAME'
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
EOF
kubectl apply -f allow-admin-access.yaml

The groups a given user belongs to are shown in the right-hand menu on the user interface:

Menu with group information

In this example, the username is oidc:test@example.org and the user belongs to only one group, called oidc:example-org:test.

Verify login

To test that the login works, navigate to the URL where the Service Mesh Manager dashboard is exposed ($SMM_DASHBOARD_URL), and select Sign in with OIDC.

Login using OIDC

2.6 - Deploy custom application into the mesh

After you have one or more clusters attached to the mesh, here are some best practices to deploy applications on multiple clusters.

Deploy demo application

If you just want to get started with any demo application in a multi-cluster mesh, the easiest is to install the built-in Service Mesh Manager demo application.

  1. You can deploy the demo application from a command line.

    • The following command deploys every service of the demo application to the primary Calisti cluster.

      smm demoapp install
      
    • To deploy the demo application in a distributed way to multiple clusters, run:

      smm demoapp install -s frontpage,catalog,bookings
      smm -c <PEER_CLUSTER_KUBECONFIG_FILE> demoapp install -s movies,payments,notifications,analytics,database --peer
      

    After installation, the demo application automatically starts generating traffic, and the dashboard shows you the data flow. (If it doesn’t, run the smm demoapp load start command, or Generate load on the UI. If you want to stop generating traffic, run smm demoapp load stop.)

  2. Open the dashboard and look around.

    smm dashboard
    

Deploy custom application

Here is how you can deploy your own application into the service mesh with Service Mesh Manager. The procedure covers both single-cluster and multi-cluster meshes.

  1. Create the namespace where you would like to run your applications on every cluster. The examples use the test namespace:

    kubectl create ns test
    
  2. In the cluster where Service Mesh Manager is installed, enable sidecar injection in that namespace. This places an istio.io/rev label and sets it to the appropriate Istio control plane. Use one of the following methods to enable sidecar injection:

    • Use the Service Mesh Manager CLI tool:

      smm sidecar-proxy auto-inject on test
      

      If there are multiple control planes, use the --controlplane <name-of-the-controlplane> flag to specify which one to use. Typically that’s the one your other applications use, or the newer one if you are testing application deployment after a Calisti update that included a new Istio control plane.

    • Use kubectl:

      kubectl label ns test istio.io/rev=cp-v115x.istio-system
      
    • Use the Service Mesh Manager dashboard.

    • Using GitOps methods: See Restart Demo applications for an example.

    In a multi-cluster mesh, Service Mesh Manager adds the same label to this namespace on all other clusters. (If not, check the istio-operator pod logs on the particular cluster for any potential issues.)

  3. Deploy your application on the clusters as you would usually do.

    In a multi-cluster mesh make sure to deploy all Kubernetes service resources on all clusters attached to the mesh, even if pods are only present on a subset of the clusters. This is needed for Istio to do proper routing across clusters.

    For an example on how to deploy the Calisti demo application using GitOps methods, see the Service Mesh Manager GitOps installation guides: the single-cluster scenario and the multi-cluster mesh scenario.

  4. Make sure that sidecar pods are indeed injected to your application pods.

    If not, check the official Istio documentation for potential issues.

  5. Send traffic to your applications, then open the dashboard and look around.

    smm dashboard
    

2.7 - Mesh Management

2.7.1 - Multi-cluster - single mesh

Multi-cluster overview

Service Mesh Manager is able to construct an Istio service mesh that spans multiple clusters. In this scenario you combine multiple clusters into a single service mesh that you can manage from either a single or from multiple Istio control planes.

multi-cluster

Single mesh scenarios are best suited to use cases where clusters are configured together, sharing resources, and are generally treated as one infrastructural component within an organization.

Istio clusters and SMM clusters

When you are working with Service Mesh Manager in a multi-cluster scenario, you must understand the following concepts:

  1. Every Istio cluster you attach to the mesh is either a remote Istio cluster or a primary Istio cluster. Remote Istio clusters don’t have a separate Istio control plane, while primary Istio clusters do. To understand the difference between the remote Istio and primary Istio clusters, see the Istio control plane models document.
  2. When you install Service Mesh Manager on a cluster, it installs a primary Istio cluster. This cluster is effectively the primary Service Mesh Manager cluster.
  3. Even if you add multiple primary Istio clusters to the mesh, Service Mesh Manager runs only on the primary Service Mesh Manager cluster (even though some of its components are replicated to the other clusters).
  4. You can deploy Service Mesh Manager in an active-passive model. The active Service Mesh Manager control plane has all components installed on a primary Istio cluster. The passive Service Mesh Manager control plane has only a limited number of components installed on a primary or remote Istio cluster. Only one Service Mesh Manager control plane is active, all other Service Mesh Manager control planes are passive.

This means that when using the Service Mesh Manager CLI (for example, to attach or detach a new cluster), you must run it in the context of the active Service Mesh Manager cluster, even if there are multiple primary Istio clusters in the mesh.

Creating a multi-cluster mesh

Read the multi-cluster installation guide for details on how to set up a multi-cluster mesh.

2.7.1.1 - Cluster network

A multi-cluster mesh connects multiple clusters into a single service mesh. The topology of the mesh – how the different clusters are grouped into networks and how each cluster connects to the mesh – determines how the clusters connect to each other and how the pods, services, and workloads can access resources in other clusters.

Communication between clusters

In a multi-cluster mesh, every cluster belongs to a specific network. Clusters belonging to the same mesh can access the services of each other, but how this happens depends on which network the cluster belongs to.

  • If the clusters belong to the same network, their pods can access each other directly over a flat network, without using a cluster gateway.
  • If the clusters belong to different networks, the services of the cluster can be accessed only through the gateway of the cluster. Since Service Mesh Manager assigns each cluster to its own network by default, this is the default behavior.

The networkName label of the cluster determines which network the cluster belongs to. By default, every cluster belongs to its own network, where the name of the network is the name of the cluster.

Note: If the name of the cluster cannot be used as a Kubernetes resource name (for example, because it contains the underscore, colon, or another special character), you must manually specify a name to use when you are attaching the cluster to the service mesh. For example:

smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE> --name <KUBERNETES_COMPLIANT_CLUSTER_NAME> --active-istio-control-plane

Otherwise, the following error occurs when you try to attach the cluster:

could not attach peer cluster: graphql: Secret "example-secret" is invalid: metadata.name: Invalid value: "gke_gcp-cluster_region": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.'
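The constraint quoted in the error is the DNS-1123 subdomain rule. If in doubt, you can check a candidate cluster name locally before attaching; a quick sketch using grep (the is_dns1123 helper is illustrative, not part of the smm CLI):

```shell
# Return success if the argument is a valid DNS-1123 subdomain
# (lowercase alphanumerics, '-' and '.', starting and ending with an alphanumeric).
# Note: the full rule also caps the length at 253 characters.
is_dns1123() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

is_dns1123 "gke_gcp-cluster_region" && echo valid || echo invalid  # underscores: prints "invalid"
is_dns1123 "gke-gcp-cluster-region" && echo valid || echo invalid  # prints "valid"
```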

You can specify the network of the cluster when you are attaching the cluster to the mesh.

Assigning clusters to different networks allows you to optimize the topology of your mesh network. Depending on your cloud provider, there might be differences in cross-cluster latencies and transfer costs between the different connection types.

Network connectivity requirements

For a multi-cluster scenario the necessary networking configurations are listed in this section.

  • If the clusters belong to the same network, network connectivity works without additional configuration; nothing else needs to be done.
  • If the clusters belong to different networks, and the endpoints in the networks are publicly accessible without restrictions, then again nothing needs to be done.
  • If the clusters belong to different networks, but there are restrictions on which endpoints can be accessed, at least the following endpoints must be accessible for a proper multi-cluster setup with Service Mesh Manager:
    • From all clusters:
    • From the primary cluster(s):
      • All peer clusters' k8s API server address
      • All IP addresses or host names of the meshexpansion-gateway LoadBalancer type services on the peer clusters on port 15443
    • From peer clusters:
      • IP address or host name of the meshexpansion-gateway LoadBalancer type service on the primary cluster(s) on ports 15443,15012
      • IP address or host name of the meshexpansion-gateway LoadBalancer type service on the primary cluster where Service Mesh Manager is installed on ports 50600,59411

CAUTION:

To change the network of a cluster already attached to the mesh, you have to detach and then re-attach the cluster. Simply updating the networkName label is NOT enough. To detach a cluster, see Detach a cluster from the mesh.

2.7.1.2 - Attach a new cluster to the mesh

Service Mesh Manager automates the process of creating the resources necessary for the peer cluster, generates and sets up the kubeconfig for that cluster, and attaches the cluster to the mesh.

Note: If you are using Service Mesh Manager with a commercial license in a multi-cluster scenario, Service Mesh Manager automatically synchronizes the license to the attached clusters. If the peer cluster already has a license, it is automatically deleted and replaced with the license of the primary Service Mesh Manager cluster. Detaching a peer cluster automatically deletes the license from the peer cluster.

To attach a new cluster to the service mesh managed by Service Mesh Manager, complete the following steps. For an overview of the network settings of the cluster, see Cluster network.

Prerequisites

  • The Service Mesh Manager CLI tool installed on your computer.
  • Access to the KUBECONFIG file of the cluster you want to attach to the service mesh.
  • Access to the KUBECONFIG file of the cluster that runs the primary Service Mesh Manager service.
  • Network connectivity properly configured between the participating clusters.

Steps

  1. Find out the name of the network you want to attach the cluster to.

    • By default, every cluster belongs to its own network, where the name of the network is the name of the cluster.
    • If you want to attach the cluster to an existing network, you must manually specify the name of the network when you are attaching the cluster to the service mesh using the --network-name option in the next step.

    If you have to specify the network name manually, note the name of the network you want to use. You can check the existing network names using the smm istio cluster status command.

  2. On the primary Service Mesh Manager cluster, attach the peer cluster to the mesh using one of the following commands.

    Note: To understand the difference between the remote Istio and primary Istio clusters, see the Istio control plane models section in the official Istio documentation. The short summary is that remote Istio clusters do not have a separate Istio control plane, while primary Istio clusters do.

    The following commands automate the process of creating the resources necessary for the peer cluster, generate and set up the kubeconfig for that cluster, and attach the cluster to the mesh.

    • To attach a remote Istio cluster with the default options, run:

      smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE>
      
    • To attach a primary Istio cluster (one that has an active Istio control plane installed), run:

      smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE> --active-istio-control-plane
      

      Note: If the name of the cluster cannot be used as a Kubernetes resource name (for example, because it contains the underscore, colon, or another special character), you must manually specify a name to use when you are attaching the cluster to the service mesh. For example:

      smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE> --name <KUBERNETES_COMPLIANT_CLUSTER_NAME> --active-istio-control-plane
      

      Otherwise, the following error occurs when you try to attach the cluster:

      could not attach peer cluster: graphql: Secret "example-secret" is invalid: metadata.name: Invalid value: "gke_gcp-cluster_region": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.'
      
    • To override the name of the cluster, run:

      smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE> --name <kubernetes-compliant-cluster-name>
      
    • To specify the network name, run:

      smm istio cluster attach <PEER_CLUSTER_KUBECONFIG_FILE> --network-name <network-name>
      

    Note: If you are using Service Mesh Manager with a commercial license in a multi-cluster scenario, Service Mesh Manager automatically synchronizes the license to the attached clusters. If the peer cluster already has a license, it is automatically deleted and replaced with the license of the primary Service Mesh Manager cluster. Detaching a peer cluster automatically deletes the license from the peer cluster.

  3. Wait until the peer cluster is attached. Attaching the peer cluster takes some time, because it can complete only after the ingress gateway address is available. You can verify that the peer cluster is attached successfully with the following command:

    smm istio cluster status
    

    The process is finished when you see Available in the Status field of all clusters.

  4. (Optional) Open the Service Mesh Manager dashboard and verify that the new peer cluster is visible on the MENU > TOPOLOGY page.

2.7.1.3 - Detach a cluster from the mesh

To detach a cluster from the service mesh managed by Service Mesh Manager, complete the following steps.

Prerequisites

  • The Service Mesh Manager CLI tool installed on your computer.
  • Access to the KUBECONFIG file of the cluster you want to detach from the service mesh.
  • Access to the KUBECONFIG file of the cluster that runs the primary Service Mesh Manager service.

Steps

  1. On the primary Service Mesh Manager cluster, detach the peer cluster from the mesh by running the following command.

    smm istio cluster detach <PEER_CLUSTER_KUBECONFIG_FILE>
    
  2. Wait until the peer cluster is detached. You can check the status of peer clusters by running the following command:

    smm istio cluster status
    
  3. (Optional) Navigate to the MENU > MESH page of the Service Mesh Manager dashboard and verify that the cluster you have detached is not shown in the Clusters list.

2.7.1.4 - Cluster registry controller

Service Mesh Manager uses the cluster registry controller to synchronize any Kubernetes resources across the clusters in a multi-cluster setup. That way, the necessary resources are automatically synchronized, so the multi-cluster topologies of Istio and the multi-cluster features (for example, observability, multi-cluster topology view, tracing, traffic tapping) of Service Mesh Manager work in a multi-cluster environment.

In addition, you can use the resource synchronization capabilities of Service Mesh Manager to synchronize any Kubernetes resources on demand between the clusters of your mesh.

Overview

When installing Service Mesh Manager in imperative mode from the command line, Service Mesh Manager automatically deploys the cluster registry controller to every cluster of the mesh, and creates the Cluster CRs, with default values that are suitable for most common scenarios.

The Cluster resource represents a Kubernetes cluster. The cluster registry controller fills the status of the Cluster CR with cluster related metadata, and distributes the Cluster CRs to all participating Kubernetes clusters. In addition, the credentials for all clusters are automatically distributed to all clusters (these are usually stored in Kubernetes secrets) to help bootstrap the cluster group itself.

Note: You have to manually configure the Cluster CR or the operator’s Helm values file if your clusters have unique networking requirements, for example, to set the KubernetesAPIEndpoints of the cluster.

In such a multi-cluster setup, here is how the cluster registry controller works:

  • The controller only writes to the local cluster where it is deployed.
  • The controller only reads from peer clusters.

By default, the required resources are kept in sync between all clusters. You can define your own ResourceSyncRule resources to sync other Kubernetes resources between these clusters. The ResourceSyncRules can be further adjusted to specify from which clusters, and to which clusters, certain resources are synced.

Service Mesh Manager operator mode

When you are using Service Mesh Manager in operator mode in a multi-cluster environment, note the following points:

  1. You must explicitly enable the cluster registry in the ControlPlane CR or the operator’s Helm values file.

    Replace <cluster-name> with the name of your cluster. The cluster name format must comply with the RFC 1123 DNS subdomain/label format (alphanumeric string without “_” or “.” characters). Otherwise, you get an error message starting with: Reconciler error: cannot determine cluster name controller=controlplane, controllerGroup=smm.cisco.com, controllerKind=ControlPlane

    spec:
      clusterName: <cluster-name>
      clusterRegistry:
        enabled: true
        namespace: cluster-registry
    
  2. To create trust between the clusters, you must exchange the Secret CRs of the clusters. For an example, see GitOps - multi-cluster installation.

Networking requirements

The cluster registry controller instances running on the clusters must be able to reach the API server of every other cluster in the cluster group, so every cluster can read the relevant resources from the other clusters.

The cluster registry controller pod connects directly to the Kubernetes API server of the peer clusters. This works automatically if the API servers are publicly available. Otherwise, configure a reachable endpoint for them in the Cluster CR spec. (For security reasons, we recommend making the API server addresses available only from the IP ranges of the peer clusters.)
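As a sketch, a Cluster CR with a manually configured API server endpoint could look like the following. This is only illustrative: the cluster name, secret name, and endpoint address are placeholders, and the exact spelling of the endpoint fields (kubernetesApiEndpoints, serverAddress) should be verified against the Cluster CRD of your cluster registry controller version.

```yaml
apiVersion: clusterregistry.k8s.cisco.com/v1alpha1
kind: Cluster
metadata:
  name: demo-cluster-1
spec:
  authInfo:
    secretRef:
      # Secret holding the credentials for this cluster (hypothetical name)
      name: demo-cluster-1
      namespace: cluster-registry
  kubernetesApiEndpoints:
  # Address that the peer clusters can reach (assumed field names)
  - serverAddress: "https://private-api.example.com:6443"
```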

ResourceSyncRule example usage

Sync everywhere

  1. Create a sample secret on the third cluster, which will be copied around:

    apiVersion: v1
    kind: Secret
    metadata:
      name: test-secret
    data: {}
    
  2. Create a ResourceSyncRule on the first cluster to synchronize the secret to all clusters:

    apiVersion: clusterregistry.k8s.cisco.com/v1alpha1
    kind: ResourceSyncRule
    metadata:
      name: test-secret-sink
    spec:
      groupVersionKind:
        kind: Secret
        version: v1
      rules:
      - match:
        - objectKey:
            name: test-secret
            namespace: cluster-registry
    

    Both the ResourceSyncRule resource itself and the secret should appear shortly on all clusters of the cluster group.

    At this point, if the secret is deleted or modified on any of the clusters (except the one where it originates), the cluster registry controller immediately syncs it back.

Sync to a set of clusters

The cluster registry controller can be configured to sync only to specific clusters in the cluster group (instead of all of them). To do that, add the following annotation to the ResourceSyncRule, then delete the ResourceSyncRule from the clusters that you don’t want to sync to.

  1. Add the following annotation to the ResourceSyncRule on the first cluster:

    annotations:
      cluster-registry.k8s.cisco.com/resource-sync-disabled: "true"
    
  2. Delete the ResourceSyncRule from the second cluster.

    The ResourceSyncRule resource will not be recreated, because of the annotation you just added.

    If the annotation is not added as described in the previous step, the ResourceSyncRule is recreated.

  3. Delete the test-secret from the second cluster.

    The secret will not be recreated because the ResourceSyncRule resource does not exist on the second cluster.

Sync from a set of clusters

The cluster registry controller can be configured to sync only from specific clusters in the cluster group (instead of all of them). To do that, you must create a ClusterFeature resource on the clusters you want to sync from, and add a clusterFeatureMatch field to the ResourceSyncRule resources on the clusters you want to sync to.

  1. Add the following field to the ResourceSyncRule spec on the first cluster:

    clusterFeatureMatch:
    - featureName: test-secret-feature
    

    As a result, the secret is only synced from clusters that have a matching ClusterFeature resource defined.

    At this point, there is no ClusterFeature present on any cluster, so if the secret were deleted from the first cluster now, it would not be recreated.

  2. Apply the following ClusterFeature to the third cluster:

    apiVersion: clusterregistry.k8s.cisco.com/v1alpha1
    kind: ClusterFeature
    metadata:
      name: test-secret-source
    spec:
      featureName: test-secret-feature
    
  3. Delete the test-secret from the first cluster.

    It should be recreated now, because the cluster registry controller can sync the secret from the third cluster.

RBAC considerations

The cluster registry controller only writes to the local cluster and only reads from peer clusters. By default, it has access to read namespace, node, and secret resources. If you want to sync other resources, expand the RBAC rules of the operator as needed (it uses aggregated ClusterRoles).

  • On the cluster where the resources are read from (usually where ClusterFeature resources are present), define a ClusterRole with the necessary read permissions and add the following label:

    labels:
      cluster-registry.k8s.cisco.com/reader-aggregated: "true"
    
  • On the cluster where the resources are written to (usually where ResourceSyncRule resources are present), define a ClusterRole with the necessary write permissions and add the following label:

    labels:
      cluster-registry.k8s.cisco.com/controller-aggregated: "true"
    
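For example, to let the controller sync ConfigMap resources, the two aggregated ClusterRoles could be sketched as follows. The role names are hypothetical; adjust the rules to the resources you actually sync.

```yaml
# Reader role: apply on the cluster the resources are read from.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: configmap-reader
  labels:
    cluster-registry.k8s.cisco.com/reader-aggregated: "true"
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
# Writer role: apply on the cluster the resources are written to.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: configmap-writer
  labels:
    cluster-registry.k8s.cisco.com/controller-aggregated: "true"
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create", "update", "patch", "delete"]
```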

2.7.2 - Traffic management

2.7.2.1 - Circuit Breaking

Circuit Breaking is a pattern for creating resilient microservices applications. In microservices architecture, services are deployed across multiple nodes or clusters and have different response times or failure rates. Downstream clients need to be protected from excessive slowness of upstream services. Upstream services, in turn, must be protected from being overloaded by a backlog of requests.

A circuit breaker can have three states:

  • Closed: requests succeed or fail until the number of failures reaches a predetermined threshold, with no interference from the breaker. When the threshold is reached, the circuit breaker opens.
  • Open: the circuit breaker trips the requests, meaning that it returns an error without attempting to execute the call.
  • Half open: the failing service is given time to recover. If requests continue to fail in this state, the circuit breaker opens again and keeps tripping requests. If the requests succeed in the half open state, the circuit breaker closes and the service is allowed to handle requests again.

Circuit breaking in Istio Circuit breaking in Istio

Service Mesh Manager uses Istio’s - and therefore Envoy’s - circuit breaking feature under the hood.

Circuit breaking using the dashboard

Set circuit breaking

To configure a circuit breaker for a service, complete the following steps.

  1. Select the service on the TOPOLOGY or the SERVICES page. Destination rule overview Destination rule overview

  2. Select the DESTINATION RULE tab, then click CREATE NEW. Destination rule Destination rule

  3. If you have already configured a circuit breaker using the destination rule for the service and want to modify it, click Edit .

  4. Configure the template and parameters of the circuit breaker:

Set outlier detection

Outlier detection controls the eviction of unhealthy services from the load-balancing pool. OutlierDetection controls the number of errors before a service is ejected from the connection pool; using the template, you can also set the minimum ejection duration and the maximum ejection percentage.

  • To set the outlier detection for a service, select the Outlier Detection template in the Template dropdown. Outlier detection Outlier detection
  • Modify the YAML configuration to set the parameters for the outlier detection. Outlier Outlier
  • Once the YAML configuration is modified per the service needs, click Validate to check the correctness of the YAML resource.
  • Verify that there are no errors. Then, to create the circuit breaker with the destination rule for the selected service, click Create.
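The outlier detection parameters map to standard Istio DestinationRule fields. A minimal sketch (the payments service name and the threshold values are only illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payments
  namespace: smm-demo
spec:
  host: payments.smm-demo.svc.cluster.local
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5    # errors before a host is ejected
      interval: 30s              # analysis interval
      baseEjectionTime: 60s      # minimum ejection duration
      maxEjectionPercent: 100    # maximum ejection percentage
```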

Set connection pool

Connection pool settings set the volume of connections for a service. ConnectionPoolSettings controls the maximum number of requests, pending requests, retries, and timeouts.

  • To set the connection pool for a service, select the Connection Pool template in the Template dropdown. Connection pool Connection pool
  • Modify the YAML configuration to set the parameters for the connection pool. connection connection
  • Once the YAML configuration is modified per the service needs, click Validate to check the correctness of the YAML resource.
  • Verify that there are no errors. Then, to create the connection pool settings with the destination rule for the selected service, click Create.
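The connection pool parameters also map to standard Istio DestinationRule fields. A minimal sketch (service name and limits are illustrative); with maxConnections set to 1, a second simultaneous connection trips the breaker:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payments
  namespace: smm-demo
spec:
  host: payments.smm-demo.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1          # trips when a second connection is opened
      http:
        http1MaxPendingRequests: 1 # maximum queued requests
        maxRequestsPerConnection: 1
```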

Monitor circuit breaking

With the configuration you’ve just set, when traffic begins to flow from two connections simultaneously, the circuit breaker starts to trip requests. In the Service Mesh Manager UI, tripped requests appear as red edges in the graph on the MENU > TOPOLOGY page. Click the service to see two live Grafana dashboards, which specifically show the circuit breaker trips and help you learn more about the errors involved.

The first dashboard details the percentage of total requests that were tripped by the circuit breaker. When there are no circuit breaker errors, and your service works as expected, this graph shows 0%. Otherwise, it shows the percentage of the requests that were tripped by the circuit breaker.

The second dashboard provides a breakdown of the circuit breaker trips by source. If no circuit breaker trips occurred, there are no spikes in this graph. Otherwise, it shows which service caused the circuit breaker to trip, when, and how many times. This graph also helps you track down malicious clients.

Circuit Breaking trip Circuit Breaking trip

These are live Grafana dashboards customized to display circuit breaker-related information.

Remove circuit breaking

To remove circuit breaking, navigate to the service, select Destination rule for circuit breaker, and click Delete .

2.7.2.2 - External Services

An Istio service mesh has a few different ways of reaching services that are external to the mesh. External services are services that are not defined in Istio’s internal service registry, that is, services outside of the mesh. By default, Istio permits requests to unknown or external services. While this permissive configuration is acceptable for testing purposes, a stricter configuration is usually necessary in a production environment.

Control access to external services Control access to external services

Note: Service Mesh Manager uses Istio’s - and therefore Envoy’s - egress control feature under the hood.

Change the default policy

You can change the default policy for outbound traffic by running the smm istio outbound-traffic-policy <setting> command.

  • To restrict outbound traffic to known endpoints, run the following command.

    smm istio outbound-traffic-policy restricted
    

    Expected output:

    mesh wide outbound traffic policy is set to 'REGISTRY_ONLY'
    

    To permit access to an external service, see Allow access only to registered services.

  • To permit all outbound traffic without restrictions, run the following command. (This is the default setting.)

    smm istio outbound-traffic-policy allowed
    

    Expected output:

    mesh wide outbound traffic policy is set to 'ALLOW_ANY'
    

    Note: Running smm istio outbound-traffic-policy returns the current setting of the traffic policy. If you haven’t changed the outbound traffic policy yet, it returns “mesh wide outbound traffic policy is not found”, which means that the default Istio setting, ALLOW_ANY, is used, permitting outbound traffic without any restrictions.

Allow access only to registered services

To allow access only to registered external services, complete the following steps.

Note: Accessing external HTTPS services comes with a few constraints.

  • All HTTP-related information (such as the method, URL path, and response code) is encrypted, so Istio cannot see or monitor that information for HTTPS traffic.
  • The Service Mesh Manager dashboard shows HTTPS traffic as TCP, since detailed HTTP-related information is not available.
  1. Change the default outbound traffic policy to block unknown services.

    smm istio outbound-traffic-policy restricted
    
  2. Create ServiceEntry resources for the services you want to permit access to.

    ServiceEntry resources add additional entries into Istio’s internal service registry, so that auto-discovered services in the mesh can access/route to these manually specified services. A service entry describes the properties of a service (DNS name, VIPs, ports, protocols, endpoints). These services could be external to the mesh (for example, web APIs) or mesh-internal services that are not part of the platform’s service registry. For more information, see the documentation of the ServiceEntry resource.

    For example, the following command creates a ServiceEntry resource that allows HTTP access to the httpbin.org site from the smm-demo namespace.

    kubectl apply -f - <<EOF
    apiVersion: networking.istio.io/v1alpha3
    kind: ServiceEntry
    metadata:
      name: httpbin.org
      namespace: smm-demo
    spec:
      hosts:
      - httpbin.org
      - www.httpbin.org
      ports:
      - number: 80
        name: http
        protocol: HTTP
      resolution: DNS
      location: MESH_EXTERNAL
    EOF
    
  3. (Optional) Test that your pods can access the external service. For example, if you have installed the SMM demo application, you can change the notifications-v1 deployment by running:

    kubectl -n smm-demo set env deployment/notifications-v1 'REQUESTS=http://httpbin.org/get#1'
    

    Expected output:

    deployment.extensions/notifications-v1 env updated
    

    Once the notifications pods are restarted, the Service Mesh Manager Dashboard displays outgoing calls to httpbin.org

Note: To route outgoing traffic through an egress gateway, see Create egress gateway.

Remove access to an external service

To remove access to an external service, delete the ServiceEntry resource of the service, for example:

kubectl delete serviceentry -n smm-demo httpbin.org

Expected output:

serviceentry.networking.istio.io "httpbin.org" deleted

Traffic optimization with SD-WAN

Modern microservices applications rely on an efficient network. Also, these microservices often communicate not only among themselves but also with external services, which in many cases are offered by a third party. Optimizing the network between the local application components and the remote services they might be consuming is critical.

Fortunately, in the Istio service mesh these external dependencies are well defined, making it possible for modern Software-Defined Wide Area Network (SD-WAN) solutions to automatically consume information about those external application dependencies and optimize the connectivity between the service mesh and the external services.

Integrate Istio with SD-WAN Integrate Istio with SD-WAN

Why use SD-WAN with Istio

Istio allows you to improve the security of your infrastructure by managing access to external services. In addition to that, integrating Istio with an SD-WAN solution provides the following benefits:

  • Traffic optimization: automatically select the best path for the traffic
  • Minimized external service latency: Latency can be optimized per service and per location
  • Increased external service availability: SD-WAN provides transparent path failover in case of an error
  • No extra Istio configuration is needed: After an initial setup everything is automatic, based on the Istio ServiceEntry custom resources with MESH_EXTERNAL values

Integrate Istio with SD-WAN

To integrate your Service Mesh Manager deployment with Cisco SD-WAN, follow the documentation of the open source Egress-Watcher. The integration consists of the following high level steps:

  1. Configure Cisco vManage (a component of Cisco SD-WAN).
  2. Download and install Egress-Watcher on your primary Service Mesh Manager cluster.
  3. Configure Egress-Watcher.

2.7.2.3 - Fault Injection

Fault injection is a system testing method which involves the deliberate introduction of network faults and errors into a system. It can be used to identify design or configuration weaknesses, and to ensure that the system can handle faults and recover from error conditions.

With Service Mesh Manager, you can inject failures at the application layer to test the resiliency of the services. You can configure faults to be injected into requests that match specific conditions to simulate service failures and higher latency between services. There are two types of failures:

  • Delay adds a time delay before forwarding the requests, emulating various failures such as network issues, an overloaded upstream service, and so on.

  • Abort aborts the HTTP request attempts and returns error codes to a downstream service, giving the impression that the upstream service is faulty.

Service Mesh Manager uses Istio’s - and therefore Envoy’s - fault injection feature under the hood.

Fault injection using the UI

Set fault injection

To inject fault and test the resiliency of your service, complete the following steps.

  1. Select the service on the TOPOLOGY or the SERVICES page.

    Virtual service overview Virtual service overview

  2. Select VIRTUAL SERVICE, then click CREATE NEW.

    Virtual service Virtual service

  3. If you have already configured a fault injection configuration using the VirtualService for the service and want to modify it, click Edit .

  4. Configure the template and parameters of the fault injection:

Injecting an abort fault

HTTP abort fault injection prematurely aborts the request and returns the pre-specified error code set in the httpStatus field. You can specify the percentage of the requests to be aborted.

  • To set the HTTP aborts, select the Fault Injection Abort Template in the Template dropdown.

    Abort Abort

  • Modify the YAML configuration to set the parameters for the abort fault injection.

    Abort Abort

  • Once the YAML configuration is modified per the service needs, click Validate to check the correctness of the YAML resource.

  • Verify that there are no errors. Then, to create the abort fault injection with the virtual service rule for the selected service, click Create.
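Under the hood, an abort fault is a standard Istio VirtualService fault section. A minimal sketch (the notifications service, the status code, and the percentage are illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: notifications
  namespace: smm-demo
spec:
  hosts:
  - notifications
  http:
  - fault:
      abort:
        httpStatus: 503   # error code returned to the caller
        percentage:
          value: 50       # abort half of the requests
    route:
    - destination:
        host: notifications
```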

Injecting a delay fault

Delay fault injection injects latency into the request forwarding path. You can specify the percentage of the requests to be delayed, and use the fixedDelay field to specify the duration of the delay.

  • To introduce network latency or overload the upstream traffic, select the Fault Injection Delay Template in the Template dropdown.

    Delay Delay

  • Modify the YAML configuration to set the parameters for the delay fault injection.

    Delay Delay

  • Once the YAML configuration is modified per the service needs, click Validate to check the correctness of the YAML resource.

  • Verify that there are no errors. Then, to create the delay fault injection with the virtual service rule for the selected service, click Create.
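A delay fault uses the same Istio VirtualService fault section. A minimal sketch (service name, delay duration, and percentage are illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: notifications
  namespace: smm-demo
spec:
  hosts:
  - notifications
  http:
  - fault:
      delay:
        fixedDelay: 5s    # add five seconds of latency
        percentage:
          value: 50       # delay half of the requests
    route:
    - destination:
        host: notifications
```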

Remove fault injection

To remove fault injection, navigate to the service, then select VIRTUAL SERVICE and click Delete .

2.7.2.4 - Ingress gateways

Ingress gateways define an entry point into your Istio mesh for incoming traffic.

Multiple ingress gateways in Istio

You can configure gateways using the Gateway and VirtualService custom resources of Istio, and the IstioMeshGateway CR of Service Mesh Manager.

  • The Gateway resource describes the port configuration of the gateway deployment that operates at the edge of the mesh and receives incoming or outgoing HTTP/TCP connections. The specification describes a set of ports that should be exposed, the type of protocol to use, TLS configuration – if any – of the exposed ports, and so on. For more information about the gateway resource, see the Istio documentation.
  • The VirtualService resource defines a set of traffic routing rules to apply when a host is addressed. Each routing rule defines matching criteria for the traffic of a specific protocol. If the traffic matches a routing rule, then it is sent to a named destination service defined in the registry. For example, it can route requests to different versions of a service or to a completely different service than was requested. Requests can be routed based on the request source and destination, HTTP paths and header fields, and weights associated with individual service versions. For more information about VirtualServices, see the Istio documentation.
  • Service Mesh Manager provides a custom resource called IstioMeshGateway and uses a separate controller to reconcile gateways, allowing you to use multiple gateways in multiple namespaces. That way you can also control who can manage gateways, without having permissions to modify other parts of the Istio mesh configuration.

Using IstioMeshGateway, you can add Istio ingress or egress gateways in the mesh and configure them. When you create a new IstioMeshGateway CR, Service Mesh Manager takes care of configuring and reconciling the necessary resources, including the Envoy deployment and its related Kubernetes service.

Note: Service Mesh Manager automatically creates an ingress gateway called smm-ingressgateway and a mesh expansion gateway called istio-meshexpansion-cp-v115x. The smm-ingressgateway serves as the main entry point for the services of Service Mesh Manager (for example, the dashboard and the API), while the mesh expansion gateway is used in multi-cluster setups to ensure communication between the clusters for the Istio control plane and the user services.

Do not use these gateways for user workloads, because they are managed by Service Mesh Manager, and any change to their port configuration will be overwritten. Instead, create a new mesh gateway using the IstioMeshGateway custom resource.
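A minimal IstioMeshGateway sketch is shown below. The gateway name, namespace, and port values are illustrative, and the field names follow the open-source Istio operator’s IstioMeshGateway CRD; verify them against the CRD shipped with your Service Mesh Manager version.

```yaml
apiVersion: servicemesh.cisco.com/v1alpha1
kind: IstioMeshGateway
metadata:
  name: demo-gateway
  namespace: default
spec:
  istioControlPlane:
    # Reference to the Istio control plane (assumed name/namespace)
    name: cp-v115x
    namespace: istio-system
  type: ingress
  service:
    type: LoadBalancer
    ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 8080
```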

Service Mesh Manager provides gateway management and observability for your ingress and egress gateways.

For details on how to set up gateways using Service Mesh Manager, see Gateways.

2.7.2.5 - Mirroring

Traffic mirroring (also called shadowing) sends a copy of live traffic to a mirrored service. You can use it to test new versions of a service with real traffic before rolling them out to users with minimal risk, or to monitor and audit the traffic of existing services.

Mirroring using the UI

To configure mirroring from the dashboard, complete the following steps.

  1. Select the service on the MENU > SERVICES or the MENU > TOPOLOGY page.

  2. Select DESTINATION RULE > CREATE NEW.

  3. Use the Subsets template to configure subsets for the service to use as destinations to route and mirror traffic.

    subset subset

  4. Select VIRTUAL SERVICE > CREATE NEW.

    Virtual service Virtual service

  5. To configure the mirroring of traffic, start with the HTTP Destination Route Template. The mirror configuration is scoped to the route configuration. In the following example, 100% of the traffic to the bookings service is sent to the v1 workloads. The mirroring configuration indicates that 100% of the traffic should also be mirrored to the v2 workloads.

    Configure Configure

  6. Click Create. The mirroring rule is listed in the virtual service tab.

    Display Display
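The configuration described in step 5 corresponds to a standard Istio VirtualService along these lines (a sketch for the bookings example; the namespace and subset names are assumptions):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: bookings
  namespace: smm-demo
spec:
  hosts:
  - bookings
  http:
  - route:
    - destination:
        host: bookings
        subset: v1      # 100% of live traffic goes to v1
      weight: 100
    mirror:
      host: bookings
      subset: v2        # a copy of the traffic is mirrored to v2
    mirrorPercentage:
      value: 100
```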

2.7.2.6 - Restrict Outbound Traffic of Workloads

By default, each Envoy proxy receives information about every workload in the mesh. This can result in high memory usage in the Envoy proxies. Service Mesh Manager can help you limit the allowed outbound connections of a workload or a whole namespace to reduce the memory requirements, especially in larger meshes.

You can set a restriction manually, or you can rely on Service Mesh Manager to give you a recommended configuration based on the current network traffic. For details on how this works, see our blog post about the sidecar resource.

Service Mesh Manager uses Istio’s - and therefore Envoy’s - sidecar feature under the hood.
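Such a restriction is expressed as an Istio Sidecar resource. A sketch that limits a hypothetical payments-v1 workload to its own namespace and istio-system (labels and names are illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: payments-v1
  namespace: smm-demo
spec:
  workloadSelector:
    labels:
      app: payments
      version: v1
  egress:
  - hosts:
    - "./*"             # services in the workload's own namespace
    - "istio-system/*"  # the Istio control plane and telemetry services
```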

Restrict outbound traffic from the command line

The following sections describe how to manage outbound traffic restrictions using the smm command-line tool. If you want to use the Service Mesh Manager web interface instead, see Restrict outbound traffic using the UI.

Get an outbound traffic restriction recommendation

  1. To get a recommendation for a specific workload based on the current traffic, run the following command. (Replace the smm-demo namespace and the name of the workload as needed for your environment.)

    smm sidecar-proxy egress recommend smm-demo --workload payments-v1
    

    Sample output:

    Recommended sidecar egress rules for smm-demo/payments-v1
    
    Sidecar               Selector        Hosts                                                        Bind  Port  Capture Mode
    smm-demo-rmoy8  app="payments"  ./notifications.smm-demo.svc.cluster.local                   -
                        version="v1"    istio-system/istio-telemetry.istio-system.svc.cluster.local
    

    In this case, the recommended configuration only allows connections from the smm-demo/payments-v1 workload to the istio-telemetry service in the istio-system namespace and to the notifications service in the current namespace (from the perspective of the workload).

  2. To get a recommendation for the whole namespace (smm-demo in this case), run the following command.

    smm sidecar-proxy egress recommend smm-demo
    

    Sample output:

    Recommended sidecar egress rules for namespace smm-demo
    
    Sidecar               Selector  Hosts           Bind  Port  Capture Mode
    smm-demo-zy8fq            istio-system/*        -
                                    ./*
    

    In this case, the recommendation restricts connections to the current and istio-system namespaces.

  3. To apply the recommendations, run the same command again with the --apply switch, for example:

    smm sidecar-proxy egress recommend smm-demo --workload payments-v1 --apply
    

    or

    smm sidecar-proxy egress recommend smm-demo --apply
    

Restrict outbound traffic using the UI

To restrict outbound traffic using the Service Mesh Manager web interface, complete the following steps. If you want to use the Service Mesh Manager command line tool instead, see Restrict outbound traffic from the command line.

  1. Navigate to MENU > TOPOLOGY, or to MENU > WORKLOADS.

    • To restrict outbound traffic for a workload, select a workload.
    • To set outbound traffic restrictions for a namespace, click the name of the namespace (shown in capitals, for example, SMM-DEMO).

    Restrict outbound traffic for a namespace Restrict outbound traffic for a namespace

  2. Click PROXY CONFIG > Override rule .

  3. To get rule recommendations based on live traffic, click Automatic recommendation.

    Proxy config > Automatic recommendation Proxy config > Automatic recommendation

  4. To add a new rule manually, click Add, then select the destination namespace and service where you want to permit traffic.

  5. To activate your changes, click Apply. The restrictions you configured are shown on the PROXY CONFIG page.

    Egress proxy rules Egress proxy rules

2.7.2.7 - Routing

Note: This section describes the routing rules of in-mesh services. To configure routing rules for ingress gateways, see Routes and traffic management with Virtual Services.

One of the top features of Service Mesh Manager is the ability to fully configure how traffic flows in the service mesh. This kind of routing works in the application layer, and lets you configure sophisticated rules based on URIs, ports, or headers.

In Istio, routing is mostly described in Virtual Services, and then translated to Envoy configuration. Service Mesh Manager covers almost everything that you can describe with Virtual Services, and comes with easy-to-use templates for the various routing needs.

You can add routing or redirect rules for requests that match certain criteria, and configure rules such as retry policies, request timeouts, or fault injection in the VirtualService resource using the provided templates.

Rule precedence

Note the following points about how Service Mesh Manager evaluates the routing rules:

  • Rules are evaluated in top-down order.
  • Rules that match any traffic are always evaluated last, to help avoid rule shadowing.
  • Changing the order of rules is not supported in Service Mesh Manager.

When you specify multiple MATCH arguments, they have a logical OR relationship: the rule matches any request that fits one of the match rules. Within a match rule, you can specify multiple rules that have an AND relation. That way you can match requests against a specific URL and an HTTP method, for example.
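The OR/AND semantics described above map directly to the match list of an Istio VirtualService. The following hand-written sketch illustrates this; the resource name, namespace, and header values are placeholders, not output generated by Service Mesh Manager:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings-route   # placeholder name
  namespace: smm-demo
spec:
  hosts:
  - ratings
  http:
  - match:
    # Entries in the match list have an OR relationship.
    - uri:
        prefix: /ratings/v2/   # AND: both the URI prefix ...
      headers:
        end-user:
          exact: jason         # ... and this header must match.
    - uri:
        prefix: /ratings/health  # OR the request matches this entry.
    route:
    - destination:
        host: ratings
```

A request is selected by this rule if it matches either entry of the match list; within the first entry, both the URI prefix and the end-user header must match.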

Set routing using the UI

To create a new routing rule, create a new VirtualService from the dashboard by completing the following steps. You can also edit or delete VirtualServices, and view the full YAML configuration of a virtual service. By default, a new rule created with the VirtualService resource matches every incoming request.

Note: Rules are evaluated in top-down order. For more details, see Rule precedence.

  1. Select the service on the MENU > SERVICES or the MENU > TOPOLOGY page.

  2. Select VIRTUAL SERVICE > CREATE NEW.

  3. Select a template based on your needs. Learn more about the VirtualServices resource templates here.

    • By default, the new rule matches every incoming request. When you specify multiple match arguments, they have a logical OR relationship: the rule matches any request that fits one of the match rules. Within a match rule, you can specify multiple rules that have an AND relation. That way, you can match requests against a specific URL and an HTTP method, for example.

      For example, using the following template, you can create a rule that matches only requests where the URL path starts with /ratings/v2/ and the request contains a custom end-user header with the value jason. To add custom matches to select only specific traffic for the rule based on scheme, method, URI, host, port, or authority, use HTTP Request Template.

      HTTP request template

    • You can route the requests to a specific service. To route a portion of the traffic to a different destination, select HTTP Route Destination Template and use the weight parameter to split the traffic between multiple destination services.

      HTTP route destination

    • Alternatively, you can use the HTTP Redirect template to redirect the traffic to a specific URI. Redirect rules overwrite the Path portion of the URL with this value. Note that the entire path is replaced, irrespective of the request URI being matched as an exact path or prefix.

      HTTP redirect template

    • Set the timeout and retry options as needed for your environment using HTTP Retry Template.

      HTTP retry template

    • Set the rewrite option to rewrite specific parts of the HTTP request before forwarding the request to the destination using HTTP Rewrite Template.

      HTTP rewrite template

  4. Click Create. The new rule appears on the VIRTUAL SERVICES tab. You can later edit or delete the routing rule by clicking the Edit or Delete icons, respectively.

Virtual service templates

The following sections describe the usage of the different templates in VIRTUAL SERVICES.

Cors policy

To set the Cross-Origin Resource Sharing (CORS) policy, use the Cors policy template. With the CORS policy template, you can set the parameters that let a server indicate which origins (domain, scheme, or port) other than its own are allowed to access its resources.

Cors policy
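The template fields correspond to the corsPolicy section of an Istio VirtualService route. A minimal sketch, with placeholder service name and origin:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: httpbin-cors   # placeholder name
  namespace: smm-demo
spec:
  hosts:
  - httpbin
  http:
  - route:
    - destination:
        host: httpbin
    corsPolicy:
      allowOrigins:                  # origins allowed to make cross-origin requests
      - exact: https://example.com
      allowMethods:
      - GET
      - POST
      allowCredentials: false
      maxAge: 24h                    # how long browsers may cache the preflight result
```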

Delegate

To forward traffic to a separate, delegate VirtualService that defines the actual routing rules, use the Delegate template.

Delegate template

Delegate route

Use the Delegate route template to define the routing rules in the VirtualService whose name is referenced in the Delegate template.

Delegate route template

Destination route

To set the destination service to which requests are sent after a routing rule is processed, use the Destination route template.

Destination route template

Fault injection abort

To inject faults in a service to test its resiliency by introducing an HTTP abort fault, use the Fault injection abort template. To learn more, see Abort fault injection.

Fault injection abort template
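In the underlying VirtualService, an abort fault looks like the following sketch (the service name and percentage are placeholders):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: movies-abort   # placeholder name
  namespace: smm-demo
spec:
  hosts:
  - movies
  http:
  - fault:
      abort:
        httpStatus: 503   # returned to the client instead of forwarding
        percentage:
          value: 10       # abort 10% of the requests
    route:
    - destination:
        host: movies
```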

Fault injection delay

To inject faults in a service to test its resiliency by injecting delays in microservices, use the Fault injection delay template. To learn more, see Delay fault injection.

Fault injection delay template
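A delay fault differs from an abort only in the fault section; a sketch with placeholder values:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: movies-delay   # placeholder name
  namespace: smm-demo
spec:
  hosts:
  - movies
  http:
  - fault:
      delay:
        fixedDelay: 5s   # artificial latency added before forwarding
        percentage:
          value: 10      # delay 10% of the requests
    route:
    - destination:
        host: movies
```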

Header manipulation

To specify the header manipulation rules for route destinations, use the Header manipulation template.

Header manipulation template
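Header manipulation rules translate to the headers section of a VirtualService route; the header names and values below are placeholders:

```yaml
  http:
  - headers:
      request:
        set:
          x-environment: staging   # overwrite or create this request header
        remove:
        - x-debug                  # strip this header before forwarding
      response:
        add:
          x-served-by: smm-demo    # append to responses sent back to the client
    route:
    - destination:
        host: movies               # placeholder service
```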

HTTP direct response

To send a fixed HTTP response (status code and body) directly to the clients without forwarding the request, use the HTTP direct response template.

HTTP direct response template

HTTP redirect

You can use the HTTP Redirect template to redirect the traffic to a specific URI. Redirect rules overwrite the Path portion of the URL with this value. Note that the entire path is replaced, irrespective of the request URI being matched as an exact path or prefix.

HTTP redirect template
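The following sketch shows the replace-entire-path behavior described above; the paths and authority are placeholders:

```yaml
  http:
  - match:
    - uri:
        prefix: /old-docs          # matched as a prefix ...
    redirect:
      uri: /docs                   # ... but the whole path is replaced with this value
      authority: docs.example.com  # optionally rewrite the Host/authority as well
```

For example, a request for /old-docs/getting-started is redirected to /docs, not to /docs/getting-started.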

HTTP request

The new rule matches every incoming request. When you specify multiple match arguments, they have a logical OR relationship: the rule matches any request that fits one of the match rules. Within a match rule, you can specify multiple rules that have an AND relation. That way, you can match requests against a specific URL and an HTTP method.

HTTP request template

HTTP retry

Set the TIMEOUT and RETRY options as needed for your environment using the HTTP Retry Template.

HTTP retry template
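The timeout and retry options correspond to the following VirtualService fields; the values shown are placeholders:

```yaml
  http:
  - route:
    - destination:
        host: ratings              # placeholder service
    timeout: 10s                   # overall request timeout
    retries:
      attempts: 3                  # retry up to 3 times
      perTryTimeout: 2s            # timeout for each individual attempt
      retryOn: 5xx,gateway-error   # conditions that trigger a retry
```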

HTTP rewrite

Set the rewrite option to rewrite specific parts of the HTTP request before forwarding the request to the destination using the HTTP Rewrite Template.

HTTP rewrite template
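Unlike a redirect, a rewrite changes the request in-flight and still forwards it to a destination. A sketch with placeholder paths and service name:

```yaml
  http:
  - match:
    - uri:
        prefix: /api/v1
    rewrite:
      uri: /v1          # /api/v1/foo is forwarded as /v1/foo
    route:
    - destination:
        host: backend   # placeholder service
```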

HTTP route destination

You can route the requests to a specific service. To route a portion of the traffic to a different destination, select HTTP Route Destination Template and use the weight parameter to split the traffic between multiple destination services. Learn more about the route destination here.

HTTP route destination
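Weighted traffic splitting maps to multiple route destinations whose weights sum to 100; the subsets referenced below are placeholders and would be defined in a DestinationRule:

```yaml
  http:
  - route:
    - destination:
        host: reviews
        subset: v1   # subset defined in a DestinationRule
      weight: 90     # 90% of the traffic
    - destination:
        host: reviews
        subset: v2
      weight: 10     # 10% of the traffic
```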

TCP route

To set match conditions and actions for routing TCP traffic, use the Tcp Route Template. Learn more about the TCP route here.

Tcp route template

TLS route

To create ingress with your own domain, use the Tls Route Template. Learn more about the Tls route here.

Tls route template

2.7.2.8 - Sidecar injection

Service Mesh Manager allows you to configure automatic sidecar injection on the namespace level.

For details on configuring the sidecar proxies, see Restrict Outbound Traffic of Workloads.

Set sidecar injection using the UI

To set automatic sidecar injection using the web interface, complete the following steps.

  1. Navigate to MENU > TOPOLOGY.

  2. Click the name of the namespace you want to modify, for example, SMM-DEMO. A sidebar opens.

    Enable automatic sidecar injection

  3. Click DISABLE or ENABLE to disable or enable automatic sidecar injection.

    Note: Automatic sidecar injection happens when the pod is created. Changing this setting does not affect existing pods. To update existing pods, delete them manually, or update all deployments of the namespace by running kubectl rollout restart deployment -n <name-of-namespace>

Set sidecar injection from the command line

To enable automatic sidecar injection on a namespace, run the following command:

smm sidecar-proxy auto-inject on <name-of-namespace>

Expected output:

INFO[0006] auto sidecar injection successfully set to namespace

Note: Automatic sidecar injection happens when the pod is created. Changing this setting does not affect existing pods. To update existing pods, delete them manually, or update all deployments of the namespace by running kubectl rollout restart deployment -n <name-of-namespace>

CAUTION:

Adding the istio-injection label to the namespace does not trigger sidecar injection, because Service Mesh Manager uses versioned control planes. We recommend using the smm sidecar-proxy auto-inject command. Alternatively, you can set the istio.io/rev=cp-v115x.istio-system label manually by running:

kubectl label ns <namespace-to-label> istio.io/rev=cp-v115x.istio-system

To disable automatic sidecar injection on a namespace, run the following command:

smm sidecar-proxy auto-inject off <name-of-namespace>

Expected output:

INFO[0006] auto sidecar injection removed from to namespace

OpenShift

To invoke the istio-cni plugin on an OpenShift cluster, a NetworkAttachmentDefinition object must be present in every namespace that has sidecar injection enabled. The Calisti UI and the smm sidecar-proxy command automatically deploy a NetworkAttachmentDefinition instance to the namespace where you configure automatic sidecar injection.

To verify that the NetworkAttachmentDefinition object has been successfully deployed, run:

kubectl get network-attachment-definition -n <name-of-namespace>

2.7.3 - Istio configuration validation

If you’re an active Istio user, then there’s a good chance that Istio’s configuration reference is bookmarked in your browser, and that you’ve read the pages on VirtualServices and ServiceEntries over and over, but still have to struggle to set up even simple configurations in your mesh.

Istio’s custom resource configuration is very powerful and flexible, but infamous for being overly complex. At its worst, its YAML consists of lists of lists, cross-references, conflicting fields, and wildcards.

Even though Istio’s maintainers are aware of this hyper-complexity, and - at least in the last few releases - have tried to bring user friendliness into focus, Istio still routinely strands us in quagmires of minutia and uncertainty. We’re down to ~25 custom resources from ~50 a year ago, and some now have useful CLI features like istioctl analyze, but we feel that there’s more to be done.

That’s why we’ve added our own validation subsystem. The Service Mesh Manager service mesh platform maintains total compatibility with upstream Istio, but also extends its feature set, while avoiding lock-in through a new abstraction layer. A good example of this is its validation subsystem, which takes Istio’s validation system to a whole new level. It does this by considering the cluster state, as a whole, rather than just Istio’s configuration.

Istio configuration validation in Service Mesh Manager

Validation results can be seen on the MAIN MENU > OVERVIEW page of the UI:

Validation

Click the Show YAML configuration icon to display the configuration file. In case of validation errors, the relevant parts are highlighted.

Validation

You can check the validation results from the command line as well:

smm analyze
✓ 0 validation errors found

You can also run the validation for a specific namespace, for example:

smm analyze --namespace istio-system

The analyze command can also produce JSON output, for example:

smm analyze --namespace istio-system -o json

A sample error output in JSON:

{
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo1": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v115x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo1",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ],
  "gateway.networking.istio.io:master:istio-system:demo-gw-demo2": [
    {
      "checkID": "gateway/reused-cert",
      "istioRevision": "cp-v115x.istio-system",
      "subjectContextKey": "gateway.networking.istio.io:master:istio-system:demo-gw-demo2",
      "passed": false,
      "error": {},
      "errorMessage": "multiple gateways configured with same TLS certificate"
    }
  ]
}

Validation examples

Service Mesh Manager performs many validation checks on various aspects of the configuration, both syntactic and semantic. The validation checks are constantly curated, and new checks are added with every release. The following examples show how helpful this feature is.

Sidecar injection template validation

This check validates whether there are any pods within the environment that run with an outdated sidecar proxy image or configuration. In this example, the global configuration setting of the sidecar proxy image was changed from banzaicloud/istio-proxyv2:1.7.3-bzc to banzaicloud/istio-proxyv2:1.7.3-bzc.1.

smm analyze --namespace smm-demo

An error output looks like this:

destinationrule smm-demo/movies:
  Cluster: ex7gkhfn49gi5
  Error: missing mesh policy
    Control Plane: cp-v115x.istio-system
    Error ID: destinationrule/enabled-mtls/destinationrule/enabled-mtls/missing-mesh-policy
    Severity: error
    Path: host
    Context:
      hostname: movies

✗ 1 validation error found

This helps operators to get information about outdated proxies within the environment.

Gateway port protocol configuration conflict validation

This example demonstrates a check for the common mistake of setting conflicting port configuration in different Gateway resources, which won’t be denied by Istio’s built-in validation, but can cause unwanted behavior at ingress. The 9443 port for the same ingress gateway has been set to TCP in one resource, and set to TLS in another.

The following YAMLs were applied:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-port-conflict-01
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo1.example.com
    port:
      name: tcp
      number: 9443
      protocol: TCP
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-port-conflict-02
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo2.example.com
    port:
      name: tls
      number: 9443
      protocol: TLS
    tls:
      serverCertificate: /certs/cert.pem
      privateKey: /certs/key.pem
      mode: SIMPLE

Check the configuration’s validity by running the CLI tool’s analyze command.

smm analyze --namespace istio-system

The output shows the issue exactly, and provides all the information necessary for the operator to quickly pinpoint the problem in the configuration.

gateway istio-system/demo-gw-port-conflict-01:
    Cluster: master
    Error: Conflicting gateway port protocols
        Control Plane: v115x.istio-system
        Error ID: gateway/port/gateway/port/protocol-conflict
        Path: servers[0]
        Context:
            port: 9443
            protocol: TCP

gateway istio-system/demo-gw-port-conflict-02:
    Cluster: master
    Error: Conflicting gateway port protocols
        Control Plane: cp-v115x.istio-system
        Error ID: gateway/port/gateway/port/protocol-conflict
        Path: servers[0]
        Context:
            port: 9443
            protocol: TLS

✗ 2 validation errors found

Multiple gateways with the same TLS certificate validation

Configuring more than one gateway, using the same TLS certificate, causes browsers that leverage HTTP/2 connection reuse (that is, most browsers) to produce 404 errors when accessing a second host after a connection to another host has already been established.

You can read more about this issue in the Istio docs.

Let’s apply the following resources to demonstrate how this issue works:

apiVersion: servicemesh.cisco.com/v1alpha1
kind: IstioMeshGateway
metadata:
  labels:
    app: demo-gw
  name: demo-gw
  namespace: istio-system
spec:
  istioControlPlane:
    name: cp-v115x
    namespace: istio-system
  deployment:
    metadata:
      labels:
        app: demo-gw
    replicas:
      min: 1
      max: 1
      count: 1
  service:
    ports:
      - name: http2
        port: 80
        protocol: TCP
        targetPort: 8080
      - name: https
        port: 443
        protocol: TCP
        targetPort: 8443
    type: LoadBalancer
  runAsRoot: true
  type: ingress
---
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: example-wildcard-cert
  namespace: istio-system
spec:
  secretName: example-wildcard-cert
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  commonName: "test wildcard certificate"
  isCA: false
  keySize: 2048
  keyAlgorithm: rsa
  keyEncoding: pkcs1
  usages:
    - server auth
  dnsNames:
  - "*.example.com"
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-tls-conflict-01
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo1.example.com
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: example-wildcard-cert
      httpsRedirect: false
      mode: SIMPLE
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: demo-gw-tls-conflict-02
  namespace: istio-system
spec:
  selector:
    app: demo-gw
    gateway-name: demo-gw
    gateway-type: ingress
  servers:
  - hosts:
    - demo2.example.com
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: example-wildcard-cert
      httpsRedirect: false
      mode: SIMPLE

The following resources were created:

  • an ingress gateway
  • an *.example.com wildcard certificate
  • two Gateway resources, both of which specify the same wildcard cert

Check the configuration’s validity by running the analyze command in the CLI tool.

smm analyze --namespace smm-system
gateway istio-system/demo-gw-demo1:
    Cluster: master
    Error: multiple gateways configured with same TLS certificate
        Control Plane: cp-v115x.istio-system
        Error ID: gateway/reused-cert/gateway/reused-cert
        Path: port[443]
        Context:
            reusedCertificateSecret: secret:master:istio-system:example-wildcard-cert

gateway istio-system/demo-gw-demo2:
    Cluster: master
    Error: multiple gateways configured with same TLS certificate
        Control Plane: cp-v115x.istio-system
        Error ID: gateway/reused-cert/gateway/reused-cert
        Path: port[443]
        Context:
            reusedCertificateSecret: secret:master:istio-system:example-wildcard-cert

✗ 2 validation errors found

2.8 - Convenience features

2.8.1 - Tracking Service Level Objectives (SLOs)

This section gives you a very brief overview of Service Level Objectives (SLOs), error budgets, and other related concepts, and shows you how to track them using Service Mesh Manager.

For a more detailed introduction to these concepts, see the Tracking and enforcing SLOs on Kubernetes webinar blog post.

Even though Kubernetes is designed for fault tolerance, errors do happen. You can define a level of service and track your compliance to it using service level indicators (SLIs), service level objectives (SLOs), and error budgets. These are based on telemetry (mostly monitoring) information, so the most important thing before adopting an SLO model is to have meaningful, appropriate metrics and a proper and stable monitoring system in place.

Terminology

Let’s review the related terminology.

Service Level Indicator (SLI)

A Service Level Indicator (SLI) is a carefully defined quantitative measure of some aspect of the level of service that is provided. Basically, that’s what you measure as a level of service, for example:

  • the success rate of HTTP requests,
  • the percentage of requests below a certain latency threshold,
  • the fraction of time when a service is available, or
  • any other metrics that somehow describe the state of the service.

It’s usually a good practice to formulate the SLI as the ratio of two numbers: the good events divided by the total events. This way the SLI value is between 0 and 1 (or 0% and 100%), and it’s easily matched to the Service Level Objective (SLO) value that’s usually defined as a target percentage over a given timeframe. The previous examples all follow this practice.

Service Level Objective

The Service Level Objective (SLO) is a target value or range of values for a service level that is measured by an SLI. It defines the minimum level of reliability that the users of your service can expect. For example, if you want to have a 99.9% HTTP success rate, then your SLO is 99.9%.

An important aspect of the SLO is the period where it’s interpreted. An SLO can be defined for a rolling period, or for a calendar window. Usually, an SLO refers to a longer period, like a month, or 4 weeks. It’s always a good practice to continuously improve your SLOs based on the current performance of your system.

Compliance

Compliance measures the current performance of the system and is measured against your SLO. For example, if you have a 99.9% SLO goal for a 4 week period, then compliance is the exact measurement based on the same SLI, for example, 99.98765%.

Error budget

The error budget is a metric that determines how unreliable the service is allowed to be within a period of time.

The remaining error budget is the difference between the SLO and the actual compliance in the current period. If you have an SLO of 99.9% for a certain period, you have an error budget of 0.1% for that same period. If the compliance is 99.92% at the end of the period, it means that the remaining error budget is 20%.

For example: if you expect 10 million requests this month, and you have a 99.9% SLO, then you’re allowed 10,000 requests to fail. These 10,000 requests are your error budget. If a single event causes 2,000 requests to fail, it burned through 20% of your error budget.

Burn rate

Burn rate is how fast the service consumes the error budget, relative to the SLO.

A burn rate of 1 for the whole SLO period means that you’ve steadily burned through exactly 100% of your error budget during that period. A burn rate of 2 means that you’re burning through the budget twice as fast as allowed, so you’ll exhaust your budget at halftime of the SLO period (or you’ll have twice as many failures as allowed by the SLO by the end of the period). The burn rate can be interpreted even for a shorter period than your SLO period, and it is the basis of a good alerting system.
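These relationships can be written compactly. The formulation below is the common SRE convention (an assumption here, not necessarily the exact computation Service Mesh Manager performs internally):

```latex
\text{error budget} = 1 - \text{SLO},
\qquad
\text{burn rate} = \frac{1 - \text{compliance}}{1 - \text{SLO}}
```

With a 99.9% SLO, a measured compliance of 99.8% over the same window gives a burn rate of $(1 - 0.998)/(1 - 0.999) = 2$, i.e. the budget is consumed twice as fast as allowed.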

Track SLOs with Service Mesh Manager

Service Mesh Manager displays the Service Level Objectives (SLOs) configured to a service both in the drill down view and on the Services page.

SLO overview

As an overview, Service Mesh Manager displays the following information for every SLO configured for the service:

  • NAME: The name and the description of the SLO.
  • CURRENT BURN RATE: The current burn rate of the SLO.
  • ERROR BUDGET CONSUMED: The percentage of the error budget already consumed for the period.
  • ALERTS: The currently firing and the total number of alerts configured for the SLO.

Click on the name of an SLO to display its details. You can also modify an existing SLO, or create a new SLO.

Details of an SLO

Clicking on the name of a Service Level Objective displays its details.

SLO details

  • Time Window: The time period used for the SLO, for example, 30 days.

  • Type: The type of the SLO: rolling or calendar-based.

  • Target: The target success rate of the SLO.

  • DESCRIPTION: The description of the Service Level Objective.

  • NAMESPACE: The namespace of the service.

  • SERVICE: The name of the service the SLO applies to.

  • LABELS: The labels of the SLO.

  • SLI: Shows an overview of the Service Level Indicator (SLI). Click the Show YAML configuration icon next to the name of the SLI to display its YAML configuration.

  • Alerting Policies: The list of alerting policies defined for the service in Prometheus Alertmanager. To display the alerts in Prometheus, click the Open metrics in Grafana icon next to the Alerting Policies label. For each Alerting Policy, the following information is displayed.

    • NAME: The name of the Alerting Policy.

    • LOOKBACK WINDOWS: The primary and the control time-frame for which the burn rate threshold is calculated.

    • ALERT AFTER: The period for which the burn rate must be above the threshold to trigger the alert.

    • THRESHOLD: The BURN RATE THRESHOLD above which the alert is triggered.

    • SEVERITY: The severity of the alert.

    • STATE: The current state of the alert:

      • inactive: the service works within the specified thresholds
      • pending: the service works outside of the specified thresholds, but the period set in the ALERT AFTER field has not yet passed
      • firing: the service is outside of the specified thresholds, an alert is triggered

    You can modify an existing policy, or create a new Alerting Policy. Click the Show YAML configuration icon next to the name of the alerting policy to display its YAML configuration.

    To get an overview of the burn rates of the configured alerting policies, select ALERTING STRATEGY. The diagram shows when the first alert will fire, depending on the magnitude of the outage.

    Alerting strategies

    As burn rates are helpful tools for differentiating the alerting strategy based on the severity of a given outage, see the Burn Rate Based Alerting Demystified blog post for best practices on setting up burn-rate based alerts.

  • Metrics: Displays the Burn rate, the Error Budget Consumed, and the Compliance for the selected time period.

Click the Show YAML configuration icon next to the name of the SLO to display its YAML configuration.

YAML configuration of an SLO

Error budget and the health indicator

The health system of Service Mesh Manager is an outlier detection system. A service is considered unhealthy if it behaves differently than it did in the past. This means that it’s possible that a service exceeds its error budget but Service Mesh Manager shows it as healthy. This can happen for the following reasons:

  • Your service has exceeded its error budget in the past, but the issue has been resolved and now your service is behaving as expected.
  • The SLOs are misconfigured, for example, you are over the error budget even though your budget was consumed linearly. In this case there are no outliers in the behavior of the service, therefore it is considered healthy.

Further information

2.8.1.1 - Create new Service Level Objectives (SLOs)

The Service Level Objective (SLO) is a target value or range of values for a service level that defines the minimum level of reliability. To create a new Service Level Objective, complete the following steps.

  1. Navigate to MENU > Services and find the service you want to modify.

  2. Click on the service to display its details.

  3. In the Service Level Objectives section, click CREATE NEW.

    Create new SLO

  4. In the SLI field, select the Service Level Indicator template you want to use. This is just a template, you can modify its parameters as needed for your environment. After selecting an SLI, its description is displayed under the name of the SLI. Click the Show YAML configuration icon next to the name of the SLI to display its YAML configuration. (If you want to create a new SLI using a custom resource, see Service Level Indicator template.)

    Set SLO parameters

  5. Enter where you want to measure the SLI into the REPORTER field. Enter source to measure the metric at the source Envoy proxy, or destination to measure the metric at the target Envoy proxy.

  6. Enter the target level of reliability of the SLO into the TARGET field. The diagram shows the currently selected target and the value of the SLI for the selected period.

  7. Select the type of SLO you want to use in the WINDOW TYPE field. Select Calendar for fixed calendar SLOs, or Rolling for continuous periods, and set the length of the window.

  8. Enter a name for the SLO into the Name field, then click CREATE.

Further information

2.8.1.2 - Create new Alerting Policy

Alerting Policies define predictive alerts to ensure compliance with your Service Level Objectives (SLOs). To create a new Alerting Policy, complete the following steps.

  1. Navigate to MENU > Services and find the service to which you want to add an Alerting Policy.

  2. Click on the service to display its details.

  3. Select the SLO to which you want to add a new Alerting Policy. If there are no SLOs defined for the service, you must first create a new Service Level Objective.

  4. In the Alerting Policies section, click CREATE NEW.

    Create new Alerting Policy

  5. In the SLI field, select the Service Level Indicator template you want to use. This is just a template, you can modify its parameters as needed for your environment. After selecting an SLI, its description is displayed under the name of the SLI. Click the Show YAML configuration icon next to the name of the SLI to display its YAML configuration.

    Set Alerting Policy parameters

  6. Enter the BURN RATE THRESHOLD above which you want to alert.

  7. Enter the time-frame for which to calculate the burn rate threshold into the PRIMARY LOOKBACK WINDOW field.

  8. (Optional) Configure a CONTROL LOOKBACK WINDOW to ensure that the alert is triggered while the error budget is being consumed.

  9. (Optional) To alert only if the burn rate is above the specified threshold for a period, set the ALERT AFTER field. That way you can avoid triggering the alert for short peaks if otherwise the burn rate is normal.

  10. Select the SEVERITY of the alert (ticket or page). You can use this field to route the alert in Prometheus Alertmanager.

  11. Enter a NAME for the Alerting Policy, then click CREATE.

AlertingPolicy CR reference

This section describes the fields of the AlertingPolicy custom resource.


apiVersion (string)

Must be sre.smm.cisco.com/v1alpha1

kind (string)

Must be AlertingPolicy

spec (object)

The configuration and parameters of the resource.

spec.burnrate (object)

Specifies the burn rate of the alerting policy.

conditionMetDuration (string)

To alert only if the burn rate is above the specified threshold for a period, set the conditionMetDuration field. That way you can avoid triggering the alert for short peaks if otherwise the burn rate is normal.

lookBackWindow (string)

The time-frame for which to calculate the burn rate threshold.

secondaryWindow (string)

A control lookback window to ensure that the alert is triggered while the error budget is being consumed.

severity (string)

The severity of the alert (ticket or page). You can use this field to route the alert in Prometheus Alertmanager.

sloRef (object)

The name and namespace of the Service Level Objective that the alerting policy will alert on. For example:

    sloRef:
      name: movies-30d-rolling-availability
      namespace: smm-demo

threshold (string)

The burn rate threshold above which you want to alert.

status (object)

The current state of the resource. This object is managed by Service Mesh Manager.
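Assembled from the fields above, an AlertingPolicy could look like the following sketch. Note that this is a hand-written illustration: the metadata name and all values are hypothetical, and the placement of the listed fields under spec.burnrate is an assumption based on the field reference above:

```yaml
apiVersion: sre.smm.cisco.com/v1alpha1
kind: AlertingPolicy
metadata:
  name: movies-availability-page   # hypothetical name
  namespace: smm-demo
spec:
  burnrate:                  # assumed nesting of the fields described above
    threshold: "14.4"        # alert above this burn rate
    lookBackWindow: 1h       # primary window for the burn rate calculation
    secondaryWindow: 5m      # control window
    conditionMetDuration: 2m # ignore short peaks
    severity: page           # routes the alert in Prometheus Alertmanager
    sloRef:
      name: movies-30d-rolling-availability
      namespace: smm-demo
```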

Further information

2.8.1.3 - Service Level Indicator template

Service Level Indicators (SLIs) in Service Mesh Manager are based on the ServiceLevelIndicatorTemplate custom resource. This resource describes the PromQL queries that calculate the number of goodEvents and totalEvents using Go templated strings. Note that ServiceLevelIndicatorTemplate is a namespaced custom resource. To use the Service Level Indicator in a Service Level Objective, you have to reference both the name and the namespace of the Service Level Indicator in the ServiceLevelObjective custom resource.


To make a custom SLI available in Service Mesh Manager, create a custom ServiceLevelIndicatorTemplate resource, then apply it to your cluster. For example:

kubectl apply --namespace smm-demo -f sli-example.yaml
servicelevelindicatortemplate.sre.smm.cisco.com/http-requests-success-rate-demo created

For a more detailed tutorial, see the Defining application level SLOs using Service Mesh Manager blog post.

ServiceLevelIndicatorTemplate CR reference

This section describes the fields of the ServiceLevelIndicatorTemplate custom resource.

apiVersion (string)

Must be sre.smm.cisco.com/v1alpha1

kind (string)

Must be ServiceLevelIndicatorTemplate

spec (object)

The configuration and parameters of the Service Level Indicator.

spec.description (string)

A human-readable description of the SLI. This text appears on the Service Mesh Manager web interface as well. For example:

spec:
  description: |
    Indicates the percentages of successful (non 5xx) HTTP responses compared to all requests

spec.goodEvents and spec.totalEvents (string)

These fields specify the Go templated PromQL queries that return the number of good events and the total number of events, for example, the number of successful HTTP requests and the total number of HTTP requests.

The following variables are available for interpolation:

  • .Params.<name>: The value associated with the parameter called <name> when a new ServiceLevelObjective is specified.
  • .Service.Namespace: The namespace of the service this ServiceLevelObjective is created on.
  • .Service.Name: The name of the service this ServiceLevelObjective is created on.
  • .SLO.Period: The period on which the SLO is defined.

Service Mesh Manager enforces the best practice of formulating the SLI as the ratio of two numbers: the good events divided by the total events. This way the SLI value will be between 0 and 1 (or 0% and 100%), and it’s easily matched to the SLO value that’s usually defined as a target percentage over a given timeframe. (For further details and examples on defining SLIs, see our Tracking and enforcing SLOs blog post.)

goodEvents: |
  sum(rate(
    istio_requests_total{reporter="{{ .Params.reporter }}", destination_service_namespace="{{ .Service.Namespace }}", destination_service_name="{{ .Service.Name }}",response_code!~"5[0-9]{2}|0"}[{{ .SLO.Period }}]
  ))

totalEvents: |
  sum(rate(
    istio_requests_total{reporter="{{ .Params.reporter }}", destination_service_namespace="{{ .Service.Namespace }}", destination_service_name="{{ .Service.Name }}"}[{{ .SLO.Period }}]
  ))
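For instance, with reporter set to source, a service named frontpage in the smm-demo namespace, and a 30-day period, the goodEvents template above renders to the following query (shown only to illustrate the interpolation):

```promql
sum(rate(
  istio_requests_total{reporter="source", destination_service_namespace="smm-demo", destination_service_name="frontpage", response_code!~"5[0-9]{2}|0"}[30d]
))
```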

spec.kind (string)

Specifies the type of the Service Level Indicator. This value also determines which other fields must be set. Possible values:

  • availability: Used for error-rate characteristics.
  • latency: Used for measuring microservice latencies.

spec.parameters (object)

Contains parameters specific to the SLI. These parameter values are specified in the ServiceLevelObjective CR and used when the previous queries are evaluated. For example:

  parameters:
    - default: source
      description: the Envoy proxy that reports the metric (source | destination)
      name: reporter

If you specify advanced: true for a parameter, it appears on the Service Mesh Manager web interface only when the SHOW ADVANCED PARAMETERS option is selected.

You can declare any parameter that your Service Level Indicator depends on.

The reason behind this parameter-based implementation of latency SLIs is that Prometheus only provides a histogram_quantile function, which yields a given latency percentile. We could have created an SLI template on top of that, but it felt more natural to express latency SLOs by specifying a simple threshold.
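For comparison, the percentile-based alternative would have been built on the standard histogram_quantile pattern (label values are illustrative; this query is not part of any shipped template):

```promql
histogram_quantile(0.99, sum(rate(
  istio_request_duration_milliseconds_bucket{destination_service_name="frontpage"}[5m]
)) by (le))
```

This yields the 99th-percentile latency rather than the good/total event counts that SLIs are built from.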

Example: specifying bucket ranges

Specifies a comma-separated list of histogram buckets used to collect timeseries for a corresponding Prometheus metric. For example:

  parameters:
    - advanced: true
      default: '0.5, 1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 30000, 60000, 300000, 600000, 1800000, 3600000'
      description: comma separated list of histogram buckets used to collect timeseries for a corresponding Prometheus metric
      name: buckets

To help interpolate values from histograms, you can use the .FloorBucket and .CeilBucket functions in the goodEvents and totalEvents queries to determine which buckets are the closest to a given threshold. For example, in our Duration (latency) SLI template we use these functions to process histograms in a way that allows you to specify any latency threshold, not just the ones defined as buckets in the envoy-proxy exporter.
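If a threshold happens to coincide with a bucket boundary, the idea can be expressed without interpolation, by counting events directly from the histogram's cumulative buckets (metric and label values are illustrative; the shipped Duration template additionally uses .FloorBucket/.CeilBucket, whose exact signatures are not documented here):

```promql
sum(rate(
  istio_request_duration_milliseconds_bucket{destination_service_namespace="smm-demo", destination_service_name="frontpage", le="100"}[30d]
))
```

Dividing this by the corresponding totalEvents query yields the fraction of requests served within 100 ms.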

Example: threshold parameter

To define the default acceptance threshold for latency values in milliseconds, Service Mesh Manager uses the following parameter definition.

  parameters:
    - default: '100'
      description: acceptance threshold for HTTP latency values in milliseconds
      name: threshold

2.8.1.4 - Create custom Service Level Objective

The ServiceLevelObjective custom resource defines a Service Level Objective (SLO) for a specific service, based on a ServiceLevelIndicatorTemplate resource.


For example, the following ServiceLevelObjective resource defines a 99.9% SLO for the HTTP requests in a 30-day rolling window for the Frontpage service included in Service Mesh Manager’s demo application:
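A sketch of such a resource, assembled from the field reference below (the resource name, template reference, and parameter map are illustrative):

```yaml
apiVersion: sre.smm.cisco.com/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: frontpage-30d-rolling-availability   # illustrative name
  namespace: smm-demo
spec:
  description: |
    HTTP request success rate should be above 99.9% for a rolling period of 30 days
  selector:
    name: frontpage
    namespace: smm-demo
  sli:
    templateRef:
      name: http-requests-success-rate
      namespace: smm-system
    parameters:
      reporter: source    # assumption: parameters are passed as a simple map
  slo:
    goal: '99.9'
    rolling:
      length: 720h0m0s    # 30 days
```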


For a more detailed tutorial, see the Defining application level SLOs using Service Mesh Manager blog post.

ServiceLevelObjective CR reference

This section describes the fields of the ServiceLevelObjective custom resource. Note that the ServiceLevelObjective is a namespaced custom resource.

apiVersion (string)

Must be sre.smm.cisco.com/v1alpha1

kind (string)

Must be ServiceLevelObjective

spec (object)

The configuration and parameters of the Service Level Objective.

spec.description (string)

A human-readable description of the SLO. This text appears on the Service Mesh Manager web interface as well. For example:

spec:
  description: |
    HTTP request success rate should be above 99.9% for a rolling period of 30 days

spec.selector (object)

Specifies which service the SLO applies to using its name and namespace, for example:

spec:
  selector:
    name: frontpage
    namespace: smm-demo

spec.sli (object)

Contains a templateRef that specifies which Service Level Indicator template to use in this SLO, and the parameters passed to the template. For the details of the available parameters, see the CR of the respective Service Level Indicator template.

spec.sli.templateRef (object)

Specifies Service Level Indicator template to use in this SLO using its name and namespace, for example:

  sli:
    templateRef:
      name: http-requests-duration
      namespace: smm-system

You can list the available templates by running the following command:

kubectl get slit -A
NAMESPACE    NAME                         AGE
smm-system   grpc-requests-success-rate   3d1h
smm-system   http-requests-duration       3d1h
smm-system   http-requests-success-rate   3d1h

spec.slo (object)

Configures the type and goal of the Service Level Objective. For example, for a 14-day window:

  slo:
    goal: '99.9'
    rolling:
      length: 336h0m0s

spec.slo.calendar (object)

For calendar-based SLO windows, specifies the length of the window: month|week|day, for example:

  slo:
    calendar:
      length: week

spec.slo.goal (string)

The actual goal of the SLO as a floating-point number, for example, “99.9” for 99.9% success level.

spec.slo.rolling (object)

For rolling SLO windows, specifies the length of the window, for example, for a 14-day window:

    rolling:
      length: 336h0m0s

status (object)

The current state of the Service Level Objective. This object is managed by Service Mesh Manager.

2.8.2 - Flagger Canary

Flagger is a Progressive Delivery Operator for Kubernetes that is designed to give developers confidence in automating production releases with progressive delivery techniques.

The benefit of canary releases is the ability to capacity-test the new version in a production environment, with a safe rollback strategy if issues are found. This reduces the risk of deploying new software versions to production by gradually shifting traffic to the new version while measuring traffic metrics and running rollout tests.

Flagger can run automated application testing for the following deployment strategies:

  • Canary (progressive traffic shifting)
  • A/B testing (HTTP headers and cookie traffic routing)
  • Blue/Green (traffic switching and mirroring)

The following example shows how to integrate Flagger with Service Mesh Manager to observe progressive delivery on the Service Mesh Manager dashboard. To demonstrate this, you will learn how to configure and deploy the podinfo application for Blue/Green traffic-mirror testing, upgrade its version, and watch the canary release on the Service Mesh Manager dashboard.

Setting up Flagger with Service Mesh Manager

  1. Deploy Flagger into the smm-system namespace and connect it to Istio and Prometheus at the Service Mesh Manager Prometheus address as shown in the following command:

    Note: The Prometheus metrics service is hosted at http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus

    kubectl apply -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml
    helm repo add flagger https://flagger.app
    helm upgrade -i flagger flagger/flagger \
    --namespace=smm-system \
    --set crd.create=false \
    --set meshProvider=istio \
    --set metricsServer=http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus
    
  2. Make sure you see the following log message in your Service Mesh Manager cluster, which indicates a successful Flagger operator deployment:

    kubectl -n smm-system logs deployment/flagger
    

    Expected output:

    {"level":"info","ts":"2022-01-25T19:45:02.333Z","caller":"flagger/main.go:200","msg":"Connected to metrics server http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus"}
    

    At this point, Flagger is integrated with Service Mesh Manager. You can now deploy your own applications to be used for progressive delivery.

Podinfo example with Flagger

Next let’s try out an example from Flagger docs.

  1. Create the “test” namespace and enable sidecar-proxy auto-injection for this namespace (use the smm binary downloaded from the Service Mesh Manager download page).

    Then deploy the “podinfo” target workload that will be enabled for canary deployments and used for load testing during automated canary promotion:

    kubectl create ns test
    smm sidecar-proxy auto-inject on test
    kubectl apply -k https://github.com/fluxcd/flagger//kustomize/podinfo
    
  2. Create IstioMeshGateway service:

    kubectl apply -f - << EOF
    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioMeshGateway
    metadata:
      annotations:
        banzaicloud.io/related-to: istio-system/cp-v115x
      labels:
        app: test-imgw-app
        istio.io/rev: cp-v115x.istio-system
      name: test-imgw
      namespace: test
    spec:
      deployment:
        podMetadata:
          labels:
            app: test-imgw-app
            istio: ingressgateway
      istioControlPlane:
        name: cp-v115x
        namespace: istio-system
      service:
        ports:
          - name: http
            port: 80
            protocol: TCP
            targetPort: 8080
        type: LoadBalancer
      type: ingress
    EOF
    
  3. Add the port and hosts for the IstioMeshGateway using the following Gateway configuration.

    kubectl apply -f - << EOF
    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: public-gateway
      namespace: test
    spec:
      selector:
        app: test-imgw-app
        gateway-name: test-imgw
        gateway-type: ingress
        istio.io/rev: cp-v115x.istio-system
      servers:
        - port:
            number: 80
            name: http
            protocol: HTTP
          hosts:
            - "*"
    EOF
    
  4. Create a Canary custom resource.
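    The manifest is not reproduced in this printable view; the following sketch is based on the upstream Flagger podinfo tutorial, with the analysis values (30s interval, step weight 20, max weight 80, failed-check threshold 3, 500 ms duration limit) matching the status output and logs shown later in this section. The service port and metric thresholds are assumptions:

    ```yaml
    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: podinfo
      namespace: test
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      autoscalerRef:
        apiVersion: autoscaling/v2
        kind: HorizontalPodAutoscaler
        name: podinfo
      service:
        port: 9898            # assumption: podinfo's default HTTP port
        gateways:
        - public-gateway
        hosts:
        - "*"
      analysis:
        interval: 30s
        threshold: 3          # failed checks before rollback
        maxWeight: 80
        stepWeight: 20
        metrics:
        - name: request-success-rate
          thresholdRange:
            min: 99           # assumption: minimum success rate in percent
          interval: 1m
        - name: request-duration
          thresholdRange:
            max: 500          # matches the "request duration ... > 500ms" halt messages
          interval: 30s
    ```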

    
    
  5. Wait until Flagger initializes the deployment and sets up a VirtualService for podinfo.

    kubectl -n smm-system logs deployment/flagger -f
    

    Expected:

    {"level":"info","ts":"2022-01-25T19:54:42.528Z","caller":"controller/events.go:33","msg":"Initialization done! podinfo.test","canary":"podinfo.test"}
    
  6. Get the Ingress IP from IstioMeshGateway:

    export INGRESS_IP=$(kubectl get istiomeshgateways.servicemesh.cisco.com -n test test-imgw -o jsonpath='{.status.GatewayAddress[0]}')
    echo $INGRESS_IP
    

    The output should be an IP address, for example: 34.82.47.210

  7. Verify that podinfo is reachable from the external IP address by running curl http://$INGRESS_IP/

    The output should be similar to:

    {
      "hostname": "podinfo-96c5c65f6-l7ngc",
      "version": "6.0.0",
      "revision": "",
      "color": "#34577c",
      "logo": "https://raw.githubusercontent.com/stefanprodan/podinfo/gh-pages/cuddle_clap.gif",
      "message": "greetings from podinfo v6.0.0",
      "goos": "linux",
      "goarch": "amd64",
      "runtime": "go1.16.5",
      "num_goroutine": "8",
      "num_cpu": "4"
    }
    
  8. Send traffic to the ingress IP. For this setup we will use the hey traffic generator. On macOS, you can install it from the brew package manager:

    brew install hey
    

    You can send traffic from any terminal where the IP address is reachable. The following command sends requests for 30 minutes (-z 30m) from two concurrent workers (-c 2), each at a rate of 10 requests per second (-q 10):

    hey -z 30m -q 10 -c 2 http://$INGRESS_IP/
    

    On the Service Mesh Manager dashboard, select MENU > TOPOLOGY, and select the test namespace to see the generated traffic.

    Image of podinfo traffic

Upgrade Image version

The current podinfo version is v6.0.0; update it to the next version.

  1. Upgrade the target image to the new version and watch the canary functionality on the Service Mesh Manager dashboard.

    kubectl -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:6.1.0
    

    Expected output:

    deployment.apps/podinfo image updated
    

    You can check the logs as Flagger tests and promotes the new version:

    {"msg":"New revision detected! Scaling up podinfo.test","canary":"podinfo.test"}
    {"msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 20","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 40","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 60","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 80","canary":"podinfo.test"}
    {"msg":"Copying podinfo.test template spec to podinfo-primary.test","canary":"podinfo.test"}
    {"msg":"HorizontalPodAutoscaler podinfo-primary.test updated","canary":"podinfo.test"}
    {"msg":"Routing all traffic to primary","canary":"podinfo.test"}
    {"msg":"Promotion completed! Scaling down podinfo.test","canary":"podinfo.test"}
    
  2. Check Canaries status by running the kubectl get canaries -n test -o wide command. The output should be similar to:

    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Initializing   0        0              30s                 20                         80          2022-04-11T21:25:31Z
    ..
    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Initialized    0        0              30s                 20                         80          2022-04-11T21:26:03Z
    ..
    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Progressing    0        0              30s                 20                         80          2022-04-11T21:33:03Z
    ..
    NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Succeeded      0        0              30s                 20                         80          2022-04-11T21:35:28Z
    
  3. Visualize the entire progressive delivery through the Service Mesh Manager dashboard.

    Traffic from “TEST-IMGW-APP” is shifted from “podinfo-primary” to “podinfo-canary”, from 20% to 80% (according to the step weight configured for canary rollouts). The following image shows the incoming traffic on the “podinfo-primary” pod: Image of primary podinfo traffic

    The following image shows the incoming traffic on the “podinfo-canary” pod: Image of canary podinfo traffic

You can see that Flagger dynamically shifts the ingress traffic to the canary deployment in steps and performs conformance tests. Once the tests pass, Flagger shifts the traffic back to the primary deployment and updates the primary deployment to the new version.

Finally, Flagger scales down podinfo:6.0.0 and shifts the traffic to podinfo:6.1.0, and makes it a primary deployment.

The following image shows that the canary image (v6.1.0) was tagged as the primary image (v6.1.0): Image of canary and podinfo traffic

Automated rollback

To test automated rollback in case a canary fails, complete the following steps.

  1. Generate HTTP 500 status responses and delays by running the following command on the tester pod:

    watch "curl -s http://$INGRESS_IP/delay/1 && curl -s http://$INGRESS_IP/status/500"
    
  2. Watch how the Canary release fails. Run kubectl get canaries -n test -o wide

    The output should be similar to:

    NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Progressing   60       1              30s                 20                         80          2022-04-11T22:10:33Z
    ..
    NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Progressing   60       1              30s                 20                         80          2022-04-11T22:10:33Z
    ..
    NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Progressing   60       2              30s                 20                         80          2022-04-11T22:11:03Z
    ..
    NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Progressing   60       3              30s                 20                         80          2022-04-11T22:11:33Z
    ..
    NAME      STATUS   WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
    podinfo   Failed   0        0              30s                 20                         80          2022-04-11T22:12:03Z
    
    {"msg":"New revision detected! Scaling up podinfo.test","canary":"podinfo.test"}
    {"msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 20","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 40","canary":"podinfo.test"}
    {"msg":"Advance podinfo.test canary weight 60","canary":"podinfo.test"}
    {"msg":"Halt podinfo.test advancement request duration 917ms > 500ms","canary":"podinfo.test"}
    {"msg":"Halt podinfo.test advancement request duration 598ms > 500ms","canary":"podinfo.test"}
    {"msg":"Halt podinfo.test advancement request duration 1.543s > 500ms","canary":"podinfo.test"}
    {"msg":"Rolling back podinfo.test failed checks threshold reached 3","canary":"podinfo.test"}
    {"msg":"Canary failed! Scaling down podinfo.test","canary":"podinfo.test"}
    
  3. Visualize the canary rollout on the Service Mesh Manager Dashboard.

    When the rollout steps from 0% -> 20% -> 40% -> 60%, you can observe that performance degrades and request durations exceed 500 ms, causing the rollout to halt. The failed-check threshold was set to 3, so after three failed checks the rollout was rolled back.

    The following image shows the “primary-pod” incoming traffic graph: Image of podinfo traffic

    The following image shows the “canary-pod” incoming traffic graph: Image of podinfo traffic

    The following image shows the status of pod health: Image of podinfo traffic

Cleaning up

To clean up your cluster, run the following commands.

  1. Remove the Gateway and Canary CRs.

    kubectl delete -n test canaries.flagger.app podinfo
    kubectl delete -n test gateways.networking.istio.io public-gateway
    kubectl delete -n test istiomeshgateways.servicemesh.cisco.com test-imgw
    kubectl delete -n test deployment podinfo
    
  2. Delete the “test” namespace.

    kubectl delete namespace test
    
  3. Uninstall the Flagger deployment and delete the canary CRD resource.

    helm delete flagger -n smm-system
    kubectl delete -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml
    

2.8.3 - Integrated monitoring in SMM

One of the core features of the Istio service mesh is the observability of network traffic. Because all service-to-service communication goes through Envoy proxies, and Istio’s control plane gathers logs and metrics from these proxies, the service mesh can give you deep insights about your network.

But Istio needs some additional components to unleash its full potential of observing the mesh. Prometheus collects these metrics from Envoy proxies, and Grafana displays monitoring information on analytics dashboards.

Service Mesh Manager builds an integrated, production-ready environment of these components with a single CLI command. Prometheus is set up to scrape Envoys, and Grafana dashboards are automatically configured for services and workloads in the mesh.

Note: Service Mesh Manager provides an end-to-end monitoring solution built on Service Level Objectives. To start using this feature, see Tracking Service Level Objectives (SLOs).

Test integrated monitoring

The MENU > TOPOLOGY and MENU > SERVICES pages of the Service Mesh Manager UI serve as starting points to diagnose problems within the mesh. You’ll see if error rates are up for a specific service, RPS is down, or latency increases.

Some Grafana dashboards are available directly in the Service view, but for more in-depth analytics, you’ll probably need to dig deeper into Grafana dashboards, or directly into Prometheus metrics. To access a specific Grafana dashboard from Service Mesh Manager, just click the Grafana link (Open metrics in Grafana) in the Service or Workload view.

CAUTION:

If you have installed Service Mesh Manager in Anonymous mode, you won’t be able to access the Metrics and Traces dashboards from the UI. Clicking the Open metrics in Grafana or Open Jaeger tracing icon in anonymous mode results in an RBAC: access denied error message.

To automate monitoring, you can configure Service Level Objectives (SLOs) and alerts as well. For details, see Tracking Service Level Objectives (SLOs).

To access the Service Mesh Manager UI from your machine, use the smm dashboard command.

Grafana integration

Advanced use-cases

  1. Grafana and Prometheus are installed in the smm-system namespace and are available on an internal ingress gateway. When using smm dashboard, that internal ingress gateway is securely proxied from the cluster, so Service Mesh Manager is accessible from localhost. Grafana and Prometheus are also available on the ingress gateway, and they are proxied on separate URIs. Grafana is accessible directly on http://127.0.0.1:50500/grafana, while Prometheus is available on http://127.0.0.1:50500/prometheus.

  2. Federating Istio metrics from the Prometheus instance of Service Mesh Manager. Currently it’s not possible to specify an external Prometheus instance to use when installing Service Mesh Manager. If you have other metrics on another Prometheus instance and want to have them in one place, we suggest setting up federation between the two Prometheus instances.
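    A sketch of such a federation job on the external Prometheus side, assuming the federate endpoint follows the /prometheus path prefix shown earlier (the job name and match[] selector are illustrative):

    ```yaml
    scrape_configs:
      - job_name: smm-federation
        honor_labels: true                  # keep the original labels of the federated series
        metrics_path: /prometheus/federate  # assumption: federate endpoint under the /prometheus prefix
        params:
          'match[]':
            - '{__name__=~"istio_.*"}'      # pull only Istio metrics
        static_configs:
          - targets:
              - smm-prometheus.smm-system.svc.cluster.local:59090
    ```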

2.8.4 - Integrated tracing in SMM

Distributed tracing is the process of tracking individual requests throughout their whole call stack in the system.

With distributed tracing in place, it is possible to visualize full call stacks, to see which service called which service, how long each call took, and what the network latencies between them were. It is possible to tell where a request failed or which service took too much time to respond.

Service Mesh Manager uses Istio’s - and therefore Envoy’s - distributed tracing feature under the hood.

To collect and visualize this information, Istio comes with tools like Jaeger, Zipkin, Lightstep, and Datadog. Jaeger is the default tool and so far the most popular one.

Test integrated tracing

  • Jaeger is installed automatically by default when installing Service Mesh Manager.
  • The demo application uses golang services which are configured to propagate the necessary tracing headers.

When utilizing distributed tracing for your own services, you need to implement context propagation yourself!

  • When load is sent to the application, traces can be observed right away.
  • Jaeger is exposed through an ingress gateway and the links are present on the UI (both on the graph and list view).

Jaeger link from the graph view:

Distributed tracing graph

Jaeger UI for the demo application:

Distributed tracing Jaeger 2

There you can see the whole call stack in the microservices architecture. You can see when exactly the root request was started and how long each request took. For example, the analytics service took the most time out of the individual requests, because it performs actual calculations (it computes the value of Pi).

2.8.5 - Let's Encrypt support for gateways

Service Mesh Manager provides a streamlined way to use Let’s Encrypt with Istio-native Gateways. Compared to relying on Istio’s Ingress, this solution gives you greater flexibility when defining your own Ingress gateway.

Background

Cert Manager’s built-in ACME (Let’s Encrypt) support is designed for Kubernetes Ingress resources, not Istio Gateways. Service Mesh Manager solves this issue by allowing Cert Manager to issue TLS certificates directly to an Istio Gateway resource.

Prerequisites

To set up TLS certificates based on Let’s Encrypt for an Istio Gateway, the following requirements must be met:

  • cert-manager must be enabled for the deployment. Verify that the cert-manager namespace exists.
  • The deployment must be able to allocate a LoadBalancer type service, in addition to the existing Gateway.
  • The deployment must expose an HTTP endpoint on port 80.
  • The DNS must be set up properly to point to the existing MeshGateway’s Service (we recommend using external-dns).

For example, assume that you have a namespace called my-app, where the traffic is exposed over TCP port 80 using HTTP, with the following IstioMeshGateway and Gateway setup:

apiVersion: servicemesh.cisco.com/v1alpha1
kind: IstioMeshGateway
metadata:
    labels:
        app.kubernetes.io/instance: app
        app.kubernetes.io/name: app-ingress
    name: app-ingress
    namespace: my-app
spec:
    istioControlPlane:
        name: cp-v115x
        namespace: istio-system
    deployment:
       metadata:
          labels:
              app.kubernetes.io/instance: app
              app.kubernetes.io/name: app-ingress
              gateway-name: app-ingress
              gateway-type: ingress
       replicas:
          min: 1
          max: 1
          count: 1
    service:
      metadata:
        annotations:
          external-dns.alpha.kubernetes.io/hostname: my-app.example.org
      ports:
      - name: http2
        port: 80
        protocol: TCP
        targetPort: 8080
      type: LoadBalancer
    runAsRoot: true
    type: ingress
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  labels:
    app.kubernetes.io/instance: app
    app.kubernetes.io/name: app-ingress
  name: app-ingress
  namespace: my-app
spec:
  selector:
    app.kubernetes.io/instance: app
    app.kubernetes.io/name: app-ingress
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP

Note: The VirtualService and other Istio resources are omitted from this example, but those are also required for the application to work properly.

Add TLS endpoints to the Gateway

Reconfigure the Gateway and the MeshGateway to support SSL.

  1. Add a new port to the IstioMeshGateway resource to accept incoming traffic on port 443 (see the .spec.service.ports[name=https] part):

    apiVersion: servicemesh.cisco.com/v1alpha1
    kind: IstioMeshGateway
    metadata:
      labels:
         app.kubernetes.io/instance: app
         app.kubernetes.io/name: app-ingress
      name: app-ingress
      namespace: my-app
    spec:
      istioControlPlane:
         name: cp-v115x
         namespace: istio-system
      deployment:
         metadata:
            labels:
               app.kubernetes.io/instance: app
               app.kubernetes.io/name: app-ingress
               gateway-name: app-ingress
               gateway-type: ingress
         replicas:
            min: 1
            max: 1
            count: 1
      service:
         metadata:
            annotations:
               external-dns.alpha.kubernetes.io/hostname: my-app.example.org
         ports:
         - name: http2
           port: 80
           protocol: TCP
           targetPort: 8080
         - name: https
           port: 443
           protocol: TCP
           targetPort: 8443
         type: LoadBalancer
      runAsRoot: true
      type: ingress
    
  2. Configure the Gateway to accept HTTPS traffic. You have to:

    • Instruct Istio to redirect any non-https traffic to the https endpoint (for details, see Known limitations):

          tls:
            httpsRedirect: true
      
    • And set up a new https endpoint:

        - hosts:
          - '*'
          port:
            name: https
            number: 443
            protocol: HTTPS
          tls:
            credentialName: app-ingressgateway-tls
            mode: SIMPLE
      

    The modified Gateway resource should look something like this:

    apiVersion: networking.istio.io/v1beta1
    kind: Gateway
    metadata:
      labels:
        app.kubernetes.io/instance: app
        app.kubernetes.io/name: app-ingress
      name: app-ingress
      namespace: my-app
    spec:
      selector:
        app.kubernetes.io/instance: app
        app.kubernetes.io/name: app-ingress
      servers:
      - hosts:
        - '*'
        port:
          name: http
          number: 80
          protocol: HTTP
        tls:
          httpsRedirect: true
      - hosts:
        - '*'
        port:
          name: https
          number: 443
          protocol: HTTPS
        tls:
          credentialName: app-ingressgateway-tls
          mode: SIMPLE
    

    Note: The secret in the example does not exist yet. This does not cause a configuration issue, but the TLS endpoint is not yet operational.

Configure cert-manager to provide TLS certificates

Configure Cert Manager to issue the TLS credentials into the app-ingressgateway-tls Secret.

  1. Create an Issuer that represents a connection to the Let’s Encrypt service:

    cat > issuer.yaml <<EOF
    ---
    apiVersion: cert-manager.io/v1
    kind: Issuer
    metadata:
      labels:
        app.kubernetes.io/instance: app
        app.kubernetes.io/name: app-issuer
      name: app-issuer
      namespace: my-app
    spec:
      acme:
        email: noreply@cisco.com
        preferredChain: ""
        privateKeySecretRef:
          name: app-letsencrypt-issuer
        server: https://acme-v02.api.letsencrypt.org/directory
        solvers:
        - http01:
            ingress:
              class: nginx
    EOF
    kubectl apply -f issuer.yaml
    

    Note: You can also use a cluster-wide issuer called ClusterIssuer if multiple Certificates are required for multiple namespaces.

  2. Check that the Issuer is working properly. Run the kubectl get issuer command. The Issuer should be in ready state:

    NAME          READY   AGE
    app-issuer    True    19h
    
  3. Create the Certificate object that instructs cert-manager to maintain a certificate for a given domain name and automatically renew it when needed. You can use the following example, but note the following points:

    • The dnsNames section must contain all the host names for which TLS certificates must be present in the resulting secret. For each domain name listed here, there must be a properly set up DNS record pointing to the LoadBalancer’s external IP for the MeshGateway.

        dnsNames:
        - my-app.example.org
      
    • For Service Mesh Manager to know which Gateway to attach this certificate to, the following annotation must be put on the Certificate. This is a label matcher that the Let’s Encrypt operator uses to find the Gateways that need to be configured for TLS.

      metadata:
        annotations:
          acme.smm.cisco.com/gateway-selector: |
            {
              "app.kubernetes.io/instance": "app",
              "app.kubernetes.io/name": "app-ingress"
            }
      
    • The secretName field must match the value of the previously configured TLS secret in the Gateway resource.

    The following yaml includes all three requirements:

    cat > certificate.yaml <<EOF
    ---
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      annotations:
        acme.smm.cisco.com/gateway-selector: |
          {
            "app.kubernetes.io/instance": "app",
            "app.kubernetes.io/name": "app-ingress"
          }
      labels:
        app.kubernetes.io/instance: app
        app.kubernetes.io/name: app-certificate
      name: app-certificate
      namespace: my-app
    spec:
      dnsNames:
      - my-app.example.org
      duration: 2160h0m0s
      issuerRef:
        group: cert-manager.io
        kind: Issuer
        name: app-issuer
      privateKey:
        algorithm: RSA
        encoding: PKCS1
        size: 2048
      renewBefore: 360h0m0s
      secretName: app-ingressgateway-tls
      usages:
      - server auth
      - client auth
    EOF
    kubectl apply -f certificate.yaml
    
  4. Verify that the Certificate has been issued by running the kubectl get certificate command. The Certificate should be in ready state:

    NAME                          READY   SECRET                   AGE
    app-certificate               True    app-ingressgateway-tls   19h
    

Known limitations

Istio’s HTTP-to-HTTPS redirection offers little configurability. As a result, the Let’s Encrypt challenge verification fails in the following case:

  • The gateway has named hosts (that is, '*' is not in the hosts list of the Gateway resource), AND
  • Automatic redirection to HTTPS is enabled in the tls settings of the http endpoint

In this case, try to use one of the following workarounds:

  • Use a Gateway that has '*' specified in its hosts list (this might mean that you need external load balancers).
  • Use Istio’s implementation of the Kubernetes Ingress API (IngressClass).
  • Disable the automatic https redirection and use application logic to do the redirect.

2.9 - Integrating Virtual Machines into the mesh

Istio service mesh primarily provides its features to Kubernetes-based resources. However, in some cases it makes sense to integrate bare metal machines or virtual machines into the mesh:

  • Temporary integration: to provide temporary access to the machine’s network resources while you migrate non-Kubernetes-native workloads into the mesh.
  • Long-term integration: sometimes it is impractical to migrate a given workload to Kubernetes because of its size (for example, when huge bare-metal machines are required), or because the workload is stateful and hard to support on Kubernetes.

Service Mesh Manager provides support for both use cases, building on top of Istio’s support for virtual machines.

For an overview of how Service Mesh Manager implements VM Integration based on Istio’s framework, see Istio resources.

Architecture

Service Mesh Manager takes an automation-friendly approach to managing the virtual machines by providing an agent that runs on the machine. This component enables Service Mesh Manager to provide the same observability features for virtual machines as for native Kubernetes workloads, such as Topology view, Service/Workload overview, integrated tracing, or traffic tapping.

The agent continuously maintains the configuration of the machine so that any change in the upstream cluster is reflected in its configuration. This behavior ensures that even if the IP addresses of the meshexpansion-gateways change, the machine retains connectivity to the mesh.

If the machine is part of the mesh for an extended period of time, Istio must be upgraded on it. The upgrade flow is aligned with the Upgrading your business applications process that Service Mesh Manager uses for the Istio control plane upgrade: the agent ensures that the host has the latest version of Istio installed, and provides a validation warning if the istio process needs to be restarted.

When the virtual machine is part of the mesh, it behaves like a Kubernetes pod: it belongs to a specific namespace and cannot communicate with other namespaces. The name of the pod is the hostname of the virtual machine.

Ease of use

After a virtual machine has been integrated into the mesh, Service Mesh Manager automatically updates the configuration of the virtual machine to ensure that it remains part of the mesh and receives every configuration update it needs to operate in the mesh. In addition, the observability features available for Kubernetes pods are available for the virtual machines as well.

Getting started

To try out VM integration, we highly recommend using the VM integration quickstart guide.

For more details on Service Mesh Manager’s capabilities for handling machines, see Istio resources.

For detailed examples for more complex use-cases such as migrating an existing workload into the mesh, see the Use-cases section.

2.9.1 - Prerequisites

Before trying to attach a virtual machine to your mesh, note the following:

You can attach VMs only to an active Istio cluster that is running the Service Mesh Manager control plane.

Configuration prerequisites

To attach external machines, the Service Mesh Manager dashboard needs to be exposed so that smm-agent can fetch the required configuration data. For details, see Exposing the Dashboard.

Supported operating systems

Currently, the following operating systems are verified to work when added to the mesh:

  • Ubuntu 20.04+ (64-bit)
  • RedHat Enterprise Linux 8 (64-bit)

However, any operating system that uses deb or RPM packages and systemd as its init system should be able to follow the same procedure.

Package dependencies

OS       Required packages                 Example install command
Ubuntu   curl, iptables, sudo, hostname    apt-get install -y curl iptables sudo hostname
RHEL     curl, iptables, sudo, hostname    yum install -y curl hostname iptables sudo
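A quick way to check whether the required binaries are already present on a machine (an illustrative snippet, not part of the official procedure):

```shell
# Report which of the required binaries are available on the PATH.
for cmd in curl iptables sudo hostname; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "ok: $cmd"
  else
    echo "missing: $cmd"
  fi
done
```

Install the packages reported as missing with the install command for your distribution.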

Network prerequisites

Because of the way Istio operates, the VM can only resolve services and DNS names from the Kubernetes namespace it is attached to. This means that communication from the VM to other Kubernetes namespaces is not possible.

Cluster access to VM

The cluster must be able to access the following ports exposed from the VM:

  • TCP ports 16400, 16401
  • Every port you define for the WorkloadGroup

The Kubernetes clusters in the mesh must be able to access every port on the VM that is used to serve mesh traffic. For example, if the VM runs a web server on port 80, then port 80 must be accessible from every pod in the member clusters. (The WorkloadGroup defined for the service should indicate that the service is available on port 80.)

Determining the VM’s IP address

From the clusters’ point of view, the VM’s IP address may not be the IP address that appears on the network interfaces in the VM’s operating system. For example, if the VM is exposed via a load balancer instance of a cloud service provider, then the Service Mesh Manager clusters can reach the VM via the IP address (or IP addresses) of the load balancer.

While administrators integrating VMs into the service mesh are expected to be able to identify the VM’s IP address from the mesh’s point of view, smm-agent also has a fallback behavior: it queries the https://ifconfig.me/ip site to determine the IP that the public internet sees for the VM. If the IP that the site returns is not the IP that the clusters in the service mesh should use to reach the VM, then set the VM’s IP address to use for the service mesh communication during the smm-agent setup.

Note: This document is not a comprehensive guide on how to expose VMs via an IP address.

VM access to cluster

Istio can work in two distinct ways when it comes to network topologies:

  • If the virtual machine has no direct connection to the pods’ IP addresses, it can rely on a meshexpansion gateway and use the different network approach. Unless latency is of utmost importance, we highly recommend this approach, as it allows more flexibility when attaching VMs from multiple separate networks.
  • If the virtual machine can access the pods’ IP addresses directly, then you can use the same network approach.

Different network

To configure the different network model, the WorkloadGroup’s .spec.network field must be set to a different network than the networks used by the current Istio deployment.

To check which network the existing Istio control planes are attached to, run the following command:

kubectl get istiocontrolplanes -A

The output should be similar to:

NAMESPACE      NAME       MODE     NETWORK    STATUS      MESH EXPANSION   EXPANSION GW IPS                 ERROR   AGE
istio-system   cp-v115x   ACTIVE   network1   Available   true             ["13.48.73.61","13.51.88.187"]           9d

Istio uses the network1 network name, so set the WorkloadGroup’s network setting to something different, such as vm-network-1.
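For example, the corresponding fragment of the WorkloadGroup resource would look like this (a sketch; the name and namespace are taken from the demo application example, and the full resource is created later when you add the machine to the mesh):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: WorkloadGroup
metadata:
  name: analytics-v0
  namespace: smm-demo
spec:
  template:
    # Must differ from every network used by the existing Istio
    # deployment (network1 in the example output above):
    network: vm-network-1
```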

Firewall settings

From the networking perspective the machines should be able to access:

  • the meshexpansion-gateways, and
  • the exposed dashboard ports.

  1. To get the IP addresses of the meshexpansion-gateways, check the services in the istio-system namespace:

    kubectl get services -n istio-system istio-meshexpansion-cp-v115x
    

    The output should be similar to:

    NAME                                    TYPE           CLUSTER-IP     EXTERNAL-IP                                                               PORT(S)                                                                                           AGE
    istio-meshexpansion-cp-v115x            LoadBalancer   10.10.82.80    a4b01735600f547ceb3c03b1440dd134-690669273.eu-north-1.elb.amazonaws.com   15021:30362/TCP,15012:31435/TCP,15017:30627/TCP,15443:32209/TCP,50600:31545/TCP,59411:32614/TCP   9d
    
  2. To get the IP addresses of exposed dashboard ports, check the services in the smm-system namespace:

    kubectl get services -n smm-system smm-ingressgateway-external
    

    The output should be similar to:

    smm-ingressgateway-external      LoadBalancer   10.10.153.139   a4dcb5db6b9384585bba6cd45c2a0959-1520071115.eu-north-1.elb.amazonaws.com                   80:31088/TCP
    
  3. Configure your firewalls so that the DNS names shown in the EXTERNAL-IP column are accessible from the VM instances.

Same network

To configure the same network model, the WorkloadGroup’s .spec.network field must be set to the same network as the one used by the current Istio deployment.

To check which network the existing Istio control planes are attached to, run the following command:

kubectl get istiocontrolplanes -A

The output should be similar to:

NAMESPACE      NAME       MODE     NETWORK    STATUS      MESH EXPANSION   EXPANSION GW IPS                 ERROR   AGE
istio-system   cp-v115x   ACTIVE   network1   Available   true             ["13.48.73.61","13.51.88.187"]           9d

Istio uses the network1 network name, so set the WorkloadGroup’s network setting to network1 as well.

2.9.2 - Quickstart

To get started with integrating virtual machines into your service mesh using Service Mesh Manager, we strongly recommend adding the first virtual machine by following the examples in this documentation and using the Service Mesh Manager demo application. This lets you get familiar with the process without the added complexity of dealing with a real application.

The high-level steps of the procedure are the following:

  1. If the dashboard of your Service Mesh Manager deployment is not available via a public URL, expose it. Otherwise, the virtual machines won’t be able to retrieve their Istio configuration from the control plane. For details, see Exposing the Dashboard.
  2. Complete the rest of the Prerequisites using the different network approach, and set up your firewall.
  3. Complete the procedure described in Add a virtual machine to the mesh.

2.9.3 - Dashboard

The virtual machines that you integrate into the mesh also become available on the Service Mesh Manager dashboard. Workloads running on virtual machines are treated as regular Kubernetes workloads, with the following differences.

  • On the MENU > TOPOLOGY page, workloads running on virtual machines are shown as workloads with the Virtual machine workload icon in their corner.

    VM Topology VM Topology

  • On the MENU > WORKLOADS page, workloads running on virtual machines are marked with the Virtual machine workload icon.

    VM Workloads VM Workloads

  • When drilling down into the details of workloads, workloads running on virtual machines have a WorkloadEntry level instead of the Pod and Node levels of Kubernetes workloads.

  • On the HEALTH details of the workloads, CPU and memory saturation data is labeled as VM SATURATION: CPU and VM SATURATION: MEMORY.

  • When using traffic tapping on a workload running on a virtual machine, the name of the pod in the output is actually the hostname of the virtual machine.

2.9.4 - Use-cases

2.9.4.1 - Add a virtual machine to the mesh

This guide shows you how to manually add a virtual machine to the mesh, using the analytics workload of the Service Mesh Manager demo application as an example.

If you already understand the procedure and want to configure your virtual machine to be automatically added to the mesh, see Autoscaling VM groups.

Prerequisites

  • You already have a virtual machine available.
  • You have root access to the virtual machine.
  • You have completed the Prerequisites.
  • If you are using the different network approach make sure to set up your firewall.
  • If you want to exactly replicate the steps of this guide for testing purposes, install the Service Mesh Manager demo application on your cluster, and the jq tool on your computer.

If you are performing this procedure on a clean installation of the Service Mesh Manager demo application, the topology view of the smm-demo namespace should look similar to this:

Topology of the demo application running on a single cluster

Scale down the analytics service

Note: If you are not using the demo application to test VM integration, skip this step. You can install the demo application on your Service Mesh Manager cluster by running the smm demoapp install command.

This step and the examples in other steps of this guide rely on the analytics-v1 workload of the Service Mesh Manager demo application.

  1. Scale it down to have zero replicas:

    kubectl scale deploy -n smm-demo analytics-v1 --replicas=0
    
  2. Verify that there are no pods belonging to the analytics deployment. The following command should return an empty response:

    kubectl get pods -n smm-demo | grep analytics
    

Add an external workload to the mesh

The attached machines behave as Kubernetes workloads. This means that each machine has a set of labels assigned that Services can use to match the machine. Like Kubernetes Pods, each machine has a service account assigned (usually the default one of the namespace, unless specified otherwise). The machine uses this service account to authenticate to the Istio control plane.

To add an external workload to the mesh, create a WorkloadGroup in the namespace the machine will be attached to. This object represents a group of machines serving the same service, and is analogous to the Kubernetes concept of a Deployment.

For example, to add a virtual machine serving the analytics traffic in the demo application, use the following object:

apiVersion: networking.istio.io/v1alpha3
kind: WorkloadGroup
metadata:
  labels:
    app: analytics
    version: v0
  name: analytics-v0
  namespace: smm-demo
spec:
  metadata:
    labels:
      app: analytics
      version: v0
  probe:
    httpGet:
      path: /
      host: 127.0.0.1
      port: 8080
      scheme: HTTP
  template:
    network: vm-network-1
    ports:
      http: 8080
      grpc: 8082
      tcp: 8083
    serviceAccount: default

For details on these settings, see Exposing the Dashboard.

mTLS settings (optional)

After Istio is started on the virtual machine, Istio takes over the service ports defined in the WorkloadGroup resource. Depending on your settings, it will also start enforcing mTLS on those service ports.

If external (non-mesh) services communicate with the virtual machine, ensure that communication without encryption is permitted on the service ports. To do so, create a PeerAuthentication object in the smm-demo namespace. Make sure that the matchLabels selector only matches the WorkloadGroup, and not any other Kubernetes deployment, to avoid permitting unencrypted communication where it’s not needed. In the following example, the matchLabels selector includes both the app and the version labels.

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: analytics
  namespace: smm-demo
spec:
  mtls:
    mode: PERMISSIVE
  selector:
    matchLabels:
      app: analytics
      version: v0

We recommend setting the mTLS mode to PERMISSIVE, as it allows unencrypted traffic from outside the mesh while in-mesh traffic still uses mTLS.

Note: In case of compatibility issues, you can set the mode to DISABLE, but in this case all traffic will be unencrypted.

Set up the virtual machine

Required packages on the virtual machines

In addition to the package dependencies listed in the prerequisites, this example requires python3. To install the python3 package, run the following command:

  • On Ubuntu:

    apt-get update && apt-get install -y python3
    
  • On RHEL:

    yum install -y python3
    

Start the example VM-based service workload

Before you can register the virtual machine, the workload must already be running on the VM. The following instructions start an example HTTP server workload on the virtual machine.

In the example using the demo application, open a terminal on the machine (for example, using SSH), and start a simple web server that serves files from an empty directory (the demo application only requires that http://<pod-id>:8080/** is available).

Note: The nohup shell command keeps the python3 http.server process running after you log out of the shell.

mkdir -p empty-dir
cd empty-dir
nohup python3 -m http.server 8080 &
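Taken together, and with a quick check that the server is up, the steps above can be sketched as a single snippet (curl is already among the package prerequisites):

```shell
# Start the example workload in the background and verify it responds.
mkdir -p empty-dir
cd empty-dir
nohup python3 -m http.server 8080 >/dev/null 2>&1 &
sleep 1
# The demo application only needs successful HTTP responses on port 8080:
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/
```

The final command should print 200: an empty directory is still served successfully as a directory listing.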

Collect the required data

To attach the VM to the mesh, you’ll need the following information:

  • The URL of the dashboard
  • The namespace and name of the WorkloadGroup (namespace smm-demo, name analytics-v0 in the example)
  • The bearer token of the service account referenced in the .spec.template.serviceAccount of the WorkloadGroup
  • (Optional) The IP address that the clusters in the service mesh can use to access the VM. If this is the same as the IP the public internet sees for the VM, then Service Mesh Manager detects the VM’s IP automatically.

To acquire the bearer token of the ServiceAccount, complete the following steps.

  1. On Kubernetes 1.24 and newer, the token secrets for service accounts are not created automatically. Create the token manually. For details, see the Kubernetes documentation.

  2. Download and run the following script. This script fetches the bearer token for the service account in namespace SA_NAMESPACE with the name of SA_SERVICEACCOUNT and saves it into the ~/bearer-token file.

    
    

    Note: If you are running the script on an OpenShift ROSA cluster, replace the jq filter string at line 13 of the script with '.items[1].data.token | @base64d'.
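For step 1 on Kubernetes 1.24 and newer, the token Secret can be declared manually. A minimal sketch, assuming the default service account in the smm-demo namespace (as referenced by the WorkloadGroup example); the Secret name is illustrative:

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Illustrative name; any name works.
  name: analytics-vm-token
  namespace: smm-demo
  annotations:
    # The service account the token is issued for:
    kubernetes.io/service-account.name: default
type: kubernetes.io/service-account-token
```

After you apply this Secret, the token controller populates its .data.token field with a token for the default service account.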

Prepare the virtual machine

To prepare the virtual machine to be attached to the mesh, complete the following steps.

  1. Open a terminal (for example, SSH) to the virtual machine.

  2. Install smm-agent on the virtual machine. The agent ensures that the machine’s Istio configuration is always up-to-date. Run the following command as the root user:

    curl http://<dashboard-url>/get/smm-agent | bash
    

    The output should be similar to:

    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
    100  1883  100  1883    0     0  13744      0 --:--:-- --:--:-- --:--:-- 13744
    Detecting host properties:
    - OS: linux
    - CPU: amd64
    - Packager: deb
    - SMM Base URL: http://a6bc8072e26154e5c9084e0d7f5a9c92-2016650592.eu-north-1.elb.amazonaws.com
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
    100 20.3M    0 20.3M    0     0  22.2M      0 --:--:-- --:--:-- --:--:-- 22.2M
    Selecting previously unselected package smm-agent.
    (Reading database ... 63895 files and directories currently installed.)
    Preparing to unpack /tmp/smm-agent-package ...
    Unpacking smm-agent (1.9.1~snapshot.202203311042-SNAPSHOT-ab2e8684a) ...
    Setting up smm-agent (1.9.1~snapshot.202203311042-SNAPSHOT-ab2e8684a) ...
    Created symlink /etc/systemd/system/multi-user.target.wants/smm-agent.service → /lib/systemd/system/smm-agent.service.
    ✓ dashboard url set url=<dashboard-url>
    
  3. Specify which WorkloadGroup and namespace you want to attach the machine to by running the following command:

    smm-agent set workload-group <namespace> <workloadgroup>
    

    For example:

    smm-agent set workload-group smm-demo analytics-v0
    

    The output should be similar to:

    ✓ target workload group set namespace=smm-demo, name=analytics-v0
    
  4. Specify the bearer token you have acquired in a previous step (replace <token> with the actual token, not the filename):

    smm-agent set bearer-token <token>
    

    The output should be similar to:

    ✓ bearer token set
    
  5. (Optional) Set the IP address the service mesh should use to access the VM.

    smm-agent set node-ip <VM's IP>
    

    The output should be similar to:

    ✓ node-ip is set ip=<VM's IP>
    
  6. Validate the configuration of smm-agent by running the following command. If the configuration is invalid, an error is shown.

    smm-agent show-config
    

    The output should be similar to:

    ✓ dashboard url=http://a6bc8072e26154e5c9084e0d7f5a9c92-2016650592.eu-north-1.elb.amazonaws.com
    ✓ target workload-group namespace=smm-demo, name=analytics-v0
    ✓ no additional labels set
    ✓ bearer token set
    ✓ node-ip is set
    ✓ configuration is valid
    

Attach the virtual machine to the mesh

Now that you have started the workload (HTTP server) and configured smm-agent, you can attach the VM to the mesh. To do so, run a reconciliation on this host. This step will:

  • configure and start Istio, so the virtual machine becomes part of the mesh
  • ensure that the cluster configuration is properly set
  • start smm-agent in the background so that the system is always up-to-date

Run the following command:

smm-agent reconcile

The output should be similar to:

✓ reconciling host operating system
✓ configuration loaded config=/etc/smm/agent.yaml
✓ install-pilot-agent ❯ downloading and installing OS package component=pilot-agent, platform={linux amd64 deb 0xc00000c168}
✓ install-pilot-agent ❯ downloader reconciles with exponential backoff downloader={pilot-agent {linux amd64 deb 0xc00000c168} true  0xc0002725b0}
...
✓ systemd-ensure-smm-agent-running/systemctl ❯ starting service args=[smm-agent]
✓ systemd-ensure-smm-agent-running/systemctl/start ❯ executing command command=systemctl, args=[start smm-agent], timeout=5m0s
✓ systemd-ensure-smm-agent-running/systemctl/start ❯ command executed successfully command=systemctl, args=[start smm-agent], stdout=, stderr=
✓ changes were made to the host operating system
✓ reconciled host operating system

Verify connectivity

If the attachment was successful, a new WorkloadEntry has been created for the new node in the namespace of the WorkloadGroup (if you followed the examples, in the smm-demo namespace). Verify it by completing the following steps.

  1. Check that the new WorkloadEntry exists:

    kubectl get workloadentries -n smm-demo
    

    The output should be similar to:

    NAME                                    AGE     ADDRESS
    analytics-v0-3.68.232.96-vm-network-1   2m40s   3.68.232.96
    
  2. Check the health of the service:

    kubectl describe workloadentries analytics-v0-3.68.232.96-vm-network-1 -n smm-demo
    

    The output should be similar to:

    Name:         analytics-v0-3.68.232.96-vm-network-1
    Namespace:    smm-demo
    Labels:       app=analytics
    ...
    Status:
      Conditions:
        Last Probe Time:       2022-04-01T05:47:47.472143851Z
        Last Transition Time:  2022-04-01T05:47:47.472144917Z
        Status:                True
        Type:                  Healthy
    
    
  3. On the Service Mesh Manager dashboard, navigate to MENU > TOPOLOGY and verify that the VM is visible and that it is receiving traffic. If you have performed this procedure on a clean installation of the Service Mesh Manager demo application, the difference on the topology view of the smm-demo namespace is that the analytics workload is now running on a virtual machine (indicated by the blue icon on the workload), and should look similar to this:

    Topology page with VMs

2.9.4.2 - VM to Kubernetes migration

When migrating an existing workload to the mesh (and Kubernetes), you have to complete the following main steps:

  1. Add the virtual machine to the mesh, so the original workload that is running in the virtual machine is available in the mesh.
  2. Configure traffic shifting that will allow you to route traffic from the virtual machine to the Kubernetes workload.
  3. Add the Kubernetes workload that will replace the virtual machine.
  4. Shift traffic to Kubernetes gradually and test that the Kubernetes workload works properly, even under high load.
  5. Remove the virtual machine when you have successfully completed the migration.

Note: The configuration examples use the analytics-v1 workload of the demo application. Adjust them as needed for your environment.

Add the VM to the mesh

Complete the prerequisites and attach the virtual machine to the mesh as described in Add a virtual machine to the mesh.

Set up traffic shifting

The migration must be a controlled process, especially in a production system. This step ensures that all traffic goes to the virtual machine even when the Kubernetes workload is started. This method avoids service disruptions and allows you to test the Kubernetes workload and transfer the traffic gradually. Otherwise, traffic would be split in a round-robin fashion between the VM and the Pod.

For details on creating the routing rule, see Routing.

Make sure to set the weight of the routing rule corresponding to the virtual machine to 100.

traffic-shift traffic-shift

Alternatively, create and apply Kubernetes resources similar to the following:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: analytics
  namespace: smm-demo
spec:
  host: analytics.smm-demo.svc.cluster.local
  subsets:
  - labels:
      version: v0
    name: v0
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: analytics-smm-demo-lmj7m
  namespace: smm-demo
spec:
  hosts:
  - analytics.smm-demo.svc.cluster.local
  http:
  - route:
    - destination:
        host: analytics.smm-demo.svc.cluster.local
        port:
          number: 8080
        subset: v0
      weight: 100

Add the Kubernetes workload

Now that you have guaranteed that the traffic will be still flowing to the virtual machine, you can deploy the new workload that is going to replace the virtual machine.

In this example, we simply scale up the analytics-v1 deployment (which was scaled down as part of Add a virtual machine to the mesh):

kubectl scale deploy -n smm-demo analytics-v1 --replicas=1

Wait until it’s up:

kubectl get pods -n smm-demo

The output should be similar to:

NAME                                READY   STATUS    RESTARTS   AGE
analytics-v1-7b96898ddc-9czpp       2/2     Running   0          18s
bombardier-66786577f7-tnjll         2/2     Running   0          18h
bookings-v1-7d8d76cd6b-68h6s        2/2     Running   0          18h
catalog-v1-5864c4b7d7-fvnqs         2/2     Running   0          18h
database-v1-65678c5dd6-lr2hh        2/2     Running   0          18h
frontpage-v1-776d76965-zbx67        2/2     Running   0          18h
movies-v1-6f7958c8c4-76ksk          2/2     Running   0          18h
movies-v2-568d4c4f4b-nrtkm          2/2     Running   0          18h
movies-v3-84b4887764-h2bzv          2/2     Running   0          18h
mysql-58458785-d4wx7                2/2     Running   0          18h
notifications-v1-544d6f77f7-jcdq6   2/2     Running   0          18h
payments-v1-7c955bccdd-l2czq        2/2     Running   0          18h
postgresql-75b94cdc9c-h6w64         2/2     Running   0          18h

Note: The workload will not show up on the topology view, as it does not receive any traffic yet.

For production systems, verify that the workload functions as expected before routing traffic to it.

Shifting traffic

Now that the new Kubernetes workload is functional, route a portion of the traffic to it, and verify that it is working as expected. For details on creating the routing rule, see Routing.

traffic-shift traffic-shift

Alternatively, adjust the related Kubernetes resources, for example:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: analytics
  namespace: smm-demo
spec:
  host: analytics.smm-demo.svc.cluster.local
  subsets:
  - labels:
      version: v0
    name: v0
  - labels:
      version: v1
    name: v1
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: analytics-smm-demo-lmj7m
  namespace: smm-demo
spec:
  hosts:
  - analytics.smm-demo.svc.cluster.local
  http:
  - route:
    - destination:
        host: analytics.smm-demo.svc.cluster.local
        port:
          number: 8080
        subset: v0
      weight: 90
    - destination:
        host: analytics.smm-demo.svc.cluster.local
        port:
          number: 8080
        subset: v1
      weight: 10

Repeat this step to gradually increase the traffic to the Kubernetes workload.

Completing the migration

If you have verified that the mixed setup works, change the traffic shifting to route 100% of the traffic to the Kubernetes workload.

If the Kubernetes workload is handling 100% of the traffic without problems, you can remove the virtual machine from the mesh, completing the migration.

2.9.4.3 - Upgrading Istio

Istio regularly gets security updates (patch version updates) and new features (minor/major version updates). Regarding upgrades, Service Mesh Manager uses the same approach for virtual machines integrated into the mesh as for Kubernetes workloads. For details, see Upgrading your business applications.

Patch version updates

When the Kubernetes deployment is upgraded to a new version of Service Mesh Manager that contains a newer (patch) version of Istio, the smm-agent running on the host will:

  • Automatically upgrade the smm-agent and restart it.
  • Automatically upgrade Istio, but does not restart it.

Upgrading the smm-agent and restarting it ensures that Service Mesh Manager configures Istio in the best possible way according to the latest tests. Since smm-agent does not serve live traffic, restarting it does not endanger the availability of the production environment.

Restarting Istio would cause a service disruption, which is not acceptable in production environments. Given that there is no standard way to temporarily drain traffic from a VM, or even to check whether the VM is part of a highly available setup, you must restart Istio when you see fit, for example, during a dedicated maintenance window.

The Service Mesh Manager dashboard shows the virtual machines that you need to restart as a validation error for the given WorkloadEntry.

The old Istio version keeps running until you restart the VM (or Istio itself). The new version starts up automatically after the restart.

To restart Istio, run the following command on the virtual machine:

systemctl stop istio    # smm-agent automatically starts Istio again with the new version

Minor/major version updates

When the namespace hosting the VM is migrated to a new version of the control plane (see Upgrading your business applications), smm-agent automatically notices that a new version of Istio is available.

At this point, smm-agent executes the same steps as for patch version updates, but you must restart Istio (or the virtual machine) when traffic characteristics allow for that downtime.

2.9.4.4 - Autoscaling VM groups

Add a virtual machine to the mesh details how to add a VM manually to the mesh. However, Service Mesh Manager also allows for automated addition as part of any autoscaling activity, such as relying on AWS’s AutoScaling Groups or Google Cloud’s Managed Instance Groups.

One way to achieve mesh membership is to add the commands mentioned in Add a virtual machine to the mesh to the init/cloud-init script of the VM so that they run at boot time. However, if the VM image is custom-built using packer or any other solution, you can embed an already configured smm-agent into the image.
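For the cloud-init approach, the registration commands can be sketched as a cloud-config user-data snippet. This is an illustrative, untested sketch; replace the <...> placeholders with values for your environment:

```yaml
#cloud-config
runcmd:
  # Install and configure smm-agent at first boot, then start it
  - curl http://<dashboard-url>/get/smm-agent | bash
  - smm-agent set workload-group <namespace> <workloadgroup-name>
  - smm-agent set bearer-token <token>
  - systemctl enable --now smm-agent
```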

Prerequisites

Build a VM image for automatic Istio attachment

During the build process of the virtual machine image, execute the following commands as the root user. Replace the parameters between brackets with suitable values for your environment (replace <token> with the actual token, not the filename).

curl http://<dashboard-url>/get/smm-agent | bash
smm-agent set workload-group <namespace> <workloadgroup-name>
smm-agent set bearer-token <token>
systemctl enable smm-agent

This is a subset of the steps required by the manual registration. These steps ensure that:

  • smm-agent is installed on the VM image
  • smm-agent is configured to be able to connect to the mesh
  • smm-agent is scheduled to be started on the next startup (systemctl enable smm-agent)

When a new instance of this VM image starts, smm-agent contacts the Kubernetes cluster running in the mesh, downloads the current version of Istio, and starts it as soon as it’s fully configured.

This approach ensures that newly started VMs run the right version of Istio.

2.9.4.5 - Maintenance

To perform scheduled maintenance on the virtual machine (for example, a restart), you can use one of the following methods to stop traffic to the machine. If you want to completely remove the VM from the mesh, see Remove VM from the mesh.

Shut down the service

The first approach is to shut down the service that the WorkloadGroup’s health checks are defined against. As a result, Istio will not route any traffic to the virtual machine.

De-register the VM from the mesh

Another approach is to open a terminal on the virtual machine, and stop the following services to de-register the virtual machine from the mesh:

systemctl stop smm-agent    # smm-agent would restart istio automatically if it's not running
systemctl stop istio

After you have finished the maintenance, re-register the VM.

Re-register the VM to the mesh

After you have performed the required maintenance, complete the following steps.

  1. Run the following command to re-register the VM to the mesh:

    systemctl start smm-agent
    

    The smm-agent will automatically synchronize the Istio configuration from the mesh’s cluster and start Istio.

  2. Verify that the WorkloadEntry for the virtual machine has been re-created in the namespace of the workload.

    Check that the new WorkloadEntry exists:

    kubectl get workloadentries -n smm-demo
    

    The output should be similar to:

    NAME                                    AGE     ADDRESS
    analytics-v0-3.68.232.96-vm-network-1   2m40s   3.68.232.96
    

2.9.4.6 - Remove VM from the mesh

If you have successfully migrated your workload from a virtual machine to a Kubernetes workload, or if the virtual machine is not needed in the mesh anymore, you can uninstall Istio and smm-agent from the virtual machine.

Disconnect the VM from the mesh

Note: If you want to decommission the virtual machine, you can simply delete the instance. Disconnecting from the mesh is only needed if you want to keep using the virtual machine without having smm-agent and Istio running.

  1. To remove Istio from the virtual machine, stop the background services. Stop the smm-agent service first, otherwise it restarts the istio and node-exporter services:

    systemctl stop smm-agent && systemctl stop istio && systemctl stop smm-node-exporter
    

    These commands do not just stop Istio: they also cause the VM’s WorkloadEntry to be removed from the workload’s namespace.

  2. Use the package manager of the virtual machine’s operating system to remove the istio-sidecar and the smm-agent packages. For example:

    • On Ubuntu-based systems:

      dpkg -r istio-sidecar smm-agent
      
    • On RedHat-based systems:

      rpm -e istio-sidecar smm-agent
      
  3. Remove the smm-agent download cache:

    rm -f /var/cache/smm-agent/downloads/*
    

Remove Kubernetes resources

Remove the associated WorkloadGroup and PeerAuthentication objects from your workload’s namespace.
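For example, for the analytics workload used throughout this guide (the PeerAuthentication object’s name varies by setup, so it is shown as a placeholder):

```shell
kubectl delete workloadgroup analytics-v1 -n smm-demo
kubectl delete peerauthentication <peerauthentication-name> -n smm-demo
```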

2.9.5 - Istio resources

When adding an external workload to the mesh, two crucial Istio resources are used.

  • A WorkloadGroup needs to be created in the namespace that the machine will be attached to. This object represents a group of machines serving the same service, and is analogous to the Kubernetes Deployment concept.
  • Each virtual machine attached to the mesh is represented by a WorkloadEntry object in the workload’s namespace. This is analogous to the Kubernetes Pod concept.

The VM attachment flow used in Service Mesh Manager relies on the PILOT_ENABLE_WORKLOAD_ENTRY_AUTOREGISTRATION and PILOT_ENABLE_WORKLOAD_ENTRY_HEALTHCHECKS features.

Autoregistration

To understand the autoregistration feature, first take a look at a WorkloadGroup resource:

apiVersion: networking.istio.io/v1alpha3
kind: WorkloadGroup
metadata:
  labels:
    app: analytics
    version: v1
  name: analytics-v1
  namespace: smm-demo
spec:
  metadata:
    labels:
      app: analytics
      version: v1
  template:
    ports:
      http: 8080
    serviceAccount: default
If autoregistration is enabled, the Istio pilot-agent running on the virtual machine connects to the istio-meshexpansion-gateway in the istio-system namespace and presents the specified ServiceAccount’s bearer token (and some registration details that Service Mesh Manager sets automatically) to authenticate itself to the Istio control plane. If the authentication is successful, the Istio control plane creates a WorkloadEntry in the cluster, like this:

apiVersion: networking.istio.io/v1beta1
kind: WorkloadEntry
metadata:
  annotations:
    istio.io/autoRegistrationGroup: analytics-v1
    istio.io/connectedAt: "2022-03-31T06:52:14.739292073Z"
    istio.io/workloadController: istiod-cp-v115x-df9f5d556-9kvqs
  labels:
    app: analytics
    hostname: ip-172-31-22-226
    istio.io/rev: cp-v115x.istio-system
    service.istio.io/canonical-name: analytics
    service.istio.io/canonical-revision: v1
    topology.istio.io/network: vm-network-1
  name: analytics-v1-3.67.91.181-vm-network-1
  namespace: smm-demo
  ownerReferences:
  - apiVersion: networking.istio.io/v1alpha3
    controller: true
    kind: WorkloadGroup
    name: analytics-v1
    uid: d01777d5-4294-44e7-a311-3596c2f63bb1
spec:
  address: 1.2.3.4
  labels:
    app: analytics
    hostname: ip-172-31-22-226
    istio.io/rev: cp-v115x.istio-system
    service.istio.io/canonical-name: analytics
    service.istio.io/canonical-revision: v1
    topology.istio.io/network: vm-network-1
  locality: eu-central-1/eu-central-1a
  network: vm-network-1
  serviceAccount: default

Any attached machine that has a corresponding WorkloadEntry resource behaves as a Kubernetes workload, and has a set of labels assigned that can be used by Services to match the machine.

For example, the following Service will route traffic to the virtual machine due to the .spec.selector matching the WorkloadEntry’s labels (.metadata.labels):

apiVersion: v1
kind: Service
metadata:
  name: analytics
  namespace: smm-demo
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: analytics
  sessionAffinity: None
  type: ClusterIP

Autoregistration is also crucial when removing workloads from the mesh. If the istio sidecar process is stopped on the host, the Istio control plane automatically removes the related WorkloadEntry custom resource. This can be used to temporarily remove a VM from the mesh for maintenance or troubleshooting purposes, and it also ensures that if Istio is uninstalled from the node, it automatically de-registers itself without needing to manually update any Kubernetes resources.

Health checks

The PILOT_ENABLE_WORKLOAD_ENTRY_HEALTHCHECKS setting provided by Istio allows health checks to be defined for VMs. If the health check fails, Istio will not route any traffic to the workload.

In case of Service Mesh Manager, the health checks are defined in the WorkloadGroup resource and our agent running on the VM ensures that Istio uses that setting. For example, the following WorkloadGroup defines an HTTP health check:

apiVersion: networking.istio.io/v1alpha3
kind: WorkloadGroup
metadata:
  labels:
    app: analytics
    version: v1
  name: analytics-v1
  namespace: smm-demo
spec:
  metadata:
    labels:
      app: analytics
      version: v1
  probe:
    httpGet:
      host: 127.0.0.1
      path: /
      port: 8080
      scheme: HTTP
  template:
    network: vm-network-1
    serviceAccount: default

The .spec.probe definition is the same as the Probe object of the official Kubernetes API. The defined probe is analogous to the liveness probe of a Pod: it is checked continuously while Istio is running on the machine. The only difference is that Istio will not restart the VM if the probe fails; instead, it stops routing any traffic to the WorkloadEntry.
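Because the field follows the Kubernetes Probe schema, other probe types and timing fields can be used as well. For example, a TCP-based probe could be sketched like this (values are illustrative):

```yaml
spec:
  probe:
    initialDelaySeconds: 5
    periodSeconds: 10
    failureThreshold: 3
    tcpSocket:
      port: 8080
```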

You can query the status of the health checks from Kubernetes by checking the machine’s WorkloadEntry:

apiVersion: networking.istio.io/v1beta1
kind: WorkloadEntry
metadata:
  annotations:
    istio.io/autoRegistrationGroup: analytics-v1
    istio.io/connectedAt: "2022-03-31T06:52:14.739292073Z"
    istio.io/workloadController: istiod-cp-v115x-df9f5d556-9kvqs
  labels:
    app: analytics
    hostname: ip-172-31-22-226
    istio.io/rev: cp-v115x.istio-system
    service.istio.io/canonical-name: analytics
    service.istio.io/canonical-revision: v1
    topology.istio.io/network: vm-network-1
  name: analytics-v1-3.67.91.181-vm-network-1
  namespace: smm-demo
  ownerReferences:
  - apiVersion: networking.istio.io/v1alpha3
    controller: true
    kind: WorkloadGroup
    name: analytics-v1
    uid: d01777d5-4294-44e7-a311-3596c2f63bb1
spec:
  address: 1.2.3.4
  labels:
    app: analytics
    hostname: ip-172-31-22-226
    istio.io/rev: cp-v115x.istio-system
    service.istio.io/canonical-name: analytics
    service.istio.io/canonical-revision: v1
    topology.istio.io/network: vm-network-1
  locality: eu-central-1/eu-central-1a
  network: vm-network-1
  serviceAccount: default
status:
  conditions:
  - lastProbeTime: "2022-03-31T07:23:07.236758604Z"
    lastTransitionTime: "2022-03-31T07:23:07.236759090Z"
    status: "True"
    type: Healthy

In the status field of the custom resource, the conditions array contains an entry with the type field set to Healthy. If that entry’s status is set to "True", the machine is considered healthy and will receive traffic.
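To extract just this condition instead of reading the whole resource, a jsonpath query along these lines can be used (the WorkloadEntry name is taken from the example above; adjust it for your setup):

```shell
kubectl get workloadentry analytics-v1-3.67.91.181-vm-network-1 -n smm-demo \
  -o jsonpath='{.status.conditions[?(@.type=="Healthy")].status}'
```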

2.9.6 - Known limitations

Before trying to attach a virtual machine to the mesh, make sure that you understand the following limitations of the solution:

  • Because of the way Istio operates, the VM can only resolve services and DNS names from the Kubernetes namespace it is attached to. This means that communication from the VM to other Kubernetes namespaces is not possible.

  • When you are installing Istio on the node for the first time, network connections might be disrupted (TCP reconnections happen) as Istio initializes the iptables rules. Prepare for such a micro-outage when first provisioning Istio on a production node.
  • You can attach VMs only to an active Istio cluster that is running the Service Mesh Manager control plane.

VMs in active-active scenarios

If you are running Service Mesh Manager in a multi-cluster scenario with multiple active Istio controlplanes, note that:

  • You can attach VMs only to an active Istio cluster that is running the Service Mesh Manager control plane.

  • The VMs can’t automatically reconnect to the mesh expansion gateway when there is an outage on the cluster to which the VM is connected.

    After the outage, outdated IP addresses remain in the /etc/hosts file of the VM, so the VM can’t connect to any of the mesh expansion gateways. Update the IP addresses manually in the /etc/hosts file of the VM to the current IP address of a mesh expansion gateway to restore traffic to the VM. You can get the Ingress IP address of an IstioMeshGateway by running the following command:

    kubectl get istiomeshgateways.servicemesh.cisco.com -n <namespace-of-the-meshgateway> <name-of-the-meshgateway> -o jsonpath='{.status.GatewayAddress[0]}'
    

2.10 - Operations

Service Mesh Manager addresses the whole cloud-native lifecycle of a service mesh based solution by providing various tools covering day 0 to day 2 operations. Such a solution requires quite a few components to provide the core service mesh functionality, for example tracing, metrics, or safe canary-based deployments, just to name a few.

When it comes to Day 2 operations, many optimizations need to be made to the deployment so that it is performant and provides adequate alerting to the end user in case of issues.

This section contains the most common procedures required to ensure the availability of Service Mesh Manager and the underlying system.

2.10.1 - Scaling and performance tuning

Kubernetes provides the features to achieve rapid elasticity on the managed clusters. Service Mesh Manager and Istio rely on these features to support differently-sized deployments, ranging from a few hundred Pods and Services to clusters loaded with thousands of them.

When it comes to modern (microservice-based) systems, there are two challenges regarding performance optimization:

  • First, it’s hard or even impossible to tell what the best numbers and settings are for a given deployment. This is because you no longer control the underlying hardware: any assumption about how many CPUs a workload requires is speculation, due to the varying instructions per clock (IPC) and performance characteristics of modern CPUs. The same is true for memory requirements: in case of Istio or Service Mesh Manager, memory usage depends on the number of running Pods and the number of alerts, and due to rapid elasticity, it will most likely change during the lifecycle of the deployed solution or cluster.

  • The other challenge comes with the microservice architecture itself: the end user measures the performance of a given solution by the speed at which an HTTP call is served. Given that, in the background, multiple services talk to each other to serve a single request, you need to understand not only the performance characteristics of each service, but also the end-to-end call flow that resulted in the request being served.

Demo application with failure signals

Service Mesh Manager

To address the challenges described in the previous section, Service Mesh Manager provides:

  • the necessary control to set the right performance parameters for Istio
  • the necessary tools to understand how a single workload behaves
  • the necessary tools (tracing and traffic taps) to examine end-to-end flows in real time

This guide covers all aspects of performance tuning, starting from generic workload performance tuning to how to fine-tune Service Mesh Manager to best suit the given deployment’s performance characteristics:

2.10.1.1 - Finding bottlenecks

Except for out-of-memory errors, most performance issues manifest as either increased response times (or processing times for queue-based systems) or increased error rates. Service Mesh Manager provides built-in observability features that help find out where these issues are.

Usually, investigations for these bottlenecks are triggered from two primary sources:

  • an alert fires on a specific service (see Service Level Objectives for setting this up), or
  • some end-to-end measurement (such as measuring response times at the ingress gateway side) shows increased values.

The first case is easier to deal with, as the faulty service is (most likely) already identified. If this is the case, refer to the scaling a specific Workload section.

Finding the root cause of an issue detected at ingress side

Service Mesh Manager provides two major features to find these bottlenecks in your system: the health subsystem and traces.

Using the health subsystem

The health subsystem provides automated outlier analysis for your Workloads and Services by learning their previous behavior and highlighting any diverging patterns. When trying to find the root cause of bottlenecks in the system, you can use the topology view to pinpoint any issues, by pulling up the topology chart of your solution:

Topology view highlighting an error

In this example, the catalog’s v1 Workload is most likely the culprit. Of course, this is a simple setup, where the problematic service is visible right from the topology view.

In microservice environments failures usually propagate either downstream or upstream, as in the following topology:

Topology view multiple issues

You can see that the following services might be affected by the issue:

  • frontpage
  • bookings
  • analytics
  • catalog
  • movies

Given that the demo application is relatively small compared to a real-life solution consisting of hundreds of microservices, it is possible to check the details of those services and execute the Workload-specific scaling tasks on them. However, the best practice is to only check the services that are either at the top of the call-chain or at the end of the call-chain.

In this case this means checking the frontpage, the analytics, and the movies services. This is because either some of the microservices in the deeper regions of the call-tree are behaving differently (such as the analytics service being too slow and slowing down the whole transaction processing), or the frontpage’s behavior has changed a bit.

The health framework has, however, a significant limitation: it can only detect outliers that are new. If an issue has been occurring for the last week, the health system has already learned that behavior and reports it as normal.

Using Traces

Service Mesh Manager has built-in support for tracing, with minimal changes required at application level to enable this feature. When the first approach does not work well, use the traffic tap feature by setting the filter to the namespace or Workload that handles the ingress for the cluster.

Given that tracing is already integrated into the system, operators can always look for requests with slow response times that have traces available, and find the service causing the bottleneck there.

This is especially true if the performance degradation is caused not by a single service being too slow to respond, but rather by some downstream services being called too many times, which is obvious from the traces.

2.10.1.2 - Scale a specific Workload

Before getting into specifics on how Service Mesh Manager helps with detecting any issues, let us summarize how scaling works in Kubernetes. When it comes to Pods, there are two basic resources (omitting the storage aspect for now) that can be specified and limited: memory and available CPU shares. The CPU and Memory usage of a Pod is the sum of resources required by the containers inside the Pods.

Detecting issues regarding insufficient memory is usually easy: if a container wants to use more memory than allowed, it goes into OOMKilled status, as the operating system cannot give it more memory. To make the container operational again, the first thing to do in such cases is to increase the memory available to the container, so that it can start. Any investigation can continue after the system is operational again.

CPU resources are by nature different from memory: the underlying operating system can rely on the kernel’s scheduler to give fewer CPU cycles to a given container instead of terminating it. The amount of work that the container wanted to execute but did not get the CPU shares for is called CPU throttling. If a container’s CPU throttle value is more than zero, the container is processing requests and data at a slower rate than it would otherwise be able to, due to a CPU starvation issue.

Horizontal and vertical scalability

A workload scales horizontally when adding more Pods to the workload increases the total performance or the amount of bandwidth/requests the given workload can serve. Examples of such workloads are stateless API servers, where adding one more Pod to the workload increases the throughput almost linearly, as more “workers” are serving the same incoming traffic. Horizontal scalability is desired in Kubernetes environments, as most Kubernetes deployments can dynamically provision new worker nodes, thus allowing more Pods to be executed on the cluster.

A workload scales vertically when its performance can only be increased by allocating more resources (memory, CPU) to the given workload. This is usually the case with database engines (such as MySQL, PostgreSQL, or even Prometheus). For example, for MySQL it is easier to increase the amount of CPU and memory available to the database workload than to set up multi-primary replication of the database to allow for some horizontal scalability.

The main issue with vertical scalability is that Kubernetes nodes always have a hard limit on the available CPU and memory. If a workload requires more resources than the nodes can architecturally provide, the workload is never scheduled, causing an outage. That’s why horizontal scalability is preferred over vertical scalability.

Scale a specific workload

To find the right strategy for scaling a workload, you must first understand if the workload scales horizontally or vertically.

Note: Usually, workloads exhibit both vertical and horizontal scaling properties. For an example on a horizontally scalable service, see Scale the Istio Control Plane. A typical vertically scaling service is Prometheus, see Scale the built-in Prometheus.

Determine memory requirements

If a Workload is running out of memory, it is immediately visible, as the pod is killed. In such cases, before starting to analyze the memory usage patterns, try to significantly (by 50%) increase the requests/limits of the workload. This might help to (at least temporarily) recover the system, minimizing the outage perceived by the end user. It also shows whether the current incident is caused by insufficient resource allocation, or is a systematic issue.
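For example, a 50% increase for a container that currently requests 256Mi and is limited to 512Mi would look like this in the Pod template (values illustrative):

```yaml
resources:
  requests:
    memory: "384Mi"  # 256Mi + 50%
  limits:
    memory: "768Mi"  # 512Mi + 50%
```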

The graph showing the memory usage of the Pods can also be used to determine the minimum amount of memory required for the Workload to operate properly:

Memory saturation

Since this only shows data for the last few hours, if you don’t know the scaling behavior of the service, complete the following tests to better understand it:

  • Generate synthetic load on the service to see how the Pod’s memory utilization changes over time
  • Upscale the Workload to have more replicas, and then apply the same synthetic load, so that it is visible how memory utilization scales

Determine CPU requirements

In case of CPU, the kernel of the node can decide not to give CPU cycles to a container that is over its limit. These starvation issues mean that any operation the container wants to execute runs slower, resulting in longer processing times or bigger response times.

To check if this is happening, navigate to the Workload and check the HEALTH > SATURATION: CPU.

CPU saturation

In this example, it is visible that the database container is most likely responding slower than it could, as it was denied the 0.6 vCPU of cycles it could have used to serve additional requests. In such cases, increase the container’s CPU allocation by at least the amount visible in the throttled seconds graph to avoid performance issues.

Auto-scaling a workload vertically

If a Workload only scales vertically, it is usually recommended to find a safe limit where the Workload is performant enough, and to set up alerts for when the given metric (memory or CPU) gets close to the limit.
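One way to express such an alert, assuming the Prometheus Operator’s PrometheusRule custom resource and the standard cadvisor and kube-state-metrics metrics are available in your cluster (a sketch, not a definitive rule):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: workload-memory-alerts
  namespace: smm-demo
spec:
  groups:
  - name: workload.saturation
    rules:
    - alert: ContainerMemoryCloseToLimit
      # Fires when a container's working set stays above 90% of its memory limit
      expr: |
        container_memory_working_set_bytes{container!=""}
          / on (namespace, pod, container)
        kube_pod_container_resource_limits{resource="memory"} > 0.9
      for: 10m
      labels:
        severity: warning
```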

Another solution is to use a Vertical Pod Autoscaler (VPA), however, this has some limitations and it does not solve the issue of Kubernetes nodes having a finite set of resources.

The most important limitation is that vertically scalable services are usually stateful (such as databases), and changing the resource limits requires restarting the Pods. If the database in question cannot handle this failover, a small outage happens.

Auto-scaling a workload horizontally

It is generally recommended to run multiple instances of horizontally scalable Workloads with small CPU and memory usage. Kubernetes provides built-in support for automatically adjusting the number of Pods serving a Workload based on CPU or memory metrics.

Using Horizontal Pod Autoscaler

To use the Horizontal Pod Autoscaling support in Kubernetes, complete the following steps.

  1. Install the metrics-server, which is responsible for providing resource usage metrics to the autoscaler.

    NOTE: Service Mesh Manager does not install the metrics-server by default, as this is a cluster-global component.

    To verify that the underlying Kubernetes cluster has metrics-server installed, check if it is running in the kube-system namespace (assuming default installation):

    kubectl get pods -n kube-system | grep metrics-server
    metrics-server-588cd8ddb5-k7cz5                                        1/1     Running   0          4d7h
    

    If the cluster does not have metrics-server running, you can install it by running the following command:

    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
    
  2. If metrics-server is available, the Horizontal Pod Autoscaler (HPA) tries to set the number of replicas for Deployments and StatefulSets so that the actual average memory or CPU utilization is a given percentage of the requests on the Pods of the Workload.

    For this to work, set the requests and limits for memory and CPU for the Pods of the Workload based on the previous sections.

  3. Afterward, use the HorizontalPodAutoscaler resource to configure the scaling parameters. The following example configures the HPA to dynamically scale the istiod-cp-v115x Deployment in the istio-system namespace, so that the average CPU utilization of all Pods compared to their resource requests is at 80%.

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: istiod-autoscaler-cp-v115x
      namespace: istio-system
    spec:
      maxReplicas: 5
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: istiod-cp-v115x
      targetCPUUtilizationPercentage: 80
    

NOTE: If the Workload scales up slowly (for example, because of slow startup times), you can increase the limits to allow some additional headroom until the scaling is complete.
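The autoscaling/v1 API only supports CPU utilization. To scale on memory as well, you can use the autoscaling/v2 API. The following sketch targets a hypothetical horizontally scalable analytics Deployment (names reused from the earlier examples):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: analytics-autoscaler
  namespace: smm-demo
spec:
  maxReplicas: 10
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: analytics
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        # Average memory usage across Pods, as a percentage of their requests
        averageUtilization: 80
```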

2.10.1.3 - Scale the Istio proxy sidecar of a workload

Istio has two major components:

  • The proxy is responsible for handling the incoming traffic, and is injected into the Pods of every Workload.
  • The Control Plane is responsible for coordinating the proxies, by sending them the latest configuration.

This section focuses on the performance characteristics of the proxy. For details on the control plane, see Scale the Istio Control Plane.

For details on the proxy’s performance we recommend checking out the upstream Istio documentation on performance:

Proxy bottlenecks

The proxy sidecar is just a container running in all of the Istio-enabled Workloads. We highly recommend checking the Health page of any Workload that exhibits high latencies for CPU throttling in the proxy sidecar, as that might mean performance is left on the table.

Note that the number of Workloads and Services, and whether namespace isolation is enabled, hugely affect the memory requirements of the proxy, as it needs to store all the configuration required to access them.

Setting resource limits via Control Plane

Service Mesh Manager provides two ways to change resource limits. The easiest one is to change the ControlPlane resource by running the following commands:

cat > istio-cp-limits.yaml <<EOF
spec:
  meshManager:
    istio:
      proxy:
        resources:
          requests:
            cpu: 50m
            memory: "64M"
          limits:
            cpu: "100m"
            memory: "128M"
EOF

kubectl patch controlplane --type=merge --patch "$(cat istio-cp-limits.yaml)" smm

  • If you are using Service Mesh Manager in Operator Mode, then the Istio deployment is updated automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.

Setting resource limits via the Istio Resource

If the deployment needs more control over the Istio behavior, then the IstioControlPlane resource in the istio-system namespace must be changed. Apart from the settings (resources and images) defined in the ControlPlane resource, any modifications to the IstioControlPlane resource are preserved even if the operator reconcile command is invoked, or if Service Mesh Manager is deployed in Operator Mode.

For more details on this approach, see our Open Source Istio operator.

2.10.1.4 - Scale the Istio Control Plane

Istio has two major components:

  • The proxy is responsible for handling the incoming traffic, and is injected into the Pods of every Workload.
  • The Control Plane is responsible for coordinating the proxies, by sending them the latest configuration.

This section focuses on the performance characteristics of the Control Plane. For details on the proxies, see Scale the Istio proxy sidecar of a workload.

Find the Control Plane

By default, Service Mesh Manager installs the Control Plane into the istio-system namespace:

kubectl get pods -n istio-system

The output should be similar to:

NAME                                                    READY   STATUS    RESTARTS   AGE
istio-meshexpansion-gateway-cp-v115x-5d647494bb-qnn8b   1/1     Running   0          3h57m
istiod-cp-v115x-749c46569-ld8fn                         1/1     Running   0          3h57m

In this list, the istiod-cp-v115x-749c46569-ld8fn Pod is the Istio Control Plane for version 1.15.3 (v115x). The other Pod provides cross-cluster connectivity.

In practice, these Pods are provisioned from an IstioControlPlane resource in the same namespace:

kubectl get istiocontrolplanes -A

The output should be similar to:

NAMESPACE      NAME       MODE     NETWORK    STATUS      MESH EXPANSION   EXPANSION GW IPS   ERROR   AGE
istio-system   cp-v115x   ACTIVE   network1   Available   true             ["172.18.250.1"]           42h 

When using the Service Mesh Manager dashboard for analyzing the resource usage, check these Workloads.

Understand scaling behavior

For the Control Plane, the CPU usage scales horizontally: if you add more pods to the Kubernetes cluster, istiod requires more CPU resources to send the configuration to all of their proxies. By adding more istiod Pods, the number of workloads the service mesh can support increases.

On the other hand, each istiod Pod needs to maintain an inventory of all the running Pods, Services, and Istio custom resources, so as more of these are added to the Kubernetes cluster, each Pod needs more memory. From the memory consumption point of view, the service behaves as if it scaled vertically.

It is quite common for a service to exhibit both vertical and horizontal scalability. In such cases, focus on the horizontal scalability properties, as long as the vertically scaling dimension (CPU or Memory) can be reasonably contained.

According to the upstream Istio documentation:

When namespace isolation is enabled, a single Istiod instance can support 1000 services, 2000 sidecars with 1 vCPU and 1.5 GB of memory. You can increase the number of Istiod instances to reduce the amount of time it takes for the configuration to reach all proxies.

When deciding on how to scale the workload, it is worth looking at the backing Kubernetes nodes. Usually the nodes are smaller ones: they have a few CPU cores (2-4) and 8-16 GB of memory. For the sake of this example, let’s say we have Kubernetes nodes with 2 vCPUs and 8GB RAM available.

In this case, giving 1 vCPU to Istio allocates 50% of the available CPU resources, while giving it 1.5 GB only reserves 18% of the available memory.

  • If we scale vertically based on memory, we can only double the CPU allocation before we exhaust all of the available CPUs on the underlying node.
  • If we scale horizontally, allocating 1 vCPU to each Istio control plane instance, we can add up to 8 Istio instances before we exhaust the available resources, assuming that both memory and CPU utilization scale linearly.

Set resource limits via Control Plane

Service Mesh Manager provides two ways to change resource limits. The easiest one is to change the ControlPlane resource by running the following commands:

cat > istio-cp-limits.yaml <<EOF
spec:
  meshManager:
    istio:
      pilot:
        resources:
          requests:
            cpu: 500m
            memory: "1500M"
          limits:
            cpu: "1"
            memory: "2000M"
EOF

kubectl patch controlplane --type=merge --patch "$(cat istio-cp-limits.yaml)" smm
  • If you are using Service Mesh Manager in Operator Mode, then the Istio deployment is updated automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.

Note: By default, the Istio Control Plane has an HPA set up with a minimum of 1 and a maximum of 5 Pods.

Set resource limits via the Istio Resource

If the deployment needs more control over the Istio behavior, then the IstioControlPlane resource in the istio-system namespace must be changed. Except for the settings (resources and images) defined in the ControlPlane resource, any modifications are preserved even if the operator reconcile command is invoked, or if Service Mesh Manager is deployed in Operator Mode.

For more details on this approach, see the Open Source Istio operator.

2.10.1.5 - Scale SMM Control Plane

Service Mesh Manager follows the microservice architecture pattern. This document details the scaling properties of the different microservices that Service Mesh Manager is built on.

These services (by default) are installed into the smm-system namespace, thus the procedures described in Scale a specific Workload apply to them.

This document omits a few services as they have minimal resource requirements even for large-scale deployments, thus no tuning is necessary for those.

API services

To provide the dashboard functionality, Service Mesh Manager relies on a set of GraphQL API servers. These servers are only used when the dashboard is being used.

Their Memory usage scales linearly with the number of Workloads, Services, and other Kubernetes objects. Since they cache some Kubernetes objects, we recommend setting their Memory limits based on actual measurements specific to the current workload in the cluster.

Note: As Service Mesh Manager can be used to monitor Workloads that are not part of the Istio mesh, the resource utilization of these services depends on the total number of Kubernetes Workloads, not just the Istio-enabled ones.

Component                Usage                                     Resource setting in ControlPlane
smm-health-api           Provides Health scores on the dashboard   .spec.smm.health.api.resources
smm-sre-api              Provides SLO access on the dashboard      .spec.smm.sre.api.resources
smm                      Provides Istio management                 .spec.smm.application.resources
smm-federation-gateway   Aggregates the API servers' APIs          .spec.smm.federationGateway.resources

For example, to set the resource requirements of smm-health-api, run the following commands:

cat > change-health-resources.yaml <<EOF
spec:
  smm:
    health:
      api:
        resources:
          requests:
            cpu: 500m
            memory: "1500M"
          limits:
            cpu: "1"
            memory: "2000M"
EOF

kubectl patch controlplane --type=merge --patch "$(cat change-health-resources.yaml)" smm
  • If you are using Service Mesh Manager in Operator Mode, then the deployment is updated automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.

Health controller

The health controller is responsible for collecting outlier detection data for all Services and Workloads in the Cluster running Service Mesh Manager. The health controller is implemented in a way that it cannot scale horizontally. Use the Service Mesh Manager dashboard to find out the right CPU and Memory requirements for this component.

The health controller’s resource requirements can be set in the ControlPlane CR’s .spec.smm.health.controller.resources key as shown in the API services example.

Note: The health controller increases the resource usage of Prometheus by approximately 30%. You can disable the outlier detection system using the .spec.smm.health.enabled setting.
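If the outlier detection overhead is not acceptable, this setting can be changed with a ControlPlane patch following the same pattern as the other examples in this document (a sketch; the key is the .spec.smm.health.enabled setting mentioned above):

```yaml
# Sketch: disable the outlier detection (health) subsystem.
# Apply with: kubectl patch controlplane --type=merge --patch "$(cat disable-health.yaml)" smm
spec:
  smm:
    health:
      enabled: false
```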

SRE Controller

The SRE controller is responsible for SLO measurement and alerting. The SRE controller is implemented in a way that it cannot scale horizontally. Use the Service Mesh Manager dashboard to find out the right CPU and Memory requirements for this component.

The SRE controller’s resource requirements can be set in the ControlPlane CR’s .spec.smm.sre.controller.resources key as shown in the API services example.

Another component belonging to the alerting subsystem is the smm-sre-alert-exporter, which helps Service Mesh Manager visualize historical alerting data. This service has a small footprint, but if it needs to be scaled, you can do so using the .spec.smm.sre.alertExporter.resources settings.

Other components

As you can see from the list of Pods running in the smm-system namespace, Service Mesh Manager uses other components as well. Refer to the definition of the ControlPlane resource to check where to set the resource requirements of those parts.

For details, see The ControlPlane Custom Resource.

You can get the current CRD by running the following command:

kubectl get crd controlplanes.smm.cisco.com -o yaml

2.10.1.6 - Scale the built-in Prometheus

Service Mesh Manager relies on Prometheus (and Thanos in HA mode) to store historical values of metrics and calculate Topology, SLOs, or Health scores. Usually for large scale deployments (clusters with many Workloads and Services), Prometheus uses the most resources.

Furthermore, if Prometheus does not have enough resources, the dashboard might become slow or even unresponsive. In such cases, start scaling the Prometheus instance.

Scale Prometheus

Prometheus (as configured in Service Mesh Manager) can only be scaled vertically. To find out the right resource limits, follow the scaling a specific workload procedure.

Note: If CPU throttling is present for Prometheus, it is usually the primary reason for a slow Service Mesh Manager dashboard.

Monitor resolution

Besides the conventional methods for scaling Prometheus, note that its resource usage depends on the amount of data ingested (among many other things). Service Mesh Manager by default configures Prometheus to have a 5s resolution (that is, to have a data point every five seconds), which is great for initial experimentation and spotting small spikes in the metrics.

For large scale deployments, we highly recommend decreasing the resolution to 15s or 30s, which decreases CPU and Memory usage by roughly 66% (15s) or 83% (30s).

To change the monitoring resolution of Prometheus, run the following commands.

cat > change-prometheus-scraping-frequency.yaml <<EOF
spec:
  smm:
    prometheus:
      scraping:
        frequency:
          interval: 15s
          timeout: 15s
EOF

kubectl patch controlplane --type=merge --patch "$(cat change-prometheus-scraping-frequency.yaml)" smm
  • If you are using Service Mesh Manager in Operator Mode, then the deployment is updated automatically.
  • If you are using the imperative mode, run the smm operator reconcile command to apply the changes.

Note: When using persistent storage for Prometheus, resource utilization decreases more slowly than it would with an empty database, because Service Level Objectives still access the higher-resolution data from the past.

Scale Thanos

For HA setups, Service Mesh Manager uses thanos-query for metric deduplication. The service scales horizontally and is usually CPU bound.

By default, Service Mesh Manager provisions a Horizontal Pod Autoscaler so that it can scale dynamically based on resource usage. To set up the resource requests/limits of Thanos, use the .spec.smm.prometheus.thanos.query.resources of the ControlPlane custom resource as detailed in SMM scaling.
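For example, a patch following the same pattern as the other ControlPlane examples in this guide could look like the following sketch (the resource values are placeholders; size them based on your own measurements):

```yaml
# Sketch: set thanos-query resource requests/limits; values are placeholders.
# Apply with: kubectl patch controlplane --type=merge --patch "$(cat thanos-resources.yaml)" smm
spec:
  smm:
    prometheus:
      thanos:
        query:
          resources:
            requests:
              cpu: 500m
              memory: "512M"
            limits:
              cpu: "1"
              memory: "1024M"
```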

2.10.2 - SLO-based alerting in Production

Introduction

For a production system, the monitoring solution should be the most reliable component in the whole environment, both from a data persistence and an availability perspective. The monitoring solution must work with reliable data so that engineers can trust its notifications.

When using Service Mesh Manager to measure and monitor your systems' Service Level Objectives (SLOs), you need to configure the following items to have this kind of reliability.

  1. To achieve data consistency, the Prometheus deployment that collects the metrics and measures the SLOs should use persistent volumes. For details, see Set up Persistent Volumes for Prometheus.

  2. To achieve operational reliability, configure Service Mesh Manager to use Prometheus in a highly available (HA) mode. For details, see Set up High Availability for the Monitoring Stack.

  3. Configure Service Mesh Manager to use an Alertmanager deployment.

    • If you already have an Alertmanager deployment, you can configure Service Mesh Manager to use it. For details on setting up the connection to your existing Alertmanager, see Use an existing Alertmanager.
    • If you don’t have an existing Alertmanager deployment, or you want to use a separate deployment, you can still use the prometheus-operator built into Service Mesh Manager. For details, see Deploy a new Alertmanager.

2.10.2.1 - Set up Persistent Volumes for Prometheus

Each Prometheus instance needs a persistent volume (PV) to store the recorded metrics.

To set up such persistent volumes, modify the Service Mesh Manager control plane custom resource to allow for acquiring such volumes. Complete the following steps.

  1. Find which storage classes your Kubernetes cluster supports. Log in to your cluster, then run the following command:

    kubectl get storageclass
    NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
    gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  7h17m
    

    Select the appropriate storage class from the output list.

    The previous example shows the default AWS settings for EKS or PKE clusters; in this case, it is safe to use gp2 volumes.

  2. Set up the Persistent Volume.

    1. To enable Persistent Volumes for Prometheus, first customize this YAML patch file:

      
      
    2. Save this patch file as persistent-storage.patch.yaml.

    3. Enter the Storage Class name obtained in the previous step into the storageClassName property, and set the storage space required by updating the value of spec.smm.prometheus.storage.spec.resources.requests.storage.
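      Putting the pieces together, persistent-storage.patch.yaml could look like the following sketch (gp2 and 50Gi are placeholder values, and the position of the storageClassName property next to resources is an assumption; adjust both to your cluster):

```yaml
# Sketch of persistent-storage.patch.yaml; storage class and size are placeholders.
spec:
  smm:
    prometheus:
      storage:
        spec:
          storageClassName: gp2
          resources:
            requests:
              storage: 50Gi
```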

  3. Service Mesh Manager is controlled by a ControlPlane custom resource found in the Service Mesh Manager’s namespace (default: smm-system) named smm. To update the Custom Resource with the patch, run the following command.

    kubectl patch controlplane --type=merge --patch "$(cat persistent-storage.patch.yaml)" smm
    controlplane.smm.cisco.com/smm patched
    
  4. If you are using Service Mesh Manager in operator mode, skip this step.

    Otherwise, execute a reconciliation so Service Mesh Manager updates your Kubernetes cluster to the desired state described by the ControlPlane Custom Resource. Run the following command:

    smm operator reconcile
    
  5. Verify that Persistent Volumes exist. To verify that Prometheus has the physical volumes attached, run the following command:

    kubectl get pv -n smm-system
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                                                   STORAGECLASS   REASON   AGE
    pvc-79cc1a86-bcfa-46a9-a0e7-24e38d001d29   50Gi       RWO            Delete           Bound    smm-system/prometheus-smm-prometheus-db-prometheus-smm-prometheus-0   gp2                     7m10s
    

    The list shows that a Physical Volume (named smm-system/prometheus-smm-prometheus-db-prometheus-smm-prometheus-0 in the example) has been created and attached (Bound status).

Next steps

If you haven’t done so yet, Set up High Availability for the Monitoring Stack.

2.10.2.2 - Set up High Availability for the Monitoring Stack

To ensure that Prometheus is always available, enable the High Availability mode of Service Mesh Manager. That way the monitoring stack uses multiple Prometheus instances and Thanos Query to ensure metric consistency and high availability. Complete the following steps.

  1. Enable high availability. Service Mesh Manager is controlled by a ControlPlane custom resource found in the Service Mesh Manager’s namespace (default: smm-system) named smm. The following commands change the spec.smm.highAvailability.enabled value to true, indicating that you want to have a Highly Available deployment.

    Run the following commands:

    cat > enable-ha.yaml <<EOF
    spec:
      smm:
        highAvailability:
          enabled: true
    EOF
    
    kubectl patch controlplane --type=merge --patch "$(cat enable-ha.yaml)" smm
    
  2. If you are using Service Mesh Manager in operator mode, skip this step.

    Otherwise, execute a reconciliation so Service Mesh Manager updates your Kubernetes cluster to the desired state described by the ControlPlane Custom Resource. Run the following command:

    smm operator reconcile
    
  3. Verify that High Availability is enabled. To verify that Prometheus is running in Highly Available mode, check the pods running in the smm-system namespace for multiple Prometheus instances. Run the following command.

    kubectl get pods -l app=prometheus -n smm-system
    NAME                                READY   STATUS    RESTARTS   AGE
    prometheus-smm-prometheus-0   5/5     Running   1          21m
    prometheus-smm-prometheus-1   5/5     Running   1          21m
    

    If two running Prometheus pods are present, the High Availability setup is complete.

Next steps

Configure Service Mesh Manager to use an existing or a new Alertmanager deployment.

2.10.2.3 - Deploy a new Alertmanager

SLO-based alerting requires you to configure Service Mesh Manager to use an Alertmanager deployment. This procedure describes how to deploy a new Alertmanager for Service Mesh Manager.

Note: If you want to use an existing Alertmanager deployment instead, see Use an existing Alertmanager.

Prerequisites

This article provides an example configuration suitable for Slack alerts. To configure this notification, a Slack account and a Slack Incoming Webhook are required.

To configure additional notification targets, see the Prometheus Alertmanager documentation.

Steps

  1. Set up Alertmanager. You can find a basic alerting setup in this example. Download the file and replace the %SLACK_API_URL% string with your Slack Incoming Webhook URL.

  2. Apply the resulting file by running this command:

    kubectl apply -f alertmanager.yaml -n smm-system
    
  3. Verify that Alertmanager is running. Run the following command:

    kubectl get pods -n smm-system -l app=alertmanager
    NAME                                    READY   STATUS    RESTARTS   AGE
    alertmanager-smm-alertmanager-0   3/3     Running   0          62s
    alertmanager-smm-alertmanager-1   3/3     Running   0          62s
    

    You should see two instances of Alertmanager in Running state.

  4. Configure Prometheus to use this Alertmanager. Service Mesh Manager is controlled by a ControlPlane custom resource found in the Service Mesh Manager’s namespace (default: smm-system) named smm.

    The following command changes the spec.smm.prometheus.alertmanager value to connect to the Alertmanagers started by the YAMLs specified in the previous steps. Run the following commands:

    cat > enable-new-alertmanager.yaml <<EOF
    spec:
      smm:
        prometheus:
          enabled: true
          alertmanager:
            static:
              - host: alertmanager-smm-alertmanager-0.alertmanager-operated.smm-system.svc.cluster.local
                port: 9093
              - host: alertmanager-smm-alertmanager-1.alertmanager-operated.smm-system.svc.cluster.local
                port: 9093
    EOF
    
    kubectl patch controlplane --type=merge --patch "$(cat enable-new-alertmanager.yaml)" smm
    
  5. If you are using Service Mesh Manager in operator mode, skip this step.

    Otherwise, execute a reconciliation so Service Mesh Manager updates your Kubernetes cluster to the desired state described by the ControlPlane Custom Resource. Run the following command:

    smm operator reconcile
    

Example Alertmanager configuration

The following is a basic alerting setup for Alertmanager. Download the file and replace the %SLACK_API_URL% string with your Slack Incoming Webhook URL.
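A minimal sketch of the kind of resources such a file contains follows; the resource names and replica count are assumptions inferred from the Pod names shown earlier, and the secret naming follows the prometheus-operator convention of alertmanager-&lt;name&gt;:

```yaml
# Sketch: a prometheus-operator Alertmanager resource plus its configuration
# secret. Names and replica count are assumptions; replace %SLACK_API_URL%
# with your Slack Incoming Webhook URL.
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: smm-alertmanager
  namespace: smm-system
spec:
  replicas: 2
---
apiVersion: v1
kind: Secret
metadata:
  # prometheus-operator expects the secret to be named alertmanager-<name>
  name: alertmanager-smm-alertmanager
  namespace: smm-system
stringData:
  alertmanager.yaml: |
    route:
      receiver: slack-notifications
      group_by: [service, severity]
    receivers:
      - name: slack-notifications
        slack_configs:
          - api_url: '%SLACK_API_URL%'
            channel: '#alerts'
```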


2.10.2.4 - Use an existing Alertmanager

SLO-based alerting requires you to configure Service Mesh Manager to use an Alertmanager deployment. This procedure describes how to configure Service Mesh Manager to use an existing Alertmanager deployment.

Note: If you don’t have an existing Alertmanager deployment, or you want to use a separate deployment, see Deploy a new Alertmanager.

Prerequisites

Prometheus Alertmanager installed and configured.

Note: We recommend configuring alert grouping in the following way:

   route:
     receiver: 'slack-notifications'
     group_by: [ service, severity ]

That way you can route notifications based on severity, using the severity label of Service Mesh Manager-generated alerts.

Steps

To configure Service Mesh Manager to use an existing Alertmanager deployment, complete the following steps.

  1. Download the following configuration snippet as enable-alertmanager.yaml.
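     Based on the placeholders described in the next step and the alertmanager schema used elsewhere in this guide, the snippet has the following shape (a sketch; the host and port values are the placeholders you replace):

```yaml
# Sketch of enable-alertmanager.yaml; replace the placeholder hosts and ports.
spec:
  smm:
    prometheus:
      alertmanager:
        static:
          - host: your-alert-manager-1-host
            port: your-alert-manager-1-port
          - host: your-alert-manager-2-host
            port: your-alert-manager-2-port
```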

    
    
  2. Replace the your-alert-manager-X-host part with your Alertmanager’s fully qualified domain name, and the your-alert-manager-X-port with the port Alertmanager is listening on.

  3. Service Mesh Manager is controlled by a ControlPlane custom resource found in the Service Mesh Manager’s namespace (default: smm-system) named smm.

    The following command changes the spec.smm.prometheus.alertmanager value to connect to the existing Alertmanagers. Run the following command:

    kubectl patch controlplane --type=merge --patch "$(cat enable-alertmanager.yaml)" smm
    
  4. If you are using Service Mesh Manager in operator mode, skip this step.

    Otherwise, execute a reconciliation so Service Mesh Manager updates your Kubernetes cluster to the desired state described by the ControlPlane Custom Resource. Run the following command:

    smm operator reconcile
    
  5. In case your Alertmanagers are not part of your service mesh setup, create destination rules in the smm-system namespace to allow communication with your Alertmanagers.

2.10.3 - Global DNS

Istio doesn’t provide a global DNS service. There are many different schemes for supporting DNS with multiple clusters; the latest is to enable the DNS-proxy feature in Istio. DNS-proxy intercepts DNS requests for the services in the mesh (that is, for every service with a sidecar), resolves the request if it targets another service in the mesh, and uses the normal pod DNS scheme otherwise.

By default, multi-cluster deployments of Service Mesh Manager don’t do anything special for DNS, and do not enable DNS-proxy. Instead, Service Mesh Manager relies on the DNS setup of the pod and the cluster.

This means that to use a service that is part of the service mesh but resides in another cluster, you have to add a Kubernetes Service (with no endpoints) to the local cluster. Note that you must use a Service; do not use a ServiceEntry (which would require global DNS).
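For example, assuming a remote service called catalog listening on port 8080 in the smm-demo namespace (hypothetical names), a selector-less Service like the following on the local cluster creates the DNS name without local endpoints:

```yaml
# Sketch: a Service without a selector, so Kubernetes creates the DNS name
# but no local Endpoints; the mesh routes the traffic to the remote cluster.
apiVersion: v1
kind: Service
metadata:
  name: catalog
  namespace: smm-demo
spec:
  ports:
    - name: http
      port: 8080
      protocol: TCP
```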

2.10.4 - GitOps

GitOps is a way of implementing Continuous Deployment for cloud native applications. Based on Git and Continuous Deployment tools, GitOps provides a declarative way to store the desired state of your infrastructure and automated processes to realize the desired state in your production environment.

For example, to deploy a new application you update the repository, and the automated processes perform the actual deployment steps.

When used in operator mode, Service Mesh Manager works flawlessly with GitOps solutions such as Argo CD, and can be used to declaratively manage your service mesh. For a detailed tutorial on setting up Argo CD with Service Mesh Manager, see Install SMM - GitOps - single cluster.

2.11 - Security

2.11.1 - Zero Trust Security

Zero Trust Security with Service Mesh Manager

Service Mesh Manager brings zero trust security to modern apps. The zero trust security model is based on the following points:

  • don’t trust any system, not even the ones within your network,
  • verify the identity, authorization, and tooling of the user or service before establishing trust,
  • grant only the minimal access needed for the task or functionality.

Service Mesh Manager gives you the following security controls for your service mesh to achieve zero trust security:

Control ingress and egress traffic at the edge

To protect your applications and your sensitive data, you need to have a control over the incoming traffic from external sources. Service Mesh Manager allows you to:

  • create and manage ingress and egress gateways,
  • apply rate-limiting to mitigate denial-of-service attacks,
  • encrypt and mutually authenticate traffic on the gateway,
  • permit egress traffic only to specific endpoints to prevent data exfiltration.

Authenticate, authorize, and encrypt all connections

To make zero trust security work, every connection should be authenticated, authorized, and encrypted. Service Mesh Manager uses mTLS encryption within the mesh to protect data-in-motion on all service-to-service connections, and can do the same for all ingress and egress traffic. If you need to create a FIPS-compliant service mesh, you can use the FIPS-compliant builds of Service Mesh Manager and Istio that use only FIPS 140-2 compliant cipher suites.

For fine-grained authorization control, you can use Istio Authorization Policies to set access control rules on workloads in the mesh. You can also restrict outbound traffic for workloads and namespaces, manually or automatically.
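For example, an Istio AuthorizationPolicy similar to the following sketch (the workload and service-account names are hypothetical) allows only the frontend service account to call the catalog workload:

```yaml
# Sketch: allow only the frontend service account to reach the catalog workload.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: catalog-allow-frontend
  namespace: smm-demo
spec:
  selector:
    matchLabels:
      app: catalog
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/smm-demo/sa/frontend"]
```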

Health monitoring, logging, and tracing

For maintenance and security monitoring, Service Mesh Manager allows you to configure health monitoring and alerts for your services and workloads to quickly and easily notice issues that might cause a service outage. In addition, Service Mesh Manager gives you quick access to pod logs, traces, and Prometheus metrics.

2.11.2 - Authentication

Service Mesh Manager leverages the Kubeconfig, the official client libraries and the Kubernetes API to perform authentication and authorization for its users.

If you’re allowed to add, edit or delete specific Istio custom resources, you’ll have the same permissions from Service Mesh Manager as well.

Authentication overview

The authentication flow consists of the following steps:

  • The CLI extracts authentication credentials from the user’s Kubeconfig the same way kubectl would do
  • The CLI sends these credentials (client certificate or bearer token) to Service Mesh Manager during the login process
  • Service Mesh Manager validates these credentials against the Kubernetes API Server (Service Mesh Manager doesn’t store these credentials afterwards)
  • Once the credentials prove to be valid, Service Mesh Manager generates its own ID token (JWT) and encodes the relevant user information in it
  • The user - in possession of the ID token - can then use the token to authenticate against Service Mesh Manager until it expires
  • Service Mesh Manager will send subsequent requests to the API server with impersonation headers set to the user’s name and groups to delegate Authorization entirely to Kubernetes

Test authentication

Dashboard access

smm dashboard

When you open the dashboard through the recommended way of typing smm dashboard, you’re seamlessly authenticated with your Kubeconfig, logged in automatically and redirected to a browser tab with the Service Mesh Manager Dashboard open.

Login

smm login

You can explicitly log in any time using the smm login command, which gives you a short-lived (10s), encrypted token to use in the UI login window.

Troubleshooting

The ID token is saved to the current context’s config and reused for subsequent CLI commands for efficiency. You can check or edit this config any time using the smm config get and smm config edit commands, respectively.

Once the token expires (10h), the CLI automatically performs a new login with the next command.

If the token seems to be invalid for any reason you can always reauthenticate with the smm login command.

Anonymous mode

Service Mesh Manager provides a way to disable user authentication and use its own service account token for all communication with the Kubernetes API server.

Use the --anonymous-auth flag of the install command to disable authentication.

smm install --anonymous-auth

2.11.3 - Mutual TLS

TLS authentication overview

Istio offers mutual TLS as a solution for service-to-service authentication.

Note: For FIPS-compliant TLS settings, see FIPS-compliant service mesh.

Istio uses the sidecar pattern, meaning that each application container has a sidecar Envoy proxy container running beside it in the same pod.

  1. When a service receives or sends network traffic, the traffic always goes through the Envoy proxies first.
  2. When mTLS is enabled between two services, the client side and server side Envoy proxies verify each other’s identities before sending requests.
  3. If the verification is successful, then the client-side proxy encrypts the traffic, and sends it to the server-side proxy.
  4. The server-side proxy decrypts the traffic and forwards it locally to the actual destination service.


In Service Mesh Manager, you can manage the mTLS settings:

Change service-specific mTLS settings using the UI

To configure service-specific mTLS settings using the UI, complete the following steps. You can change the mTLS settings of a namespace or the entire service mesh from the command line.

  1. Select the service on the MENU > TOPOLOGY or MENU > SERVICES page.

  2. Select MTLS POLICIES. You can configure the MTLS policy you want to use independently for each port of the service. The following policies are available:

    • STRICT: The service can accept only mutual TLS traffic.
    • PERMISSIVE: The service can accept both plaintext/unencrypted traffic and mutual TLS traffic at the same time.
    • DISABLED: The service can accept plaintext/unencrypted traffic only.
    • DEFAULT: Use the global MTLS policy.


  3. Select APPLY CHANGES.

  4. When load is sent to the service, you can verify whether the traffic between your services is actually encrypted on the MENU > TOPOLOGY page by selecting EDGE LABELS > security.

    Either red open locks or green closed ones are displayed between the services in the UI, indicating non-encrypted or encrypted traffic between the services.

Change mTLS settings using PeerAuthentication

You can change the mTLS settings of a workload, namespace, or the entire service mesh from the command line using the PeerAuthentication custom resource. For example, the following CR disables mTLS for the “catalog” service.

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: catalog
  namespace: smm-demo
spec:
  mtls:
    mode: DISABLE

For other examples, see the PeerAuthentication documentation.
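Scope follows the namespace of the resource: a PeerAuthentication in the Istio root namespace applies mesh-wide. The standard Istio pattern for enforcing strict mTLS across the whole mesh looks like this (assuming istio-system is your root namespace, as in the installations shown earlier):

```yaml
# Sketch: mesh-wide strict mTLS via a PeerAuthentication in the root namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```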

2.11.4 - FIPS-compliant service mesh

Note: The default version of Service Mesh Manager is built with the standard SSL libraries. To use a FIPS-compliant version of Istio, see Install FIPS images.

FIPS compliance overview

The Istio distribution of Service Mesh Manager provides compliance with the rules for cryptographic modules of FIPS 140-2 Security Level 1. To achieve this, Service Mesh Manager provides the following measures:

  • Service Mesh Manager is built using a FIPS-compliant library (BoringCrypto).
  • Envoy is built with the same FIPS-compliant library (BoringCrypto).
  • Service Mesh Manager delivers a custom Istio build, using the same FIPS-compliant library (BoringCrypto).
  • For certificate management, Service Mesh Manager uses a version of cert-manager built with the same FIPS-compliant library (BoringCrypto).
  • Service Mesh Manager is tested with FIPS 140-2 compliant cipher suites (and rejects anything else).
  • Although FIPS 140-2 allows other ciphers, in Service Mesh Manager only GCM ciphers are enabled, because only those can prevent the SSL LUCKY13 timing attack.
  • BoringSSL is a fork of OpenSSL that is designed to meet Google’s needs. BoringSSL as a whole is not FIPS validated. However, there is a core library (called BoringCrypto) that has been FIPS validated.

FIPS 140-2 compliant Service Mesh Manager TLS settings

Allowed TLS versions

  • TLS v1.2
  • TLS v1.3

Although FIPS 140-2 would allow lower TLS versions under some circumstances, we disabled them for security reasons. TLS 1.0 and 1.1 are out-of-date protocols that do not support modern cryptographic algorithms, and they contain security vulnerabilities that can be exploited by attackers. The IETF has also formally deprecated both protocols in RFC 8996. In addition, the vast majority of encrypted Internet traffic now uses TLS 1.2, which was introduced over a decade ago.

Allowed FIPS compatible ciphers

  • ECDHE-RSA-AES128-GCM-SHA256
  • ECDHE-RSA-AES256-GCM-SHA384
  • ECDHE-ECDSA-AES128-GCM-SHA256
  • ECDHE-ECDSA-AES256-GCM-SHA384
  • AES128-GCM-SHA256
  • AES256-GCM-SHA384

FIPS 140-2 allows more ciphers than the ones listed above. We only enable GCM ciphers, because only those ciphers can prevent a LUCKY13 timing attack.
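
As an illustration of how these constraints map to Istio configuration, the following hypothetical Gateway restricts an ingress listener to TLS 1.2 or later and to the GCM cipher suites listed above (the Gateway name, host name, and credential name are placeholders):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: fips-ingress
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - "example.com"                      # placeholder host
    tls:
      mode: SIMPLE
      credentialName: example-com-cert   # placeholder TLS secret
      minProtocolVersion: TLSV1_2
      cipherSuites:                      # applies to TLS 1.2; TLS 1.3 suites are not configurable
      - ECDHE-RSA-AES128-GCM-SHA256
      - ECDHE-RSA-AES256-GCM-SHA384
      - ECDHE-ECDSA-AES128-GCM-SHA256
      - ECDHE-ECDSA-AES256-GCM-SHA384
```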

Allowed Elliptic-curve algorithm

  • P-256

Installation

To install a FIPS-compliant version of Istio, see Install FIPS images.

2.11.5 - Open Port Inventory

A Service Mesh Manager installation requires the following open service ports. Opening only the required ports helps keep the deployment's attack surface as small as possible. Each service is described in YAML format, listing all of its ports and how the service uses them. This helps you understand the risks associated with each open port.

Every service is described in a YAML file using the following format:

namespace: smm-system
name: mesh-manager
description: is used by our mesh-manager instance which manages istio operators on a kubernetes cluster
ports:
  - name: https
    number: 443
    use: handle https traffic and queries with tls/ssl

Useful commands

The following commands help you examine the services of your Service Mesh Manager deployment.

List services under smm-system namespace:

kubectl get services -n smm-system

Inspect a particular service, for example, smm-leo:

kubectl describe service smm-leo -n smm-system
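
To get a quick inventory of every open port in the namespace, you can flatten the service list with a jsonpath template (a sketch; adjust the namespace as needed):

```
kubectl get services -n smm-system -o jsonpath='{range .items[*]}{.metadata.name}{": "}{range .spec.ports[*]}{.name}{"("}{.port}{") "}{end}{"\n"}{end}'
```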

Services (namespace-scoped)

cert-manager

The cert-manager namespace contains the following services.

cert-manager

The cert-manager instance of Service Mesh Manager provides Kubernetes certificate management as part of the controller-component. The service uses these ports:

  • tcp-prometheus-servicemonitor (9402): The default Prometheus metrics port of the cert-manager service.

cert-manager-webhook

The cert-manager instance of Service Mesh Manager uses the service as part of the webhook-component. The service uses these ports:

  • https (443): Incoming HTTPS traffic for the webhooks.

istio-system

The istio-system namespace contains the following services.

istio-meshexpansion-cp-v115x

Service Mesh Manager provides multicluster expansion capabilities using this service, which is used for the Istio 1.15 mesh-gateway expansion. The service uses these ports:

  • tcp-status-port (15021): Health check (readiness probe) port for the Mesh Overview functionality and for troubleshooting Istio-related issues.

  • tls-istiod (15012): Handles incoming gRPC traffic for accessing istiod from passive clusters.

  • tls-istiodwebhook (15017): Handles incoming gRPC traffic for the istiod webhooks.

  • tls (15443): Handles incoming TLS traffic from other clusters.

  • tcp-smm-als-tls (50600): Handles incoming TLS traffic to access log services.

  • tcp-smm-zipkin-tls (59411): Default Zipkin port that handles incoming HTTPS traffic for distributed tracing services.

istio-meshexpansion-cp-v115x-external

Service Mesh Manager provides multicluster expansion capabilities using this external service, which is used for the Istio 1.15 mesh-gateway expansion. The service uses these ports:

  • tcp-status-port (15021): Health check (readiness probe) port for the Mesh Overview functionality and for troubleshooting Istio-related issues.

  • tls-istiod (15012): Handles incoming gRPC traffic for accessing istiod from passive clusters.

  • tls-istiodwebhook (15017): Handles incoming gRPC traffic for the istiod webhooks.

  • tls (15443): Handles incoming TLS traffic from other clusters.

  • tcp-smm-als-tls (50600): Handles incoming TLS traffic to access log services.

  • tcp-smm-zipkin-tls (59411): Default Zipkin port that handles incoming HTTPS traffic for distributed tracing services.

istiod-cp-v115x

Used by the Istio 1.15 control plane. The service uses these ports:

  • grpc-xds (15010): Handles gRPC traffic for xds transport protocol, which is used for Envoy discovery services and Istio proxies.

  • https-dns (15012): Handles DNS requests (with TLS) for the Istio service mesh.

  • https-webhook (443): Handles incoming HTTPS traffic (with TLS) for Istio webhook management.

  • http-monitoring (15014): Handles HTTP requests or queries for monitoring of the traffic management between microservices.

smm-system

The smm-system namespace contains the following services.

istio-operator-v113x

Istio-operator version 1.13, which is the controller of Istio resources, uses this service to manage Istio 1.13 deployments. The service uses these ports:

  • https (443): Handles the incoming HTTPS traffic for the Istio webhook.

istio-operator-v113x-authproxy

RBAC-authenticated endpoints use this service for istio-operator version 1.13. The service uses these ports:

  • https (8443): Authenticated (RBAC or TLS) endpoint for providing access to Prometheus metrics.

istio-operator-v115x

Istio-operator version 1.15, which is the controller of Istio resources, uses this service to manage Istio 1.15 deployments. The service uses these ports:

  • https (443): Handles the incoming HTTPS traffic for the Istio webhook.

istio-operator-v115x-authproxy

RBAC-authenticated endpoints use this service for istio-operator version 1.15. The service uses these ports:

  • https (8443): Authenticated (RBAC or TLS) endpoint for providing access to Prometheus metrics.

mesh-manager

The mesh-manager instance of Service Mesh Manager that manages Istio operators on a Kubernetes cluster uses this service. The service uses these ports:

  • https (443): Handles HTTPS traffic and queries with TLS/SSL.

mesh-manager-authproxy

The mesh-manager instance of Service Mesh Manager acts as an authentication proxy to the mesh-manager service. The service uses these ports:

  • https (8443): Authenticated (RBAC or TLS) endpoint for providing access to Prometheus metrics.

prometheus-node-exporter

The Prometheus instances use this service as an exporter for Kubernetes nodes. The service uses these ports:

  • metrics (19101): Exposes node-level metrics to Prometheus.

prometheus-operated

Used by the Prometheus instances of Service Mesh Manager. The service uses these ports:

  • http (9090): Handles normal HTTP traffic and Prometheus queries.

  • grpc (10901): Default port to handle incoming gRPC traffic.

smm

The Service Mesh Manager uses this service as part of the application-component. The service uses these ports:

  • http (80): Handles GraphQL API traffic and queries.

  • http-metrics (10000): Exports metrics to the Prometheus service.

smm-als

The Service Mesh Manager uses the smm-als service as part of the als-component. The service uses these ports:

  • grpc-als (50600): The container port of the grpc-als container, used for accessing log services.

smm-authentication

The smm-authentication instance verifies that a user has a valid token or certificate to make API calls to the backend service. The service uses these ports:

  • http (80): Handles HTTP traffic and GraphQL queries.

smm-expansion-gw

This is the external service of the mesh expansion gateway of the local cluster. It is synced with the cluster registry so that Service Mesh Manager remains reachable from peer clusters, even when multiple active Istio control planes are present.

smm-federation-gateway

The smm-federation-gateway instance provides federation to GraphQL services via this service. The service uses these ports:

  • http (80): Handles HTTP traffic and GraphQL queries.

smm-grafana

The Service Mesh Manager uses the Grafana dashboard monitoring service as part of the grafana-component. The service uses these ports:

  • http (3000): Exposes GraphQL web interface and API endpoints over HTTP.

smm-health

The smm-health instance, which includes the health controller and exporter, uses this service as part of the health-component. The service uses these ports:

  • http-metrics (8080): Exports metrics to the Prometheus service.

smm-health-api

The smm-health-api instance uses the service for the GraphQL API. The service uses these ports:

  • http-graphql (80): Handles HTTP traffic and GraphQL queries.

smm-ingressgateway

The service is part of the ingressgateway-component, which acts as a load balancer receiving incoming HTTP or TCP connections. The service uses these ports:

  • http2 (80): Handles traffic to the Service Mesh Manager dashboard.

smm-jaeger-agent

Jaeger agents can send traces to the Jaeger collector. The service is part of the tracing-component. The service uses these ports:

  • udp-agent-zipkin (5775): Default UDP port for zipkin-thrift tracing services.

  • udp-agent-compact (6831): Default UDP port for the Jaeger agent endpoint.

  • udp-agent-binary (6832): Default UDP port for the Jaeger agent binary protocol.

smm-jaeger-collector

The Jaeger collector receives traces from Jaeger agents via this service. It is part of the tracing-component. The service uses these ports:

  • tcp-smm-jaeger-collector-tchannel (14267): Default TCP port of tchannel for Jaeger collector.

  • http-jaeger-collector (14268): Default port of Jaeger collector to handle HTTP traffic.

smm-jaeger-query

Service Mesh Manager uses this service to access Jaeger. It is part of the tracing-component. The service uses these ports:

  • http-query (16686): Handles normal HTTP traffic and queries for the tracing services.

smm-kubelet-node-discovery

kubelet uses this service to export metrics. The service uses these ports:

  • https-metrics (10250): kubelet-service default port, used for exporting metrics to Prometheus services over TLS/SSL.

  • http-metrics (10255): kubelet-service default port, used for exporting metrics to Prometheus services without encryption.

  • cadvisor (4194): Port of container advisor that exposes Prometheus out of the box.

smm-kubestatemetrics

Part of the kubestatemetrics-component. The service uses these ports:

  • http-monitoring (42422): Monitoring port for the kube-state-metrics application (HTTP).

  • http-telemetry (15014): Telemetry port for the kube-state-metrics application (HTTP).

smm-leo

Makes cert-manager Istio-aware. It is part of the leo-component. The service uses these ports:

  • http-metrics (8080): Export metrics to Prometheus services.

smm-prometheus

Used for event monitoring and alerting as part of the prometheus-component. The service uses these ports:

  • http (59090): Default port of the Prometheus service for handling HTTP traffic and queries.

smm-prometheus-operator

The smm-prometheus-operator instance which is the controller of the Prometheus application uses this service. The service uses these ports:

  • http (8080): Incoming webhook traffic and Prometheus exporter for operator metrics.

smm-sre-alert-exporter

Used by the smm-sre-alert-exporter instance. The service uses these ports:

  • http-metrics (8080): Used for exporting Prometheus alerting status as metrics.

smm-sre-api

The smm-sre-api instance uses this service for the GraphQL API. The service uses these ports:

  • http-graphql (80): Handles HTTP traffic and GraphQL queries.

smm-sre-controller

Used by the smm-sre instance, as part of the sre-controller-component. The service uses these ports:

  • http-metrics (8080): Exports metrics to the Prometheus service.

smm-tracing

The Service Mesh Manager uses this service as part of the tracing-component. The service uses these ports:

  • http-query (80): Handles the Jaeger user interface and HTTP API traffic.

smm-vm-integration

The vm-integration instance uses the service for the GraphQL API. The service uses these ports:

  • http (8080): Handles GraphQL API traffic and queries.

  • http-metrics (8081): HTTP metrics from the controller.

  • http-istiostate (8082): Metrics mapping Istio resources to each other.

smm-web

The Service Mesh Manager uses this service as part of the web-component. The service uses these ports:

  • http (80): Serves static web content used by the Service Mesh Manager dashboard.

  • http-downloads (81): Serves the functionality for downloading the SMM binary file; used by the Service Mesh Manager dashboard.

smm-zipkin

The Service Mesh Manager uses this service as part of the tracing-component. The service uses these ports:

  • http (59411): Default Zipkin port for handling HTTP traffic and distributed tracing mechanisms.

2.11.6 - Recovery procedure

This document details how to change and revoke all keys related to a Service Mesh Manager installation when the cluster where it is running is compromised. We’ll assume that an attacker could have had access to various Service Mesh Manager related service account tokens or keys.

Disclaimer: This is not a full guide to Kubernetes security recovery; it only covers the parts related to Service Mesh Manager.

Service Account Tokens

Service account tokens provide access to specific actions of the Kubernetes API. Service Mesh Manager uses multiple service account tokens to access the Kubernetes API server. If you suspect that some or all of these are compromised and an attacker could have gained access, delete these tokens to revoke them, then restart the pods in the Service Mesh Manager related namespaces. Complete the following steps:

  1. Delete all secrets holding service account tokens.

    for namespace in smm-system smm-demo cert-manager cluster-registry istio-system smm-canary smm-registry-access
    do
      kubectl delete secret -n $namespace $(kubectl get secret -n $namespace --field-selector type=kubernetes.io/service-account-token -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')
    done
    
  2. Restart all pods.

    for namespace in smm-system smm-demo cert-manager cluster-registry istio-system smm-canary smm-registry-access
    do
      kubectl delete pods -n $namespace --all
    done
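
If your workloads are managed by Deployments, StatefulSets, or DaemonSets, a rolling restart is a gentler alternative to deleting the pods outright, as it avoids downtime for multi-replica workloads (a sketch, using the same namespace list):

```
for namespace in smm-system smm-demo cert-manager cluster-registry istio-system smm-canary smm-registry-access
do
  kubectl rollout restart deployment -n $namespace
  kubectl rollout restart statefulset -n $namespace
  kubectl rollout restart daemonset -n $namespace
done
```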
    

Istio root CAs

Istio root CAs are used to sign certificates provided to workloads in the service mesh. If an attacker gets access to your CA keys, they may be able to set up a man-in-the-middle attack and eavesdrop encrypted network traffic.

Using the default, generated CA

If you are using the default, generated CA, delete the secret that holds the generated root CA and restart all pods in the istio-system namespace.

kubectl delete secret -n istio-system istio-ca-secret
kubectl delete pods -n istio-system --all

After Istiod has restarted and the CA is regenerated, restart all pods that have a sidecar in the system.
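
One way to restart every injected workload, assuming sidecar injection is enabled through the istio-injection namespace label, is a rolling restart per labeled namespace (a sketch; adjust the selector if you use revision labels such as istio.io/rev instead):

```
for ns in $(kubectl get namespaces -l istio-injection=enabled -o jsonpath='{.items[*].metadata.name}')
do
  kubectl rollout restart deployment -n $ns
done
```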

Using your own certificates

If you are using your own certificates with Istio, revoke them, create new ones, and configure the new ones in the cluster.

This guide doesn't cover exactly how to revoke the certificates or create new ones. That depends on how the original certificates were created, what their signing root CA is, and where else they are used. You'll probably need to add them to the Certificate Revocation List (CRL) of the issuing Certificate Authority, and follow that authority's guidelines to create new ones.

Once you’ve revoked the CA and generated a new one, set the new CA in Service Mesh Manager.

  1. If you’re storing the CAs in secrets on the cluster, update the contents of the configured secret specified in spec.citadel.caSecretName. To properly set up the secret with the certificates, follow the Istio documentation.
  2. If you’ve configured Vault using istiod.ca.vault in the IstioControlPlane CR, update the CAs in Vault.

Once your CA is changed, restart all pods in the istio-system namespace:

kubectl delete pods -n istio-system --all

After Istiod is restarted, restart all pods with a sidecar in the system.

JWT backend key

A JWT backend key is generated during Service Mesh Manager installation to sign and validate JWT tokens for the GraphQL API authentication service. If a malicious attacker acquires this key, they could use it to forge a JWT token and gain unauthorized access to the SMM API using that key to retrieve or even change Kubernetes data.

The JWT backend key is stored in the smm-authentication Kubernetes secret in the smm-system namespace. If you suspect that this key is compromised, delete the Kubernetes secret:

kubectl delete secret -n smm-system smm-authentication

To regenerate a new key:

  • If you’re using the Service Mesh Manager operator, it will automatically regenerate the key and the secret.
  • Otherwise, rerun the smm install command.
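
To confirm that the key has been regenerated (the operator may need a short time to reconcile), check that the secret exists again:

```
kubectl get secret smm-authentication -n smm-system
```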

IMPS

Service Mesh Manager can hold access key pairs in Kubernetes secrets to be able to access images in ECR (default) or other image registries. It's always a good idea to restrict the permissions of these keys to image access, but if a malicious attacker gets access to these secrets, revoke the key pairs and create new ones. To revoke the old keys and create new ones, follow the process of the image registry you're using.

For ECR, you should be able to revoke and recreate credentials on the AWS console, or by contacting your AWS administrator.

To check the credentials configured in the environment, list secrets in the smm-registry-access namespace. Credentials are base64 encoded within the secrets:

kubectl get secret -n smm-registry-access
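
To inspect the credentials stored in a particular secret, decode its data fields. The secret name (ecr-access) and key (accessKeyID) below are hypothetical; substitute the names returned by the previous command:

```
kubectl get secret ecr-access -n smm-registry-access -o jsonpath='{.data.accessKeyID}' | base64 --decode
```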

Once you have new image registry credentials, use the smm registry commands of the Service Mesh Manager CLI to configure them in the cluster.

  • smm registry list lists the currently configured image registry settings.
  • smm registry remove removes the configured registry entry.
  • smm registry add configures your image registry with new access credentials.
  • smm registry reconcile updates the remote cluster’s image pull secrets from the local settings.

2.11.7 - Fetch CA certificate from Vault

Fetching CA certificate overview

Istio uses X.509 certificates to create strong identities for every workload. These certificates are used to authenticate the service-to-service communication in the mesh. The Istio agent of the workload obtains these certificates from istiod and provides them to the Envoy proxy of the workload. istiod acts as a Certificate Authority (CA), and signs the workload certificates with the signing certificate of the CA.

How Istio provides certificates for the workloads

By default, the Certificate Authority has a self-signed root certificate and key, and uses them to sign the workload certificates. The private key of the CA is stored in Kubernetes secrets. This is not recommended for production environments. Instead, you should:

  • Manage the CA independently from Istio, in a solution that offers strong security protection.
  • Use a production-ready CA, such as Hashicorp Vault.

You can configure Service Mesh Manager to use Vault as a secret store and fetch the signing certificate of the Certificate Authority from a Vault instance. That way you can securely store the CA certificate and its private key outside Istio.

The following high-level procedure shows the main steps of configuring Service Mesh Manager to use Vault.

Note: Service Mesh Manager works with any Vault solution. For the following procedures and examples we used Bank-Vaults, which is an open-source solution to run and manage Vault in Kubernetes.

  1. Configure Vault so Istio (the istiod service account) can access the CA certificate.
  2. Configure Istio to use the CA from Vault. If you are using Bank-Vaults, then the following examples show how to configure Service Mesh Manager to inject the bank-vaults sidecar into Istio, so the istiod can access the CA certificate.
  3. (Optional) Test your setup to verify that it works properly.

Prerequisites

Before configuring Istio and Vault, make sure these prerequisites are met:

  • Make sure you have a Vault instance with CA certificate and private key ready to pull.

  • If you want to use Bank-Vaults to create and manage Vault, install Vault, the Vault operator, and the Vault Secrets Webhook using the Helm charts of the Bank-Vaults project. For details, see the Bank-Vaults documentation.

Configure Vault

Create a policy so that istiod can access the Vault secrets.

To give istiod access to Vault secrets, you must create a policy for authentication purposes. You are not limited to Bank-Vaults, any other Vault solution works as well, but the following example uses Bank-Vaults to show how to set the authentication policy.

Edit the Vault CR by running kubectl edit vault.

spec:
  caNamespaces:
  - istio-system
  externalConfig:
    auth:
    - roles:
      # Make sure the role name 'istiod' corresponds to the role name we set in Istio
      - bound_service_account_names:
        - istiod-service-account-cp-v115x
        bound_service_account_namespaces:
        - istio-system
        name: istiod
        policies:
        - allow_istiod_secrets
      - bound_service_account_names:
        - default
        - vault-secrets-webhook
        - vault
        bound_service_account_namespaces:
        - default
        name: default
        policies:
        - allow_secrets
        - allow_pki
        ttl: 1h
      type: kubernetes
    policies:
    # Policy created for istiod to read secrets under this specific path
    - name: allow_istiod_secrets
      rules: path "secret/data/pki/istiod" { capabilities = ["read", "list"] }
    - name: allow_secrets
      rules: path "secret/*" { capabilities = ["create", "read", "update", "delete",
        "list"] }
    - name: allow_pki
      rules: path "pki/*" { capabilities = ["create", "read", "update", "delete",
        "list"] }

Note: Remember to add the istio-system namespace under .spec.caNamespaces so that istiod has the required TLS certificate to access Vault.

Configure SMM

In order for the istiod pod to access the Vault instance, you need to set the values under .istiod.ca.vault.

  1. Collect the parameters you need to configure the Istio custom resource:

    • The address of your Vault instance (For example, https://vault.default:8200).
    • The path to the CA certificate in Vault. Istio will use this certificate to sign the workload certificates with. (For example, vault:secret/data/pki/istiod#certificate).
    • The path to the private key of the certificate in Vault. (For example, vault:secret/data/pki/istiod#privateKey).
    • The name of the role that Istio can use to access the CA certificate and its private key in Vault. (For example, istiod).
  2. Edit the Istio custom resource. You can use the following command (adjust the name of the control plane to your Istio version): kubectl edit istio cp-v115x -n istio-system

  3. Add the parameters of Vault under the .istiod.ca.vault key, for example:

      istiod:
        ca:
          vault:
            address: https://vault.default:8200
            certPath: vault:secret/data/pki/istiod#certificate
            enabled: true
            keyPath: vault:secret/data/pki/istiod#privateKey
            role: istiod
            vaultEnvImage: ghcr.io/banzaicloud/vault-env:1.13.1
    
  4. Set the .istiod.ca.vault.enabled key to true, so that the vault-env sidecar is injected into istiod.
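
You can also verify the injection from the command line. Assuming the istiod pods carry the app=istiod label, list the container names of the istiod pod and look for vault-env among them:

```
kubectl get pods -n istio-system -l app=istiod -o jsonpath='{.items[*].spec.containers[*].name}'
```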

Test your setup

If you are using Bank-Vaults and followed the steps on this page, you should see that the vault-env sidecar has been injected into the istiod pod which pulls the certificate from the Vault instance.

Sidecar container injected in istiod

If not, delete the istiod pod and wait for it to restart with the latest Istio configuration.

Additional Info

Add secrets in Vault

For testing purposes, you can use this step-by-step guide to put some secrets in the path defined in the Istio CR.

  1. Port-forward into the pod:
    kubectl port-forward vault-0 8200 &
    
  2. Set the address of the Vault instance:
    export VAULT_ADDR=https://127.0.0.1:8200
    
  3. Import the CA certificate and private key of the Vault instance by running the following commands (otherwise, you’ll get x509: certificate signed by unknown authority errors):
    kubectl get secret vault-tls -o jsonpath="{.data.ca\.crt}" | base64 --decode > $PWD/vault-ca.crt
    kubectl get secret vault-tls -o jsonpath="{.data.ca\.key}" | base64 --decode > $PWD/vault-ca.key
    export VAULT_CACERT=$PWD/vault-ca.crt
    
  4. To authenticate to Vault, you can access its root token by running:
    export VAULT_TOKEN=$(kubectl get secrets vault-unseal-keys -o jsonpath={.data.vault-root} | base64 --decode)
    
  5. Now you can interact with Vault and add a secret by running:
    vault kv put secret/pki/istiod privateKey=@vault-ca.key certificate=@vault-ca.crt 
    
  6. To verify the secret has been added to Vault:
    vault kv get secret/pki/istiod
    

2.12 - Frequently Asked Questions

2.12.1 - Calisti Service Mesh Manager

What is Service Mesh Manager?

Service Mesh Manager helps you to confidently scale your microservices over single- and multi-cluster environments and to make daily operational routines standardized and more efficient. The componentization and scaling of modern applications inevitably leads to a number of optimization and management issues:

  • How do you spot bottlenecks? Are all components functioning correctly?
  • How are connections between components secured?
  • How does one reliably upgrade service components?

Service Mesh Manager helps you accomplish these tasks and many others in a simple and scalable way, by leveraging the Istio service mesh and building many automations around it. Our tag-line for the product captures this succinctly:

Service Mesh Manager operationalizes the service mesh to bring deep observability, convenient management, and policy-based security to modern container-based applications.

What are the key features?

  • Service Mesh Manager not only handles the automated installation, operation and upgrade of service mesh infrastructure, but also provides a rich, high-level, multi-modal user experience that eliminates the complexity associated with service meshes.
  • High-level functionality, such as deep observability, Zero-Trust security, canary deployments, traffic routing, ingress / egress exposure, or fault injection can be conveniently managed and visualized through its user interface.
  • Service Mesh Manager’s automation engine reduces the risk inherent in the performance of complex tasks such as canary upgrades of microservice components, thereby cutting operational risk and cost.
  • The system provides a detailed real-time dashboard for debugging.

What does the Service Mesh Manager architecture look like?

Service Mesh Manager architecture

Why is Service Mesh Manager using Istio?

Istio is still the most feature complete and mature service mesh solution by far. It may have its shortcomings, especially around complexity, but it has a great community around it that continuously works towards making it better. We also aim to solve some of these problems with Service Mesh Manager. One of the main use cases of Service Mesh Manager is the ability to connect multiple clusters even across different networks, and Istio has several flexible topologies for different use cases to achieve this.

What is the Cisco Istio operator?

We developed the open source Cisco Istio operator to solve the first tier of problems related to the installation, management and upgrade of the Istio infrastructure components. The operator continuously reconciles the state of the Istio components to keep them healthy, and facilitates multi-cluster federation. We offer community and paid support for the Istio operator.

Should I use Service Mesh Manager or the Istio operator?

The Cisco Istio operator is an open-source component of the commercial Service Mesh Manager product. In addition to the Cisco Istio operator, Service Mesh Manager:

  • includes a battle-hardened Istio distribution,
  • installs and manages the observability infrastructure, including Prometheus, Grafana, Jaeger
  • provides a UI (Web UI, CLI, API) for developers and ops to easily observe and configure all the service mesh components
  • picks up user roles from native Kubernetes RBAC
  • provides UI-based automation to carry out complex management tasks such as canary upgrades, traffic routing, and so on.

All Service Mesh Manager features work in multi-cluster configurations as well, and a unified cross-cluster application view is provided.

How do I integrate Service Mesh Manager with my application?

After you’ve installed Service Mesh Manager, and want to put your application in the mesh, you need to inject a sidecar in the pods of your application. You can do that manually, or by enabling automatic injection for your namespaces, and restarting your pods. While in theory it’s usually that simple, we know that in practice an application can have some problems running a sidecar, and won’t behave the same anymore. We have a deep domain knowledge of Istio and have seen a lot of these problems. When integrating your application, we can help you overcome these issues.

What’s the overhead of Service Mesh Manager?

Most of the overhead of Service Mesh Manager is coming from Istio itself, and it’s there in two different layers.

  • First, it has some CPU and memory resource requirements. It needs to have a control plane running in a cluster that handles the discovery of services, injects sidecars to pods, pushes down configuration to them, and manages certificates for handling service-to-service security.
  • The sidecars themselves also consume some CPU and memory. If the mesh is configured properly, this overhead shouldn’t be significant.
  • The second layer of the overhead appears in network requests. Because all traffic flows through Envoy proxies, it means 2 additional hops for every request, and that adds some minimal latency. Other than for a few very latency-critical applications, this shouldn’t be significant, but see latency overheads for details.

Should I worry about latency overheads?

In general, no. There is some latency overhead added for every request because of the sidecar proxies, but if the mesh is configured properly it shouldn’t be more than a few milliseconds. Per Istio’s own measurements, with 16 concurrent connections and 1000 RPS, Istio adds 3ms over the baseline (P50) when a request travels through both a client and server proxy. At 64 concurrent connections, Istio adds 7ms over the baseline, with Mixer disabled. There could be some latency critical applications where it matters, but for most apps it won’t make a difference.

How does Service Mesh Manager keep my mesh healthy?

Service Mesh Manager provides a few handy features to keep a mesh healthy. The most important of these is the mesh validation feature. In addition to basic validation of the Istio configuration, Service Mesh Manager analyzes the whole mesh state and tries to find ambiguous or invalid configurations, for example, a label selector that points to an invalid service, or a shadowed or ambiguous routing configuration.

Service Mesh Manager also provides debugging features like tapping an Envoy proxy and analyzing requests. You can also keep track of real-time metrics on the dashboard and check if your latency or error rate values are increasing.

Is this a new abstraction layer over Istio?

No, we’ve designed Service Mesh Manager in a way that it doesn’t add a new abstraction layer. We thought that Istio is complicated enough in itself and it wouldn’t do any good introducing a few new CRDs. Service Mesh Manager can help you configure your mesh through a CLI or the dashboard, but those commands are always translated to plain old Istio CRs. Doing it this way enables Service Mesh Manager to be completely compatible with all Istio configuration changes. If you write Istio config directly, Service Mesh Manager will still be able to detect it, display it, and validate it properly.

Does Service Mesh Manager support GitOps?

Yes. Since there is no additional abstraction layer involved, Service Mesh Manager is able to interpret your Istio configurations. If your virtual services, service entries, and other Istio resources are deployed through a CI/CD flow, Service Mesh Manager will instantly parse them and display your configuration on the dashboard.
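For example, a plain Istio VirtualService like the following sketch (the service name, namespace, and weights are illustrative) can be committed to Git and applied by your CD pipeline; because it is a standard Istio resource with no extra annotations, Service Mesh Manager can parse and display it as-is:

```yaml
# Illustrative example: a standard Istio VirtualService with no SMM-specific fields.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews        # hypothetical service name
  namespace: demo      # hypothetical namespace
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v1
          weight: 90   # 90% of traffic stays on v1
        - destination:
            host: reviews
            subset: v2
          weight: 10   # 10% is shifted to v2
```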

2.12.2 - Service mesh FAQ

What is a service mesh?

A service mesh is a dedicated software layer that handles all communication between services. It is independent of each service’s code, so it can work with multiple service management systems and across network boundaries without a problem, connecting services and managing the connections between them.

What problem does a service mesh solve?

By decoupling applications from infrastructure, containers enabled the architectural shift from monoliths to microservices. This shift came with a multitude of challenges. Container orchestration tools solved the build and deployment problems of microservices, but left many runtime challenges unaddressed. A service mesh solves these runtime issues by bundling capabilities like security, policy configuration, ingress and egress control, load balancing, distributed tracing, traffic shaping, and metrics collection.

Can I use my existing Istio deployment with Service Mesh Manager?

Yes, if you are already using the Cisco Istio operator. If not, we can help you migrate your existing Istio installation to Service Mesh Manager. This is a manual process in which the mesh configuration must be migrated to match the Istio operator’s custom resources.

Are you using upstream Istio?

Our Istio distribution is very close to upstream Istio, but contains a few stability fixes and enhancements, especially around multi-cluster topologies and telemetry. For a detailed list of changes compared to upstream Istio, see Istio distribution.

Do I have to change my applications to use Istio?

In most cases you don’t need to change anything. However, from our experience of moving many different applications into Istio, we know there are special cases where an application doesn’t handle having a sidecar well, for example, special HTTP headers or an mTLS configuration that conflicts with the Envoy sidecar. In such cases some slight changes may be needed, and we can help you solve these kinds of issues.

Do Service Mesh Manager and Streaming Data Manager use the same mesh?

Currently Service Mesh Manager and Streaming Data Manager use separate service meshes with separate control planes. The Streaming Data Manager service mesh is used only for the Apache Kafka brokers and the control-plane services of Streaming Data Manager. They are tied together in the sense that they are managed by the same Istio operator and use the same version of Istio.

Note that currently you cannot manage the Streaming Data Manager service mesh from the Service Mesh Manager UI, only from the command line.

2.12.3 - Observability and debugging

What can I observe?

One of the main goals of Service Mesh Manager is to give you an overview of your service mesh. You’ll see the topology of the services running in the mesh with real-time monitoring information of

  • error rate,
  • RPS,
  • throughput, and
  • latency.

You also get one-click access to distributed tracing with Jaeger, and Grafana dashboards if you want to further explore metrics provided by the service mesh. Service Mesh Manager completes the service mesh metrics with a drill-down view of your services and workloads from their mesh configuration to pod and node-level info and metrics of resource utilization.

Service Mesh Manager observability

How do you help me debug my services?

Service Mesh Manager has many features that help you debug your services. Usually you start by checking real-time error rates and latency values on the topology view, then continue with mesh validations and the drill-down view of a service or workload. You also have one-click access to Jaeger and Grafana dashboards if you want to further explore your traces and metrics. If you need to inspect requests flowing through an Envoy proxy, Service Mesh Manager provides a tapping feature to see access logs or a detailed view of the requests.

2.12.4 - Security and compliance

How are my services secured?

Service Mesh Manager uses the mutual TLS feature of Istio for service-to-service authentication and traffic encryption. In Service Mesh Manager, you can manage mTLS settings between services with the CLI or on the UI, mesh-wide, namespace-wide, and on the service-specific level.
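Under the hood, these settings correspond to standard Istio PeerAuthentication resources. As an illustration, a namespace-wide strict mTLS policy looks like the following (the namespace name is illustrative):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: demo   # hypothetical namespace; place the policy in the Istio root namespace for a mesh-wide default
spec:
  mtls:
    mode: STRICT    # require mutual TLS for all workloads in this namespace
```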

Does Service Mesh Manager use its own authentication system?

No, Service Mesh Manager leverages Kubeconfig, the official client libraries, and the Kubernetes API to perform authentication and authorization for its users.

If you’re allowed to add, edit, or delete specific Istio custom resources, you’ll have the same permissions from Service Mesh Manager as well.

The Service Mesh Manager installer provides a way (mainly for demo and tryout purposes) to disable user authentication and use its own service account token for all communication with the Kubernetes API server.

What’s the story on access and visibility control?

By default, authentication is required to access the Service Mesh Manager UI. Observability features are available to every authenticated user, while access to the control features is based on the authenticated user’s RBAC permissions.

2.12.5 - Multi-cluster support

What do you mean by multi-cluster support?

Service Mesh Manager helps you manage multi-cluster service meshes in three different layers.

  • First, multi-cluster meshes can easily be built using the Service Mesh Manager CLI, avoiding the need to manually manage complex Istio configurations on all of the clusters.
  • Second, our Istio distribution contains important changes from upstream Istio to collect cluster-aware metrics.
  • Third, multi-cluster support is natively built in the Service Mesh Manager dashboard and CLI. They are able to display and seamlessly manage services across clusters with a shared Istio control plane.

Can you add clusters dynamically?

Yes, attaching clusters to and detaching them from a service mesh can easily be done through the Service Mesh Manager CLI. These CLI commands are backed by the Istio operator, which manages remote clusters through Kubernetes custom resources and secrets that hold the kubeconfigs of those clusters.
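The remote-cluster secrets mentioned above follow the common Kubernetes pattern of embedding a kubeconfig in a Secret. A hedged sketch of that shape (the names, namespace, and the exact keys the operator expects are all assumptions for illustration):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: remote-cluster-1    # hypothetical name for the attached cluster
  namespace: istio-system   # hypothetical namespace
type: Opaque
stringData:
  kubeconfig: |             # kubeconfig granting access to the remote cluster's API server
    apiVersion: v1
    kind: Config
    # ...clusters, users, and contexts for the remote cluster...
```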

What are some key multi-cluster use-cases?

Perhaps the most common use case for a multi-cluster service mesh is to connect on-premises and cloud environments easily. For example, using a multi-cluster mesh you can securely connect your cloud services to the legacy services running in on-premises clusters.

Public clouds are also often used to scale out from an on-premises datacenter during particular events when your services need to handle an increased load.

Some common load balancing and high availability patterns can also be implemented easily with a multi-cluster mesh. For example, you can run clusters in different regions with locality-based load balancing and drive traffic to another region when a specific region fails.
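In plain Istio terms, this kind of locality-based failover is expressed with a DestinationRule. A sketch, where the host and region names are illustrative:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: frontend-locality                 # hypothetical name
spec:
  host: frontend.demo.svc.cluster.local   # hypothetical service
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:             # when us-west is unhealthy, shift traffic to us-east
          - from: us-west
            to: us-east
    outlierDetection:         # outlier detection is required for locality failover to trigger
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```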

Why does the istio injection label disappear from a namespace on a remote cluster?

Service Mesh Manager synchronizes the istio injection labels for all namespaces from the cluster where Service Mesh Manager is installed to all other remote clusters in the mesh. That way you can add (or remove) the istio injection label only on the cluster where Service Mesh Manager is installed, and Service Mesh Manager automatically adds (or removes) namespace labels on every cluster in the mesh.
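For example, assuming the classic `istio-injection=enabled` label (revision-based labels work analogously), labeling the namespace only needs to happen on the Service Mesh Manager cluster:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo                  # hypothetical namespace
  labels:
    istio-injection: enabled  # Service Mesh Manager propagates this label to the same namespace on every remote cluster
```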

If istio injection labels disappear from namespaces on remote clusters, it is because:

  • the namespace does not exist on the cluster where Service Mesh Manager is installed, or
  • the cluster where Service Mesh Manager is installed does not have the istio injection label on that namespace.

The solution is to create the namespace in that cluster and add the label there. For details, see Deploy custom application in a multi-cluster setup.

2.12.6 - Licensing

Is there a free version?

Yes. After a free registration, you can use Service Mesh Manager on a limited number of nodes. For details, see Licensing options.

How can I check how many nodes I use?

The easiest way is to open the dashboard, select the user account in the top right, then select License.

Displaying the license usage

Can I buy commercial support?

Yes, you can buy pro and enterprise licenses that include commercial support. For details, see our pricing page, or contact your Cisco sales representative.

2.13 - Reference

2.13.1 - API

Service Mesh Manager provides a powerful GraphQL API that serves the Service Mesh Manager UI, the CLI, or any other client. You can access the GraphQL console from your Service Mesh Manager dashboard at the /api/graphql/ path.
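Because it is a standard GraphQL endpoint, you can explore the schema from the console with an ordinary introspection query, which works on any GraphQL API:

```graphql
# Standard GraphQL introspection: lists the top-level query fields the API exposes.
{
  __schema {
    queryType {
      fields {
        name
        description
      }
    }
  }
}
```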

2.13.2 - CRD

The following sections contain the reference documentation of the various custom resource definitions (CRDs) that are specific to Service Mesh Manager.

2.13.2.1 - ControlPlane CRD schema reference (group smm.cisco.com)

ControlPlane

ControlPlane is the Schema for the controlplanes API

Full name: controlplanes.smm.cisco.com
Group: smm.cisco.com
Singular name: controlplane
Plural name: controlplanes
Scope: Cluster
Versions: v1alpha1
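Putting the identifiers above together, a minimal ControlPlane resource is cluster-scoped and looks like the following sketch (the name and the affinity values are illustrative; spec fields other than the documented .spec.affinity are omitted):

```yaml
apiVersion: smm.cisco.com/v1alpha1
kind: ControlPlane
metadata:
  name: smm               # cluster-scoped resource, so no namespace
spec:
  affinity:               # standard Kubernetes affinity rules, documented below
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch   # hypothetical scheduling constraint
                operator: In
                values:
                  - amd64
```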

Version v1alpha1

Properties

.apiVersion

string

APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources

.kind

string

Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds

.metadata

object

.spec

object

ControlPlaneSpec defines the desired state of ControlPlane

.spec.affinity

object

Affinity is a group of affinity scheduling rules.

.spec.affinity.nodeAffinity

object

Describes node affinity scheduling rules for the pod.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution

array

The scheduler will prefer to schedule pods to nodes that satisfy the affinity expressions specified by this field, but it may choose a node that violates one or more of the expressions. The node that is most preferred is the one with the greatest sum of weights, i.e. for each node that meets all of the scheduling requirements (resource request, requiredDuringScheduling affinity expressions, etc.), compute a sum by iterating through the elements of this field and adding “weight” to the sum if the node matches the corresponding matchExpressions; the node(s) with the highest sum are the most preferred.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*]

object

An empty preferred scheduling term matches all objects with implicit weight 0 (i.e. it’s a no-op). A null preferred scheduling term matches no objects (i.e. is also a no-op).

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference

object Required

A node selector term, associated with the corresponding weight.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchExpressions

array

A list of node selector requirements by node’s labels.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchExpressions[*]

object

A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchExpressions[*].key

string Required

The label key that the selector applies to.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchExpressions[*].operator

string Required

Represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchExpressions[*].values

array

An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchExpressions[*].values[*]

string

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchFields

array

A list of node selector requirements by node’s fields.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchFields[*]

object

A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchFields[*].key

string Required

The label key that the selector applies to.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchFields[*].operator

string Required

Represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchFields[*].values

array

An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch.

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].preference.matchFields[*].values[*]

string

.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].weight

integer Required

Weight associated with matching the corresponding nodeSelectorTerm, in the range 1-100.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution

object

If the affinity requirements specified by this field are not met at scheduling time, the pod will not be scheduled onto the node. If the affinity requirements specified by this field cease to be met at some point during pod execution (e.g. due to an update), the system may or may not try to eventually evict the pod from its node.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms

array Required

Required. A list of node selector terms. The terms are ORed.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*]

object

A null or empty node selector term matches no objects. The requirements of them are ANDed. The TopologySelectorTerm type implements a subset of the NodeSelectorTerm.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchExpressions

array

A list of node selector requirements by node’s labels.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchExpressions[*]

object

A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchExpressions[*].key

string Required

The label key that the selector applies to.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchExpressions[*].operator

string Required

Represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchExpressions[*].values

array

An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchExpressions[*].values[*]

string

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchFields

array

A list of node selector requirements by node’s fields.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchFields[*]

object

A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchFields[*].key

string Required

The label key that the selector applies to.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchFields[*].operator

string Required

Represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchFields[*].values

array

An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer. This array is replaced during a strategic merge patch.

.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[*].matchFields[*].values[*]

string

.spec.affinity.podAffinity

object

Describes pod affinity scheduling rules (e.g. co-locate this pod in the same node, zone, etc. as some other pod(s)).

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution

array

The scheduler will prefer to schedule pods to nodes that satisfy the affinity expressions specified by this field, but it may choose a node that violates one or more of the expressions. The node that is most preferred is the one with the greatest sum of weights, i.e. for each node that meets all of the scheduling requirements (resource request, requiredDuringScheduling affinity expressions, etc.), compute a sum by iterating through the elements of this field and adding “weight” to the sum if the node has pods which matches the corresponding podAffinityTerm; the node(s) with the highest sum are the most preferred.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*]

object

The weights of all of the matched WeightedPodAffinityTerm fields are added per-node to find the most preferred node(s)

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm

object Required

Required. A pod affinity term, associated with the corresponding weight.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector

object

A label query over a set of resources, in this case pods.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions

array

matchExpressions is a list of label selector requirements. The requirements are ANDed.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions[*]

object

A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions[*].key

string Required

key is the label key that the selector applies to.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions[*].operator

string Required

operator represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions[*].values

array

values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions[*].values[*]

string

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchLabels

object

matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is “key”, the operator is “In”, and the values array contains only “value”. The requirements are ANDed.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector

object

A label query over the set of namespaces that the term applies to. The term is applied to the union of the namespaces selected by this field and the ones listed in the namespaces field. null selector and null or empty namespaces list means “this pod’s namespace”. An empty selector ({}) matches all namespaces.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions

array

matchExpressions is a list of label selector requirements. The requirements are ANDed.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions[*]

object

A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions[*].key

string Required

key is the label key that the selector applies to.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions[*].operator

string Required

operator represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions[*].values

array

values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions[*].values[*]

string

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchLabels

object

matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is “key”, the operator is “In”, and the values array contains only “value”. The requirements are ANDed.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaces

array

namespaces specifies a static list of namespace names that the term applies to. The term is applied to the union of the namespaces listed in this field and the ones selected by namespaceSelector. null or empty namespaces list and null namespaceSelector means “this pod’s namespace”.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaces[*]

string

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.topologyKey

string Required

This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which any of the selected pods is running. Empty topologyKey is not allowed.

.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].weight

integer Required

weight associated with matching the corresponding podAffinityTerm, in the range 1-100.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution

array

If the affinity requirements specified by this field are not met at scheduling time, the pod will not be scheduled onto the node. If the affinity requirements specified by this field cease to be met at some point during pod execution (e.g. due to a pod label update), the system may or may not try to eventually evict the pod from its node. When there are multiple elements, the lists of nodes corresponding to each podAffinityTerm are intersected, i.e. all terms must be satisfied.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*]

object

Defines a set of pods (namely those matching the labelSelector relative to the given namespace(s)) that this pod should be co-located (affinity) or not co-located (anti-affinity) with, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which a pod of the set of pods is running.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].labelSelector

object

A label query over a set of resources, in this case pods.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].labelSelector.matchExpressions

array

matchExpressions is a list of label selector requirements. The requirements are ANDed.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].labelSelector.matchExpressions[*]

object

A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].labelSelector.matchExpressions[*].key

string Required

key is the label key that the selector applies to.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].labelSelector.matchExpressions[*].operator

string Required

operator represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].labelSelector.matchExpressions[*].values

array

values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].labelSelector.matchExpressions[*].values[*]

string

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].labelSelector.matchLabels

object

matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is “key”, the operator is “In”, and the values array contains only “value”. The requirements are ANDed.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].namespaceSelector

object

A label query over the set of namespaces that the term applies to. The term is applied to the union of the namespaces selected by this field and the ones listed in the namespaces field. null selector and null or empty namespaces list means “this pod’s namespace”. An empty selector ({}) matches all namespaces.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].namespaceSelector.matchExpressions

array

matchExpressions is a list of label selector requirements. The requirements are ANDed.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].namespaceSelector.matchExpressions[*]

object

A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].namespaceSelector.matchExpressions[*].key

string Required

key is the label key that the selector applies to.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].namespaceSelector.matchExpressions[*].operator

string Required

operator represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].namespaceSelector.matchExpressions[*].values

array

values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].namespaceSelector.matchExpressions[*].values[*]

string

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].namespaceSelector.matchLabels

object

matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is “key”, the operator is “In”, and the values array contains only “value”. The requirements are ANDed.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].namespaces

array

namespaces specifies a static list of namespace names that the term applies to. The term is applied to the union of the namespaces listed in this field and the ones selected by namespaceSelector. null or empty namespaces list and null namespaceSelector means “this pod’s namespace”.

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].namespaces[*]

string

.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[*].topologyKey

string Required

This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which any of the selected pods is running. Empty topologyKey is not allowed.

.spec.affinity.podAntiAffinity

object

Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod in the same node, zone, etc. as some other pod(s)).

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution

array

The scheduler will prefer to schedule pods to nodes that satisfy the anti-affinity expressions specified by this field, but it may choose a node that violates one or more of the expressions. The node that is most preferred is the one with the greatest sum of weights, i.e. for each node that meets all of the scheduling requirements (resource request, requiredDuringScheduling anti-affinity expressions, etc.), compute a sum by iterating through the elements of this field and adding “weight” to the sum if the node has pods which matches the corresponding podAffinityTerm; the node(s) with the highest sum are the most preferred.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*]

object

The weights of all of the matched WeightedPodAffinityTerm fields are added per-node to find the most preferred node(s).

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm

object Required

Required. A pod affinity term, associated with the corresponding weight.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector

object

A label query over a set of resources, in this case pods.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions

array

matchExpressions is a list of label selector requirements. The requirements are ANDed.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions[*]

object

A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions[*].key

string Required

key is the label key that the selector applies to.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions[*].operator

string Required

operator represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions[*].values

array

values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchExpressions[*].values[*]

string

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.labelSelector.matchLabels

object

matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is “key”, the operator is “In”, and the values array contains only “value”. The requirements are ANDed.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector

object

A label query over the set of namespaces that the term applies to. The term is applied to the union of the namespaces selected by this field and the ones listed in the namespaces field. null selector and null or empty namespaces list means “this pod’s namespace”. An empty selector ({}) matches all namespaces.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions

array

matchExpressions is a list of label selector requirements. The requirements are ANDed.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions[*]

object

A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions[*].key

string Required

key is the label key that the selector applies to.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions[*].operator

string Required

operator represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions[*].values

array

values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchExpressions[*].values[*]

string

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaceSelector.matchLabels

object

matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is “key”, the operator is “In”, and the values array contains only “value”. The requirements are ANDed.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaces

array

namespaces specifies a static list of namespace names that the term applies to. The term is applied to the union of the namespaces listed in this field and the ones selected by namespaceSelector. null or empty namespaces list and null namespaceSelector means “this pod’s namespace”.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.namespaces[*]

string

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].podAffinityTerm.topologyKey

string Required

This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which any of the selected pods is running. Empty topologyKey is not allowed.

.spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[*].weight

integer Required

weight associated with matching the corresponding podAffinityTerm, in the range 1-100.
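A preferred (soft) anti-affinity rule combines a weight in the 1-100 range with the podAffinityTerm fields described above. A minimal sketch, using a hypothetical `app: kafka` label and the standard zone topology key to spread replicas across zones:

```yaml
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100               # required; added to a node's score when the term matches
          podAffinityTerm:          # required; the term associated with this weight
            labelSelector:
              matchExpressions:
                - key: app          # placeholder label key
                  operator: In      # one of In, NotIn, Exists, DoesNotExist
                  values:
                    - kafka         # must be non-empty because the operator is In
            topologyKey: topology.kubernetes.io/zone   # required; co-location is per zone here
```

Because this is a preference rather than a requirement, the scheduler penalizes (but does not forbid) placing the pod in a zone that already hosts a matching `app: kafka` pod.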

.spec.affinity.podAntiAffinity.requiredDuringSchedulingIgnoredDuringExecution