Scale SMM Control Plane
Service Mesh Manager follows the microservice architecture pattern. This document describes the scaling properties of the microservices that Service Mesh Manager is built on.
By default, these services are installed into the smm-system namespace, so the procedures described in Scale a specific Workload apply to them.
This document omits a few services that have minimal resource requirements even in large-scale deployments and therefore need no tuning.
API services
To provide the dashboard functionality, Service Mesh Manager relies on a set of GraphQL API servers. These servers are only used when the dashboard is being used.
Their memory usage scales linearly with the number of Workloads, Services, and other Kubernetes objects. Because they cache some Kubernetes objects, we recommend setting their memory limits based on measurements taken against the actual workload in the cluster.
Note: As Service Mesh Manager can be used to monitor Workloads that are not part of the Istio mesh, the resource utilization of these services depends on the total number of Kubernetes Workloads, not just the Istio-enabled ones.
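One way to collect such measurements is kubectl top. The following is a minimal sketch, not an official procedure: it assumes that a Metrics API provider such as metrics-server is installed in the cluster, and the smm_top helper name is hypothetical.

```shell
# Hypothetical helper: list per-container CPU and memory usage of the
# control plane Pods, sorted by memory consumption.
# Assumes a Metrics API provider (for example, metrics-server) is installed.
# The optional first argument overrides the default smm-system namespace.
smm_top() {
  kubectl top pods --namespace "${1:-smm-system}" --containers --sort-by=memory
}

# Usage:
#   smm_top              # default smm-system namespace
#   smm_top my-namespace # non-default installation namespace
```

Sampling this output while the dashboard is in use gives a more realistic picture, because the API servers are only active at that time.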
Component | Usage | Resource setting in ControlPlane |
---|---|---|
smm-health-api | Provides Health scores on the dashboard | .spec.smm.health.api.resources |
smm-sre-api | Provides SLO access on the dashboard | .spec.smm.sre.api.resources |
smm | Provides Istio management | .spec.smm.application.resources |
smm-federation-gateway | Aggregates the API server’s APIs | .spec.smm.federationGateway.resources |
For example, to set the resource requirements of smm-health-api, run the following commands:
cat > change-health-resources.yaml <<EOF
spec:
  smm:
    health:
      api:
        resources:
          requests:
            cpu: 500m
            memory: "1500M"
          limits:
            cpu: "1"
            memory: "2000M"
EOF
kubectl patch controlplane --type=merge --patch "$(cat change-health-resources.yaml)" smm
- If you are using Service Mesh Manager in Operator Mode, then the Istio deployment is updated automatically.
- If you are using the imperative mode, run the smm operator reconcile command to apply the changes.
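After the change is applied, you may want to confirm that the new values are in place. The sketch below reads them back from the ControlPlane resource and from the running workload. It assumes that the workload is a Deployment named smm-health-api in the smm-system namespace, which may differ in your installation. Both helper names are hypothetical.

```shell
# Hypothetical helper: print the resources configured for smm-health-api
# in the ControlPlane custom resource named smm.
show_cp_resources() {
  kubectl get controlplane smm -o jsonpath='{.spec.smm.health.api.resources}'
}

# Hypothetical helper: print the resources of the containers that are
# actually deployed. Assumes a Deployment called smm-health-api exists
# in the smm-system namespace.
show_deploy_resources() {
  kubectl get deployment smm-health-api --namespace smm-system \
    -o jsonpath='{.spec.template.spec.containers[*].resources}'
}

# Usage:
#   show_cp_resources
#   show_deploy_resources
```

If the two outputs disagree, the change has probably not been reconciled yet.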
Health controller
The health controller is responsible for collecting outlier detection data for all Services and Workloads in the cluster running Service Mesh Manager. Because of the way it is implemented, the health controller cannot scale horizontally. Use the Service Mesh Manager dashboard to determine the appropriate CPU and memory requirements for this component.
The health controller’s resource requirements can be set in the ControlPlane CR’s .spec.smm.health.controller.resources key, as shown in the API services example.
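Adapted to the health controller, the same pattern could look like the sketch below. The CPU and memory values are placeholders rather than recommendations, and the file name and both helper names are hypothetical. Replace the values with figures taken from the dashboard.

```shell
# Hypothetical helper: write a merge patch for the health controller.
# The values below are placeholders; use measurements from the dashboard.
# The optional first argument overrides the (hypothetical) default file name.
write_health_controller_patch() {
  cat > "${1:-change-health-controller-resources.yaml}" <<'EOF'
spec:
  smm:
    health:
      controller:
        resources:
          requests:
            cpu: 500m
            memory: "1000M"
          limits:
            cpu: "1"
            memory: "1500M"
EOF
}

# Hypothetical helper: apply a patch file to the ControlPlane resource named smm.
apply_patch_file() {
  kubectl patch controlplane smm --type=merge --patch "$(cat "$1")"
}

# Usage:
#   write_health_controller_patch patch.yaml
#   apply_patch_file patch.yaml
```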
Note: The health controller increases the resource usage of Prometheus by approximately 30%. You can disable the outlier detection system using the .spec.smm.health.enabled setting.
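As an illustration, the setting could be toggled with a merge patch like the one below. This sketch assumes that .spec.smm.health.enabled is a boolean field. Check the ControlPlane resource definition before relying on it. The set_health_enabled helper name is hypothetical.

```shell
# Hypothetical helper: enable or disable the outlier detection system.
# Assumes .spec.smm.health.enabled is a boolean field of the ControlPlane CR.
set_health_enabled() {
  case "$1" in
    true|false) ;;
    *) echo "usage: set_health_enabled true|false" >&2; return 1 ;;
  esac
  kubectl patch controlplane smm --type=merge \
    --patch "{\"spec\":{\"smm\":{\"health\":{\"enabled\":$1}}}}"
}

# Usage:
#   set_health_enabled false   # disable outlier detection
#   set_health_enabled true    # enable it again
```

In imperative mode, the smm operator reconcile command is still needed afterwards to apply the change.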
SRE controller
The SRE controller is responsible for SLO measurement and alerting. Because of the way it is implemented, the SRE controller cannot scale horizontally. Use the Service Mesh Manager dashboard to determine the appropriate CPU and memory requirements for this component.
The SRE controller’s resource requirements can be set in the ControlPlane CR’s .spec.smm.sre.controller.resources key, as shown in the API services example.
Another component of the alerting subsystem is smm-sre-alert-exporter, which helps Service Mesh Manager visualize historical alerting data. This service has a small footprint, but if it needs to be scaled, use the .spec.smm.sre.alertExporter.resources setting.
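Because every component in this document is tuned through a resources key at a different path, a small generator for the merge patch can save some typing. The following is a minimal sketch: the build_resources_patch helper is hypothetical, and the values in the usage lines are placeholders, not recommendations.

```shell
# Hypothetical helper: build a JSON merge patch that sets "resources" under a
# dotted ControlPlane path, for example spec.smm.sre.alertExporter.
# Arguments: <path> <cpu request> <memory request> <cpu limit> <memory limit>
build_resources_patch() {
  path="$1"; cpu_req="$2"; mem_req="$3"; cpu_lim="$4"; mem_lim="$5"
  patch="{\"resources\":{\"requests\":{\"cpu\":\"$cpu_req\",\"memory\":\"$mem_req\"},\"limits\":{\"cpu\":\"$cpu_lim\",\"memory\":\"$mem_lim\"}}}"
  # Wrap the object in one level per path segment, from right to left.
  while [ -n "$path" ]; do
    key="${path##*.}"
    patch="{\"$key\":$patch}"
    case "$path" in
      *.*) path="${path%.*}" ;;
      *)   path="" ;;
    esac
  done
  printf '%s\n' "$patch"
}

# Usage (placeholder values):
#   kubectl patch controlplane smm --type=merge \
#     --patch "$(build_resources_patch spec.smm.sre.alertExporter 100m 128M 200m 256M)"
#   kubectl patch controlplane smm --type=merge \
#     --patch "$(build_resources_patch spec.smm.sre.controller 500m 1000M 1 1500M)"
```

The helper only prints the patch, so you can review the generated JSON before passing it to kubectl patch.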
Other components
As you can see from the list of Pods running in the smm-system namespace, Service Mesh Manager uses other components as well. Refer to the definition of the ControlPlane resource to check where to set the resource requirements of those parts.
For details, see The ControlPlane Custom Resource.
You can get the current CRD by running the following command:
kubectl get crd controlplanes.smm.cisco.com -o yaml
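The full CRD is long, so it can help to narrow the output. The sketch below shows two hedged options: filtering the YAML for resources keys, and browsing the schema with kubectl explain. The second option assumes that the CRD publishes an OpenAPI schema that kubectl explain can read. Both helper names are hypothetical.

```shell
# Hypothetical helper: show the line numbers of every "resources:" property
# in the CRD, as a starting point for locating the setting of a component.
find_resource_fields() {
  kubectl get crd controlplanes.smm.cisco.com -o yaml | grep -n 'resources:'
}

# Hypothetical helper: print the field tree below a dotted path of the
# ControlPlane resource (defaults to spec.smm).
# Assumes the CRD publishes a schema that kubectl explain can read.
explain_cp() {
  kubectl explain "controlplane.${1:-spec.smm}" --recursive
}

# Usage:
#   find_resource_fields
#   explain_cp spec.smm.health
```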