Flagger Canary
Flagger is a progressive delivery operator for Kubernetes, designed to give developers confidence in automating production releases with progressive delivery techniques.
The benefit of canary releases is that they let you test the capacity of a new version in the production environment, with a safe rollback strategy if issues are found. Canary releases reduce the risk of introducing a new software version in production by gradually shifting traffic to the new version while measuring traffic metrics and running rollout tests.
Flagger can run automated application testing for the following deployment strategies:
- Canary (progressive traffic shifting)
- A/B testing (HTTP headers and cookie traffic routing)
- Blue/Green (traffic switching and mirroring)
The following example shows how to integrate Flagger with Service Mesh Manager and observe progressive delivery on the Service Mesh Manager dashboard.
To demonstrate this, you will learn how to configure and deploy the podinfo application for canary testing, upgrade its version, and watch the canary release on the Service Mesh Manager dashboard.
Setting up Flagger with Service Mesh Manager
- Deploy Flagger into the smm-system namespace and connect it to Istio and to Prometheus at the Service Mesh Manager Prometheus address, as shown in the following commands. Note: the Prometheus metrics service is hosted at http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus
kubectl apply -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml
helm repo add flagger https://flagger.app
helm upgrade -i flagger flagger/flagger \
  --namespace=smm-system \
  --set crd.create=false \
  --set meshProvider=istio \
  --set metricsServer=http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus
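Optionally, before checking the logs, you can wait for the Flagger deployment to finish rolling out. This is a generic Kubernetes check, not specific to Service Mesh Manager:
kubectl -n smm-system rollout status deployment/flagger --timeout=120s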
- Make sure you see the following log message, which indicates that the Flagger operator was deployed successfully in your Service Mesh Manager cluster:
kubectl -n smm-system logs deployment/flagger
Expected output:
{"level":"info","ts":"2022-01-25T19:45:02.333Z","caller":"flagger/main.go:200","msg":"Connected to metrics server http://smm-prometheus.smm-system.svc.cluster.local:59090/prometheus"}
At this point, Flagger is integrated with Service Mesh Manager. You can now deploy your own applications and use them for progressive delivery.
Podinfo example with Flagger
Next, let’s try out an example from the Flagger documentation.
- Create the “test” namespace and enable sidecar-proxy auto-injection for this namespace (use the smm binary downloaded from the Service Mesh Manager download page). Then deploy the “podinfo” target image that will be used for the canary deployment and load testing during automated canary promotion:
kubectl create ns test
smm sidecar-proxy auto-inject on test
kubectl apply -k https://github.com/fluxcd/flagger//kustomize/podinfo
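To confirm that sidecar injection works, you can check that the podinfo pods report two ready containers (the application plus the istio-proxy sidecar). This check assumes the app=podinfo label used by the upstream kustomize manifests:
kubectl -n test get pods -l app=podinfo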
- Create the IstioMeshGateway service:
kubectl apply -f - << EOF
apiVersion: servicemesh.cisco.com/v1alpha1
kind: IstioMeshGateway
metadata:
  annotations:
    banzaicloud.io/related-to: istio-system/cp-v115x
  labels:
    app: test-imgw-app
    istio.io/rev: cp-v115x.istio-system
  name: test-imgw
  namespace: test
spec:
  deployment:
    podMetadata:
      labels:
        app: test-imgw-app
        istio: ingressgateway
  istioControlPlane:
    name: cp-v115x
    namespace: istio-system
  service:
    ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 8080
    type: LoadBalancer
  type: ingress
EOF
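You can verify that the gateway pods have started by selecting them with the app=test-imgw-app label set in podMetadata above:
kubectl -n test get pods -l app=test-imgw-app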
- Add the port and hosts for the IstioMeshGateway using the following Gateway configuration:
kubectl apply -f - << EOF
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: test
spec:
  selector:
    app: test-imgw-app
    gateway-name: test-imgw
    gateway-type: ingress
    istio.io/rev: cp-v115x.istio-system
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
EOF
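You can confirm that the Gateway resource was created:
kubectl -n test get gateways.networking.istio.io public-gateway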
- Create a Canary custom resource for the podinfo deployment, for example as in the sketch below.
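The original Canary manifest is not shown here; the following is a minimal sketch that is consistent with the values used later in this guide (30s interval, 20% step weight, 80% maximum weight, a failed-check threshold of 3, and a 500ms request-duration limit). Adjust it to your own requirements before applying it:
kubectl apply -f - << EOF
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # Optional: reference the HPA installed by the podinfo kustomize manifests;
  # match the apiVersion of the HPA that exists in your cluster.
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # podinfo listens on port 9898
    port: 9898
    targetPort: 9898
    gateways:
    - test/public-gateway
    hosts:
    - "*"
  analysis:
    interval: 30s
    threshold: 3
    maxWeight: 80
    stepWeight: 20
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 30s
EOF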
- Wait until Flagger initializes the deployment and sets up a VirtualService for podinfo. You can follow the Flagger logs:
kubectl -n smm-system logs deployment/flagger -f
Expected output:
{"level":"info","ts":"2022-01-25T19:54:42.528Z","caller":"controller/events.go:33","msg":"Initialization done! podinfo.test","canary":"podinfo.test"}
- Get the ingress IP address from the IstioMeshGateway:
export INGRESS_IP=$(kubectl get istiomeshgateways.servicemesh.cisco.com -n test test-imgw -o jsonpath='{.status.GatewayAddress[0]}')
echo $INGRESS_IP
The output should be an IP address, for example:
34.82.47.210
- Verify that podinfo is reachable from the external IP address by running:
curl http://$INGRESS_IP/
The output should be similar to:
{ "hostname": "podinfo-96c5c65f6-l7ngc", "version": "6.0.0", "revision": "", "color": "#34577c", "logo": "https://raw.githubusercontent.com/stefanprodan/podinfo/gh-pages/cuddle_clap.gif", "message": "greetings from podinfo v6.0.0", "goos": "linux", "goarch": "amd64", "runtime": "go1.16.5", "num_goroutine": "8", "num_cpu": "4" }
- Send traffic to the ingress IP. This setup uses the hey traffic generator; on macOS, you can install it with the brew package manager:
brew install hey
You can send traffic from any terminal where the IP address is reachable. The following command sends requests for 30 minutes from two concurrent workers, each at 10 requests per second:
hey -z 30m -q 10 -c 2 http://$INGRESS_IP/
On the Service Mesh Manager dashboard, select MENU > TOPOLOGY, and select the test namespace to see the generated traffic.
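If you prefer not to install hey, a simple shell loop with curl is a rough substitute (roughly 10 requests per second; stop it with Ctrl+C):
while true; do curl -s -o /dev/null http://$INGRESS_IP/; sleep 0.1; done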
Upgrade Image version
The current pod version is v6.0.0; update it to the next version.
- Upgrade the target image to the new version and watch the canary functionality on the Service Mesh Manager dashboard:
kubectl -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:6.1.0
Expected output:
deployment.apps/podinfo image updated
You can check the logs as Flagger tests and promotes the new version:
{"msg":"New revision detected! Scaling up podinfo.test","canary":"podinfo.test"} {"msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"} {"msg":"Advance podinfo.test canary weight 20","canary":"podinfo.test"} {"msg":"Advance podinfo.test canary weight 40","canary":"podinfo.test"} {"msg":"Advance podinfo.test canary weight 60","canary":"podinfo.test"} {"msg":"Advance podinfo.test canary weight 80","canary":"podinfo.test"} {"msg":"Copying podinfo.test template spec to podinfo-primary.test","canary":"podinfo.test"} {"msg":"HorizontalPodAutoscaler podinfo-primary.test updated","canary":"podinfo.test"} {"msg":"Routing all traffic to primary","canary":"podinfo.test"} {"msg":"Promotion completed! Scaling down podinfo.test","canary":"podinfo.test"}
- Check the status of the canary by running the following command:
kubectl get canaries -n test -o wide
The output should be similar to:
NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Initializing   0        0              30s                 20                         80          2022-04-11T21:25:31Z
..
NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Initialized    0        0              30s                 20                         80          2022-04-11T21:26:03Z
..
NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Progressing    0        0              30s                 20                         80          2022-04-11T21:33:03Z
..
NAME      STATUS         WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Succeeded      0        0              30s                 20                         80          2022-04-11T21:35:28Z
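For more detail than the wide listing, you can describe the Canary resource to see the events Flagger records during the analysis:
kubectl -n test describe canary/podinfo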
- Visualize the entire progressive delivery flow on the Service Mesh Manager dashboard.
Traffic from “TEST-IMGW-APP” is shifted from “podinfo-primary” to “podinfo-canary”, from 20% up to 80% (according to the step weight configured for the canary rollout). The following image shows the incoming traffic on the “podinfo-primary” pod:
The following image shows the incoming traffic on the “podinfo-canary” pod:
You can see that Flagger dynamically shifts the ingress traffic to the canary deployment in steps and performs conformance tests. Once the tests pass, Flagger shifts the traffic back to the primary deployment and updates the primary deployment to the new version.
Finally, Flagger scales down podinfo:6.0.0, shifts the traffic to podinfo:6.1.0, and makes it the primary deployment.
The following image shows that the canary image (v6.1.0) was promoted to the primary image (v6.1.0):
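To confirm the promotion from the command line, you can also check which image the primary deployment (created by Flagger as podinfo-primary) is running; the jsonpath expression simply prints the first container image:
kubectl -n test get deployment podinfo-primary -o jsonpath='{.spec.template.spec.containers[0].image}'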
Automated rollback
To test automated rollback in case a canary fails, complete the following steps.
- Generate HTTP 500 responses and delays by running the following command on the tester pod:
watch "curl -s http://$INGRESS_IP/delay/1 && curl -s http://$INGRESS_IP/status/500"
- Watch how the canary release fails. Run the following command:
kubectl get canaries -n test -o wide
The output should be similar to:
NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Progressing   60       1              30s                 20                         80          2022-04-11T22:10:33Z
..
NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Progressing   60       1              30s                 20                         80          2022-04-11T22:10:33Z
..
NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Progressing   60       2              30s                 20                         80          2022-04-11T22:11:03Z
..
NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Progressing   60       3              30s                 20                         80          2022-04-11T22:11:33Z
..
NAME      STATUS        WEIGHT   FAILEDCHECKS   INTERVAL   MIRROR   STEPWEIGHT   STEPWEIGHTS   MAXWEIGHT   LASTTRANSITIONTIME
podinfo   Failed        0        0              30s                 20                         80          2022-04-11T22:12:03Z
{"msg":"New revision detected! Scaling up podinfo.test","canary":"podinfo.test"} {"msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"} {"msg":"Advance podinfo.test canary weight 20","canary":"podinfo.test"} {"msg":"Advance podinfo.test canary weight 40","canary":"podinfo.test"} {"msg":"Advance podinfo.test canary weight 60","canary":"podinfo.test"} {"msg":"Halt podinfo.test advancement request duration 917ms > 500ms","canary":"podinfo.test"} {"msg":"Halt podinfo.test advancement request duration 598ms > 500ms","canary":"podinfo.test"} {"msg":"Halt podinfo.test advancement request duration 1.543s > 500ms","canary":"podinfo.test"} {"msg":"Rolling back podinfo.test failed checks threshold reached 3","canary":"podinfo.test"} {"msg":"Canary failed! Scaling down podinfo.test","canary":"podinfo.test"}
- Visualize the canary rollout on the Service Mesh Manager dashboard.
As the rollout advances from 0% -> 20% -> 40% -> 60%, you can observe that performance degrades and request durations exceed 500ms, causing the rollout to halt. The failed-check threshold was set to 3, so after three failed checks the rollout is rolled back. The following image shows the “primary-pod” incoming traffic graph:
The following image shows the “canary-pod” incoming traffic graph:
Cleaning up
To clean up your cluster, run the following commands.
- Remove the Canary, Gateway, and IstioMeshGateway resources, and the podinfo deployment:
kubectl delete -n test canaries.flagger.app podinfo
kubectl delete -n test gateways.networking.istio.io public-gateway
kubectl delete -n test istiomeshgateways.servicemesh.cisco.com test-imgw
kubectl delete -n test deployment podinfo
- Delete the “test” namespace:
kubectl delete namespace test
- Uninstall the Flagger deployment and delete the Canary CRD:
helm delete flagger -n smm-system
kubectl delete -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml