Fault Tolerance Testing
We performed fault tolerance testing on the data microservice – our heaviest microservice – with a replica count of 3. While running a JMeter test with 100 user threads, we manually deleted 1 pod to evaluate performance when a pod is down.
As expected, our throughput remained the same (20.4/min) even after deleting 1 pod, as Kubernetes spawned a replacement pod to maintain the number of replicas specified in the deployment YAML.
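For reference, the replica setting in question lives in the Deployment manifest. A minimal sketch is shown below; the names, labels, and image are hypothetical placeholders, not taken from our actual manifests:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-microservice        # hypothetical deployment name
spec:
  replicas: 3                    # Kubernetes keeps 3 pods running at all times
  selector:
    matchLabels:
      app: data-microservice
  template:
    metadata:
      labels:
        app: data-microservice
    spec:
      containers:
        - name: data-microservice
          image: example/data-microservice:latest   # placeholder image
```

With `replicas: 3`, the Deployment controller detects the deleted pod and creates a replacement automatically, which is the behavior observed in our test.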
The service showed an error rate of 4%, as its response time is high (8–10 seconds under normal conditions). When the pod was deleted, the requests in its queue were rejected, resulting in errors.
Below is a demonstration of our fault tolerance testing. Deleting 1 pod manually:
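The manual deletion can be reproduced from the command line against a live cluster; the label and pod name below are illustrative placeholders, not our actual resource names:

```shell
# List the pods for the service to pick one to delete (label is hypothetical)
kubectl get pods -l app=data-microservice

# Delete one pod manually to simulate a failure (pod name is a placeholder)
kubectl delete pod data-microservice-7d9f8b6c5-abcde

# Watch Kubernetes create a replacement to restore the replica count of 3
kubectl get pods -l app=data-microservice --watch
```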
After deletion, a new pod replica is created by Kubernetes:
Aggregate graph results after deletion of the pod:
The impact of the deletion on throughput was negligible; however, the error rate rose to 4% while the pod restarted.
Response time after deleting 1 pod during execution of requests: