
Custos Suggested Improvements

rajdeepc2792 edited this page May 6, 2022 · 3 revisions
  1. Multiple services share a single point of dependency. For example, when Tenant Management Service failed on two POST requests during Custos setup, the logs showed connections to Custos-Configuration-Services, and it appears that every API call goes through this service. This dependency can be removed.

  2. A single API call triggers gRPC calls across several services, producing a long sequential chain of gRPC calls. For example, Tenant Management Service calls I-am-Admin-Core-Service, which calls Federated-Service, which in turn calls Keycloak. In my opinion, this pattern of chaining service calls should be revisited.

  3. Several external services, such as Keycloak, Vault, and SQL, also require external access, which increases the system's exposure to attacks. Access to these systems should be gated by additional access point controls placed in between.

  4. In some places, logs are not sequential. A service's log statements should only describe the next point where it failed, but every log entry seems to contain information about the last failed contact.

  5. During load testing, response times differed widely across endpoints. For example, user creation and user update requests reached a throughput of about 20 per minute, while other requests, such as group and entity operations, reached about 2 per second. The user services could be optimized for better throughput.

  6. There seems to be a bottleneck in service calls: during stress testing of all services, the failure rate rose sharply once more than 300 concurrent users hit a service. In my opinion, increasing the number of pods should solve this issue; on dev, only 1-2 replicas are running for each service.
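One way to reduce the per-request dependency described in item 1 is to cache configuration lookups for a short time, so that not every API call reaches Custos-Configuration-Services. The sketch below assumes configuration values change rarely and that a bounded staleness window is acceptable; `TTLConfigCache`, the `fetch` callable, and the 60-second default are hypothetical and not part of Custos.

```python
import time
from typing import Callable, Dict, Tuple


class TTLConfigCache:
    """Caches configuration values for a fixed time-to-live.

    Hypothetical helper: `fetch` stands in for whatever call currently
    goes to Custos-Configuration-Services on every request.
    """

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            # Fresh cached value: no remote call needed.
            return entry[0]
        # Missing or expired: fetch from the remote service and cache the result.
        value = self._fetch(key)
        self._entries[key] = (value, now + self._ttl)
        return value
```

Within the TTL, repeated `get("tenant-settings")` calls hit the remote service only once. The trade-off is that configuration changes take up to one TTL to become visible.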
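The throughput figures in item 5 use different units (per minute versus per second). Converting both to requests per second makes the gap explicit. The sketch uses only the approximate numbers reported above; `per_second` is an illustrative helper.

```python
def per_second(count: float, window_seconds: float) -> float:
    """Convert `count` requests observed over `window_seconds` to requests/second."""
    return count / window_seconds


user_rps = per_second(20, 60)  # user creation/update: ~20 requests per minute
group_rps = per_second(2, 1)   # group/entity: ~2 requests per second
ratio = group_rps / user_rps

print(f"user: {user_rps:.2f}/s, group/entity: {group_rps:.2f}/s, ratio: {ratio:.1f}x")
# prints "user: 0.33/s, group/entity: 2.00/s, ratio: 6.0x"
```

On these approximate figures, the group and entity endpoints sustain roughly six times the throughput of the user endpoints.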
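If the pod count from item 6 is managed by a Kubernetes HorizontalPodAutoscaler rather than set by hand, the core scaling rule from the Kubernetes documentation is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no action taken when the ratio is within a tolerance of 0.1 by default. The sketch below reproduces only that rule. The CPU values in the example and the `max_replicas` default are illustrative assumptions, not measurements from the stress test, and the real controller also applies stabilization windows and pod-readiness handling that this sketch ignores.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Core HPA scaling rule; max_replicas=10 is an illustrative default."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count unchanged.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


# 2 replicas averaging 90% CPU against a 50% target
print(desired_replicas(2, 90, 50))  # → 4
```

Adding pods helps only if the bottleneck is in the scaled service itself; if a shared dependency such as Keycloak or the database saturates first, more replicas may not lower the failure rate.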
