Example output from a scale run:
...
Running command: datagov/bin/check-and-renew catalog-web scale
No job running for app catalog-web
Current total instances: 5
Average CPU is 331.72. Too High.
Scaling up to 7
Scaling catalog-web to 7
Scaling app catalog-web in org gsa-datagov / space prod as ***...
...
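The acceptance criteria ask for each run's log, like the output above, to be posted as a comment on a sticky issue in the catalog repo. The following is a minimal sketch, assuming GitHub's REST endpoint for creating an issue comment (`POST /repos/{owner}/{repo}/issues/{issue_number}/comments`) and a token allowed to comment on that repo. The function name `build_comment_request` and any owner, repo, or issue number passed to it are hypothetical placeholders, not values from this issue. It only builds the request; sending it would be `urllib.request.urlopen(req)`.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_comment_request(owner: str, repo: str, issue_number: int,
                          log_lines: list[str], token: str) -> urllib.request.Request:
    """Build (but do not send) a request that posts log lines as an issue comment.

    Hypothetical helper: owner, repo, and issue_number identify the sticky
    issue; token is assumed to have permission to comment on it.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/issues/{issue_number}/comments"
    # One log line per line of the comment body.
    payload = {"body": "\n".join(log_lines)}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Keeping request construction separate from sending makes the comment content easy to check without network access.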
User Story
In order to keep the catalog site performing well, the data.gov team wants to scale up the number of catalog-web instances during a CPU usage spike, and scale back to normal once the spike is over to save memory.
Acceptance Criteria
GIVEN catalog-web CPU usage is checked on a regular basis
WHEN the average CPU usage is above 320% for 2 consecutive checks
THEN catalog-web instances are scaled up by 2 (maximum 9 total)
AND the logs are written as a comment on a sticky issue in the catalog repo.
GIVEN catalog-web CPU usage is checked on a regular basis
WHEN the average CPU usage is below 250% for 2 consecutive checks
THEN catalog-web instances are scaled down by 2 (minimum 5, or as defined in the manifest)
AND the logs are written as a comment on a sticky issue in the catalog repo.
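A minimal sketch of the decision rule in the two scenarios above, in Python for illustration (the restart script's own language is not shown in this issue). It assumes the script keeps the average CPU from each check in a history list, oldest first; how that history is persisted between runs is not specified here. The function name `decide_target` and the constant names are hypothetical; the thresholds, step, and limits come from the acceptance criteria.

```python
from typing import Optional, Sequence

# Values from the acceptance criteria; the names are illustrative.
SCALE_UP_CPU = 320.0    # percent, average CPU
SCALE_DOWN_CPU = 250.0  # percent, average CPU
STEP = 2                # instances added or removed per scaling action
MAX_INSTANCES = 9
MIN_INSTANCES = 5       # or the instance count defined in the manifest
REQUIRED_CHECKS = 2     # consecutive checks that must agree


def decide_target(current: int, recent_avg_cpu: Sequence[float],
                  min_instances: int = MIN_INSTANCES) -> Optional[int]:
    """Return the new instance count, or None if no scaling is needed.

    Only the last REQUIRED_CHECKS readings are considered, so a single
    spike or dip does not trigger scaling.
    """
    if len(recent_avg_cpu) < REQUIRED_CHECKS:
        return None  # not enough history yet
    window = recent_avg_cpu[-REQUIRED_CHECKS:]
    if all(cpu > SCALE_UP_CPU for cpu in window):
        target = min(current + STEP, MAX_INSTANCES)
    elif all(cpu < SCALE_DOWN_CPU for cpu in window):
        target = max(current - STEP, min_instances)
    else:
        return None  # between thresholds, or readings disagree
    # Already at the limit: nothing to do.
    return target if target != current else None
```

For example, with 5 current instances and readings of 325.0 (a hypothetical previous check) followed by 331.72 (the value in the log), `decide_target(5, [325.0, 331.72])` returns 7, which matches "Scaling up to 7". The gap between 250% and 320% works as hysteresis, so the instance count does not flap when CPU hovers near a single threshold.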
Background
This is to address catalog performance issues under stress.
cloud.gov does not offer an autoscaling feature yet, so we have to implement our own.
Security Considerations (required)
None.
Sketch
This should be added into the restart script we have.
We use the same script for both actions, but run the CPU check (and scaling when needed) more frequently than the restart: for example, the CPU check every five minutes and the restart every 30 minutes. We do not want them to run in separate scripts, because the two actions could overlap, and an ongoing restart would confuse the CPU check.

Auto scaling runs every 5 minutes. If any other deployment is in progress, the task quits, waits another 5 minutes, and then tries the auto scale again.
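One way a single script could serve both cadences is sketched below, assuming it is triggered every 5 minutes (for example by cron) and that `minute` counts minutes from a fixed start. The function `plan_tick` and the action names are hypothetical. Running the CPU check before the restart when both fall on the same tick is an assumption; this issue only requires that the two never overlap.

```python
from typing import List

CHECK_INTERVAL_MIN = 5     # CPU check and possible scaling
RESTART_INTERVAL_MIN = 30  # periodic restart


def plan_tick(minute: int, deployment_running: bool) -> List[str]:
    """Return the actions for one scheduled run, in execution order.

    Everything runs sequentially in one process, so a restart never
    overlaps with a CPU check.
    """
    if minute % CHECK_INTERVAL_MIN != 0:
        return []
    if deployment_running:
        # Another deployment is in progress: quit now and let the
        # next 5-minute run try again.
        return []
    actions = ["check_cpu_and_scale"]
    if minute % RESTART_INTERVAL_MIN == 0:
        # Assumed ordering: restart after the check, so the restart
        # cannot skew the CPU reading taken in this run.
        actions.append("restart")
    return actions
```

Over one hour of 5-minute runs, `[m for m in range(0, 60, 5) if "restart" in plan_tick(m, False)]` gives `[0, 30]`, while every run performs the CPU check unless a deployment is already in progress.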