scale up/down catalog-web instances as needed #4549

FuhuXia · 2023-12-07T21:06:32Z

User Story

In order to keep the catalog site performing well, the data.gov team wants to scale up the catalog-web instance count during a CPU usage spike, and scale back to normal once the spike is over to save memory.

Acceptance Criteria


GIVEN catalog-web CPU usage is checked on a regular basis
WHEN average CPU usage is above 320% for 2 consecutive checks
THEN catalog-web instances are scaled up by 2 (maximum 9 total)
AND logs are written as a comment to a sticky issue on the catalog repo.
GIVEN catalog-web CPU usage is checked on a regular basis
WHEN average CPU usage is below 250% for 2 consecutive checks
THEN catalog-web instances are scaled down by 2 (minimum 5, or as defined in the manifest)
AND logs are written as a comment to a sticky issue on the catalog repo.
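
The scaling decision in these criteria can be sketched as a small pure function. This is only an illustration under assumptions: the function name, the constant names, and the idea of passing in a list of recent readings are hypothetical and not taken from the actual check-and-renew script. Only the thresholds, step size, and limits come from the criteria above.

```python
# Thresholds and limits from the acceptance criteria.
SCALE_UP_CPU = 320.0     # average CPU (%) above which we scale up
SCALE_DOWN_CPU = 250.0   # average CPU (%) below which we scale down
STEP = 2                 # instances added or removed per scaling action
MAX_INSTANCES = 9
MIN_INSTANCES = 5        # assumption: fixed here; could be read from the manifest
REQUIRED_CHECKS = 2      # consecutive checks needed before acting


def target_instances(current, recent_avg_cpu):
    """Return the instance count to scale to, or `current` if nothing should change.

    `recent_avg_cpu` is a list of average CPU readings, oldest first
    (a hypothetical input shape chosen for this sketch).
    """
    window = recent_avg_cpu[-REQUIRED_CHECKS:]
    if len(window) < REQUIRED_CHECKS:
        return current  # not enough history to decide yet
    if all(cpu > SCALE_UP_CPU for cpu in window) and current < MAX_INSTANCES:
        return min(current + STEP, MAX_INSTANCES)
    if all(cpu < SCALE_DOWN_CPU for cpu in window) and current > MIN_INSTANCES:
        return max(current - STEP, MIN_INSTANCES)
    return current
```

With 5 instances and two consecutive readings above 320%, the function returns 7. A single high reading, or readings between 250% and 320%, leave the count unchanged.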

Background

This is to address catalog performance issues under stress.
cloud.gov does not offer an auto-scaling feature yet, so we have to implement our own.

Security Considerations (required)

None.

Sketch

This should be added to the restart script we already have. We use the same script for both actions, but run the CPU check (and scaling when needed) more frequently than the restart. For example, we run the CPU check every five minutes and the restart every 30 minutes. We do not want them to run in different scripts, to avoid the two actions overlapping: an ongoing restart would confuse the CPU check.
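
One way to picture a single script serving both cadences is a dispatcher that picks at most one action per scheduled run. This is a rough sketch, not the actual script: the function name and the "restart" action label are hypothetical, and giving the restart precedence when both are due is an assumption made here so the two never run together.

```python
CPU_CHECK_INTERVAL_MIN = 5    # CPU check (and scaling when needed)
RESTART_INTERVAL_MIN = 30     # periodic restart


def action_for_tick(minutes_since_start):
    """Return the single action to run at this minute, or None.

    Assumption for this sketch: when both actions fall on the same
    minute, only the restart runs, so the CPU check never sees an
    app that is in the middle of restarting.
    """
    if minutes_since_start % RESTART_INTERVAL_MIN == 0:
        return "restart"  # hypothetical action name
    if minutes_since_start % CPU_CHECK_INTERVAL_MIN == 0:
        return "scale"
    return None
```

Because one function chooses the action, a restart and a CPU check cannot be started for the same run.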

We run auto scaling every 5 minutes. If any other deployment is ongoing, the task quits, waits another 5 minutes, and then tries the auto scale again.
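
The quit-and-retry behavior amounts to a guard at the start of each run. The sketch below is illustrative only: both callables are hypothetical placeholders for whatever the real script uses to detect an ongoing deployment and to perform the CPU check and scaling. The retry itself is left to the scheduler, which simply starts the task again 5 minutes later.

```python
def autoscale_once(deployment_in_progress, check_and_scale):
    """Run one auto-scale attempt.

    Returns True if the CPU check ran, False if it was skipped because
    another deployment was ongoing. Both arguments are zero-argument
    callables supplied by the caller (hypothetical for this sketch).
    """
    if deployment_in_progress():
        # Quit without touching the app; the next scheduled run retries.
        return False
    check_and_scale()
    return True
```

Passing the two steps in as callables keeps the guard easy to test without touching a real app.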


FuhuXia · 2024-01-17T16:48:38Z

catalog-web can struggle for hours before it is manually scaled up.

FuhuXia · 2024-02-07T15:40:46Z

It is deployed and auto scaling in prod.

...
Running command: datagov/bin/check-and-renew catalog-web scale
No job running for app catalog-web
Current total instances: 5
Average CPU is 331.72. Too High.
Scaling up to 7
Scaling catalog-web to 7
Scaling app catalog-web in org gsa-datagov / space prod as ***...
...

FuhuXia added the bug Software defect or bug label Dec 7, 2023

btylerburton added this to data.gov team board Dec 7, 2023

gujral-rei moved this to 📔 Product Backlog in data.gov team board Dec 7, 2023

jbrown-xentity added the O&M Operations and maintenance tasks for the Data.gov platform label Dec 7, 2023

FuhuXia mentioned this issue Dec 8, 2023

O+M 2023-12-8 #4546

Closed

10 tasks

gujral-rei moved this from 📔 Product Backlog to 📟 Sprint Backlog [7] in data.gov team board Jan 18, 2024

FuhuXia moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Jan 24, 2024

FuhuXia self-assigned this Jan 24, 2024

This was referenced Feb 5, 2024

add scale action for testing GSA/catalog.data.gov#1244

Merged

add scale-web-template #4609

Merged

auto scale catalog web GSA/catalog.data.gov#1247

Merged

FuhuXia moved this from 🏗 In Progress [8] to 👀 Needs Review [2] in data.gov team board Feb 7, 2024

FuhuXia closed this as completed Feb 7, 2024

github-project-automation bot moved this from 👀 Needs Review [2] to ✔ Done in data.gov team board Feb 7, 2024

btylerburton moved this from ✔ Done to 🗄 Closed in data.gov team board Feb 15, 2024
