O+M 2023-12-8 #4546

Closed
8 of 10 tasks
hkdctol opened this issue Dec 1, 2023 · 7 comments

Comments

@hkdctol
Contributor

hkdctol commented Dec 1, 2023

As part of the day-to-day operation of Data.gov, there are many Operation and Maintenance (O&M) responsibilities. Instead of having the entire team watch notifications and risk some slipping through the cracks, we have created an O&M Triage role. One person on the team is assigned the Triage role, which rotates each sprint. This is not meant to be a 24/7 responsibility; it covers East Coast business hours only. If you are unavailable, please note in Slack when you will be unavailable and ask for someone to take on the role for that time.

Check the O&M Rotation Schedule for future planning.

Acceptance criteria

You are responsible for all O&M responsibilities this week. We've highlighted a few so they're not forgotten. You can copy each checklist into your daily report.

Daily Checklist

Note: Catalog Auto Tasks
You will need to update the chart values manually. Click the Action link in each issue, grab the values from the monitor task output, and check the runtime.

Weekly Checklist

Monthly Checklist

Reference

@hkdctol hkdctol moved this to 📟 Sprint Backlog [7] in data.gov team board Dec 1, 2023
@FuhuXia FuhuXia moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Dec 4, 2023
@FuhuXia
Member

FuhuXia commented Dec 5, 2023

The "inventory.data.gov unavailable" alert is firing too frequently. We used to get it about once a month; now we are getting it once or twice a week.

@FuhuXia
Member

FuhuXia commented Dec 8, 2023

The out-of-memory error is affecting our GitHub Actions: a few runs (egress check and harvest job) have failed because of it. It is also affecting catalog performance, with catalog-web seeing an increased count of memory errors over the past 60 days.

I think the catalog-web performance issue can be alleviated by:

  1. increasing memory in our cf account by another 10GB,
  2. turning off idle apps (non-prod harvesters, dev airflows...),
  3. auto-scaling catalog-web up/down.

@FuhuXia
Member

FuhuXia commented Dec 8, 2023

After the #4535 fix, for the first time in the past few weeks or months, we see that the db and Solr are fully in sync.

[ckanext.geodatagov] total 404388 solr indexed_package
[ckanext.geodatagov] 0 packages need to be removed from Solr
[ckanext.geodatagov] 0 packages need to be updated/added to Solr
[ckanext.geodatagov] 0 packages without harvest_object need to be mannually
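A report like the one above could be checked automatically during triage. The following is a minimal sketch, assuming the report keeps the line shape shown here (`[ckanext.geodatagov] <count> packages <description>`); the function names `pending_counts` and `is_synced` are hypothetical and are not part of ckanext-geodatagov.

```python
import re

# Matches lines such as "[ckanext.geodatagov] 0 packages need to be removed from Solr".
# The "total ... solr indexed_package" line does not match, because the
# pattern requires the count to be followed directly by the word "packages".
PENDING_LINE = re.compile(r"^\[ckanext\.geodatagov\]\s+(\d+)\s+packages\s+(.+)$")


def pending_counts(report):
    """Map each 'packages ...' description in the report to its count."""
    counts = {}
    for line in report.splitlines():
        match = PENDING_LINE.match(line.strip())
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def is_synced(report):
    """True only if at least one pending line was found and every count is zero."""
    counts = pending_counts(report)
    return bool(counts) and all(value == 0 for value in counts.values())
```

Requiring at least one matched line means an empty or truncated report is treated as "not synced" rather than silently passing.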

@FuhuXia
Member

FuhuXia commented Dec 8, 2023

On catalog we are also seeing a spike in this error message:

INFO  [ckan.config.middleware.flask_app] 500 Internal Server Error:
The server encountered an internal error and was unable to complete
your request. Either the server is overloaded or there is an error in the application.

@FuhuXia
Member

FuhuXia commented Dec 11, 2023

Multiple NOAA sources are not reachable, for example:

[linux ~]# curl -I https://data.noaa.gov/waf/NOAA/coris/native/iso/xml/
HTTP/1.1 403 Forbidden
Date: Mon, 11 Dec 2023 18:13:05 GMT
Server: Apache
Strict-Transport-Security: max-age=31536000
Content-Type: text/html; charset=iso-8859-1
Connection: close

[linux ~]# curl -I https://data.noaa.gov/waf/NOAA/NESDIS/NGDC/MGG/NOS/H06001-H08000/iso/xml/
HTTP/1.1 403 Forbidden
Date: Mon, 11 Dec 2023 18:13:14 GMT
Server: Apache
Strict-Transport-Security: max-age=31536000
Content-Type: text/html; charset=iso-8859-1
Connection: close

Here is the full list of the NOAA harvest sources. We should set them to Manual and notify the NOAA point of contact. @hkdctol

harvest source: https://catalog.data.gov/harvest/nesdis-ngdc-mgg-nos-l00001-l02000
url: https://data.noaa.gov/waf/NOAA/NESDIS/NGDC/MGG/NOS/L00001-L02000/iso/xml/

harvest source: https://catalog.data.gov/harvest/nesdis-ngdc-mgg-nos-l02001-l04000
url: https://data.noaa.gov/waf/NOAA/NESDIS/NGDC/MGG/NOS/L02001-L04000/iso/xml/

harvest source: https://catalog.data.gov/harvest/nesdis-ngdc-mgg-nos-t00001-t02000
url: https://data.noaa.gov/waf/NOAA/NESDIS/NGDC/MGG/NOS/T00001-T02000/iso/xml/

harvest source: https://catalog.data.gov/harvest/ngdc-stp-dmsp
url: https://data.noaa.gov/waf/NOAA/NESDIS/NGDC/STP/DMSP/iso/xml/

harvest source: https://catalog.data.gov/harvest/nsidc
url: https://data.noaa.gov/waf/NOAA/NESDIS/ncei/nsidc/iso/xml/

harvest source: https://catalog.data.gov/harvest/coris-native-iso-metadata
url: https://data.noaa.gov/waf/NOAA/coris/native/iso/xml/

harvest source: https://catalog.data.gov/harvest/nesdis-ngdc-mgg-nos-h02001-h04000
url: https://data.noaa.gov/waf/NOAA/NESDIS/NGDC/MGG/NOS/H02001-H04000/iso/xml/

harvest source: https://catalog.data.gov/harvest/nesdis-ngdc-mgg-nos-h04001-h06000
url: https://data.noaa.gov/waf/NOAA/NESDIS/NGDC/MGG/NOS/H04001-H06000/iso/xml/

harvest source: https://catalog.data.gov/harvest/nesdis-ngdc-mgg-nos-h06001-h08000
url: https://data.noaa.gov/waf/NOAA/NESDIS/NGDC/MGG/NOS/H06001-H08000/iso/xml/

harvest source: https://catalog.data.gov/harvest/ngdc-stp-ionosphere
url: https://data.noaa.gov/waf/NOAA/NESDIS/NGDC/STP/Ionosphere/iso/xml/

harvest source: https://catalog.data.gov/harvest/nesdis-ngdc-mgg-nos-h08001-h10000
url: https://data.noaa.gov/waf/NOAA/NESDIS/NGDC/MGG/NOS/H08001-H10000/iso/xml/
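The `curl -I` checks above could be repeated for every source URL in the list with a short script. This is a minimal sketch using only the Python standard library; `head_status` and `unreachable_sources` are hypothetical helper names, and the 10-second timeout is an assumed value rather than anything the team has agreed on.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10.0):
    """Send a HEAD request (like `curl -I`) and return the HTTP status code.

    Returns None when no HTTP response was received at all
    (DNS failure, refused connection, timeout).
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses such as 403 Forbidden are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def unreachable_sources(urls, fetch=head_status):
    """Return (url, status) pairs for every URL that did not answer with 2xx.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    failures = []
    for url in urls:
        status = fetch(url)
        if status is None or not 200 <= status < 300:
            failures.append((url, status))
    return failures
```

Passing the `url:` values from the list above to `unreachable_sources` would yield the sources that still need to stay on Manual; an empty result would suggest they are reachable again.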

@FuhuXia FuhuXia closed this as completed Dec 12, 2023
@github-project-automation github-project-automation bot moved this from 🏗 In Progress [8] to ✔ Done in data.gov team board Dec 12, 2023
@hkdctol
Contributor Author

hkdctol commented Dec 13, 2023

@FuhuXia can you set the sources listed above to manual or have you already done that?

@FuhuXia
Member

FuhuXia commented Dec 13, 2023

@hkdctol Just did. All listed are changed to Manual.

@btylerburton btylerburton moved this from ✔ Done to 🗄 Closed in data.gov team board Dec 21, 2023