-
Notifications
You must be signed in to change notification settings - Fork 110
Log Review
In order to standardize our approach to reviewing logs, we need to note what errors/warnings are acceptable and what are unexpected and need to be investigated. The following is a living document of ongoing updates to this process.
Quick note on how to start: go to https://one.newrelic.com/logger (login if necessary).
If you want to search for a string (like error), just type in the filter "error". If you want to exclude something, use -
: `-"IgnoreError". For more details, see https://docs.newrelic.com/docs/logs/ui-data/query-syntax-logs/.
Below are the list of applications and the currently "acceptable" log messages that include error.
These are documented as-is; ie if the application currently works and the log message exists, then it's "acceptable"
.
Some of these "acceptable" errors may fall in the "should fix" category, just that they are currently acceptable in current situation.
If you are looking for a specific task log, for some reason the current parser doesn't find the app name (may be a future improvement?). For now, app tasks can be found by only searching source_type:"task-name"
, such as source_type:"APP/TASK/harvester-check-completed (CI)"
. You can always search all logs for 7 days, then type source_type:
, and you should get a drop down list of all the tasks that have been run and you can select (if you aren't sure of the language). This is true for all of the log metadata.
#Catalog
In order to investigate catalog web logs, the current query is saved as a view in New Relic called catalog-log-review
Current "OK" or "expected" error list (some of these may be triaged, fixed, and tickets created for them in the future):
- 2022-08-26 16:02:13,040 ERROR [ckan.views.group] Group search error: ('Invalid "sort" parameter',)
- /home/vcap/deps/1/python/lib/python3.7/site-packages/gunicorn/workers/ggevent.py:38: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched)
- AttributeError: module 'ckanext.spatial.util' has no attribute 'transform_to_html'
- 2022-08-26 15:55:09,602 ERROR [ckan.views.dataset] Dataset search error: ('SOLR returned an error running query: {'facet.field': ['groups', 'vocab_category_all', 'metadata_type', 'tags', 'res_format', 'organization_type', 'organization', 'publisher', 'bureauCode'], 'fq': ['organization_type:"Federal Government" 9YQ3=1ZJY([!+!]) 6Do="" -collection_package_id:["" TO ] -dataset_type:harvest', '+site_id:"datagov_catalog"', '+state:active', '+permission_labels:("public")'], 'q': ':*', 'rows': 21, 'start': 0, 'df': 'text', 'sort': 'views_recent desc', 'fl': 'id validated_data_dict', 'facet': 'true', 'facet.limit': '50', 'facet.mincount': 1, 'wt': 'json', 'q.op': 'AND'} Error: SolrError('Solr responded with an error (HTTP 400): [Reason: org.apache.solr.search.SyntaxError: Cannot parse \'organization_type:"Federal Government" 9YQ3=1ZJY([!+!]) 6Do="" -collection_package_id:["" TO *] -dataset_type:harvest\': Encountered " "]" "] "" at line 1, column 53.\nWas expecting:\n "TO" ...\n ]')',)
- ValueError: A string literal cannot contain NUL (0x00) characters.
- 2022-08-26 15:32:42,146 ERROR [ckan.config.middleware.flask_app] User not authorized to read package ed026d91-dace-4b18-bd78-41b490c4c8ac
- 2022-08-26 15:25:32,657 ERROR [ckan.views.dataset] Dataset search error: ('Wrong bounding box provided',)
- KeyError: ('xml"952712@', <class 'rdflib.serializer.Serializer'>)
- 2022-08-26 14:31:53,988 ERROR [ckan.config.middleware.flask_app] No plugin registered for (xml;901727@, <class 'rdflib.serializer.Serializer'>)
- 2022-08-26 14:17:21,596 ERROR [ckan.config.middleware.flask_app] No plugin registered for (json-ld, <class 'rdflib.serializer.Serializer'>)
- 2022-08-26 18:56:09,108 ERROR [ckan.config.middleware.flask_app]
- 2022-08-23 17:42:26,380 ERROR [ckan.lib.api_token] Cannot decode JWT token: Not enough segments
- <11>1 2022-08-25T15:55:25.935781+00:00 gsa-datagov.prod.catalog-web cf731d26-fa01-4e9d-827f-591218847b80 [STG/0] - [tags@47450 deployment="cf-production" index="e49592cc-346d-4fd2-b553-bd659bffafef" instance_id="0" ip="10.10.2.6" job="diego-cell" origin="rep" source_id="cf731d26-fa01-4e9d-827f-591218847b80" source_type="STG"] error: subprocess-exited-with-error
- <11>1 2022-08-22T22:34:01.786976+00:00 gsa-datagov.staging.catalog-web c7bcca7a-fb54-40de-904b-293c57155e10 [STG/0] - [tags@47450 deployment="cf-production" index="e49592cc-346d-4fd2-b553-bd659bffafef" instance_id="0" ip="10.10.2.6" job="diego-cell" origin="rep" source_id="c7bcca7a-fb54-40de-904b-293c57155e10" source_type="STG"] error: error in /home/vcap/.pydistutils.cfg: command 'develop' has no such option 'allows_hosts'
- <14>1 2022-08-24T05:32:05.442559+00:00 gsa-datagov.prod.catalog-web cf731d26-fa01-4e9d-827f-591218847b80 [APP/PROC/WEB/3] - [tags@47450 app_id="cf731d26-fa01-4e9d-827f-591218847b80" app_name="catalog-web" deployment="cf-production" index="fca078c1-c655-4d4e-8dde-001c634c490b" instance_id="3" ip="10.10.1.27" job="diego-cell" organization_id="90047c5d-337f-4802-bd48-2149a4265040" organization_name="gsa-datagov" origin="rep" process_id="5da84c00-36b6-4993-83fe-d32cacb3a764" process_instance_id="c88e7438-3156-4526-7b1e-3d64" process_type="web" source_id="cf731d26-fa01-4e9d-827f-591218847b80" source_type="APP/PROC/WEB" space_id="acbabadb-2604-4736-81f7-01288b0371c6" space_name="prod"] ValueError: invalid literal for int() with base 10: '0ramsexywow'
In order to investigate catalog-admin logs, the current query is saved as a view in New Relic called catalog-admin-review
.
Almost all of the above are errors that are covered in other sections, but a few unique ones:
- source_type="APP/TASK/db-solr-sync scheduled (CI)"] 2022-10-14 04:39:51,726 ERROR [ckanext.spatial.plugin] Solr backend only supports bboxes (Polygons with 5 points), ignoring geometry {"type": "Point", "coordinates": [-89.0, 30.0]}
- source_type="APP/TASK/harvester-check-completed (CI)"] 2022-10-14 07:41:07,943 INFO [ckanext.harvest.logic.action.update] Marking job as finished due to error: https://www.nist.gov/sites/default/files/data.json 9856acc9-c762-48db-a720-7f3c3bf8d1a7
- /home/vcap/deps/1/python/lib/python3.7/site-packages/beaker/session.py:810: BeakerWarning: Invalidating corrupt session a4710c1a02b146448e1caa50f3c4424a; error was: '_accessed_time'. Set invalidate_corrupt=False to propagate this exception.
In order to investigate catalog-gather logs, the current query is saved as a view in New Relic called catalog-gather-review
.
Current "OK" or "expected" error list (some of these may be triaged, fixed, and tickets created for them in the future) is below.
Please note you can add to or reverse-engineer the examples below with the query above.
- TypeError: loads() takes 1 positional argument but 2 were given
- UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 5273: invalid start byte There are 3 other cloud.gov staging errors, see here
In order to investigate catalog-gather logs, the current query is saved as a view in New Relic called catalog-fetch-review
.
Current "OK" or "expected" error list (some of these may be triaged, fixed, and tickets created for them in the future) is below.
Please note you can add to or reverse-engineer the examples below with the query above.
- ERROR [ckanext.harvest.queue] Too many consecutive retries for object 113ae595-4884-436f-aa04-0095dcd225d4
- pysolr.SolrError: Solr responded with an error (HTTP 400): [Reason: org.apache.solr.search.SyntaxError: Cannot parse '+capacity:public extras_identifier:https://www.cdc.gov/nchs/data-linkage/index.htm AND extras_collection_metadata:true -collection_package_id:["" TO *] -dataset_type:harvest': Encountered " ":" ": "" at line 1, column 40.
- ckanext.datajson.exceptions.ParentNotHarvestedException: Unable to find parent dataset. Raising error to allow re-run later
- ERROR [ckanext.datajson.datajson] Parent identifier not found: "NCHS - Percent Distribution of Births to Unmarried Women by Age Group: United States"
- ERROR [ckanext.harvest.queue] Job 2f48d67f-53d6-40c4-989d-3da7d0ad1be6 was aborted or timed out, object 5ace7fe2-641c-466f-bee6-83f1d18fc627 set to error
- ERROR [ckanext.geodatagov.harvesters.base] ISO Transform Failure: b'Stylesheet file /home/vcap/deps/1/python/lib/python3.7/site-packages/ckanext/geodatagov/harvesters/fgdcrse2iso19115-2.xslt does not exist\n'
- ERROR [ckanext.spatial.harvesters.base] Validation errors found using profile fgdc_minimal for object with GUID 5e918903fb816724a8c4988071b9eee24fa29630
- ERROR [ckanext.geodatagov.harvesters.base] Invalid document: [('Element not found: Identification Citation Originator', None), ('Element not found: Metadata Reference Information Contact Address Type', None), ('Element not found: Metadata Reference Information Contact Address State', None)] with profile fgdc_minimal
- ERROR [ckanext.spatial.plugin] Trying to update package schema {'name': [<function not_empty at 0x7f289ee92a70>, <class 'str'>], 'vocabulary_id': [<function ignore_missing at 0x7f289ee92e60>, <function unicode_safe at 0x7f289ee940e0>, <function vocabulary_id_exists at 0x7f28a589b4d0>], 'revision_timestamp': [<function ignore at 0x7f289ee92cb0>], 'state': [<function ignore at 0x7f289ee92cb0>], 'display_name': [<function ignore at 0x7f289ee92cb0>]}
- ERROR [ckanext.spatial.plugin] Geometry not valid GeoJSON, not indexing There are 3 other cloud.gov staging errors, see here
In order to investigate catalog-proxy logs, the current query is saved as a view in New Relic called catalog-proxy-review
.
Current "OK" or "expected" error list (some of these may be triaged, fixed, and tickets created for them in the future) is below.
Please note you can add to or reverse-engineer the examples below with the query above. Please note we exclude 404
and x_cf_routererror
by default, as 404's are expected on a public site and x_cf_routererror
is a header included with many logs.
- 2022/10/14 15:54:11 [error] 105#0: *10555 peer closed connection in SSL handshake while SSL handshaking to upstream, client: 10.255.35.53, server: catalog-prod-datagov.app.cloud.gov, request: "GET /dataset/galileo-nims-spectral-image-tubes-jupiter-operations HTTP/1.1", upstream: "https://10.255.151.239:61443/dataset/galileo-nims-spectral-image-tubes-jupiter-operations", host: "catalog-prod-datagov.app.cloud.gov"
- 2022/10/14 09:27:12 [error] 118#0: *767822 connect() failed (111: Connection refused) while connecting to upstream, client: 10.255.100.30, server: catalog-prod-datagov.app.cloud.gov, request: "POST /_tracking HTTP/1.1", upstream: "https://10.255.35.131:61443/_tracking", host: "catalog-prod-datagov.app.cloud.gov"
- Ignore URL's that have
error
in the name:NginxLog "GET /dataset/removing-mid-spatial-frequency-errors-with-vibe-phase-i HTTP/1.1" 200 55736
In order to investigate inventory-proxy logs, the current query is saved as a view in New Relic called inventory-proxy-review
.
See catalog-proxy for details on what this query is excluding.
In order to investigate inventory logs, the current query is saved as a view in New Relic called inventory-review
.
Current "OK" or "expected" error list (some of these may be triaged, fixed, and tickets created for them in the future) is below.
Note that the above still has 42 logs found, but are actual stack traces which are hard to escape without catching actual concerning errors. A deeper dive would be needed to find the stack trace for these items.
- 2022-10-07 21:33:12,179 ERROR [ckan.lib.webassets_tools] Trying to include unknown asset: <base/vendor/jquery.js>
- 2022-10-07 20:20:57,802 ERROR [ckan.logic] Could not find resource f51ae92f-7dfb-4c2c-a484-dea3e7aedabb after all
- 2022-10-07 19:59:51,069 ERROR [ckanext.datajson.blueprint] NotAuthorized to generate JSON for unredacted to mcc-gov (marschnercm_mcc_gov)
- 2022-10-07 19:14:41,088 ERROR [ckan.config.middleware.flask_app] ajax_snippets/onlinePreview (or
ajax_snippets/setup.php
)