Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory access violation (3221225477) crashes the Node while calculating PoW #172

Open
brusherru opened this issue Dec 28, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@brusherru
Copy link
Member

brusherru commented Dec 28, 2023

We have a report in Discord that go-spacemesh unexpectedly quits.

Go-spacemesh logs look fine and do not contain any errors.
However, app-logs indicates that the node process exits with the code 3221225477.

Here is a part of log files:

Go-spacemesh:

...
2023-12-27T22:28:26.699Z	INFO	800e3.post	initialization: file already initialized	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "post", "fileIndex": 671, "currentNumLabels": 134217728, "targetNumLabels": 134217728, "startPosition": 90060095488}
2023-12-27T22:28:26.699Z	INFO	800e3.post	initialization: completed, found nonce	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "post", "nonce": 81997299014}
2023-12-27T22:28:26.699Z	INFO	800e3.post	post setup completed	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "post", "node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "commitment_atx": "9399f34ca0", "data_dir": "E:\\HDDNODE4", "num_units": "21", "labels_per_unit": "4294967296", "provider": "{824636000664}", "name": "post"}
2023-12-27T22:28:26.705Z	INFO	800e3.atxBuilder	loaded the initial post from disk	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "atxBuilder"}
2023-12-27T22:28:26.705Z	INFO	800e3.atxBuilder	verifying the initial post	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "atxBuilder", "post": {"nonce": 179, "indices": "9f32e0326039241d242889fa5e051c0a1fd2601fa8321fdea40f70842a139ca6003bcf8617af6ef504e3df3c9863586785070f2ed956f7a12531444074d6d6d888c894391bd159cceb2308b58ace844bff34a1c416cb849666a915dd62e301a371f021dd020f3a8efceae9dc329e3f4898e666e82609e110f5c5c2dca24eac98c234d313c6a698fa23a095dab51623fb3c247a6813d340abed5e9761bfbd88b54338f695cd2ee80ae9681701"}, "metadata": {"Challenge": "0000000000000000000000000000000000000000000000000000000000000000", "LabelsPerUnit": 4294967296}, "name": "atxBuilder"}
2023-12-27T22:28:26.815Z	INFO	800e3.atxBuilder	atx challenge is ready	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "atxBuilder", "sessionId": "0d2a3100-fd10-4ad5-a1f4-ca91d8129d18", "current_epoch": "11", "publish_epoch": "11", "target_epoch": "12", "name": "atxBuilder"}
2023-12-27T22:28:26.815Z	INFO	800e3.nipostBuilder	building nipost	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostBuilder", "sessionId": "0d2a3100-fd10-4ad5-a1f4-ca91d8129d18", "poet round start": "2023-12-11T08:00:00.000Z", "poet round end": "2023-12-24T20:00:00.000Z", "publish epoch": "11", "publish epoch end": "2023-12-29T08:00:00.000Z", "target epoch": "12", "name": "nipostBuilder"}
2023-12-27T22:28:26.952Z	INFO	800e3.nipostBuilder	starting post execution	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostBuilder", "challenge": "VgDHwLYegT5coz4ptGFD9ORuJeEVBEoVIwiC0CsXcjo=", "name": "nipostBuilder"}
2023-12-27T22:28:26.952Z	INFO	800e3.nipostValidator	scaling post verifier	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostValidator", "current": 9, "new": 1}
2023-12-27T22:28:26.952Z	INFO	800e3.nipostValidator.worker-8	stopped	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostValidator"}
2023-12-27T22:28:26.952Z	INFO	800e3.nipostValidator.worker-1	stopped	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostValidator"}
2023-12-27T22:28:26.952Z	INFO	800e3.nipostValidator.worker-5	stopped	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostValidator"}
2023-12-27T22:28:26.952Z	INFO	800e3.nipostValidator.worker-7	stopped	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostValidator"}
2023-12-27T22:28:26.952Z	INFO	800e3.nipostValidator.worker-4	stopped	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostValidator"}
2023-12-27T22:28:26.952Z	INFO	800e3.nipostValidator.worker-2	stopped	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostValidator"}
2023-12-27T22:28:26.952Z	INFO	800e3.nipostValidator.worker-3	stopped	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostValidator"}
2023-12-27T22:28:26.952Z	INFO	800e3.nipostValidator.worker-6	stopped	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostValidator"}
2023-12-27T22:28:26.960Z	INFO	800e3.nipostValidator	generating proof with PoW flags: RandomXFlag(FLAG_HARD_AES | FLAG_FULL_MEM | FLAG_JIT | FLAG_ARGON2_SSSE3 | FLAG_ARGON2_AVX2) and params: ProvingParams { difficulty: 5317578556, pow_difficulty: [0, 0, 170, 111, 105, 238, 214, 162, 12, 48, 195, 12, 48, 195, 12, 48, 195, 12, 48, 195, 12, 48, 195, 12, 48, 195, 12, 48, 195, 12, 48, 195] }	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostValidator", "module": "post::prove", "file": "src\\prove.rs", "line": 276}
2023-12-27T22:28:27.331Z	INFO	grpc	finished streaming call with code OK	{"module": "grpc", "grpc.start_time": "2023-12-27T22:19:58Z", "system": "grpc", "span.kind": "server", "grpc.service": "spacemesh.v1.SmesherService", "grpc.method": "PostSetupStatusStream", "peer.address": "127.0.0.1:49946", "grpc.code": "OK", "grpc.time_ms": 509008.72}
2023-12-27T22:28:27.642Z	INFO	grpc	finished streaming call with code OK	{"module": "grpc", "grpc.start_time": "2023-12-27T22:19:57Z", "system": "grpc", "span.kind": "server", "grpc.service": "spacemesh.v1.SmesherService", "grpc.method": "PostSetupStatusStream", "peer.address": "127.0.0.1:49946", "grpc.code": "OK", "grpc.time_ms": 510008.3}
2023-12-27T22:28:40.483Z	INFO	800e3.blockHandler	new block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "blockHandler", "block_id": "05686864f8", "layer_id": 47462, "name": "blockHandler"}
2023-12-27T22:28:40.514Z	INFO	800e3.executor	executed block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "executor", "sessionId": "155456e0-8e97-4ecb-b0de-9799d75db3fc", "lid": 47461, "block": "0b73fbbbac", "state_hash": "0xb9e5cb60045e4c592357b51cc8611805fcea1658ad18b770fba1102619eb8833", "duration": "59.4948ms", "count": 3, "rewards": 42, "name": "executor"}
2023-12-27T22:28:40.633Z	INFO	800e3.blockHandler	new block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "blockHandler", "block_id": "86b9eb89c9", "layer_id": 47464, "name": "blockHandler"}
2023-12-27T22:28:40.887Z	INFO	800e3.executor	executed block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "executor", "sessionId": "155456e0-8e97-4ecb-b0de-9799d75db3fc", "lid": 47462, "block": "05686864f8", "state_hash": "0xaf193fd9dba8fba8d5e14848f6e8348b2bdc0d27d4cba5ebb4ab36e2e68e9d6c", "duration": "43.1586ms", "count": 2, "rewards": 47, "name": "executor"}
2023-12-27T22:28:40.935Z	INFO	800e3.executor	executed block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "executor", "sessionId": "155456e0-8e97-4ecb-b0de-9799d75db3fc", "lid": 47463, "block": "afcde47b00", "state_hash": "0xd2496d0bc4f538c98ff6d9c83de2ff7fde108e940524c8d716880ae719e44fca", "duration": "41.1809ms", "count": 1, "rewards": 48, "name": "executor"}
2023-12-27T22:28:40.975Z	INFO	800e3.executor	executed block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "executor", "sessionId": "155456e0-8e97-4ecb-b0de-9799d75db3fc", "lid": 47464, "block": "86b9eb89c9", "state_hash": "0x697b54dda4cf332e696e8b00511172628514427c9053252f569467d301e847e4", "duration": "39.1292ms", "count": 0, "rewards": 57, "name": "executor"}
2023-12-27T22:28:44.688Z	INFO	grpc	started streaming call	{"module": "grpc", "grpc.start_time": "2023-12-27T22:28:44Z", "system": "grpc", "span.kind": "server", "grpc.service": "spacemesh.v1.AdminService", "grpc.method": "EventsStream", "peer.address": "127.0.0.1:49946"}
2023-12-27T22:28:45.856Z	INFO	800e3.blockHandler	new block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "blockHandler", "block_id": "5d7e197d88", "layer_id": 47465, "name": "blockHandler"}
2023-12-27T22:28:46.318Z	INFO	800e3.blockHandler	new block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "blockHandler", "block_id": "218cdf8128", "layer_id": 47466, "name": "blockHandler"}
2023-12-27T22:28:46.509Z	INFO	800e3.blockHandler	new block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "blockHandler", "block_id": "0ca98cc9c3", "layer_id": 47467, "name": "blockHandler"}
2023-12-27T22:28:46.520Z	INFO	800e3.executor	executed block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "executor", "sessionId": "155456e0-8e97-4ecb-b0de-9799d75db3fc", "lid": 47465, "block": "5d7e197d88", "state_hash": "0xaddf8c52ca05242676aaa426f342139b00d6d16a8c5fe83bce63eaee28a774dd", "duration": "51.3029ms", "count": 1, "rewards": 42, "name": "executor"}
2023-12-27T22:28:47.238Z	INFO	800e3.executor	executed block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "executor", "sessionId": "155456e0-8e97-4ecb-b0de-9799d75db3fc", "lid": 47466, "block": "218cdf8128", "state_hash": "0x1ebe77e764256def431b07f15e5f072f7788b51218653c32e2ee6fb8b7975490", "duration": "45.5812ms", "count": 3, "rewards": 43, "name": "executor"}
2023-12-27T22:28:47.285Z	INFO	800e3.executor	executed block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "executor", "sessionId": "155456e0-8e97-4ecb-b0de-9799d75db3fc", "lid": 47467, "block": "0ca98cc9c3", "state_hash": "0xae8d7935e393aafbce6596cfc5ac88f0eddf54c2174591ed1eb5028b16ddfcab", "duration": "47.7282ms", "count": 1, "rewards": 48, "name": "executor"}
2023-12-27T22:28:47.514Z	INFO	800e3.blockHandler	new block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "blockHandler", "block_id": "bd553bf3d4", "layer_id": 47468, "name": "blockHandler"}
2023-12-27T22:28:47.927Z	WARN	800e3.sync	mesh failed to process layer from sync	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "sync", "sessionId": "155456e0-8e97-4ecb-b0de-9799d75db3fc", "layer_id": 47470, "errmsg": "get block: get block bd553bf3d4: database: not found", "name": "sync"}
2023-12-27T22:28:47.986Z	INFO	800e3.blockHandler	new block	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "blockHandler", "block_id": "36ab2e082e", "layer_id": 47469, "name": "blockHandler"}
2023-12-27T22:29:02.364Z	INFO	800e3.nipostValidator	calculating proof of work for nonces 0..288	{"node_id": "800e38f530ceb835110a88f477a2804b5008a22f02b935fdd8a03bb8240c0b4d", "module": "nipostValidator", "module": "post::prove", "file": "src\\prove.rs", "line": 131}

App-logs:

...
[2023-12-27 22:28:27.634] [info]  SmesherService, in grpc PostDataCreationProgressStream, output: {"state":5,"numLabelsWritten":90194313216} 
[2023-12-27 22:28:27.636] [info]  SmesherService, in PostSetupStatusStream, output: {"postSetupState":5,"numLabelsWritten":90194313216,"opts":{"dataDir":"E:\\HDDNODE4","numUnits":21,"maxFileSize":2147483648,"providerId":4294967295}} 
[2023-12-27 22:28:27.636] [info]  SmesherService, in PostSetupStatusStream, output: "Status complete -> closing stream" 
[2023-12-27 22:28:33.775] [info]  NodeManager, in sendNodeStatus, output: {"connectedPeers":25,"isSynced":false,"syncedLayer":47465,"topLayer":47981,"verifiedLayer":47460} 
[2023-12-27 22:28:38.098] [info]  NodeManager, in sendNodeStatus, output: {"connectedPeers":26,"isSynced":false,"syncedLayer":47465,"topLayer":47981,"verifiedLayer":47460} 
[2023-12-27 22:28:45.280] [info]  NodeManager, in sendNodeStatus, output: {"connectedPeers":26,"isSynced":false,"syncedLayer":47465,"topLayer":47981,"verifiedLayer":47464} 
[2023-12-27 22:28:49.045] [info]  NodeManager, in sendNodeStatus, output: {"connectedPeers":26,"isSynced":false,"syncedLayer":47469,"topLayer":47981,"verifiedLayer":47467} 
[2023-12-27 22:29:02.323] [info]  NodeManager, in sendNodeStatus, output: {"connectedPeers":25,"isSynced":false,"syncedLayer":47469,"topLayer":47981,"verifiedLayer":47467} 
[2023-12-27 22:29:13.199] [info]  NodeManager, in sendNodeStatus, output: {"connectedPeers":24,"isSynced":false,"syncedLayer":47469,"topLayer":47981,"verifiedLayer":47467} 
[2023-12-27 22:29:14.619] [info]  NodeManager, in sendNodeStatus, output: {"connectedPeers":23,"isSynced":false,"syncedLayer":47469,"topLayer":47981,"verifiedLayer":47467} 
[2023-12-27 22:29:34.373] [error] syncSmesherInfo, in subscribeNodeEvents, error: "Error: 14 UNAVAILABLE: read ECONNRESET" 
[2023-12-27 22:29:34.821] [error] NodeManager, in Node Process close, error: 3221225477 
...

I guess that prove.rs tries to read wrong address somehow and that crashes an app.

You can download full logs from the report in Discord or ask User to help with the investigation (e.g. running Smapp with --pprof-server flag to pass it to go-spacemesh, and then fetch required profiles).

@brusherru brusherru added the bug Something isn't working label Dec 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant