Segmentation fault #216

NorthFaceGoose · 2024-12-26T13:41:43Z

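Please help me solve this problem:

When I run ./main_qwen_npu, a Segmentation fault occurs and the decoded output consists of < unk > tokens.
The problem occurs with both -s 512 -c 1 and -s 256 -c 1.
I built everything strictly according to your documentation. QNN: Linux V2.20, Hexagon SDK: Linux 5.5.0.1.
The device is a OnePlus12 (8gen3 + 24 GB RAM).

This problem has troubled me for a long time. @liang1232018 @lx200916 please help me solve it. Many thanks!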

[Q] <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Give me a short introduction to large language model.<|im_end|>
<|im_start|>assistant

[A] TIME of CPU Graph 0: 37.758ms, End at 37.758
TIME of QNN Graph 1: 10.555ms, End at 48.462
TIME of CPU Graph 2: 126.913ms, End at 175.467
TIME of QNN Graph 3: 17.417ms, End at 192.925
TIME of CPU Graph 4: 47.361ms, End at 240.33
TIME of QNN Graph 5: 16.956ms, End at 257.343
TIME of CPU Graph 6: 67.726ms, End at 325.1
TIME of QNN Graph 7: 3.644ms, End at 328.768
TIME of CPU Graph 8: 97.125ms, End at 425.903
TIME of QNN Graph 9: 17.182ms, End at 443.11
TIME of CPU Graph 10: 71.429ms, End at 514.619
TIME of QNN Graph 11: 7.845ms, End at 522.521
TIME of CPU Graph 12: 127.636ms, End at 650.211
TIME of QNN Graph 13: 17.206ms, End at 667.449
TIME of CPU Graph 14: 151.102ms, End at 818.613
TIME of QNN Graph 15: 17.096ms, End at 835.771
TIME of CPU Graph 16: 84.004ms, End at 919.828
TIME of QNN Graph 17: 17.602ms, End at 937.467
TIME of CPU Graph 18: 201.004ms, End at 1138.54
TIME of QNN Graph 19: 28.027ms, End at 1166.58
TIME of CPU Graph 20: 114.533ms, End at 1281.14
TIME of QNN Graph 21: 9.795ms, End at 1290.96
TIME of CPU Graph 22: 65.113ms, End at 1356.11
TIME of QNN Graph 23: 17.4ms, End at 1373.58
TIME of CPU Graph 24: 97.078ms, End at 1471.02
TIME of QNN Graph 25: 35.592ms, End at 1506.67
TIME of CPU Graph 26: 138.259ms, End at 1644.97
TIME of QNN Graph 27: 18.67ms, End at 1663.69
TIME of CPU Graph 28: 103.715ms, End at 1767.46
TIME of QNN Graph 29: 23.822ms, End at 1791.4
TIME of CPU Graph 30: 39.663ms, End at 1831.1
TIME of QNN Graph 31: 19.88ms, End at 1851.04
TIME of CPU Graph 32: 95.742ms, End at 1946.82
TIME of QNN Graph 33: 30.683ms, End at 1977.54
TIME of CPU Graph 34: 109.895ms, End at 2087.47
TIME of QNN Graph 35: 23.832ms, End at 2113.01
TIME of CPU Graph 36: 39.957ms, End at 2153.01
TIME of QNN Graph 37: 21.951ms, End at 2175
TIME of CPU Graph 38: 34.331ms, End at 2210.93
TIME of QNN Graph 39: 30.333ms, End at 2241.3
TIME of CPU Graph 40: 183.765ms, End at 2425.96
TIME of QNN Graph 41: 40.018ms, End at 2467.16
TIME of CPU Graph 42: 253.994ms, End at 2723.34
TIME of QNN Graph 43: 36.832ms, End at 2761.75
TIME of CPU Graph 44: 80.222ms, End at 2842.82
TIME of QNN Graph 45: 42.102ms, End at 2887.49
TIME of CPU Graph 46: 67.164ms, End at 2954.74
TIME of QNN Graph 47: 27.288ms, End at 2982.89
TIME of CPU Graph 48: 442.141ms, End at 3425.88
TIME of QNN Graph 49: 71.884ms, End at 3498.73
TIME of CPU Graph 50: 416.232ms, End at 3919.07
TIME of QNN Graph 51: 30.775ms, End at 3952.18
TIME of CPU Graph 52: 705.464ms, End at 4658.46
TIME of QNN Graph 53: 32.073ms, End at 4692.66
TIME of CPU Graph 54: 180.01ms, End at 4873.38
TIME of QNN Graph 55: 22.625ms, End at 4896.09
prefill time: 4896.12ms

Below are the decoded characters (shown as a screenshot, because < unk > cannot be displayed as text):

====================
load time: 1747.03 ms
token time: nan ms
inference speed: nan tokens/s
load time: 3882.66 ms
token time: 420.943 ms
inference speed: 2.37562 tokens/s
0.0ms [WARNING] sg_stubPtr is not null, skip loadRemoteSymbols

Segmentation fault

shinel013 · 2024-12-27T09:40:18Z

I have the same problem. It is likely a KVCache issue. I tried modifying main_qwen_npu.cpp as follows, so that isDecoding is enabled in CPUKVCacheNPU::setUp during the decode phase:

prefill_cpu_backend->setSequenceLength(real_seq_length);
+prefill_cpu_backend->setExecutionType(AUTOREGRESSIVE);
prefill_cpu_backend->toggleSwitching();
inter_cpu_backend->setSequenceLength(real_seq_length);
+inter_cpu_backend->setExecutionType(AUTOREGRESSIVE);
inter_cpu_backend->toggleSwitching();
decode_cpu_backend->setSequenceLength(real_seq_length);
+decode_cpu_backend->setExecutionType(AUTOREGRESSIVE);
decode_cpu_backend->toggleSwitching();
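
For readers unfamiliar with why the execution type matters to a KV cache, here is a minimal sketch of the general idea. It is not mllm's CPUKVCacheNPU: `ToyKVCache`, `ExecType`, and `write` are hypothetical names invented for illustration, and the real op may behave differently. The sketch only shows that a cache left in prompt mode overwrites the cached context when a single decode token arrives, while a cache in autoregressive mode appends to it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a backend's execution mode; not mllm's real enum.
enum class ExecType { Prompt, Autoregressive };

// Toy KV cache for a single head: stores one float per token position.
class ToyKVCache {
public:
    explicit ToyKVCache(std::size_t capacity) : data_(capacity, 0.0f) {}

    void setExecutionType(ExecType t) { type_ = t; }

    // Prompt mode writes a chunk starting at position 0 and records its length.
    // Autoregressive mode appends exactly one token after the cached tokens.
    // Returns false instead of writing out of bounds.
    bool write(const std::vector<float>& chunk) {
        const bool decoding = (type_ == ExecType::Autoregressive);
        if (decoding && chunk.size() != 1) return false;
        const std::size_t start = decoding ? length_ : 0;
        if (start + chunk.size() > data_.size()) return false;
        for (std::size_t i = 0; i < chunk.size(); ++i) data_[start + i] = chunk[i];
        length_ = start + chunk.size();
        return true;
    }

    std::size_t length() const { return length_; }
    float at(std::size_t i) const { return data_.at(i); }

private:
    std::vector<float> data_;
    std::size_t length_ = 0;
    ExecType type_ = ExecType::Prompt;
};
```

In this toy model, writing three prompt tokens and then one more token without calling `setExecutionType(ExecType::Autoregressive)` leaves a cache of length 1, so the prompt context is lost. Whether that is what happens inside mllm is an assumption, not something this thread confirms.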

NorthFaceGoose · 2024-12-28T12:04:31Z

@shinel013 Thank you for the suggestion, but it did not solve my problem. The Segmentation fault still occurs.

[Q] <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Give me a short introduction to large language model.<|im_end|>
<|im_start|>assistant

[A]  TIME of CPU Graph 0: 1.805ms, End at 1.806
 TIME of QNN Graph 1: 4.795ms, End at 6.635
 TIME of CPU Graph 2: 24.957ms, End at 31.622
 TIME of QNN Graph 3: 7.825ms, End at 39.48
 TIME of CPU Graph 4: 5.267ms, End at 44.776
 TIME of QNN Graph 5: 7.189ms, End at 51.996
 TIME of CPU Graph 6: 6.939ms, End at 58.963
 TIME of QNN Graph 7: 2.437ms, End at 61.438
 TIME of CPU Graph 8: 6.164ms, End at 67.63
 TIME of QNN Graph 9: 6.965ms, End at 74.627
 TIME of CPU Graph 10: 9.968ms, End at 84.626
 TIME of QNN Graph 11: 3.16ms, End at 87.825
 TIME of CPU Graph 12: 10.777ms, End at 98.644
 TIME of QNN Graph 13: 7.805ms, End at 106.504
 TIME of CPU Graph 14: 9.973ms, End at 116.525
 TIME of QNN Graph 15: 7.813ms, End at 124.394
 TIME of CPU Graph 16: 26.499ms, End at 150.947
 TIME of QNN Graph 17: 8.027ms, End at 159.029
 TIME of CPU Graph 18: 18.208ms, End at 177.392
 TIME of QNN Graph 19: 7.398ms, End at 184.903
 TIME of CPU Graph 20: 33.073ms, End at 218.026
 TIME of QNN Graph 21: 2.392ms, End at 220.449
 TIME of CPU Graph 22: 14.04ms, End at 234.505
 TIME of QNN Graph 23: 8.494ms, End at 243.039
 TIME of CPU Graph 24: 25.458ms, End at 268.535
 TIME of QNN Graph 25: 8.607ms, End at 277.177
 TIME of CPU Graph 26: 14.913ms, End at 292.126
 TIME of QNN Graph 27: 7.479ms, End at 299.644
 TIME of CPU Graph 28: 21.472ms, End at 321.153
 TIME of QNN Graph 29: 9.728ms, End at 330.905
 TIME of CPU Graph 30: 27.361ms, End at 358.286
 TIME of QNN Graph 31: 9.586ms, End at 367.908
 TIME of CPU Graph 32: 22.736ms, End at 390.668
 TIME of QNN Graph 33: 7.863ms, End at 398.568
 TIME of CPU Graph 34: 11.835ms, End at 410.451
 TIME of QNN Graph 35: 8.653ms, End at 419.684
 TIME of CPU Graph 36: 26.248ms, End at 445.963
 TIME of QNN Graph 37: 7.655ms, End at 453.656
 TIME of CPU Graph 38: 37.365ms, End at 491.06
 TIME of QNN Graph 39: 7.703ms, End at 498.834
 TIME of CPU Graph 40: 29.654ms, End at 528.518
 TIME of QNN Graph 41: 14.008ms, End at 542.586
 TIME of CPU Graph 42: 18.45ms, End at 561.073
 TIME of QNN Graph 43: 9.491ms, End at 570.602
 TIME of CPU Graph 44: 49.157ms, End at 619.796
 TIME of QNN Graph 45: 8.052ms, End at 627.92
 TIME of CPU Graph 46: 31.725ms, End at 659.717
 TIME of QNN Graph 47: 7.816ms, End at 667.596
 TIME of CPU Graph 48: 14.256ms, End at 681.889
 TIME of QNN Graph 49: 7.61ms, End at 689.542
 TIME of CPU Graph 50: 8.287ms, End at 697.866
 TIME of QNN Graph 51: 7.592ms, End at 705.497
 TIME of CPU Graph 52: 7.894ms, End at 713.428
 TIME of QNN Graph 53: 7.63ms, End at 721.096
 TIME of CPU Graph 54: 6.792ms, End at 727.929
 TIME of QNN Graph 55: 5.968ms, End at 733.945
prefill time: 733.991ms
A large language model is a type of artificial intelligence system that is designed to generate human-like language based on the input it receives These models are typically trained on large amounts of text data, such as books, articles, and other forms of text, and use this data to learn patterns and relationships in the language used in the text The goal of a large language model is to generate human-like language that can be used for a wide range of applications, such as language translation, chatbots, and text
====================
load time: 1874.67 ms
token time: nan ms
inference speed: nan tokens/s
load time: 2679.57 ms
token time: 176.43 ms
inference speed: 5.66798 tokens/s
     0.0ms [WARNING]  <W> sg_stubPtr is not null, skip loadRemoteSymbols


Segmentation fault

shinel013 · 2024-12-30T02:31:06Z

This change only fixes the problem of the decode phase outputting all unk tokens. Based on your log, the inference phase actually completed, but an error then occurred in the QNN backend. I have not encountered this error in my environment (it may be related to the SDK used; we use htp_v79. A similar error we have seen is an invalid memory pointer, and there is no solution for it yet).
What I am more curious about is the prefill time: it was 4896.12ms in your earlier log and is now 733.991ms. What changes did you make to optimize it?

NorthFaceGoose closed this as completed Dec 28, 2024

NorthFaceGoose reopened this Dec 28, 2024
