Model status

| Model | IR generation | Compilation | Runtime | shortfin-SGLang serving | Kubernetes cluster |
|---|---|---|---|---|---|
| 8B-FP16 (unsharded) | PASS | PASS | PASS | NTD | NTD |
| 70B-FP16 (unsharded) | PASS | PASS | PASS | NTD | NTD |
| 405B-FP16 (sharded) | Prefill-PASS | Prefill-PASS | Prefill-PASS | NTD | NTD |
| 8B-Instruct-FP16 (unsharded) | PASS | PASS | PASS | PASS | NTD |
| 70B-Instruct-FP16 (unsharded) | PASS | PASS | PASS | NTD | NTD |
| 405B-Instruct-FP16 (sharded) | NTD | NTD | NTD | NTD | NTD |

N.B. The weight file for 70B-Instruct was generated using llama.cpp/convert_hf_to_gguf.py through the following command:

 python3 convert_hf_to_gguf.py <path_to_hf_safetensor_files> --outtype f16 --outfile llama_70b_3.1_instruct.gguf
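
The conversion command above can also be scripted when several checkpoints need the same treatment. The following is a minimal sketch, not part of this repository: the function names `build_convert_command` and `convert` are hypothetical, and it assumes `convert_hf_to_gguf.py` is reachable at the path passed as `script`. Only the flags already shown in the command (`--outtype f16`, `--outfile`) are used.

```python
import subprocess
from pathlib import Path


def build_convert_command(hf_dir, outfile, script="convert_hf_to_gguf.py"):
    """Return the argv list for the conversion command shown above.

    `script` is an assumed path to llama.cpp's convert_hf_to_gguf.py.
    """
    return [
        "python3",
        str(script),
        str(hf_dir),
        "--outtype", "f16",
        "--outfile", str(outfile),
    ]


def convert(hf_dir, outfile, script="convert_hf_to_gguf.py"):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    hf_dir = Path(hf_dir)
    if not hf_dir.is_dir():
        raise FileNotFoundError(f"not a directory: {hf_dir}")
    subprocess.run(build_convert_command(hf_dir, outfile, script), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the safetensors directory contains spaces.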

Issues with models

Unsharded

| Model | IR generation | Compilation | Runtime | Comment |
|---|---|---|---|---|
| 405B-FP16-Prefill | PASS | PASS | FAIL | `RESOURCE_EXHAUSTED; HIP driver error 'hipErrorOutOfMemory' (2): out of memory` |
| 405B-FP16-Decode | PASS | PASS | FAIL | `RESOURCE_EXHAUSTED; HIP driver error 'hipErrorOutOfMemory' (2): out of memory` |

Sharded

| Model | IR generation | Compilation | Runtime | Comment |
|---|---|---|---|---|
| 8B-Prefill | PASS | FAIL | FAIL | `Memory access fault by GPU node-4 (Agent handle: 0x58470a300960) on address 0x7182ec58b000. Reason: Unknown` |
| 8B-Decode | PASS | FAIL | FAIL | `:0:rocdevice.cpp :2984: 2787027630305 us: [pid:688936 tid:0x7dfc4e600640] Callback: Queue 0x7dfbe0300000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29` |
| 405B-Decode | PASS | PASS | FAIL | The input appears to be incorrect: `INVALID_ARGUMENT; function expected fewer input values; parsing input @/data/llama3.1/weights/405b/decode_args_bs4_128_stride_32/cs_f16_shard_7.npy` |
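
Since the 405B-Decode failure reports that the function expected fewer input values, one quick sanity check is to count the shard files in the argument directory before passing them to the runtime. The sketch below is illustrative only and does not diagnose the error itself. It assumes the files follow the `<prefix>_shard_<N>.npy` pattern seen in the error message and that shard indices start at 0; the function names and the `expected` count are hypothetical and must be supplied by the caller.

```python
import re
from pathlib import Path

# Matches names such as cs_f16_shard_7.npy (pattern taken from the error message).
SHARD_RE = re.compile(r"^(?P<prefix>.+)_shard_(?P<index>\d+)\.npy$")


def list_shards(args_dir, prefix="cs_f16"):
    """Return (index, path) pairs for <prefix>_shard_<N>.npy, sorted by N."""
    shards = []
    for path in Path(args_dir).glob(f"{prefix}_shard_*.npy"):
        match = SHARD_RE.match(path.name)
        if match and match.group("prefix") == prefix:
            shards.append((int(match.group("index")), path))
    return sorted(shards)


def check_shard_count(args_dir, expected, prefix="cs_f16"):
    """True if exactly shards 0..expected-1 exist (0-based indexing is assumed)."""
    indices = [index for index, _ in list_shards(args_dir, prefix)]
    return indices == list(range(expected))
```

If the check fails, the directory holds a different number of shard inputs than the caller assumed, which is worth ruling out before looking at the compiled module's signature.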