Hi,
I am trying to run the example script provided for the llama model, for inference only. Since the repository is going through a migration with a lot of changes, I went back and installed the stable v0.2.0 version. Everything works fine until I try to run the example script with CPU initialization on more than 2 pipeline stages. I am running on a server with 8 NVIDIA L4 GPUs. For pp = 2 it works perfectly, but as soon as I run the same script with pp greater than 2, after the model is initialized all the other GPUs show 0% utilization in the nvidia-smi output, while the GPU ranked 1 sits at 100% utilization, and the entire inference process freezes. Has anyone seen similar issues? Or is there a quick fix I can try?
NVCC and CUDA version: 12.1
torch version: 2.4.0.dev20240521+cu118
I tried downgrading torch to the stable 2.3.0 release and the same problem occurs. The example script I am running is /examples/llama/pippy_llama.py. Since this could be a problem specific to PiPPy v0.2.0, I will try a different PiPPy version later.