What is the simplest way to run inference / an inference endpoint with FinGPT? #133
Unanswered · datainvestor asked this question in Q&A
I want to try out the FinGPT pre-trained model with some of my own prompts, and I want to do this as easily and cost-efficiently as possible.
First I tried running this script in a Jupyter notebook deployed on my VM, but it crashes with an out-of-GPU-memory error (16 GB).
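What I'm hoping exists is some memory-saving way to load the model, along the lines of 4-bit quantization with bitsandbytes, so a 7B-class model fits on a 16 GB card. A rough sketch of what I mean (the base-model ID is a placeholder, and the FinGPT adapter would still need to go on top):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base_id = "meta-llama/Llama-2-7b-hf"  # placeholder -- whichever base FinGPT expects

# 4-bit NF4 quantization cuts the weight footprint roughly 4x vs. fp16,
# which should be the difference between crashing and fitting on 16 GB.
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",  # needs `pip install accelerate bitsandbytes`
)
```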
Then I tried what I found on Hugging Face, but it takes quite a long time to load and my VM died before it finished, so I'm not sure this is the correct approach.
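As far as I understand it, the Hugging Face route means loading the base model and then attaching the FinGPT LoRA adapter with peft. This is just my sketch of that route, under the assumption that the base/adapter IDs below are the right pairing:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"            # placeholder base model
adapter_id = "FinGPT/fingpt-mt_llama2-7b_lora"  # placeholder FinGPT LoRA adapter

# Load the base model in 4-bit so it fits on a 16 GB GPU.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
    ),
    device_map="auto",
)

# Attach the FinGPT LoRA weights on top of the base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = "What is the sentiment of this news? Apple stock rises on strong earnings."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```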
I was also thinking about just creating an inference endpoint API that I can call, like this Mistral example:
https://docs.mistral.ai/cloud-deployment/skypilot
But I am not sure how this is achievable with pre-trained models like FinGPT, where you need to load the base model first.
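One idea I had, if merging works the way I think it does: fold the LoRA weights into the base model with peft's merge_and_unload(), so the result deploys like any ordinary standalone checkpoint (model IDs are placeholders again):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"            # placeholder base model
adapter_id = "FinGPT/fingpt-mt_llama2-7b_lora"  # placeholder FinGPT LoRA adapter

# Merging needs the un-quantized base weights, so this step wants enough
# CPU RAM rather than GPU memory.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()

# The result is an ordinary checkpoint that a serving stack (Inference
# Endpoints, vLLM, TGI, a SkyPilot recipe, ...) can load directly.
merged.save_pretrained("fingpt-merged")
AutoTokenizer.from_pretrained(base_id).save_pretrained("fingpt-merged")
```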
Could someone provide an explanation or recommendations on how to achieve this? Basically, I just want to run either a simple script with my prompt against the FinGPT model or, even better, call it through an API the way you would call OpenAI's GPT-4.
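For the API part, the kind of thing I'm picturing is a minimal self-hosted endpoint wrapping the merged model from the sketch above; everything here (path, route name, parameters) is an assumption on my part:

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

app = FastAPI()

# Load the merged FinGPT checkpoint produced by the previous sketch.
tokenizer = AutoTokenizer.from_pretrained("fingpt-merged")
model = AutoModelForCausalLM.from_pretrained(
    "fingpt-merged", torch_dtype=torch.float16, device_map="auto"
)

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

@app.post("/generate")
def generate(req: GenerateRequest):
    inputs = tokenizer(req.prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    return {"text": tokenizer.decode(out[0], skip_special_tokens=True)}

# Run with: uvicorn server:app --host 0.0.0.0 --port 8000
# (assuming this file is saved as server.py)
```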