
[Bug]: Streaming Structured Output is not working compared to native OpenAI SDK #7374

Open
hem210 opened this issue Dec 23, 2024 · 0 comments
Labels: bug (Something isn't working), mlops, user request

Comments


hem210 commented Dec 23, 2024

What happened?

When I use OpenAI's Structured Outputs with a Pydantic model, the litellm.completion endpoint works completely fine. But when I try to stream it with the code below, the endpoint streams chunks whose content is empty. I am using Azure OpenAI in my implementation.

Code to replicate:

from litellm import completion

# AZURE_DEPLOYMENT (a dict of deployment credentials) and AnyPydanticModel
# (a Pydantic model) are defined elsewhere in the application.
def generate_structured_output_litellm(prompt):
    try:
        response = completion(
            model="azure/" + AZURE_DEPLOYMENT["deployment"],
            api_key=AZURE_DEPLOYMENT["api_key"],
            api_version=AZURE_DEPLOYMENT["api_version"],
            api_base=AZURE_DEPLOYMENT["endpoint"],
            messages=[
                {"role": "system", "content": "You are a helpful assistant who responds in json structured responses."},
                {"role": "user", "content": prompt},
            ],
            temperature=0.7,
            response_format=AnyPydanticModel,
            stream=True,
        )

        for event in response:
            print(event)  # each streamed chunk; delta.content is always None
    except Exception as e:
        print(f"Error generating structured output: {e}")
        return None

In the code above, if I remove the line stream=True, the structured response is delivered correctly. I have added the console logs (where I log each event chunk) in the "Relevant log output" section below. To point out the bug: the content field remains None in every chunk.
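
For reference, the working non-streaming variant looks like this; a minimal sketch, assuming Pydantic v2 (AnyPydanticModel and AZURE_DEPLOYMENT as above):

response = completion(
    model="azure/" + AZURE_DEPLOYMENT["deployment"],
    api_key=AZURE_DEPLOYMENT["api_key"],
    api_version=AZURE_DEPLOYMENT["api_version"],
    api_base=AZURE_DEPLOYMENT["endpoint"],
    messages=[{"role": "user", "content": prompt}],
    response_format=AnyPydanticModel,
    # no stream=True here
)

# The structured JSON arrives as a string in message.content
parsed = AnyPydanticModel.model_validate_json(response.choices[0].message.content)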

The reason for this bug may be the way OpenAI currently supports streaming of Structured Outputs. Here is the reference code (from the original doc) that OpenAI provides for streaming:

# truncated imports and initial setup
with client.beta.chat.completions.stream(
  model="gpt-4o",
  messages=[
      {"role": "system", "content": "Extract entities from the input text"},
      {
          "role": "user",
          "content": "The quick brown fox jumps over the lazy dog with piercing blue eyes",
      },
  ],
  response_format=EntitiesModel,
) as stream:
  for event in stream:
      if event.type == "content.delta":
          if event.parsed is not None:
              # Print the parsed data as JSON
              print("content.delta parsed:", event.parsed)
      elif event.type == "content.done":
          print("content.done")
      elif event.type == "error":
          print("Error in stream:", event.error)

final_completion = stream.get_final_completion()
print("Final completion:", final_completion)

The implementation above, using OpenAI's SDK, streams chunk objects like the following:

ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-AhW', choices=[Choice(delta=ChoiceDelta(content='chunk streamed', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})], created=1734937121, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_51', usage=None), snapshot=ParsedChatCompletion[object](id='', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='raw streaming with incomplete JSON object in it.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None, parsed='JSONObjectRemovingUncompletePortion'), content_filter_results={})], created=0, model='', object='chat.completion', service_tier=None, system_fingerprint='fp_51', usage=None, prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}]))

# the object below is accessed to view the chunked structured output
ContentDeltaEvent(type='content.delta', delta='is', snapshot='raw streaming with incomplete JSON object in it.', parsed='JSONObjectRemovingUncompletePortion')
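
Note that with the native SDK the raw JSON is also carried on the underlying chunk events' delta.content (the very field that stays None in LiteLLM); a short sketch over the same stream as above:

buffer = ""
for event in stream:
    if event.type == "chunk" and event.chunk.choices:
        delta = event.chunk.choices[0].delta
        if delta.content:
            buffer += delta.content  # raw JSON text accumulates here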

I would request the LiteLLM team to please resolve this issue soon. If I am missing something, or if this is already solvable with another existing approach, please advise.
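
In the meantime, the logs below suggest the JSON fragments are being emitted under delta.tool_calls rather than delta.content, so a possible client-side workaround is to accumulate the tool-call argument fragments and validate at the end. This is a sketch only, assuming all fragments arrive on tool-call index 0 and Pydantic v2:

raw = ""
for event in response:
    if not event.choices:
        continue
    delta = event.choices[0].delta
    if delta.content:  # where the JSON should arrive
        raw += delta.content
    elif delta.tool_calls:  # where it currently shows up (see logs below)
        raw += delta.tool_calls[0].function.arguments or ""

parsed = AnyPydanticModel.model_validate_json(raw)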

Relevant log output

ModelResponse(id='chatcmpl-AhW', created=1734936723, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', system_fingerprint='fp_04', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(refusal=None, content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionDeltaToolCall(id=None, function=Function(arguments=']}', name=None), type='function', index=0)], audio=None), logprobs=None)])
ModelResponse(id='chatcmpl-AhW', created=1734936723, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', system_fingerprint='fp_04', choices=[StreamingChoices(finish_reason='stop', index=0, delta=Delta(refusal=None, content=None, role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)])

Are you an ML Ops Team?

Yes

What LiteLLM version are you on ?

v1.55.9

Twitter / LinkedIn details

No response

hem210 added the bug (Something isn't working) label on Dec 23, 2024