
[Bug]: Streaming Structured Output is not working compared to native OpenAI SDK #7374

Open
hem210 opened this issue Dec 23, 2024 · 0 comments
Labels: bug (Something isn't working), mlops, user request

Comments


hem210 commented Dec 23, 2024

What happened?

When I use OpenAI's Structured Outputs with a Pydantic model, the litellm.completion endpoint works completely fine. But when I try to stream it with the code below, the endpoint streams chunks whose content is empty. I am using Azure OpenAI in my implementation.

Code to replicate:

from litellm import completion

# AZURE_DEPLOYMENT (a dict of deployment credentials) and AnyPydanticModel
# (a Pydantic model) are defined elsewhere in the application.
def generate_structured_output_litellm(prompt):
    try:
        response = completion(
            model="azure/" + AZURE_DEPLOYMENT["deployment"],
            api_key=AZURE_DEPLOYMENT["api_key"],
            api_version=AZURE_DEPLOYMENT["api_version"],
            api_base=AZURE_DEPLOYMENT["endpoint"],
            messages=[
                {"role": "system", "content": "You are a helpful assistant who responds in json structured responses."},
                {"role": "user", "content": prompt},
            ],
            temperature=0.7,
            response_format=AnyPydanticModel,
            stream=True,
        )

        for event in response:
            print(event)  # each streamed chunk; delta.content is always None
    except Exception as e:
        print(f"Error generating structured output: {e}")
        return None

In the code above, if I remove the line stream=True, the structured response is delivered correctly. I have added the console logs (where I log each event chunk) in the "Relevant log output" section below. To point out the bug: the content field remains None in every chunk.
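
For reference, the working non-streaming variant looks like this; a minimal sketch, assuming Pydantic v2 (AnyPydanticModel and AZURE_DEPLOYMENT as above):

response = completion(
    model="azure/" + AZURE_DEPLOYMENT["deployment"],
    api_key=AZURE_DEPLOYMENT["api_key"],
    api_version=AZURE_DEPLOYMENT["api_version"],
    api_base=AZURE_DEPLOYMENT["endpoint"],
    messages=[{"role": "user", "content": prompt}],
    response_format=AnyPydanticModel,
    # no stream=True here
)

# The structured JSON arrives as a string in message.content
parsed = AnyPydanticModel.model_validate_json(response.choices[0].message.content)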

The reason for this bug may be the way OpenAI currently supports streaming of Structured Outputs. Here is the reference code (from the original doc) that OpenAI provides for streaming:

# truncated imports and initial setup
with client.beta.chat.completions.stream(
  model="gpt-4o",
  messages=[
      {"role": "system", "content": "Extract entities from the input text"},
      {
          "role": "user",
          "content": "The quick brown fox jumps over the lazy dog with piercing blue eyes",
      },
  ],
  response_format=EntitiesModel,
) as stream:
  for event in stream:
      if event.type == "content.delta":
          if event.parsed is not None:
              # Print the parsed data as JSON
              print("content.delta parsed:", event.parsed)
      elif event.type == "content.done":
          print("content.done")
      elif event.type == "error":
          print("Error in stream:", event.error)

final_completion = stream.get_final_completion()
print("Final completion:", final_completion)

The implementation above, using OpenAI's SDK, streams chunk objects like the following:

ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-AhW', choices=[Choice(delta=ChoiceDelta(content='chunk streamed', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})], created=1734937121, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_51', usage=None), snapshot=ParsedChatCompletion[object](id='', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='raw streaming with incomplete JSON object in it.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None, parsed='JSONObjectRemovingUncompletePortion'), content_filter_results={})], created=0, model='', object='chat.completion', service_tier=None, system_fingerprint='fp_51', usage=None, prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}]))

# the object below is accessed to view the chunked structured output
ContentDeltaEvent(type='content.delta', delta='is', snapshot='raw streaming with incomplete JSON object in it.', parsed='JSONObjectRemovingUncompletePortion')
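
Note that with the native SDK the raw JSON is also carried on the underlying chunk events' delta.content (the very field that stays None in LiteLLM); a short sketch over the same stream as above:

buffer = ""
for event in stream:
    if event.type == "chunk" and event.chunk.choices:
        delta = event.chunk.choices[0].delta
        if delta.content:
            buffer += delta.content  # raw JSON text accumulates here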

I would request the LiteLLM team to please resolve this issue soon. If I am missing something, or if this is already solvable with another existing approach, please advise.
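
In the meantime, the logs below suggest the JSON fragments are being emitted under delta.tool_calls rather than delta.content, so a possible client-side workaround is to accumulate the tool-call argument fragments and validate at the end. This is a sketch only, assuming all fragments arrive on tool-call index 0 and Pydantic v2:

raw = ""
for event in response:
    if not event.choices:
        continue
    delta = event.choices[0].delta
    if delta.content:  # where the JSON should arrive
        raw += delta.content
    elif delta.tool_calls:  # where it currently shows up (see logs below)
        raw += delta.tool_calls[0].function.arguments or ""

parsed = AnyPydanticModel.model_validate_json(raw)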

Relevant log output

ModelResponse(id='chatcmpl-AhW', created=1734936723, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', system_fingerprint='fp_04', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(refusal=None, content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionDeltaToolCall(id=None, function=Function(arguments=']}', name=None), type='function', index=0)], audio=None), logprobs=None)])
ModelResponse(id='chatcmpl-AhW', created=1734936723, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', system_fingerprint='fp_04', choices=[StreamingChoices(finish_reason='stop', index=0, delta=Delta(refusal=None, content=None, role=None, function_call=None, tool_calls=None, audio=None), logprobs=None)])

Are you an ML Ops Team?

Yes

What LiteLLM version are you on ?

v1.55.9

Twitter / LinkedIn details

No response

hem210 added the bug (Something isn't working) label on Dec 23, 2024