Catch token count issue while streaming with customized models
If llama, llava, phi, or other customized models are used for streaming (with stream=True), the current design crashes after fetching the response because the token count cannot be computed for these models. A warning is enough in this case, just as in the non-streaming use cases.
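A minimal sketch of the intended behavior, assuming token counting goes through tiktoken (whose encoding_for_model raises KeyError for model names it does not recognize); the helper names count_tokens and consume_stream are hypothetical, not the repository's actual API:

```python
import logging

import tiktoken

logger = logging.getLogger(__name__)


def count_tokens(text: str, model: str) -> int:
    """Count tokens for `model`, warning instead of crashing when the
    model (e.g. llama, llava, phi) is unknown to tiktoken."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Customized models are not in tiktoken's registry; a warning
        # is enough here, mirroring the non-streaming code path.
        logger.warning(
            "Model %r not found in tiktoken; token count unavailable.", model
        )
        return 0
    return len(encoding.encode(text))


def consume_stream(chunks, model: str) -> str:
    """Streaming path: join the streamed chunks, then count tokens
    without risking a crash on a customized model name."""
    response = "".join(chunks)
    n = count_tokens(response, model)
    logger.debug("Received %d tokens from %s", n, model)
    return response
```

With this guard, a run such as consume_stream(["Hello, ", "world"], "llava") logs a warning and returns the text, instead of raising after the stream completes.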