Incompatible with continuedev chat and code completion #2174
Comments
Can you use Wireshark on the loopback device to watch the traffic and show what happens between the client and gpt4all? Maybe these code examples will help you: Node.js
in Browser as fetch
|
Thank you @zwilch . I am quite rusty with wireshark, so I'm going to need some time to debug it adequately this way. Nevertheless, I tried to use Here is what GPT4All spits out:
I wrote before that it worked with curl. It does appear to, but only superficially: looking at the exact output, it is well below the quality we could expect, often producing gibberish and ending mid-sentence. For comparison, here is what GPT4All outputs when the same model is queried from the GUI:
And here is what ollama outputs with the same model and prompt:
So it seems that it's not just a formatting issue: the GPT4All OpenAI-like API server does not respond to queries the same way. Maybe it forgets the default parameters? It outputs total gibberish, often stopping mid-sentence. So this issue does not seem to be specific to continuedev; the whole OpenAI-like API server function seems to be affected. I am trying to test my hypothesis that this is caused by missing parameters, but for the moment, when I try to pass the parameters, generation never finishes. |
Sorry, I haven't read through everything here, but it might be a templates/parameters issue, so: Note that many models don't work all that well if you don't provide them with the expected templates. I don't think these are added automatically to any of the web API endpoints. Also, the parameters can have a big influence, too. What you should try:
|
@cosmic-snow Thank you for your suggestions. Although I will implement them in future tests to improve replicability, this is not a templating/parameters issue: the model works fine in GPT4All, and furthermore the issue inside Continue's chat is that it does not output anything, whatever the prompt. (PS: I know how to edit the continue config file. I made it work with several models in koboldcpp, including the same model I am trying to use in gpt4all -- koboldcpp is also not supported by default in continue and must be manually configured as an OpenAI-like API server) |
Alright then, but are you sure? I'm not all that familiar with the GUI's API server, but I've spent a bit of time with that recently. It's certainly possible that it's not entirely compatible and something that's expected by the continue plugin is not actually returned by the server. That is, it definitely doesn't mimic the OpenAI API in full. However, looking at the output of your previous comment again:
ollama response excerpt:
Note how many more It's entirely possible that this isn't the only issue, though. To get everything to work, I mean. You might also want to run I'll probably have a look at the continue plugin when I have some time. |
I see, I missed this detail. I'll try to debug this further, but this is getting a bit beyond my current abilities. I need to learn more, but I'm not sure when I'll have time to do that... At least your pointers are sending me in the right direction; I'll post further comments if I find out how to do that. (NB: I wanted to use HTTP Toolkit but it didn't work, then I tried Wireshark but for some reason I cannot see the exchange, so I must be doing something wrong. What remains is Frida.re -- I think it would be more effective if I could catch and manipulate all the exchanges) |
I tested a few different backends, and I think the issue is that the server doesn't support streaming responses, which the continuedev extension requires. Every backend that worked returned a streaming response. There is also a parameter {"messages":[{"role":"user","content":"hello"}],"model":"Llama 3 Instruct","max_tokens":1024,"stream":true} |
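For readers unfamiliar with what a "streaming response" looks like on the wire, here is a minimal sketch of how a client might reassemble text from an OpenAI-style server-sent-events (SSE) stream. It is an illustration only, not Continue's actual code: the function name `iter_sse_content` and the `sample_stream` lines are made up for this example, and the chunk layout (`choices[].delta.content`, terminated by `data: [DONE]`) is assumed to follow the OpenAI chat completions streaming format.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style streaming chunks.

    Hypothetical helper for illustration; not taken from Continue or GPT4All.
    """
    for raw in lines:
        line = raw.strip()
        # SSE payload lines start with "data:"; blank lines separate events.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # sentinel that marks the end of the stream
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Made-up sample of what a streaming backend might send for a short reply.
sample_stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_sse_content(sample_stream)))
```

A client written this way finds no `data:` lines in a server that answers with one plain JSON body, so it yields nothing. That would be consistent with the empty responses described in this thread, though it is only a plausible explanation, not a confirmed one.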
it should do streaming: |
The GPT4ALL v3.0.0 client has a "Server Chat" section which correctly shows the response to queries received from VSCode/Continue as they arrive, but I can confirm that when configured as the OP suggests at least, these responses don't make it back into Continue. |
Will there be any fix for this? |
Sorry, last time I tried to really look into it I got held up, so I shelved it for a while.
True, the server mode currently doesn't implement streaming responses. If that's a hard requirement, then I guess this is the problem here.
I can't really say what the plans are right now, sorry. Improvements to the server mode are mentioned on the roadmap, however. |
I got it working with a stopgap solution, continuedev/continue#2097. I'll see if I can make changes to gpt4all to support SSE. |
I've added support for SSE response in this PR, #2910, and tested with prod continue.dev version, it seems to be working. |
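To illustrate what "SSE response" means on the server side, here is a rough sketch of how a finished completion could be split into OpenAI-style streaming events. This is not the code from the PR above (GPT4All's server is not written in Python); the function name `to_sse_events`, the `piece_size` parameter, and the fixed-size splitting are all assumptions made for the example, and the chunk fields are assumed to follow the OpenAI `chat.completion.chunk` format.

```python
import json
from typing import Iterator, Optional


def to_sse_events(text: str, model: str, piece_size: int = 8) -> Iterator[str]:
    """Split a finished completion into OpenAI-style SSE events.

    Hypothetical illustration only; a real server would emit tokens as they
    are generated rather than slicing a finished string.
    """

    def event(delta: dict, finish_reason: Optional[str] = None) -> str:
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [
                {"index": 0, "delta": delta, "finish_reason": finish_reason}
            ],
        }
        # Each SSE event is a "data:" line followed by a blank line.
        return "data: " + json.dumps(chunk) + "\n\n"

    yield event({"role": "assistant"})  # first chunk announces the role
    for start in range(0, len(text), piece_size):
        yield event({"content": text[start:start + piece_size]})
    yield event({}, finish_reason="stop")  # last chunk carries the stop reason
    yield "data: [DONE]\n\n"  # sentinel that ends the stream


for sse_event in to_sse_events("Hello from the server", "Llama 3 Instruct"):
    print(sse_event, end="")
```

The response would also need to be sent with the `text/event-stream` content type so that clients treat it as a stream.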
Awesome @raja-jamwal , thank you so much! I hope this will get merged soon! GPT4All is so much more efficient than other LLM runners such as ollama, I literally cannot run the best models my computer can with other runners. |
Bug Report
I tried to use GPT4All as a local LLM server with an OpenAI-like API for serving as a code copilot via the continue plugin for VSCode.
Unfortunately, whatever I tried, it did not work.
The server is correctly detected and all models are correctly loaded, using continue prerelease. However, when trying to send any message to gpt4all from continue, the response seems to be empty.
However, when I do my own curl query it works, so I don't know how to debug this further.
I have tried with Ollama and Koboldcpp (via the OpenAI-like API, same settings as for GPT4All - of course I changed the ports), and it worked for both flawlessly.
This seems to me to be an incompatibility in the API. Continue is expecting something that GPT4All is not providing or not in the expected format.
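One way to narrow down this kind of mismatch is to send the same request with and without `"stream": true` and check what shape comes back. The sketch below is only a starting point under stated assumptions: the base URL `http://localhost:4891/v1` is assumed to be where GPT4All's API server listens (adjust it to your setup), and `build_chat_request` and `describe_response` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request

# Assumption: GPT4All's API server listens here; change to match your setup.
BASE_URL = "http://localhost:4891/v1"


def build_chat_request(prompt: str, model: str, stream: bool) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
        "stream": stream,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_response(content_type: str, body: str) -> str:
    """Classify a response as streamed SSE, a single JSON body, or neither."""
    if "text/event-stream" in content_type.lower() or body.lstrip().startswith("data:"):
        return "streamed (SSE)"
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        return "unrecognized"
    if isinstance(parsed, dict) and "choices" in parsed:
        return "single JSON body"
    return "unrecognized"


# Usage against a running server (commented out so the sketch runs offline):
# request = build_chat_request("hello", "Llama 3 Instruct", stream=True)
# with urllib.request.urlopen(request) as resp:
#     print(describe_response(resp.headers.get("Content-Type", ""), resp.read().decode("utf-8")))
```

If a request sent with `"stream": true` still comes back as a single JSON body, the client and server disagree about the response format, which is the kind of incompatibility suspected here.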
Steps to Reproduce
Expected Behavior
Continue should get non-empty responses from GPT4All.
Your Environment