The Videobridge takes a long time to recover from connection problems with the XMPP server. #2052
Comments
What is the value of net.ipv4.tcp_retries2 in sysctl?
$ sysctl net.ipv4.tcp_retries2
This means that the kernel will only detect broken TCP connections after roughly the 13-15 minute mark.
@freym Can you clarify the steps to reproduce this? I get a "closed on error" log immediately after restarting Prosody (followed by a successful reconnect).
It happened when we restarted Prosody once, but we now think the problem is related to our network setup and Kubernetes. However, reducing net.ipv4.tcp_retries2 to 7 has alleviated the problem somewhat.
In k8s it could also happen if the Prosody deployment is restarted and the new pod(s) take a long time to become Ready (e.g. due to readiness or startup probes). If you're using a Service DNS name to reach them from JVB, it won't resolve to the new pod IP(s) until they're Ready. It can also be caused by an overly long TTL on those Service A records (configured in the CoreDNS config on most k8s deployments), so that JVB keeps using the old IP for a while.
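One client-side way to avoid sticking to a stale pod IP is to resolve the Service name again before every reconnect attempt instead of reusing an address looked up once. The thread does not say JVB works this way. The Python sketch below only illustrates the idea; `reconnect`, `try_connect`, `max_attempts` and `delay_s` are hypothetical names. It only removes caching inside this code: the OS resolver, any runtime-level cache (such as the JVM's address cache) and the TTL served by CoreDNS still apply.

```python
import socket
import time
from typing import Callable, List


def resolve(host: str, port: int) -> List[str]:
    """Look up host afresh and return its addresses (empty if resolution fails)."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # E.g. the Service has no Ready endpoints yet; retry on the next attempt.
        return []
    # sockaddr[0] is the address string for both IPv4 and IPv6 results.
    return sorted({info[4][0] for info in infos})


def reconnect(host: str, port: int, try_connect: Callable[[str, int], bool],
              max_attempts: int = 10, delay_s: float = 2.0) -> str:
    """Retry until try_connect succeeds, re-resolving host before every attempt.

    try_connect is a hypothetical hook standing in for the real XMPP connect.
    Returns the address that worked; raises ConnectionError when attempts run out.
    """
    for attempt in range(1, max_attempts + 1):
        for ip in resolve(host, port):
            if try_connect(ip, port):
                return ip
        if attempt < max_attempts:
            time.sleep(delay_s)
    raise ConnectionError(f"could not connect to {host}:{port} "
                          f"after {max_attempts} attempts")
```

Because the name is resolved inside the loop, an attempt made after the new pod becomes Ready (and any cached record expires) sees the new IP without restarting the process.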
The "Ping failed, the XMPP connection needs to reconnect." message you see early on should trigger a reconnect. I'm surprised that in your case it doesn't log anything about trying to reconnect. |
I just tested the case where the server stops responding. As expected, the XMPP pings time out within 30-60 seconds and trigger a reconnect. I suspect it has something to do with your setup (feel free to reopen if you have more info or more questions).
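A 30-60 second detection window is what you would expect when a periodic ping is combined with a reply timeout. The Python sketch below simulates this, assuming a 30 s ping interval, a 30 s reply timeout and zero round-trip time. These numbers are hypothetical and were chosen only to match the window reported above; the videobridge's actual settings may differ.

```python
import math


def detection_time(failure_at: float, ping_interval: float = 30.0,
                   reply_timeout: float = 30.0) -> float:
    """Time at which a periodic ping watchdog declares the connection dead.

    Assumptions (hypothetical): pings go out at t = 0, ping_interval,
    2 * ping_interval, ...; replies are instant while the server is healthy;
    the server stops answering at failure_at. The first ping sent at or after
    failure_at goes unanswered, and the failure is noticed reply_timeout later.
    """
    first_lost_ping = math.ceil(failure_at / ping_interval) * ping_interval
    return first_lost_ping + reply_timeout
```

The delay `detection_time(t) - t` is shortest (the reply timeout) when the failure coincides with a ping. It is longest, just under interval plus timeout, when the failure happens right after a successful ping. That gives the 30-60 second range under these assumptions.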
Attempts to address jitsi/jitsi-videobridge#2052 by not sending presence unavailable when disconnecting due to a connection failure. If the TCP connection is alive then the server will notice that it's closed. If it isn't alive then it won't matter anyway.
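The change described in the commit message above comes down to one branch in the disconnect path: announce unavailability only when the connection is believed healthy. The Python sketch below shows that branch. `DisconnectReason`, `FakeXmppConnection` and `disconnect` are hypothetical names, not the videobridge's actual API, and the fake connection records stanzas instead of writing to a socket.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class DisconnectReason(Enum):
    SHUTDOWN = auto()            # orderly stop: the connection is believed healthy
    CONNECTION_FAILURE = auto()  # e.g. ping timeout: the connection may be dead


@dataclass
class FakeXmppConnection:
    """Hypothetical stand-in for an XMPP client connection."""
    sent: List[str] = field(default_factory=list)
    connected: bool = True

    def send(self, stanza: str) -> None:
        # Record the stanza instead of writing it to a socket.
        self.sent.append(stanza)

    def close(self) -> None:
        self.connected = False


def disconnect(conn: FakeXmppConnection, reason: DisconnectReason) -> None:
    """Close the connection, sending unavailable presence only on a clean shutdown."""
    if reason is DisconnectReason.SHUTDOWN:
        conn.send('<presence type="unavailable"/>')
    # On CONNECTION_FAILURE the presence is skipped: if the TCP connection is
    # alive the server notices it closing; if it is dead the stanza is useless.
    conn.close()
```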
Description
We discovered that the videobridge takes a long time to recover and reconnect after a brief interruption of its connection to the XMPP server (e.g. a Prosody restart or a change in our network configuration).
Current behavior
Now every five seconds the following exception is printed to the logs:
After ~15 minutes the connection is closed and the videobridge successfully re-establishes a connection. If we restart the service manually, the videobridge connects immediately. It is also possible to connect to the XMPP server with telnet after the error occurs.
Expected Behavior
The videobridge should be more tolerant of network errors and recover faster.
Environment details
The videobridge runs on a bare metal server (OS is Debian)
Prosody, Jicofo, Web are running on Kubernetes
Videobridge: 2.3-44-g8983b11f-1
Prosody: stable-8719
Jicofo: stable-8719