Presence stops working after ~1 week #179
Comments
Can you provide more information about your setup, such as the approximate scale (number of simultaneous users, presence entries, etc.)? As you're hitting a system limit error, something is out of bounds, possibly the number of entries we build in our matchspec, but we've never seen this before, so more information would be helpful. Thanks!
The error message says:
How many nodes are you running? My guess is that the nodes are never marked as permanently down, so the list only grows, until the match specification grows too long. There is a
This is a fly.io cluster in 8 regions, with ~2k concurrent users. There are typically ~10 machines in the cluster; they stop automatically when nobody has been connected to them for 15 minutes, and start again when there's demand. I did not change
It might be possible to avoid building massive match specs for very large sets of values by leveraging a map and the is_map_key guard:
    map = :maps.from_keys(values_to_filter_out, [])
    [{{{key, :_, :"$1"}, :"$2", {:"$3", :_}}, [{:not, {:is_map_key, :"$3", map}}], [{{:"$1", :"$2"}}]}]
This should hopefully avoid this crash (why the list grows so big in the first place is a separate issue). The performance of this solution also needs to be assessed against what's used right now.
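The map-based guard can be tried against a plain ETS table. This is a minimal sketch, not the library's code: the table name, the sample rows, and the replica names are all hypothetical, and the row shape `{{topic, pid, key}, meta, {replica, clock}}` is only inferred from the match head in the snippet above. Note that the result tuple is wrapped in double braces, which is how a match spec body constructs a tuple. `:maps.from_keys/2` requires OTP 24 or later.

```elixir
# Hypothetical table mimicking the {{topic, pid, key}, meta, {replica, clock}} shape.
table = :ets.new(:presence_sketch, [:ordered_set])

:ets.insert(table, [
  {{"room:1", :pid_a, "alice"}, %{status: "online"}, {:replica_a, 1}},
  {{"room:1", :pid_b, "bob"}, %{status: "away"}, {:replica_b, 1}},
  {{"room:1", :pid_c, "carol"}, %{status: "online"}, {:replica_c, 1}}
])

# Replicas whose entries should be hidden (hypothetical names).
values_to_filter_out = [:replica_b, :replica_c]

# One map lookup replaces one comparison per replica in the guard.
map = :maps.from_keys(values_to_filter_out, [])

topic = "room:1"

match_spec = [
  {
    # Head: bind key to $1, meta to $2, replica to $3.
    {{topic, :_, :"$1"}, :"$2", {:"$3", :_}},
    # Guard: keep the row only if its replica is NOT a key of the map.
    [{:not, {:is_map_key, :"$3", map}}],
    # Body: {{...}} builds the tuple {key, meta}.
    [{{:"$1", :"$2"}}]
  }
]

result = :ets.select(table, match_spec)
IO.inspect(result)
```

With this approach the guard stays a single expression no matter how many replicas are excluded; only the map literal embedded in the match spec grows. Whether that is faster than the current approach would still need measuring, as noted above.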
Note: on Fly, I think node names are chosen randomly during startup, so Jose's guess is probably correct.
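To illustrate why random node names would matter: if every restart registers a fresh name and old names are never dropped, a guard that excludes each known-down replica with its own comparison gains one clause per restart. The sketch below is an illustration under that assumption only. `GuardGrowthSketch`, `not_in/2`, and `comparisons/1` are hypothetical names, and this is not claimed to be how the library builds its guard.

```elixir
defmodule GuardGrowthSketch do
  # Hypothetical helper: builds a nested guard with one "=/=" comparison
  # per replica, so the guard size is linear in the number of replicas.
  def not_in(var, [first | rest]) do
    Enum.reduce(rest, {:"=/=", var, {:const, first}}, fn replica, acc ->
      {:andalso, {:"=/=", var, {:const, replica}}, acc}
    end)
  end

  # Counts the comparisons contained in a guard built by not_in/2.
  def comparisons({:andalso, left, right}), do: comparisons(left) + comparisons(right)
  def comparisons({:"=/=", _var, _const}), do: 1
end

# Simulate 1000 restarts, each producing a new (made-up) node name.
replicas = for i <- 1..1000, do: :"node#{i}@host"

guard = GuardGrowthSketch.not_in(:"$3", replicas)
IO.inspect(GuardGrowthSketch.comparisons(guard))
```

The count equals the number of replica names ever seen, so with machines stopping and starting on demand the guard would keep growing until names are pruned.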
Environment
ubuntu:jammy-20230126
Actual behavior
After a long running period, Presence stops working:
It fails here:
Restarting the instances of the cluster fixes the issue, but it comes back after a week or two.
Expected behavior
It should not crash.