Allow LiveViews to be adopted #3551

Open
josevalim opened this issue Dec 4, 2024 · 5 comments

Comments

@josevalim
Member

One of the issues with LiveView is the double render when going from dead render to live render, and the fact that we lose all state on disconnection.

This issue proposes rendering a LiveView (really a Phoenix.Channel) upfront and then having it be "adopted" when necessary. In a nutshell (a sketch follows the list below):

  • On disconnect, we keep the LiveView alive for X seconds. Then, on reconnect, we reestablish the connection to the same LiveView and just send the latest diff.

  • On dead render, we spawn the LiveView upfront and keep it alive until the WebSocket connection arrives and "adopts" it, so we only need to send the latest diff.
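
For illustration, here is a minimal sketch of that lifecycle using a plain GenServer rather than the real LiveView/Channel internals (AdoptableState, disconnected/1, and adopt/1 are invented names, and the grace period is arbitrary):

```elixir
defmodule AdoptableState do
  # Sketch only: hold the rendered state for a grace period and hand it to
  # whichever connection "adopts" it; expire if nobody does.
  use GenServer

  @grace_period :timer.seconds(5)

  def start_link(assigns), do: GenServer.start_link(__MODULE__, assigns)

  # Called by the (hypothetical) transport when the socket drops.
  def disconnected(pid), do: GenServer.cast(pid, :disconnected)

  # Called when a connection (first WebSocket or a reconnect) adopts the state.
  def adopt(pid), do: GenServer.call(pid, :adopt)

  @impl true
  def init(assigns) do
    # Expire if the WebSocket never arrives.
    {:ok, %{assigns: assigns, timer: schedule_expiry()}}
  end

  @impl true
  def handle_cast(:disconnected, state) do
    # The socket dropped: restart the grace period for a possible reconnect.
    {:noreply, %{state | timer: schedule_expiry()}}
  end

  @impl true
  def handle_call(:adopt, _from, state) do
    # Adopted: cancel the expiry and hand back the latest assigns so only a
    # diff needs to be sent over the new connection.
    if state.timer, do: Process.cancel_timer(state.timer)
    {:reply, {:ok, state.assigns}, %{state | timer: nil}}
  end

  @impl true
  def handle_info(:expire, state), do: {:stop, :normal, state}

  defp schedule_expiry, do: Process.send_after(self(), :expire, @grace_period)
end
```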

However, this has some issues:

  • If the new connection happens on the same node, it is perfect. However, if it happens on a separate node, then we can either do cluster round-trips on every payload, copy only some assigns (from assigns_new) or build a new LiveView altogether (and discard the old one).

  • This solution means we will keep state around on the server for X seconds. This could perhaps be abused for DDoS attacks or similar. It may be safer to enable this only on certain pages (for example, where authentication is required) or keep the timeout short on public pages (e.g. 5 seconds instead of 30).

On the other hand, this solution should be strictly better than a cache layer for a single tab: there is zero copying and smaller payloads are sent on both connected render and reconnects. However, keep in mind this is not a cache, so it doesn't share across tabs (and, luckily, it does not introduce any of the usual caching issues, such as unbounded memory usage, cache key management, etc.).

There are a few challenges to implementing this:

  • We first need to add adoption functionality to Phoenix.Channel.

  • We need to make sure that an orphaned LiveView will submit the correct patch once it reconnects. It may be that we cannot squash patches on the server and would instead need to queue them, which can introduce other issues.

  • We may need an opt-in API.

While this was extracted from #3482, this solution is completely orthogonal to the one outlined there, as live_navigation is about two different LiveViews.

@elliottneilclark
Contributor

elliottneilclark commented Dec 4, 2024

After the dead view renders the HTTP response, we have all of the assigns. If we spend the cycles to start a LV process off the critical path of sending the response, the cost of the extra work is not user-facing. So when the WebSocket connection comes in, the LiveView is already started and the assigns are cached. This would trade a bit of memory and non-critical CPU usage for a faster WebSocket connect (see the sketch at the end of this comment).

However, we don't want to leak memory forever, and it's possible that the WebSocket never comes, so there would need to be some eviction system in place (time, memory, etc.).

I don't know the Erlang VM well enough, but is it possible to convince the VM to move ownership of structs (specifically the assigns) rather than copying them, if there are no live references? That would be another way to make spinning up LV processes even cheaper: rather than copying the assigns to a new LV process, we could drop all references to them once the bytes of the response are ready, since we are done using them at that point.
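
A minimal sketch of the off-the-critical-path idea, reusing the hypothetical AdoptableState module from the sketch above (MyAppWeb.DeadRender and send_dead_render/3 are likewise invented):

```elixir
defmodule MyAppWeb.DeadRender do
  import Plug.Conn

  # Sketch only: send the HTTP response first, then pay for starting and
  # populating the process that may later be adopted, so the extra work is
  # not user-facing.
  def send_dead_render(conn, html, assigns) do
    conn = send_resp(conn, 200, html)

    # GenServer.start/2 (unlinked) copies `assigns` into the new process,
    # but that copy now happens after the user already has the response.
    {:ok, _pid} = GenServer.start(AdoptableState, assigns)

    conn
  end
end
```

The :expire timer in the earlier sketch is one form of the eviction the last paragraph asks for; a production version would presumably also account for total memory.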

@josevalim
Member Author

@elliottneilclark there are some tricks we could do:

  • On HTTP/1, we send connection close but keep the process around to be adopted as a LiveView later. No copying necessary. This may require changes to the underlying web servers.

  • On HTTP/2, each request is a separate process, so we can just ask for it to stick around.

Outside of that, we do need to copy it; the VM cannot transfer it (reference counting is only for large binaries). But we can spawn the process relatively early on. For example, if we spawn the process immediately after the router, none of the data mounted in the LiveView needs to be copied; only the assigns set in the plug pipeline that are accessed by the LiveView are copied (using an optimization similar to the one in live_navigation).
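
For reference, the "only copy what the LiveView asks for" behaviour resembles the existing assign_new/3 mechanism; a small illustrative example (module and helper names are made up):

```elixir
defmodule MyAppWeb.ProfileLive do
  use Phoenix.LiveView

  # On the dead render, assign_new/3 reuses :current_user already set by the
  # plug pipeline, so only the plug assigns a LiveView actually asks for are
  # carried over; the fallback function runs when the value is not available,
  # e.g. on the connected mount over the WebSocket.
  def mount(_params, session, socket) do
    {:ok, assign_new(socket, :current_user, fn -> load_user(session["user_id"]) end)}
  end

  def render(assigns) do
    ~H"<p>Hello, <%= @current_user && @current_user.name %></p>"
  end

  defp load_user(nil), do: nil
  defp load_user(id), do: MyApp.Accounts.get_user!(id)
end
```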

@elliottneilclark
Contributor

Outside of that, we do need to copy it; the VM cannot transfer it (reference counting is only for large binaries).

That makes sense; large binaries are the equivalent of huge objects in the JVM, with different accounting.

@simoncocking

it's possible that the websocket never comes

Our experience is that this happens only on public pages, which are exposed to bots, search engines, and other automatons. So in our situation:

It may be safer to enable this only on certain pages (for example, where authentication is required)

this is exactly what we'd do.

If the new connection happens on the same node, it is perfect. However, if it happens on a separate node, then we can either do cluster round-trips on every payload, copy only some assigns (from assigns_new) or build a new LiveView altogether (and discard the old one).

We have some LiveViews that do some pretty heavy lifting on connected mount, so we'd need some way to guarantee that this work wouldn't be repeated if the LV was spawned on a different node from the one that receives the WebSocket connection.

@josevalim
Member Author

josevalim commented Dec 5, 2024

I just realized that the reconnection approach has some complications. If the client crashes, LiveView doesn't know whether the client received the last message or not. So, in order for reconnections to work, we would need to change the LiveView server to keep a copy of all responses and only delete them once the client acknowledges them. This will definitely make the protocol chattier and perhaps affect the memory profile on the server. So, for reconnection, we may want to spawn a new LiveView anyway and then transfer the assigns, similar to push_navigate.
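
A rough sketch (invented names, not the real LiveView protocol) of the bookkeeping such acknowledgements would require: every diff pushed to the client is kept, tagged with a sequence number, until the client acks it, and on reconnect whatever is still buffered gets replayed:

```elixir
defmodule PendingDiffs do
  # Sketch only: a per-LiveView buffer of sent-but-unacknowledged diffs.
  defstruct seq: 0, buffer: %{}

  # Tag a diff with the next sequence number and remember it until acked.
  def push(%__MODULE__{seq: seq, buffer: buffer} = state, diff) do
    seq = seq + 1
    {{seq, diff}, %{state | seq: seq, buffer: Map.put(buffer, seq, diff)}}
  end

  # The client acknowledged everything up to and including `seq`.
  def ack(%__MODULE__{buffer: buffer} = state, seq) do
    %{state | buffer: Map.reject(buffer, fn {k, _diff} -> k <= seq end)}
  end

  # On reconnect, replay what the client may not have received, in order.
  def unacked(%__MODULE__{buffer: buffer}), do: Enum.sort(buffer)
end
```

This is exactly the extra chatter and memory the paragraph above is worried about: every push stays on the server until an ack arrives.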

This goes back to the previous argument that it may be necessary to provide different solutions for each problem, if we want to maximize their efficiency.

We have some LiveViews that do some pretty heavy lifting on connected mount, so we'd need some way to guarantee that this work wouldn't be repeated if the LV was spawned on a different node from the one that receives the WebSocket connection.

This is trivial to do if they are on the same node; it is a little bit trickier for distinct nodes. For distinct nodes, you would probably need to opt in and declare that a LiveView's state is transferable, which basically says that you don't rely on local ETS tables or resources (such as dataframes) in your LiveView state.
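
Purely as an illustration of what such an opt-in would be asserting, a naive check for node-local terms in the assigns might look like this (pids, references, and ports are flagged, which also covers ETS table identifiers and NIF-backed resources such as dataframes):

```elixir
defmodule Transferable do
  # Sketch only: a term is considered transferable to another node if it does
  # not embed pids, references, or ports anywhere inside it.
  def transferable?(term) when is_pid(term) or is_reference(term) or is_port(term), do: false
  def transferable?(term) when is_list(term), do: Enum.all?(term, &transferable?/1)
  def transferable?(term) when is_tuple(term), do: term |> Tuple.to_list() |> Enum.all?(&transferable?/1)
  def transferable?(%_{} = struct), do: struct |> Map.from_struct() |> transferable?()
  def transferable?(term) when is_map(term) do
    Enum.all?(term, fn {k, v} -> transferable?(k) and transferable?(v) end)
  end
  def transferable?(_other), do: true
end

# e.g. Transferable.transferable?(socket.assigns)
```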
