smee.io status #137

Open
tcbyrd opened this issue Mar 16, 2023 · 11 comments

@tcbyrd
Contributor

tcbyrd commented Mar 16, 2023

As has been pointed out in a few issues (#135, #125, #122), smee.io has become unstable because it's getting far more traffic than it was designed to handle for development purposes. I've scaled it up vertically as much as I can, but the service is no longer stable in its current form. We've tried to implement mechanisms to keep people from abusing the service, but it has become a game of whack-a-mole, and implementing proper rate limiting and per-channel auth is beyond the scope of what we can do in a day or two to get things back online.

I have a few ideas for next steps, and if there's demand I can try to spin up a new and improved service on smee.io, but in the meantime, it will be offline until this latest wave of traffic dies down and the service can be brought back up reliably.

@gsmet

gsmet commented Mar 17, 2023

@tcbyrd I just wanted to tell you that smee.io has been extremely handy for testing our GitHub Apps so thanks for taking care of it!

@gsmet

gsmet commented Mar 17, 2023

@tcbyrd Just a wild idea, but given that smee.io was built primarily to support the development of GitHub Apps, could you let through only the requests that carry GitHub payloads on the public instance?
That would make the service useless for other types of usage and might help prevent abuse.

@Aldekein

It was cool to have it, thanks!

@tcbyrd
Contributor Author

tcbyrd commented Mar 18, 2023

> Just a wild idea, but given that smee.io was built primarily to support the development of GitHub Apps, could you let through only the requests that carry GitHub payloads on the public instance?

The machine it runs on is still pegged at 85-90% CPU usage just serving the static page that's up there now. The inherent problem is that the SSE connections required to make it work don't play nicely with proxies that could block traffic before it gets to the app. I tried putting Cloudflare and Azure Front Door in front of it, but that caused other problems. My current thought is to move to generating channels on subdomains instead of paths, so that I can put a proper proxy in front of it and drop traffic before it makes it to the app. There are a few ways I can tackle that, but I'm definitely open to suggestions.
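To make the subdomain idea concrete, here is a minimal sketch of how a channel could be read from the Host header alone, which is the kind of decision a proxy can make before any request reaches the app. The base domain `smee.example`, the function names, and the blocklist are all hypothetical, and the sketch is in Python for brevity rather than in the service's own code.

```python
from typing import Optional, Set

# Hypothetical base domain used only for illustration.
BASE_DOMAIN = "smee.example"


def channel_from_host(host: str, base_domain: str = BASE_DOMAIN) -> Optional[str]:
    """Return the channel if host looks like '<channel>.<base_domain>', else None."""
    # Strip an optional port and a trailing dot, and compare case-insensitively.
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None
    label = hostname[: -len(suffix)]
    # Accept exactly one ASCII label made of letters, digits and hyphens.
    if "." in label or not label.isascii() or not label.replace("-", "").isalnum():
        return None
    return label


def should_drop(host: str, blocked_channels: Set[str]) -> bool:
    """True when the request targets a channel on the (hypothetical) blocklist."""
    channel = channel_from_host(host)
    return channel is not None and channel in blocked_channels
```

With path-based channels the same decision needs the request path, which is part of why the filtering currently has to happen inside the app.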

@gsmet

gsmet commented Mar 18, 2023

I'm seeing this from far away, and you have probably thought about all of this already, but you have two sources of traffic:

  1. the webhook calls coming from GitHub
  2. the SSE connections

For 1., you cannot really filter them easily, but it might be a good idea to gather some stats: I wouldn't be surprised if a few channels represented 80% of the traffic, and if a lot of channels were not actually used anymore. I wonder if there could be a way to drop the traffic for channels that haven't been used for a while, so that people would need to enable them again.
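As a rough illustration of the stats idea, the sketch below finds the smallest set of busiest channels that together account for a given share of requests. It assumes you can extract one channel name per request from an access log; the function name and the input shape are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_channels_for_share(channels: Iterable[str], share: float = 0.8) -> List[Tuple[str, int]]:
    """Return the busiest channels, in order, until they cover `share` of all requests."""
    counts = Counter(channels)
    total = sum(counts.values())
    selected: List[Tuple[str, int]] = []
    covered = 0
    for channel, count in counts.most_common():
        # Stop as soon as the channels picked so far reach the target share.
        if total == 0 or covered >= share * total:
            break
        selected.append((channel, count))
        covered += count
    return selected
```

If the result is only a handful of channels, those are the natural candidates for a closer look.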

As for 2., I would say that anything that still tries to connect now is probably not someone actually testing an app, so maybe you could start filtering, at the IP level, the clients generating the most load, at least until the situation has stabilized.

Your proposal to switch to separate domains is a good idea, I think. Probably more flexible to start with and you could also drop the traffic of anything coming to the existing channels and ask people to update their channel URLs in GitHub.

That would allow for a fresh start. And for this fresh start, it's probably a good idea to drop, very early, all events that don't come with GitHub headers (i.e. have the ability to install a filter here https://github.com/probot/smee.io/blob/master/lib/server.js#L97 and drop everything you don't want) in the public instance, so that the public instance is not used as a general-purpose proxy for webhooks.
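A minimal sketch of what such a filter could check, assuming the request headers are available as a plain mapping. GitHub webhook deliveries carry `X-GitHub-Event` and `X-GitHub-Delivery` headers and a `User-Agent` starting with `GitHub-Hookshot/`; the function name is hypothetical, and this is not the code at the linked line.

```python
from typing import Mapping

# Headers that GitHub sends with every webhook delivery.
REQUIRED_HEADERS = ("x-github-event", "x-github-delivery")
GITHUB_USER_AGENT_PREFIX = "GitHub-Hookshot/"


def looks_like_github_webhook(headers: Mapping[str, str]) -> bool:
    """Cheap early check: does this request carry the usual GitHub delivery headers?"""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    if any(not normalized.get(name) for name in REQUIRED_HEADERS):
        return False
    return normalized.get("user-agent", "").startswith(GITHUB_USER_AGENT_PREFIX)
```

These headers are easy to spoof, so this would only discourage casual general-purpose use; it is not authentication.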

If you want to discuss ideas, feel free to ping me. I'm using smee.io as the testing proxy for my Quarkus GitHub App framework and would really like to have it back, as it's by far the best proxy solution for this use case, so I'm happy to help in any way possible.


@skitterm

@tcbyrd thanks for Smee. Is there a rough estimate of when Smee might be back up and running? I can move to a different service if needed, but I wanted to see how temporary this outage might be.

@samatcodeapprove

@tcbyrd smee has been critical to developing my GitHub app, thanks for everything so far and hopefully you can get it back up!

For others here: does anyone have a list of good drop-in replacements?

@tcbyrd
Contributor Author

tcbyrd commented Mar 25, 2023

Hey all. I'm putting some time into this over the weekend. Things haven't changed much, and even after a week of it being offline it's still receiving a ton of traffic across hundreds of channels that aren't even returning content at this point.

> the webhook calls coming from GitHub

I've verified this isn't the source of any problems. The overwhelming majority of the CPU load comes from Accept: text/event-stream requests to the GET /:channel endpoint, that is, from all the "headless" clients using it continuously as a proxy for a backend service on their network. One thought here is that I could introduce a TTL on a channel, so you simply can't expect to use it for longer than, say, 24 hours or so. I don't want it to be a burden to create a new channel and update your GitHub App settings too often, but at the same time, having a channel persist forever sends the wrong message that you can "set it and forget it" 🧑‍🍳
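For illustration, a channel TTL could be as simple as remembering when each channel was created and refusing it once it is too old. This is only a sketch under assumptions: the `ChannelRegistry` class, its method names, and the in-memory storage are hypothetical, and the 24-hour default just mirrors the figure mentioned above.

```python
import time
from typing import Callable, Dict


class ChannelRegistry:
    """In-memory channel list where every channel expires after a fixed TTL."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._created_at: Dict[str, float] = {}

    def create(self, channel: str) -> None:
        """Register a channel (or restart its TTL if it already exists)."""
        self._created_at[channel] = self._clock()

    def is_active(self, channel: str) -> bool:
        """True if the channel exists and has not outlived its TTL."""
        created = self._created_at.get(channel)
        if created is None:
            return False
        if self._clock() - created >= self._ttl:
            # Expired: forget the channel so later lookups stay cheap.
            del self._created_at[channel]
            return False
        return True
```

A headless client that keeps reconnecting to an expired channel would then get a cheap rejection instead of a long-lived event stream.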

> rough estimate of when Smee might be back up and running?

I'll try and have it back online in some form by Monday. What that looks like right now is unclear, but I'll experiment a bit and update this issue.

@tcbyrd
Contributor Author

tcbyrd commented Mar 25, 2023

I found some specific IPs I could block that had thousands of open connections, so I took it out of maintenance mode for now. I'm going to separately work on some rate limiting features so this can be less manual in the future.
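One way the manual IP blocking could be automated is a cap on concurrent open connections per client IP. The sketch below is an assumption about what such a feature might look like, not the planned implementation; the `ConnectionLimiter` class and the default limit of 5 are hypothetical.

```python
from collections import Counter


class ConnectionLimiter:
    """Tracks open connections per IP and refuses new ones above a cap."""

    def __init__(self, max_per_ip: int = 5) -> None:
        self._max = max_per_ip
        self._open: Counter = Counter()

    def try_open(self, ip: str) -> bool:
        """Record a new connection for `ip`; return False if it is over the cap."""
        if self._open[ip] >= self._max:
            return False
        self._open[ip] += 1
        return True

    def close(self, ip: str) -> None:
        """Release one connection slot for `ip` when its stream ends."""
        if self._open[ip] > 0:
            self._open[ip] -= 1
            if self._open[ip] == 0:
                # Remove idle entries so the table does not grow forever.
                del self._open[ip]
```

The server would call `try_open` when an event-stream request arrives and `close` when the stream ends, so a client holding thousands of connections is cut off at the cap.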

@HsiangNianian

That's great, I'm looking forward to the news.

@gsmet

gsmet commented Mar 25, 2023

Thanks @tcbyrd !

FWIW, I just wanted to mention that the URL being stable is extremely useful, as you don't have to change the URL in the GitHub UI every time you go back to your GitHub App development.
