
long-running code on cloud services #1

Open
cboettig opened this issue Sep 2, 2017 · 5 comments


cboettig commented Sep 2, 2017

Hey @MarkEdmondson1234 ,

Thanks for your comments in cloudyr/cloudyr.github.io#16 (comment)! Just trying to wrap my head around the approach you have here with respect to the long-running-code issue. To expand on this: I often have some CPU- or memory-intensive code that I just want to run on the largest available cloud instance. Usually the code will take a few hours to a few days to run, and of course I want the instance to shut down as soon as the job is done so I'm not paying $2-$3/hr for resources I'm not using.

The README example seems to document a case where you want something to be up persistently? Or is the Google Compute resource spun up and shut down after each successive cron iteration?

Figuring out how to get machines to kill themselves when done seems tricky, so I usually rely on a tiny 'master' instance which runs the script that spins up the big machine, waits patiently for it to finish, and then shuts it down if it finishes successfully (e.g. http://www.carlboettiger.info/2015/12/17/docker-workflows.html). But maybe that's happening automatically somehow with Google's App Engine project here? I haven't really grokked what all the pieces are doing.


MarkEdmondson1234 commented Sep 2, 2017

Hi @cboettig ! I'm a great admirer of all the Rocker stuff, it helps me every day.

> The README example seems to document a case of something you want to be up persistently? Or is the google compute resource spun up and shut down after each successive cron iteration?

App Engine spins up and down based on how much CPU is being used, so once the job finishes it should shut down and not charge further. A cron job is just a scheduled request to a URL.

In the R API case, it spawns a new instance for each request to the URL, which you can configure, say to spawn once the underlying instance hits 50% CPU load. Feasibly this means you can scale up and down as needed. I guess the load balancer is always up and running, but you don't pay for that, just the CPU resources.
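A minimal sketch of what that configuration could look like in an App Engine flexible `app.yaml`, assuming a custom Docker runtime (e.g. a Rocker image); the instance counts here are illustrative, not recommendations:

```yaml
# app.yaml (sketch) -- flexible environment with a custom Docker runtime
runtime: custom
env: flex

automatic_scaling:
  min_num_instances: 1       # the flexible environment cannot scale to zero
  max_num_instances: 5       # illustrative upper bound
  cpu_utilization:
    target_utilization: 0.5  # spawn a new instance around 50% CPU load
```

The `target_utilization` setting is what the "50% of CPU load" above maps to.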

I'm in the process of testing this to see how it compares cost wise to my current setup, moving some of my existing workflows to this more serverless philosophy.

My existing setups are more like what you describe in your post, with a master VM setting off slave VMs (described here). In those, I rely on the scripts themselves calling the stop signal via googleComputeEngineR::gce_vm_stop() at the end, and have one VM per script.
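For reference, the end-of-script shutdown in that pattern is a single call; a sketch assuming `googleComputeEngineR` is already authenticated, where the VM name `"my-big-worker"` is hypothetical:

```r
# worker script running on the big VM (sketch)
library(googleComputeEngineR)

# ... the hours-long job runs here ...

# stop this instance once the job completes, so billing ends
gce_vm_stop("my-big-worker")
```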


cboettig commented Sep 2, 2017

> App Engine spins up and down based on how much CPU is being used, so once the job finishes it should shut down and not charge further.

Wow, that's awesome. I should take a closer look at App Engine. I figured there'd always be some CPU use since the kernel is always doing something, but it's a clever idea to just set a threshold. Not clear how that translates to multi-core, but I'm guessing you can say 'shut off when total use drops below 1/(2n)', e.g. no core running at > 50% load?


MarkEdmondson1234 commented Sep 3, 2017

This is a good place to start: https://cloud.google.com/appengine/docs/standard/python/an-overview-of-app-engine

And how it scales in particular: https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed

It's probably not been on your radar since App Engine used to work only with Python and Java, but with the advent of flexible runtimes that use Docker (e.g. Rocker), any code can use its feature set now. Although flexible runtimes don't qualify for the free tier, it's been cheaper than running a small master cron VM.

If it's a long-running process of over 60 seconds, you would want to set up your URL endpoint (via plumber), then trigger it using a task queue, where the maximum timeout is 24 hours. These are what the cron.yaml uses.
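A sketch of how those two pieces could fit together; the endpoint path `/run-job` and the schedule are made up for illustration:

```r
# plumber.R (sketch) -- expose the long-running job as a URL endpoint
#* Trigger the long-running job
#* @get /run-job
function() {
  # ... hours-long computation here ...
  list(status = "done")
}
```

with a `cron.yaml` entry pointing the scheduler at that endpoint:

```yaml
# cron.yaml (sketch)
cron:
- description: scheduled long-running R job (illustrative)
  url: /run-job
  schedule: every 24 hours
```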

@MarkEdmondson1234

Actually @cboettig, I had a closer look and it's perhaps not suitable for your use case: for flexible environments the minimum number of instances is 1, so it's not possible to scale to 0 (i.e. no charge). You'd need to pay for at least one instance running 24/7, which would be around $30, so it's probably better to use a static VM for that as it's cheaper.


cboettig commented Sep 4, 2017

@MarkEdmondson1234 Thanks for the follow-up. Yeah, that makes sense; it seems the standard use case is to scale your app up and down in response to demand while maintaining 100% uptime, rather than on-demand computing.
