
long-running code on cloud services #1

Open
cboettig opened this issue Sep 2, 2017 · 5 comments


cboettig commented Sep 2, 2017

Hey @MarkEdmondson1234 ,

Thanks for your comments in cloudyr/cloudyr.github.io#16 (comment)! Just trying to wrap my head around the approach you have here with respect to the long-running-code issue. To expand on this: I often have some CPU- or memory-intensive code that I just want to run on the largest available cloud instance. Usually the code will take a few hours to a few days to run, and of course I want the instance to shut down as soon as the job is done so I'm not paying $2-$3/hr for resources I'm not using.

The README example seems to document a case where you want something to be up persistently? Or is the Google Compute resource spun up and shut down after each successive cron iteration?

Figuring out how to get machines to kill themselves when done seems tricky, so I usually rely on a tiny 'master' instance which runs the script that spins up the big machine, waits patiently for it to finish, and then shuts it down if it finishes successfully (e.g. http://www.carlboettiger.info/2015/12/17/docker-workflows.html). But maybe that's happening automatically somehow with Google's App Engine project here? I haven't really grokked what all the pieces are doing.


MarkEdmondson1234 commented Sep 2, 2017

Hi @cboettig ! I'm a great admirer of all the Rocker stuff, it helps me every day.

> The README example seems to document a case of something you want to be up persistently? Or is the google compute resource spun up and shut down after each successive cron iteration?

App Engine spins up and down based on how much CPU is being used, so once the job finishes it should shut down and not charge further. A cron job is just a scheduled request to a URL.

In the R API case, it spawns a new instance for each request to the URL, which you can configure, say to spawn once the underlying instance hits 50% CPU load. Feasibly this means you can scale up and down as needed. I guess the load balancer is always up and running, but you don't pay for that, just the CPU resources.
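A minimal sketch of what that configuration could look like in an App Engine flexible `app.yaml`, assuming a custom Docker runtime (e.g. a Rocker image); the instance counts here are illustrative, not recommendations:

```yaml
# app.yaml (sketch) -- flexible environment with a custom Docker runtime
runtime: custom
env: flex

automatic_scaling:
  min_num_instances: 1       # the flexible environment cannot scale to zero
  max_num_instances: 5       # illustrative upper bound
  cpu_utilization:
    target_utilization: 0.5  # spawn a new instance around 50% CPU load
```

The `target_utilization` setting is what the "50% of CPU load" above maps to.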

I'm in the process of testing this to see how it compares cost wise to my current setup, moving some of my existing workflows to this more serverless philosophy.

My existing setups are more like what you describe in your post, with a master VM setting off slave VMs (described here). In those, I rely on the scripts themselves calling the stop signal via googleComputeEngineR::gce_vm_stop() at the end, and have one VM per script.
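For reference, the end-of-script shutdown in that pattern is a single call; a sketch assuming `googleComputeEngineR` is already authenticated, where the VM name `"my-big-worker"` is hypothetical:

```r
# worker script running on the big VM (sketch)
library(googleComputeEngineR)

# ... the hours-long job runs here ...

# stop this instance once the job completes, so billing ends
gce_vm_stop("my-big-worker")
```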


cboettig commented Sep 2, 2017

> App Engine spins up and down based on how much CPU is being used, so once the job finishes it should shut down and not charge further.

Wow, that's awesome. I should take a closer look at App Engine. I figured there'd always be some CPU use since the kernel is always doing something, but it's a clever idea to just set a threshold. Not clear how that translates to multi-core, but I'm guessing you can say 'shut off when total use drops below 1/(2n)', e.g. no core running at > 50% load?


MarkEdmondson1234 commented Sep 3, 2017

This is a good place to start: https://cloud.google.com/appengine/docs/standard/python/an-overview-of-app-engine

And how it scales in particular: https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed

It's probably not been on your radar since App Engine used to work only with Python and Java, but with the advent of flexible runtimes that use Docker (e.g. Rocker), any code can use its feature set now. Although flexible runtimes don't qualify for the free tier, it's been cheaper than running a small master cron VM.

If it's a long-running process of over 60 seconds, you would want to set up your URL endpoint (via plumber), then trigger it using a task queue, where the maximum timeout is 24 hours. These are what the cron.yaml uses.
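A sketch of how those two pieces could fit together; the endpoint path `/run-job` and the schedule are made up for illustration:

```r
# plumber.R (sketch) -- expose the long-running job as a URL endpoint
#* Trigger the long-running job
#* @get /run-job
function() {
  # ... hours-long computation here ...
  list(status = "done")
}
```

with a `cron.yaml` entry pointing the scheduler at that endpoint:

```yaml
# cron.yaml (sketch)
cron:
- description: scheduled long-running R job (illustrative)
  url: /run-job
  schedule: every 24 hours
```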

@MarkEdmondson1234

Actually @cboettig, I had a closer look and it's perhaps not suitable for your use case: for flexible environments the minimum number of instances is 1, so it's not possible to scale to 0 (i.e. no charge). You'd need to pay for at least one instance running 24/7, which would be around $30, so it's probably better to use a static VM for that as it's cheaper.


cboettig commented Sep 4, 2017

@MarkEdmondson1234 Thanks for the follow-up. Yeah, that makes sense; it seems the standard use case is to scale your app up and down in response to demand while maintaining 100% uptime, rather than on-demand computing.
