Optimize the cache mechanism #23

pitag-ha · 2022-05-10T13:06:06Z

As described in https://github.com/tarides/ocaml-platform/issues/13, our first approach to populating our cache is via a sandboxing mechanism. While that sandboxing approach has many advantages (also described on that issue), it also has at least one disadvantage: it's very slow. So, once our cache mechanism is fully implemented (i.e. at least issues https://github.com/tarides/ocaml-platform/issues/13 and https://github.com/tarides/ocaml-platform/issues/11), we'll need to implement an optimization for that.

Three possible approaches

So far we've been discussing using pre-built binaries to optimize our caching mechanism. There are different ways to implement a pre-built binaries optimization, including the following three:

1. We embed the pre-built binaries into our tool and, the first time ocaml-platform is run, it pre-populates our cache with those binaries.
   - upside: good integration of the optimization into the rest of the caching workflow
   - upside: no need for internet connection
   - upside: straightforward in terms of architecture / operating systems (when compiling ocaml-platform for a certain architecture and operating system, we embed only the corresponding tool binaries)
   - downside: (probably significant) increase in size of the ocaml-platform binary
   - downside: bad update story when the platform tools get updated
2. A similar approach to 1. (we pre-populate our cache the first time ocaml-platform is run), but instead of embedding the pre-built binaries into ocaml-platform, we keep them somewhere online, in a repo or similar.
   - upside: good integration of the optimization into the rest of the caching workflow
   - downside: need for internet connection
   - upside: we need to take care of architecture and operating system compatibility
   - upside: no increase in size of the ocaml-platform binary
   - neutral: we could implement an automatic update story when the platform tools get updated
3. In addition to our local cache, we add another cache in the following way: we have an online cache somewhere with the pre-built binaries; ocaml-platform first looks at the online cache and installs all tools found there, then looks at the local cache and does the rest. (One possible way to implement that approach would be to write an online opam repo with the meta-information for all pre-built tools and add that opam repo to the user's opam state. That would be similar to option 1. in https://github.com/tarides/ocaml-platform/issues/11, but remote.)
   - downside: instead of integrating the optimization into the rest of the caching workflow, we'd add a second cache
   - downside: need for internet connection
   - neutral: we need to take care of architecture and operating system compatibility (which will probably be simple if we follow the opam repo approach)
   - upside: no increase in size of the ocaml-platform binary
   - upside: without doing anything from the installer's point of view (of course, the CI has to work for the update story), we have a good update story when the platform tools get updated
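To make the lookup order of approach 3 concrete, here is a minimal sketch. It assumes a cache can be modelled as a function from a tool name to an optional binary path; the names `install`, `online`, `local` and `build` are hypothetical and do not come from ocaml-platform.

```ocaml
(* Hypothetical sketch of approach 3: try the online cache first, then the
   local cache, and only then fall back to a (slow) sandboxed build.
   None of these names are taken from the ocaml-platform code base. *)

type source = Online_cache | Local_cache | Sandbox_build

(* A cache is modelled as a lookup from a tool name to a binary path. *)
type cache = string -> string option

let install ~(online : cache) ~(local : cache) ~(build : string -> string) tool =
  match online tool with
  | Some path -> (Online_cache, path)
  | None -> (
      match local tool with
      | Some path -> (Local_cache, path)
      | None -> (Sandbox_build, build tool))

(* Toy caches backed by association lists, for illustration only. *)
let of_list entries tool = List.assoc_opt tool entries

let online = of_list [ ("merlin", "remote/merlin") ]
let local = of_list [ ("odoc", "local/odoc") ]
let build tool = "built/" ^ tool
```

With these toy caches, `install ~online ~local ~build "ocamlformat"` falls through both caches and ends in the sandboxed build, which is the slow path this issue is trying to avoid.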

Have I missed any upsides or downsides? And/or are there more approaches anyone would like to discuss?

(btw, if anyone feels down for making a table out of this bad bullet point overview, don't hesitate! :))

Julow · 2022-05-10T17:04:50Z

> upside: no need for internet connection
> downside: need for internet connection

I'm not sure this is relevant. At the point of installing a tool we have already created a switch (requires internet), and the step just after is to install the dependencies of the project (requires internet).
Even if we don't do that, the user used the internet to download the installer and will use it again soon after to download some libraries.

For 1.:

> downside: (probably significant) increase in size of the ocaml-platform binary

On my machine, Merlin is 12 MB, Odoc is 16 MB and OCamlformat is 24 MB. Let's say we want to build Merlin on 5 different versions of OCaml, on 5 different architectures: that's 300 MB for a single tool.
The installer, which is to be downloaded at once, would be bigger than a switch, which is gradually filled with results of local compilations.
It's an understatement to say that the update story is bad.
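The size estimate above can be reproduced with a short calculation. The sketch below uses the sizes quoted in this comment; applying the same 5 x 5 build matrix to Odoc and OCamlformat is an assumption made here for illustration, since the comment only works out the Merlin case.

```ocaml
(* Binary sizes in MB, as quoted in the comment. *)
let tool_sizes_mb = [ ("merlin", 12); ("odoc", 16); ("ocamlformat", 24) ]

(* Build matrix from the comment: 5 OCaml versions x 5 architectures.
   Using the same matrix for every tool is an assumption. *)
let ocaml_versions = 5
let architectures = 5

(* Size of all embedded variants of one tool. *)
let embedded_size_mb size = size * ocaml_versions * architectures

let total_mb =
  List.fold_left (fun acc (_, size) -> acc + embedded_size_mb size) 0 tool_sizes_mb

let () =
  List.iter
    (fun (tool, size) -> Printf.printf "%s: %d MB\n" tool (embedded_size_mb size))
    tool_sizes_mb;
  Printf.printf "total: %d MB\n" total_mb
```

For Merlin this gives 12 x 5 x 5 = 300 MB, matching the figure above.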

For 2.: You talk about a remote repo but that's actually the point 3. Otherwise, point 2. is exactly #11

> upside: we need to take care of architecture and operating system compatibility

I guess you meant "downside"? We'll need to differentiate several variants of the same package anyway (e.g. by which version of OCaml it has been built with), so it's not much more work on top of that.
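One way to picture "several variants of the same package" is a key that already carries the OCaml version, so that architecture and operating system are just two more fields. This is only a sketch: the record, its field names, the naming scheme and the example values are all hypothetical and not taken from ocaml-platform.

```ocaml
(* Hypothetical key identifying one pre-built variant of a tool. *)
type variant = {
  tool : string;
  tool_version : string;
  ocaml_version : string; (* needed anyway, per the comment above *)
  arch : string;          (* extra field for architecture *)
  os : string;            (* extra field for operating system *)
}

(* Illustrative naming scheme for a cache entry; not an existing convention. *)
let variant_name v =
  Printf.sprintf "%s.%s+ocaml%s-%s-%s" v.tool v.tool_version v.ocaml_version
    v.arch v.os

(* Example values are made up for illustration. *)
let example =
  {
    tool = "merlin";
    tool_version = "1.0";
    ocaml_version = "4.14.0";
    arch = "x86_64";
    os = "linux";
  }
```

Two variants are then the same cache entry only if every field matches, which is the extra bookkeeping discussed above.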

panglesd · 2022-05-27T14:18:23Z

For the first alpha release, the cache is a local repo that is populated whenever a new package is built. So there is no "pre-populated" cache.

The next stage, I think, would be to add an online repo containing pre-built static binaries.

pitag-ha · 2022-05-31T13:49:23Z

> For 2.: You talk about a remote repo but that's actually the point 3. Otherwise, point 2. is exactly https://github.com/tarides/ocaml-platform/issues/11

I think I haven't explained very well what I meant in point 2. But unless you want me to explain it in more detail (in that case, please let me know!), I'd say let's just go for point 3, which I think we all prefer by now.

pitag-ha · 2022-05-31T13:50:31Z

> The next stage, I think, would be to add an online repo containing pre-built static binaries.

Yes, that's point 3, right? I also think that's best (mostly because of the very good update story).

Julow mentioned this issue May 10, 2022

Use a cache when installing the tools #11

Closed
