Optimize the cache mechanism #23

Open
pitag-ha opened this issue May 10, 2022 · 4 comments

Comments

@pitag-ha
Member

As described in https://github.com/tarides/ocaml-platform/issues/13, our first approach to populating our cache is via a sandboxing mechanism. While that sandboxing approach has many advantages (also described in that issue), it has at least one disadvantage: it's very slow. So, once our cache mechanism is fully implemented (i.e. at least issues https://github.com/tarides/ocaml-platform/issues/13 and https://github.com/tarides/ocaml-platform/issues/11), we'll need to implement an optimization for it.

Three possible approaches

So far we've been discussing using pre-built binaries to optimize our caching mechanism. There are different ways to realize a pre-built binaries optimization, among them the following three:

  1. We embed the pre-built binaries into our tool and the first time ocaml-platform is run, it pre-populates our cache with those binaries.

    • upside: good integration of the optimization into the rest of the caching workflow
    • upside: no need for internet connection
    • upside: straightforward in terms of architecture / operating systems (when compiling ocaml-platform for a certain architecture and operating system, we embed only the corresponding tool binaries)
    • downside: (probably significant) increase in size of the ocaml-platform binary
    • downside: bad update story when the platform tools get updated
  2. A similar approach to 1. (we pre-populate our cache the first time ocaml-platform is run), but instead of embedding the pre-built binaries into ocaml-platform, we keep them somewhere online, in some repo or similar.

    • upside: good integration of the optimization into the rest of the caching workflow
    • downside: need for internet connection
    • upside: we need to take care of architecture and operating system compatibility
    • upside: no increase in size of the ocaml-platform binary
    • neutral: we could implement an automatic update story when the platform tools get updated
  3. In addition to our local cache, we add another cache in the following way: we have an online cache somewhere with the pre-built binaries, and ocaml-platform first checks the online cache and installs all tools found there; then it checks the local cache and does the rest. (One possible way to implement that approach would be to write an online opam repo with the meta-information for all pre-built tools and add that opam repo to the user's opam state. That would be similar to option 1. in https://github.com/tarides/ocaml-platform/issues/11, but remote.)

    • downside: instead of integrating the optimization into the rest of the caching workflow, we'd add a second cache
    • downside: need for internet connection
    • neutral: we need to take care of architecture and operating system compatibility (which will probably be simple if we follow the opam repo approach)
    • upside: no increase in size of the ocaml-platform binary
    • upside: without doing anything on the installer side (of course, the CI has to work for the update story), we have a good update story when the platform tools get updated
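
To make the lookup order in approach 3 concrete, here is a minimal Python sketch. It is not how ocaml-platform is implemented: `resolve_tool`, the dict-based caches and the `build_in_sandbox` callback are hypothetical names invented for illustration, and the real caches would be opam repos rather than in-memory dicts.

```python
from typing import Callable, Dict, Tuple


def resolve_tool(
    tool: str,
    online_cache: Dict[str, bytes],
    local_cache: Dict[str, bytes],
    build_in_sandbox: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Return (source, binary) for `tool`, trying the cheapest source first.

    All names here are hypothetical; the caches are modelled as plain dicts.
    """
    # 1. Pre-built binary from the online cache, if one exists.
    if tool in online_cache:
        return "online", online_cache[tool]
    # 2. Otherwise, a binary built earlier on this machine.
    if tool in local_cache:
        return "local", local_cache[tool]
    # 3. Last resort: the slow sandboxed build, stored locally for next time.
    binary = build_in_sandbox(tool)
    local_cache[tool] = binary
    return "built", binary
```

With this ordering, the slow sandboxed build only runs for tools that are in neither cache, and a tool built once is then served from the local cache.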

Have I missed any upsides or downsides? And/or are there more approaches anyone would like to discuss?

(btw, if anyone is up for turning this rough bullet point overview into a table, don't hesitate! :))

@Julow
Member

Julow commented May 10, 2022

upside: no need for internet connection
downside: need for internet connection

I'm not sure this is relevant. At the point of installing a tool, we have already created a switch (requires internet), and the step just after is to install the dependencies of the project (requires internet).
Even if we don't do that, the user used the internet to download the installer and will use it again soon after to download some libraries.

For 1.:

downside: (probably significant) increase in size of the ocaml-platform binary

On my machine, Merlin is 12 MB, Odoc is 16 MB and OCamlformat is 24 MB. Let's say we want to build Merlin on 5 different versions of OCaml, on 5 different architectures: that's 300 MB for a single tool.
The installer, which is to be downloaded at once, would be bigger than a switch, which is gradually filled with results of local compilations.
It's an understatement to say that the update story is bad.
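
The estimate above can be reproduced with a few lines of Python. The tool sizes and the 5 x 5 build matrix are the figures quoted in this comment; the function name and the total across all three tools are only an illustrative extension of the same arithmetic.

```python
# Sizes in MB as measured on one machine (figures from the comment above).
TOOL_SIZES_MB = {"merlin": 12, "odoc": 16, "ocamlformat": 24}
OCAML_VERSIONS = 5   # assumed number of supported OCaml versions
ARCHITECTURES = 5    # assumed number of supported architectures


def embedded_size_mb(tool_size_mb: int,
                     versions: int = OCAML_VERSIONS,
                     architectures: int = ARCHITECTURES) -> int:
    """Size of one tool when every (OCaml version, architecture) variant is shipped."""
    return tool_size_mb * versions * architectures


merlin_total = embedded_size_mb(TOOL_SIZES_MB["merlin"])
all_tools_total = sum(embedded_size_mb(size) for size in TOOL_SIZES_MB.values())

print(merlin_total)     # 12 * 5 * 5 = 300
print(all_tools_total)  # (12 + 16 + 24) * 25 = 1300
```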

For 2.: You talk about a remote repo, but that's actually point 3. Otherwise, point 2. is exactly #11

upside: we need to take care of architecture and operating system compatibility

I guess you meant "downside"? We'll need to differentiate several variants of the same package anyway (e.g. by the version of OCaml it has been built with), so it's not much more work on top of that.
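
As a rough illustration of why architecture and OS add little work once variants are already distinguished by OCaml version, here is a hedged Python sketch of a variant key. The `Variant` class, its field names, the key format and the version numbers in the usage lines are all hypothetical; nothing here reflects how ocaml-platform or opam actually name packages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One pre-built binary of a tool (hypothetical model, for illustration only)."""
    tool: str
    tool_version: str
    ocaml_version: str
    arch: str
    os: str

    def key(self) -> str:
        # Architecture and OS are just two more fields next to the OCaml version.
        return (f"{self.tool}.{self.tool_version}"
                f"+ocaml{self.ocaml_version}-{self.arch}-{self.os}")


# Illustrative values only.
a = Variant("merlin", "4.5", "4.14.0", "x86_64", "linux")
b = Variant("merlin", "4.5", "4.14.0", "arm64", "macos")
cache = {a: b"binary-for-a", b: b"binary-for-b"}  # frozen dataclasses are hashable
```

Two builds of the same tool and OCaml version for different platforms get distinct keys, so they can live side by side in the same cache.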

@panglesd
Contributor

For the first alpha release, the cache is a local repo that is populated whenever a new package is built. So there is no "pre-populated" cache.

The next stage, I think, would be to add an online repo containing pre-built static binaries.

@pitag-ha
Member Author

For 2.: You talk about a remote repo, but that's actually point 3. Otherwise, point 2. is exactly https://github.com/tarides/ocaml-platform/issues/11

I think I haven't explained very well what I meant in point 2. But unless you want me to explain it in more detail (in that case, please let me know!), I'd say let's just go for point 3, which I think we all prefer by now.

@pitag-ha
Member Author

The next stage, I think, would be to add an online repo containing pre-built static binaries.

Yes, that's point 3, right? I also think that's best (mostly because of the very good update story).
