Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Dockerfile and add some dependencies #44

Open
wants to merge 7 commits into
base: 7.1-stretch
Choose a base branch
from

Conversation

andrewnicols
Copy link
Contributor

@andrewnicols andrewnicols commented Feb 26, 2019

This change is primarily about:

  1. adding NVM and NodeJS to the image
  2. adding Git to the image

During my changes I discovered some issues relating to our Dockerfile, and some other limitations. Notably:

  1. We shouldn't be creating the directories in the Dockerfile really - they're more a start-up thing
  2. We were adding /root/ to / but this was causing cache invalidation. Even a small change after this point required everything to be rebuilt.
  3. There were some caches not being emptied after extension installations.
  4. It would be super helpful to provide developers a way of adding further customisations on boot.

To solve these, I have:

  • Created a docker-entrypoint-initdb.d directory and written a new entrypoint file (moodle-php-entrypoint) which makes use of it. This is an extremely common way of achieving this, and uses the defacto standard nomenclature. The example is largely based on that used in Postgres.
  • Added NVM to the image using lts/carbon (which is pre-installed). On initial startup of the image, NVM will install and use whichever version is specified via its NODE_VERSION environment variable. This startup node usage occurs in the new entrypoint file.
  • Moved the directory creations to the entrypoint file.
  • Added git to the image
  • Split out the ADD /root/ / command into separate ADD commands for each script.
  • Split out the MSSQL installation into a separate file with its own RUN/ADD commands. Whilst this does add a layer to the image, it is helpful for building the image because the MSSQL component has been a little volatile (currently packages.microsoft.com is corrupt so won't even build)

I was hoping to make these changes in separate pull requests, but it's ended up easier doing it this way.

As I say, I'm currently unable to complete building of this image because of a bug with the Microsoft Debian repo. I'd like to get some initial feedback on these changes though in the mean time.

This will also need branches for 54, 70, 72, and 73.

@andrewnicols andrewnicols self-assigned this Feb 26, 2019
@andrewnicols andrewnicols requested a review from stronk7 February 26, 2019 00:43
@andrewnicols andrewnicols force-pushed the 71_stretch_local_ci branch 3 times, most recently from 084e1bf to 8994b2e Compare February 26, 2019 08:15
@andrewnicols
Copy link
Contributor Author

This is now looking good for 71.
If we're happy with this approach, I'll port to other branches.

Dockerfile Outdated Show resolved Hide resolved
Dockerfile Show resolved Hide resolved
Dockerfile Show resolved Hide resolved
@scara
Copy link
Contributor

scara commented Feb 26, 2019

Hi Andrew,
I've gone through all of your commits and found a couple of really trivial issues: very nice work, indeed! 👍 😄

At the end, the only downside of this change IMHO is an increase of the size due to node and git: while I understand the role of node here for CI purposes, I cannot imagine the reason why Moodle CI should require even git since we are used to mount the code from outside the image; if we still need to get code within the running container via a git repo hosted e.g. via GitHub we could download it as an archive e.g. tar.gz, optionally defining the hash. Thoughts?

HTH,
Matteo

@stronk7
Copy link
Member

stronk7 commented Feb 26, 2019

Unless I'm wrong there are also some CIs needing to perform git operations. Namely:

  • the db comparator checker. That one needs to be able to checkout, say, 32, 33, 34, 35, 36 and master, install, upgrade to master... and then compare databases, table by table, field by field. While surely that's possible downloading, doing those switches in git sound like a better alternative.
  • and specially, there is CiBoT, that needs to be able to merge any external branch, both against moodle.git and integration.git (any branch), to be able to verify merge-ability, to generate a final .patch file, to calculate changed lines (some reports are filtered based on those lines...). And all that stuff hardly can be replaced by any download. We need git's muscle to to so.

Those are the 2 main reasons I can imagine (specially the 2nd one), requiring to have git at hand. Ciao :-)

@andrewnicols
Copy link
Contributor Author

I cannot imagine the reason why Moodle CI should require even git since we are used to mount the code from outside the image; if we still need to get code within the running container via a git repo hosted e.g. via GitHub we could download it as an archive e.g. tar.gz, optionally defining the hash.

Hi Matteo,

At this point it's specifically for some of our CI jobs. I'm trying to remove git from some of those jobs, but some of them will continue to require it.

For example, some of the CI jobs use git to determine which files have changed between commits. One such job is the php-lint job.
The other place is for testing of upgrades from different Moodle versions.

@andrewnicols
Copy link
Contributor Author

I've updated the branch to fix up those minor issues pointed out by Matteo.
In the case of the entrypoint, I have mimicked what the upstream PHP image does -- place the entrypoint in /usr/local/bin/ and call it without path.

I've also added an nvm wrapper which will source /usr/local/nvm/nvm.sh and execute the nvm function. This allows you to run nvm using docker exec:

docker exec -t www nvm install lts/dubnium

I've also made changes to ensure that it can be run properly by the www-data user (or any other user). Note: Only a privileged user can nvm install, but any user can nvm use or run node.

@scara
Copy link
Contributor

scara commented Feb 27, 2019

Hi @andrewnicols and @stronk7,
thanks for your replies:

CI jobs

What I'm reading behind your lines is that your CI Hosts should be as simple as possible i.e OS + docker engine i.e. CI jobs requirements should be satisfied by the image running/being the target of the job. I'm far from having a clear picture of all the jobs in Moodle CI but wondering if some of the jobs should be performed using a different image e.g. the merge checks should be performed by a job invoking an image providing git - I'm used to play with bash functions e.g. https://hub.docker.com/r/alpine/git/#optional-usage-1, which is nice for any required command but providing someway git in the CI Host so if the goal is to keep the CI Host surface as small as possible for security reasons maybe it's not nice as I'm assuming w/ my arguments.

At the end mine looks like a speculation when coming on discussing how running the db comparator checker.
Forgive my comments about the size increasing 😉, unless you're evaluating to provide two images, moodle-php-apache, w/o node and git, and moodle-php-apache-ci, which actually adds node and git. Maybe in the future, when someone will complain with better arguments, since it will impact on https://github.com/moodlehq/moodle-docker too.

it can be run properly by the www-data user (or any other user)

I suggest to proceed on the or any other user in a different issue since it requires IMHO some more changes and I'm not sure if the upstream image is ready too unless playing some magic with the users map in the image.
See an example of what I mean in a post of mine in the Moodle Community, https://moodle.org/mod/forum/discuss.php?d=351402#p1418352 => CentOS/CentOS-Dockerfiles#130 => https://github.com/mohammedzee1000/CentOS-Dockerfiles/tree/18e29724bff61b9acba0e34d74db5a71a288c865/moodle/centos7.
Key files:

Shortly, any change on file permissions should use $USER_ID and $GROUP_ID and you should test it by running it via -u <any user id, local to the host i.e. not limited to www-data which should not be in the CI Host unless the CI Host should be exposed someway via HTTP(S)>

Edit: upstream image way of supporting an arbitrary user, https://hub.docker.com/_/php/#running-as-an-arbitrary-user => https://github.com/docker-library/php/blob/640a30e8ff27b1ad7523a212522472fda84d56ff/7.1/stretch/apache/Dockerfile#L42 and https://github.com/docker-library/php/blob/640a30e8ff27b1ad7523a212522472fda84d56ff/7.1/stretch/apache/Dockerfile#L60. I'll post my thoughts in #1 after this PR will land - shortly, at least 80 should not be the port Apache should bind to. I'll do some tests on the upstream image.

HTH,
Matteo

@andrewnicols
Copy link
Contributor Author

CI jobs

What I'm reading behind your lines is that your CI Hosts should be as simple as possible i.e OS + docker engine i.e. CI jobs requirements should be satisfied by the image running/being the target of the job.

Correct. Our CI infrastructure is based around Jenkins and makes use of dumb, disposable, nodes which essentially have docker-engine, Java, and a Jenkins client installed. There are a few other tools installed to orchestrate some of the jobs.

I'm far from having a clear picture of all the jobs in Moodle CI but wondering if some of the jobs should be performed using a different image e.g. the merge checks should be performed by a job invoking an image providing git - I'm used to play with bash functions e.g. https://hub.docker.com/r/alpine/git/#optional-usage-1, which is nice for any required command but providing someway git in the CI Host so if the goal is to keep the CI Host surface as small as possible for security reasons maybe it's not nice as I'm assuming w/ my arguments.

I've been testing some of my changes at https://ci.moodle.org/view/Testing/job/DEV.98%20-%20Testing%20standard%20jobs/ recently.
Some of those jobs run on a slim image andrewnicols/moodle-ci (poorly named and will be improved) which is the bash:5.0 official image with the GNU version of grep installed.

I'm looking at exactly this use-case, but some of our jobs really do need PHP + Git, and some of our jobs really do need PHP + Grunt + Git.

Ultimately the moodle-php-apache image is intended to aid developers as an environment in which they can run tests. It is not intended to be a tiny image, but a complete image.

At the end mine looks like a speculation when coming on discussing how running the db comparator checker.
Forgive my comments about the size increasing 😉, unless you're evaluating to provide two images, moodle-php-apache, w/o node and git, and moodle-php-apache-ci, which actually adds node and git. Maybe in the future, when someone will complain with better arguments, since it will impact on https://github.com/moodlehq/moodle-docker too.

I don't think we plan to go down this route at the moment. It's something I was considering, but ultimately it adds too much complexity for 99% of use-cases.

it can be run properly by the www-data user (or any other user)

I suggest to proceed on the or any other user in a different issue since it requires IMHO some more changes and I'm not sure if the upstream image is ready too unless playing some magic with the users map in the image.

It is intended to be suitable for that, yes.
Note: The image is expected to be started as root, but execed as other users.

Shortly, any change on file permissions should use $USER_ID and $GROUP_ID and you should test it by running it via -u <any user id, local to the host i.e. not limited to www-data which should not be in the CI Host unless the CI Host should be exposed someway via HTTP(S)>

Edit: upstream image way of supporting an arbitrary user, https://hub.docker.com/_/php/#running-as-an-arbitrary-user => https://github.com/docker-library/php/blob/640a30e8ff27b1ad7523a212522472fda84d56ff/7.1/stretch/apache/Dockerfile#L42 and https://github.com/docker-library/php/blob/640a30e8ff27b1ad7523a212522472fda84d56ff/7.1/stretch/apache/Dockerfile#L60. I'll post my thoughts in #1 after this PR will land - shortly, at least 80 should not be the port Apache should bind to. I'll do some tests on the upstream image.

We call the existing Apache entrypoint and cmd, so we should support them in the same way the PHP does.

@danpoltawski
Copy link
Contributor

Just to give my (looking from afar) two cents - I like the idea of this being acheived by a fat and thin image variant rather than enforcing the fat image on every non-moodle-ci user.

@andrewnicols
Copy link
Contributor Author

Just to give my (looking from afar) two cents - I like the idea of this being acheived by a fat and thin image variant rather than enforcing the fat image on every non-moodle-ci user.

That was my original thought, but what do you include in each?

Does the thin variant need OCI + MSSQL given so few people actually use it?
Do we go for a variants:

  • apache-7.3
  • apache-7.3-mssql
  • apache-7.3-oci
  • apache-7.3-ci

Do we ever need a CI version with mssql + ocI?

@andrewnicols
Copy link
Contributor Author

Just to note that I've started work on a complete rewrite branch: andrewnicols@complete_rewrite

This uses an update.sh in the same fashion as the php library, and moves all content to the Dockerfile (as per the clarity guidelines).

This generates images in the format:
[version]-[debianrelease]-[db]-[variant]
For example:
7.1-stretch-pgsql-base

There are version for 7.1, 7.2, 7.3 of PHP and currently it supports the stretch release.
It builds separate images for pgsql, mssql, and oci
Variants are base and ci

My thinking is to then create a new official-images repo in the same vein as http://github.com/docker-library/official-images/ with an entry for the php image similar to https://github.com/docker-library/official-images/tree/master/library

We will alias the [version]-stretch-pgsql-base image to [version], [version]-stretch, [version]-pgsql, [version]-mysql, [version]-mariadb.
Likewise the pgsql-ci variant will be tagged with [version]-ci] etc.

This will give us a total of 6 unique images per PHP version, but with 5 of those being smaller layers on top of the others. We'll have tags galore too.

@scara
Copy link
Contributor

scara commented Feb 28, 2019

Hi @andrewnicols,
things are evolving fast from my last comment 👍.

I like your new proposal, even if it impacts on https://github.com/moodlehq/moodle-docker too.
Questions:

  1. Should we really take care of DB drivers splitting? AFAIK anyone, both DEV&CI, requires the whole of the supported DBs to test a change while DEV could work with just the preferred DB which is a development shortcut but at the end the code should be tested on each of the supported DB servers. My proposal: deep DB separation, including pgsql from mysql or nothing i.e. one image including the setup for all of them, the current setup - DEV will be IMHO happy with an all-in-one approach: they should change just the config file since DEV hosts the code outside the container (guessing, CI inside)
  2. What the ci variant should include? IMHO both node and git
    1. For the node part npm caching should be taken seriously to reduce the time to setup node modules e.g. using NPM_CONFIG_PREFIX, mkdir of it, exposing it as a VOLUME and calling everything as global, -g, or via npm ci and not npm install

HTH,
Matteo

@andrewnicols
Copy link
Contributor Author

andrewnicols commented Feb 28, 2019 via email

@scara
Copy link
Contributor

scara commented Feb 28, 2019

When it comes to the Docker scripts you specify a single dB

Yep, if DEV uses a fully containerized setup then she/he is running a "compose" of it i.e. fixing the DB server type too.
In the past I was used to have my own DBs and playing w/ the config file using just moodle-php-apache - i.e. w/o the need to stop it and starting another one - but I must admit that more than recently I play directly from moodle-docker including the DB i.e. you're right 😉.

Matteo

@andrewnicols
Copy link
Contributor Author

What the ci variant should include? IMHO both node and git

I had intended to include Node in the base image. Node is a dev tool, whilst we expect to only use git in our CI infrastructure.

For the node part npm caching should be taken seriously to reduce the time to setup node modules e.g. using NPM_CONFIG_PREFIX, mkdir of it, exposing it as a VOLUME and calling everything as global, -g, or via npm ci and not npm install

What is NPM_CONFIG_PREFIX. I can't see it listed in https://github.com/npm/cli/blob/latest/doc/misc/npm-config.md

Generally speaking we only support one version of Node for each Moodle version. Most people are running a single version per Docker container so I'm not sure that we really need to use a prefix. Could you perhaps explain a bit more?

@scara
Copy link
Contributor

scara commented Feb 28, 2019

I had intended to include Node in the base image. Node is a dev tool

Mmhh... are you confident that a DEV will actually use npm exposed through the container?
Maybe in Moodle HQ - i.e. it is enough to adopt it as a guideline - but I guess that a Front-End developer usually installs it directly on its dev machine since, depending on the IDE, calling npm would not be feasible if not locally installed.
I'm not saying that it is difficult to do: just another food for thought.

What is NPM_CONFIG_PREFIX

A way to define where packages will be globally installed: https://github.com/nodejs/docker-node/blob/master/docs/BestPractices.md#global-npm-dependencies. If you have a fixed folder, you could export it outside the cointainer, on the host, and sharing it to the next running instance of the image at a next run.
Note: node is the user used to run nodejs app, ignore it.

HTH,
Matteo

@stronk7
Copy link
Member

stronk7 commented May 26, 2019

Hi,

so, build changes aside (move from current branches to a directory based structure seems the way to go)... I'm more interested into the final hierarchy of the images, more exactly:

  1. One image to root all DBs vs one image per dbtype. I'm not 100% sold here. I agree that, if you're using moodle-docker (compose), then you already have picked one of the databases. In the other side... as @scara comments, doing that we'll keep out from the equation to people that just uses this image against multiple databases by manually or automatically changing {{config.php}}. So I must confess that I'm not 100% convinced. Maybe we can support both without much effort too... grrr choices!

  2. One unique image to be the "complete" one, containing all the stuff vs having some easy variants, based in current "base" ones, but adding, say "-debug" (for installing phpdebug/xdebug/profiling*), or "-full", based in the "-debug" ones, but adding "git, nvm/nodejs...".

  • About profiling... maybe it should be part of the "base" images, not sure either.

Maybe the problem with all this is that combinations are endless, heh. And many of them are equally "correct/appropriate". But really, if we want to advance with all this... we should make a decision and then, simply, go for it.

Following a previous comment there... how would be:
´´´´
[version]-[debianrelease]-[db]-[variant]
´´´´

  • With "db" being one of mysql, postgres, sqlsrv, oracle and, also, "alldb" ?
  • With variant being one of "base", "debug", "full" ?
  • With default "x.y" tags pointing to "[x.y]-[latestdebianrelease]-[allbd]-[base]" ? (I've picked this one because it's the "BC compatible" with current ones).

Just making a choice and proposing it, I'm not specially strong on it. Of course note that all this will lead to a huge proliferation of images per PHP version: debian releases (1-2) * db (5) * variant (3) = up to 30 images! And of course moodle-docker will need a reorganization to be able to trigget the corresponding ones (that shouldn't be hard).

As said, just aiming to get the organization planned, so it's clear for everybody before the internal changes happen.

Thoughts?

@septatrix
Copy link

I definitely agree that this should not be added to the default image. That image should only contain the minimal requirements for a working runtime. Instead a dev variant should be created.

@andrewnicols
Copy link
Contributor Author

@septatrix, this is the developer image. It is not intended to be used in any kind of production environment and never has been

@stronk7
Copy link
Member

stronk7 commented Mar 18, 2023

Hi @andrewnicols (et all),

I think that we can close this old PR now and continue advancing in recent issues and PRs... what do you think?

(just performing a quick triaging of old stuff)

Ciao :-)

@scara
Copy link
Contributor

scara commented Mar 19, 2023

Hi @stronk7,

I think that we can close this old PR now and continue advancing in recent issues and PRs... what do you think?

+1

I think there is here some valuable work that we could re-evaluate with a new issue, compared to the current upstream (ff023e2) and, potentially, the outcome from #169.

HTH,
Matteo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants