Docker Proof-of-Concept #974

Closed
wants to merge 1 commit into from

Conversation

@andreynering (Member) commented Dec 31, 2022

NOTE: This is incomplete work. It's a proof-of-concept that is going well, but the actual behavior and schema will keep changing until it reaches master, so avoid using this except for testing purposes.

Feedback is welcome on how you expect this to work 🙂.

Implementation

Credits to @no0dles for the first PoC on #285 and his tool hammerkit.

In his implementation, @no0dles used the official Docker client library for Go. That sounds reasonable, but I decided to try a different approach: I'm calling docker run ... directly. There are a couple of reasons this may be interesting:

  1. It adds no additional dependencies to the tool.
  2. It allows us to add a flags prop to the YAML so the user can set specific Docker flags (flags: [--cpus=1, --pull=always]). This would be harder to implement with the Go client, which would potentially require a dedicated prop for each setting. The list of available flags is here.
  3. Avoiding libraries makes the implementation a bit more generic, opening room to potentially add support for other container systems in the future, like Podman, without too much additional work.
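
Purely as an illustration of point 2 (only the flags prop comes from the text above; the container and image prop names are my guesses, not a final schema), a task could look roughly like this:

version: '3'

tasks:
  build:
    # Guessed schema: with the "docker run" approach, this could translate to roughly
    #   docker run --cpus=1 --pull=always golang:1.21 go build -v ./...
    container:
      image: golang:1.21
      flags: [--cpus=1, --pull=always]
    cmds:
      - go build -v ./...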

TODO

  • Add props for highly common settings (see the sketch after this list).
    • Mounting directories?
    • Exposing ports?
    • We don't need to cover all of them, because the generic flags make this possible as well.
  • Think about how to handle the dir prop and its relation to the mounts prop.
  • What to do about method/sources/generates? Try to run them inside the container? Run them outside? Forbid their usage when the task is a container?
    • We could maybe forbid usage in the first release and figure out the best behavior later.
  • Dynamic variables are still being computed outside the container. Is that OK?
  • Make the implementation more generic (change to an interface) to allow Podman/others in the future?
  • Add a global container prop, so it's possible to set all tasks to run in a given container?
    • Not sure how useful that would be. It's probably better to wait for feedback before doing this.
  • Write tests
    • These tests should be behind a Go flag so they won't run by default (but should ideally run on CI).
  • Write documentation
    • Usage
    • API
    • JSON Schema
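
As a rough sketch of the first TODO item above (the mounts and ports prop names are invented; only the generic flags idea appears earlier in this PR), dedicated props for common settings could coexist with the generic flags escape hatch:

version: '3'

tasks:
  serve:
    container:
      image: nginx:alpine
      # Invented props for the most common settings...
      mounts: ['./site:/usr/share/nginx/html:ro']
      ports: ['8080:80']
      # ...while anything else stays expressible through generic flags
      flags: [--pull=always]
    cmds:
      - nginx -g 'daemon off;'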

@andreynering added the "type: feature", "state: wip" and "proposal" labels on Dec 31, 2022
@andreynering self-assigned this on Dec 31, 2022
@endigma commented Dec 31, 2022

Will this support docker-compatibles? (nerdctl, podman, etc...)

@max-sixty

This looks awesome, @andreynering!

From my perspective — albeit limited — if an early version mounted a volume of the local path and allowed passing some flags, that would already be a great start.

If you want to allow flags, it might be interesting to inherit some of the behavior of docker-compose, which has a YAML format for declaratively running containers: https://docs.docker.com/get-started/08_using_compose/. Some of it will overlap with the existing task args (e.g. command), but there's a very long list of configs we'd get for free: https://docs.docker.com/compose/compose-file/#ports. I'm familiar with projects that use docker-compose as a bad Taskfile.yaml.

@andreynering (Member, Author) commented Dec 31, 2022

Will this support docker-compatibles? (nerdctl, podman, etc...)

@endigma I haven't had any contact with Podman yet. I mentioned it because some people have asked for it in the past.

Is Podman's CLI compatible with Docker's? The way I implemented it, Task is just generating a docker run [flags] [image] [command] command and running it. If we could just replace docker with podman, that would be awesome. If a different syntax is needed, I still believe it wouldn't be hard to adapt, though.


EDIT: The website says:

Simply put: alias docker=podman.

So yeah, seems that it would just work. Needs to be tested, though.

https://podman.io/
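
If that holds, the engine could even be made configurable without any schema change; a rough sketch using existing Taskfile features (the CONTAINER_BIN var is made up for this example):

version: '3'

vars:
  # Made-up knob: override from the CLI, e.g. `task build CONTAINER_BIN=podman`
  CONTAINER_BIN: docker

tasks:
  build:
    cmds:
      - '{{.CONTAINER_BIN}} run --rm -v $(pwd):/src -w /src golang:1.21 go build ./...'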

@no0dles commented Dec 31, 2022

Hi @andreynering

Thanks for the credits. Looking forward to seeing support for Docker in Taskfile.

Maybe I can give you some advice/recommendations, since I had a lot of findings during the development of hammerkit.

Use mounts for sources, volumes for generates.

  • Mounts are slow on macOS/Windows and require a lot of CPU. They do not perform well on large directories (for example node_modules).
  • Volumes perform very well on macOS/Windows since they stay in the Docker VM and are not constantly synced.
  • Files in volumes can be imported/exported, but that's an additional step before/after.
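
Expressed as a plain docker run inside a Taskfile, that split could look roughly like this (image, paths and volume name are placeholders):

version: '3'

tasks:
  build:
    cmds:
      # Bind-mount the project, but overlay node_modules with a named volume so
      # the heavy directory stays inside the Docker VM on macOS/Windows
      - >-
        docker run --rm
        -v $(pwd):/app
        -v node_modules_cache:/app/node_modules
        -w /app
        node:20 sh -c "npm ci && npm run build"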

Keep file permissions in mind

  • Permissions for mounts/volumes are no problem on macOS/Windows, but they are on Linux.
  • On Linux, all operations made in the container will write their files with the container user's uid/gid. Lots of Docker images use the root user, so all files exposed by mounts/volumes will have that uid/gid set. The execution user can be changed via CLI/API, but the image may then not behave correctly, since permissions inside the container are missing.
  • My solution was to pass the current uid/gid into the container and, for each mount, use a "chown" command to adjust the permissions inside the container before running the actual task commands.
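
One possible way to express such a uid/gid fix-up from a Taskfile (this is only my sketch, not necessarily how hammerkit does it; image and paths are placeholders):

version: '3'

tasks:
  build:
    cmds:
      # Run as root inside the container, then chown the mounted dir back to the
      # host uid/gid so generated files are not left root-owned on Linux.
      # $(id -u)/$(id -g) are expanded by the host shell before docker runs.
      - >-
        docker run --rm -v $(pwd):/work -w /work golang:1.21
        sh -c "go build -o bin/app . && chown -R $(id -u):$(id -g) /work"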

GitHub Actions on Windows with Docker is difficult

  • I was able to create GitHub Actions that support integration tests on macOS/Linux. macOS is quite slow, but working. For Windows, I was not able to create a stable working solution.

I think the approach using the CLI can work. The only downside will be getting information back from the CLI. Status reporting on longer tasks needs to read from stdout, or some background task needs to inspect the container.

What you should also consider is whether you want to create multiple "docker run" commands per task, or create one container per task and then use "docker exec" for the list of commands. The latter will be way faster and save some resources/prep/post work.
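
For comparison, a minimal sketch of that second variant (the container name and image are made up):

version: '3'

tasks:
  build:
    cmds:
      # Start one long-lived container for the whole task...
      - docker run -d --name task-builder -v $(pwd):/work -w /work golang:1.21 sleep infinity
      # ...run each command in it with docker exec (no per-command startup cost)...
      - docker exec task-builder go vet ./...
      - docker exec task-builder go build ./...
      # ...and remove it when the task finishes, even on failure
      - defer: docker rm -f task-builder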

@andreynering (Member, Author)

Thanks for the tips, @no0dles!

These are issues I didn't think I would face. Good reminder to test on different OSes as well.

@MDr164 commented Jan 6, 2023

My current workflow consists of calling Task from inside Dagger's Go SDK. Maybe Task could leverage that to transparently execute tasks in Docker instead? It already gives you a way to interact with the Docker daemon using Go, so why reinvent the wheel here? Or do I misunderstand the goals of this PR?

@pioio commented May 28, 2023

I'm looking forward to this one landing.

@mgale commented Dec 28, 2023

I decided to take a different approach, thoughts?
#1448

@trulede commented Mar 29, 2024

We use Task extensively with containers. Generally, the approach we take is:

  • set some vars to hold the container image name and version. These might default to values from the environment.
  • define a task that:
    • has the run command: docker run --rm ...
    • maps the current directory into the container (at the expected place, typically the workdir)
    • sets any required ports and environment variables (using task vars to manage defaults)
    • and runs the command that should be executed in the container (again using task vars to manage defaults)
  • use additional tasks to manage additional requirements, such as logging into a container registry, or even building a container on demand. The existing task dependency mechanism works fine here. (A condensed sketch of this pattern follows below.)
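
A condensed sketch of that pattern (image name, port and env values are purely illustrative):

version: '3'

vars:
  IMAGE: builder    # illustrative defaults; these could also come from the environment
  TAG: latest

tasks:
  build:
    cmds:
      - >-
        docker run --rm
        -v $(pwd):/workdir -w /workdir
        -p 5000:5000 -e BUILD_MODE=release
        {{.IMAGE}}:{{.TAG}} go build -v main.go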

Since it works already ... I wonder whether it is really necessary to support Docker directly?

Workflows around containers can be remarkably complex. When you try to formalise that within Task, it's possible to end up with a "sometimes useful, sometimes not" feature.

If I were to extend the schema, it might look like this:

tasks:
  build_minimal:  # the minimal case, use the default container runtime (top-level taskfile)
    container:
      image: foo:bar
      workdir: /working  # it might be possible to extract this from the image (where it is normally set)
    cmds:
      - go build -v -i main.go

  build_extended:  # the extended/full case
    container:
      runtime: docker  # explicitly set to override the default
      image: foo:bar
      workdir: /working  # use the existing dir, but interpret as a map a:b ??
      env:
        foo: bar  # use the existing task env ??
      ports:
        - "5000:5000"
      volumes: {}  # additional volumes
      args: []  # additional args for the container runtime
    cmds:
      - go build -v -i main.go

# there are lots of runtimes, and lots of options, so you have to specify them exactly
container:
  default: docker  # the default runtime
  runtimes:
    docker:
      cmd: docker  # default runtime command
      args: --rm  # default args
      # other container runtime settings ...

but as it is, it looks like this:

tasks:
  build:
    cmds:
      - docker run -it --rm -v$(pwd):/workdir IMAGE:TAG go build -v -i main.go

Well, there might be something to it. For the minimal case, it's a bit easier to work with. Perhaps the problem could be generalised even further ... a generic "runtime", which could be containers (docker, podman, kubernetes), but also things like ssh, terminal or other command interfaces.

The minimal case, presented here, with a container runtime defined elsewhere ... does look like it would be easier to work with.

@trulede commented Mar 30, 2024

This is an example Taskfile that shows how we use Docker. You can see that Docker itself is not the bulk of the work ...

Taskfile.txt

It's also possible to define the docker command as a variable and expand that in certain circumstances (a typical use case is running Task inside a container, where you don't need the docker command).

CMD: '{{if .CI}}{{.CMD}}{{else}}{{.DOCKER}}{{.CMD}}{{end}}'
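
In context, that kind of prefix variable might be used roughly like this (a simplified, illustrative sketch, not the attached Taskfile itself):

version: '3'

vars:
  IMAGE: sim-tools   # illustrative image name
  TAG: latest
  # Run bare in CI (already inside a container), otherwise prefix with docker run
  RUN: '{{if .CI}}{{else}}docker run --rm -v $(pwd):/sim {{.IMAGE}}:{{.TAG}} {{end}}'

tasks:
  patch:
    cmds:
      - '{{.RUN}}patch-network --network network.yaml'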

Also, related to #1568, in that file you can see how we solve the skip problem:

        if test -n '{{.FUNCTION_PATCH}}' || test -n '{{.SIGNAL_PATCH}}'; then
          docker run --rm -v $(pwd):/sim {{.IMAGE}}:{{.TAG}} patch-network \
            --network {{.NETWORK_FILE}} \
            {{.FUNCTION_PATCH}} \
            {{.SIGNAL_PATCH}} \
            {{.REMOVE_UNKNOWN}}
        else
          echo "patch-network skipped"
        fi

... but perhaps more interesting is how we have to deal with optional arguments. Improving that might also be worthwhile.

@pd93 (Member) commented Apr 1, 2024

a generic "runtime", which could be containers (docker, podman, kubernetes), but also things like ssh, terminal or other command interfaces

@trulede I lean towards this instead of integrating with specific tools. I think Task works better if the integration with tools is left to the user. Opening the door to integrations with tools is a slippery slope: it wouldn't be long before there is a feature request for an integration with everyone's favourite container tool or SSH client, etc., and this would be a maintenance nightmare. I haven't actually used Docker myself for years now. I work with k8s and Podman, and I don't fancy adding direct support for all of these.

However, we should absolutely make it as easy as possible to integrate with other tools. There is an open issue (#448) that tracks the kind of functionality you're describing. I think this fulfils the spirit of the initial proposal here and I would personally vote to close this in favour of an implementation of that issue.

For example:

version: 3

tasks:
  # Basic docker example
  docker-stuff:
    runtime: "docker exec my-container"
    cmds:
      - some-command-inside-my-container
      - ...

  # Slightly more interesting example using a variable in the `runtime` clause
  kubernetes-stuff:
    vars:
      POD_NAME: service-1
    runtime: "kubectl exec $(kubectl get pod -o name | grep {{.POD_NAME}})"
    cmds:
      - some-command-inside-my-pod
      - ...

  # Another example, this time using variables in the `runtime` clause for SSH
  ssh-stuff:
    vars:
      USER: pete
      SERVER: my-server
    runtime: "ssh {{.USER}}@{{.SERVER}}"
    cmds:
      - some-command-on-my-server
      - ...

@trulede commented Apr 1, 2024

@pd93
Our use of Task has been positive for 4 reasons: dependency management, variable handling, unmodified shell execution and a schema that allows us to dynamically generate (and then execute) Taskfiles.

What I have discovered is that the more complex operations are beyond the understanding of most of our users. To deal with that, I have wrapped most complex operations in a Taskfile. I think that works better than the suggested 'runtime' because it encapsulates all the complexity, rather than splitting it across several places (even in the same file, novice users generally can't visualise the abstractions behind Task, and come up with "interesting" outcomes).

That makes me wonder: is there another way to achieve both of these outcomes, 'task wrapping' and 'runtime', and perhaps also wrap some other concepts to create a session-like lifecycle for those commands to run/execute in?

Could we think of it this way: the commands of this task, or even a subset of those commands, should run in a given context, and the context itself is also a task (i.e. specified by the task name). The context task itself runs all its commands to set up the context ... then runs the provided commands ... and then runs its own defer commands to clean up/close the context. For a long-running context (e.g. container/telnet etc.) the mechanism mentioned here (#1082) could be used to keep a context running between commands.

When defining such a context task, it might then be necessary to indicate if such a context should be per invocation, or shared (similar to how 'run' determines how often a task runs), but generally it keeps the existing pattern of use.

tasks:
  build:
    context: builder
    cmds:
      - go build -v -i main.go
      - go build -v -i bar.go
  
  build_mixed:
    cmds:
      - touch somefile.go
      - cmd: go build -v -i main.go
        context: builder_session
      - task: some_task

  builder:
    cmds:
      - start_context
      - defer: stop_context

  builder_session:
    cmds:
      - start_context
      - final: stop_context

Generally I agree. These runtimes, and the usage around them, are typically complex and specialised. A generalised approach would likely be more successful than attempting to support a specific tool.

@andreynering (Member, Author)

Closing in favor of #448.

I started this PR as a proof-of-concept, without being sure whether it was a good idea or not. I ended up not having enough time to return to this PR, as there were always smaller things with higher priority.

Considering that #448 would allow a wider range of use-cases, it looks much wiser to focus on that proposal instead of this one.

Thanks everyone for the discussion! 🙂

@andreynering deleted the docker-support branch on April 1, 2024