Task "generates" not working as expected #1741

Closed
ameergituser opened this issue Aug 8, 2024 · 13 comments

Comments

@ameergituser

  • Task version: v3.38.0 (h1:O7kgA6BfwktXHPrheByQO46p3teKtRuq1EpGnFxNzbo=)
  • Operating system: Linux
  • Experiments enabled: None

Files listed under "generates" are not fingerprinted, so they have no effect on whether the task executes its "cmds".

Example:

# https://taskfile.dev

version: '3'
silent: true

vars:
  GREETING: Hello, World!

tasks:
  default:
    cmds:
      - echo "{{.GREETING}}" > ./test.txt
    generates:
      - ./test.txt

Steps to reproduce:

  1. Run task for the first time. It generates the test.txt file.
  2. Run task again. It always executes the cmds.

Am I misunderstanding the generates feature? I assumed that the files in generates would be fingerprinted, and that the cmds would execute again when these files change.

@task-bot added the "state: needs triage" label Aug 8, 2024
@ddouglas

I'm experiencing this same issue.

version: "3"

tasks:
  fbuild:
    cmds:
      - ./.scripts/fbuild.sh
    sources:
      - ./functions/**/*.go
    generates:
      - ./.handlers/**
  dbuild:
    cmds:
      - ./.scripts/dbuild.sh
    generates:
      - ./prolog-core.tar
  terraform:
    deps: [fbuild, dbuild]
    dir: "./terraform"
    cmds:
      - ls -lha

@vmaerten
Member

Hello!
With generates, you need to use sources as well: sources is what the checksum is calculated from.

# https://taskfile.dev/

version: '3'
silent: true

vars:
  GREETING: Hello, World!

tasks:
  default:
    cmds:
      - echo "{{.GREETING}}" > ./test.txt
    sources:
      - ./test.txt
    generates:
      - ./test.txt

Note

Because you are modifying the file in cmds, and the checksum is calculated before cmds are run, you will always get one extra run:

task # checksum is calculated with no test.txt, then the file is created
task # checksum is calculated with test.txt with the content
task # task is up to date 
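
To see why the extra run happens, here is a minimal Python simulation. It is not Task's actual implementation: it only assumes, as described above, that the stored checksum is the one computed before the commands run. The names `checksum`, `run_task`, and `simulate` are hypothetical and exist only for this illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum(paths):
    """Hash file names plus the contents of whichever files exist."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(str(path).encode())
        if path.exists():
            digest.update(path.read_bytes())
    return digest.hexdigest()


def run_task(sources, stored, cmd):
    """Run cmd unless the checksum taken BEFORE the run matches the stored one."""
    current = checksum(sources)
    if stored.get("checksum") == current:
        return False  # up to date, cmd is skipped
    cmd()
    # Assumption: the value saved is the pre-run checksum, so a file that
    # cmd itself rewrote will look "changed" on the next invocation.
    stored["checksum"] = current
    return True


def simulate():
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "test.txt"
        stored = {}

        def write():
            target.write_text("Hello, World!\n")

        # The output file is also listed as the only source.
        return [run_task([target], stored, write) for _ in range(3)]


print(simulate())  # [True, True, False]
```

The second `True` is the extra run: the first run stored a checksum taken while test.txt did not exist yet.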

image

@vmaerten closed this as not planned Aug 12, 2024
@task-bot removed the "state: needs triage" label Aug 12, 2024
@ameergituser
Author

Hi @vmaerten. Thank you for the response. May I ask what the purpose of "generates" currently is?

@vmaerten
Member

@ameergituser it's used to distinguish the "sources" from the "generated binary".
Let's say you have an application to build (the language doesn't matter: Go, TypeScript, Rust, etc.), so you have sources (all the Go files, for example) and a binary produced by the compilation.
You want the task to run if:

  • A change has been made in the sources
    OR
  • The binary is not there
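
The rule above can be sketched in a few lines of Python. This is an illustration of the behaviour as described in this comment, not code taken from Task; `sources_checksum`, `should_run`, and `demo` are hypothetical names, and the file names are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sources_checksum(sources):
    """Hash the names and contents of all source files."""
    digest = hashlib.sha256()
    for path in sorted(sources):
        digest.update(str(path).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def should_run(sources, generates, stored_checksum):
    """Run if a generated file is missing OR the sources checksum changed."""
    if any(not path.exists() for path in generates):
        return True
    return sources_checksum(sources) != stored_checksum


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "main.go"
        binary = Path(tmp) / "app"
        src.write_text("package main\n")
        results = []

        results.append(should_run([src], [binary], None))    # binary missing
        binary.write_bytes(b"\x00")
        stored = sources_checksum([src])
        results.append(should_run([src], [binary], stored))  # nothing changed

        src.write_text("package main // edited\n")
        results.append(should_run([src], [binary], stored))  # source changed

        src.write_text("package main\n")
        binary.write_bytes(b"tampered")
        results.append(should_run([src], [binary], stored))  # only the binary changed
        return results


print(demo())  # [True, False, True, False]
```

Under this rule the content of the generated file is never hashed, so the last case (a binary modified outside the task) does not trigger a run.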

You can find an example of this use case directly in our codebase:

task/Taskfile.yml

Lines 81 to 88 in 51c569e

  sleepit:build:
    desc: Builds the sleepit test helper
    sources:
      - ./cmd/sleepit/**/*.go
    generates:
      - "{{.BIN}}/sleepit"
    cmds:
      - go build -o {{.BIN}}/sleepit{{exeExt}} ./cmd/sleepit

That said, generates is not needed in your example; providing sources alone is enough:

image

Does this answer your question?

@ameergituser
Author

Hi @vmaerten.

Thank you for the explanation. Perhaps the documentation needs an update to better explain the behavior?

My understanding was as follows:

sources means all files used as input to the task, whether text or binary.
generates means all files generated by the task, whether text or binary.

Why add a generates?

My understanding of generates was that it allows fingerprinting at the end of the task, to avoid the "extra" unnecessary run, since we expect the file to be modified by the task itself.

In your explanation you mention it's used to split "sources" and "binary generated". I still don't understand why we need the split, or how it affects the task.

@vanackere

Hi, I was also surprised by the current behaviour, since in my opinion we would also like the task to run if the binary does not match the one that was last generated from the current set of sources. Duplicating the "generates" entries into "sources" works around this to some extent, but it would be nice to have Task handle it automatically and properly if possible. It shouldn't be very hard: Task would simply also keep track of the fingerprint of the generated files after the build. @vmaerten would you be open to such a change?

@tw1nk

tw1nk commented Sep 25, 2024

Not having checksums for the generated output files can be dangerous.

Let's say we have checked-in generated files (from protobuf, for instance), and someone checks in regenerated files plus some extra code that depends on them, but forgets to check in the changed proto sources. Task would then incorrectly conclude that everything is fine and up to date.

If I then change the proto files (not knowing that someone forgot to check in their source changes), new code would be generated without those missing changes, which would most likely cause build failures in subsequent tasks that depend on the generated files.

@tw1nk

tw1nk commented Sep 25, 2024

According to the documentation for the method attribute on the task:

Defines which method is used to check the task is up-to-date. timestamp will compare the timestamp of the sources and generates files. checksum will check the checksum

https://taskfile.dev/reference/schema/#task

It says that the checksum is compared for both the sources and the generated files, so this is clearly a bug in the current implementation.

@adamdicarlo0

adamdicarlo0 commented Oct 24, 2024

I don't understand this at all... I can't get task to recognize that the generated files are gone no matter what I do.

For instance:

image

Why does it say "Task build is up to date" after I've done rm -rf dist?

@adamdicarlo0

I've got it sort-of working by switching to checksum:

tasks:
  build:
    cmds:
      - npx tsc --project .
    run: when_changed
    sources:
      - "**/*.mts"
      - "dist/**/*"

But after wiping dist, it now runs twice before it recognizes that the step is up to date. That makes sense: the output is now considered part of the input, so the checksum changes after the first run.

@adamdicarlo0

Seems like a bug in the timestamp method. This (checksum) config works as expected:

tasks:
  build:
    cmds:
      - npx tsc --project .
    run: when_changed
    sources:
      - "**/*.mts"
      - exclude: "dist/**/*"
    generates:
      - "dist/**/*"

But with method: timestamp it does not; it never detects that the recipe is out of date, as it apparently does not check timestamps on the generates files.

If this is intended behavior, why does timestamp work so differently from checksum?

(This is my first time trying Task, so this has been pretty confusing!)

@ameergituser
Author

ameergituser commented Oct 25, 2024

Hi @andreynering,

This issue seems to be either a bug in the application or a bug in the spec.

It is a pain point, and for newcomers a factor in deciding whether to use Task at all.

It is NOT okay to run twice; this can be very expensive and is unnecessary.

From my understanding, and from what I wrote above, I believe that sources and generates should operate as follows:

Sources

  • The sources files are NOT expected to be modified by the task itself, but if they are, the task will be eligible to run its cmds again.
  • Sources should be fingerprinted at the start of a task, so that if the task does modify the sources, the next run of the task will execute the cmds.
  • [Nice to have]: When a file is listed with an explicit name and it does not exist, the task should fail.

Generates

  • The generated files are expected to be modified by the task itself, so that modification should not be a reason for the task to run its cmds again.
  • Generated files should be fingerprinted at the end of the task, so that the task is only eligible to run when the generated files are modified outside of the task.
  • If an explicitly named generated file does not exist, the task should also be eligible to run its cmds.
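
A minimal Python sketch of the semantics proposed in the two lists above, to make the intent concrete. It is a proposal illustration only and does not reflect how Task currently behaves; `fingerprint`, `run_task`, and `demo` are hypothetical names, and the "nice to have" failure on missing sources is left out.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(paths):
    """Return a hash of the given files, or None if any of them is missing."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        if not path.exists():
            return None
        digest.update(str(path).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def run_task(sources, generates, state, cmd):
    before = fingerprint(sources)     # sources are hashed at the START
    outputs = fingerprint(generates)  # None when a generated file is missing
    up_to_date = (
        outputs is not None
        and state.get("sources") == before
        and state.get("generates") == outputs
    )
    if up_to_date:
        return False
    cmd()
    state["sources"] = before
    state["generates"] = fingerprint(generates)  # outputs are hashed at the END
    return True


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.txt"
        out = Path(tmp) / "output.txt"
        src.write_text("hello\n")
        state = {}

        def build():
            out.write_text(src.read_text().upper())

        results = [
            run_task([src], [out], state, build),  # first run
            run_task([src], [out], state, build),  # no extra run
        ]
        out.write_text("edited by hand\n")
        results.append(run_task([src], [out], state, build))  # output changed outside the task
        out.unlink()
        results.append(run_task([src], [out], state, build))  # output missing
        src.write_text("changed\n")
        results.append(run_task([src], [out], state, build))  # source changed
        return results


print(demo())  # [True, False, True, True, True]
```

Because the outputs are hashed after the commands finish, the second invocation is already up to date, and a hand-edited or deleted output makes the task eligible to run again.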

It would be great to get some feedback on this issue.

Thank you.

@ameergituser
Author

Adding this case to document that this current behavior causes confusion.

Case #1945
