-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add package caching #6
base: master
Are you sure you want to change the base?
Conversation
To speed up Cygwin installations, allow users to store the downloaded packages using GitHub's caching, rather than needing the packages to be downloaded from the Cygwin mirrors on every run. This speeds up the installation when there's a good cache hit, at the expense of adding some additional complexity, and using the limited cache space (although with 10GB assigned to every repository on GitHub, I can't imagine most folk are getting close to the limit!). To avoid unnecessary cache churn, the new steps check if the cache has actually meaningfully changed before storing a new cache. This uses b2sum for speed, since we're just checking for unexpected changes in the cache, and we don't need cryptographic security.
README.md
Outdated
@@ -26,6 +26,7 @@ Parameters | |||
| site | http://mirrors.kernel.org/sourceware/cygwin/ | Mirror site to install from | |||
| check-sig | true | Whether to check the setup.ini signature | |||
| add-to-path | true | Whether to add Cygwin's `/bin` directory to the system `PATH` | |||
| package-cache-key | '' | The key to use for caching downloaded packages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reasoning behind making this a string, rather than a boolean? I think that the cache-keys are scoped to the repository, so the only possible collision is with other cache action uses in the same workflow?
It would be nice to be able to make this on by default in future, without the user having to choose a key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The store of caches is per-repository, and the actual restore behaviour is scoped even more tightly than that, with restrictions about what caches are allowed to be restored on what branch. But yes, this is something I debated; I went with letting the user decide mostly so they wouldn't be surprised by caches appearing with unexpected names.
Having been playing around with this a bit more, though, I think I'm going to suggest a different approach: hard-code the cache key name, but make the behaviour configurable with a cache-behaviour
setting or similar, that takes values of disabled
(get current cache-free behaviour), enabled
(full new behaviour), or saveonly
or restoreonly
to do the obvious things. The latter are useful when – as with a lot of my cygport jobs – you need to run the action multiple times in the repository.
(As we discussed in #5, the ideal solution might be to swap from a composite action to a JS one, since that would allow post-run steps, which could be responsible for saving the cache after it has been used several times, but that's obviously a much bigger job!)
Thanks! This is awesome! I vagely recall trying to do this initially but running into some problem, so thanks for adding this!
Some numbers showing that restoring the cache is actually typically faster than fetching the packages again would help give me confidence that this is a good change.
Unfortunately, setup allows it's package cache to grow without bound. Something which has needed fixing for literal decades... I don't think that's blocking for this PR, since, yes, the likelhood of hitting the cache limit is small, but something you should be aware of.
Presumably there's some analysis been done that all the commands used are safe because they are in 'base' installation, but a comment to that effect might be worthwhile. |
np! This is about my fifth attempt at getting it working without getting stuck and giving up in disgust…
Yep, very fair. I have some rough anecdata that says it helps, but I've not really stress tested it to prove it makes a difference for installations that are big enough for it to be substantial. I should be able to have a go at that shortly :)
I had been thinking about that. GitHub evicts caches that aren't used for 7+ days, but if a repo were getting rebuilds more regularly than that, the cache would just keep getting larger and larger. I think that's a good reason for it to not be default behaviour for this action (yet), and I should probably note it explicitly in the documentation, but I think it mostly means that anyone using this action might need to manually flush the cache every so often, to force a new smaller cache to be generated.
They're all provided as part of the Git Bash / Git for Windows installation on the GitHub Windows runners. I've not actually checked how this works for self-hosted runners, and I could believe it's not guaranteed to work there. I'll add some notes to that effect… |
Add an option to restore a cache without creating a new one, or to create a cache without restoring an old one. This is useful in the scenario where the action is called multiple times in a run, as it allows the cache to only be restored on the first call, and for a new cache to only be written on the last call. This option replaces the `package-cache-key` option; as discussed in cygwin#6 that option isn't very useful. Additionally, add some more scopes to the cache name. Without this change, a second attempt to create a cache in the same action run (e.g. because one step installs some packages, then another step installs some more) will fail because of the cache name collision. Also update the README and tests, although the tests are only getting a very quick update for this function, as I'm going to do a much more comprehensive test update separately.
c9be7a5
to
79832e3
Compare
This should hopefully show how much of a difference, if any, caching packages makes. There's no automated comparison here, the test just runs one install without a cache, then a second install with a cache, then a human needs to look at what difference it makes.
Add an option to restore a cache without creating a new one, or to create a cache without restoring an old one. This is useful in the scenario where the action is called multiple times in a run, as it allows the cache to only be restored on the first call, and for a new cache to only be written on the last call. This option replaces the `package-cache-key` option; as discussed in cygwin#6 that option isn't very useful. Additionally, add some more scopes to the cache name. Without this change, a second attempt to create a cache in the same action run (e.g. because one step installs some packages, then another step installs some more) will fail because of the cache name collision. Also update the README and tests, although the tests are only getting a very quick update for this function, as I'm going to do a much more comprehensive test update separately.
Add an option to restore a cache without creating a new one, or to create a cache without restoring an old one. This is useful in the scenario where the action is called multiple times in a run, as it allows the cache to only be restored on the first call, and for a new cache to only be written on the last call. This option replaces the `package-cache-key` option; as discussed in cygwin#6 that option isn't very useful. Additionally, add some more scopes to the cache name. Without this change, a second attempt to create a cache in the same action run (e.g. because one step installs some packages, then another step installs some more) will fail because of the cache name collision. Also update the README and tests, although the tests are only getting a very quick update for this function, as I'm going to do a much more comprehensive test update separately.
Add an option to restore a cache without creating a new one, or to create a cache without restoring an old one. This is useful in the scenario where the action is called multiple times in a run, as it allows the cache to only be restored on the first call, and for a new cache to only be written on the last call. This option replaces the `package-cache-key` option; as discussed in cygwin#6 that option isn't very useful. Additionally, add some more scopes to the cache name. Without this change, a second attempt to create a cache in the same action run (e.g. because one step installs some packages, then another step installs some more) will fail because of the cache name collision. Also update the README and tests, although the tests are only getting a very quick update for this function, as I'm going to do a much more comprehensive test update separately.
Add an option to restore a cache without creating a new one, or to create a cache without restoring an old one. This is useful in the scenario where the action is called multiple times in a run, as it allows the cache to only be restored on the first call, and for a new cache to only be written on the last call. This option replaces the `package-cache-key` option; as discussed in cygwin#6 that option isn't very useful. Additionally, add some more scopes to the cache name. Without this change, a second attempt to create a cache in the same action run (e.g. because one step installs some packages, then another step installs some more) will fail because of the cache name collision. Also update the README and tests, although the tests are only getting a very quick update for this function, as I'm going to do a much more comprehensive test update separately.
Aha! I hadn't noticed that all this happens before we modify that PATH, so it'll never be using cygwin tools to do these things, it's assuming they are already provided by something in the VM image. |
On the performance topic: initial numbers seem to show using a cache saves very roughly 4 minutes off a Cygwin installation that would otherwise take about 10 minutes in total. Restoring a cache of a few hundred MB seems to take only a few seconds, which clearly beats the download time from the Cygwin mirrors. Currently the hold-up is the reason I raised #5: if Cygwin is in the PATH, the caching code will still try to use the Git for Windows |
I could live with "if you want to use this action multiple time in the same workflow, you must use package-cache: disabled or add-path: false"? |
I'm hoping it might be something that can get fixed fairly soon; it looks like the problem was added relatively recently, in actions/cache@9b0be58822, and based on actions/toolkit#1311 I'm not the only person to be seeing similar problems. Another option – which I've not tested properly but is next on my idea list – is to try just using the parent of that commit in actions/cache. I created a simple test case to demonstrate the issue at https://github.com/me-and/repro-cygwin-cache-woe, and it looks like the parent commit might let the caching work as currently designed, but without this issue. Otherwise, I agree: just documenting it as a limitation is probably going to be the best option for now! |
------- | ||
|
||
If you're likely to do regular builds, you might want to store the packages | ||
locally rather than needing to download them from the Cygwin mirrors on every |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure "locally" is the best word to use here.
To speed up Cygwin installations, allow users to store the downloaded packages using GitHub's caching, rather than needing the packages to be downloaded from the Cygwin mirrors on every run.
This speeds up the installation when there's a good cache hit, at the expense of adding some additional complexity, and using the limited cache space (although with 10GB assigned to every repository on GitHub, I can't imagine most folk are getting close to the limit!).
To avoid unnecessary cache churn, the new steps check if the cache has actually meaningfully changed before storing a new cache. This uses b2sum for speed, since we're just checking for unexpected changes in the cache, and we don't need cryptographic security.
I was originally working on this for my own builds, e.g. at https://github.com/me-and/Cygwin-inih/blob/fb7534276feeb2a5217df5013c679a370ee6013e/.github/workflows/ci.yml, but it occurred to me that this might be more useful here than in my own repos.