-
Notifications
You must be signed in to change notification settings - Fork 5
/
speed_up_your_ci.html
234 lines (218 loc) · 20.4 KB
/
speed_up_your_ci.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
<!DOCTYPE html>
<html>
<head>
<title>Adventures in CI land, or how to speed up your CI</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link type="application/atom+xml" href="/blog/feed.xml" rel="self"/>
<link rel="shortcut icon" type="image/ico" href="/blog/favicon.ico">
<link rel="stylesheet" type="text/css" href="main.css">
<link rel="stylesheet" href="https://unpkg.com/@highlightjs/[email protected]/styles/default.min.css">
<script src="highlight.min.js"></script>
<!-- From https://github.com/odin-lang/odin-lang.org/blob/6f48c2cfb094a42dffd34143884fa958bd9c0ba2/themes/odin/layouts/partials/head.html#L71 -->
<script src="x86asm.min.js"></script>
<script>
window.onload = function() {
hljs.registerLanguage("odin", function(e) {
return {
aliases: ["odin", "odinlang", "odin-lang"],
keywords: {
keyword: "auto_cast bit_field bit_set break case cast context continue defer distinct do dynamic else enum fallthrough for foreign if import in map matrix not_in or_else or_return package proc return struct switch transmute type_of typeid union using when where",
literal: "true false nil",
built_in: "abs align_of cap clamp complex conj expand_to_tuple imag jmag kmag len max min offset_of quaternion real size_of soa_unzip soa_zip swizzle type_info_of type_of typeid_of"
},
illegal: "</",
contains: [e.C_LINE_COMMENT_MODE, e.C_BLOCK_COMMENT_MODE, {
className: "string",
variants: [e.QUOTE_STRING_MODE, {
begin: "'",
end: "[^\\\\]'"
}, {
begin: "`",
end: "`"
}]
}, {
className: "number",
variants: [{
begin: e.C_NUMBER_RE + "[ijk]",
relevance: 1
}, e.C_NUMBER_MODE]
}]
}
});
hljs.highlightAll();
document.querySelectorAll('code').forEach((el, _i) => {
if (0 == el.classList.length || el.classList.contains('language-sh') || el.classList.contains('language-shell') || el.classList.contains('language-bash')){
el.classList.add('code-no-line-numbers');
return;
}
var lines = el.innerHTML.trimEnd().split('\n');
var out = [];
lines.forEach(function(l, i){
out.push('<span class="line-number">' + (i+1).toString() + '</span> ' + l);
});
el.innerHTML = out.join('\n');
});
}
</script>
</head>
<body>
<div id="banner">
<div id="name">
<img id="me" src="me.jpeg">
<span>Philippe Gaultier</span>
</div>
<ul>
<li> <a href="/blog/body_of_work.html">Body of work</a> </li>
<li> <a href="/blog/articles-by-tag.html">Tags</a> </li>
<li> <a href="https://github.com/gaultier/resume/raw/master/Philippe_Gaultier_resume_en.pdf">Resume</a> </li>
<li> <a href="https://www.linkedin.com/in/philippegaultier/">LinkedIn</a> </li>
<li> <a href="https://github.com/gaultier">Github</a> </li>
<li> <a href="/blog/feed.xml">Atom feed</a> </li>
</ul>
</div>
<div class="body">
<div class="article-prelude">
<p><a href="/blog"> ⏴ Back to all articles</a></p>
<p class="publication-date">Published on 2020-09-21</p>
</div>
<div class="article-title">
<h1>Adventures in CI land, or how to speed up your CI</h1>
<div class="tags"> <a href="/blog/articles-by-tag.html#ci" class="tag">CI</a> <a href="/blog/articles-by-tag.html#optimization" class="tag">Optimization</a></div>
</div>
<strong>Table of contents</strong>
<ul>
<li>
<a href="#reduce-the-size-of-everything">Reduce the size of everything</a>
</li>
<li>
<a href="#be-lazy-don-t-do-things-you-don-t-need-to-do">Be lazy: Don't do things you don't need to do</a>
</li>
<li>
<a href="#miscellenaous-tricks">Miscellenaous tricks</a>
</li>
<li>
<a href="#a-note-on-security">A note on security</a>
</li>
<li>
<a href="#i-am-a-devops-engineer-what-can-i-do">I am a DevOps Engineer, what can I do?</a>
</li>
<li>
<a href="#closing-words">Closing words</a>
</li>
</ul>
<p>Every project has a Continuous Integration (CI) pipeline and every one of them complains its CI is too slow. It is more important than you might think; this can be the root cause of many problems, including lackluster productivity, low morale, high barrier of entry for newcomers, and overall suboptimal quality.</p>
<p>But this need not be. I have compiled here a lengthy list of various ways you can simplify your CI and make it faster, based on my experience on open-source projects and my work experience. I sure wish you will find something in here worth your time.</p>
<p>And finally, I hope you will realize this endeavour is not unlike optimizing a program: it requires some time and dedication but you will get tremendous results. Also, almost incidentally, it will be more secure and easier to audit.</p>
<p>Lastly, remember to measure and profile your changes. If a change has made no improvements, it should be reverted.</p>
<p><em>This article assumes you are running a POSIX system. Windows developers, this is not the article you are looking for.</em></p>
<h2 id="reduce-the-size-of-everything">
<a class="title" href="#reduce-the-size-of-everything">Reduce the size of everything</a>
<a class="hash-anchor" href="#reduce-the-size-of-everything" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>Almost certainly, your CI pipeline has to download 'something', be it a base docker image, a virtual machine image, some packages, maybe a few company wide scripts. The thing is, you are downloading those every time it runs, 24/7, every day of the year. Even a small size reduction can yield big speed ups. Remember, the network is usually the bottleneck.</p>
<p>In no particular order:</p>
<ul>
<li>Only fetch required git objects. That means running <code>git clone my-repo.git --depth 1 --branch shiny-feature</code>, instead of cloning the whole repository every time, along with every branch and that one class file that your coworker accidentally committed once.</li>
<li>Axe duplicate tools. <code>curl</code> and <code>wget</code> are equivalent, given the right command line options. Settle on using only one and stick to it. All my pipelines use: <code>curl --sSL --retry 5</code>. You can customize further, but that's the gist of it. Other examples: <code>make</code> and <code>ninja</code>, <code>gcc</code> and <code>clang</code>, etc.</li>
<li>Use POSIX tools. They are already present on whatever system you are using. When purely checking that a tool or an API returned 'OK', simply use <code>grep</code> and <code>awk</code>, no need for <code>ripgrep</code>. Prefer <code>sh</code> over <code>bash</code> for simple scripts, <code>make</code> over <code>rake</code> for builds, etc. It's most likely faster, more stable, and more documented, too.</li>
<li>Pay attention to the base image you are using. Prefer a small image where you install only what you need. I have seen docker base images over 1 Gb big. You will spend more time downloading it, uncompressing it, and checksumming it, than running your pipeline. Alpine Linux is great. Debian and Ubuntu are fine. When in doubt, inspect the content of the image. Look for stuff that should not be here. E.g.: <code>X11</code>, man pages, etc.</li>
<li>Don't install documentation. It's obvious but most people do it. While you are at it, don't install <code>man</code>, <code>apropos</code>, <code>info</code>, etc. Alpine Linux gets it right by splitting almost all packages between the package itself and its documentation. E.g.: <code>cmake</code> and <code>cmake-doc</code>.</li>
<li>On the same vein: don't install shell autocompletions. Same idea. Again, on Alpine they are not part of the main package. E.g.: <code>cmake</code> and <code>cmake-bash-completion</code>.</li>
<li>Stay away from aggregate packages (or meta-packages)! Those are for convenience only when developing. E.g.: <code>build-base</code> on Alpine is a meta-package gathering <code>make</code>, <code>file</code>, <code>gcc</code>, etc. It will bring lots of things you do not need. Cherry-pick only what you really required and steer clear of those packages.</li>
<li>Learn how Docker image layers work: avoid doing <code>RUN rm archive.tar</code>, since it simply creates a new layer without removing the file from the previous layer. Prefer: <code>RUN curl -sSL --retry 5 foo.com/archive.tar && tar -xf archive.tar && rm archive.tar</code> which will not add the tar archive to the Docker image.</li>
<li>Use multi-stage Docker builds. It is old advice at this point but it bears repeating.</li>
<li>When using multi-stage: Only copy files you need from a previous stage instead of globbing wildly, thus defeating the purpose of multi-stages.</li>
<li>Tell apart the development and the release variant of a package. For example: on Ubuntu, when using the SDL2 library, it comes in two flavors: <code>libsdl2-dev</code> and <code>libsdl2-2.0</code>. The former is the development variant which you only need when building code that needs the headers and the libraries of the SDL2, while the latter is only useful with software needing the dynamic libraries at runtime. The development packages are usually bigger in size. You can astutely use multi-stage Docker builds to have first a build stage using the development packages, and then a final stage which only has the non-development packages. In CI, you almost never need both variants installed at the same time.</li>
<li>Opt-out of 'recommended' packages. Aptitude on Debian/Ubuntu is the culprit here: <code>apt-get install foo</code> will install much more than <code>foo</code>. It will also install recommended packages that most of the time are completely unrelated. Always use <code>apt-get install --no-install-recommends foo</code>.</li>
<li>Don't create unnecessary files: you use use heredoc and shell pipelines to avoid creating intermediary files.</li>
</ul>
<h2 id="be-lazy-don-t-do-things-you-don-t-need-to-do">
<a class="title" href="#be-lazy-don-t-do-things-you-don-t-need-to-do">Be lazy: Don't do things you don't need to do</a>
<a class="hash-anchor" href="#be-lazy-don-t-do-things-you-don-t-need-to-do" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<ul>
<li>Some features you are not using are enabled by default. Be explicit instead of relying on obscure, ever changing defaults. Example: <code>CGO_ENABLED=0 go build ...</code> because it is (at the time of writing) enabled by default. The Gradle build system also has the annoying habit to run stuff behind your back. Use <code>gradle foo -x baz</code> to run <code>foo</code> and not <code>baz</code>.</li>
<li>Don't run tests from your dependencies. This can happen if you are using git submodules or vendoring dependencies in some way. You generally always want to build them, but not run their tests. Again, <code>gradle</code> is the culprit here. If you are storing your git submodules in a <code>submodules/</code> directory for example, you can run only your project tests with: <code>gradle test -x submodules:test</code>.</li>
<li>Disable the generation of reports files. They frequently come in the form of HTML or XML form, and once again, <code>gradle</code> gets out of his way to clutter your filesystem with those. Of debatable usefulness locally, they are downright wasteful in CI. And it takes some precious time, too! Disable it with:
<pre><code> tasks.withType<Test> {
useJUnitPlatform()
reports.html.isEnabled = false
reports.junitXml.isEnabled = false
}
</code></pre>
</li>
<li>Check alternative repositories for a dependency instead of building it from source. It can happen that a certain dependency you need is not in the main repositories of the package manager of your system. You can however inspect other repositories before falling back to building it yourself. On Alpine, you can simply add the URL of the repository to <code>/etc/apk/repositories</code>. For example, in the main Alpine Docker image, the repository <code>https://<mirror-server>/alpine/edge/testing</code> is not enabled. More information <a href="https://wiki.alpinelinux.org/wiki/Enable_Community_Repository">here</a>. Other example: on OpenBSD or FreeBSD, you can opt-in to use the <code>current</code> branch to get the newest and latest changes, and along them the newest dependencies.</li>
<li>Don't build the static and dynamic variants of the same library (in C or C++). You probably only want one, preferably the static one. Otherwise, you are doing twice the work!</li>
<li>Fetch statically built binaries instead of building them from source. Go, and sometimes Rust, are great for this. As long as the OS and the architecture are the same, of course. E.g.: you can simply fetch <code>kubectl</code> which is a Go static binary instead of installing lots of Kubernetes packages, if you simply need to talk to a Kubernetes cluster. Naturally, the same goes for single file, dependency-less script: shell, awk, python, lua, perl, and ruby, assuming the interpreter is the right one. But this case is rarer and you might as well vendor the script at this point.</li>
<li>Groom your 'ignore' files. <code>.gitignore</code> is the mainstream one, but were you aware Docker has the mechanism in the form of a <code>.dockerignore</code> file? My advice: whitelist the files you need, e.g.:
<pre><code>**/*
!**/*.js
</code></pre>
This can have a huge impact on performance since Docker will copy all the files inside the Docker context directory inside the container (or virtual machine on macOS) and it can be a lot. You don't want to copy build artifacts, images, and so on each time which your image does not need.</li>
<li>Use an empty Docker context if possible: you sometimes want to build an image which does not need any local files. In that case you can completely bypass copying any files into the image with the command: <code>docker build . -f - < Dockerfile</code>.</li>
<li>Don't update the package manager cache: you typically need to start your Dockerfile by updating the package manager cache, otherwise it will complain the dependencies you want to install are not found. E.g.: <code>RUN apk update && apk add curl</code>. But did you know it is not always required? You can simply do: <code>RUN apk --no-cache add curl</code> when you know the package exists and you can bypass the cache.</li>
<li>Silence the tools: most command line applications accept the <code>-q</code> flag which reduces their verbosity. Most of their output is likely to be useless, some CI systems will struggle storing big pipeline logs, and you might be bottlenecked on stdout! Also, it will simplify troubleshooting <em>your</em> build if it is not swamped in thousands of unrelated logs.</li>
</ul>
<h2 id="miscellenaous-tricks">
<a class="title" href="#miscellenaous-tricks">Miscellenaous tricks</a>
<a class="hash-anchor" href="#miscellenaous-tricks" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<ul>
<li>Use <code>sed</code> to quickly edit big files in place. E.g.: you want to insert a line at the top of a Javascript file to skip linter warnings. Instead of doing:
<pre><code>printf '/* eslint-disable */\n\n' | cat - foo.js > foo_tmp && mv foo_tmp foo.js
</code></pre>
which involves reading the whole file, copying it, and renaming it, we can do:
<pre><code>sed -i '1s#^#/* eslint-disable */ #' foo.js
</code></pre>
which is simpler.</li>
<li>Favor static linking and LTO. This will simplify much of your pipeline because you'll have to deal with fewer files, ideally one statically built executable.</li>
<li>Use only one Gitlab CI job. That is because the startup time of a job is very high, in the order of minutes. You can achieve task parallelism with other means such as <code>parallel</code> or <code>make -j</code>.</li>
<li>Parallelize all the things! Some tools do not run tasks in parallel by default, e.g. <code>make</code> and <code>gradle</code>. Make sure you are always using a CI instance with multiple cores and are passing <code>--parallel</code> to Gradle and <code>-j$(nproc)</code> to make. In rare instances you might have to tweak the exact level of parallelism to your particular task for maximum performance. Also, <code>parallel</code> is great for parallelizing tasks.</li>
<li>Avoid network accesses: you should minimize the amount of things you are downloading from external sources in your CI because it is both slow and a source of flakiness. Some tools will unfortunately always try to 'call home' even if all of your dependencies are present. You should disable this behavior explicitly, e.g. with Gradle: <code>gradle build --offline</code>.</li>
<li>In some rare cases, you will be bottlenecked on a slow running script. Consider using a faster interpreter: for shell scripts, there is <code>ash</code> and <code>dash</code> which are said to be much faster than <code>bash</code>. For <code>awk</code> there is <code>gawk</code> and <code>mawk</code>. For Lua there is <code>LuaJIT</code>.</li>
<li>Avoid building inside Docker if you can. Building locally, and then copying the artifacts into the image, is always faster. It only works under certain constraints, of course:
<ul>
<li>same OS and architecture, or</li>
<li>a portable artifact format such as <code>jar</code>, and not using native dependencies, or</li>
<li>your toolchain supports cross-compilation</li>
</ul>
</li>
</ul>
<h2 id="a-note-on-security">
<a class="title" href="#a-note-on-security">A note on security</a>
<a class="hash-anchor" href="#a-note-on-security" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<ul>
<li>Always use https</li>
<li>Checksum files you fetched from third-parties with <code>shasum</code>.</li>
<li>Favor official package repositories, docker images, and third-parties over those of individuals.</li>
<li>Never bypass certificate checks (such as <code>curl -k</code>)</li>
</ul>
<h2 id="i-am-a-devops-engineer-what-can-i-do">
<a class="title" href="#i-am-a-devops-engineer-what-can-i-do">I am a DevOps Engineer, what can I do?</a>
<a class="hash-anchor" href="#i-am-a-devops-engineer-what-can-i-do" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>Most of the above rules can be automated with a script, assuming the definition of a CI pipeline is in a text format (e.g. Gitlab CI). I would suggest starting here, and teaching developers about these simple tips than really make a difference.</p>
<p>I would also suggest considering adding strict firewall rules inside CI pipelines, and making sure the setup/teardown of CI runners is very fast. Additionally, I would do everything to avoid a situation where no CI runner is available, preventing developers from working and deploying.</p>
<p>Finally, I would recommend leading by example with the pipelines for the tools made by DevOps Engineers in your organization.</p>
<h2 id="closing-words">
<a class="title" href="#closing-words">Closing words</a>
<a class="hash-anchor" href="#closing-words" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>I wish you well on your journey towards a fast, reliable and simple CI pipeline.</p>
<p>I noticed in my numerous projects with different tech stacks that some are friendlier than others towards CI pipelines than others (I am looking at you, Gradle!). If you have the luxury of choosing your technical stack, do consider how it will play out with your pipeline. I believe this is a much more important factor than discussing whether $LANG has semicolons or not because I am convinced it can completely decide the outcome of your project.</p>
<p><a href="/blog"> ⏴ Back to all articles</a></p>
<blockquote id="donate">
<p>If you enjoy what you're reading, you want to support me, and can afford it: <a href="https://paypal.me/philigaultier?country.x=DE&locale.x=en_US">Support me</a>. That allows me to write more cool articles!</p>
</blockquote>
<blockquote>
<p>
This blog is <a href="https://github.com/gaultier/blog">open-source</a>!
If you find a problem, please open a Github issue.
The content of this blog as well as the code snippets are under the <a href="https://en.wikipedia.org/wiki/BSD_licenses#3-clause_license_(%22BSD_License_2.0%22,_%22Revised_BSD_License%22,_%22New_BSD_License%22,_or_%22Modified_BSD_License%22)">BSD-3 License</a> which I also usually use for all my personal projects. It's basically free for every use but you have to mention me as the original author.
</p>
</blockquote>
</div>
</body>
</html>