-
Notifications
You must be signed in to change notification settings - Fork 5
/
tip_of_day_3.html
246 lines (234 loc) · 12.1 KB
/
tip_of_day_3.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
<!DOCTYPE html>
<html>
<head>
<title>Tip of the day #3: Convert a CSV to a markdown or HTML table</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link type="application/atom+xml" href="/blog/feed.xml" rel="self"/>
<link rel="shortcut icon" type="image/ico" href="/blog/favicon.ico">
<link rel="stylesheet" type="text/css" href="main.css">
<link rel="stylesheet" href="https://unpkg.com/@highlightjs/[email protected]/styles/default.min.css">
<script src="highlight.min.js"></script>
<!-- From https://github.com/odin-lang/odin-lang.org/blob/6f48c2cfb094a42dffd34143884fa958bd9c0ba2/themes/odin/layouts/partials/head.html#L71 -->
<script src="x86asm.min.js"></script>
<script>
window.onload = function() {
hljs.registerLanguage("odin", function(e) {
return {
aliases: ["odin", "odinlang", "odin-lang"],
keywords: {
keyword: "auto_cast bit_field bit_set break case cast context continue defer distinct do dynamic else enum fallthrough for foreign if import in map matrix not_in or_else or_return package proc return struct switch transmute type_of typeid union using when where",
literal: "true false nil",
built_in: "abs align_of cap clamp complex conj expand_to_tuple imag jmag kmag len max min offset_of quaternion real size_of soa_unzip soa_zip swizzle type_info_of type_of typeid_of"
},
illegal: "</",
contains: [e.C_LINE_COMMENT_MODE, e.C_BLOCK_COMMENT_MODE, {
className: "string",
variants: [e.QUOTE_STRING_MODE, {
begin: "'",
end: "[^\\\\]'"
}, {
begin: "`",
end: "`"
}]
}, {
className: "number",
variants: [{
begin: e.C_NUMBER_RE + "[ijk]",
relevance: 1
}, e.C_NUMBER_MODE]
}]
}
});
hljs.highlightAll();
document.querySelectorAll('code').forEach((el, _i) => {
if (0 == el.classList.length || el.classList.contains('language-sh') || el.classList.contains('language-shell') || el.classList.contains('language-bash')){
el.classList.add('code-no-line-numbers');
return;
}
var lines = el.innerHTML.trimEnd().split('\n');
var out = [];
lines.forEach(function(l, i){
out.push('<span class="line-number">' + (i+1).toString() + '</span> ' + l);
});
el.innerHTML = out.join('\n');
});
}
</script>
</head>
<body>
<div id="banner">
<div id="name">
<img id="me" src="me.jpeg">
<span>Philippe Gaultier</span>
</div>
<ul>
<li> <a href="/blog/body_of_work.html">Body of work</a> </li>
<li> <a href="/blog/articles-by-tag.html">Tags</a> </li>
<li> <a href="https://github.com/gaultier/resume/raw/master/Philippe_Gaultier_resume_en.pdf">Resume</a> </li>
<li> <a href="https://www.linkedin.com/in/philippegaultier/">LinkedIn</a> </li>
<li> <a href="https://github.com/gaultier">Github</a> </li>
<li> <a href="/blog/feed.xml">Atom feed</a> </li>
</ul>
</div>
<div class="body">
<div class="article-prelude">
<p><a href="/blog"> ⏴ Back to all articles</a></p>
<p class="publication-date">Published on 2024-10-31</p>
</div>
<div class="article-title">
<h1>Tip of the day #3: Convert a CSV to a markdown or HTML table</h1>
<div class="tags"> <a href="/blog/articles-by-tag.html#markdown" class="tag">Markdown</a> <a href="/blog/articles-by-tag.html#csv" class="tag">Csv</a> <a href="/blog/articles-by-tag.html#awk" class="tag">Awk</a> <a href="/blog/articles-by-tag.html#tip-of-the-day" class="tag">Tip of the day</a></div>
</div>
<p>The other day at work, I found myself having to produce a human-readable table of all the direct dependencies in the project, for auditing purposes.</p>
<p>There is a <a href="https://github.com/onur/cargo-license">tool</a> for Rust projects that outputs a TSV (meaning: a CSV where the separator is the tab character) of this data. That's great, but not really fit for consumption by a non-technical person.</p>
<p>I just need to convert that to a human readable table in markdown or HTML, and voila!</p>
<p>Here's the output of this tool in my open-source Rust <a href="https://github.com/gaultier/kotlin-rs">project</a>:</p>
<pre><code class="language-sh">$ cargo license --all-features --avoid-build-deps --avoid-dev-deps --direct-deps-only --tsv
name version authors repository license license_file description
clap 2.33.0 Kevin K. <[email protected]> https://github.com/clap-rs/clap MIT A simple to use, efficient, and full-featured Command Line Argument Parser
heck 0.3.1 Without Boats <[email protected]> https://github.com/withoutboats/heck Apache-2.0 OR MIT heck is a case conversion library.
kotlin 0.1.0 Philippe Gaultier <[email protected]>
log 0.4.8 The Rust Project Developers https://github.com/rust-lang/log Apache-2.0 OR MIT A lightweight logging facade for Rust
pretty_env_logger 0.3.1 Sean McArthur <sean@seanmonstar> https://github.com/seanmonstar/pretty-env-logger Apache-2.0 OR MIT a visually pretty env_logger
termcolor 1.1.0 Andrew Gallant <[email protected]> https://github.com/BurntSushi/termcolor MIT OR Unlicense A simple cross platform library for writing colored text to a terminal.
</code></pre>
<p>Not really readable. We need to transform this data into a <a href="https://github.github.com/gfm/#tables-extension-">markdown table</a>, something like that:</p>
<pre><code class="language-markdown">| First Header | Second Header |
| ------------- | ------------- |
| Content Cell | Content Cell |
| Content Cell | Content Cell |
</code></pre>
<p>Technically, markdown tables are an extension to standard markdown (if there is such a thing), but they are very common and supported by all the major platforms e.g. Github, Azure, etc. So how do we do that?</p>
<p>Once again, I turn to the trusty AWK. It's always been there for me. And it's present on every UNIX system out of the box.</p>
<p>AWK neatly handles all the 'decoding' of the CSV format for us, we just need to output the right thing:</p>
<ul>
<li>Given a line (which AWK calls 'record'): output each field interleaved with the <code>|</code> character</li>
<li>Output a delimiting line between the table headers and rows. The markdown table spec states that this delimiter should be at least 3 <code>-</code> characters in each cell.</li>
<li>Alignment is not a goal, it does not matter for a markdown parser. If you want to produce a pretty markdown table, it's easy to achieve, it simply makes the implementation a bit bigger</li>
</ul>
<p>Here's the full implementation (don't forget to mark the file executable). The shebang line instructs AWK to use the tab character <code>\t</code> as the delimiter between fields:</p>
<pre><code class="language-awk">#!/usr/bin/env -S awk -F '\t' -f
{
printf("|");
for (i = 1; i <= NF; i++) {
# Note: if a field contains the character `|`,
# it will mess up the table.
# In this case, we should replace this character
# by something else e.g. `,`:
gsub(/\|/, ",", $i);
printf(" %s |", $i);
}
printf("\n");
}
NR==1 { # Output the delimiting line
printf("|");
for(i = 1; i <= NF; i++) {
printf(" --- | ");
}
printf("\n");
}
</code></pre>
<p>The first clause will execute for each line of the input.
The for loop then iterates over each field and outputs the right thing.</p>
<p>The second clause will execute only for the first line (<code>NR</code> is the line number).</p>
<p>The same line can trigger multiple clauses, here, the first line of the input will trigger both clauses, whilst the remaining lines will only trigger the first clause.</p>
<p>So let's run it!</p>
<pre><code class="language-sh">$ cargo license --all-features --avoid-build-deps --avoid-dev-deps --direct-deps-only --tsv | ./md-table.awk
| name | version | authors | repository | license | license_file | description |
| --- | --- | --- | --- | --- | --- | --- |
| clap | 2.33.0 | Kevin K. <[email protected]> | https://github.com/clap-rs/clap | MIT | | A simple to use, efficient, and full-featured Command Line Argument Parser |
| heck | 0.3.1 | Without Boats <[email protected]> | https://github.com/withoutboats/heck | Apache-2.0 OR MIT | | heck is a case conversion library. |
| kotlin | 0.1.0 | Philippe Gaultier <[email protected]> | | | | |
| log | 0.4.8 | The Rust Project Developers | https://github.com/rust-lang/log | Apache-2.0 OR MIT | | A lightweight logging facade for Rust |
| pretty_env_logger | 0.3.1 | Sean McArthur <sean@seanmonstar> | https://github.com/seanmonstar/pretty-env-logger | Apache-2.0 OR MIT | | a visually pretty env_logger |
| termcolor | 1.1.0 | Andrew Gallant <[email protected]> | https://github.com/BurntSushi/termcolor | MIT OR Unlicense | | A simple cross platform library for writing colored text to a terminal. |
</code></pre>
<p>Ok, it's hard to really know if that's correct or not. Let's pipe it into <a href="https://github.com/github/cmark-gfm">cmark-gfm</a> to render this markdown table as HTML:</p>
<pre><code class="language-sh">$ cargo license --all-features --avoid-build-deps --avoid-dev-deps --direct-deps-only --tsv | ./md-table.awk | cmark-gfm -e table
</code></pre>
<p>And voila:</p>
<table>
<thead>
<tr>
<th>name</th>
<th>version</th>
<th>authors</th>
<th>repository</th>
<th>license</th>
<th>license_file</th>
<th>description</th>
</tr>
</thead>
<tbody>
<tr>
<td>clap</td>
<td>2.33.0</td>
<td>Kevin K. <a href="mailto:[email protected]">[email protected]</a></td>
<td>https://github.com/clap-rs/clap</td>
<td>MIT</td>
<td></td>
<td>A simple to use, efficient, and full-featured Command Line Argument Parser</td>
</tr>
<tr>
<td>heck</td>
<td>0.3.1</td>
<td>Without Boats <a href="mailto:[email protected]">[email protected]</a></td>
<td>https://github.com/withoutboats/heck</td>
<td>Apache-2.0 OR MIT</td>
<td></td>
<td>heck is a case conversion library.</td>
</tr>
<tr>
<td>kotlin</td>
<td>0.1.0</td>
<td>Philippe Gaultier <a href="mailto:[email protected]">[email protected]</a></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>log</td>
<td>0.4.8</td>
<td>The Rust Project Developers</td>
<td>https://github.com/rust-lang/log</td>
<td>Apache-2.0 OR MIT</td>
<td></td>
<td>A lightweight logging facade for Rust</td>
</tr>
<tr>
<td>pretty_env_logger</td>
<td>0.3.1</td>
<td>Sean McArthur <a href="mailto:sean@seanmonstar">sean@seanmonstar</a></td>
<td>https://github.com/seanmonstar/pretty-env-logger</td>
<td>Apache-2.0 OR MIT</td>
<td></td>
<td>a visually pretty env_logger</td>
</tr>
<tr>
<td>termcolor</td>
<td>1.1.0</td>
<td>Andrew Gallant <a href="mailto:[email protected]">[email protected]</a></td>
<td>https://github.com/BurntSushi/termcolor</td>
<td>MIT OR Unlicense</td>
<td></td>
<td>A simple cross platform library for writing colored text to a terminal.</td>
</tr>
</tbody>
</table>
<p>All in all, very little code. I have a feeling that I will use this approach a lot in the future for reporting or even inspecting data easily, for example from a database dump.</p>
<p><a href="/blog"> ⏴ Back to all articles</a></p>
<blockquote id="donate">
<p>If you enjoy what you're reading, you want to support me, and can afford it: <a href="https://paypal.me/philigaultier?country.x=DE&locale.x=en_US">Support me</a>. That allows me to write more cool articles!</p>
</blockquote>
<blockquote>
<p>
This blog is <a href="https://github.com/gaultier/blog">open-source</a>!
If you find a problem, please open a Github issue.
The content of this blog as well as the code snippets are under the <a href="https://en.wikipedia.org/wiki/BSD_licenses#3-clause_license_(%22BSD_License_2.0%22,_%22Revised_BSD_License%22,_%22New_BSD_License%22,_or_%22Modified_BSD_License%22)">BSD-3 License</a> which I also usually use for all my personal projects. It's basically free for every use but you have to mention me as the original author.
</p>
</blockquote>
</div>
</body>
</html>