<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Contributing a ggd recipe — GGD documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" type="text/css" href="_static/style.css" />
<link rel="stylesheet" type="text/css" href="_static/font-awesome-4.7.0/css/font-awesome.min.css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Creating a ggd meta-recipe" href="making-meta-recipes.html" />
<link rel="prev" title="Setting up with Github" href="github-setup.html" />
<link href="https://fonts.googleapis.com/css?family=Lato|Raleway" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Inconsolata" rel="stylesheet">
<meta name="msapplication-TileColor" content="#ffffff">
<meta name="msapplication-TileImage" content="_static/ms-icon-144x144.png">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/selectize.js/0.12.6/css/selectize.bootstrap3.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.3.1/css/bootstrap.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/datatables/1.10.21/js/jquery.dataTables.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/selectize.js/0.12.6/js/standalone/selectize.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.3.1/js/bootstrap.bundle.min.js"></script>
</head><body>
<div class="document">
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<p class="logo">
<a href="index.html">
<img class="logo" src="_static/logo/GoGetData_name_logo.png" alt="Logo"/>
</a>
</p>
<h3>Navigation</h3>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="quick-start.html">GGD Quick Start</a></li>
<li class="toctree-l1"><a class="reference internal" href="using-ggd.html">Using GGD</a></li>
<li class="toctree-l1"><a class="reference internal" href="GGD-CLI.html">GGD Commands</a></li>
<li class="toctree-l1"><a class="reference internal" href="meta-recipes.html">GGD meta-recipes</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="contribute.html">Contribute</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="github-setup.html">Setting up with Github</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Contributing a ggd recipe</a></li>
<li class="toctree-l2"><a class="reference internal" href="making-meta-recipes.html">Creating a ggd meta-recipe</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="private_recipes.html">Private Recipes</a></li>
<li class="toctree-l1"><a class="reference internal" href="workflows.html">Using GGD in Workflows</a></li>
<li class="toctree-l1"><a class="reference internal" href="recipes.html">Available Data Packages</a></li>
</ul>
<ul>
<li class="toctree-l1"><a href="https://github.com/gogetdata/ggd-recipes">ggd-recipes @ Github</a></li>
<li class="toctree-l1"><a href="https://github.com/gogetdata/ggd-cli">ggd-cli @ Github</a></li>
</ul>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="contributing-a-ggd-recipe">
<span id="contrib-recipe"></span><h1>Contributing a ggd recipe<a class="headerlink" href="#contributing-a-ggd-recipe" title="Permalink to this headline">¶</a></h1>
<p>[<a class="reference internal" href="index.html#home-page"><span class="std std-ref">Click here to return to the home page</span></a>]</p>
<p>This page is specific to creating a ggd <strong>recipe</strong>. For information on creating a meta-recipe, see <a class="reference internal" href="making-meta-recipes.html#contribute-meta-recipe"><span class="std std-ref">Creating a ggd meta-recipe</span></a>.</p>
<p>The following steps outline how to create, check, and add a ggd data recipe.</p>
<div class="section" id="update-local-forked-repo">
<h2>1. Update local forked repo<a class="headerlink" href="#update-local-forked-repo" title="Permalink to this headline">¶</a></h2>
<p>You will need to update the forked ggd-recipes repo on your local machine before
you add a recipe to it.</p>
<ul class="simple">
<li><p>Navigate to the forked ggd-recipes repo on your local machine</p></li>
<li><p>Once in the directory run the following commands</p></li>
</ul>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ git checkout master
$ git pull upstream master
$ git push origin master
</pre></div>
</div>
</div>
<div class="section" id="writing-a-bash-script-to-get-data">
<h2>2. Writing a bash script to get data<a class="headerlink" href="#writing-a-bash-script-to-get-data" title="Permalink to this headline">¶</a></h2>
<p>Here you will need to create a bash script that extracts and processes the data you would
like to add to ggd.</p>
<p>The following outlines the steps used to create the hg19-gaps ggd data recipe:</p>
<ul>
<li><dl>
<dt>First locate the data you would like to extract.</dt><dd><ul class="simple">
<li><p>Example: hg19-gaps from the UCSC genome browser track:</p></li>
</ul>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">hgdownload</span><span class="o">.</span><span class="n">cse</span><span class="o">.</span><span class="n">ucsc</span><span class="o">.</span><span class="n">edu</span><span class="o">/</span><span class="n">goldenpath</span><span class="o">/</span><span class="n">hg19</span><span class="o">/</span><span class="n">database</span><span class="o">/</span><span class="n">gap</span><span class="o">.</span><span class="n">txt</span><span class="o">.</span><span class="n">gz</span>
</pre></div>
</div>
</dd>
</dl>
</li>
<li><p>Next, identify if you need a genome coordinates file. Many of these are hosted on the ggd-recipes repo.
If the coordinates file is not available you can either add one to the ggd-recipes repo or ask a member of the
ggd team to add one by requesting it using the <a class="reference external" href="https://forms.gle/3WEWgGGeh7ohAjcJA">GGD Recipe Request</a> Form.</p></li>
<li><p>Example: hg19 genome build coordinates file</p></li>
</ul>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">raw</span><span class="o">.</span><span class="n">githubusercontent</span><span class="o">.</span><span class="n">com</span><span class="o">/</span><span class="n">gogetdata</span><span class="o">/</span><span class="n">ggd</span><span class="o">-</span><span class="n">recipes</span><span class="o">/</span><span class="n">master</span><span class="o">/</span><span class="n">genomes</span><span class="o">/</span><span class="n">Homo_sapiens</span><span class="o">/</span><span class="n">hg19</span><span class="o">/</span><span class="n">hg19</span><span class="o">.</span><span class="n">genome</span>
</pre></div>
</div>
<ul class="simple">
<li><dl class="simple">
<dt>Next, identify what format you want the final files to be in and what processing needs to be done</dt><dd><ul>
<li><dl class="simple">
<dt>Example:</dt><dd><ol class="arabic simple">
<li><p>need to decompress the gzipped file</p></li>
<li><p>need to extract the chrom, start, end, size, type, and strand columns</p></li>
<li><p>need to sort the resulting extraction</p></li>
<li><p>need to bgzip the sorted extraction</p></li>
<li><p>need to index the bgzipped, sorted file with tabix</p></li>
</ol>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
</ul>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>If the format of a data file allows a header, please add one. A recipe whose data file could carry a header but lacks one
will be rejected during the pull request until the header is added. (See the next step for an example of adding
a header to the gaps data file.)</p>
</div>
<ul>
<li><dl>
<dt>Next, create a bash script that contains the steps, in order, to extract and process the data file:</dt><dd><ul class="simple">
<li><p>Example (bash script)</p></li>
</ul>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span># Genome coordinates file that gsort uses to order the chromosomes
genome=https://raw.githubusercontent.com/gogetdata/ggd-recipes/master/genomes/Homo_sapiens/hg19/hg19.genome
# Download, decompress, add a header, extract the columns, sort, and bgzip
wget --quiet -O - http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/gap.txt.gz \
| gzip -dc \
| awk -v OFS="\t" 'BEGIN {print "#chrom\tstart\tend\tsize\ttype\tstrand"} {print $2,$3,$4,$7,$8,"+"}' \
| gsort /dev/stdin $genome \
| bgzip -c > hg19-gaps-ucsc-v1.bed.gz
# Index the bgzipped file
tabix hg19-gaps-ucsc-v1.bed.gz
</pre></div>
</div>
</dd>
</dl>
</li>
<li><p>In this data processing script, please keep the resulting data file names short and include all necessary
genomic file extensions. (See the note below.)</p></li>
</ul>
<p>You should run the script to make sure it works and that the processed files are what you expect them to be.</p>
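<p>One way to check the column extraction before running the full download is to feed the awk step a single row by hand. The sketch below assumes a made-up sample row in the UCSC gap table column order (bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge). The <code>extract_columns</code> function name and the sample values are illustrative and are not part of ggd.</p>

```shell
# Hypothetical sample row laid out like the UCSC gap table:
# bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge
sample_row=$(printf '585\tchr1\t0\t10000\t1\tN\t10000\ttelomere\tno')

# Same awk program as in the recipe script: print a header line first,
# then the chrom, start, end, size, and type columns plus a "+" strand.
extract_columns() {
    awk -v OFS="\t" 'BEGIN {print "#chrom\tstart\tend\tsize\ttype\tstrand"} {print $2,$3,$4,$7,$8,"+"}'
}

result=$(printf '%s\n' "$sample_row" | extract_columns)
printf '%s\n' "$result"
```

<p>If the second output line does not show the chrom, start, end, size, and type values followed by <code>+</code>, the column numbers in the awk program need adjusting.</p>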
<blockquote>
<div><div class="admonition note">
<p class="admonition-title">Note</p>
<p>The final data file names will be changed to reflect the new ggd recipe name. To keep the data file name as
short as possible, please rename data files to include only a short name and the genomic file extensions. The name
will be replaced with the ggd recipe name, and the genomic file extensions will be kept. For example, for a recipe named
hg19-gaps, <em>gaps.bed.gz</em> and the tabix companion <em>gaps.bed.gz.tbi</em> will be renamed to <em>hg19-gaps.bed.gz</em>
and <em>hg19-gaps.bed.gz.tbi</em>. Because genomic file extensions can be complex, all extensions are retained
and only the part of the name before the first ‘.’ is replaced with the recipe name.</p>
</div>
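<p>The renaming rule described in the note above can be sketched with shell parameter expansion. This is only an illustration of the rule, not the code ggd itself runs; the <code>rename_for_recipe</code> helper is a hypothetical name.</p>

```shell
# Hypothetical helper: replace everything before the first '.' in a file
# name with the recipe name and keep every extension after it.
rename_for_recipe() {
    recipe_name=$1
    file_name=$2
    # Strip the shortest prefix that ends at the first '.'
    extensions=${file_name#*.}
    printf '%s.%s\n' "$recipe_name" "$extensions"
}

rename_for_recipe hg19-gaps gaps.bed.gz      # prints hg19-gaps.bed.gz
rename_for_recipe hg19-gaps gaps.bed.gz.tbi  # prints hg19-gaps.bed.gz.tbi
```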
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>Make sure that any intermediate files or other files used for data processing are removed after processing. Only the
final processed data files should remain once the script has finished. Any extra files that are not removed will be
added as members of the data recipe, which is most likely unwanted and unneeded.</p>
</div>
</div></blockquote>
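<p>One possible way to keep intermediate files out of the recipe is to write them to a scratch directory and delete it before the script ends. The sketch below assumes a placeholder processing step in place of a real download; the <code>process_with_cleanup</code> function and the file names are hypothetical.</p>

```shell
# Hypothetical recipe-style function: all intermediate files live in a
# temporary directory that is deleted before the function returns.
process_with_cleanup() {
    out_file=$1
    tmp_dir=$(mktemp -d)

    # Stand-in for a real download and processing step.
    printf 'chr1\t0\t10000\n' > "$tmp_dir/intermediate.txt"
    sort -k1,1 -k2,2n "$tmp_dir/intermediate.txt" > "$out_file"

    # Remove the scratch directory so only the final file remains.
    rm -rf "$tmp_dir"
}

# Run in a fresh working directory and list what is left behind.
work_dir=$(mktemp -d)
process_with_cleanup "$work_dir/final.bed"
ls "$work_dir"
```

<p>Listing the working directory after the script finishes is a quick check that only the final data files remain.</p>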
</div>
<div class="section" id="create-a-ggd-recipe-using-the-ggd-cli">
<h2>3. Create a ggd recipe using the ggd cli<a class="headerlink" href="#create-a-ggd-recipe-using-the-ggd-cli" title="Permalink to this headline">¶</a></h2>
<p>The ggd command line interface (cli) contains tools to create and test a data recipe.</p>
<p>If it has not been installed, install the ggd cli following the steps outlined in <a class="reference internal" href="using-ggd.html#using-ggd"><span class="std std-ref">Using GGD</span></a>.</p>
<p>With the ggd cli installed you can now transform your bash script into a ggd recipe.</p>
<div class="section" id="example">
<h3>Example:<a class="headerlink" href="#example" title="Permalink to this headline">¶</a></h3>
<blockquote>
<div><p>Assuming your bash script created in step 2 is called <em>hg19_data_recipe.sh</em>, run the following command to turn
it into a ggd recipe:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ ggd make-recipe -s Homo_sapiens -g hg19 --author name \
--package-version 1 --data-version 27-Apr-2009 \
--data-provider UCSC -cb 0-based-inclusive \
--summary 'Assembly gaps from UCSC' \
-k gaps -k region --name gaps hg19_data_recipe.sh
</pre></div>
</div>
<p>The <code class="code docutils literal notranslate"><span class="pre">ggd</span> <span class="pre">make-recipe</span></code> tool transforms the bash script you created into a data recipe. Running the above command will create
a data recipe called <em>hg19-gaps-ucsc-v1</em>, which is a directory containing three files. For more information on the
<code class="code docutils literal notranslate"><span class="pre">ggd</span> <span class="pre">make-recipe</span></code> command see <a class="reference internal" href="make-recipe.html#ggd-make-recipe"><span class="std std-ref">make-recipe</span></a>.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="build-install-and-check-the-data-recipe">
<h2>4. Build, install, and check the data recipe<a class="headerlink" href="#build-install-and-check-the-data-recipe" title="Permalink to this headline">¶</a></h2>
<p>Now that you have created a ggd data recipe, you need to test it to make sure that it not only extracts and processes the data but
also was created correctly and provides the instructions needed for data package creation.</p>
<p>To do this use the <code class="code docutils literal notranslate"><span class="pre">ggd</span> <span class="pre">check-recipe</span></code> command.</p>
<div class="section" id="id1">
<h3>Example:<a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h3>
<blockquote>
<div><p>Using the hg19-gaps recipe created in step 3, run the following command:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ ggd check-recipe hg19-gaps-ucsc-v1
</pre></div>
</div>
<p>Or if you are in a different directory on your machine run:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ ggd check-recipe <Path_To_hg19-gaps-ucsc-v1>
</pre></div>
</div>
<p>This command will build, install, and check the validity of the new ggd data recipe.
For more information about <code class="code docutils literal notranslate"><span class="pre">ggd</span> <span class="pre">check-recipe</span></code> see <a class="reference internal" href="check-recipe.html#ggd-check-recipe"><span class="std std-ref">check-recipe</span></a>.</p>
</div></blockquote>
</div>
</div>
<div class="section" id="submit-the-new-ggd-recipe-to-the-original-ggd-recipes-repo">
<h2>5. Submit the new ggd recipe to the original ggd-recipes repo<a class="headerlink" href="#submit-the-new-ggd-recipe-to-the-original-ggd-recipes-repo" title="Permalink to this headline">¶</a></h2>
<p>Once the ggd recipe you created passes step 4 you are ready to add it to the original ggd-recipes repo.</p>
<p>To do this you will need to create a <strong>pull request</strong>.</p>
<p>From your local machine, add the new data recipe you created to the forked ggd-recipes repo. You will add it
to the <code class="docutils literal notranslate"><span class="pre">recipes/</span></code> directory. If you do not put it in the right directory it will be rejected.
The recipes file convention is as follows:</p>
<blockquote>
<div><ul>
<li><p>All recipes are stored within the <strong>ggd-recipes/recipes</strong> directory</p></li>
<li><p>The recipes directory has the following format:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">/<</span><span class="n">path</span> <span class="n">to</span> <span class="n">forked</span> <span class="n">ggd</span><span class="o">-</span><span class="n">recipes</span> <span class="n">repo</span><span class="o">>/</span><span class="n">recipes</span><span class="o">/<</span><span class="n">ggd</span> <span class="n">channel</span><span class="o">>/<</span><span class="n">species</span><span class="o">>/<</span><span class="n">genome</span><span class="o">-</span><span class="n">build</span><span class="o">>/</span>
</pre></div>
</div>
<ul class="simple">
<li><p><code class="code docutils literal notranslate"><span class="pre"><path</span> <span class="pre">to</span> <span class="pre">forked</span> <span class="pre">ggd-recipes</span> <span class="pre">repo></span></code> is the path to the forked ggd-recipes repo on your local machine.</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre">recipes</span></code> is the <strong>recipes</strong> directory.</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre"><ggd</span> <span class="pre">channel></span></code> is the ggd channel the recipe should go in. This depends on the type of data you are adding.
For the hg19-gaps example the channel would be <strong>genomics</strong>.</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre"><species></span></code> is the species corresponding to the data. For the hg19-gaps example this would be <strong>Homo_sapiens</strong>.</p></li>
<li><p><code class="code docutils literal notranslate"><span class="pre"><genome-build></span></code> is the genome build for the data. For the hg19-gaps example this would be <strong>hg19</strong>.</p></li>
</ul>
</li>
</ul>
</div></blockquote>
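<p>The directory convention above can be sketched as a small helper that assembles the destination path from its four parts. The <code>recipe_dest_dir</code> function and the <code>$HOME/ggd-recipes</code> location are assumptions for illustration; substitute the actual path to your fork.</p>

```shell
# Hypothetical helper: build the destination directory for a recipe from
# the fork location, ggd channel, species, and genome build.
recipe_dest_dir() {
    printf '%s/recipes/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Placeholder path; replace with the location of your fork.
fork_path="$HOME/ggd-recipes"
dest=$(recipe_dest_dir "$fork_path" genomics Homo_sapiens hg19)
printf '%s\n' "$dest"
```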
<p>For the hg19-gaps recipe above you would use the following command:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mv hg19-gaps-ucsc-v1 /<forked ggd-recipes>/recipes/genomics/Homo_sapiens/hg19/
</pre></div>
</div>
<p>Once the recipe is there you will need to add the recipe to your forked ggd-recipes repo.
Navigate to the forked ggd-recipes directory and use the following commands:</p>
<blockquote>
<div><ul>
<li><p>Add the recipe to the git repo:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ git add recipes/genomics/Homo_sapiens/hg19/hg19-gaps-ucsc-v1/
</pre></div>
</div>
</li>
<li><p>Commit the addition to the repo (Your default text editor, often vim, will open. Add a message describing the new recipe and save it):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ git commit
</pre></div>
</div>
</li>
<li><p>Push the commit to your forked repo on GitHub (You may be asked for your GitHub credentials):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ git push origin
</pre></div>
</div>
</li>
<li><p>Go to the ggd-recipes GitHub page for your username (<a class="reference external" href="https://github.com">https://github.com</a>/<USERNAME>/ggd-recipes/).</p></li>
<li><p>Under the green “Clone or download” button click on <strong>Pull request</strong>.</p></li>
<li><p>Where it says <strong>base fork:</strong> make sure it is set to <strong>gogetdata/ggd-recipes</strong>, and where it says <strong>base:</strong> make sure it
is set to <strong>master</strong>.</p></li>
<li><p>Click the green <strong>Create pull request</strong> button.</p></li>
<li><p>Add some comments and complete the pull request.</p></li>
</ul>
</div></blockquote>
<p>You have now created a pull request with your new data recipe. The recipe will go through a continuous integration
step where the recipe will be tested.</p>
<p>If it passes, the recipe will be added to the gogetdata/ggd-recipes repo and anyone using the ggd tool will be
able to access it.</p>
<p>If it does not pass, you will be informed by the ggd team, and they will work with you on getting it working.</p>
</div>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
©2016-2021, The GoGetData team.
|
<a href="_sources/contribute-recipe.rst.txt"
rel="nofollow">Page source</a>
</div>
</body>
</html>