-
Notifications
You must be signed in to change notification settings - Fork 1
/
Lesson 1: Linux Shell
517 lines (381 loc) · 16.9 KB
/
Lesson 1: Linux Shell
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
# Software Carpentry Workshop
## Lesson1: Linux Shell: Part 1
This lesson is based on [this Software Carpentry lesson](http://swcarpentry.github.io/shell-novice/)
#### Please make sure your directory structure is setup as described [here](https://github.com/uta-carpentries/SoftwareCarpentryWorkshops_general/blob/master/Data_DirectoryStructure_Setup.md)
### 1. Why learn Linux?
+ Most widely used OS in research and technology
+ Open-source: free to get, free to use, free to modify.
+ Availability: Linux VM, Linux servers/clusters (UTA, TACC)
+ CLI (command line interface) and GUI (graphical user interface) are available
### 2. Bash as shell program to interact with Linux OS
+ Bash is one of shell programs that can be used to interact with Linux OS. It is a program that runs other programs and it is also a programming language itself.
+ Bash is available on Unix/Linux/mac machines by default through Terminal. On Windows, bash is available through GitBash that you installed earlier. Both applications provide command-line interface (CLI) and will be referred in our lessons as "terminal" or "terminal window".
Let’s get familiar with the terminal first: prompt, commands, output. Open your terminal - you will see text with `$` as the last character. The text displays your username, name of the computer and your location. The system is waiting for your commands after `$` sign.
```
# who am I? Try
$ whoami
```
When the user types a command and then presses the enter (or return) key, the computer reads it, executes it, and prints its output. This is known as Read-Evaluate-Print loop of CLI. For example, here you ask to print a message with 'echo' command. Try:
```
$ echo "Welcome to our workshop"
```
`echo` command is read, executed and the output is printed. Only commands known to bash will be executed. Try:
```
$ stop
```
You will see `command not found` message. This means that Bash does not recognize this command.
Here is the list of the commands we will use in the first half of the lesson. Not too many, but when used effectively, do simplify and speed up your work enormously!
``` shell
$ echo # print
$ whoami # prints username
$ pwd # print working directory path
$ cd # change directory
$ mkdir # make directory
$ touch # create empty file
$ cat # view file/concatinate files
$ mv # move/rename file
$ cp # copy file
$ rm # delete file
```
You can get an idea about the functionality of Linux commands from [this cheat sheet](https://files.fosswire.com/2007/08/fwunixref.pdf)
### 2a. Bash: navigate through your file system
Know who you are and where you are! Let’s talk about file system (folders and files) and how to find your way around …
Note: when you see `#` in the code below, it serves as a comment to explain the code, not a command. Commands that you should try are typed after `$` sign
And if I were to ask you where are you? Minimize your terminal for a second. You are back in your familiar GUI environment. Where are you? What folders are immediately available to you? Desktop folders? Open the terminal. Can you navigate to desktop using CLI?
```
# where am I? Try
$ pwd
```
`pwd` command prints your current location (or path to working directory) in the computer. The path you see now should display the path to your home directory. You can check this by typing `echo $HOME`
Let's talk more about the output of `pwd`.
**File system**: root, working, current, parent directories…
![](http://i1.wp.com/mycodinglab.com/wp-content/uploads/2014/01/Linux-File-System-Mycodinglab.jpg)
File systems have a hierarchical structure. In a hierarchical file system, the folders and files are displayed in groups, which allows the user to see only the files they’re interested in seeing. For example, in the picture above, the root folder contains folders named `bin`, `etc`, `home`, *etc.* Each of these folders could have hundreds of their own files, but unless they are opened the files are not displayed.
In GUI the user views the contents of each folder by double-clicking the folder.
With command-line user interface , the folders (also known as directories) are listed as text.
Let's see how you would list the files inside directory using command-line interface:
```
# list files
$ ls
# use flags to show different details: Try
$ ls -F
```
`-F` indicates file type — Adds a symbol to the end of each listing:
`/` to indicate a directory;
`@` to indicate a symbolic link to another file;
`*` to indicate an executable file
There are special help pages (manuals) that explain what every command does and what options (flags) can be used with it. To access these pages, use `man` command
```
$ man ls
# or
$ ls --help
```
The output provides full documentation for a command, everything you need to know is there!
Other useful options of `ls` command are:
```
# Option -a (all) — Lists all files in the
#directory, including hidden files (.filename).
$ ls -a
#Option -l (long) — Lists details about contents,
#including permissions (modes), owner, group,
#size, creation date, whether the file is a link
#to somewhere else on the system and where its link points.
$ ls -l
```
So far, you were looking at the contents of your home directory. Let's see the contents of the Desktop directory
```
$ ls -F Desktop
```
If you do not have `Desktop` directory, it will give you an error...
#### **Challenge 1**
Remember, we added `SCW` folder to the Destktop in the beginning of the workshop? Can you now list contents of `SCW` directory from your current directory?
#### **Solution:**
```
ls -F Desktop/SCW/
```
In the exercise above, you viewed files in `SCW` directory while in home directory. But you can change the directory and move directly into `SCW` directory.
```
$ cd Desktop
$ pwd
$ cd SCW
$ pwd
# you could have done it in one step:
$ cd Desktop/SCW
$ ls -F
```
So, `cd` lets you change directories into daughter subdirectories. What if you need to go back?
```
$ cd Desktop
#does not work, cd gets you into
#directories inside the current directory only
# But try this:
$ cd .. #works!
# Why? Let's list ALL files in `SCW` folder. Option -a displays hidden directories as well
$ ls -F -a
```
`..` is a special directory name meaning “the directory containing this one”, or the parent of the current directory and `.` means “the current working directory”. In our case, `..` refers to `Desktop` directory and `.` refers to `SCW` directory.
But `cd` without any arguments takes you ... where?
```
$ cd
$ pwd
```
**Relative vs Absolute Paths**
Navigation with `cd ..` is an example of the relative path - you indicate where you want to go relative to the location you are currently in. In contrast, when you give an absolute path to the folder or file, you will end up there no matter what your current location within file system is.
```
$ pwd #specifies absolute path to current directory
```
Notice the format of the absolute path: it starts with forward slash indicate the root directory with every subsequent forward slash separating folders that you go through to get the final folder or file
Other shortcuts:
```
$ cd ~ #takes you home
$ cd - #takes you BACK one directory, NOT UP!!!
```
#### **Challenge 2**
Make a diagram of our directory structure (~/Desktop/SCW) and practice navigation commands.
a) Find out where you currently are.
b) Go to `Data` folder, what is there?
c) Go back where you came from (with `cd -` command).
d) Go one directory up.
e) Try relative and absolute paths.
f) Get comfortable navigating across file system - try changing directories into different folders within `SCW` folder and explore there contents. Use `-a` flag, does every directory have hidden files?
### 2b. Bash: make new files and directories
Now that you know how to navigate your file system and list files that are already there, let’s see how we can create new folders and files.
You can create, delete, move, copy, rename files and directories using linux commands.
```
#Navigate to `SCW` folder
$ cd # takes you home no matter where you are at the momemt
$ cd Desktop/SCW
#check you are in the correct place
$ pwd
#go into Lesson1_UnixShell directory
$ cd Lesson1_UnixShell
#now lets make two new directories inside Lesson1_UnixShell folder
$ mkdir India
$ mkdir Japan
#now check to see if you have these two directories in the folder
$ ls
#create file
$ touch MyFirstFile.txt # creates empty file
#view file
$ cat MyFirstFile.txt
#Or create and open MyFirstFile.txt
#if you have Sublime Text as test editor
$ subl MyFirstFile.txt
#if you are on mac working with default text editor
$ open MyFirstFile.txt
```
When you open `MyFirstFile.txt` in a text editor, type:
```
This is Software Carpentry Workshop
```
Save and close it.
Now let's see the contents of the file in Bash:
```
$ cat MyFirstFile.txt
```
You should see the line `This is Software Carpentry Workshop` in the terminal
Now lets try to rename, move, copy and delete files
```
# move/rename file
# general syntax: mv $source $destination
$ mv MyFirstFile.txt MyFirstScript.sh
# Now try to move (but do not rename) MyFirstScript.sh to Japan folder
$ mv MyFirstScript.sh Japan/
# Copy file from `Japan` to `Lesson1_UnixShell`
# General cp syntax: cp $source $destination
$ cp Japan/MyFirstScript.sh .
# remember `.` means working directory, which is `Lesson1_UnixShell` in our case
#Now lets navigate to folder ‘India’
$ cd India/
#Now copy the file ‘MyFirstScript.sh’ to India
cp ../MyFirstScript.sh .
# Delete file !!! CANNOT BE UNDONE
# clean up `India`
$ rm MyFirstScript.sh
# now lets clean up `Lesson1_UnixShell`
#go back to Lesson1_UnixShell
$ cd ..
#try to delete folders India and Japan
rm -r India Japan
#deletes the folders India and Japan
```
#### **Challenge 3**
After the break we will work with files from the `Data` folder located inside `SCW` folder. Your challenge is to copy `Data` folder to `Lesson1_UnixShell` folder. Make sure you understand the directory structure of `SCW` before we continue.
**Solution**
```
cp -r ../Data .
# SCW directory structure
SCW
-> Data/
-> Data_Copy
-> Lesson1_UnixShell/
-> Data/
-> Lesson2_IntroToR
-> Lesson3_Git_GitHub
-> Lesson4_ProgrammingR
-> Lesson5_DataVisualizationR
-> Lesson6_ReproducibleResearchR
```
-------- coffee break ---------------
## Lesson1: Linux Shell: Part 2
#### Learning objectives:
- learn linux tools to manipulate text files
- learn how to work with multiple files
- learn how to write simple shell scripts
### 2c. Bash: manipulate text files
Here is the list of the commands we will use in this second half of the lesson.
```
$ wc #word count
$ head/tail #display start/end of file
$ cut #extract fields(columns) of interest from file
$ sort #sort file
$ uniq #select uniq lines only
$ grep #select rows based on content
```
Let's start with looking at some real data.
You copied `Data` folder into `Lesson1_UnixShell` before the break. Let's see what is in `Data` folder.
How will you do this?
Start from `SCW` folder.
```
$ cd Lesson1_UnixShell
# view folder contents:
$ ls
# go to Data
$ cd Data
# What is here?
$ ls Data
```
Here we have a file, `gapminder.txt` and a `ByCountry` folder. We will be working with one of the files in `ByCountry` folder and return to `gapminder.txt` later.
```
$ cd ByCountry
$ ls
```
Looks like we have some information about different countries here. Let's explore `Uruguay.txt`
```
$ cat Uruguay.txt
# note: use tab completion feature by typing the beginning of the text file name and then `tab`
```
What information do we have here? This file is short and you can examine it by eye. But in case the file is long, you want to be able to examine datasets like that with Linux commands.
Let's copy`Uruguay.txt` to `Lesson1_UnixShell` folder and then play with the dataset.
```
#how would you copy to Lesson1_UnixShell?
$ cp Uruguay.txt ../../
# change directory to Lesson1_UnixShell
$ cd ../../
```
Some questions you might want to ask about this file:
- What is in the file?
- How big is this file?
- What is the highest life expectancy?
Q1: what is in the file?
```
$ cat Uruguay.txt
$ less Uruguay.txt # type `q` to return to command prompt
$ head -n3 Uruguay.txt
$ tail -n5 Uruguay.txt
```
Q2: how big is your file?
```
#how big is your file `wc` outputs lines, words, bytes
$ wc Uruguay.txt
$ wc -l Uruguay.txt # remember to use `man wc` or `wc --help` to see all command options
```
Q3: What is the highest life expectancy? We need more than one step here! We need to extract numerical values in the 4th column,sort them numerically and record the max value to file.
```
#remove the header
# Use 'grep' to select lines that contain the word 'Uruguay'
$ grep 'Uruguay' Uruguay.txt >Uruguay_noH.txt
#or you can select lines that DO NOT contain the word 'country' that our header starts with
grep -v 'country' Uruguay.txt > Uruguay_noH.txt
# get 4th column
# Use 'cut' to select column
$ cut -f4 Uruguay_noH.txt > Uruguay_LE.txt
#sort output numerically and write to new file
$ sort -n Uruguay_LE.txt > Uruguay_LE_sorted.txt
#finally get the last line with max value
$ tail -n 1 Uruguay_LE_sorted.txt > Uruguay_LE_max.txt
```
Notice that to get max life expectancy in this dataset we created three intermediate files that we do not really need... One way to avoid generating such files is to string different commands together. This is known as ‘piping’
**Pipes**
Now let’s see how we can string commands together.
We use `|` symbol to pass the output of one command as an input to the next command.
Our question 3 can be answered with one line of code without creating unnecessary intermediate files!
```
$ grep 'Uruguay' Uruguay.txt | cut -f4 | sort -n |tail -n1 > Uruguay_LE_max2.txt
```
#### **Challenge 4**
Use pipes to find the year with the highest life expectancy in `Sweden.txt`. Record the output to `Sweden_YearWithMaxLE.txt`.
First, copy `Sweden.txt` to Lesson1_UnixShell:
`cp Data/ByCountry/Sweden.txt .`
Hints: First, describe the steps you need to get the answer in natural language (English or any other). Then think how to implement them using the commands you just learned. To select multiple columns, use `cut -fcol1,col2` For example, to select columns 2 and 3, use `cut -f2,3`. To sort by column 2, use `sort -k2`
**Solution**
```
$ grep 'Sweden' Sweden.txt | cut -f3,4 |sort -n -k2|tail -n1|cut -f1 >Sweden_YearWithMaxLE.txt
```
### 2d. Working with multiple files
There are two ways we can work with multiple files at the same time:
- using wildcards
- using loops
**Wild cards**
If we want to apply a command to a group of files , we can use `*` wildcard:
`*` substitutes for any 0 or more characters
```
$ cd Data/ByCountry
#list all .txt files
$ ls *.txt
#count lines in all .txt files that start with 'G
$ wc -l G*.txt
```
**Loops**
First, let's talk about variables in Bash.
Varible name: myName; value assigned to it: Your Name
```
#try this
$ myName=Joe #variable assignment
$ echo Joe
$ echo myName
# need `$` to access the value of the variable
$ echo $myName
```
General syntax of for loop in Bash:
```
for VARIABLE in file1 file2 file3
do
command $VARIABLE
done
```
You can write it as one line:
```
$ for VARIABLE in file1 file2 file3; do command $VARIABLE; done
```
When the shell sees the keyword for, it knows to repeat a commands between `do` and `done` once for each item in a list that follows `in` keyword. Each time the loop runs (called an iteration), an item in the list is assigned in sequence to the variable, and the commands inside the loop are executed, before moving on to the next item in the list. Inside the loop, we get the value of the variable by putting `$` in front of it.
Here is an example:
```
$ for file in Zambia.txt Zimbabwe.txt; do echo $file;done
# OR using wildcard:
$ for file in Z*.txt; do head -n3 $file; done
#file here is a variable; its value is Zambia.txt
#when loop executes the first time(first iteration)and Zimbabwe.txt
#when loop executes the second time(second iteration)
```
#### **Challenge 5**
1. Compare the output of the two for loops below.
Can you explain why the outputs are different?
```
$ for datafile in G*.txt;do ls G*.txt;done
$ for datafile in G*.txt;do ls $datafile;done
```
2. Write a for loop that outputs the highest life expectancy with a corresponding year and country name for every file that starts with `U` and contains `a`
**Solution**
```
$ for file in U*a*; do cut -f1,3,4 $file|sort -nk3|tail -n1 ;done
# And if you want to write output to file
$ for file in U*a*; do cut -f1,3,4 $file|sort -nk3|tail -n1 >> Highest_LE.txt;done
```
***Note:***
Notice that you are using the append symbol >> here to save the output to a file before 'done'. This file will be created after the loop runs for the first time, and then for each reiteration, the outputs will be appended to this file.
Instead, you could also use
```
$ for file in U*a*; do cut -f1,3,4 $file|sort -nk3|tail -n1; done > Highest_LE.txt
```
In this case, the final output is saved to the file once the loop is completed.