Tutorial: Command Line Crash Course
Khi Pin, Chua 19/01/2022
- Background and Introduction
- Simple and useful Bash commands
  - `pwd` (Present working directory)
  - Useful to know: Unix folder structures
  - `cd` (Change directory)
  - `ls`
  - How to create a folder? `mkdir` “make a directory”!
  - `rm` and `rmdir`: Removing files or directories (Warning, be careful!)
  - `touch` to create empty file
  - `cp` to copy files/directories around
  - `mv` to move files/directories around
  - View any text file on command line with `cat`, `less`, `head` and `tail`
  - `history` of commands
  - `grep`-ing at the needle in the haystack
  - `cut` the specific columns you want to see
  - `sort` the way you like it
- Some useful command line’s “magic”
- Researchers often exchange data (sequencing data, clinical data, etc.) in file formats that are not easily manipulated with a graphical user interface (GUI).
  - The data may be too big, and the software may not be designed to work with files of that size.
  - Some software may accidentally introduce formatting errors, e.g. numbers get changed to dates.
- With the command line, it’s easy to quickly peek into what you have on hand.
- Unix tools are ubiquitous and fast.
- Many tasks can be simplified and automated, saving valuable time. Some tasks can be done quickly using just the command line without going into a full-blown programming language.
- Combining scripting and command line software is key to reproducible research.
- Finally, access to high performance computing (HPC) resources is often done via the command line. Today’s approach of using SSH is just one example.
- After logging into a server/computer via SSH in a terminal, you will often see a “prompt” that contains information on the user and the name/address of the computer. The “prompt” ends in `$`. You can start typing any command after the `$` mark and press `Enter` to execute the command. For example, try typing `echo "hello world"` and press `Enter` to show the message “hello world”:
echo "hello world"
## hello world
- The command line “language” that you are using is called `Bash`, which is the most common shell (command line language) you will see on Unix-based systems. Other shells include `Zsh`, which is the default on many newer macOS systems.
- Most of the tools we’ll use can be found on most Unix-based systems regardless of the shell.
- Importantly, you can look at the manual for any command simply by typing `man COMMAND`. For example, try `man echo` to read the manual of `echo` (you can press `q` to exit from the manual).
- `man` provides very detailed documentation for many tools. However, most of them also come with a short “help” that you can print on the command line for quick reference in the form of `command --help`. For example, try `grep --help`.
- `--help` is almost always available for third-party command line software, including PacBio’s.
- The `pwd` command simply tells you where you currently are.
pwd
## /Users/khipinchua/OneDrive/Documents/PacBio/trainings_meetings/2022-1-19_KDRI_Japan_IsoSeq_Workshop/command_line_crash_course
- Whenever you see something like `/home/users/file1.txt`, it is the location on the server where the file is stored.
- Absolute file paths always start with “/” to indicate the “root” (the uppermost directory, which contains everything else). Imagine peeling an onion layer by layer before you get to the innermost layer, which is the file that you want. The outermost layer/skin is the “root”.
  - Exception: you may see something like `~/file1.txt`. The `~` sign is an alias that represents the “home folder” of the user (you). On many Linux servers, your home folder is something like `/home/USERNAME` or `/home/users/USERNAME`. In other words, if your username is `kpin` and your home folder is `/home/users/kpin`, typing `~` is the same as typing `/home/users/kpin`.
  - For example, you can type `echo ~` to see what `~` means.
echo ~
## /Users/khipinchua
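As a small sketch of the same idea (assuming a Bash-like shell in which the `HOME` variable is set, which is the usual case), `~` and `$HOME` normally point to the same folder, and `~` is only expanded when it is not inside quotes:

```shell
# "~" at the start of an unquoted word expands to your home folder
echo ~
# The HOME environment variable normally holds the same path
echo "$HOME"
# Inside quotes, "~" is NOT expanded and stays a literal character
echo "~"
```

This is why a path such as `"~/file1.txt"` written inside quotes may fail with “No such file or directory”.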
- The `cd` command means “change directory”. Simply put, it moves your “present working directory” to another place! Try:
pwd
## /Users/khipinchua/OneDrive/Documents/PacBio/trainings_meetings/2022-1-19_KDRI_Japan_IsoSeq_Workshop/command_line_crash_course
# "~" is your home folder (explained above)
cd ~/command_line_crash_course/
pwd
## /Users/khipinchua/command_line_crash_course
- You can go back to the folder one layer up (the parent of your present working directory) by typing:
cd ..
pwd
## /Users/khipinchua/OneDrive/Documents/PacBio/trainings_meetings/2022-1-19_KDRI_Japan_IsoSeq_Workshop
- In general, `..` refers to the directory one “layer” up, and `.` refers to the current directory.
- Question: What would `cd ~` do?
- Question: What would `cd ../../` do?
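The sketch below shows `.` and `..` in action. The folder names `outer` and `inner` are made up for illustration, and they are created in a temporary scratch location (using `mktemp -d`, plus `mkdir -p` which is covered further down) so that nothing in your own folders is touched.

```shell
# Create a scratch area with two hypothetical nested folders
scratch=$(mktemp -d)
mkdir -p "$scratch/outer/inner"

cd "$scratch/outer/inner"
pwd            # path ends with /outer/inner

cd ..          # ".." is the folder one layer up
pwd            # path ends with /outer

cd ./inner     # "." is the current folder, so this is the same as "cd inner"
pwd            # path ends with /outer/inner
```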
- `ls` can be used to list the files and folders in the current directory.
# Let's go into the crash course folder
cd ~/command_line_crash_course/
ls
## command_line_crash_course.Rmd
## command_line_crash_course.md
## folder
## folder_x
## folder_y
## large_text.txt
## small_text.tsv
- `ls` has parameters that allow you to format the output to your liking, e.g.:
  - `-l` outputs the files line by line and shows additional information such as permissions
  - `-h` outputs the sizes in human-readable format (e.g. in kilobytes instead of bytes)
  - `-t` sorts the files by the time they were last modified
  - `-a` shows hidden files
  - `-R` lists “recursively”, i.e. lists all the files/directories as well as those in the sub-directories
  - `-d` lists directories themselves instead of their contents
- The parameters can be combined instead of typing `-` multiple times, e.g. `-lhtaR` (note that the parameters are case-sensitive):
ls -lhtaR
## total 416
## -rw-r--r-- 1 khipinchua staff 17K Jan 18 19:16 command_line_crash_course.Rmd
## -rw-r--r-- 1 khipinchua staff 14K Jan 18 18:25 .Rhistory
## drwxr-xr-x 10 khipinchua staff 320B Jan 18 18:25 .
## -rw-r--r-- 1 khipinchua staff 24K Jan 18 18:06 command_line_crash_course.md
## drwxr-xr-x 19 khipinchua staff 608B Jan 17 16:52 ..
## -rw-r--r-- 1 khipinchua staff 153B Jan 10 21:23 small_text.tsv
## drwxr-xr-x 2 khipinchua staff 64B Jan 10 21:07 folder_y
## drwxr-xr-x 2 khipinchua staff 64B Jan 10 21:07 folder_x
## -rw-r--r-- 1 khipinchua staff 139K Jan 10 20:55 large_text.txt
## drwxr-xr-x 4 khipinchua staff 128B Jan 10 18:33 folder
##
## ./folder_y:
## total 0
## drwxr-xr-x 10 khipinchua staff 320B Jan 18 18:25 ..
## drwxr-xr-x 2 khipinchua staff 64B Jan 10 21:07 .
##
## ./folder_x:
## total 0
## drwxr-xr-x 10 khipinchua staff 320B Jan 18 18:25 ..
## drwxr-xr-x 2 khipinchua staff 64B Jan 10 21:07 .
##
## ./folder:
## total 8
## drwxr-xr-x 10 khipinchua staff 320B Jan 18 18:25 ..
## -rw-r--r-- 1 khipinchua staff 20B Jan 10 18:33 file1.txt
## drwxr-xr-x 4 khipinchua staff 128B Jan 10 18:33 .
## -rw-r--r-- 1 khipinchua staff 0B Jan 10 18:29 file2.txt
- Sometimes a command takes too long, or you executed it accidentally. You can interrupt the command while it’s running by pressing `Ctrl + C` (the `Ctrl` and `c` keys at the same time).
  - Note that this does not reverse what the command has already done before you interrupt it, e.g. if you accidentally execute a command to delete a file, pressing `Ctrl + C` will not recover it.
- Run the following command, which literally tells the command line to wait for 5000 seconds, and try to terminate it.
sleep 5000
- You can add a folder path to the `ls` command to directly list the files and folders inside that path instead of listing what’s in the present working directory.
ls -lh folder
## total 8
## -rw-r--r-- 1 khipinchua staff 20B Jan 10 18:33 file1.txt
## -rw-r--r-- 1 khipinchua staff 0B Jan 10 18:29 file2.txt
- The command `mkdir` makes a directory in your present working directory.
  - Question: How do you find your present working directory?
# Remember "-d" lists the directory itself rather than its contents. If "tmpdir"
# does not exist, ls will complain
ls -ld tmpdir
## ls: tmpdir: No such file or directory
mkdir tmpdir
ls -ld tmpdir
## drwxr-xr-x 2 khipinchua staff 64 Jan 18 19:20 tmpdir
- Sometimes you want to make a folder 2 “levels” down, for example a folder called `test2` inside a folder `test1`. You can either:
mkdir test1
mkdir test1/test2
# This will list everything inside "test1" folder
ls -lh test1
## total 0
## drwxr-xr-x 2 khipinchua staff 64B Jan 18 19:20 test2
or you can use the `-p` parameter to instruct `mkdir` to create any “parent” folders necessary for what you want to create:
mkdir -p test2/test3
# This will list everything inside "test2" folder
ls -lh test2
## total 0
## drwxr-xr-x 2 khipinchua staff 64B Jan 18 19:20 test3
- Question: Can you make a directory called `isoseq_cli` inside the folder `workshop_data`?
- Question: What would `mkdir ./test1` do?
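One more practical difference between `mkdir` and `mkdir -p`, shown as a small sketch: plain `mkdir` reports an error when the folder already exists, while `mkdir -p` stays silent, so it is safe to run again. The folder names (`project`, `raw`, `run1`) are hypothetical and are created in a temporary scratch location.

```shell
# Work in a scratch area so nothing important is touched
scratch=$(mktemp -d)
cd "$scratch"

# Create three nested (hypothetical) folders in one go
mkdir -p project/raw/run1

# Plain mkdir complains because "project" already exists
mkdir project || echo "mkdir refused: project already exists"

# mkdir -p does not complain, and leaves the existing folders untouched
mkdir -p project/raw/run1

ls -R project
```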
- `rm FILE` is used to delete a file. For example, if we want to delete a file called “tmp1.txt”:
# There's a file called "tmp1.txt" here
ls tmp1.txt
## tmp1.txt
# Let's delete it
rm tmp1.txt
# Is it still there?
ls tmp1.txt
## ls: tmp1.txt: No such file or directory
- What if you want to delete a folder/directory? Either `rmdir` or `rm -r` (remove recursively) can be used:
# We want to delete the folders created just now "tmpdir", "test1" and "test2"
ls -ld tmpdir test1 test2
## drwxr-xr-x 3 khipinchua staff 96 Jan 18 19:20 test1
## drwxr-xr-x 3 khipinchua staff 96 Jan 18 19:20 test2
## drwxr-xr-x 2 khipinchua staff 64 Jan 18 19:20 tmpdir
rm -r tmpdir test1 test2
ls -ld tmpdir test1 test2
## ls: test1: No such file or directory
## ls: test2: No such file or directory
## ls: tmpdir: No such file or directory
# Make the folder again so we can delete it with rmdir
mkdir tmp_folder
ls -ld tmp_folder
## drwxr-xr-x 2 khipinchua staff 64 Jan 18 19:20 tmp_folder
# With rmdir
rmdir tmp_folder
# Still there?
ls -ld tmp_folder
## ls: tmp_folder: No such file or directory
- One crucial difference is that `rmdir` will not remove a folder with any content inside (even if that content is just empty folder(s)), so it’s “safer” than `rm -r`, which **will** remove a folder even if the folder is not empty.
- Question: Can you make a directory called `isoseq` inside a folder called `japan`, then delete them?
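The difference can be seen in the following sketch. The names `keep_me`, `inner` and `notes.txt` are made up for illustration, and everything happens in a temporary scratch location.

```shell
# Work in a scratch area so nothing important is deleted
scratch=$(mktemp -d)
cd "$scratch"

# A hypothetical folder that contains another folder and a file
mkdir -p keep_me/inner
touch keep_me/inner/notes.txt

# rmdir refuses because keep_me is not empty, so the folder survives
rmdir keep_me || echo "rmdir refused: keep_me is not empty"
ls -d keep_me

# rm -r removes the folder and everything inside it, without asking
rm -r keep_me
ls -d keep_me || echo "keep_me is gone"
```

Files removed with `rm` do not go to a recycle bin, so it is worth checking the path twice before pressing `Enter`.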
- There’s a utility called `touch` that creates an empty file. Say, for example, you want to create an empty text file to practice the `rm` command:
# Look for a file called "useless.txt"
ls -lht useless.txt
## ls: useless.txt: No such file or directory
# Create an empty file called "useless.txt"
touch useless.txt
ls -lht useless.txt
## -rw-r--r-- 1 khipinchua staff 0B Jan 18 19:20 useless.txt
# Delete it
rm useless.txt
ls -lht useless.txt
## ls: useless.txt: No such file or directory
- Note that if the file that you are `touch`-ing already exists, `touch` will only update the timestamp (the modified time of the file) without changing the content of the file.
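A quick way to convince yourself, as a sketch with a hypothetical file called `notes.txt` (the `>` used to write into the file is the redirect operator explained near the end of this tutorial):

```shell
# Work in a scratch area
scratch=$(mktemp -d)
cd "$scratch"

# Create a file that already has some content
echo "important data" > notes.txt

# The file exists, so touch only refreshes its modified time
touch notes.txt

# The content is unchanged
cat notes.txt
```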
- `cp FILE/FOLDER DESTINATION` is used to make a copy of the file or folder that you point it to.
- For example, let’s create an empty file called “file1.txt” and make a copy of it called “file2.txt”:
# Look for a file called "file1.txt"
ls -lht file1.txt
## ls: file1.txt: No such file or directory
# Create an empty file called "file1.txt"
touch file1.txt
ls -lht file1.txt
## -rw-r--r-- 1 khipinchua staff 0B Jan 18 19:20 file1.txt
# Make a copy of it called file2.txt
cp file1.txt file2.txt
ls -lht file1.txt file2.txt
## -rw-r--r-- 1 khipinchua staff 0B Jan 18 19:20 file2.txt
## -rw-r--r-- 1 khipinchua staff 0B Jan 18 19:20 file1.txt
# What happens if you copy the file to itself?
cp file1.txt file1.txt
## cp: file1.txt and file1.txt are identical (not copied).
# You can also copy the file into a folder
mkdir tmp_folder
# Empty inside
ls -lhtr tmp_folder
## total 0
cp file1.txt tmp_folder/
# file1.txt has a copy inside tmp_folder now
ls -lht tmp_folder
## total 0
## -rw-r--r-- 1 khipinchua staff 0B Jan 18 19:20 file1.txt
# Let's delete unused files and folders
rm -r file1.txt file2.txt tmp_folder
- `mv FILE/FOLDER DESTINATION` moves the file/folder to a destination.
# Make a directory called tmp_folder and a file called file1.txt
mkdir tmp_folder
touch file1.txt
ls -lh file1.txt tmp_folder
## -rw-r--r-- 1 khipinchua staff 0B Jan 18 19:20 file1.txt
##
## tmp_folder:
## total 0
# Now let's move file1.txt into tmp_folder
mv file1.txt tmp_folder/
# file1.txt is now in the folder tmp_folder
ls -lht file1.txt tmp_folder
## ls: file1.txt: No such file or directory
## tmp_folder:
## total 0
## -rw-r--r-- 1 khipinchua staff 0B Jan 18 19:20 file1.txt
- `mv` is also used to rename a file/folder, since renaming is identical to moving a file with “name1” to a file with “name2”!
mv tmp_folder tmp_folder_renamed
# The "-d" parameter lists only the folder itself without listing what's in the folder
ls -ld tmp_folder
## ls: tmp_folder: No such file or directory
ls -ld tmp_folder_renamed
## drwxr-xr-x 3 khipinchua staff 96 Jan 18 19:20 tmp_folder_renamed
# Delete the folder after use
rm -r tmp_folder_renamed
- Question: What if we want to make a copy of the folder? (Hint: Copy recursively)
- Notice that for both copying and moving into a folder, there’s a trailing “`/`” following the destination folder (e.g. `cp file1.txt tmp_folder/` and `mv file1.txt tmp_folder/`). The trailing slash makes it explicit that the destination is a folder. If you leave it out and the destination folder does not exist, the `cp` and `mv` commands will instead make a copy (with `cp`) or rename (with `mv`) your file/folder using the supposed folder name.
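The sketch below illustrates the point about the trailing slash with `cp`. The names `results` and `missing_folder` are hypothetical, and the exact error message differs between systems (it is an assumption here that your `cp` behaves like the common GNU and BSD versions).

```shell
# Work in a scratch area
scratch=$(mktemp -d)
cd "$scratch"
touch file1.txt

# "results" does not exist and there is no trailing slash,
# so cp quietly creates a FILE called "results"
cp file1.txt results
ls -l results

# With the trailing slash, cp stops with an error instead of guessing
cp file1.txt missing_folder/ || echo "cp refused: missing_folder/ is not a folder"
```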
- We often want to “peek” at the content of a text file (e.g. a sequence FASTA, although with PacBio the sequence may be too long to fit on the screen!). This can be done via `cat`:
# There are thousands of lines printing on the screen!
cat large_text.txt
- As you can see, sometimes the text is too large! You can use the “pager” utility called `less`, which opens the text file but allows you to navigate “page by page”. You can move up and down using the arrow keys, or use `w` and `Spacebar` to move up and down by one page. Press `q` to exit from `less`.
less large_text.txt
- What if you want to look at just the first few lines or the last few lines? Two intuitively named tools called `head` and `tail` come to the rescue:
# First few lines
head large_text.txt
# Specify the number of lines you want to see with "-"
head -5 large_text.txt
## japan_1 iso-seq line 1
## japan_2 iso-seq line 2
## japan_3 iso-seq line 3
## japan_4 iso-seq line 4
## japan_5 iso-seq line 5
# Last few lines
tail large_text.txt
# Similarly, specify the number of lines with "-"
tail -5 large_text.txt
## japan_4996 iso-seq line 4996
## japan_4997 iso-seq line 4997
## japan_4998 iso-seq line 4998
## japan_4999 iso-seq line 4999
## japan_5000 iso-seq line 5000
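`head -5` is a shorthand. The longer spelling `-n 5` does the same thing and is the form described in `man head` and `man tail`. The sketch below uses a hypothetical file `numbers.txt` that simply contains the numbers 1 to 20, one per line, created with `seq`.

```shell
# Work in a scratch area and create a 20-line file: 1, 2, ..., 20
scratch=$(mktemp -d)
cd "$scratch"
seq 1 20 > numbers.txt

# First 3 lines (same as "head -3")
head -n 3 numbers.txt

# Last 3 lines (same as "tail -3")
tail -n 3 numbers.txt

# With a "+", tail starts printing FROM that line number onwards,
# so this prints lines 19 and 20
tail -n +19 numbers.txt
```

`tail -n +2` is a handy way to skip a one-line header in a table.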
- You can use the up and down arrow keys to go back and forth through the commands that you’ve previously executed.
- Alternatively, in most shells/terminals you can press `Ctrl + R` on your keyboard and search for a command using specific keywords (e.g. try typing `ls` after pressing `Ctrl + R`).
- Most shells come with a command called `history` that shows you a history of the commands that you’ve typed.
- Try: Combine `history` with `tail` to look at the last few commands you’ve executed.
history | tail -5
- `grep` is a very powerful command that uses “regular expressions” to search for text in a file.
- In its most basic use, `grep` searches for the regular expression pattern provided and outputs the lines that match it.
- Let’s say we want to find “line 1254” in the `large_text.txt` file:
# Search for line 1254
grep 'line 1254' large_text.txt
## japan_1254 iso-seq line 1254
- Regular expressions are a very powerful tool. Search for “regular expression tutorial” on the internet if you want to learn more. For example, if I want to find any line that ends with “5”:
# The $ "anchor" matches the end of a line, so '5$' matches lines that end with "5"
# Note that the "-m" parameter here tells grep to stop searching after finding
# 10 matches
grep -m 10 '5$' large_text.txt
## japan_5 iso-seq line 5
## japan_15 iso-seq line 15
## japan_25 iso-seq line 25
## japan_35 iso-seq line 35
## japan_45 iso-seq line 45
## japan_55 iso-seq line 55
## japan_65 iso-seq line 65
## japan_75 iso-seq line 75
## japan_85 iso-seq line 85
## japan_95 iso-seq line 95
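A few other `grep` parameters are worth knowing early on. The sketch below uses a small made-up file, `qc.txt`, with hypothetical sample names and pass/fail labels.

```shell
# Work in a scratch area and create a tiny hypothetical file
scratch=$(mktemp -d)
cd "$scratch"
printf 'sample_1 pass\nsample_2 FAIL\nsample_3 pass\n' > qc.txt

# -c counts the matching lines instead of printing them (prints 2)
grep -c 'pass' qc.txt

# -v inverts the search: print the lines that do NOT match
grep -v 'pass' qc.txt

# -i ignores upper/lower case, so 'fail' also matches "FAIL"
grep -i 'fail' qc.txt

# -n shows the line number in front of each match (prints 2:sample_2 FAIL)
grep -n 'FAIL' qc.txt
```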
- In bioinformatics, we often work with `csv` and `tsv` files, which store a table as text with the columns separated by commas (`csv`) or tabs (`tsv`). Sometimes, we only need a specific column:
# What's inside small_text.tsv?
cat small_text.tsv
## japan_1 iso-seq line 1 1
## japan_2 iso-seq line 2 2
## japan_3 iso-seq line 3 3
## japan_4 iso-seq line 4 4
## japan_5 iso-seq line 5 5
## japan_11 iso-seq line 11 11
# Only need the first column of the file
cut -f1 small_text.tsv
## japan_1
## japan_2
## japan_3
## japan_4
## japan_5
## japan_11
# What if I want the first and the third column?
cut -f1,3 small_text.tsv
## japan_1 line 1
## japan_2 line 2
## japan_3 line 3
## japan_4 line 4
## japan_5 line 5
## japan_11 line 11
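`cut` splits on tabs by default, which is why no extra parameter was needed for the `tsv` file. For a `csv` file you can tell `cut` which delimiter to use with `-d`. The file `small_table.csv` and its content below are made up for illustration.

```shell
# Work in a scratch area and create a tiny hypothetical csv file
scratch=$(mktemp -d)
cd "$scratch"
printf 'sample,gene,count\ns1,ACTB,120\ns2,GAPDH,87\n' > small_table.csv

# Second column only
cut -d ',' -f2 small_table.csv

# First and third columns; the output keeps the comma as separator
cut -d ',' -f1,3 small_table.csv
```

Note that `cut` splits on every comma, so it is not suitable for `csv` files in which a quoted value itself contains a comma.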
- In the example above, we extracted the first column of a `tsv` file. What if we want to sort that `tsv` file by the 4th column in reverse order?
# "-k4" tells "sort" we want to sort the tsv file by the 4th column
# "-r" tells "sort" we want the sorting to be in the reverse order
sort -k4 -r small_text.tsv
## japan_5 iso-seq line 5 5
## japan_4 iso-seq line 4 4
## japan_3 iso-seq line 3 3
## japan_2 iso-seq line 2 2
## japan_11 iso-seq line 11 11
## japan_1 iso-seq line 1 1
- Note that `sort` by default compares the text character by character rather than as numbers. You would have noticed that in the example above, “11” ends up between “1” and “2” instead of at the top. You can tell `sort` to sort numerically with `-n`:
sort -k4 -r -n small_text.tsv
## japan_11 iso-seq line 11 11
## japan_5 iso-seq line 5 5
## japan_4 iso-seq line 4 4
## japan_3 iso-seq line 3 3
## japan_2 iso-seq line 2 2
## japan_1 iso-seq line 1 1
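One caveat, shown as a sketch: by default `sort` treats any blank (space or tab) as a column separator, so a column that itself contains a space (like “line 1” in `small_text.tsv`) shifts the column numbering. Setting the separator to a tab with `-t` and limiting the key to a single column with `-k3,3` makes the intent explicit. The file `table.tsv` and its content are hypothetical.

```shell
# Work in a scratch area and create a tiny tab-separated file.
# Column 2 ("x y") contains a space on purpose.
scratch=$(mktemp -d)
cd "$scratch"
printf 'a\tx y\t2\nb\tx y\t11\nc\tx y\t1\n' > table.tsv

# Store a literal tab character so it can be passed to -t
tab=$(printf '\t')

# Sort by column 3 only (-k3,3), numerically (-n), largest first (-r)
sort -t "$tab" -k3,3 -n -r table.tsv
```

The rows come out in the order `b` (11), `a` (2), `c` (1).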
- In bash, the character “`*`” is called a wildcard and is often used to represent “anything” before or after a string. For example, if I want to find any folder that starts with “folder”:
# The '-d' parameter lists the matching folders themselves instead of their contents
ls -ld folder*
## drwxr-xr-x 4 khipinchua staff 128 Jan 10 18:33 folder
## drwxr-xr-x 2 khipinchua staff 64 Jan 10 21:07 folder_x
## drwxr-xr-x 2 khipinchua staff 64 Jan 10 21:07 folder_y
# If you want to remove all folders starting with "folder"
rm -r folder*
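Because `rm -r` with a wildcard can delete a lot in one go, it is a good habit to preview what the pattern matches first. The sketch below uses made-up names in a temporary scratch location. Note that the pattern matches files as well as folders.

```shell
# Work in a scratch area with a few hypothetical folders and one file
scratch=$(mktemp -d)
cd "$scratch"
mkdir folder_x folder_y other_dir
touch folder_notes.txt

# Preview: the shell replaces the pattern with every matching name,
# including the FILE folder_notes.txt
echo folder*

# Once the preview looks right, reuse exactly the same pattern
rm -r folder*

# Only other_dir is left
ls
```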
- Try typing `mkd` and press `Tab` on your keyboard.
- Now try typing `mk` and press `Tab` on your keyboard.
- In general, `Tab` will complete what you are about to type if there’s a unique command that starts with what you’ve typed (e.g. only `mkdir` starts with `mkd`).
- When there are multiple commands that start with what you’ve typed, the `bash` shell will suggest all the possibilities (you may need to press `Tab` twice).
- We often want to use the output of a command as the input of our next command.
- The `pipe` operator “`|`” is designed for such a scenario. E.g. after searching for all the lines that end with 5 using `grep`, we want to print the final 5 matches with `tail`:
# Print the last 5 lines that end with "5" in large_text.txt
grep '5$' large_text.txt | tail -5
## japan_4955 iso-seq line 4955
## japan_4965 iso-seq line 4965
## japan_4975 iso-seq line 4975
## japan_4985 iso-seq line 4985
## japan_4995 iso-seq line 4995
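Pipes can be chained as many times as you like. As a sketch, the commands below combine `cut`, `sort` and `head` to find the two largest values in a column. The file `counts.tsv` and the gene names in it are made up for illustration.

```shell
# Work in a scratch area and create a tiny tab-separated table
scratch=$(mktemp -d)
cd "$scratch"
printf 'geneA\t30\ngeneB\t7\ngeneC\t112\ngeneD\t45\n' > counts.tsv

# Keep column 2, sort it numerically from largest to smallest,
# then show only the top 2 values (112 and 45)
cut -f2 counts.tsv | sort -n -r | head -n 2
```

Each command only sees the output of the command before it, so you can build the chain up one step at a time and check the result after every `|`.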
- Let’s say you have searched for the lines that end with “5”. How do you “save” the result?
- In bash, the `redirect` operator is “`>`”:
  - Note that `>` will create a new file. If there’s already a file with the name you redirect to, it’ll be overwritten!
  - Be very careful not to overwrite any file that you don’t want to.
# Look for all the lines that end with 5 and redirect ("save") it in a file
# called "end_with_5.txt"
grep '5$' large_text.txt > end_with_5.txt
ls -lh end_with_5.txt
## -rw-r--r-- 1 khipinchua staff 14K Jan 18 19:20 end_with_5.txt
# Check the first and last 5 lines of the file
head -5 end_with_5.txt
tail -5 end_with_5.txt
## japan_5 iso-seq line 5
## japan_15 iso-seq line 15
## japan_25 iso-seq line 25
## japan_35 iso-seq line 35
## japan_45 iso-seq line 45
## japan_4955 iso-seq line 4955
## japan_4965 iso-seq line 4965
## japan_4975 iso-seq line 4975
## japan_4985 iso-seq line 4985
## japan_4995 iso-seq line 4995
- The “`>>`” operator (two “`>`” joined together) will append (add to the end of the file) instead of writing a new file:
# Look for first 5 lines that end with 6 and append them to end_with_5.txt
grep -m 5 '6$' large_text.txt >> end_with_5.txt
# Check the last 10 lines of the file
tail -10 end_with_5.txt
## japan_4955 iso-seq line 4955
## japan_4965 iso-seq line 4965
## japan_4975 iso-seq line 4975
## japan_4985 iso-seq line 4985
## japan_4995 iso-seq line 4995
## japan_6 iso-seq line 6
## japan_16 iso-seq line 16
## japan_26 iso-seq line 26
## japan_36 iso-seq line 36
## japan_46 iso-seq line 46
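The difference between `>` and `>>` is easiest to see side by side. This sketch writes to a hypothetical file called `log.txt` in a temporary scratch location.

```shell
# Work in a scratch area
scratch=$(mktemp -d)
cd "$scratch"

echo "first run"  > log.txt    # creates log.txt
echo "second run" > log.txt    # overwrites it: "first run" is lost
cat log.txt

echo "third run" >> log.txt    # appends: "second run" is kept
cat log.txt
```

After the last command, `log.txt` contains two lines: “second run” and “third run”.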
- `sed` is a tool that can be used to edit text using `regex`.
- `awk` is a fantastic tool that can be used to carry out text processing.
- `for` loops and conditional (e.g. `if`) statements.
- Editing files on the command line using text editors such as `vi`, `nano` and `emacs`.
  - Check out `vimtutor` for a quick 30-minute crash course on `vim` (an improved version of `vi`)
- Writing bash scripts (multiple bash commands in a file).
- Using startup scripts such as `.bashrc` or `.bash_profile` to set up the environment the way you like it.
- Concept of stdout and stderr.
- Concept of Unix permissions.
- `scp` and `rsync` to transfer/sync files between the remote server (your SSH session) and your local computer.