Intermediate Linux command line tutorial
My wife recently made the jump from experimental physics to software engineering. She mentioned she had a hard time finding intermediate-level resources for using the command line in Linux. There are lots of articles that just teach you how to use cd, ls, mv, and rm, and then there are man pages that tell you the options for something you already know how to use, but there's not as much in between.
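This article tries to bridge that gap. This is a concise collection of tips that will help you be more productive on the command line without getting into Linux internals or non-standard tools. I won't go into too much depth on any one topic, so I encourage you to experiment and research. Advice here generally applies not only to Linux but also to macOS. Older versions of macOS use bash as the default shell. Since macOS Catalina the default is zsh, which differs in a few details covered here (such as how it treats globs that match nothing), so you may want to start bash by typing bash while following along.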
Bash
Bash is a shell, a program that accepts commands and lets you run other programs from the terminal. Bash stands for Bourne Again SHell; it's the successor to the Bourne shell. Bash isn't the only shell program around: there are csh, ksh, fish, dash, and many others. However, bash is the default on many systems, and I prefer it for simplicity.
Bash is also a programming language. You'll spend most of your time writing simple commands in the terminal interactively, but you can also chain commands together and write loops and functions. You can write many commands in a script file and execute the script as a program. I won't focus on scripting in this article, but it's important to know that it's an option.
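As a minimal sketch of the script-file idea, the commands below write a tiny script into a scratch directory, mark it executable with chmod +x, and run it like any other program. The name hello.sh is hypothetical, and the here document (<<'EOF') and $(...) syntax used here are explained later in this article.

```bash
# Create a scratch directory and write a small script into it.
# hello.sh is a hypothetical name; any file name works.
dir=$(mktemp -d)
cat >"$dir/hello.sh" <<'EOF'
#!/usr/bin/env bash
# The first line (the "shebang") says which program should run this file.
echo "Hello from a script"
EOF

# Mark the file as executable, then run it by its path.
chmod +x "$dir/hello.sh"
output=$("$dir/hello.sh")
echo "$output"    # Hello from a script
```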
Built-in commands and programs
Most of the time, when you run a command in bash, you're running a program installed somewhere on your system. Bash has a few commands built in, though. You can tell whether a command is built in using the type command. type is also useful for finding where a program is installed.
```
$ type ls
ls is /bin/ls
$ type echo
echo is a shell builtin
$ type cd
cd is a shell builtin
$ type type
type is a shell builtin
```
Style note: in this article's examples, lines that start with $ are commands typed into the terminal, lines that start with # are comments, and other lines are output from the commands.
Globs and string expansion
A glob is a string of characters that expands to match files in the current directory. Globs contain wildcard characters: ? matches any single character, and * matches any sequence of characters.
```
# List files ending with .txt
$ ls *.txt
bar.txt  foo.txt
# This is equivalent (after expansion)
$ ls bar.txt foo.txt
bar.txt  foo.txt
# Delete files starting with a
$ rm a*
```
Bash replaces globs on the command line with the list of matching file names in the current directory. If no files match a glob, the shell leaves the glob in place rather than replacing it with nothing. This is why, when there are no .xyz files, ls *.xyz reports an error about a missing file literally named *.xyz instead of listing the directory the way plain ls does.
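The sketch below exercises both wildcards and the no-match rule in a scratch directory, using echo to show what each glob expands to. The file names are made up for the demonstration.

```bash
# Work in a scratch directory so the example doesn't touch your own files.
cd "$(mktemp -d)" || exit 1
touch a1.txt a2.txt a10.txt

# ? matches exactly one character, so a10.txt is left out.
one_char=$(echo a?.txt)
echo "$one_char"    # a1.txt a2.txt

# * matches any sequence of characters (including none), so all three match.
# The order of the results depends on your locale's sorting rules.
any_chars=$(echo a*.txt)
echo "$any_chars"

# A glob that matches nothing is passed to the command unchanged.
no_match=$(echo *.xyz)
echo "$no_match"    # *.xyz
```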
Some commands, like find, need to match files themselves without Bash's help. Others, like grep, use special characters that look like globs. To prevent the shell from expanding globs in an argument, wrap the argument in quotes. There's more on quoting below.
```
# Find files in subdirectories ending with .txt
$ find . -name '*.txt'
# Print lines in a file matching a regular expression
$ grep 'abc.*xyz' foo.txt
```
Braces are another useful way to expand strings. You can write strings separated by commas in braces as part of an argument. The argument will be repeated with each string substituted for the braces. This is hard to describe, so here are a few examples. Play around with it.
```
# Simple expansion
$ echo foo{a,b,c}
fooa foob fooc
# List files starting with IMG_, ending with jpg or jpeg
$ ls IMG_*.{jpg,jpeg}
IMG_001.jpg  IMG_002.jpeg
# Make a backup copy of a file (this expands to: cp foo.go foo.go~)
$ cp foo.go{,~}
```
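One practical difference from globs: brace expansion never consults the file system, so it can name things that don't exist yet. This sketch uses that to create several directories at once (the name project is just an example) and shows that multiple braces in one word expand to every combination.

```bash
cd "$(mktemp -d)" || exit 1

# Brace expansion doesn't look at the file system, so (unlike a glob) it can
# name directories that don't exist yet. "project" is just an example name.
mkdir -p project/{src,docs,tests}
ls project

# Several braces in one word expand to every combination.
combos=$(echo {a,b}{1,2})
echo "$combos"    # a1 a2 b1 b2
```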
Quoting
Quotes prevent Bash from interpreting special characters in arguments. Quoting is especially important for arguments that contain spaces, since Bash normally uses spaces to split arguments. Quotes are also useful to prevent characters like * from being expanded.
```
# Use quotes to write an argument with spaces
$ git commit -m 'Fixed a bug with A* pathfinding'
# Use quotes to prevent globbing
$ find . -name '*.txt'
```
Bash treats single and double quotes differently. Single quotes are literal: what appears in a string is what gets passed to a program. Double quotes are similar to single quotes, but they also allow variable expansion.
```
$ echo 'no place like $HOME'
no place like $HOME
$ echo "no place like $HOME"
no place like /home/jay
```
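Quoting matters most when a variable holds a value with spaces. In this sketch (the file name is made up), the quoted "$name" reaches touch as one argument, while the unquoted $name is split at the space into two.

```bash
cd "$(mktemp -d)" || exit 1
name="my notes.txt"

# Quoted: the value stays one argument, so one file named "my notes.txt" is created.
touch "$name"

# Unquoted: bash splits the value at the space, so touch receives two
# arguments and creates two more files, "my" and "notes.txt".
touch $name

ls    # three files in total
```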
Variables
You may see words prefixed with dollar signs, like $HOME above. These are variables. A variable is an association between a name like HOME and a string of characters like /home/jay.
```
# Set a variable
$ FOO=abc
# Print a variable
$ echo $FOO
abc
# Remove a variable
$ unset FOO
# Print all variables
$ set
BASH=/bin/bash
BASH_ALIASES=()
BASH_ARGC=()
BASH_ARGV=()
...
```
Note that the dollar sign is used only when you want a variable's value, as in echo $FOO. It's not needed when you're setting or unsetting the variable. Also, when setting a variable there must be no spaces around the =: FOO = abc would try to run a command named FOO.
Environment variables are special variables that are visible to programs you run. Most of these have well-known names like HOME, DISPLAY, or USER. You can turn a regular variable into an environment variable with the export command.
```
# Export an environment variable
$ export FOO
# Export and assign at the same time
$ export FOO=abc
# Run a program that reads the variable
$ python3 -c 'import os; print(os.getenv("FOO"))'
abc
# Print all environment variables
$ env
```
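To see the difference export makes, this sketch starts a child bash process with bash -c before and after exporting a variable. MY_GREETING is a hypothetical name chosen so it won't clash with a real setting.

```bash
# MY_GREETING is a hypothetical name chosen so it won't clash with real settings.
MY_GREETING=hello

# bash -c starts a new bash process, i.e. a child program of this shell.
# A plain shell variable isn't passed to it, so it sees an empty value.
before=$(bash -c 'echo "[$MY_GREETING]"')
echo "$before"    # []

# After export, the variable is part of the environment that children inherit.
export MY_GREETING
after=$(bash -c 'echo "[$MY_GREETING]"')
echo "$after"     # [hello]
```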
PATH is a particularly important environment variable. It is a colon-separated list of directories containing programs that may be run from the command line. When you run a command like ls, the shell searches each of the directories in PATH, in order, for an executable file named ls.
```
# Print PATH
$ echo $PATH
/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
# Add a new directory to PATH
$ PATH=$PATH:$HOME/bin
# Find out where a program is installed
$ type ls
ls is /bin/ls
```
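The sketch below walks through the lookup end to end: it puts an executable in a scratch directory (standing in for $HOME/bin), appends that directory to PATH, and then runs the program by name alone. The program name hello-from-bin is made up for this example.

```bash
# A scratch directory stands in for $HOME/bin here, and hello-from-bin is a
# made-up program name.
bindir=$(mktemp -d)
cat >"$bindir/hello-from-bin" <<'EOF'
#!/bin/sh
echo "found me"
EOF
chmod +x "$bindir/hello-from-bin"

# Append the directory to PATH; the shell now searches it too.
PATH=$PATH:$bindir

type hello-from-bin    # hello-from-bin is <scratch dir>/hello-from-bin
result=$(hello-from-bin)
echo "$result"         # found me
```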
Variables keep their values only while the shell is running. If you set PATH to something else, then close the terminal and reopen it, it will go back to the original value. To change a variable permanently, set it in one of the startup files in your home directory, typically .bashrc. Open .bashrc in your favorite text editor. If it already sets PATH, add your directory to the end of that line; otherwise, add a new line that sets it. Then save the file and restart your terminal (or run source ~/.bashrc).
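If your .bashrc doesn't set PATH yet, a line like the one below is a reasonable sketch, assuming your personal programs live in $HOME/bin. On macOS, Terminal starts login shells, which read ~/.bash_profile rather than ~/.bashrc, so the line may need to go there instead.

```bash
# Add near the end of ~/.bashrc. Appending keeps the system directories
# first, so $HOME/bin is searched last.
export PATH="$PATH:$HOME/bin"
```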
Input and output
Programs produce output on a channel called stdout (standard output) and receive input on a channel called stdin (standard input). Normally, these channels are connected to the terminal: the program can print text on the screen through stdout and can read what you type through stdin.
It's possible to connect stdin and stdout to other things using pipes and redirects. The < and > operators let you redirect stdin and stdout to files, respectively. This means that when a program reads from stdin, it will be reading from a file instead of the keyboard. When it writes to stdout, it will be writing to a file.
```
# Redirect the output of ls to a file
$ ls *.txt >list-of-text-files
# Count the words in that file
$ wc -w <list-of-text-files
42
```
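One detail worth knowing before redirecting into a file you care about: > replaces the file's contents, while >> appends to them. This sketch shows the difference on a throwaway log.txt.

```bash
cd "$(mktemp -d)" || exit 1

echo first >log.txt     # > creates log.txt, or empties it if it already exists
echo second >log.txt    # ...so "first" is gone now
echo third >>log.txt    # >> appends to the end instead of overwriting

cat log.txt             # second, then third
```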
You can provide a block of text on stdin using a here document. This is useful for creating short files, possibly with expanded variables.
A here document starts with <<, followed by a word that marks the end of the document, usually EOF.
```
$ cat >output <<EOF
lorem ipsum
dolor sit amet
my path is $PATH
EOF
$ cat <output
lorem ipsum
dolor sit amet
my path is /bin:/sbin:/usr/bin:/usr/sbin
```
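If you want the text copied exactly, without variable expansion, quote the end marker. The sketch below writes the same line twice, once with <<EOF and once with <<'EOF', to show the difference.

```bash
cd "$(mktemp -d)" || exit 1

# Unquoted delimiter: variables like $HOME are expanded.
cat >expanded.txt <<EOF
home is $HOME
EOF

# Quoted delimiter ('EOF'): the text is copied exactly as written.
cat >literal.txt <<'EOF'
home is $HOME
EOF

cat expanded.txt literal.txt
```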
Pipes allow you to connect the stdout of one program to the stdin of another program in the same command. They are one of the most useful features of Bash. You can chain any number of programs together using pipes.
```
# Count occurrences of the word "wherefore" in two files
$ cat romeo.txt hamlet.txt | grep -o wherefore | wc -l
```
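Here is a self-contained sketch of the same pipeline that you can run without the two text files. A short sample string stands in for romeo.txt and hamlet.txt, and -i (ignore case) is an addition so the capitalized match counts too. Comparing the -o and non -o versions shows why -o counts matches rather than lines.

```bash
# Sample text stands in for romeo.txt and hamlet.txt.
text='Wherefore art thou Romeo? wherefore, wherefore
Deny thy father and refuse thy name'

# grep -o prints every match on its own line (-i ignores case),
# so wc -l counts matches: 3 here.
matches=$(printf '%s\n' "$text" | grep -o -i wherefore | wc -l)

# Without -o, grep prints each matching line once, so this counts lines: 1.
lines=$(printf '%s\n' "$text" | grep -i wherefore | wc -l)

echo "$matches matches on $lines line(s)"
```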
There are several programs that help you process and redirect inputs and output. You've seen a few of them already.
echo prints its command line arguments on stdout.
cat treats each of its command line arguments as a file name. It reads each file and prints the contents on stdout (concatenating them). This is a handy way to read the contents of a short file in the terminal without opening it in an editor. When cat is started without arguments, it reads from stdin.
tee copies data from stdin to stdout, and it also copies data to a file. This is useful for saving something in the middle of a pipeline. tee is named tee because it creates a T-junction in a pipeline.
```
$ ls *.txt | tee ls-output
bar.txt
baz.txt
foo.txt
$ cat ls-output
bar.txt
baz.txt
foo.txt
```
head and tail print the first and last 10 lines of a file. This is useful if you're looking for an error near the beginning or end of a large log file. When run without a file name, they read stdin, which makes them useful in pipelines. The -n option controls the number of lines they print.
```
$ ls *.txt | tee ls-output | head -n 1
bar.txt
$ wc -l ls-output
3 ls-output
```
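For a quick experiment that doesn't depend on which files you have, seq makes a handy source of numbered lines. This sketch checks head and tail with and without -n, and uses tee to save every line while only the last one reaches the screen.

```bash
cd "$(mktemp -d)" || exit 1

# seq 1 100 prints the numbers 1 to 100, one per line.
first=$(seq 1 100 | head -n 3)   # 1, 2, 3
last=$(seq 1 100 | tail -n 2)    # 99, 100

# Without -n, head prints 10 lines.
default_count=$(seq 1 100 | head | wc -l)

# tee saves all 100 lines to numbers.txt while only the last one is printed.
seq 1 100 | tee numbers.txt | tail -n 1
```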
Text filtering
Linux has lots of commands for searching and transforming text. Many of them use regular expressions, which are patterns used to match strings. (As a side note, regular expressions are incredibly useful and are available in almost every programming language. I won't go into detail about them here, but if you write programs that deal with text in any capacity, you should learn about them.) When you use regular expressions on the command line, make sure to quote them, since they often contain characters like * that would otherwise be interpreted by the shell.
grep is probably the best-known text filtering tool. It prints lines of text that match a regular expression. It normally reads text from the files given to it on the command line and prints matching lines on stdout. If you give it the -R flag, it will also recurse through directories. If you don't give it any files or directories, it will read stdin, which makes it useful in pipelines.
```
# Print lines that end with a colon
$ grep ':$' data.txt
# Print 5-digit numbers that look like a US zip code
# (-E enables extended regular expressions such as {5}; -w matches whole words only)
$ grep -owE '[0-9]{5}' addresses.txt
# Print lines that do NOT contain C++-style comments
$ grep -v '//' foo.cpp
```
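The sketch below runs those three patterns against a small made-up data.txt, so you can see what each one matches. It also uses -c, which counts matching lines instead of printing them.

```bash
cd "$(mktemp -d)" || exit 1
cat >data.txt <<'EOF'
Shopping list:
eggs
Addresses:
Cambridge MA 02139
Phone 6175551234
// TODO add more
EOF

# -c counts matching lines instead of printing them: 2 lines end with a colon.
colon_lines=$(grep -c ':$' data.txt)

# -E enables {5}; -w requires the match to be a whole word, so the
# 10-digit phone number doesn't produce a false "zip code".
zips=$(grep -owE '[0-9]{5}' data.txt)    # 02139

# -v inverts the match: every line except the comment (5 lines).
non_comment=$(grep -v '//' data.txt | wc -l)
```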
sed is a tool for transforming text using regular expressions. Its name is an abbreviation for "stream editor". It's most commonly used to replace text matching a regular expression or to delete matching lines. Like grep, it reads files or stdin and writes to stdout, so it fits naturally in a pipeline; the -e flag introduces an editing expression. You can also edit a file in place with the -i flag.
```
# Read foo.txt, replace all occurrences of "foo" with "bar", and
# write the result to bar.txt.
$ sed -e 's/foo/bar/g' <foo.txt >bar.txt
# Do the same thing, but write the result back to foo.txt.
# (On macOS, BSD sed needs an empty suffix argument: sed -i '' 's/foo/bar/g' foo.txt)
$ sed -i 's/foo/bar/g' foo.txt
```
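This sketch shows both common uses on a small made-up config snippet: s/old/new/ with and without the g flag, and /regex/d for deleting lines.

```bash
text='# colors
fg=red bg=red
# sizes
size=large'

# s/old/new/g replaces every match on each line...
replaced=$(printf '%s\n' "$text" | sed 's/red/blue/g')
# ...while without g only the first match on each line is replaced.
first_only=$(printf '%s\n' "$text" | sed 's/red/blue/')

# /regex/d deletes matching lines; here, comment lines starting with #.
no_comments=$(printf '%s\n' "$text" | sed '/^#/d')

printf '%s\n' "$no_comments"
```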
awk is a general-purpose tool for manipulating text. It is particularly good at dealing with tables of text organized in columns.
```
# List Go files, printing only the file name and size
$ ls -l *.go | awk '{print $9, $5}'
bar.go 5550
baz.go 1734
foo.go 491
```
awk programs are written in a concise scripting language of its own. Look for a tutorial if you want to learn more about it.
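As a small taste of that language, this sketch sums a column and filters rows of made-up comma-separated data. It uses -F to change the field separator and an END block that runs after the last line.

```bash
csv='apples,3
pears,5
plums,2'

# -F, makes the comma the field separator, so $2 is the second column.
# The END block runs once, after the last line has been read.
total=$(printf '%s\n' "$csv" | awk -F, '{ sum += $2 } END { print sum }')
echo "$total"    # 10

# A condition before the braces selects rows: fruits with a count above 2.
big=$(printf '%s\n' "$csv" | awk -F, '$2 > 2 { print $1 }')
echo "$big"      # apples, then pears
```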
File names
It's frequently useful to be able to manipulate file names, especially when writing scripts. There are several useful tools for this.
dirname takes a path as an argument and prints everything except the last component of the path.
basename does the opposite: it prints the last component of a path. If you give basename a suffix, it will also remove the suffix from the file name.
```
$ dirname foo/bar/baz.txt
foo/bar
$ basename foo/bar/baz.txt
baz.txt
$ basename foo/bar/baz.txt .txt
baz
```
dirname and basename just manipulate the paths that are given to them. They don't actually look at the file system, so it's fine to give them paths to files that don't exist.
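A few edge cases are worth trying. In this sketch none of the paths exist. The $(...) syntax used to capture output is explained in the Loops section.

```bash
# None of these paths need to exist.
dirname /no/such/place/report.txt          # /no/such/place
basename /no/such/place/report.txt .txt    # report

# A bare file name has no directory part, so dirname prints "." (the current directory).
here=$(dirname report.txt)

# A trailing slash is ignored: the last component is still "place".
last=$(basename /no/such/place/)
```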
Loops
Like any programming language, Bash has loops and other control flow structures. I'm only going to cover for loops here, but Bash also has while loops, if statements, and case statements.
for loops let you execute a series of commands for each argument provided to the loop. This is most commonly used to process a series of files matching a glob.
```
# Rename .JPG files to .jpg
$ for i in *.JPG; do mv $i $(basename $i .JPG).jpg; done
```
There are several parts to this, so let's break it down. The first part of the loop (before the first semicolon) defines a variable i and globs a list of files to process. In each iteration of the loop, $i is the name of one file in this list. The body of the loop is the word do followed by a sequence of commands, separated by semicolons. We just have one mv command here, which renames the file. The loop ends with the word done.
You may not have seen the $(...) syntax before. This is called command substitution: Bash runs the command inside the parentheses, then replaces the whole $(...) expression with that command's output. In this case, suppose $i is set to IMG123.JPG. The command basename IMG123.JPG .JPG will be run, and Bash will replace the expression with IMG123, which is followed by the new file extension, .jpg. So after this expansion, the command inside the loop becomes mv IMG123.JPG IMG123.jpg.
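Command substitution is useful outside loops too. This sketch builds a new file name from a hypothetical path with two substitutions. Nothing is read or renamed, since dirname and basename only work on the text of the path.

```bash
# photos/IMG123.JPG is a hypothetical path; nothing is read or renamed.
file=photos/IMG123.JPG

# $(...) runs the command inside and substitutes its output.
stem=$(basename "$file" .JPG)              # IMG123
new_name="$(dirname "$file")/$stem.jpg"    # photos/IMG123.jpg

echo "would run: mv $file $new_name"
```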
I'm glossing over a lot of things to keep this example simple. In particular, this loop only works when there is at least one .JPG file and none of the .JPG files have spaces in their names. If there are no .JPG files, the glob won't be expanded, and the loop will iterate once over the literal string *.JPG. If a .JPG file has a space in its name, the unquoted $i is split at the space, so mv receives two separate arguments (probably neither of which names a real file). These problems are not specific to for loops. It will usually be obvious when something goes wrong on the command line, but if you're writing bash scripts, be careful.
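One common way to harden the loop is sketched below, assuming bash specifically. shopt -s nullglob (a bash option not covered above) makes an unmatched glob expand to nothing. Double quotes around "$f" keep names with spaces intact, and -- tells mv that no more options follow. The sample file names are made up.

```bash
cd "$(mktemp -d)" || exit 1
touch 'IMG 001.JPG' IMG002.JPG    # one name contains a space

# nullglob makes a glob with no matches expand to nothing,
# so the loop body simply doesn't run when there are no .JPG files.
shopt -s nullglob
for f in *.JPG; do
  # Quoting "$f" keeps a name with spaces together as one argument;
  # -- tells mv that no more options follow, in case a name starts with "-".
  mv -- "$f" "$(basename "$f" .JPG).jpg"
done
shopt -u nullglob    # restore the default behavior

ls
```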
Searching for files
The find command is incredibly useful for locating files. It can search a directory tree for files matching a set of criteria. By default, it prints matching files, but it can also delete files or run commands on them.
```
# Print all .cc files in the src directory
$ find src -name '*.cc'
src/foo/foo.cc
src/bar/bar.cc
# Print executable files over 1 MiB (-executable is specific to GNU find)
$ find . -size +1M -executable
./bin/cover
./bin/gocode
./bin/gorename
# Delete .o files
$ find . -name '*.o' -delete
# Search for the string 'foo' in .go files
$ find . -name '*.go' -exec grep -Hn foo {} \;
./compare_test.go:1391:	Devices: []string{"foo", "bar", "baz"},
```
When using find, make sure to quote patterns like '*.go' above. find needs to match these patterns itself; if they're unquoted, Bash may expand them first and hand find a list of file names instead.
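The sketch below builds a tiny made-up source tree and runs two searches over it. It adds -type f to restrict results to regular files, and uses the {} + form of -exec, which passes many file names to one command instead of starting a new command per file as {} \; does.

```bash
cd "$(mktemp -d)" || exit 1
mkdir -p src/a src/b
echo 'package foo' >src/a/one.go
echo 'package bar' >src/b/two.go
echo 'notes' >src/b/notes.txt

# -type f limits results to regular files; the quoted pattern is matched by find.
go_count=$(find src -type f -name '*.go' | wc -l)    # 2

# {} + passes many file names to one grep command, rather than starting
# a separate grep for each file as {} \; does. -l prints only file names.
has_foo=$(find src -name '*.go' -exec grep -l foo {} +)    # src/a/one.go
```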
Job control
Normally when you run a program, Bash waits for it to exit before returning you to the command line. However, Bash can also run multiple processes concurrently and switch between them.
There are some keyboard shortcuts for controlling processes. To interrupt a running process, type ^C (that's Control-C). Most programs will quit when you interrupt them. To suspend a running process and return to the command prompt, type ^Z. This "pauses" a process; it will still exist in memory, but it won't be scheduled on the CPU.
To see a list of jobs (both suspended and backgrounded processes), use the jobs command. Bash will print a job number for each process that can be used to refer to it.
To resume a suspended process in the background, use the bg command. If there is more than one suspended process, you can use the job number to select the process to background. While the process is in the background, you'll continue to have access to the command prompt, but you won't be able to interact with the process, since stdin won't be connected to the keyboard. You'll still see anything the program prints on stdout, though.
To start a process in the background, add a & at the end of the command line.
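The interactive shortcuts (^Z, fg, bg) only make sense in a terminal, but & also works in scripts. This sketch pairs it with the wait builtin, which isn't covered above, and uses ( ... ) to group two commands into one background job. The file names are made up.

```bash
cd "$(mktemp -d)" || exit 1

# Each ( ... ) & runs a small command group in the background, so the shell
# starts both immediately instead of waiting for the first to finish.
(sleep 1; echo first >a.txt) &
(sleep 1; echo second >b.txt) &

# wait (a bash builtin) blocks until all background jobs have exited.
# Because the two sleeps overlap, this takes about one second, not two.
wait
cat a.txt b.txt
```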
To resume a backgrounded or suspended process in the foreground, use the fg command. Again, you can specify a job number if there's more than one process. The foreground is where processes normally run when you start them.
```
# Edit a file and suspend the editor
$ emacs server.go
# edit...edit...edit...
^Z
# Try to compile a program
$ go build server.go
./server.go:7: syntax error: unexpected EOF, expecting }
# Resume the editor and fix the bug
$ fg
# edit...edit...edit...
^Z
# Build the program, then start it in the background
$ go build server.go
$ ./server &
# Stop the server
$ jobs
[1]+  Stopped                 emacs server.go
[2]-  Running                 ./server &
$ fg 2
^C
```
Learning more
This article was a whirlwind tour of a lot of shell features and small, useful commands. Hopefully it was helpful. I tried to cover a wide breadth of functionality instead of focusing on any command in depth, so you'll probably want to look at some reference material.
Most commands will print some brief usage instructions when you give them a -h or --help flag on the command line.
If you want to read the manual for a given command, use man, for example man grep. Man pages are also easy to find on the web. Personally, I prefer reading documentation in a web browser. Man pages tend to be very dry: they'll tell you what every option does, but they won't give many examples or tell you when to use them.
Note that shell builtin commands generally don't have man pages of their own; they're described inside the long bash man page. For a quicker summary of a builtin, use help, for example help echo.
If you'd rather ask a person, try the Unix & Linux or Ask Ubuntu Stack Exchange sites.