Intermediate Linux command line tutorial

Published on 2017-10-23
Tagged: bash linux

My wife recently made the jump from experimental physics to software engineering. She mentioned she had a hard time finding intermediate-level resources for using the command line in Linux. There are lots of articles that just teach you how to use cd, ls, mv, and rm, and then there are man pages that tell you the options for something you already know how to use, but there's not as much in between.

This article tries to bridge that gap. It is a concise collection of tips that will help you be more productive on the command line without getting into Linux internals or non-standard tools. I won't go into too much depth on any one topic, so I encourage you to experiment and research. Advice here generally applies not only to Linux but also to macOS (which also ships with Bash, although newer versions use zsh as the default shell).

Bash

Bash is a shell, a program that accepts commands and lets you run other programs from the terminal. Bash stands for Bourne Again SHell; it's the successor to the Bourne shell. Bash isn't the only shell program around: there are csh, ksh, fish, dash, and many others. However, bash is the default on many systems, and I prefer it for simplicity.

Bash is also a programming language. You'll spend most of your time writing simple commands in the terminal interactively, but you can also chain commands together and write loops and functions. You can write many commands in a script file and execute the script as a program. I won't focus on scripting in this article, but it's important to know that it's an option.

Built-in commands and programs

Most of the time, when you run a command in bash, you're running a program installed somewhere on your system. Bash has a few commands built-in though. You can tell whether a command is built-in using the type command. type is also useful for finding where a program is installed.

$ type ls
ls is /bin/ls

$ type echo
echo is a shell builtin

$ type cd
cd is a shell builtin

$ type type
type is a shell builtin

Style note: in this article's examples, lines that start with $ are commands typed into the terminal. Lines that start with # are comments.

Globs and string expansion

A glob is a string of characters that expands to match files in the current directory. A glob includes wildcard characters: ? matches any single character, * matches any sequence of characters.

# List files ending with .txt
$ ls *.txt
foo.txt
bar.txt

# This is equivalent (after expansion)
$ ls foo.txt bar.txt
foo.txt
bar.txt

# Delete files starting with a
$ rm a*

Bash will replace globs on the command line with the list of matching file names in the current directory. If no files match the glob, the shell will leave the glob in place rather than replacing it with nothing. This is why the command ls *.xyz does something different than ls when there are no .xyz files.
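Here is a small sketch of that no-match behavior, written as a script rather than an interactive session. The scratch directory and the variable names are my own assumptions for the demo. It also shows Bash's nullglob option, which makes an unmatched glob expand to nothing.

```shell
# Work in an empty scratch directory so no file can match *.xyz.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# With no match, the glob is passed through as the literal string "*.xyz".
set -- *.xyz
unmatched_count=$#
unmatched_first=$1
echo "$unmatched_count argument(s): $unmatched_first"

# Bash's nullglob option makes an unmatched glob expand to nothing instead.
nullglob_count=$(bash -c 'shopt -s nullglob; set -- *.xyz; echo $#')
echo "with nullglob: $nullglob_count argument(s)"
```

The first echo should report one argument, the literal pattern, and the second should report zero.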

Some commands, like find, need to match files themselves without Bash's help. Others, like grep, use special characters that look like globs. To prevent the shell from expanding globs in an argument, wrap the argument in quotes. More on quoting in the next section.

# Find files in subdirectories ending with .txt
$ find . -name '*.txt'

# Print lines in a file matching a regular expression
$ grep 'abc.*xyz' foo.txt

Braces are another useful way to expand strings. You can write strings separated by commas in braces as part of an argument. The argument will be repeated with each string substituted for the braces. This is hard to describe, so here are a few examples. Play around with it.

# Simple expansion
$ echo foo{a,b,c}
fooa foob fooc

# List files starting with IMG_, ending with jpg or jpeg
$ ls IMG_*.{jpg,jpeg}
IMG_001.jpg
IMG_002.jpeg

# Make a backup copy of a file
$ cp foo.go{,~}

Quoting

Quotes prevent Bash from interpreting special characters in arguments. It's especially important for arguments that contain spaces, since Bash normally uses spaces to split arguments. Quotes are also useful to prevent characters like * from being expanded.

# Use quotes to write an argument with spaces
$ git commit -m 'Fixed a bug with A* pathfinding'

# Use quotes to prevent globbing
$ find . -name '*.txt'

Bash treats single and double quotes differently. Single quotes are literal: what appears in a string is what gets passed to a program. Double quotes are similar to single quotes, but they also allow variable expansion.

$ echo 'no place like $HOME'
no place like $HOME

$ echo "no place like $HOME"
no place like /home/jay
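To see how quotes affect argument splitting, here is a minimal sketch. count_args is a hypothetical helper I made up for this demo; it only reports how many arguments it received.

```shell
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

title='my vacation photo.jpg'

unquoted=$(count_args $title)      # split on spaces into three arguments
quoted=$(count_args "$title")      # passed as one argument, spaces intact

echo "unquoted: $unquoted, quoted: $quoted"
```

This should print "unquoted: 3, quoted: 1". When in doubt, putting double quotes around a variable is usually the safer choice.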

Variables

You may see words prefixed with dollar signs, like $HOME above. These are variables. A variable is an association between a name like HOME and a string of characters like /home/jay.

# Set a variable
$ FOO=abc

# Print a variable
$ echo $FOO
abc

# Remove a variable
$ unset FOO

# Print all variables
$ set
BASH=/bin/bash
BASH_ALIASES=()
BASH_ARGC=()
BASH_ARGV=()
...

Note that the dollar sign is only used when the value of a variable is being used as part of a larger expression. The dollar sign is not needed when you're setting or unsetting the variable.

Environment variables are special variables that are visible to programs you run. Most of these have well-known names like HOME or DISPLAY or USER. You can turn a regular variable into an environment variable with the export command.

# Export an environment variable
$ export FOO

# Export and assign at the same time
$ export FOO=abc

# Run a program that reads the variable
$ python3 -c 'import os; print(os.getenv("FOO"))'
abc

# Print all environment variables
$ env
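The difference between a regular variable and an environment variable is easiest to see from a child process. This is a rough sketch; LOCAL_ONLY and SHARED are made-up names, and the child is just another shell started with sh -c.

```shell
# LOCAL_ONLY and SHARED are hypothetical variable names for this demo.
unset LOCAL_ONLY SHARED
LOCAL_ONLY=abc          # regular shell variable
export SHARED=xyz       # environment variable

# A child process sees only exported variables.
# ${VAR:-<unset>} prints a placeholder when the variable is not set.
child_view=$(sh -c 'echo "${LOCAL_ONLY:-<unset>} ${SHARED:-<unset>}"')
echo "$child_view"
```

The child should print "<unset> xyz": it never sees LOCAL_ONLY because that variable was not exported.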

PATH is a particularly important environment variable. It is a list of directories that contain programs that may be run from the command line, separated by colons. When you run a command like ls, the shell searches each of the directories in PATH for an executable file named ls.

# Print PATH
$ echo $PATH
/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin

# Add a new directory to PATH
$ PATH=$PATH:$HOME/bin

# Find out where a program is installed
$ type ls
ls is /bin/ls

Variables keep their value only while the shell is running. If you set PATH to something else, then close the terminal and reopen it, PATH will go back to its original value. To change a variable permanently, set it in one of the startup files in your home directory, typically .bashrc. Open .bashrc in your favorite text editor, look for the line where PATH is set (or add a new line such as PATH=$PATH:$HOME/bin at the end of the file), save the file, and restart your terminal.
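As one possible sketch of a startup-file entry, the lines below append a directory to PATH only if it isn't already there, so sourcing the file repeatedly doesn't keep growing PATH. The directory $HOME/bin is just an assumption; substitute whatever directory you need.

```shell
# Append $HOME/bin to PATH only if it is not already listed.
# Wrapping PATH in colons lets the pattern match one whole entry.
case ":$PATH:" in
  *":$HOME/bin:"*) ;;                  # already present, do nothing
  *) PATH="$PATH:$HOME/bin" ;;
esac
export PATH
```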

Input and output

Programs produce output on a channel called stdout (standard output) and receive input on a channel called stdin (standard input). Normally, these channels are connected to the terminal: the program can print text on the screen through stdout and can read what you type through stdin.

It's possible to connect stdin and stdout to other things using pipes and redirects. The < and > operators let you redirect stdin and stdout to files, respectively. This means that when a program reads from stdin, it will be reading from a file instead of the keyboard. When it writes to stdout, it will be writing to a file.

# Redirect the output of ls to a file
$ ls *.txt >list-of-text-files

# Count the words in that file
$ wc -w <list-of-text-files
42

You can provide a block of text on stdin using a here document. This is useful for creating short files, possibly with expanded variables. A here document starts with <<, followed by a word that marks the end of the document, usually EOF.

$ cat >output <<EOF
lorem ipsum
dolor sit amet
my path is $PATH
EOF

$ cat <output
lorem ipsum
dolor sit amet
my path is /bin:/sbin:/usr/bin:/usr/sbin
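If you don't want variables expanded inside a here document, quote the end marker. Here is a short sketch comparing the two forms; the scratch directory and file names are assumptions for the demo.

```shell
scratch=$(mktemp -d)

# Unquoted marker: $HOME is expanded before cat sees the text.
cat >"$scratch/expanded" <<EOF
home is $HOME
EOF

# Quoted marker: the text is passed through literally.
cat >"$scratch/literal" <<'EOF'
home is $HOME
EOF

cat "$scratch/expanded" "$scratch/literal"
```

The first line of output should contain your actual home directory, and the second should contain the literal text $HOME.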

Pipes allow you to connect the stdout of one program to the stdin of another program in the same command. They are one of the most useful features of Bash. You can chain any number of programs together using pipes.

# Count occurrences of the word "wherefore" in two files
$ cat romeo.txt hamlet.txt | grep -o wherefore | wc -l
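Longer pipelines work the same way. As a rough illustration, this sketch counts how often each word appears in a line of sample text (made up for the demo) and keeps the two most frequent words.

```shell
# Print the two most frequent words in some sample text.
top_words=$(
  printf 'the cat and the dog and the bird\n' |
    tr -s ' ' '\n' |   # put one word on each line
    sort |             # group identical words together
    uniq -c |          # count each group
    sort -rn |         # largest counts first
    head -n 2          # keep the top two
)
echo "$top_words"
```

Each stage does one small job and hands its output to the next. Here "the" (3 times) should come first, followed by "and" (2 times).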

There are several programs that help you process and redirect inputs and output. You've seen a few of them already.

echo prints its command line arguments on stdout.

cat treats each of its command line arguments as file names. It reads each file and prints the contents on stdout (concatenating them). This is a handy way to read the contents of a short file in the terminal without opening it in an editor. When cat is started without arguments, it reads from stdin.

tee copies data from stdin to stdout, and it also copies data to a file. This is useful for saving something in the middle of a pipeline. tee is named tee because it creates a T-junction in a pipeline.

$ ls *.txt | tee ls-output
foo.txt
bar.txt
baz.txt

$ cat ls-output
foo.txt
bar.txt
baz.txt

head and tail print the first and last 10 lines of a file, respectively. This is useful if you're looking for an error near the beginning or end of a large log file. When run without a file name, they read stdin, which makes them useful in pipelines. The -n option controls the number of lines they print.

$ ls *.txt | tee ls-output | head -n 1
foo.txt

$ wc -l <ls-output
3
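Here is one more sketch that combines head, tail, and tee on a small generated file. The file names and the scratch directory are assumptions for the demo.

```shell
scratch=$(mktemp -d)

# printf repeats its format for each argument, giving "line 1" to "line 5".
printf 'line %s\n' 1 2 3 4 5 >"$scratch/log"

# The last two lines of the file.
last_two=$(tail -n 2 "$scratch/log")

# Take the first three lines, save a copy with tee, then keep only the last of them.
third_line=$(head -n 3 "$scratch/log" | tee "$scratch/first-three" | tail -n 1)

echo "$last_two"
echo "$third_line"
```

This should print "line 4", "line 5", and then "line 3", and leave a three-line copy in first-three.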

Text filtering

Linux has lots of commands for searching and transforming text. Many of them use regular expressions, which are patterns used to match strings. (As a side note, regular expressions are incredibly useful and are available in nearly every programming language. I won't go into detail about them here, but if you write programs that deal with text in any capacity, you should learn about them.) When you use regular expressions on the command line, make sure to quote them, since they often contain characters like '*' that would otherwise be interpreted by the shell.

grep is probably the best known text filtering tool. It prints lines of text that match a regular expression. It normally reads text from the files given to it on the command line and prints filtered lines on stdout. If you give it the -R flag, it will also recurse through directories. If you don't give it any files or directories, it will read stdin, which makes it useful in pipelines.

# Print lines that end with a colon
$ grep ':$' data.txt

# Print numbers that look like a US zip code
$ grep -Eo '\b[0-9]{5}\b' addresses.txt

# Print lines that do NOT contain C++-style comments
$ grep -v '//' foo.cpp
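A few other grep flags come up often. This is a small sketch using a sample file I made up for the demo: -c counts matching lines, -i ignores case, and -n shows line numbers.

```shell
scratch=$(mktemp -d)
printf 'alpha: 1\nBeta\ngamma: 3\n' >"$scratch/data.txt"

# -c counts matching lines instead of printing them.
colon_lines=$(grep -c ':' "$scratch/data.txt")

# -i ignores case; -n prefixes each match with its line number.
beta_match=$(grep -in 'beta' "$scratch/data.txt")

echo "$colon_lines"
echo "$beta_match"
```

This should print 2 (two lines contain a colon) and then 2:Beta (the match is on line 2).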

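sed is a tool for transforming text using regular expressions. Its name is an abbreviation for "stream editor". It's most commonly used to replace text according to regular expressions or delete matching lines. The -e flag supplies an editing command; when no file is named, sed reads stdin and writes to stdout, so it fits into a pipeline. The -i flag edits a file in place instead. (On macOS, the built-in sed requires a backup suffix after -i, which may be empty, as in sed -i ''.)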

# Read foo.txt, replace all occurrences of "foo" with "bar", and
# write the result to bar.txt.
$ sed -e 's/foo/bar/g' <foo.txt >bar.txt

# Do the same thing, but write the result back to foo.txt.
$ sed -i 's/foo/bar/g' foo.txt
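sed can apply several commands in one run. The sketch below, with sample input made up for the demo, deletes comment lines and then swaps the two sides of each key=value pair using capture groups.

```shell
# Delete lines starting with "#", then rewrite "key=value" as "value key".
# \( \) captures part of the match; \1 and \2 refer back to the captures.
swapped=$(
  printf '# settings\nname=demo\nport=8080\n' |
    sed -e '/^#/d' -e 's/^\([a-z]*\)=\(.*\)$/\2 \1/'
)
echo "$swapped"
```

This should print "demo name" and "8080 port" on separate lines.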

awk is a general purpose tool for manipulating text. It is particularly good at dealing with tables of text organized in columns.

# List files, printing only file name and size
$ ls -l | awk 'NR > 1 {print $9,$5}'
foo.go 491
bar.go 5550
baz.go 1734

awk has its own concise scripting language and can do far more than pick out columns. Look for a tutorial if you want to learn more about it.
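As a small taste of that scripting language, this sketch adds up the second column of a table. The input is sample data made up for the demo; sum is an ordinary awk variable, and the END block runs after the last line has been read.

```shell
# Sum the second column of a small table and print the total.
total=$(
  printf 'foo.go 491\nbar.go 5550\nbaz.go 1734\n' |
    awk '{ sum += $2 } END { print sum }'
)
echo "$total"
```

This should print 7775.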

File names

It's frequently useful to be able to manipulate file names, especially when writing scripts. There are several useful tools for this.

dirname takes a path as an argument and prints everything except the last component of the path.

basename does the opposite: it prints the last component of a path. If you give basename a suffix, it will also remove the suffix from the file name.

$ dirname foo/bar/baz.txt
foo/bar

$ basename foo/bar/baz.txt
baz.txt

$ basename foo/bar/baz.txt .txt
baz

dirname and basename just manipulate the paths that are given to them. They don't actually look at the file system, so it's fine to give them paths to files that don't exist.
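The shell can do similar trimming itself with parameter expansion, which avoids starting another program. This is only a sketch for the simple case of a path that contains at least one slash; dirname and basename handle more edge cases. The variable names are my own.

```shell
path=foo/bar/baz.txt

dir=${path%/*}        # remove the shortest suffix matching "/*"  -> foo/bar
file=${path##*/}      # remove the longest prefix matching "*/"   -> baz.txt
stem=${file%.txt}     # remove the ".txt" suffix                  -> baz

echo "$dir $file $stem"
```

This should print "foo/bar baz.txt baz".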

Loops

Like any programming language, Bash has loops and other control flow structures. I'm only going to cover for loops here, but Bash also has while loops, if statements, and case statements.

for loops let you execute a series of commands for each argument provided to the loop. This is most commonly used to process a series of files matching a glob.

# Rename .JPG files to .jpg
$ for i in *.JPG; do mv $i $(basename $i .JPG).jpg; done

There are several parts to this, so let's break it down. The first part of the loop (before the first semicolon) defines a variable i and globs a list of files to process. In each iteration of the loop, $i is the name of one file in this list. The body of the loop is the word do followed by a sequence of commands, separated by semicolons. We just have one mv command here, which renames the file. The loop ends with the word done.

You may not have seen the $(...) syntax before. Bash will evaluate the command in the parentheses, then replace the whole $(...) expression with the output of the command. In this case, suppose $i is set to IMG123.JPG. The command basename IMG123.JPG .JPG will be run, and Bash will replace the expression with IMG123, which is followed by the new file extension, .jpg. So after this expansion, the command inside the loop becomes mv IMG123.JPG IMG123.jpg.

I'm glossing over a lot of things to keep this example simple. I should point out that this loop only works when there is at least one .JPG file, and none of the .JPG files have spaces in their names. If there are no .JPG files and the glob doesn't match anything, the glob won't be expanded and the loop will iterate over the literal string *.JPG. If there is a .JPG file with a space in its name, bash will treat it as two separate files (probably neither of which really exists), since it breaks up arguments using spaces. These problems are not specific to for loops. It will probably be obvious when something goes wrong on the command line, but if you're writing bash scripts, be careful.
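For scripts, a more defensive version of the loop might look like the sketch below. It quotes every use of $i so that names with spaces survive, and it skips the iteration when the glob matched nothing. The scratch directory and sample file names are assumptions for the demo.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch 'IMG 001.JPG' IMG_002.JPG

for i in *.JPG; do
  [ -e "$i" ] || continue                # skip the literal "*.JPG" when nothing matches
  mv "$i" "$(basename "$i" .JPG).jpg"    # quotes keep names with spaces in one piece
done

ls
```

After the loop, the directory should contain "IMG 001.jpg" and "IMG_002.jpg".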

Searching for files

The find command is incredibly useful for locating files. It can search a directory tree for files matching a set of criteria. By default, it prints matching files, but it can also delete files or run commands.

# Print all .cc files in the src directory
$ find src -name '*.cc'
src/foo/foo.cc
src/bar/bar.cc

# Print executable files larger than 1 MiB (GNU find)
$ find . -size +1M -executable
./bin/cover
./bin/gocode
./bin/gorename

# Delete .o files
$ find . -name '*.o' -delete

# Search for the string 'foo' in .go files
$ find . -name '*.go' -exec grep -Hn foo {} \;
./compare_test.go:1391:    	Devices: []string{"foo", "bar", "baz"},

When using find, make sure to quote patterns like '*.go' above. It's important to do this because find needs to match these patterns, not Bash.
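When -exec ends with + instead of \;, find passes many file names to a single command rather than running the command once per file. Here is a small sketch of that form; the directory layout is made up for the demo and includes a directory name with a space in it.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/src/a dir"
printf 'foo\n' >"$scratch/src/a dir/one.go"
printf 'bar\n' >"$scratch/src/two.go"
printf 'foo\n' >"$scratch/src/notes.txt"

# "{} +" hands all matching file names to one grep invocation.
# grep -l prints only the names of files that contain a match.
matches=$(find "$scratch/src" -name '*.go' -exec grep -l foo {} +)
echo "$matches"
```

Only one.go should be listed: two.go doesn't contain foo, and notes.txt doesn't match the -name pattern.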

Job control

Normally when you run a program, Bash waits for it to exit before returning you to the command line. However, Bash is also capable of running multiple processes concurrently.

There are some keyboard shortcuts for controlling processes. To interrupt a running process, type ^C (that's Control-C). Most programs will quit when you interrupt them. To suspend a running process and return to the command prompt, type ^Z. This "pauses" a process; it will still exist in memory, but it won't be scheduled on the CPU.

To see a list of active processes (both suspended and backgrounded), use the jobs command. Bash will print a job number for each process that can be used to refer to it.

To resume a suspended process in the background, use the bg command. If there is more than one suspended process, you can use the job number to select the process to background. While the process is backgrounded, you'll continue to have access to the command prompt, but you won't be able to interact with the process since stdin won't be connected to the keyboard. You'll still see anything the program prints on stdout though.

To start a process in the background, add a & at the end of the command line.

To resume a backgrounded or suspended process in the foreground, use the fg command. Again, you can specify a job number if there's more than one process. The foreground is the state that processes normally run in when you start them.

# Edit a file and suspend the editor
$ emacs server.go
# edit...edit...edit...
^Z

# Try to compile a program
$ go build server.go
./server.go:7: syntax error: unexpected EOF, expecting }

# Resume the editor and fix the bug
$ fg
# edit...edit...edit...
^Z

# Build the program, then start it in the background
$ go build server.go
$ ./server &

# Stop the server
$ jobs
[1]+  Stopped                 emacs server.go
[2]-  Running                 ./server &

$ fg 2
^C
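Background jobs are also handy in scripts, where there are no keyboard shortcuts. In this sketch, $! gives the process ID of the most recent background job and the wait builtin blocks until a job finishes and reports its exit status. The sleep commands are stand-ins for real work.

```shell
# Start two background jobs; $! holds the process ID of the most recent one.
sleep 1 &
first_pid=$!
(sleep 1; exit 3) &
second_pid=$!

# wait blocks until the given job exits and returns that job's exit status.
first_status=0
wait "$first_pid" || first_status=$?
second_status=0
wait "$second_pid" || second_status=$?

echo "first: $first_status, second: $second_status"
```

Both jobs run at the same time, so the whole thing takes about one second. It should print "first: 0, second: 3".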

Learning more

This article was a whirlwind tour of a lot of shell features and small, useful commands. Hopefully it was helpful. I tried to cover a wide breadth of functionality instead of focusing on any command in depth, so you'll probably want to look at some reference material.

Most commands will print some brief usage instructions when you give them an -h or --help flag on the command line.

If you want to read the manual for a given command, use man. For example, man grep. Man pages are also easy to find on the web. Personally, I prefer reading documentation in a web browser. Man pages tend to be very dry. They'll tell you what every option does, but they won't give many examples or tell you when to use them.

Note that man pages only exist for programs, not for shell builtin commands. For those, use help. For example, help echo.

If you'd rather talk to a person, try the Unix & Linux or Ask Ubuntu Stack Exchange sites.