Code style guidelines

Author: Adrian Perez <aperez@igalia.com>
License: GPL3
Copyright: 2008-2009 Igalia S.L.

Abstract

Provides guidelines on how to write and organize shell code in general, and when using the Bill library in particular.

Contents

Shell-independent tips

Quoting

Just in terms of allocation of time resources, religion is not very efficient. There's a lot more I could be doing on a Sunday morning.

—Bill Gates

Tip

Always use double quotes to enclose variable expansions, unless you are really sure that you do not want them.

You should be aware that apparently excessive quoting is not only allowed in shell code, but even desirable.

Shells use quoting as a more convenient way of passing characters which would otherwise need to be escaped (e.g. spaces). Compare the following and decide which you prefer:

ls /path/to\ somewhere/with\ embedded\ spaces
ls "/path/to somewhere/with embedded spaces"

As you can see, quoting saves some typing and makes the code clearer. Quoting is especially important when using variables, because expanding an unquoted variable will split it at spaces (more exactly: at any of the characters in the IFS variable), which is not always what you want. Enclosing a variable in double quotes instructs the shell not to perform splitting. The following loop will print three lines, one character per line:

data='a b c'
for item in $data ; do
    echo "$item"
done

But when the expansion of the data variable is quoted, the same loop will print only a single line:

data='a b c'
for item in "$data" ; do
    echo "$item"
done

Subtle bugs may appear due to missing quotes, so in general you will always want to quote variable expansions.

Exceptions

There are some cases where quoting is not necessary at all:

  • When assigning the contents of one variable to another. The shell parser knows about this case and will always do the right thing. This includes expressions of the form a=$b:

    b='What the heck? - said the penguin'
    a="$b"  # This works, extra quoting does not harm
    a=$b    # Good one, too: this is just an assignment
    
  • When assigning the output of a command to a variable. The explanation is the same as for the previous entry, as the shell knows how to handle a=$(pwd) correctly:

    a="$(pwd)"  # Works. Did I say extra quoting does not harm?
    a=$(pwd)    # No problem, no ambiguity in the expression.
    
  • When you really want to split things out. For example when gathering command line options in a variable which will be passed to a command:

    flags="-a"
    flags="$flags --verbose"
    [ -e /some/file ] || flags="$flags --output /some/file"
    
    run_command $flags
    

    It is okay to omit the quotes here, but if you are using Bash you would be better off using arrays (see the Arrays section below).

  • When using [[ instead of [ in Bash. The [[ built-in is defined in the shell grammar and it does expansion “properly” because it is not a command:

    if [[ :$PATH: = *:/usr/bin:* ]] ; then
        echo '$PATH contains /usr/bin'
    fi
    

Sometimes it really is needed to split variable contents, but bear in mind that you are usually perfectly aware of those situations, so as a rule of thumb always quote your variables. As an example, the following function will iterate over the elements of a string separated by a given separator (for example the PATH environment variable):

print_list_items ()
{
    local item= old_IFS=$IFS  # Save the current separator characters.
    IFS=$1

    for item in $2 ; do
        echo "$item"
    done

    IFS=$old_IFS              # Restore separators.
}

print_list_items ':' "$PATH"

Single quotes

Tip

Use single quotes to enclose string literals.

Text enclosed in single quotes is never interpreted by the shell. The only character which needs to be escaped is the quote character itself: all other characters, even newlines, will be part of the string. This makes single quotes a great tool for defining multi-line strings.
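For example, a multi-line usage message can be stored in a single literal with no escaping at all (the variable name here is just illustrative):

usage='Usage: myscript [options] FILE

  -h   Show this help text
  -v   Be verbose'

echo "$usage"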

Structured programming

Tip

Use shell functions to organize your code, as you would do in other programming languages.

Clean and structured code is a lot easier to read and maintain than unstructured “spaghetti” mash-ups. All decent-enough shells have a way of defining shell functions. Also, writing functions is the first step needed to write modular, reusable code.

Using shell functions instead of separate scripts also has the advantage of avoiding a fork+exec when invoking them.
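As a minimal sketch (the function name is just an example), a helper which could otherwise live in its own script can be defined once and then called like any other command:

warn () {
    # Print a message to standard error, prefixed with the script name.
    echo "$0: warning: $*" 1>&2
}

warn 'disk is almost full'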

fork+exec

Tip

Avoid calling external commands when possible, especially inside tight loops.

Launching external commands from the shell involves calling the fork() and exec() system calls. The first one duplicates the running process into a new child process with its own identifier, memory area, open file descriptors... The second one replaces that new process with the code of the program being run. Although this is done quite efficiently in recent operating systems, it is far more work than calling internal commands defined by the shell.
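A quick, unscientific way of seeing the difference is timing a loop which uses the echo built-in against one which forces the external /bin/echo (the exact numbers will vary between systems):

time for i in 1 2 3 4 5 ; do echo hello > /dev/null ; done
time for i in 1 2 3 4 5 ; do /bin/echo hello > /dev/null ; done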

Internal commands

Tip

Use shell built-in commands whenever possible.

Most shells include built-in implementations of a number of common commands, thus avoiding expensive fork+exec operations. Sometimes built-in commands are even a superset of the standard ones and include extra features.

Commands commonly implemented as built-ins include:

:
[
echo
printf
test
true
false
pwd
kill
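You can check whether a particular command will run as a built-in by using the type built-in itself:

type echo   # Prints: echo is a shell builtin
type ls     # Prints something like: ls is /bin/ls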

Avoid using backticks

Tip

Use $(command) (POSIX command expansion) instead of `command` (traditional backtick expansion).

Always use $( ) for command expansion. There is no reason to keep using backticks (`): the functionality is exactly the same, but the first method is more powerful. The nasty thing about backticks is that you cannot nest them, as in the following example:

# Will not work: the shell believes the command is "grep foo"
# (from the first to the second backtick)
output=`grep foo `find /usr/share/doc -type f -name '*.txt'``

But it will work like a charm when using the POSIX variant:

# This works, POSIX command expansion works great!
output=$(grep foo $(find /usr/share/doc -type f -name '*.txt'))

In short: beware of backticks!

Last things occur... last

Tip

If the exit status of a function is the result of executing a command, just leave that command at the end of the function.

At a first attempt one may be tempted to write “defensive” code which ensures that the exit status is always set before returning from a function:

string_is_magic ()
{
    if [ "$1" = "magic" ] ; then
        return 0
    else
        return 1
    fi
}

The code is correct and will work as expected, but it is long-winded and not particularly pleasant to read. As a first refinement, one may find that the following works:

string_is_magic () {
    [ "$1" = "magic" ]
    return $?
}

Be aware that the last command always sets the exit status of the entire function, so the optimal way of implementing this function is the following:

string_is_magic () {
    [ "$1" = "magic" ]
}

“Bashisms”

Bash is a better sh, but it is more than that: it encourages ways of doing things which rely on extensions specific to this shell. This is the reason why some people call them “bashisms”.

Locality is good

Tip

Use local to define local variables inside functions.

When using Bash one can mark shell variables as local in shell functions. Polluting the global namespace too much is a bad thing, so it is very common to see code that uses the local keyword in most variable definitions:

say_hello () {
    # The 'whom' variable will not be seen outside the function
    local whom=${1:-'world'}
    echo "Hello ${whom}!"
}

When using built-ins that define the variables themselves (like the read built-in) the variables must be marked as local before using the command:

add_prefix () {
    local line
    while read line ; do
        echo "$1 - $line"
    done
}

Cleaning up

Tip

The unset built-in can be used to delete variables and function definitions from the environment.

If you use a temporary variable, you can use unset to remove its definition. Inside functions it is better to use local variables, but sometimes items really do need to be removed from the environment. The same built-in also removes function definitions when given the -f flag.
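A short sketch (the names are illustrative):

scratch='temporary value'
unset scratch        # The variable definition is gone now.

old_helper () { echo 'obsolete' ; }
unset -f old_helper  # ...and so is the function.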

Reading file contents

Tip

Use $(< ...) instead of $(cat ...) when possible.

Warning

Keep in mind that $(< ...) will only work when just reading the contents of a file. If there is a pipeline in the expansion (e.g. $(cat foo | grep bar)), this shorthand will produce empty output.

Reading all the contents of a file in shell code is usually done by using the cat command in a command expansion. For example we can read the contents of the password database file with the following snippet:

# The usual code for reading files:
contents=$(cat /etc/passwd)

This is a rather common construction which incurs a fork+exec operation. There is an alternative recognized by Bash which produces the same result but is implemented inside the shell:

# This is faster than using 'cat':
contents=$(< /etc/passwd)

Tip

Use the read built-in for processing line-based input.

Finally, sometimes you need to read a big file (which may not even fit in memory), but only one line at a time is needed. Imagine you are a spammer who wants to send a zillion e-mails to a list of addresses. Here the read built-in fits perfectly: it reads input up to a delimiter (end of line by default), and its exit status will be zero until the end of the input is reached. This way one can easily parse huge line-based text files:

# Read one line at a time, assign it to the 'recipient' variable
# and use 'sendmail' to spam out people:
message='Subject: Improve your relations

Buy V14gr4 online NOW!'

while read recipient ; do
    sendmail "$recipient" <<< "$message"
done < email-list.txt
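Note that in most real scripts you will want to empty IFS and pass the -r flag to read, so that leading whitespace is preserved and backslashes in the input are not interpreted as escape sequences:

while IFS= read -r line ; do
    printf '%s\n' "$line"
done < input.txt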

Arrays

Tip

Consider using arrays when handling lists of data, instead of strings with a character separating the items.

Bash includes support for arrays as an extension over traditional sh-like shells. Although they are somewhat limited, they are very useful in some cases. Defining an array and using it is easy:

my_array=( item_1 item_2 item_3 )

echo "${my_array[0]}"   # Arrays are zero-indexed: prints 'item_1'
echo "${my_array[1]}"   # Prints 'item_2'

One limitation of arrays is that it is not possible to have multidimensional arrays... but those are rarely needed for shell tasks anyway. The classic syntax for adding elements to an array also looks rather funny:

my_array=( "${my_array[@]}" another_element )

One nice example of using arrays is gathering arguments to be passed on to another function or command. It is safer to use an array than a space-separated string: with arrays, command line arguments can contain embedded spaces, which would break your program if you used a plain string. For example:

flags=( -a )
[ -n "$VERBOSE"  ] && flags=( "${flags[@]}" --verbose )
[ -e 'some file' ] || flags=( "${flags[@]}" -o 'some file' )

run_command "${flags[@]}"

The above code will always work as expected, even when the flags contain bizarre characters which would normally confuse the shell. Well, not so much the shell as you, when trying to debug the problem.

Built-in trick galore

Some of the Bash built-in commands have unusual features which may look like nonsense, but they serve very concrete purposes, and you will be glad to know about them when the time to use them arrives. Some of the commands are only outlined, so make sure to have the Bash manual at hand.

printf -v

Using the -v flag, the output of printf will be assigned to a shell variable instead of being printed to standard output. As an example, the following two statements produce the same result, but the second one avoids an extra command expansion and the capture of its output for the assignment:

version_string=$(printf "Bash %i.%02i" ${BASH_VERSINFO[0]} ${BASH_VERSINFO[1]})
printf -v version_string "Bash %i.%02i" ${BASH_VERSINFO[0]} ${BASH_VERSINFO[1]}

This can be very useful in conjunction with printf %q.

printf %q

The %q specifier formats the given argument so that it can be reused as shell input, e.g. fed to the eval command, by adding escape sequences to those characters which would be interpreted by the shell. This is very useful with untrusted input supplied by the user or by third party applications:

read -p 'Text: ' line
eval "foo='$(printf %q "$line")'"

[[ and regular expressions

It is wise to know that the [[ built-in can match regular expressions using the =~ operator. This is especially handy to avoid the usage of external tools like sed or grep. Note that if you are processing huge amounts of input and just want to extract some lines, using grep can still be faster; but for line-by-line processing it is better to use [[.

Take the following example where a string is checked against a regular expression using grep:

read line
# Check whether the line is a MIME-like header
if grep -E -q '^[^:]+:' <<< "$line" ; then
    echo "You entered a MIME header"
fi

It can be rewritten the following way:

read line
# Check whether the line is a MIME-like header
if [[ $line =~ ^[^:]+: ]] ; then
    echo "You entered a MIME header"
fi

Code written this way will be faster, especially in tight loops which process input on a line-by-line basis.

caller

This built-in outputs the line number and file name from which the current function was called. It can be used to craft your own debugging functions or to print backtraces when an error occurs.
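A minimal sketch of a debugging helper built on top of it (the function name is just an illustration):

debug_here () {
    # 'caller' prints the line number and file name of the call site.
    echo "reached: $(caller)" 1>&2
}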

declare

Being declarative and using declare means that Bash will be able to produce better error messages and do some basic type checking.

Note

When declare is used inside a function, it declares local variables, like local does.

A quick summary of the available flags, which can be used with local as well:

-a Declare arrays.
-f Declare functions.
-i Declare integers.
-r Set variables as read-only. Assigning new values will produce an error.
-t Mark functions to trigger the DEBUG and RETURN traps when called.
-x Export variables to the environment of child processes. This is similar to using export.
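A short sketch of some of those flags in action (the variable names are illustrative):

declare -i counter=0      # Integer: assignments are evaluated arithmetically.
counter+=1                # 'counter' is now 1, not the string '01'.
declare -r version='1.0'  # Read-only: assigning to it again is an error.
declare -a names=( Alice Bob )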

Use the source, Luke

Tip

Use source as it is clearer than . (dot).

The source keyword works exactly the same way as . (dot): it reads a file containing shell code in the current context (i.e. without spawning a subshell). Apart from being longer, source is easier on the eyes and makes it clear that one wants to read source code.
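For example (the file name is illustrative), both of these read settings.sh into the current shell:

source ./settings.sh
. ./settings.sh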

Philosophy basics

And now for something completely different: a quick tour of The Unix Philosophy. Having a summary of the basics at hand is, er, handy. As we are dealing with the shell, which is an integral piece of Unix, it is nice to review some of the precepts which make it a great operating system, and apply them to the shell. Indeed, some of them are of particular interest for shell programming. In no particular order:

Rule of Modularity
Write simple parts connected by clean interfaces.
Rule of Composition
Design programs to be connected to other programs.
Rule of Extensibility
Design for the future, because it will be here sooner than you think.
Rule of Silence
When a program has nothing surprising to say, it should say nothing.
Rule of Repair
When something must fail, fail noisily and as soon as possible.
Rule of Clarity
Clarity is better than cleverness.