Author: | Adrian Perez <aperez@igalia.com> |
---|---|
License: | GPL3 |
Copyright: | 2008-2009 Igalia S.L. |
Abstract
Provides general guidelines on how to write and organize shell code in general and when using the Bill library in particular.
Just in terms of allocation of time resources, religion is not very efficient. There's a lot more I could be doing on a Sunday morning.
—Bill Gates
Tip
Use always double-quotes to enclose variable expansions. Unless you are really sure that you do not want them.
You should be aware that apparently excessive quoting is not only allowed in shell code, but even desirable.
Shells use quoting to allow passing characters which usually would need to be escaped in a more convenient way (e.g. spaces). Compare the following and decide which do you prefer:
ls /path/to\ somwhere/with\ embedded\ spaces
ls "/path/to somwhere/with embedded spaces"
As you can see, you can save some typing, and it makes code clearer. Quoting is very important when using variables, because expanding an unquoted variable will split it at spaces (more exactly: using the first character of the IFS variable), which is not always what you want. Enclosing a variable into double quotes instructs the shell not to perform splitting. The following loop will print out three lines, one character per line:
data='a b c'
for item in $data ; do
echo $item
done
But using quotes when expanding the data variable, the same loop will print only a single line:
data='a b c'
for item in "$data" ; do
echo $item
done
Subtle may appear due to due to missing quotes, and in general you will always want to quote variable expansions.
There are some cases where quoting is not necessary at all:
When assigning one variable the contents of another. The shell parser knows about this case and will always do the right thing. This includes expressions in the form a=$b:
b='What the heck? - said the penguin'
a="$b" # This works, extra quoting does not harm
a=$b # Good one, too: this is just an assignment
When assigning a variable the output of a command. Explanation is the same as for the previous entry, as the shell knows how to do a=$(pwd) right:
a="$(pwd)" # Works. Did I say extra quoting does not harm?
a=$(pwd) # No problem, no ambiguity in the expression.
When you really want to split things out. For example when gathering command line options in a variable which will be passed to a command:
flags="-a"
flags="$flags --verbose"
[ -o /some/file ] || flags="$flags --output /some/file"
run_command $flags
It is okay to omit quotes here, but if you are using Bash, you better would be using arrays.
When using [[ instead of [ in Bash. The [[ built-in is defined in the shell grammar and it does expansion “properly” because it is not a command:
if [[ :$PATH: = *:/usr/bin:* ]] ; then
echo '$PATH contains /usr/bin'
fi
Sometimes it is really needed to split variable contents, but bear in mind that you usually are perfectly aware of that situations, so as a rule of thumb do always quote your variables. As an example, the following function will iterate over elements in a string separated by a given separator (for example the PATH environment variable):
print_list_items ()
{
local item= old_IFS=$IFS # Save the current separator characters.
IFS=$1
for item in $2 ; do
echo "$item"
done
IFS=$old_IFS # Restore separators.
}
print_list_items ':' "$PATH"
Tip
Use single quotes to enclose string literals.
Text enclosed in single quotes is never interpreted by the shell. The only character which needs to be escaped is the quote character itself: all the other characters, even carriage returns, will be part of the string. This makes them a great feature for defining multi-line strings.
Tip
Use shell functions to organize your code, as you would do in other programming languages.
Clean and structured code is a lot easier to read and maintain than unstructured “spaghetti” mash-ups. All decent-enough shells have a way of defining shell functions. Also, writing functions is the first step needed to write modular, reusable code.
Using shell functions instead of separate scripts has also the advantage of avoiding calling fork+exec when invoking them.
Tip
Avoid calling external commands when possible, especially inside tight loops.
Launching external commands from the shell incurs in calling the fork() and exec() system calls. The first one duplicates the running process into a new child process with its own identifier, memory area, open file descriptors... The second one replaces that new process with the code of the program being run. Although this is done quite efficiently in recent operating systems, it is far more work than calling internal commands defined by the shell.
Tip
Use shell built-in commands whenever possible.
Most shells include implementations for a number of usual commands in it, thus avoiding expensive fork+exec operations. Sometimes built-in commands are even a superset of the standard ones and include extra features.
Commands eligible for such implementations are:
: [ echo printf test true false pwd kill
Tip
Use $(command) (POSIX command expansion) instead of `command` (traditional backtick expansion).
Use always $( ) for command expansion, always. There is no reason to keep using backticks (`) because functionality is exactly the same, but the firs method is more powerful. The nasty thing about backticks is that you cannot nest them, like in the following example:
# Will not work: the shell believes the command is "grep foo"
# (from the first to the second backtick)
output=`grep foo `find /usr/share/doc -type f -name '*.txt'``
But it will work like a charm when using the POSIX variant:
# This works, POSIX command expansion works great!
output=$(grep foo $(find /usr/share/doc -type f -name '*.txt'))
In short: beware of backticks!
Tip
If the exit status of a function is the result of executing a command, just leave that command at the end of the function.
At a first attempt one may be tempted to write “defensive” code that ensures that the exit status is always set before exiting a functions:
string_is_magic ()
{
if [ "$1" = "magic" ] ; then
return 0
else
return 1
fi
}
The code is correct and will work as expected, but it is hairy and has not pleasant aesthetics. As a first refinement one may found that the following works:
string_is_magic () {
[ "$1" = "magic" ]
return $?
}
Be aware that the last command always sets the exit status of the entire function, so the optimal way of implementing this function is the following:
string_is_magic () {
[ "$1" = "magic" ]
}
Bash is a better sh, but it is more than that: it encourages some ways of doing things which use specific extensions provided by the shell. This is the reason why some people call them “bashisms”.
Tip
Use local to define local variables inside functions.
When using Bash one can mark shell variables as local in shell functions. Polluting the global namespace too much is a bad thing, so it is very common to see code that uses the local keyword in most variable definitions:
say_hello () {
# The 'whom' variable will not be seen outside the function
local whom=${1:-'world'}
echo "Hello ${whom}!"
}
When using built-ins that define the variables themselves (like the read built-in) the variables must be marked as local before using the command:
add_prefix () {
local line
while read line ; do
echo "$1 - $line"
done
}
Tip
The unset built-in can be used to delete variables and function definitions from the environment.
If you use a temporary variable you can use unset to remove its definition. For functions it is better to use local variables but sometimes it may be needed to remove items from the environment. This command will also remove definitions of functions.
Tip
Use $(< ...) instead of $(cat ...) when possible.
Warning
Keep in mind that $(< ...) will only work when just reading the contents of a file. If there are a pipeline in the expansion (e.g. $(cat foo | grep bar) this shorthand will render an empty output.
Reading all the contents of a file in shell code is usually done by using the cat command in a command expansion. For example we can read the contents of the password database file with the following snippet:
# The usual code for reading files:
contents=$(cat /etc/passwd)
This is a rather common construction that incurs in the fork+exec operation. There is an alternative recognized by Bash which produces the same result but is implemented inside the shell:
# This is faster than using 'cat':
contents=$(< /etc/passwd)
Tip
Use the read built-in for processing line-based input.
Finally, sometimes you need to read a big file (which may even not fit in memory), but only one line at a time is needed. Imagine you are a spammer which wants to send a zillion e-mails to a list of addresses. Here the read builtin fits perfectly: it will read content enough up to a delimiter (end of line by default). Moreover, its exit code will always be zero until the end of the input is reached. This way one can easily parse huge line-based text files:
# Read one line at a time, assign it to the 'recipient' variable
# and use 'sendmail' to spam out people:
message='Subject: Improve your relations
Buy V14gr4 online NOW!'
while read myline ; do
sendmail "$recipient" <<< "$message"
done < email-list.txt
Tip
Consider using arrays when handling lists of data insteads of lists with a character separating items.
Bash includes supports for arrays right away as an extension over traditional Sh-like shells. Although they are somewhat limited, they are very useful in some cases. Defining an array and using it is easy:
my_array=( item_1 item_2 item_3 )
echo ${my_array[1]}
echo ${my_array[2]}
The limitation of arrays is that it is not possible to have multidimensional arrays... but they are not needed for most shell tasks after all. This makes the syntax for adding elements to an array rather funny:
my_array=( "${my_array[@]}" another_element )
One nice example of using arrays is gathering arguments which will be further passed to another function or command. It is safer to use an array rather than a space-separated string: you can have embedded spaces in command line arguments using arrays, which would screw your program when using a plain string. For example:
flags=( -a )
[ -n "$VERBOSE" ] && flags=( "${flags[@]}" --verbose )
[ -o 'some file' ] || flags=( "${flags[@]}" -o 'some file' )
run_command "${flags[@]}"
The above code will always work as expected, even when the flags have bizarre characters inside, which would normally confuse the shell. Well, not the shell but you when trying to debug the problem.
Some of the Bash built-in commands have some unusual features which may look as nonsense, but they serve very concrete purposes and you will be glad of knowing about them when the time to use them arrives. Some of the commands are only outlined, so make sure to have the Bash manual at hand.
Using the -v flag you output of printf will be assigned to a shell variable instead of printed to standard output. As an example, the following two sentences produce the same result, but the second one avoids doing an extra shell expansions and capturing output for assignment:
version_string=$(printf "Bash %i.%02i" ${BASH_VERSINFO[0]} ${BASH_VERSINFO[1]})
printf -v version_string "Bash %i.%02i" ${BASH_VERSINFO[0]} ${BASH_VERSINFO[1]}
This can be very useful in conjunction with printf %q.
The %q specifier formats the given argument in a way that it is reusable as further shell code input, i.e. to be reused as input for the eval command, by adding escape sequences to those characters which are interpreted by the shell. This is very useful with untrusted input supplied by the user or third party applications:
read -p 'Text: ' line
eval "foo='$(printf %q "$line")'"
It is wise to know that the [[ built-in can be used to match regular expressions using the =~ operator. This is specially handy to avoid usage of external tools like sed or grep. Note that if you are processing huge amounts of input and you want to extract some lines, using grep can still be faster; but if you need to do line-by-line processing it is advised to use [[.
Take the following example where a string is checked against a regular expression using grep:
read line
# Check whether the line is a MIME-like header
if grep -E '^[^:]:' <<< "$line ; then
echo "You entered a MIME header"
fi
It can be rewritten the following way:
read line
# Check whether the line is a MIME-like header
if [[ $line =~ ^[^:]: ]] ; then
echo "You entered a MIME header"
fi
Code written this way will be faster, especially in tight loops which do process input in a line-by-line basis.
This will output the line number and file name from which the current function was called. This can be used to craft your own debugging functions or printing backtraces when an error occurs.
Being declarative and using declare means that Bash will be able of producing better error messages and do some basic type checking.
Note
When declare is used inside a function, it declares local variables, like local does.
Quick summary of available flags, which can be used with local as well:
-a | Declare arrays. |
-f | Declare functions. |
-i | Declare integers. |
-r | Set variables as read-only. Assigning new values will produce an error. |
-t | Mark functions to trigger the DEBUG and RETURN traps when called. |
-x | Variables are exported to subshells. This is similar to use export. |
Tip
Use source as it is clearer than . (dot).
The source keywords works exactly the same way as . (dot): it reads a file containing shell code in the current context (i.e. without invoking a subshell). Apart from being longer, source is easier in the eyes and makes clear that one wants to read source code.
And now for something completely different: a quick tour on The Unix Philosophy. Having a quick summary of the basics at hand is, er, handy. As we are dealing with the shell, which is an angular piece of Unix, it is nice to review some of the precepts which make it a great operating system, and apply them to the shell. Alas, some of them are of particular interest for shell programming. In no particular order: