visit
This is part V of a tutorial on how to build a Linux shell. You can read the previous parts of this tutorial from the following links: part I, part II, part III, part IV.
NOTE: You can download the complete source code for Part V from .
As we've seen in the previous parts, a simple command consists of one or more arguments, also known as words. The first word contains the name of the command we want to execute, while the rest of the words contain the command's arguments.
Before the shell executes a command, it performs word expansion on the command words. is the process by which the shell takes a command word, checks it to see if it contains variable names, pathnames, commands, and arithmetic expressions, and substitutes each name/command/expression with its value. The resultant word, which is usually (but not always) longer than the original word, might be broken down into one or more subwords (or fields), in a process known as field splitting.
In this part, we'll implement the 7 word expansions defined by POSIX, which are: , , , , , , and . There are other word expansions, such as and , which are not defined by POSIX, and which we won't be discussing here. After finishing this lesson, it would be a good exercise if you extended the shell by implementing the non-POSIX word expansions.After we finish discussing the individual word expansion functions, we'll write the main function that will tie everything together, then we'll update our existing shell's code to make our shell use the word expansion functions we've just implemented. To make the code easier to read, I've defined all our word expansion functions in the source file
wordexp.c
, which you'll find in (the pattern matching functions are defined in pattern.c
, which you can read in ).Now let's get started.Word expansions are preceded by a
$
sign. The characters following the $
sign indicate the type of expansion to be performed by the shell. These characters are interpreted by the shell as follows:@
, *
, #
, ?
, -
, $
, !
, or 0
, which indicate variable expansion of a (we'll discuss these in a later lesson in this tutorial).{
and }
.(
and )
.((
and ))
.struct word_s
{
char *data;
int len;
struct word_s *next;
};
data
=> the string representing this word.len
=> the length of the data
field.next
=> pointer to the next word, or NULL
if this is the last word (we'll use a linked list to represent our expanded words).Of course, we'll need some functions to allocate and free our
struct word_s
structures. For this, we'll use the following functions:struct word_s *make_word(char *str);
void free_all_words(struct word_s *first);
The first function allocates memory for the structure, creates a copy of the word string, and returns the newly allocated word. The second function frees the memory used by a list of word structures. You can read the functions' code in our
wordexp.c
.The following list shows the function prototypes for our helper functions, all of which we'll define in the
:wordexp.c
char *wordlist_to_str(struct word_s *word);
void delete_char_at(char *str, size_t index);
int is_name(char *str);
size_t find_closing_quote(char *data);
size_t find_closing_brace(char *data);
char *substitute_str(char *s1, char *s2, size_t start, size_t end);
int substitute_word(char **pstart, char **p, size_t len, char *(func)(char *), int add_quotes);
wordlist_to_str()
=> converts a linked list of expanded words to a single string.delete_char_at()
=> removes the character at the given index from a string.is_name()
=> checks if a string represents a valid variable name (refer back to The Word Expansion Process section above).find_closing_quote()
=> when a word expansion contains an opening quote character ("
, '
, or `
), we need to find the matching closing quote character that encloses the quoted string (more on quoting below). This function returns the zero-based index of the closing quote character in the word.find_closing_brace()
=> similar to the above, except that it finds the matching closing brace. That is, if the opening brace is {
, (
, or [
, this function returns the zero-based index of the matching }
, )
, or ]
character, respectively. Finding pairs of quotes is important for processing parameter expansion, arithmetic expansion, and command substitution.substitute_str()
=> substitutes the substring of s1
, from the character at position start
to that at position end
(both positions are zero-based), with the s2
string. This is useful when the word expansion is part of a longer word, such as ${PATH}/ls
, in which case we need only expand ${PATH}
, then attach /ls
to the end of the expanded string.substitute_word()
=> a helper function that calls the other word expansion functions, which we'll define in the following sections.Additionally, we'll define some functions to help us work with strings. We'll define all of these functions in the
:strings.c
char *strchr_any(char *string, char *chars);
char *quote_val(char *val, int add_quotes);
int check_buffer_bounds(int *count, int *len, char ***buf);
void free_buffer(int len, char **buf);
strchr_any()
=> similar to , except that it searches the given string for any of the given characters.quote_val()
=> does the reverse of quote removal, that is, converts a string to a quoted string (this might seem trivial at first, but quoting might become complicated when we have to escape quotes and backslash characters, for example).check_buffer_bounds()
and free_buffer()
functions will allow our backend executor to support variable number of command arguments, instead of the hard limit we've set in Part II, which was 255.During , the shell substitutes a tilde character (followed by an optional username) with the pathname of the user's home directory. For example,
~
and ~/
are tilde-expanded to the current user's home directory, while ~john
is tilde-expanded to user John's home directory, and so on. The tilde character, in addition to all the following characters up to the first unquoted forward slash, are known as the tilde prefix (we'll deal with quoting later on in this lession).To perform tilde expansion, we'll define the
tilde_expand()
function, which has the following prototype:char *tilde_expand(char *s);
The function accepts a single argument: the tilde prefix we want to expand. If the expansion succeeds, the function returns an 'd string representing the tilde-expanded prefix. Otherwise, it returns
NULL
. Here's a quick breakdown of what the function does in order to expand a tilde prefix:~
, get the value of the $HOME
shell variable. If $HOME
is defined and is not NULL
, return its value. Otherwise, get the current User ID (UID) by calling , and pass the UID to to get the corresponding to the current user. The pw_dir
field of the password database entry contains the pathname of the home directory, which the function returns.~
), we take these letters as the name of the user whose home directory we want to get. We call , passing it the username, and return the value of the pw_dir
field.NULL
. Otherwise, we return an 'd copy of the home directory path.Take a minute to read the
tilde_expand()
function's code in our .In , the shell replaces the name of a shell variable with the variable's value (hence the other name, variable expansion). Parameter Expansion is what allows the shell to execute a command such as
echo $PATH
. In this example, the shell performs parameter expansion on the $PATH
variable, replacing it with the actual executable path (something like /bin:/sbin:/usr/bin:/usr/sbin
). What the echo
command sees is not the $PATH
word, but its expanded value (which can be NULL
, of course).To signal to the shell that we want to expand a shell variable, we precede the variable name with a
$
sign. That is, to expand the PATH
, USER
, and SHELL
variables, we need to pass the $PATH
, $USER
, and $SHELL
words to the shell, respectively (alternatively, we can pass these variable expansions to shell shell as ${PATH}
, ${USER}
, and ${SHELL}
). Shell variable names can contain any combination of letters, digits, and underscores. Names can contain capital or small letters although, by convention, capitalized names are reserved for standard shell variables, such as those .We can control how the shell performs parameter expansion by using a parameter expansion modifier, which tells the shell which part of the value we want expanded, as well as what to do if there is no shell variable with the given name. The table below summarizes the parameter expansion modifiers (the ones are marked by the POSIX word in the Description column). Most shells support other modifiers (found in the bottom half of the table), which we won't discuss here. Refer to your shell's manual page for more information on the non-POSIX modifiers.
To perform parameter (or variable) expansion, we'll define the
var_expand()
function, which has the following prototype:char *var_expand(char *orig_var_name);
The function accepts a single argument: the parameter (i.e. the variable name) we want to expand. If the expansion succeeds, the function returns an 'd string containing the expanded value. Otherwise, it returns
NULL
. Here's a quick breakdown of what the function does in order to expand a variable name to get its value:${PATH}
), remove the braces, as they are not part of the variable name itself.#
, we need to get the variable name's length (the ${#parameter}
expansion, which is seen in the 5th row in the table above).${parameter#word}
and ${parameter%word}
expansions), we'll need two helper functions: match_suffix()
and match_prefix()
. We won't discuss these functions here, but you can read their code from .${parameter:=word}
, we need to set the value of the symbol table entry to the value we've just expanded.#
, get the length of the expanded value and use it as the final result.Take a minute to read the
var_expand()
function's code in our .for i in $(ls); do echo $i; done
the shell forks a process, in which the
ls
command is run. The output of this command is the list of files in the current directory. The shell takes that output, splits it into a list of words, then feeds those words, one at a time, to the loop. In each iteration of the loop, variable $i
is assigned the name of the next file from the list. This name is passed to the
echo
command, which outputs the name on a separate line (Practically, it would be better to execute ls
directly, but this example is just to show how command substitution can be used in the shell).Command substitution can be written as
$(command)
, or `command`
. To perform command substitution, we'll define the command_substitute()
function, which has the following prototype:char *command_substitute(char *orig_cmd);
The function accepts a single argument: the command we want to execute. If the expansion succeeds, the function returns an 'd string representing the command's output. If the expansion fails, or if the command outputs nothing, the function returns
NULL
. Here's a quick breakdown of what the function does in order to expand a command substitution:$()
or the backquotes ``
. This leaves us with the command we need to execute.popen()
, and we get a pointer to a FILE
stream from which we'll read the command's output.Take a minute to read the
command_substitute()
function's code in our .Arithmetic expansion is written as
$((expression))
. To perform the expansion, we'll define the
arithm_expand()
function, which has the following prototype:char *arithm_expand(char *expr);
The
arithm_expand()
function receives a string containing the arithmetic expression, performs the necessary calculations, and returns the result in the form of an 'd string (or NULL
in case of error, e.g. an invalid arithmetic operation). The function and its associated helper functions are complex and lengthy, but here's the main highlights:x - y
is x y -
, and that of 3 + 4 × (2 − 1)
is 3 4 2 1 − × +
.Take a minute to read the
arithm_expand()
function's code in our .During , the shell takes the result(s) of parameter expansion, command substitution, and arithmetic expansion, and splits them into one or more parts, which we call fields (hence the name, field splitting). The process depends on the value of the
$IFS
shell variable. IFS is a historical term that stands for Internal (or Input) Field Separators, and it originates from the time when Unix shells didn't have a builtin array type. As a workaround, early Unix shells had to find another way of representing multi-member arrays. The shell would join the array members in a single string, separated by spaces. When the shell needs to retrieve the array members, it breaks down the string into one or more fields. The
$IFS
variable tells the shell where exactly to break that string. The shell interprets
$IFS
characters as follows (this description is ):$IFS
is a space, tab, and newline, or if the variable is not set, any sequence of space, tab, or newline characters at the beginning or end of the input shall be ignored and any sequence of those characters within the input shall delimit a field.$IFS
is null, no field splitting shall be performed.$IFS
white space shall be ignored at the beginning and end of the input. (b) Each occurrence in the input of an $IFS
character that is not $IFS
white space, along with any adjacent $IFS
white space, shall delimit a field, as described previously. (c) Non-zero-length $IFS
white space shall delimit a field.To perform the expansion, we'll define the
field_split()
function, which has the following prototype:struct word_s *field_split(char *str);
Take a minute to read the
field_split()
function's code in our .During (also known as filename globbing), the shell matches one or more file names with a given pattern. The pattern can contain normal characters (which are matched to themselves), in addition to the special characters
*
, ?
, and []
, which are also known as the globbing characters. The asterisk
*
matches any length of characters (including zero characters), the ?
matches one character, and the brackets introduce a . The result of the expansion is the list of files whose names match the pattern.To perform the expansion, we'll define the
pathnames_expand()
function, which has the following prototype:struct word_s *pathnames_expand(struct word_s *words);
*
, ?
, and []
), by calling the helper function has_glob_chars()
, which we'll define in the source file . If the word contains globbing characters, we treat it as the pattern we need to match; otherwise, we move to the next word (if any)..
and ..
(which point to the current directory and the parent directory, respectively). We delegate pattern matching to another helper function, get_filename_matches()
, which we'll define in the same source file .Take a minute to read the
pathnames_expand()
function's code in our .Double quotes as similar to single quotes, except that the shell recognizes the special meaning of the backquote, backslash, and
$
sign. That is, the shell can perform word expansion inside double-quoted strings.To perform quote removal, we'll define the
remove_quotes()
function, which has the following prototype:void remove_quotes(struct word_s *wordlist);
Take a minute to read the
remove_quotes()
function's code in our .Our main function is
, which we'll define in the source file :word_expand()
struct word_s *word_expand(char *orig_word);
~
, "
, '
, `
, =
, \
, and $
.substitute_word()
, which in turn will call the appropriate word expansion function.field_split()
.pathnames_expand()
.remove_quotes()
.Take a minute to read the
function's code in our .word_expand()
In Part II of this tutorial, we wrote our
function, which we used to get input tokens. So far, our tokenize()
function doesn't know how to handle quoted strings and escaped characters. To add this functionality, we need to update our code. Open the tokenize()
file and add the following code to the scanner.c
function, right after the tokenize()
switch
statement's opening brace: case '"':
case '\'':
case '`':
add_to_buf(nc);
i = find_closing_quote(src->buffer+src->curpos);
if(!i)
{
src->curpos = src->bufsize;
fprintf(stderr, "error: missing closing quote '%c'\n", nc);
return &eof_token;
}
while(i--)
{
add_to_buf(next_char(src));
}
break;
case '\\':
nc2 = next_char(src);
if(nc2 == '\n')
{
break;
}
add_to_buf(nc);
if(nc2 > 0)
{
add_to_buf(nc2);
}
break;
case '$':
add_to_buf(nc);
nc = peek_char(src);
if(nc == '{' || nc == '(')
{
i = find_closing_brace(src->buffer+src->curpos+1);
if(!i)
{
src->curpos = src->bufsize;
fprintf(stderr, "error: missing closing brace '%c'\n", nc);
return &eof_token;
}
while(i--)
{
add_to_buf(next_char(src));
}
}
else if(isalnum(nc) || nc == '*' || nc == '@' || nc == '#' ||
nc == '!' || nc == '?' || nc == '$')
{
add_to_buf(next_char(src));
}
break;
Open the
file, navigate to the beginning of the executor.c
function and find the following lines (I reduced the loop to do_simple_command()
...
to preserve space): int argc = 0;
long max_args = 255;
char *argv[max_args+1];
char *str;
while(child)
{
...
}
argv[argc] = NULL;
int argc = 0;
int targc = 0;
char **argv = NULL;
char *str;
while(child)
{
str = child->val.str;
struct word_s *w = word_expand(str);
if(!w)
{
child = child->next_sibling;
continue;
}
struct word_s *w2 = w;
while(w2)
{
if(check_buffer_bounds(&argc, &targc, &argv))
{
str = malloc(strlen(w2->data)+1);
if(str)
{
strcpy(str, w2->data);
argv[argc++] = str;
}
}
w2 = w2->next;
}
free_all_words(w);
child = child->next_sibling;
}
if(check_buffer_bounds(&argc, &targc, &argv))
{
argv[argc] = NULL;
}
With this code, the executor calls
on each command argument and adds the expanded word(s) to the arguments list, which we'll eventually pass on to the command. The list can grow as much as needed, thanks to our word_expand()
function, which allocates memory to the buffer as needed.check_buffer_bounds()
free_buffer(argc, argv);
in three different places: after executing a builtin utility, if
fork()
returns error status (which means we can't execute an external command), and after waitpid()
has returned. Check out the updated executor code in .make
If everything goes well,
should not output anything (apart from a few harmless warnings), and there should be an executable file named gcc
shell
in the current directory:Now invoke the shell by running
./shell
, and try using our word expansion functions and checking the results:$ echo *
Makefile build builtins executor.c executor.h initsh.c main.c node.c node.h parser.c parser.h pattern.c prompt.c scanner.c scanner.h shell shell.h shunt.c source.c source.h strings.c symtab wordexp.c
$ echo '*'
*
$ echo ~
/home/user
$ echo ~/Downloads
/home/user/Downloads
$ echo ${A=value}
value
$ echo $A
value
$ echo $((2+7))
9
In the next part, we’ll talk about positional parameters and special parameters, what they are, why we need them, and how to implement them in our shell.
Stay tuned!Previously published at