visit
This is part IV of a tutorial on how to build a Linux shell. You can read the previous parts of this tutorial from the following links: part I, part II, part III.
NOTE: You can download the complete source code for Part IV from .
In this part, we're going to add to our shell. The symbol table is a data structure that is used by and to store as entries in the table. Each entry consists of a key (the variable's name) and an associated value (the variable's value). Keys are usually unique, that is, we can't have two entries that share the same key (i.e., can't have two variables that share the same variable name).
Typically, a Linux shell populates its symbol table on startup. After populating the symbol table, the compiler or interpreter can easily search for a variable in the table to retrieve that variable's value. We can also perform , enforce (e.g., to make a variable visible only to the function in which it was declared), and to external commands.To populate the symbol table, the shell reads the environment variables list, which is passed to the shell from its parent process (usually the process that logged the user in, or a child process of the login process). The shell adds each variable (and its value) to the symbol table. Afterwards, we can edit, remove, or export shell variables at will, using the proper builtin utilities (which we'll talk about later on in this series).
Whenever you ask the shell to echo, export, or unset the value of a shell variable, you are effectively asking the shell to access and/or modify its symbol table. All shells have some sort of symbol table implementation, although some shells might have different names for it.
For example, say you invoked the following command:echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
You probably know that the
echo
command had nothing to do with the output you see on the screen, except for the fact that echo
printed the path out. It is the shell who actually understood that $PATH
stands for a shell variable name. It is also the shell who replaced the word
$PATH
with the actual path value, which was then passed to echo
. The echo
command simply echoed back the argument it was passed by the shell, which is the executable path you see printed on the screen.So in order to be able to define, modify, unset, and export shell variables, we first need to implement our symbol table. Let's see how we can do that next.There are different ways to implement a symbol table, the common ones being , , and . There are pros and cons to each method, and we don't have the time, nor the space, to discuss each one in detail. For our purposes, we'll use linked lists, which are the easiest to implement, and which fair well in terms of speed of access and memory usage.
(NOTE: If you want to use the shell for anything other than learning, you should consider changing the symbol table implementation to one that uses hash tables or binary trees. You can find an example of the hash table implementation ).
Now let's get cracking on that code. In your source directory, create a subdirectory named
(invoke symtab
from your terminal emulator). Navigate to that directory (mkdir symtab
cd symtab
) and create a file named symtab.h
. Add the following code to the header file you've just created:#ifndef SYMTAB_H
#define SYMTAB_H
#include "../node.h"
#define MAX_SYMTAB 256
/* the type of a symbol table entry's value */
enum symbol_type_e
{
SYM_STR ,
SYM_FUNC,
};
/* the symbol table entry structure */
struct symtab_entry_s
{
char *name;
enum symbol_type_e val_type;
char *val;
unsigned int flags;
struct symtab_entry_s *next;
struct node_s *func_body;
};
/* the symbol table structure */
struct symtab_s
{
int level;
struct symtab_entry_s *first, *last;
};
/* values for the flags field of struct symtab_entry_s */
#define FLAG_EXPORT (1 << 0) /* export entry to forked commands */
/* the symbol table stack structure */
struct symtab_stack_s
{
int symtab_count;
struct symtab_s *symtab_list[MAX_SYMTAB];
struct symtab_s *global_symtab, *local_symtab;
};
struct symtab_s *new_symtab(int level);
struct symtab_s *symtab_stack_push(void);
struct symtab_s *symtab_stack_pop(void);
int rem_from_symtab(struct symtab_entry_s *entry, struct symtab_s *symtab);
struct symtab_entry_s *add_to_symtab(char *symbol);
struct symtab_entry_s *do_lookup(char *str, struct symtab_s *symtable);
struct symtab_entry_s *get_symtab_entry(char *str);
struct symtab_s *get_local_symtab(void);
struct symtab_s *get_global_symtab(void);
struct symtab_stack_s *get_symtab_stack(void);
void init_symtab(void);
void dump_local_symtab(void);
void free_symtab(struct symtab_s *symtab);
void symtab_entry_setval(struct symtab_entry_s *entry, char *val);
#endif
The
enumeration defines the types of our symbol table entries. We'll use the type symbol_type_e
SYM_STR
to represent shell variables, and SYM_FUNC
to represent functions (we'll deal with shell functions later on in this series).The
struct symtab_entry_s
structure represents our symbol table entries. The structure contains the following fields:name
=> the name of the shell variable (or function) represented by this entry.val_type
=> SYM_STR
for shell variables, SYM_FUNC
for shell functions.val
=> string value (for shell variables only).flags
=> indicates the different properties we'll assign to variables and functions, such as the export and readonly flags (we'll deal with these later in this series).next
=> pointer to the next symbol table entry (because we're implementing the table as a singly linked list).func_body
=> for shell functions, the Abstract Syntax Tree, or AST, of the function body (we talked about AST's in part I of this tutorial).The
struct symtab_s
structure represents a single symbol table. For starters, we'll use one symbol table, in which we'll define all our shell variables. Later on, when we discuss shell functions and begin working with script files, we'll need to define more symbol tables. The zeroth symbol table will be the global table, in which we'll define our global variables (the ones that are accessible to the shell, and all functions and scripts executed by it).
Symbol tables number one and above are the local tables, in which we'll define our local variables (the ones that are only accessible to the shell function or script in which they were declared). By cascading symbol tables in this way, we are effectively implementing .
Our
struct symtab_s
structure contains the following fields:level
=> 0 for the global symbol table, 1 and above for local symbol tables.first
, last
=> pointers to the first and last entries in the table's linked list, respectively.Now to be able to cascade symbol tables as we discussed above, we need to define and implement a symbol table . A stack is a Last-In-First-Out, or LIFO, data structure, in which the last item added (or pushed) is the first item removed (or popped). The
struct symtab_stack_s
structure represents our symbol table stack. The structure contains the following fields:symtab_count
=> the number of symbol tables currently on the stack.symtab_list
=> an array of pointers to the stack's symbol tables. The zeroth item points to the global symbol table, and the symtab_count-1
item points to the last (or local) symbol table. The stack can hold up to MAX_SYMTAB
entries, which we defined at the beginning of the header file to be 256.global_symtab
, local_symtab
=> pointers to the global and local symbol tables, respectively (for ease of access).Create the
file (in the symtab.c
subdirectory) and start by adding the following code:symtab
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "../shell.h"
#include "../node.h"
#include "../parser.h"
#include "symtab.h"
struct symtab_stack_s symtab_stack;
int symtab_level;
void init_symtab(void)
{
symtab_stack.symtab_count = 1;
symtab_level = 0;
struct symtab_s *global_symtab = malloc(sizeof(struct symtab_s));
if(!global_symtab)
{
fprintf(stderr, "fatal error: no memory for global symbol table\n");
exit(EXIT_FAILURE);
}
memset(global_symtab, 0, sizeof(struct symtab_s));
symtab_stack.global_symtab = global_symtab;
symtab_stack.local_symtab = global_symtab;
symtab_stack.symtab_list[0] = global_symtab;
global_symtab->level = 0;
}
symtab_stack
=> pointer to our symbol table stack (we only need one stack per shell).symtab_level
=> our current level in the stack (0 if we're working with the global symbol table, non-zero otherwise).The
init_symtab()
function initializes our symbol table stack, then allocates memory for, and initializes, our global symbol table.Next, add the following function:struct symtab_s *new_symtab(int level)
{
struct symtab_s *symtab = malloc(sizeof(struct symtab_s));
if(!symtab)
{
fprintf(stderr, "fatal error: no memory for new symbol table\n");
exit(EXIT_FAILURE);
}
memset(symtab, 0, sizeof(struct symtab_s));
symtab->level = level;
return symtab;
}
We'll call the
new_symtab()
function whenever we want to create a new symbol table (for example, when we're about to execute a shell function).Next, add the following function:void free_symtab(struct symtab_s *symtab)
{
if(symtab == NULL)
{
return;
}
struct symtab_entry_s *entry = symtab->first;
while(entry)
{
if(entry->name)
{
free(entry->name);
}
if(entry->val)
{
free(entry->val);
}
if(entry->func_body)
{
free_node_tree(entry->func_body);
}
struct symtab_entry_s *next = entry->next;
free(entry);
entry = next;
}
free(symtab);
}
We'll call the
free_symtab()
function when we're done working with a symbol table, and we want to free the memory used by the symbol table and its entries.Next, we'll define a debugging function:void dump_local_symtab(void)
{
struct symtab_s *symtab = symtab_stack.local_symtab;
int i = 0;
int indent = symtab->level * 4;
fprintf(stderr, "%*sSymbol table [Level %d]:\r\n", indent, " ", symtab->level);
fprintf(stderr, "%*s===========================\r\n", indent, " ");
fprintf(stderr, "%*s No Symbol Val\r\n", indent, " ");
fprintf(stderr, "%*s------ -------------------------------- ------------\r\n", indent, " ");
struct symtab_entry_s *entry = symtab->first;
while(entry)
{
fprintf(stderr, "%*s[%04d] %-32s '%s'\r\n", indent, " ",
i++, entry->name, entry->val);
entry = entry->next;
}
fprintf(stderr, "%*s------ -------------------------------- ------------\r\n", indent, " ");
}
This function prints the contents of the local symbol table. When our shell starts, the local and global symbol tables will refer to the same table. It is only when the shell is about to run a shell function or script file that we have a local table that is different from the global table. (later on in this lesson, we'll write a builtin utility that will call
dump_local_symtab()
to help us visualize the contents of our shell's global symbol table).Now let's define some functions to help us work with symbol table entries. In the same file (
symtab.c
), add the following function:struct symtab_entry_s *add_to_symtab(char *symbol)
{
if(!symbol || symbol[0] == '\0')
{
return NULL;
}
struct symtab_s *st = symtab_stack.local_symtab;
struct symtab_entry_s *entry = NULL;
if((entry = do_lookup(symbol, st)))
{
return entry;
}
entry = malloc(sizeof(struct symtab_entry_s));
if(!entry)
{
fprintf(stderr, "fatal error: no memory for new symbol table entry\n");
exit(EXIT_FAILURE);
}
memset(entry, 0, sizeof(struct symtab_entry_s));
entry->name = malloc(strlen(symbol)+1);
if(!entry->name)
{
fprintf(stderr, "fatal error: no memory for new symbol table entry\n");
exit(EXIT_FAILURE);
}
strcpy(entry->name, symbol);
if(!st->first)
{
st->first = entry;
st->last = entry;
}
else
{
st->last->next = entry;
st->last = entry;
}
return entry;
}
This function adds a new entry to the local symbol table. Remember, at the beginning of this lesson, I've said that each entry must have a unique key, which is the name we give to the shell variable or function. To ensure this uniqueness, we first check to see if an entry exists with the given name, by calling
do_lookup()
(which we'll define in a minute). If an entry with the given name exists, we simply return the existing entry, without adding a new one. Otherwise, we add the entry, set its name, and adjust the symbol table's pointers. Lastly, we return the newly added entry.The next function does the opposite work, i.e. it removes the symbol table entry whose key matches the given name:int rem_from_symtab(struct symtab_entry_s *entry, struct symtab_s *symtab)
{
int res = 0;
if(entry->val)
{
free(entry->val);
}
if(entry->func_body)
{
free_node_tree(entry->func_body);
}
free(entry->name);
if(symtab->first == entry)
{
symtab->first = symtab->first->next;
if(symtab->last == entry)
{
symtab->last = NULL;
}
res = 1;
}
else
{
struct symtab_entry_s *e = symtab->first;
struct symtab_entry_s *p = NULL;
while(e && e != entry)
{
p = e;
e = e->next;
}
if(e == entry)
{
p->next = entry->next;
res = 1;
}
}
free(entry);
return res;
}
struct symtab_entry_s *do_lookup(char *str, struct symtab_s *symtable)
{
if(!str || !symtable)
{
return NULL;
}
struct symtab_entry_s *entry = symtable->first;
while(entry)
{
if(strcmp(entry->name, str) == 0)
{
return entry;
}
entry = entry->next;
}
return NULL;
}
Otherwise, the function follows the linked list pointers to look at each entry, in turn, until we find an entry whose key matches our desired name. If no match is found, we return
NULL
.Next, add the following function:struct symtab_entry_s *get_symtab_entry(char *str)
{
int i = symtab_stack.symtab_count-1;
do
{
struct symtab_s *symtab = symtab_stack.symtab_list[i];
struct symtab_entry_s *entry = do_lookup(str, symtab);
if(entry)
{
return entry;
}
} while(--i >= 0);
return NULL;
}
This function searches for a symbol table entry whose key matches the given name. This might seem redundant at first, as we've already defined the
do_lookup()
function to search the local symbol table. The difference here is that get_symtab_entry()
searches the whole stack, starting with the local symbol table. At the moment, this distinction is not important, as our local and global symbol tables refer to the one and same table. It is only when we talk about shell functions and script files that you'll appreciate what this function does (so hang tight in there!).Lastly, add the following function:void symtab_entry_setval(struct symtab_entry_s *entry, char *val)
{
if(entry->val)
{
free(entry->val);
}
if(!val)
{
entry->val = NULL;
}
else
{
char *val2 = malloc(strlen(val)+1);
if(val2)
{
strcpy(val2, val);
}
else
{
fprintf(stderr, "error: no memory for symbol table entry's value\n");
}
entry->val = val2;
}
}
Add the following code to the same source file,
:symtab.c
void symtab_stack_add(struct symtab_s *symtab)
{
symtab_stack.symtab_list[symtab_stack.symtab_count++] = symtab;
symtab_stack.local_symtab = symtab;
}
struct symtab_s *symtab_stack_push(void)
{
struct symtab_s *st = new_symtab(++symtab_level);
symtab_stack_add(st);
return st;
}
struct symtab_s *symtab_stack_pop(void)
{
if(symtab_stack.symtab_count == 0)
{
return NULL;
}
struct symtab_s *st = symtab_stack.symtab_list[symtab_stack.symtab_count-1];
symtab_stack.symtab_list[--symtab_stack.symtab_count] = NULL;
symtab_level--;
if(symtab_stack.symtab_count == 0)
{
symtab_stack.local_symtab = NULL;
symtab_stack.global_symtab = NULL;
}
else
{
symtab_stack.local_symtab = symtab_stack.symtab_list[symtab_stack.symtab_count-1];
}
return st;
}
struct symtab_s *get_local_symtab(void)
{
return symtab_stack.local_symtab;
}
struct symtab_s *get_global_symtab(void)
{
return symtab_stack.global_symtab;
}
struct symtab_stack_s *get_symtab_stack(void)
{
return &symtab_stack;
}
symtab_stack_add()
adds the given symbol table to the stack, and assigns the newly added table as the local symbol table.symtab_stack_push()
creates a new, empty symbol table and pushes it on top of the stack.symtab_stack_pop()
removes (or pops) the symbol table on top of the stack, adjusting the stack pointers as needed.get_local_symtab()
and get_global_symtab()
return pointers to the local and global symbol tables, respectively.get_symtab_stack()
returns a pointer to the symbol table stack.Go ahead and create a source file named
initsh.c
in your source directory, and add the following code:#include <string.h>
#include "shell.h"
#include "symtab/symtab.h"
extern char **environ;
void initsh(void)
{
init_symtab();
struct symtab_entry_s *entry;
char **p2 = environ;
while(*p2)
{
char *eq = strchr(*p2, '=');
if(eq)
{
int len = eq-(*p2);
char name[len+1];
strncpy(name, *p2, len);
name[len] = '\0';
entry = add_to_symtab(name);
if(entry)
{
symtab_entry_setval(entry, eq+1);
entry->flags |= FLAG_EXPORT;
}
}
else
{
entry = add_to_symtab(*p2);
}
p2++;
}
entry = add_to_symtab("PS1");
symtab_entry_setval(entry, "$ ");
entry = add_to_symtab("PS2");
symtab_entry_setval(entry, "> ");
}
This function initializes the symbol table stack (including the global symbol table) and scans the environment list, adding each environment variable (and its value) to the table. Lastly, the function adds two variables that we'll use to store our prompt strings, PS1 and PS2 (we talked about prompt strings in part I). Don't forget to add the function prototype to your
shell.h
header file:void initsh(void);
Next, we need to call this function from our
function, before we enter the REPL loop. To do this, add the following line to main()
, right before the loop body:main()
initsh();
The last thing we need to do is to update our prompt string printing functions, so that they'll use the PS1 and PS2 variables we've just added to the global symbol table in
initsh()
. We'll also write our first builtin utility, dump
.Let's apply these changes to our code next.Open your
file. Remove the line in the body of the prompt.c
print_prompt1()
and replace it by the following code:void print_prompt1(void)
{
struct symtab_entry_s *entry = get_symtab_entry("PS1");
if(entry && entry->val)
{
fprintf(stderr, "%s", entry->val);
}
else
{
fprintf(stderr, "$ ");
}
}
The new code checks if there is a symbol table entry with the name PS1. If there is, we use that entry's value to print the first prompt string. Otherwise, we use our default builtin value, which is
$
.The code changes to the
print_prompt2()
function are similar, so I won't show it here, but you can check it at the GitHub repo link I provided at the top of this page.Don't forget to add the following include directive at the top of the file, right after the
#include "shell.h"
line:#include "symtab/symtab.h"
Many of the commands we use on daily basis, such as
cd
, echo
, export
, and readonly
are in fact builtin utilities. You can read more about shell builtin utilities in this POSIX standard .In the course of this tutorial, we'll be adding different builtin utilities to expand our shell. We'll start by defining a structure that will help us store information about our different builtin utilities.Open your
shell.h
header file and add the following code at the end, right before the #endif
directive:/* shell builtin utilities */
int dump(int argc, char **argv);
/* struct for builtin utilities */
struct builtin_s
{
char *name; /* utility name */
int (*func)(int argc, char **argv); /* function to call to execute the utility */
};
/* the list of builtin utilities */
extern struct builtin_s builtins[];
/* and their count */
extern int builtins_count;
The function prototype declares our first builtin utility,
dump
. The struct builtin_s
structure defines our builtin utilities, and has the following fields:name
=> the builtin utility name, which we'll use to invoke the utility.func
=> function pointer to the function that implements the builtin utility in our shell.We'll use the
array to store information about our builtin utilities. The array contains builtins[]
number of elements.builtins_count
Now create a new subdirectory and name it
. This is where we'll define all of our builtin utilities. Next, create the file builtins
and add the following code to it:builtins.c
#include "../shell.h"
struct builtin_s builtins[] =
{
{ "dump" , dump },
};
int builtins_count = sizeof(builtins)/sizeof(struct builtin_s);
Next, we'll add a builtin utility, named
, which we'll use to dump or print the contents of the symbol table, so we know what's going on behind the scenes (I mainly wrote this utility so that our discussion doesn't sound too abstract and theoretical).dump
In the
builtins
subdirectory, create the file dump.c
and add the following code to it:#include "../shell.h"
#include "../symtab/symtab.h"
int dump(int argc, char **argv)
{
dump_local_symtab();
return 0;
}
Simple, right? This function implements our
builtin utility, which prints the contents of the local symbol table.dump
Next, we need to tell our executor about our new builtin utility, so that when we enter the command
, the shell executes our dump
function, instead of searching for an external command with the name dump. dump()
To do this, we'll need to fix our
do_simple_command()
function.In the source directory, open the
source file and navigate to the executor.c
do_simple_command()
function's definition. Now find the two lines:argv[argc] = NULL;
pid_t child_pid = 0;
int i = 0;
for( ; i < builtins_count; i++)
{
if(strcmp(argv[0], builtins[i].name) == 0)
{
builtins[i].func(argc, argv);
free_argv(argc, argv);
return 1;
}
}
gcc -o shell executor.c initsh.c main.c node.c parser.c prompt.c scanner.c source.c builtins/builtins.c builtins/dump.c symtab/symtab.c
If everything goes well,
gcc
should not output anything, and there should be an executable file named shell
in the current directory:(If you downloaded the source files from the , you can simply run
from the source directory, and it will take care of compiling the shell).make
Now invoke the shell by running
./shell
, and try our new builtin utility, dump
:As you can see from the output, our global symbol table (level 0) contains many variables (67 in the above example), which correspond to the shell’s environment variables that were passed to us from our parent process. The last two entries represent our PS1 and PS2 variables. Try entering some other commands and see how our shell fairs in parsing and executing simple commands.
It might not be obvious right now, but what we’ve done in this part is a huge accomplishment. Our symbol table will enable us to do important things in our shell, such as performing word expansions, exporting variables to external commands, defining shell functions, and so on.
In the next part, we’ll talk about the word expansion process, and how we can add word expansion capabilities to our shell.Stay tuned!Previously published at