This assignment will give you a chance to warm up your C programming skills, see what the UNIX operating system offers in terms of system calls, and also let you implement one of the most important system applications: a command shell. Your command shell must be able to run programs on Linux platforms, and must support basic IO redirection and piping between commands. (If you want to take it further than that, there are several extra credit options you can complete as well.)
Before you get started, you will need to set up your code repository properly. Please read F.3 Version Control for details on how to get started. You should also read F.4 Virtual Machine for information on how to download and set up the 32-bit Linux virtual machine you can use for your software development. (Fortunately, this assignment may be developed on 32-bit or 64-bit *NIX platforms, but you should still try compiling and testing your code on 32-bit Linux so that you catch any compilation issues.)
You will complete your command shell in the src/shell directory for this assignment. Note that this is a completely stand-alone assignment; it should not make use of any Pintos source code.
Command shells are the most basic tools that facilitate user interaction with the operating system, and one of the oldest. Although modern operating systems with graphical user interfaces no longer require users to work at a command prompt, most OSes still provide a command shell for lower level tasks.
There are a variety of UNIX command shells, each with its own strengths, but virtually all of them use a very basic syntax for specifying commands. For example:
grep Allow logfile.txt
The command is "grep"
, and the two arguments are "Allow"
and "logfile.txt"
. The command shell runs the program
"grep
" (wherever it might appear on the current filesystem
path), and passes it an array of 3 arguments:
char *argv[] = { "grep", "Allow", "logfile.txt", NULL };
(The argument list is always terminated with a NULL value. In this case, argc = 3 even though there are four elements in argv.)
Note that the command is tokenized by whitespace (spaces and tabs). If we want an argument to preserve its whitespace, it must be enclosed in double quotes, like this:
grep " Allowing access" logfile.txt
Now, the argument array is as follows:
char *argv[] = { "grep", " Allowing access", "logfile.txt", NULL };
You may wonder how your shell will support commands like ls, rm and cat, but the good news is that your computer provides these commands and many more as programs in the /bin directory. Thus, your shell will automatically support all of these commands once you can fork a child process and execute a program.
However, not all commands can be implemented this way. There are two specific commands that must be supported as built-in commands:
cd or chdir - Changes the current working directory of the shell. If no argument is specified, the shell should change to the user's home directory; otherwise the argument is the directory to change to.
exit - Causes the shell to terminate.
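As a rough illustration, here is a minimal sketch of how these built-ins might be dispatched before falling back to an external program. The helper name handle_builtin() and the use of the HOME environment variable for the home directory are assumptions, not requirements of the assignment.
#include <stdio.h>      /* perror */
#include <stdlib.h>     /* getenv, exit */
#include <string.h>     /* strcmp */
#include <unistd.h>     /* chdir */
/* Hypothetical helper: returns 1 if argv[0] was handled as a built-in,
 * or 0 if the shell should fork and exec an external program instead. */
static int handle_builtin(char **argv)
{
    if (strcmp(argv[0], "cd") == 0 || strcmp(argv[0], "chdir") == 0) {
        /* With no argument, fall back to the user's home directory
         * (getpwuid()->pw_dir would work here as well). */
        const char *target = (argv[1] != NULL) ? argv[1] : getenv("HOME");
        if (target == NULL || chdir(target) != 0)
            perror("cd");
        return 1;
    }
    if (strcmp(argv[0], "exit") == 0)
        exit(0);
    return 0;
}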
Most command shells also allow you to redirect standard input from a file to a process, or redirect standard output from a process to a file. For example, we can type:
grep Allow < logfile.txt > output.txt
Now, instead of taking its standard input from the console, grep will see the contents of logfile.txt on its standard input. Similarly, when grep prints out its results, they will automatically go into the file output.txt instead of being displayed on the console.
Note that whitespace is not required around the < and > characters; for example, this is a valid (albeit ugly) command:
grep Allow<logfile.txt>output.txt
Besides being able to redirect input and output to various locations, most command shells also support piping the output of one process to the input of another process. For example, one might do the following:
grep Allow < logfile.txt | grep -v google | sort | uniq -c > out.txt
In this example, four processes are started:
The first process runs the grep program, and receives the arguments {"grep", "Allow", NULL}. Its standard input is the contents of the file logfile.txt, and its standard output is piped into the second process.
The second process also runs the grep program. The second program receives the arguments {"grep", "-v", "google", NULL}. Its standard output is piped into the third process.
The third process runs the sort utility. It receives the argument array {"sort", NULL}, and that's it. Its standard output is piped into the fourth process.
The fourth process runs the uniq utility. The uniq program receives the arguments {"uniq", "-c", NULL} and its standard output is redirected into the output file "out.txt".
Note that such commands are processed from left to right. As before, pipes do not require whitespace around them, so you can also type the above as:
grep Allow<logfile.txt|grep -v google|sort|uniq -c>out.txt
The parsing is clearly not trivial, particularly in the context of double-quoted strings. If a pipe or redirection character appears within a double-quoted string, it is treated as part of the quoted argument rather than as an operator. The shell must break the input command-string into a sequence of tokens separated by whitespace, "|" pipe characters, and the redirection characters ">" and "<"; when it encounters a double-quote, it instead consumes characters until it reaches the closing double-quote and treats them as a single token.
Once the command-string is tokenized, individual commands can be identified by searching for the "|" pipe tokens in the sequence, and then within each command the redirection characters can be processed as necessary.
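To make the tokenization rules above concrete, here is a rough sketch of one possible tokenizer. The function name, the caller-supplied token array, and the use of strndup() (so the caller must free() each token) are illustrative assumptions, and the sketch deliberately ignores edge cases such as unterminated quotes.
#include <ctype.h>    /* isspace */
#include <string.h>   /* strchr, strndup (POSIX 2008) */
/* Illustrative tokenizer: splits `line` into at most `max` heap-allocated
 * tokens (the caller free()s them) and returns the token count.
 * Whitespace separates tokens, '|', '<' and '>' stand alone as
 * one-character tokens, and a double-quoted region becomes a single
 * token with the surrounding quotes stripped. */
static int tokenize(const char *line, char *tokens[], int max)
{
    const char *p = line;
    int n = 0;
    while (*p != '\0' && n < max) {
        while (isspace((unsigned char) *p))
            p++;                                    /* skip whitespace */
        if (*p == '\0')
            break;
        if (strchr("|<>", *p) != NULL) {            /* pipe or redirection */
            tokens[n++] = strndup(p++, 1);
        } else if (*p == '"') {                     /* double-quoted token */
            const char *start = ++p;
            while (*p != '\0' && *p != '"')
                p++;
            tokens[n++] = strndup(start, (size_t) (p - start));
            if (*p == '"')
                p++;                                /* skip closing quote */
        } else {                                    /* ordinary word */
            const char *start = p;
            while (*p != '\0' && !isspace((unsigned char) *p) &&
                   strchr("|<>\"", *p) == NULL)
                p++;
            tokens[n++] = strndup(start, (size_t) (p - start));
        }
    }
    return n;
}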
To receive full credit, your submission for Project 1 must include all aspects described in this section.
Before you turn in your project, you must copy the project 1 design document template into your source tree under the name src/shell/DESIGNDOC and fill it in. We recommend that you read the design document template before you start working on the project. See section D. Project Documentation for a sample design document that goes along with a fictitious project.
(Note: You don't have to put the commit-hash into the design document that you check into the repository, since that will obviously depend on the rest of the commit! It only needs to be in the one you submit on the course website.)
You should implement your shell in the C file mysh.c (for "my shell"). Feel free to add other header or source files if you prefer, but your main() function is expected to be in this file.
It should be possible to build your shell program with the provided Makefile. If you add source files, you will need to modify the Makefile's contents.
Your shell should present a prompt that contains the username and the entire current working directory, in a form something like this:
username:current/working/directory>
(A description of various helpful UNIX functions follows in the next section; the current username and working directory are both available via function calls.)
The command the user types should be on the same line as the prompt, immediately following the prompt.
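As a rough sketch (using getpwuid(), getuid() and getcwd(), which are described in the next section), the prompt might be produced like this; the function name and the fixed-size PATH_MAX buffer are assumptions, and error handling is abbreviated.
#include <limits.h>   /* PATH_MAX */
#include <pwd.h>      /* getpwuid, struct passwd */
#include <stdio.h>    /* printf, fflush */
#include <unistd.h>   /* getcwd, getuid */
/* Print a prompt such as "username:/current/working/directory> ". */
static void print_prompt(void)
{
    struct passwd *pw = getpwuid(getuid());
    char cwd[PATH_MAX];
    if (pw != NULL && getcwd(cwd, sizeof cwd) != NULL)
        printf("%s:%s> ", pw->pw_name, cwd);
    else
        printf("> ");
    fflush(stdout);   /* no trailing newline, so flush explicitly */
}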
Your shell implementation should support all functionality described in the 2.1 Background section above, including the built-in commands, forking and executing commands, input/output redirection, and piped commands. It should be able to parse commands of the format outlined above as well, including double-quoted strings containing internal spaces, and redirection/pipe symbols <, > and |, both with and without spaces on one or both sides of the symbols.
Your shell implementation should only create the minimum number of processes necessary to execute commands. For example, given cmd, the shell should only spawn one child process. Given cmd1 | cmd2, the shell should only spawn two child processes. Additionally, you should be careful to release resources once they are no longer needed; file descriptors should be closed (except for stdin/stdout/stderr, of course), memory should be released, and zombie processes should be reclaimed.
You should assume that all commands will be entered on a single line; commands will never contain internal newline characters. Also, you can assume that commands will be < 1KiB in length.
You can also assume that piping and redirection will not be used in bizarre or meaningless ways, e.g. someprog > foo.txt | anotherprog. (In this example, standard output is redirected to foo.txt and then it is piped to the next program; this doesn't make much sense. Widely used shells like Bash will parse and execute such commands, but you don't have to.) Your shell only has to handle syntactically correct commands.
In your code, you should not use the literals 0/1/2 for the stdin/stdout/stderr file descriptors; rather, use the constants STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO.
Your shell should be resilient to all errors that can be reported by the standard UNIX functions you use. For example, a command might not be found, a fork() operation might fail, a file receiving a program's output might not be able to be created, etc. Make sure you read the documentation for all API calls you use, and gracefully handle and report any errors that occur. Note that the error codes returned from the standard API calls are reported through errno (declared in errno.h), and functions such as perror() and strerror() turn these codes into useful error messages.
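For example, a hedged sketch of reporting a failed fork() (the wrapper name is an assumption; either perror() or strerror() alone would suffice):
#include <errno.h>     /* errno */
#include <stdio.h>     /* perror, fprintf */
#include <string.h>    /* strerror */
#include <sys/types.h> /* pid_t */
#include <unistd.h>    /* fork */
/* Report a failed fork() in two equivalent ways; both print a
 * human-readable description of the error code left in errno. */
static pid_t fork_or_report(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");                                       /* "fork: ..." */
        fprintf(stderr, "fork failed: %s\n", strerror(errno));
    }
    return pid;
}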
This section points out some of the standard functions that you might find really useful for this assignment. You are not required to use all of them; some will be necessary to implement the specified functionality, but others are simply one option among several for the implementation.
The man Utility
You will need to use the UNIX file API and the UNIX process API for this assignment. However, there are too many functions for us to enumerate and describe all of them. Therefore you must become familiar with the man utility, if you aren't already. Running the command "man command" will display information about that command (called a "man page"), and specifically, "man unix_func" will display the man page for the UNIX function unix_func(). So, when you are looking at the UNIX functions needed to implement this assignment, use man to access detailed information about them.
The man program presents you with a simple page of text about the command or function you are interested in, and you can navigate the text using the standard pager keys (for example, Space moves forward a page, b moves back a page, / searches, and q quits).
One problem with man is that there are often commands and functions with the same name; the UNIX command "open" and the UNIX file API function "open()" are an example of this. To resolve situations like this, man collects keywords into groups called "sections"; when man is run, the section to use can also be specified as an argument to man. For example, all shell commands are in section "1". (You can see this when you run man; for example, when you run "man ls" you will see the text LS(1) at the top of the man page.) Standard UNIX APIs are usually in section 2, and standard C APIs are usually in section 3.
So, if you run "man open
", you will see the documentation for the
open
command from section 1. However, if you run
"man 2 open
", you will see the description of the open()
API
call, along with what header file to include when you use it, and so forth.
You can often even look at some of the libraries of functions by using the
name of the header file. For example, "man string
" (or
"man 3 string
") will show you the functions available in
string.h
, and "man stdio
" will show you the functions
available in stdio.h
.
You can use printf() and scanf() (declared in stdio.h) for your input and output, although it is probably better to use fgets() to receive the command from the user. Do not use gets(), ever!!! You should always use fgets(buf, size, stdin) instead of gets(), since it lets you specify the buffer length. Using gets() virtually guarantees that your program will contain buffer overflow exploits.
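A minimal sketch of the read loop, sized according to the 1 KiB command-length assumption above (the names are illustrative):
#include <stdio.h>    /* fgets */
#include <string.h>   /* strcspn */
#define CMDLINE_MAX 1024   /* commands are assumed to be < 1 KiB */
/* Read one command line into buf with fgets(), stripping the trailing
 * newline.  Returns 0 on end-of-file (e.g. Ctrl-D) so the caller can exit. */
static int read_command(char *buf, size_t size)
{
    if (fgets(buf, (int) size, stdin) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
A main loop could then declare char line[CMDLINE_MAX], print the prompt, and call read_command(line, sizeof line) until it returns 0.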
The C standard API includes many string manipulation functions for you to use in parsing commands. These functions are declared in the header file string.h. You can either use these functions, or you can analyze and process command strings directly. Functions you may find useful include:
strchr()
strcmp()
strcpy() (or strlcpy() for safety)
strdup() (the string it returns must eventually be free()d)
strlen()
strstr()
The unistd.h header file (together with sys/wait.h and pwd.h) declares the standard process-management functions, like forking a process and waiting for a process to terminate. Functions you will likely need include:
getpwuid() / getuid() - To get the name of the current user, you can call getpwuid(getuid()). (Note that there is another standard UNIX call getlogin(), but it seems to not work on many platforms.)
getcwd() - Retrieves the shell's current working directory.
chdir() - Changes the shell's current working directory.
fork() - Creates a child process that is a copy of the current process.
wait() - Waits for a child process to terminate (see also waitpid()).
execve() / execvp() - The execve() function loads and runs a new program in the current process. However, this function doesn't search the path for the program, so you always have to specify the absolute path to the program to be run. However, there are a number of wrappers to the execve() function. One of these is execvp(), and it examines the path to find the program to run, if the command doesn't include an absolute path. Be careful to read the man page on execvp() so that you satisfy all requirements of the argument array. (Note that once you have prepared your argument array, your call will be something like execvp(argv[0], argv).)
For input/output redirection and piping you will also need the UNIX file API, declared in fcntl.h, unistd.h and sys/stat.h:
open() - Opens a file and returns a file descriptor for it. If you don't use open() to create a file, you can specify 0 for the file-creation flags.
creat() - Creates a file (although perhaps you should just use open() instead?).
close() - Closes a file descriptor.
stat() - Retrieves information about a file.
dup() / dup2() - Duplicates a file descriptor. Of these, dup2() will be the more useful to you, since it allows you to specify the number of the new file descriptor to duplicate into. It is useful for both piping and redirection.
pipe() - Creates a pipe, filling in a two-element array with the read and write file descriptors. The typical pattern is this:
  - The shell creates the pipe with pipe() before it fork()s off the child processes. Of course, this means that the parent and the child each have their own pair of read/write file-descriptors to the same pipe object.
  - The child writing into the pipe uses dup2() to set the write-end of the pipe to be its standard output, and then closes the original write-end (to avoid leaking file descriptors).
  - The child reading from the pipe uses dup2() to set the read-end of the pipe to be its standard input, and then closes the original read-end (to avoid leaking file descriptors).
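To make that pattern concrete, here is a hedged sketch that runs the equivalent of cmd1 | cmd2, where both argument arrays are NULL-terminated. Error checks on fork() and dup2() are abbreviated and the function name is an assumption.
#include <stdio.h>     /* perror */
#include <stdlib.h>    /* exit */
#include <sys/types.h> /* pid_t */
#include <sys/wait.h>  /* waitpid */
#include <unistd.h>    /* pipe, fork, dup2, close, execvp */
/* Run "cmd1 | cmd2".  The pipe is created in the shell process before
 * either child is forked; each child wires one end onto its stdin or
 * stdout with dup2() and closes both original pipe descriptors. */
static void run_pipeline(char *cmd1[], char *cmd2[])
{
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) < 0) {
        perror("pipe");
        return;
    }
    pid_t left = fork();
    if (left == 0) {                  /* first child: writes into the pipe */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(cmd1[0], cmd1);
        perror(cmd1[0]);
        exit(1);
    }
    pid_t right = fork();
    if (right == 0) {                 /* second child: reads from the pipe */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(cmd2[0], cmd2);
        perror(cmd2[0]);
        exit(1);
    }
    close(fds[0]);                    /* the shell keeps neither end open */
    close(fds[1]);
    waitpid(left, NULL, 0);
    waitpid(right, NULL, 0);
}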
This project will best be suited by breaking the command-shell functionality into several components that can be implemented separately, then integrated together as they are completed. Anytime development proceeds in parallel, you should take the time to design the integration points up front! This can be done in a team meeting; once the interface points are well defined, team members can work on the components independently.
Additionally, if operations will be exposed via specific functions, it can be very helpful to create a stub of the function that other teammates can call. A "stub" is simply a placeholder that can be called, but that itself does nothing. This will allow other components to invoke functionality as if it were already completed, while it is still being written. Once the functionality is completed, the stub can be replaced with the actual implementation. This is a powerful technique that can greatly facilitate parallel software development, as long as time is taken to design the interface points up-front!
Here is a list of tasks to consider implementing as separate components:
Parsing the command string into a sequence of commands, each represented by some kind of command struct. (For example, if the command were "ls -l", there would be only one command struct, but if the command were "grep foo input.txt | sort | uniq" then there would be three command structs.) The command-struct would hold details like input/output/error redirection (if any), the number of tokens for the command, and the array of command-line arguments for the command. A sequence of commands might be represented as a linked list, or as an array, etc.
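For instance, a purely illustrative command struct (all names and limits here are assumptions, not part of the assignment) might look like:
#define MAX_ARGS 64
/* "grep foo input.txt | sort | uniq" would parse into three of these
 * structs, linked through `next`. */
struct command {
    char *argv[MAX_ARGS + 1];   /* NULL-terminated argument array */
    int   argc;                 /* number of tokens in argv */
    char *input_file;           /* "< file", or NULL for no redirection */
    char *output_file;          /* "> file", or NULL */
    char *error_file;           /* stderr redirection (extra credit), or NULL */
    struct command *next;       /* next command in the pipeline, or NULL */
};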
Important Note: As stated earlier, a process can only wait for its immediate children to terminate. This means that if the user types a piped sequence of commands, each of these children must be forked by the shell process itself; one child may not fork off the next child. So while I/O redirection must be performed within the child process, the pipes must be created in the shell process before forking off the children.
Don't try to get the entire shell working at once. Rather, get pieces of functionality working, commit them to your repository, then add a bit more. And as always, test along the way, so that if things suddenly break on you, you won't have too much code to debug. For example, you might first get the prompt and command parsing working, then fork and execute a single command, then add the built-in commands, then input/output redirection, and finally piping.
We have not yet created automated tests for this project. In the tests directory you will find several helpful utility programs that can be used to test your shell, as well as a "testing script" that will be followed when we test your shell. Make sure to test your work along these lines, to avoid losing points that you don't have to lose!
The test programs provided are as follows:
sleepycat - Sleeps before performing its duties as a simple version of the cat utility. (See "man cat" for details on cat.) This can be used to ensure that your command shell properly waits for commands to terminate before showing the next command prompt. It is especially useful in testing piped command sequences.
cattysleep - Like sleepycat, but it sleeps after performing its cat-like duties.
outerr - Prints "stdout" to stdout, and "stderr" to stderr, and then terminates. It is useful for testing redirection of stderr, among other things.
You can build them by typing make in the tests directory.
If you want a greater challenge, here are some extra credit tasks you can complete. (Maximum score on the assignment is 110 points.)
Support redirection of the form n>, where n is an integer file descriptor. This allows standard error to be redirected to a file by typing ... 2> errors.txt. (+1 point)
Support redirection of the form a>&b, where a and b are integer file descriptors. This causes the shell to duplicate file-descriptor b into file-descriptor a. For example, the command "someprog > output.txt 2>&1" causes the standard output from someprog to end up in output.txt, and the standard error stream (file descriptor 2) is made a duplicate of the standard output stream (file descriptor 1), so that any output to standard error will end up in the same place as the standard output stream. (+1 point)
Use the readline library to allow users of your shell to scroll through old commands using the up and down arrows, and to edit commands in place. If you do this, add a "history" built-in command that lists all old commands that have been executed in order. (+2 points)
Allow an old command to be re-run by typing !n, where n is the number of the old command in the history. (+1 point)
Set up a SIGCHLD signal handler to listen for when a child process is completed. Child process termination should be reported, but not while the command shell is waiting for another command to complete (i.e. while another command is running in the foreground). (+4 points)
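If you attempt the SIGCHLD option, a minimal sigaction()-based sketch might look like the following. The handler only sets a flag, since printf() is not async-signal-safe; the main loop can then reap and report terminated children (e.g. with waitpid(-1, &status, WNOHANG)) just before printing the next prompt. The names are assumptions.
#include <signal.h>    /* sigaction, SIGCHLD, sig_atomic_t */
#include <stddef.h>    /* NULL */
/* Set by the handler; checked (and cleared) by the main loop between
 * commands so completed children can be reaped and reported there. */
static volatile sig_atomic_t child_exited = 0;
static void sigchld_handler(int signo)
{
    (void) signo;          /* unused */
    child_exited = 1;      /* defer all reporting to the main loop */
}
static void install_sigchld_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;      /* don't interrupt fgets() and friends */
    sigaction(SIGCHLD, &sa, NULL);
}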