This assignment will give you a chance to warm up your C programming skills, see what the UNIX operating system offers in terms of system calls, and also let you implement one of the most important system applications: a command shell. Your command shell must be able to run programs on Linux platforms, and must support basic IO redirection and piping between commands. (If you want to take it further than that, there are several extra credit options you can complete as well.)
Before you get started, you will need to set up your code repository properly. Please read F.3 Version Control for details on how to get started. You should also read F.4 Virtual Machine for information on how to download and set up the 32-bit Linux virtual machine you can use for your software development. (Fortunately, this assignment may be developed on 32-bit or 64-bit *NIX platforms, but you should still try compiling and testing your code on 32-bit Linux so that you catch any compilation issues.)
You will complete your command shell in the src/shell directory for this assignment. Note that this is a completely stand-alone assignment; it should not make use of any Pintos source code.
Command shells are the most basic tools that facilitate user interaction with the operating system, and one of the oldest. Although modern operating systems with graphical user interfaces no longer require users to work at a command prompt, most OSes still provide a command shell for lower level tasks.
There are a variety of UNIX command shells, each with its own strengths, but virtually all of them use a very basic syntax for specifying commands. For example:
grep Allow logfile.txt
The command is "grep"
, and the two arguments are "Allow"
and "logfile.txt"
. The command shell runs the program
"grep
" (wherever it might appear on the current filesystem
path), and passes it an array of 3 arguments:
char *argv[] = { "grep", "Allow", "logfile.txt", NULL };
(The argument list is always terminated with a NULL value. In this case, argc = 3 even though there are four elements in argv.)
Note that the command is tokenized by whitespace (spaces and tabs). If we want an argument to preserve its whitespace, it must be enclosed in double quotes, like this:
grep " Allowing access" logfile.txt
Now, the argument array is as follows:
char *argv[] = { "grep", " Allowing access", "logfile.txt", NULL };
You may wonder how your shell will support commands like ls, rm and cat, but the good news is that your computer provides these commands and many more as programs in the /bin directory. Thus, your shell will automatically support all of these commands once you can fork a child process and execute a program.
However, not all commands can be implemented this way. There are two specific commands that must be supported as built-in commands:
cd or chdir - Changes the current working directory of the shell. If no argument is specified, the shell should change to the user's home directory; otherwise the argument is the directory to change to.
exit - Causes the shell to terminate.
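As a rough illustration, here is a minimal sketch of how these built-ins might be dispatched before falling back to an external program. The helper name handle_builtin() and the use of the HOME environment variable for the home directory are assumptions, not requirements of the assignment.
#include <stdio.h>      /* perror */
#include <stdlib.h>     /* getenv, exit */
#include <string.h>     /* strcmp */
#include <unistd.h>     /* chdir */
/* Hypothetical helper: returns 1 if argv[0] was handled as a built-in,
 * or 0 if the shell should fork and exec an external program instead. */
static int handle_builtin(char **argv)
{
    if (strcmp(argv[0], "cd") == 0 || strcmp(argv[0], "chdir") == 0) {
        /* With no argument, fall back to the user's home directory
         * (getpwuid()->pw_dir would work here as well). */
        const char *target = (argv[1] != NULL) ? argv[1] : getenv("HOME");
        if (target == NULL || chdir(target) != 0)
            perror("cd");
        return 1;
    }
    if (strcmp(argv[0], "exit") == 0)
        exit(0);
    return 0;
}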
Most command shells also allow you to redirect standard input from a file to a process, or redirect standard output from a process to a file. For example, we can type:
grep Allow < logfile.txt > output.txt
Now, instead of taking its standard input from the console, grep will see the contents of logfile.txt on its standard input. Similarly, when grep prints out its results, they will automatically go into the file output.txt instead of being displayed on the console.
Note that whitespace is not required around the < and > characters; for example, this is a valid (albeit ugly) command:
grep Allow<logfile.txt>output.txt
Besides being able to redirect input and output to various locations, most command shells also support piping the output of one process to the input of another process. For example, one might do the following:
grep Allow < logfile.txt | grep -v google | sort | uniq -c > out.txt
In this example, four processes are started:
The first process runs the grep program, and receives the arguments {"grep", "Allow", NULL}. Its standard input is the contents of the file logfile.txt, and its standard output is piped into the second process.
The second process also runs the grep program. The second program receives the arguments {"grep", "-v", "google", NULL}. Its standard output is piped into the third process.
The third process runs the sort utility. It receives the argument array {"sort", NULL}, and that's it. Its standard output is piped into the fourth process.
The fourth process runs the uniq utility. The uniq program receives the arguments {"uniq", "-c", NULL} and its standard output is redirected into the output file "out.txt".
Note that such commands are processed from left to right. As before, pipes do not require whitespace around them, so you can also type the above as:
grep Allow<logfile.txt|grep -v google|sort|uniq -c>out.txt
The parsing is clearly not trivial, particularly in the context of double-quoted strings. If a pipe or redirection character appears within a double-quoted string, it is treated as part of the quoted argument rather than as an operator. The shell must break the input command-string into a sequence of tokens separated by whitespace, "|" pipe characters, and the redirection characters ">" and "<"; when it encounters a double-quote, it instead consumes characters until it reaches the closing double-quote and treats them as a single token.
Once the command-string is tokenized, individual commands can be identified by searching for the "|" pipe tokens in the sequence, and then within each command the redirection characters can be processed as necessary.
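To make the tokenization rules above concrete, here is a rough sketch of one possible tokenizer. The function name, the caller-supplied token array, and the use of strndup() (so the caller must free() each token) are illustrative assumptions, and the sketch deliberately ignores edge cases such as unterminated quotes.
#include <ctype.h>    /* isspace */
#include <string.h>   /* strchr, strndup (POSIX 2008) */
/* Illustrative tokenizer: splits `line` into at most `max` heap-allocated
 * tokens (the caller free()s them) and returns the token count.
 * Whitespace separates tokens, '|', '<' and '>' stand alone as
 * one-character tokens, and a double-quoted region becomes a single
 * token with the surrounding quotes stripped. */
static int tokenize(const char *line, char *tokens[], int max)
{
    const char *p = line;
    int n = 0;
    while (*p != '\0' && n < max) {
        while (isspace((unsigned char) *p))
            p++;                                    /* skip whitespace */
        if (*p == '\0')
            break;
        if (strchr("|<>", *p) != NULL) {            /* pipe or redirection */
            tokens[n++] = strndup(p++, 1);
        } else if (*p == '"') {                     /* double-quoted token */
            const char *start = ++p;
            while (*p != '\0' && *p != '"')
                p++;
            tokens[n++] = strndup(start, (size_t) (p - start));
            if (*p == '"')
                p++;                                /* skip closing quote */
        } else {                                    /* ordinary word */
            const char *start = p;
            while (*p != '\0' && !isspace((unsigned char) *p) &&
                   strchr("|<>\"", *p) == NULL)
                p++;
            tokens[n++] = strndup(start, (size_t) (p - start));
        }
    }
    return n;
}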
To receive full credit, your submission for Project 1 must include all aspects described in this section.
Before you turn in your project, you must copy the project 1 design document template into your source tree under the name src/shell/DESIGNDOC and fill it in. We recommend that you read the design document template before you start working on the project. See section D. Project Documentation for a sample design document that goes along with a fictitious project.
(Note: You don't have to put the commit-hash into the design document that you check into the repository, since that will obviously depend on the rest of the commit! It only needs to be in the one you submit on the course website.)
You should implement your shell in the C file mysh.c (for "my shell"). Feel free to add other header or source files if you prefer, but your main() function is expected to be in this file.
It should be possible to build your shell program with the provided Makefile. If you add source files, you will need to modify the Makefile's contents.
Your shell should present a prompt that contains the username and the entire current working directory, in a form something like this:
username:current/working/directory>
(A description of various helpful UNIX functions follows in the next section; the current username and working directory are both available via function calls.)
The command the user types should be on the same line as the prompt, immediately following the prompt.
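As a rough sketch (using getpwuid(), getuid() and getcwd(), which are described in the next section), the prompt might be produced like this; the function name and the fixed-size PATH_MAX buffer are assumptions, and error handling is abbreviated.
#include <limits.h>   /* PATH_MAX */
#include <pwd.h>      /* getpwuid, struct passwd */
#include <stdio.h>    /* printf, fflush */
#include <unistd.h>   /* getcwd, getuid */
/* Print a prompt such as "username:/current/working/directory> ". */
static void print_prompt(void)
{
    struct passwd *pw = getpwuid(getuid());
    char cwd[PATH_MAX];
    if (pw != NULL && getcwd(cwd, sizeof cwd) != NULL)
        printf("%s:%s> ", pw->pw_name, cwd);
    else
        printf("> ");
    fflush(stdout);   /* no trailing newline, so flush explicitly */
}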
Your shell implementation should support all functionality described in the 2.1 Background section above, including the built-in commands, forking and executing commands, input/output redirection, and piped commands. It should be able to parse commands of the format outlined above as well, including double-quoted strings containing internal spaces, and redirection/pipe symbols <, > and |, both with and without spaces on one or both sides of the symbols.
Your shell implementation should only create the minimum number of processes necessary to execute commands. For example, given cmd, the shell should only spawn one child process. Given cmd1 | cmd2, the shell should only spawn two child processes. Additionally, you should be careful to release resources once they are no longer needed; file descriptors should be closed (except for stdin/stdout/stderr, of course), memory should be released, and zombie processes should be reclaimed.
You should assume that all commands will be entered on a single line; commands will never contain internal newline characters. Also, you can assume that commands will be < 1KiB in length.
You can also assume that piping and redirection will not be used in bizarre or meaningless ways, e.g. someprog > foo.txt | anotherprog. (In this example, standard output is redirected to foo.txt and then it is piped to the next program; this doesn't make much sense. Widely used shells like Bash will parse and execute such commands, but you don't have to.) Your shell only has to handle syntactically correct commands.
In your code, you should not use the literals 0/1/2 for the stdin/stdout/stderr file descriptors; rather, use the constants STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO.
Your shell should be resilient to all errors that can be reported by the standard UNIX functions you use. For example, a command might not be found, a fork() operation might fail, a file receiving a program's output might not be able to be created, etc. Make sure you read the documentation for all API calls you use, and gracefully handle and report any errors that occur. Note that the error codes returned from the standard API calls are reported through errno (declared in errno.h), and functions such as perror() and strerror() turn these codes into useful error messages.
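For example, a hedged sketch of reporting a failed fork() (the wrapper name is an assumption; either perror() or strerror() alone would suffice):
#include <errno.h>     /* errno */
#include <stdio.h>     /* perror, fprintf */
#include <string.h>    /* strerror */
#include <sys/types.h> /* pid_t */
#include <unistd.h>    /* fork */
/* Report a failed fork() in two equivalent ways; both print a
 * human-readable description of the error code left in errno. */
static pid_t fork_or_report(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");                                       /* "fork: ..." */
        fprintf(stderr, "fork failed: %s\n", strerror(errno));
    }
    return pid;
}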
This section points out some of the standard functions that you might find really useful for this assignment. You are not required to use all of them; some will be necessary to implement the specified functionality, but others are simply one option among several for the implementation.
The man Utility
You will need to use the UNIX file API and the UNIX process API for this assignment. However, there are too many functions for us to enumerate and describe all of them. Therefore you must become familiar with the man utility, if you aren't already. Running the command "man command" will display information about that command (called a "man page"), and specifically, "man unix_func" will display the man page for the UNIX function unix_func(). So, when you are looking at the UNIX functions needed to implement this assignment, use man to access detailed information about them.
The man program presents you with a simple page of text about the command or function you are interested in, and you can navigate the text using the standard pager keys (for example, Space moves forward a page, b moves back a page, / searches, and q quits).
One problem with man is that there are often commands and functions with the same name; the UNIX command "open" and the UNIX file API function "open()" are an example of this. To resolve situations like this, man collects keywords into groups called "sections"; when man is run, the section to use can also be specified as an argument to man. For example, all shell commands are in section "1". (You can see this when you run man; for example, when you run "man ls" you will see the text LS(1) at the top of the man page.) Standard UNIX APIs are usually in section 2, and standard C APIs are usually in section 3.
So, if you run "man open
", you will see the documentation for the
open
command from section 1. However, if you run
"man 2 open
", you will see the description of the open()
API
call, along with what header file to include when you use it, and so forth.
You can often even look at some of the libraries of functions by using the
name of the header file. For example, "man string
" (or
"man 3 string
") will show you the functions available in
string.h
, and "man stdio
" will show you the functions
available in stdio.h
.
You can use printf() and scanf() (declared in stdio.h) for your input and output, although it is probably better to use fgets() to receive the command from the user. Do not use gets(), ever!!! You should always use fgets(buf, size, stdin) instead of gets(), since it lets you specify the buffer length. Using gets() virtually guarantees that your program will contain buffer overflow exploits.
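A minimal sketch of the read loop, sized according to the 1 KiB command-length assumption above (the names are illustrative):
#include <stdio.h>    /* fgets */
#include <string.h>   /* strcspn */
#define CMDLINE_MAX 1024   /* commands are assumed to be < 1 KiB */
/* Read one command line into buf with fgets(), stripping the trailing
 * newline.  Returns 0 on end-of-file (e.g. Ctrl-D) so the caller can exit. */
static int read_command(char *buf, size_t size)
{
    if (fgets(buf, (int) size, stdin) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
A main loop could then declare char line[CMDLINE_MAX], print the prompt, and call read_command(line, sizeof line) until it returns 0.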
The C standard API includes many string manipulation functions for you to use in parsing commands. These functions are declared in the header file string.h. You can either use these functions, or you can analyze and process command strings directly. Functions you may find useful include:
strchr()
strcmp()
strcpy() (or strlcpy() for safety)
strdup() (the string it returns must eventually be free()d)
strlen()
strstr()
The unistd.h header file (together with sys/wait.h and pwd.h) declares the standard process-management functions, like forking a process and waiting for a process to terminate. Functions you will likely need include:
getpwuid() / getuid() - To get the name of the current user, you can call getpwuid(getuid()). (Note that there is another standard UNIX call getlogin(), but it seems to not work on many platforms.)
getcwd() - Retrieves the shell's current working directory.
chdir() - Changes the shell's current working directory.
fork() - Creates a child process that is a copy of the current process.
wait() - Waits for a child process to terminate (see also waitpid()).
execve() / execvp() - The execve() function loads and runs a new program in the current process. However, this function doesn't search the path for the program, so you always have to specify the absolute path to the program to be run. However, there are a number of wrappers to the execve() function. One of these is execvp(), and it examines the path to find the program to run, if the command doesn't include an absolute path. Be careful to read the man page on execvp() so that you satisfy all requirements of the argument array. (Note that once you have prepared your argument array, your call will be something like execvp(argv[0], argv).)
For input/output redirection and piping you will also need the UNIX file API, declared in fcntl.h, unistd.h and sys/stat.h:
open() - Opens a file and returns a file descriptor for it. If you don't use open() to create a file, you can specify 0 for the file-creation flags.
creat() - Creates a file (although perhaps you should just use open() instead?).
close() - Closes a file descriptor.
stat() - Retrieves information about a file.
dup() / dup2() - Duplicates a file descriptor. Of these, dup2() will be the more useful to you, since it allows you to specify the number of the new file descriptor to duplicate into. It is useful for both piping and redirection.
pipe() - Creates a pipe, filling in a two-element array with the read and write file descriptors. The typical pattern is this:
  - The shell creates the pipe with pipe() before it fork()s off the child processes. Of course, this means that the parent and the child each have their own pair of read/write file-descriptors to the same pipe object.
  - The child writing into the pipe uses dup2() to set the write-end of the pipe to be its standard output, and then closes the original write-end (to avoid leaking file descriptors).
  - The child reading from the pipe uses dup2() to set the read-end of the pipe to be its standard input, and then closes the original read-end (to avoid leaking file descriptors).
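To make that pattern concrete, here is a hedged sketch that runs the equivalent of cmd1 | cmd2, where both argument arrays are NULL-terminated. Error checks on fork() and dup2() are abbreviated and the function name is an assumption.
#include <stdio.h>     /* perror */
#include <stdlib.h>    /* exit */
#include <sys/types.h> /* pid_t */
#include <sys/wait.h>  /* waitpid */
#include <unistd.h>    /* pipe, fork, dup2, close, execvp */
/* Run "cmd1 | cmd2".  The pipe is created in the shell process before
 * either child is forked; each child wires one end onto its stdin or
 * stdout with dup2() and closes both original pipe descriptors. */
static void run_pipeline(char *cmd1[], char *cmd2[])
{
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) < 0) {
        perror("pipe");
        return;
    }
    pid_t left = fork();
    if (left == 0) {                  /* first child: writes into the pipe */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(cmd1[0], cmd1);
        perror(cmd1[0]);
        exit(1);
    }
    pid_t right = fork();
    if (right == 0) {                 /* second child: reads from the pipe */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(cmd2[0], cmd2);
        perror(cmd2[0]);
        exit(1);
    }
    close(fds[0]);                    /* the shell keeps neither end open */
    close(fds[1]);
    waitpid(left, NULL, 0);
    waitpid(right, NULL, 0);
}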
This project will best be suited by breaking the command-shell functionality into several components that can be implemented separately, then integrated together as they are completed. Anytime development proceeds in parallel, you should take the time to design the integration points up front! This can be done in a team meeting; once the interface points are well defined, team members can work on the components independently.
Additionally, if operations will be exposed via specific functions, it can be very helpful to create a stub of the function that other teammates can call. A "stub" is simply a placeholder that can be called, but that itself does nothing. This will allow other components to invoke functionality as if it were already completed, while it is still being written. Once the functionality is completed, the stub can be replaced with the actual implementation. This is a powerful technique that can greatly facilitate parallel software development, as long as time is taken to design the interface points up-front!
Here is a list of tasks to consider implementing as separate components:
Parsing the command string into a sequence of commands, each represented by some kind of command struct. (For example, if the command were "ls -l", there would be only one command struct, but if the command were "grep foo input.txt | sort | uniq" then there would be three command structs.) The command-struct would hold details like input/output/error redirection (if any), the number of tokens for the command, and the array of command-line arguments for the command. A sequence of commands might be represented as a linked list, or as an array, etc.
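For instance, a purely illustrative command struct (all names and limits here are assumptions, not part of the assignment) might look like:
#define MAX_ARGS 64
/* "grep foo input.txt | sort | uniq" would parse into three of these
 * structs, linked through `next`. */
struct command {
    char *argv[MAX_ARGS + 1];   /* NULL-terminated argument array */
    int   argc;                 /* number of tokens in argv */
    char *input_file;           /* "< file", or NULL for no redirection */
    char *output_file;          /* "> file", or NULL */
    char *error_file;           /* stderr redirection (extra credit), or NULL */
    struct command *next;       /* next command in the pipeline, or NULL */
};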
Important Note: As stated earlier, a process can only wait for its immediate children to terminate. This means that if the user types a piped sequence of commands, each of these children must be forked by the shell process itself; one child may not fork off the next child. So while I/O redirection must be performed within the child process, the pipes must be created in the shell process before forking off the children.
Don't try to get the entire shell working at once. Rather, get pieces of functionality working, commit them to your repository, then add a bit more. And as always, test along the way, so that if things suddenly break on you, you won't have too much code to debug. For example, you might first get the prompt and command parsing working, then fork and execute a single command, then add the built-in commands, then input/output redirection, and finally piping.
We have not yet created automated tests for this project. In the tests directory you will find several helpful utility programs that can be used to test your shell, as well as a "testing script" that will be followed when we test your shell. Make sure to test your work along these lines, to avoid losing points that you don't have to lose!
The test programs provided are as follows:
sleepycat - Sleeps before performing its duties as a simple version of the cat utility. (See "man cat" for details on cat.) This can be used to ensure that your command shell properly waits for commands to terminate before showing the next command prompt. It is especially useful in testing piped command sequences.
cattysleep - Like sleepycat, but it sleeps after performing its cat-like duties.
outerr - Prints "stdout" to stdout, and "stderr" to stderr, and then terminates. It is useful for testing redirection of stderr, among other things.
You can build them by typing make in the tests directory.
If you want a greater challenge, here are some extra credit tasks you can complete. (Maximum score on the assignment is 110 points.)
Support redirection of the form n>, where n is an integer file descriptor. This allows standard error to be redirected to a file by typing ... 2> errors.txt. (+1 point)
Support redirection of the form a>&b, where a and b are integer file descriptors. This causes the shell to duplicate file-descriptor b into file-descriptor a. For example, the command "someprog > output.txt 2>&1" causes the standard output from someprog to end up in output.txt, and the standard error stream (file descriptor 2) is made a duplicate of the standard output stream (file descriptor 1), so that any output to standard error will end up in the same place as the standard output stream. (+1 point)
Use the readline library to allow users of your shell to scroll through old commands using the up and down arrows, and to edit commands in place. If you do this, add a "history" built-in command that lists all old commands that have been executed in order. (+2 points)
Allow an old command to be re-run by typing !n, where n is the number of the old command in the history. (+1 point)
Set up a SIGCHLD signal handler to listen for when a child process is completed. Child process termination should be reported, but not while the command shell is waiting for another command to complete (i.e. while another command is running in the foreground). (+4 points)
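If you attempt the SIGCHLD option, a minimal sigaction()-based sketch might look like the following. The handler only sets a flag, since printf() is not async-signal-safe; the main loop can then reap and report terminated children (e.g. with waitpid(-1, &status, WNOHANG)) just before printing the next prompt. The names are assumptions.
#include <signal.h>    /* sigaction, SIGCHLD, sig_atomic_t */
#include <stddef.h>    /* NULL */
/* Set by the handler; checked (and cleared) by the main loop between
 * commands so completed children can be reaped and reported there. */
static volatile sig_atomic_t child_exited = 0;
static void sigchld_handler(int signo)
{
    (void) signo;          /* unused */
    child_exited = 1;      /* defer all reporting to the main loop */
}
static void install_sigchld_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;      /* don't interrupt fgets() and friends */
    sigaction(SIGCHLD, &sa, NULL);
}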