Assignment 2 FAQ, Errata, and Addenda

Due February 12, 2012 11:00pm on-line via sakai

Last update: Monday, February 13, 2012 17:09 EST

FAQ

Can we assume that all the command-line inputs that are valid in the Linux shell (e.g., bash) should also be valid in our program, and all the command-line inputs that are invalid in the Linux shell should also be invalid in our program? e.g.:
  1. l"s" is equivalent to ls in the shell, as are "l""s", """"""""ls, "ls""""""""", etc. All of them are valid commands.
  2. |ls is an invalid command; the shell just throws an error without processing anything. But ""|ls is a partially valid command: the shell throws an error saying ": command not found" and then prints out all the files in the current directory.
  3. "||" - two vertical bars act the same as "|" in the shell.

In general, just follow the specs in the assignment and do not try to emulate bash or any other shell. There's a lot of additional processing that shells do; for example, variable expansion.

  1. No. You don't need to follow the shell's quoting rules where quoting just part of a token is permitted. To get abcde, all you need to support is:

    
    	abcde
    	"abcde"
    	'abcde'
    

    not

    
    	ab"cde"
    	a'bcd'e
    	a""""""bc""""""d'e'
    

    Suggesting other possible commands is a feature that was added to the bash shell. Do not implement this. If your attempt to execute the file fails, just print the error message using perror().

  2. Don't worry about a pipe at the beginning with no command prior to it or two pipes. In both cases, you can treat them as "null" commands and quit processing or do whatever else you like ... just make sure the shell doesn't crash.
  3. Several shells (sh, bash, ksh, and possibly others) support a syntax where commands can be separated by two vertical bars (||) or two ampersands (&&). The first one (cmd1 || cmd2) means to run cmd1 but run cmd2 after it only if cmd1 returns a non-zero exit. The second one (cmd1 && cmd2) means to run cmd1 and then run cmd2 after it only if cmd1 returns an exit code of 0. Do not implement this feature. All you have to implement is the pipe.
Will the input always come from stdin? do we need to account for the cases where inputs are coming from a file or somewhere else?
The input will only come from the standard input but the standard input may be a redirected file and not the keyboard/terminal. The only thing you should do is use isatty() to decide whether to print a prompt.
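
A minimal sketch of the isatty() check follows. The function name maybe_prompt and the "$ " prompt string are placeholders of my own, not something the assignment specifies:

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Print a prompt only when input_fd refers to a terminal.
 * Returns 1 if a prompt was printed, 0 otherwise.
 * The name and the "$ " prompt text are illustrative assumptions.
 */
int maybe_prompt(int input_fd)
{
	if (!isatty(input_fd))
		return 0;          /* redirected file or pipe: stay quiet */
	fputs("$ ", stdout);
	fflush(stdout);            /* no newline follows, so flush explicitly */
	return 1;
}
```

You would call maybe_prompt(0) just before each read of the standard input.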
Do the 50 arguments include the command? Is the user input going to be 1 command + 49 arguments, or 1 command + 50 arguments?
You can have the 50 arguments include the command: 1 command + 49 arguments. First set the limit to something small, such as 3 or 4, to make testing easy.
When I type ^D, my shell prints out an infinite loop of nothing. How do I check for end of file? I have no idea how to check for the ^ character.
You will never see a ^ character. That's just an accepted written shorthand for control characters; it's not part of the input stream. Moreover, you should not check for a control-D at all. It's the convention in POSIX systems to use it to indicate an end of file on terminal input streams but it's just that: a convention. It can be changed to any other character with the stty command. That character is parsed by the terminal driver. What you should do is just check for an end-of-file from your calls to fgets() -- a return of 0 (a null pointer).
GENERAL NOTE: Don't put too much logic in the main function.
Do not implement the bulk of the program within the main function. You'll lose points for this. Write a bunch of smaller functions. Break your code into bite-sized chunks.
GENERAL NOTE: Avoid // comments
ANSI C (C89) does NOT recognize // comments. C99, C++, and the GNU C compiler do, but it's a good idea to avoid them when writing portable code.
How do you pipe for more than two commands? Do I need N-1 file descriptors for N processes being piped?

A single pipe is for a single unidirectional communication stream. You will use one pipe between two processes. You will use two pipes in a pipeline of three processes, three in a pipeline of four processes, etc. When you parse your command line, you will parse out a list of one or more commands that are separated by a pipe (|). Each command will contain a list of arguments (arg[0], arg[1], etc.). The last argument of each command list will be a 0 (null argument). When you're done parsing and assembling this list, you will iterate through the commands, forking and executing one after the other. This code will need logic that includes:

  • Is the output of the command going to a pipe? If so, create a pipe. Change the standard output to the write end of that pipe after you fork.
  • Is the input coming from a pipe? If so, change the standard input to the read end of that pipe after you fork.
  • If the parent process no longer needs an end of a pipe, close it.
The way I understand it, commands are separated by pipes and the arguments to commands are separated by spaces. Is that correct?

Yes. Think of a command as a sequence of one or more arguments, or tokens. Argument 0 happens to be the command name. There are several things to keep in mind:

  1. Arguments and commands may be quoted with either single or double quotes. In that case, any spaces, pipe symbols, or quotes embedded within become part of the argument.
  2. Any number of spaces and/or tabs may precede or follow any token.
  3. You do not need to separate the pipe symbol with spaces. Any of the following is valid:
    
    	ls|wc -l
    	ls  |  wc -l
    	ls|   wc -l
    
Is there a limit to how many commands are going to be given to the shell? I know the argument limit is 50.
As you parse the command line, you'll be building up a list of commands and, within each command, building up an argument list. Each time you add an argument to the argument list, check that you're not exceeding the limit. With the limit of 50, you don't need to use malloc() to allocate an argument list and can simply declare it in the structure: arg[51]. The command counts as an argument and you want to ensure that you can put a null (0) as the last argument. Test your program by setting the argument limit to something small, such as 3 or 4, to see that you are checking limits correctly.
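
Here is one way the structure and the limit check could look. Treat it as a sketch: the names MAXARGS, struct command, and add_arg are mine, and the next pointer assumes you keep the pipeline as a linked list of commands.

```c
#include <stddef.h>

#define MAXARGS 50   /* includes the command name; set to 3 or 4 while testing */

struct command {
	char *arg[MAXARGS + 1];   /* +1 leaves room for the terminating null */
	int nargs;                /* number of arguments stored so far */
	struct command *next;     /* next command in the pipeline, or NULL */
};

/*
 * Append one argument to the command.
 * Returns 0 on success, -1 if the argument limit would be exceeded.
 */
int add_arg(struct command *c, char *token)
{
	if (c->nargs >= MAXARGS)
		return -1;
	c->arg[c->nargs++] = token;
	c->arg[c->nargs] = NULL;  /* keep the list null-terminated at all times */
	return 0;
}
```

Because the list is re-terminated after every insertion, c->arg can be handed to an exec call at any point without a separate finishing step.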
How long of an input line can I expect to read?

Pick a reasonably large number, such as 1024 and declare that as a line buffer:


	char buf[1024];
How do I know when all the commands have been read from the test script file? Do I just check to see if fgets() returns EOF?

Correct in spirit, but note that fgets() signals an end of file by returning 0 (a null pointer), not the EOF constant. You don't care that you're reading from a file. As far as you're concerned, you're always reading from the standard input (e.g., with fgets). Keep reading until you get an exit command or you read an end of file. Don't forget to print a prompt only if the standard input is a terminal.

Important: You detect an end of file when fgets returns a 0. You should never check for specific characters, such as control-D. If that happens to be your end of file keyboard character (that's the default on Linux/Unix/OS X systems), the terminal driver handles processing that. Moreover, there is no end of file character when reading from a file.
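
The read loop can be sketched roughly as follows. The function name read_commands and its return value are illustrative assumptions; parsing and execution are left as a comment.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read command lines from in until end of file or an "exit" line.
 * Returns the number of lines handled before stopping.
 */
int read_commands(FILE *in)
{
	char buf[1024];
	int handled = 0;

	/* fgets returns a null pointer (0) on end of file or on error */
	while (fgets(buf, sizeof buf, in) != NULL) {
		buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
		if (strcmp(buf, "exit") == 0)
			break;
		/* parse and execute the line in buf here */
		handled++;
	}
	return handled;
}
```

Notice that nothing in the loop looks for a control-D or any other specific character; the same code works for a terminal and for a redirected file.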

Should we treat "ls""-l" as one argument or two arguments? The instruction says "Each command is a sequence of one or more tokens separated by whitespace (spaces or tabs)." Since there's no whitespace between "ls" and "-l", should we treat it as one argument?
No. You do not treat that as one argument. Most Unix/Linux shells (bash, sh, ksh) allow quoted regions anywhere within an argument. For example, abc"def"gh'ijk'lmn is equivalent to abcdefghijklmn. In this assignment, quotes have special meaning at the start of an argument. If there is a quote at the start, the token ends when the corresponding matching quote is found. Hence, "ls""-l" has ls as the first argument. The second argument is -l.
What is meant by the instruction You may not use the system library function?
Unix/Linux/etc. systems have a library function named system (run man 3 system). It runs a shell command by forking and execing sh -c your_command. You cannot use this. Your assignment must use the basic system calls. You also cannot use the popen library function (not that it will help).
Are the commands executed in order? So, for example, if I have

ls -l | exit | cd xyz 
does the shell exit after executing ls, or should it wait to exit until cd finishes?

Placing built-in commands in the pipeline makes no sense since they neither take input nor spit output. You do not have to support piping to built-in commands for the assignment. In general, however, if your pipe was:


	ls -l | awk '{ print $5 }' | sort -n

(this lists the length of files in your current directory, sorted from smallest to largest)
you would do the following operations (I left out the parts about closing ends of pipes):


	create pipe #1 (from the first to the second command)
	fork
		- child redirects standard output
		- child execs ls -l
	create pipe #2 (from the second to the third command)
	fork
		- child redirects standard input
		- child redirects standard output
		- child execs awk '{ print $5 }' 
	fork
		- child redirects standard input 
		- child execs sort -n
	wait for all children to exit
		- print each exit message as the wait returns (this may cause it to get interspersed with command output)
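
The steps above, including the pipe-closing that the outline leaves out, might be written along these lines. This is a sketch under my own assumptions: the function name run_pipeline, the array-of-argument-lists interface, the return value, and the wording of the exit message are all invented for illustration and are not the format the assignment requires.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Run n commands as a pipeline. cmds[i] is a null-terminated argument list.
 * Returns the number of children that exited with status 0.
 */
int run_pipeline(char **cmds[], int n)
{
	int prev = -1;   /* read end of the previous pipe; -1 if there is none */
	int fd[2];
	int i, status, ok = 0;
	pid_t pid;

	for (i = 0; i < n; i++) {
		int last = (i == n - 1);

		if (!last && pipe(fd) < 0) {    /* output goes to a pipe: create it */
			perror("pipe");
			break;
		}
		if ((pid = fork()) < 0) {
			perror("fork");
			if (!last) {
				close(fd[0]);
				close(fd[1]);
			}
			break;
		}
		if (pid == 0) {                 /* child */
			if (prev != -1) {       /* standard input from previous pipe */
				dup2(prev, 0);
				close(prev);
			}
			if (!last) {            /* standard output to the new pipe */
				close(fd[0]);
				dup2(fd[1], 1);
				close(fd[1]);
			}
			execvp(cmds[i][0], cmds[i]);
			perror(cmds[i][0]);     /* reached only if the exec failed */
			_exit(127);
		}
		/* parent: close the pipe ends it no longer needs */
		if (prev != -1)
			close(prev);
		if (!last)
			close(fd[1]);
		prev = last ? -1 : fd[0];
	}
	if (prev != -1)                         /* left open only after an early break */
		close(prev);

	while ((pid = wait(&status)) > 0) {     /* wait for all children */
		if (WIFEXITED(status)) {
			printf("process %d exits with %d\n",
			       (int)pid, WEXITSTATUS(status));
			if (WEXITSTATUS(status) == 0)
				ok++;
		}
	}
	return ok;
}
```

If the parent forgets to close the write end of a pipe, the reader on the other end never sees an end of file and the pipeline hangs, which is why the parent closes its copies right after each fork.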
Why can't I use strtok to parse the command line?

strtok won't work well for this task because it won't pick up quoted strings as one token. You'll have to take a more low-level approach. I recommend creating a command struct that contains an argument list in it. Create a linked list of one or more commands. This will represent your pipeline of commands. For each command, keep getting tokens until you get a pipe or you run out of tokens in the line.

That command parser will call a lower-level function each time it needs a new token. This token parsing function can be coded with logic similar to this (the below is not complete code):


	while (isspace(*s))	/* skip spaces */
		++s;

	if (*s == '|') {
		/* return a pipe token */
	}
	if ((*s == '\'') || (*s == '"')) {  /* grab the quoted string */
		/* pick up everything until we get the matching closing quote */
	}
	else {     /* unquoted token */
		/* grab everything that's not a space or pipe */
	}

The above is just one of several approaches. The classy approach would be to use a lexical analyzer such as lex. Another approach would be to implement the parsing as a true state machine.
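
For reference, the token-parsing outline could be fleshed out roughly like this. It is one possible sketch, not the required design: the enum, the function name next_token, and the choice to silently truncate tokens that do not fit in the output buffer are my own assumptions.

```c
#include <ctype.h>
#include <stddef.h>

enum token_type { TOK_END, TOK_PIPE, TOK_WORD, TOK_ERROR };

/*
 * Copy the next token that starts at *sp into out (at most outsz-1
 * characters, outsz must be at least 1) and advance *sp past it.
 * Returns TOK_ERROR for an unterminated quoted string.
 */
enum token_type next_token(const char **sp, char *out, size_t outsz)
{
	const char *s = *sp;
	size_t n = 0;
	enum token_type type = TOK_WORD;

	while (isspace((unsigned char)*s))      /* skip spaces and tabs */
		++s;

	if (*s == '\0') {
		type = TOK_END;
	} else if (*s == '|') {                 /* pipe token */
		++s;
		type = TOK_PIPE;
	} else if (*s == '\'' || *s == '"') {   /* quoted string */
		char quote = *s++;
		while (*s != '\0' && *s != quote) {
			if (n + 1 < outsz)
				out[n++] = *s;
			++s;
		}
		if (*s == quote)
			++s;                    /* consume the closing quote */
		else
			type = TOK_ERROR;       /* ran off the end of the line */
	} else {                                /* unquoted token */
		while (*s != '\0' && !isspace((unsigned char)*s) && *s != '|') {
			if (n + 1 < outsz)
				out[n++] = *s;
			++s;
		}
	}
	out[n] = '\0';
	*sp = s;
	return type;
}
```

With this logic, ls|wc -l yields ls, a pipe, wc, and -l, while "ls""-l" yields the two tokens ls and -l because a quoted token ends at its matching quote.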

Do you fork for a built-in command?
No. The point of the two built-in commands is that they must run without you forking. If a child process were to change its default directory (chdir), it would have no bearing on the parent. Similarly, if a child were to exit, the parent would not.
How do you search through a table of functions?
You need to define a struct that contains the name of the function (char *) and the corresponding function pointer (int (*f)()). Then define a static array of sets of names and functions.
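
As a rough illustration, a table for the two built-ins might look like the following. The handler names, the (argc, argv) signature, and the behavior of cd with no argument (going to $HOME) are assumptions of mine rather than requirements from the assignment.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef int (*builtin_fn)(int argc, char **argv);

/* Hypothetical handler: change the shell's own working directory. */
int builtin_cd(int argc, char **argv)
{
	const char *dir = (argc > 1) ? argv[1] : getenv("HOME");

	if (dir == NULL) {
		fprintf(stderr, "cd: HOME is not set\n");
		return 1;
	}
	if (chdir(dir) < 0) {
		perror(dir);
		return 1;
	}
	return 0;
}

/* Hypothetical handler: terminate the shell itself. */
int builtin_exit(int argc, char **argv)
{
	exit(argc > 1 ? atoi(argv[1]) : 0);
}

struct builtin {
	const char *name;   /* command name as typed by the user */
	builtin_fn f;       /* function that implements it */
};

static struct builtin builtins[] = {
	{ "cd",   builtin_cd },
	{ "exit", builtin_exit },
	{ NULL,   NULL }            /* end-of-table marker */
};

/* Return the handler for name, or NULL if name is not a built-in. */
builtin_fn find_builtin(const char *name)
{
	struct builtin *b;

	for (b = builtins; b->name != NULL; b++)
		if (strcmp(b->name, name) == 0)
			return b->f;
	return NULL;
}
```

Before forking for a command, look up arg[0] with find_builtin; if it returns a function, call it directly in the shell process instead of forking.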
Do all errors have to be printed using perror()? Or can we use fprintf(stderr)?

perror is a library function that prints a user-friendly error message that corresponds to a system call error. You'd use it only after a failed system call. Alternatively, you can use strerror(errno) to get the string containing the error message and print it yourself:


	fprintf(stderr, "%s: file %s: %s\n", cmd, filename, strerror(errno));

See the strerror man page (man strerror).

For anything else, just print to stderr:


	fprintf(stderr, "%s: you must specify at least two files\n", cmd);
Does my code have to be commented?
Yes! You don't have to overdo comments (no need to comment every line) but I expect your code to look beautiful and be easy to read. A big part of this is your choice of variable names and ensuring that one can read through the code. Beyond that, I at least expect a comment block in front of each function that briefly explains what every function does. If any function is 100 or so lines of code long, you have not done a good job of modularizing your code.