3) Processing Command Lines

How bash processes command lines

What we type in as a bash command might come out the other side different from what we expected, so it's necessary to understand HOW bash interprets our commands.

If we understand the "workings", we will save ourselves time and troubleshooting will be easier, because we will know what bash is doing in the background.

Quoting - What is it??

REMEMBER!! - and I'll write it in capital letters :) THE COMMAND LINE IS JUST A SET OF CHARACTERS. Certain characters carry special meanings and cause bash to do certain things. The dollar symbol $, for example, tells bash that some form of expansion is about to happen. QUOTING, on the other hand, tells bash to IGNORE these special meanings so that the characters are interpreted LITERALLY by the shell.

There are 3 ways to remove these special meanings:

\ (backslash)
- Removes the special meaning from the NEXT character
' ' (single quotes)
- Removes the special meaning from ALL characters inside the single quotes
" " (double quotes)
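- Removes the special meaning from all characters inside the double quotes EXCEPT the DOLLAR SIGN $ and BACKTICKS ` (the dollar sign is used for various forms of expansion and backticks are just another way of doing command substitution). A BACKSLASH \ also keeps its special meaning inside double quotes, but only when it comes directly before $, `, ", \ or a new line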
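
The three quoting styles can be compared side by side. This is only a small sketch: the variable fruit and the strings are made up for the demonstration.

```bash
#!/usr/bin/env bash
# fruit is a hypothetical variable used only for this demonstration.
fruit=apple

backslash=\$fruit            # backslash: only the NEXT character ($) becomes literal
single='$fruit `date`'       # single quotes: every character is literal
double="$fruit costs \$1"    # double quotes: $ still expands unless it is escaped

printf '%s\n' "$backslash" "$single" "$double"
# $fruit
# $fruit `date`
# apple costs $1
```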

Examples

echo john & jane

Won't work as expected, because the ampersand (&) operator is used to run programs in the background so we can carry on typing within the terminal. Bash runs echo john in the background and then tries to run a command called jane.

The simplest way to escape the ampersand is to use a backslash: echo john \& jane

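filepath=C:\Users\wallis\Documents -- This won't work as written, because each unquoted backslash is removed and we end up with C:UserswallisDocuments. We could escape every backslash with another backslash, but that would make it difficult to read: filepath=C:\\Users\\wallis\\Documents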

A better way would be to use single quotes: filepath='C:\Users\wallis\Documents'

The only rule here is that you can't have another single quote inside the single quotes, even if that single quote is preceded by a \ (backslash).

Let's say we want our shell to do parameter expansion to replace $USER with the current user's username: filepath=C:\Users\$USER\Documents. Here we have a backslash (which we want to keep), a dollar sign (which NEEDS to do expansion), followed by another backslash we want to keep. Single quotes would stop the expansion, so we use double quotes: filepath="C:\Users\\$USER\Documents" - don't forget to add another backslash before the backslash in front of $USER, otherwise \$ would make the dollar sign literal :)
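
The four filepath variants above can be checked like this. It is a sketch that assumes USER may be unset in some environments, so a fallback value (wallis, taken from the example) is supplied.

```bash
#!/usr/bin/env bash
# Fallback so the sketch also runs where USER is not set (an assumption).
USER=${USER:-wallis}

unquoted=C:\Users\wallis\Documents      # backslashes are eaten
escaped=C:\\Users\\wallis\\Documents    # works, but hard to read
single='C:\Users\wallis\Documents'      # everything literal
double="C:\Users\\$USER\Documents"      # \\ keeps one backslash, $USER expands

printf '%s\n' "$unquoted" "$escaped" "$single" "$double"
# First line printed: C:UserswallisDocuments
```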

Command Processing Flow

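Bash uses a STEP-BY-STEP PROCESS to interpret a command line, whether it was typed by a human or pulled from a script. These notes cover five steps: Tokenisation, Command Identification, Expansions, Quote Removal and Redirection. Once these are done, bash executes the command.

1) Tokenisation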

The 1st thing the shell does is to break up the command line into "tokens". A token is a sequence of characters that is considered a SINGLE UNIT by the shell. The command line is only broken into separate tokens where there are UNQUOTED metacharacters. The metacharacters are space, tab, new line and | & ; ( ) < >. Once the command line is broken into tokens, the shell classifies these tokens into 'words' and 'operators'.

Words - Are any TOKENS that do not contain any UNQUOTED metacharacters
Operators - Are any TOKENS that contain at least 1 UNQUOTED metacharacter
- Operators are broken into 2 types:
  - CONTROL OPERATORS & REDIRECTION OPERATORS

Example A - No Operators

Example B - With Control Operators

Example C - With Redirect Operators

The redirect and control operators come into effect in Step 2, during the "Command Identification" stage. At this stage we just need to make sure we understand the tokenisation stage, which is to identify tokens using unquoted metacharacters and to classify them into words and operators.
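
Tokens can't be printed directly, but the arguments a command receives give a good idea of where the shell broke the line up. The sketch below uses show_args, a made-up helper function that prints each argument it receives in square brackets.

```bash
#!/usr/bin/env bash
# show_args is a hypothetical helper: it prints every argument in [brackets].
show_args() { printf '[%s]' "$@"; printf '\n'; }

show_args john jane             # unquoted space splits -> [john][jane]
show_args john\ jane            # escaped space stays   -> [john jane]
show_args 'john & jane'         # quoted metacharacters -> [john & jane]
show_args john; show_args jane  # unquoted ; is an operator, so 2 commands run
```

In the last line the semicolon is never passed to show_args, because it was classified as an operator and not as a word.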

2) Command Identification

The next thing bash does is break the command line down into SIMPLE and COMPOUND commands.

SIMPLE COMMANDS - are a SET of WORDS terminated by a CONTROL OPERATOR (for example a new line, ;, &, && or |)
- The FIRST WORD = the COMMAND NAME
- The rest = individual arguments to the command name
COMPOUND COMMANDS - are essentially bash's programming constructs. Each one starts with a reserved word and ends with a corresponding reserved word (eg if/fi and while/done). Unlike simple commands, compound commands can be written over multiple lines, and they can contain other compound and simple commands. This is called nesting, and it is what gives bash its programming abilities
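
As a rough illustration of the difference (the variable count is made up for this sketch), the first line below holds two simple commands, and the rest is one compound command with another compound command nested inside it.

```bash
#!/usr/bin/env bash
# Two simple commands on one line, separated by the control operator ;
echo one; echo two

# One compound command: starts with 'while' and ends with 'done'.
# An 'if' ... 'fi' compound command is nested inside it.
count=0
while [ "$count" -lt 3 ]; do
  if [ "$count" -eq 1 ]; then
    echo "count is one"
  fi
  count=$((count + 1))
done
```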

Example A (contd)

Example B (contd)

Example C - (contd)

3) Expansions

From the previous lessons we know that:

Parameter expansion ${parameter}
Command substitution $(command)
Arithmetic expansion $((expression))

With BRACE EXPANSION there are 2 types of lists that you can expand within your braces:

STRING LISTS can contain ANY set of individual characters or words and can be useful for expanding out months of the year or a set of usernames. For example:

echo {a,19,z,barry,42} -----> NO UNQUOTED SPACES AROUND THE COMMAS!!
a 19 z barry 42 - This has been expanded out to this

RANGE LISTS are useful for expanding out SEQUENCES of characters that FOLLOW a particular order, for example numbers from 1 to 100 or letters from A to Z. Range lists lose some of the flexibility that string lists have, because string lists can contain ANY value. But because each value in a range list does not need to be typed manually, range lists make up for this with their ease of use and extensibility.

echo {1..10} - we use 2 dots and no spaces around the double dots
1 2 3 4 5 6 7 8 9 10

echo {10..1}
10 9 8 7 6 5 4 3 2 1

If we wanted to have steps between numbers, say 2-50 in steps of 2:

echo {2..50..2}
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50

Brace expansion allows you to add fixed pieces of text to the beginning and end of each element of the expansion - prefix and postfix
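
A short sketch of prefixes and postfixes in action. The file names are invented for the example, and the step syntax assumes bash 4 or later.

```bash
#!/usr/bin/env bash
# "file" is the prefix and ".txt" is the postfix added to each element.
range_files=$(echo file{1..3}.txt)
echo "$range_files"      # file1.txt file2.txt file3.txt

# A string list with only a postfix.
reports=$(echo {jan,feb,mar}_report.pdf)
echo "$reports"          # jan_report.pdf feb_report.pdf mar_report.pdf

# A range list with a step of 2 (bash 4+).
evens=$(echo {2..10..2})
echo "$evens"            # 2 4 6 8 10
```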

We will now take a closer look at how the shell processes expansions. Once the shell has completed TOKENISATION and COMMAND IDENTIFICATION, it will then perform SHELL EXPANSIONS on the WORDS in the command line.

There are 4 stages of Expansions:

STAGE 1: Brace Expansion
STAGE 2:
- Parameter Expansion
- Arithmetic Expansion
- Command Substitution
- Tilde Expansion
ALL STAGE 2 EXPANSIONS HAVE THE SAME PRIORITY AND WILL BE CARRIED OUT IN THE ORDER THEY ARE RECEIVED ON THE COMMAND LINE, FROM LEFT TO RIGHT
STAGE 3: Word Splitting
STAGE 4: Globbing (Filename expansion)

Stage 1: Brace Expansion

REMEMBER!! Expansions in earlier STAGES are performed first. So if the first expansion in a command line is a parameter expansion and it is followed by a brace expansion, the brace expansion will still be performed FIRST (STAGE 1 as opposed to STAGE 2). Also remember that it's not possible to use the result of a parameter expansion inside a brace expansion: the brace expansion is done first, and at that point the parameter expansion has not been carried out yet.

x=10
echo {1..$x} -----> won't work, as bash doesn't know the value of $x yet when it does brace expansion. The braces are left alone and the output is {1..10}
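
Here is a sketch of what actually gets printed, plus one possible workaround. The C-style for loop is just one option picked for this illustration, not the only way to count up to a variable.

```bash
#!/usr/bin/env bash
x=10

# Brace expansion runs before $x is known, so the braces are left untouched.
literal=$(echo {1..$x})
echo "$literal"          # {1..10}

# One possible workaround: a C-style for loop, which reads x at run time.
numbers=""
for ((i = 1; i <= x; i++)); do
  numbers="${numbers:+$numbers }$i"   # add a space only between elements
done
echo "$numbers"          # 1 2 3 4 5 6 7 8 9 10
```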

Stage 2 Expansions

name=mike
echo $name has $((1 + 2 )) apples
mike has 3 apples

Here the parameter and arithmetic expansions are both in stage 2, so they have equal priority and are processed in the order they are written, left to right.

echo $name has {1..3} apples and $(( 5 + 2 )) oranges
mike has 1 2 3 apples and 7 oranges

Here the brace expansion is done first, followed by the parameter expansion and then the arithmetic expansion.

name=file
echo $name{1..3}.txt
.txt .txt .txt

Because the brace expansion happens first, the shell gets echo $name1.txt $name2.txt $name3.txt. Because name1, name2 and name3 are not variables that have been set, each one expands to an empty string. Therefore, all that's left is: .txt .txt .txt
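
If the goal was file1.txt file2.txt file3.txt, wrapping the variable name in curly braces tells bash where the name ends. A minimal sketch (the unset line is only there so the demonstration behaves the same in every environment):

```bash
#!/usr/bin/env bash
name=file
unset name1 name2 name3   # make sure these names really are unset for the demo

broken=$(echo $name{1..3}.txt)     # becomes $name1.txt $name2.txt $name3.txt
fixed=$(echo ${name}{1..3}.txt)    # becomes ${name}1.txt ${name}2.txt ${name}3.txt

echo "$broken"    # .txt .txt .txt
echo "$fixed"     # file1.txt file2.txt file3.txt
```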

Stage 3: Word Splitting

NB !!! After processing the preceding expansions, the shell will try to split the RESULTS of UNQUOTED:
- parameter expansions
- arithmetic expansions
- command substitutions
into individual words.

This word splitting can have some very significant effects on how your command lines are interpreted. This is because each word is considered as an individual argument to the command name.

Word splitting does not occur on the results of expansions that occurred inside DOUBLE QUOTES

Word splitting is very similar to tokenisation. Tokenisation refers to a list of metacharacters to break the command line into words and operators. Word splitting also refers to a list of characters, which in this case is stored inside the IFS (Internal Field Separator) variable. By default, the IFS variable stores the SPACE, TAB and NEW LINE characters.

Because these characters are not visible if you print the variable out like any other normal variable, to see these "invisible" characters we can do:
echo "${IFS@Q}"
$' \t\n'
The output shows a space, followed by \t (tab) and \n (new line).

So when the shell gets the unquoted result of a parameter expansion, arithmetic expansion or command substitution, it searches that result for the characters contained within the IFS variable. Wherever it finds one, it splits the result into separate words.

Example 1

Let's create a variable that includes some of the default IFS characters (ie space, tab or new line). In this case, a variable called numbers whose value, written in double quotes, is 1, space, 2, space, 3 etc: numbers="1 2 3 4 5"

The parameter numbers has a space between each number, and space is one of the IFS characters. We run the command touch $numbers and check the result with an ls command. Here we have created 5 files: 1, 2, 3, 4 and 5. What this shows is that after the shell expanded the variable "numbers", it split the result up into the words 1, 2, 3, 4 and 5. These 5 words were arguments to the command name "touch", so it created the 5 files.
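
Example 1 can be tried safely in a throwaway directory. Using mktemp -d and a subshell here is an assumption of this sketch, so that no files are left in your working directory.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)     # scratch directory for the demonstration
numbers="1 2 3 4 5"

# Unquoted expansion: the result is split on spaces into 5 arguments.
(cd "$workdir" && touch $numbers)

ls "$workdir"            # lists the five files 1 2 3 4 5
```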

Example 2

From the rules of word splitting (ie no word splitting is done on the RESULTS of expansions that occurred inside double quotes), let's check:
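
One way to run this check, again in a scratch directory (an assumption of this sketch), is to put the expansion inside double quotes and see how many files appear.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)     # scratch directory for the demonstration
numbers="1 2 3 4 5"

# Quoted expansion: no word splitting, so touch receives ONE argument.
(cd "$workdir" && touch "$numbers")

ls "$workdir"            # a single file called: 1 2 3 4 5
```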

Example 3

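To check that the shell won't split on characters that are not stored in the IFS variable, we will change the spaces in numbers to commas:

numbers=1,2,3,4,5
touch $numbers

This creates a single file called 1,2,3,4,5 because the comma is not one of the default IFS characters.

Now we will change the IFS variable so that it contains only a comma. Results will now be split on commas, and spaces, tabs and new lines will no longer cause splitting:

IFS=","

Running touch $numbers again now creates 5 files: 1, 2, 3, 4 and 5.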
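
Both halves of Example 3 can be sketched as follows. Changing IFS inside a subshell is a choice made for this illustration, so the rest of the session keeps the default value.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)     # scratch directory for the demonstration
numbers=1,2,3,4,5

# Default IFS: commas are not separators, so ONE file is created.
(cd "$workdir" && touch $numbers)

# IFS set to a comma, only inside this subshell: FIVE files are created.
(IFS=","; cd "$workdir" && touch $numbers)

ls "$workdir"
```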

RULE !

Stage 4: Globbing (File Name Expansion)

Globbing is the 4th and final stage of expansion processing, but in terms of our overall command line processing flow, we could call this Step 3D. Globbing originated in the early days of UNIX, 1969-1975.

Globbing (or file name expansion) is only performed on words, NOT on operators. That includes the new words created in the previous step (word splitting) as well as the words that were there in the first place. When the shell begins to perform globbing, it scans each word for certain special characters such as the star, the question mark or the left square bracket.

The star * - matches any STRING, regardless of length or content; it also matches the EMPTY string. The * is probably the most used globbing character. If we do: ls * - the shell replaces the * with every name in the current directory (except hidden names that start with a dot), and because ls lists the contents of any directory it is given, we also see the files inside sub-directories. However we normally use the star to find particular file extensions, eg: *.txt or *.pdf.

The question mark ? - matches any SINGLE character, but requires a character to be there to replace it in the pattern. For example if we gave the command ls file?.txt -> this would mean "file, some single character, dot txt"

The left bracket [ - starts a set such as [abc], which matches any ONE of the enclosed characters, but requires a character to be there. If there is an exclamation mark straight after the opening bracket, as in [!abc], the set matches any SINGLE character EXCEPT those within the brackets, and also requires a character to be there.
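
The three globbing characters can be compared with a handful of made-up files in a scratch directory. The file names and the use of mktemp -d are assumptions of this sketch.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)     # scratch directory for the demonstration
cd "$workdir" || exit 1
touch file1.txt file2.txt fileA.txt file10.txt notes.pdf

echo *.txt                       # every name ending in .txt (order depends on locale)
one_char=$(echo file?.txt)       # exactly ONE character between file and .txt
in_set=$(echo file[12].txt)      # that one character must be 1 or 2
not_in_set=$(echo file[!12].txt) # that one character must NOT be 1 or 2

echo "$one_char"     # file1.txt file2.txt fileA.txt
echo "$in_set"       # file1.txt file2.txt
echo "$not_in_set"   # fileA.txt
```

Note that file10.txt is not matched by file?.txt, because the question mark stands for exactly one character.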

4) Quote Removal

Once the shell has used the quotes to remove the special meanings, we have no need for the quotes anymore. As per the above section on "quoting", there are 3 types of quoting:

backslash \
single quotes ' '
double quotes " "

These quotes are used to REMOVE the special meaning of characters, which gives us back some control over how the shell interprets the command.

During quote removal the shell removes all UNQUOTED backslashes, single quote characters and double quote characters that did NOT result from a shell expansion.

If we wanted to print $HOME literally, with no parameter expansion done on the HOME variable, we could just backslash the dollar sign:

echo \$HOME
$HOME

The output is $HOME - so where did the \ go? It was removed during quote removal.

Let's take an example:

path="C:\Users\Karen\Documents"
echo $path

Inside double quotes a backslash is only special before $, `, ", \ or a new line, so the backslashes here are interpreted literally and stored in the variable. Now what will the quote removal stage do when we run echo $path? It will leave the backslashes alone, because the rule (see above) only removes quote characters that did NOT result from a shell expansion. These backslashes ARE the RESULT of a shell expansion, so they will NOT be removed by quote removal, and echo $path prints:

C:\Users\Karen\Documents
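
Both quote removal cases can be captured and inspected. This is a sketch using bash's builtin echo with its default settings.

```bash
#!/usr/bin/env bash
# Case 1: the backslash was typed by us, so quote removal deletes it.
literal=$(echo \$HOME)
echo "$literal"      # $HOME

# Case 2: the backslashes come out of an expansion, so they are kept.
path="C:\Users\Karen\Documents"
shown=$(echo $path)
echo "$shown"        # C:\Users\Karen\Documents
```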

5) Redirection

Redirection is the fifth step in the command line processing flow. With redirection we can:

Redirect standard output and standard error to a file
Append standard output and standard error to a file
Redirect standard input to choose input sources for your commands

When dealing with redirections, Linux and UNIX systems use 3 standard (std) data streams

Stream 0 = standard input (stdin)
- The stdin stream provides an alternative way of providing input to a command aside from using command line arguments.
Stream 1 = standard output (stdout)
- The stdout stream contains the main output from the command
Stream 2 = standard error (stderr)
- The stderr stream contains the error output from a command. It carries all the error messages and status messages that a command produces but that are not considered the main output of the command

Think of these 3 streams as different water tubes connected to the "COMMAND". Each stream has 2 ends. One end of every stream is connected to the command. By default the other end of stdin is connected to the keyboard, and the other ends of stdout and stderr are connected to the user's terminal screen.

REDIRECTION is all about using redirection OPERATORS to express where we want the ends of the tubes to be connected to.

Redirect FROM a File

The cat command is a good example. If we type cat on its own, it sits there waiting for input from the user. If bash sees a less than sign <, which is the redirection operator for stdin, it knows to change the stdin source from the keyboard to the file specified.

Say we have a file called hello.txt in our home directory with Hello World in it:

cat < ~/hello.txt
Hello World

Redirect TO a File

The default redirection operator for redirecting stdout is the greater than sign >

echo "this is some output" > output.txt
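
Redirecting to and from a file can be combined in one small sketch. It writes into a scratch directory made with mktemp -d instead of the home directory, which is an assumption made so that nothing of yours gets overwritten.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)     # scratch directory for the demonstration

# Redirect TO a file: stdout of echo goes into hello.txt, not the screen.
echo "Hello World" > "$workdir/hello.txt"

# Redirect FROM a file: cat reads its stdin from hello.txt, not the keyboard.
from_file=$(cat < "$workdir/hello.txt")
echo "$from_file"        # Hello World
```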

Redirect stderr

cd /root
-bash: cd: /root/: Permission denied

If we try to redirect the error to, say, a file called error.txt:

cd /root > error.txt
-bash: cd: /root/: Permission denied

we get the permission denied error on screen again. WHY?? This is because > only redirects stdout (stream 1), and the error message is sent to stderr, which is a different stream (stream 2).

However, if we redirect the error using cd /root 2> error.txt (here the 2 means stream 2), the message goes into the file instead:

cat error.txt
-bash: cd: /root/: Permission denied

The &> operator connects the water pipes of stdout and stderr to the SAME place. If we want to black-hole both the output and the error messages so that they are simply discarded, we point them to /dev/null (sometimes called the bit bucket). /dev/null is a special device file, not a folder, and anything written to it is thrown away:

cd /root &> /dev/null

USING A SINGLE GREATER THAN SYMBOL > MEANS THAT THE DESTINATION FILE IS TRUNCATED (that is, existing information is deleted and overwritten with the new data). TO APPEND (that is, to add onto existing data) WE USE DOUBLE GREATER THAN SYMBOLS: >> for stdout, 2>> for stderr and &>> for both stdout and stderr
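
The stderr, /dev/null and append operators can be seen together in the sketch below. It assumes bash 4 or later (for &>>) and uses a directory that does not exist, rather than /root, so the error appears whoever runs it. The "|| true" is only there so the deliberate failures don't stop a script running in strict mode.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)               # scratch directory for the demonstration
missing="$workdir/no-such-dir"     # hypothetical path that does not exist

# stdout (stream 1) and stderr (stream 2) sent to two different files.
cd "$missing" > "$workdir/out.txt" 2> "$workdir/error.txt" || true
cat "$workdir/error.txt"           # the cd error message; out.txt is empty

# Both streams thrown away.
cd "$missing" &> /dev/null || true

# > truncates, >> appends, &>> appends both stdout and stderr.
echo first   > "$workdir/log.txt"
echo second  > "$workdir/log.txt"  # "first" is overwritten
echo third  >> "$workdir/log.txt"  # added after "second"
cd "$missing" &>> "$workdir/log.txt" || true
cat "$workdir/log.txt"
```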

WORKED EXAMPLES

CHEATSHEET

Worked Example 1:

Worked Example 2

Worked Example 3

Problem Set To Work Through

Problem 1:

Problem 2:

Problem 3:

So here a directory (in the current directory) called people is created, then the shell changes into the people directory. The shell then creates 3 files called john, jane and abishek.

Problem 4:


Last updated 4 years ago
