3) Processing Command Lines
How bash processes command lines
What we type as a bash command might come out the other side different from what we expected, so it's necessary to understand HOW bash interprets our commands.
If we understand the "workings", we will save ourselves time, and troubleshooting will be easier because we will know what the shell is doing in the background.
Quoting - What is it??
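Some characters, such as the ampersand &, the dollar sign $, the backslash \ and the space, have special meanings to the shell. Quoting removes those special meanings, and there are 3 ways to do it: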
\ (backslash)
Removes the special meaning from the NEXT character
' ' (single quotes)
Removes the special meaning from ALL characters inside the single quotes
" " (double quotes)
Removes the special meaning from all characters inside the double quotes EXCEPT the DOLLAR SIGN $, BACKTICKS ` and a BACKSLASH \ that comes before $, `, ", \ or a newline
(The dollar sign is used for various forms of expansion, and backticks are just another way of doing command substitution)
Examples
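echo john & jane
This won't work as expected, because the ampersand (&) is a control operator that runs the preceding command in the background (so we can carry on typing in the terminal). Bash runs echo john in the background and then tries to run jane as a separate command.
-- the simplest way to escape the ampersand is to use a backslash
echo john \& jane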
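filepath=C:\Users\wallis\Documents
Unquoted, each backslash removes the special meaning of the character after it and is then itself removed, so filepath ends up as C:UserswallisDocuments.
-- We could escape each backslash with another backslash, but that makes it difficult to read
filepath=C:\\Users\\wallis\\Documents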
A better way would be to use single quotes:
filepath='C:\Users\wallis\Documents'
The only rule here is that you can't have another single quote inside the single quotes, even if that single quote is preceded by a \ (backslash)
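Let's say we want our shell to do parameter expansion to replace $USER with the current user's username:
filepath=C:\Users\$USER\Documents
Here we have a backslash (which we want to keep), a dollar sign (which NEEDS to do expansion) and another backslash (which we also want to keep). Unquoted, this fails twice: the backslashes before U and D are removed, and the backslash before $USER escapes the dollar sign, so no expansion happens. Double quotes fix it:
filepath="C:\Users\\$USER\Documents"
- don't forget to add another backslash before the existing one: inside double quotes \$ would stop the expansion, whereas \\ produces a single literal backslash and leaves $USER free to expand :)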
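To see these quoting rules side by side, here is a minimal bash sketch. It assumes a username of wallis, which is set explicitly so the output is the same on any machine. It also uses printf rather than echo, so that nothing reinterprets the backslashes when printing.

```bash
#!/usr/bin/env bash
# Assumption: pretend the current user is "wallis" so the output is reproducible.
USER=wallis

amp=$(echo john \& jane)                  # backslash: & is just text
unquoted=C:\Users\wallis\Documents        # each \ escapes the next letter, then is removed
doubled=C:\\Users\\wallis\\Documents      # each \\ leaves one literal backslash
single='C:\Users\wallis\Documents'        # single quotes: everything literal
double="C:\Users\\$USER\Documents"        # \\ gives \, and $USER still expands

printf '%s\n' "$amp" "$unquoted" "$doubled" "$single" "$double"
# john & jane
# C:UserswallisDocuments
# C:\Users\wallis\Documents
# C:\Users\wallis\Documents
# C:\Users\wallis\Documents
```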
Command Processing Flow
Bash uses a 6 STEP PROCESS to interpret a command line, whether it was typed by a human or pulled from a script: 1) Tokenisation, 2) Command Identification, 3) Expansions, 4) Quote Removal, 5) Redirection and 6) Execution
1) Tokenisation:
The 1st thing the shell does is break the command line up into "tokens". A token is a sequence of characters that the shell treats as a SINGLE UNIT, and a token only ends where an UNQUOTED metacharacter breaks up the command line. In bash the metacharacters are the space, tab and newline, plus | & ; ( ) < >. Once the command line is broken into tokens, the shell classifies these tokens into 'words' and 'operators'
Words - Words are classified as any TOKENS that do not contain any UNQUOTED metacharacters
Operators - Are any TOKENS that contain at least 1 UNQUOTED metacharacter
Operators are broken into 2 types:
CONTROL OPERATORS: newline || && & ; ;; ;& ;;& | |& ( )
REDIRECTION OPERATORS: for example < > >> << <<< &> &>>
Example A - No Operators
Example B - With Control Operators
Example C - With Redirect Operators
Control operators come into effect in Step 2, "Command Identification", where they mark where each command ends. Redirection operators are also recognised there, but the redirections themselves are carried out later, in Step 5. At this stage we just need to understand tokenisation: identifying tokens using unquoted metacharacters and classifying them into words and operators
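The three tokenisation cases can be sketched in bash with a small helper. showargs is a hypothetical function defined here (not a standard command); it prints each argument it receives in square brackets, which makes the word boundaries visible. The ; and > below are deliberately written without surrounding spaces: because they are unquoted metacharacters, they still split the line into separate tokens.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each argument in [brackets] on one line.
showargs() { printf '[%s]' "$@"; printf '\n'; }

# Example A - no operators: unquoted spaces separate the words.
a=$(showargs john jane)            # [john][jane]
# Quoted, the space loses its meaning and stays inside one token.
q=$(showargs "john jane")          # [john jane]

# Example B - control operator: ; ends one command and starts another.
b=$(showargs one;showargs two)     # [one] and [two] on separate lines

# Example C - redirection operator: > and the filename are not arguments.
tmp=$(mktemp)
showargs hello>"$tmp"
c=$(cat "$tmp")                    # [hello]
rm -f "$tmp"

printf '%s\n' "$a" "$q" "$b" "$c"
```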
2) Command Identification
Next, bash breaks the command line down into SIMPLE and COMPOUND commands.
SIMPLE COMMANDS - are a SET of WORDS terminated by a CONTROL OPERATOR (see list above)
The FIRST WORD = the COMMAND NAME
The rest = individual arguments to the command name
COMPOUND COMMANDS - are essentially bash's programming constructs. Each one starts with a reserved word and ends with a corresponding reserved word (e.g. if ... fi, while ... done). Unlike simple commands, compound commands can be written over multiple lines, and they can contain other compound and simple commands. This is called nesting, and it gives bash its programming abilities
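As a rough illustration of nesting, this bash sketch puts an if ... fi compound command inside a while ... done compound command, with simple commands inside both. The numbers and labels are made up for the example.

```bash
#!/usr/bin/env bash
n=1
result=""
while [ "$n" -le 4 ]; do             # outer compound command: while ... done
    if [ $((n % 2)) -eq 0 ]; then    # nested compound command: if ... fi
        result+="$n:even "           # simple commands inside the nest
    else
        result+="$n:odd "
    fi
    n=$((n + 1))
done
echo "$result"                       # 1:odd 2:even 3:odd 4:even
```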
Example A (contd)
Example B (contd)
Example C - (contd)
3) Expansions
From the previous lessons we know that:
Parameter expansion
${parameter}
Command substitution
$(command)
Arithmetic expansion
$((expression))
BRACE EXPANSION generates a list of words from a pattern written inside braces { }. There are 2 types of lists that you can expand to within your braces:
STRING LISTS can contain ANY set of individual characters or words and can be useful for expanding out months of the year or a set of usernames. For example:
echo {a,19,z,barry,42}
(NO UNQUOTED SPACES BETWEEN THE COMMAS!!)
a 19 z barry 42
- This has been expanded out into 5 separate words.
RANGE LISTS are useful for expanding out SEQUENCES of characters that FOLLOW a particular order, for example the numbers from 1 to 100 or the letters from A to Z. Range lists lose some of the flexibility that string lists have, because string lists can contain ANY value, but because each value in a range list does not need to be typed manually, range lists make up for this with their ease of use and extensibility.
echo {1..10}
- we use 2 dots and no spaces around the double dots
1 2 3 4 5 6 7 8 9 10
echo {10..1}
10 9 8 7 6 5 4 3 2 1
If we wanted to have steps between numbers, say 2-50 in steps of 2:
echo {2..50..2}
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
Brace expansion allows you to add fixed pieces of text to the beginning and end of each element of the expansion - prefix and postfix
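Here is a quick bash sketch of prefixes and postfixes. The text before the braces is attached to the front of every element, and the text after them to the end. The file names are made up for illustration.

```bash
#!/usr/bin/env bash
ranged=$(echo file{1..3}.txt)           # prefix "file", postfix ".txt"
listed=$(echo {john,jane}_notes.txt)    # string list with a postfix
letters=$(echo report_{A..C})           # letter range with a prefix
printf '%s\n' "$ranged" "$listed" "$letters"
# file1.txt file2.txt file3.txt
# john_notes.txt jane_notes.txt
# report_A report_B report_C
```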
We will now take a closer look at how the shell processes expansions. Once the shell has completed TOKENISATION (and command identification), it performs SHELL EXPANSIONS on the WORDS in the command line.
There are 4 stages of Expansions:
STAGE 1: Brace Expansion
STAGE 2: Parameter Expansion, Arithmetic Expansion, Command Substitution and Tilde Expansion
(ALL THESE HAVE THE SAME PRIORITY AND WILL BE CARRIED OUT IN THE ORDER THEY APPEAR ON THE COMMAND LINE, FROM LEFT TO RIGHT)
STAGE 3: Word Splitting
STAGE 4: Globbing (Filename expansion)
Stage 1: Brace Expansion
REMEMBER!! Expansions in earlier STAGES are performed first. So if the first expansion in a command line is a parameter expansion followed by a brace expansion, the brace expansion is still performed FIRST (STAGE 1 as opposed to STAGE 2). This also means that it's not possible to use the result of a parameter expansion inside a brace expansion: the brace expansion is done first, before the parameter expansion has been carried out.
x=10
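echo {1..$x}
-----> won't work as expected: when brace expansion runs, bash doesn't yet know what $x means, so {1..$x} is not a valid range and is left alone. $x is only replaced afterwards, so this prints {1..10}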
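The bash sketch below shows what actually gets printed for the example above, plus one possible workaround. The workaround is an arithmetic for loop. It is only one option and is an addition here, not part of the original lesson. It works because $x is only read when the loop runs, long after brace expansion has finished.

```bash
#!/usr/bin/env bash
x=10
# Brace expansion sees {1..$x}, which is not a valid range, and leaves it alone;
# $x is expanded afterwards, so the braces survive.
literal=$(echo {1..$x})              # {1..10}

# One workaround (a sketch): build the sequence in an arithmetic for loop.
seq_out=""
for (( i = 1; i <= x; i++ )); do
    seq_out+="$i "
done
seq_out=${seq_out% }                 # drop the trailing space
printf '%s\n' "$literal" "$seq_out"  # {1..10} then 1 2 3 4 5 6 7 8 9 10
```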
Stage 2 Expansions
name=mike
echo $name has $((1 + 2 )) apples
mike has 3 apples
Here the parameter and arithmetic expansions are both in stage 2, so they have equal priority and are processed in the order they are written, from left to right
echo $name has {1..3} apples and $(( 5 + 2 )) oranges
--> here the brace expansion is done first, followed by the parameter expansion and then the arithmetic expansion
mike has 1 2 3 apples and 7 oranges
name=file
echo $name{1..3}.txt
.txt .txt .txt
Because the brace expansion happens first, the shell gets echo $name1.txt $name2.txt $name3.txt. Because $name1, $name2 and $name3 are unset parameters, they are replaced with nothing (an empty string). Therefore, all that's left is:
.txt .txt .txt
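One common fix, sketched below in bash, is to wrap the parameter name in ${...} so that bash knows exactly where the name ends. Bash does not treat the braces of ${name} as a brace expansion, so the range still expands around it.

```bash
#!/usr/bin/env bash
name=file
# Without braces, bash looks for $name1, $name2, $name3 (unset, so empty).
broken=$(echo $name{1..3}.txt)       # .txt .txt .txt
# ${name} marks where the parameter name ends.
fixed=$(echo ${name}{1..3}.txt)      # file1.txt file2.txt file3.txt
printf '%s\n' "$broken" "$fixed"
```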
Stage 3: Word Splitting
Word splitting can have a very significant effect on how your command lines are interpreted, because each resulting word is treated as a separate argument to the command name.
Word splitting does not occur on the results of expansions that occurred inside DOUBLE QUOTES
Word splitting is very similar to tokenisation. Tokenisation refers to a list of metacharacters to break the command line into words and operators; word splitting also refers to a list of characters, in this case the characters stored in the IFS (Internal Field Separator) variable. By default, IFS contains the space, tab and newline characters.
So when the shell gets an unquoted result from a parameter expansion, arithmetic expansion or command substitution, it searches that result for any of the characters in IFS and, if it finds any, splits the result into separate words.
Example 1
Let's create a variable that contains some of the default IFS characters (i.e. space, tab or newline). In this case it is a variable called numbers, set with double quotes to 1, space, 2, space, 3 and so on: numbers="1 2 3 4 5"
The parameter numbers has a space between each number, and space is one of the IFS characters. We run the command touch $numbers and check the result with ls. We have created 5 files: 1, 2, 3, 4 and 5. This shows that after the shell expanded the variable numbers, it split the result into the words 1, 2, 3, 4 and 5. These 5 words were arguments to the command name touch, so it created 5 files
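Example 1 can be reproduced with the bash sketch below. It works in a temporary directory created with mktemp -d (an assumption, so that no real files are touched). It counts the files by globbing them into an array rather than parsing ls output.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)                    # throwaway directory for the demo
trap 'rm -rf "$demo"' EXIT           # clean it up when the script exits
cd "$demo" || exit 1

numbers="1 2 3 4 5"
touch $numbers                       # unquoted: split on spaces -> 5 arguments
files=(*)                            # collect the created file names
echo "${#files[@]} files: ${files[*]}"   # 5 files: 1 2 3 4 5
```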
Example 2
From the rules of word splitting (i.e. no word splitting on the RESULTS of expansions that occurred inside double quotes), running touch "$numbers" should create just ONE file, named 1 2 3 4 5. Let's check:
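Here is a bash sketch of that check, again in a temporary directory (an assumption so the demo is safe to run). With the double quotes, the expansion result is never split, so touch receives a single argument.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

numbers="1 2 3 4 5"
touch "$numbers"                     # quoted: no word splitting -> ONE argument
files=(*)
printf '%s file(s): [%s]\n' "${#files[@]}" "${files[0]}"   # 1 file(s): [1 2 3 4 5]
```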
Example 3
To check that word splitting won't split on characters that are not in IFS, we will change the spaces in numbers to commas
numbers=1,2,3,4,5
touch $numbers
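This creates a single file called 1,2,3,4,5, because the comma is not in IFS. Now we change IFS so that it contains only a comma: IFS=","
Spaces, tabs and newlines will no longer cause splitting, but commas will, so running touch $numbers again creates the 5 files 1, 2, 3, 4 and 5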
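Both halves of Example 3 can be sketched in bash as below. The temporary directory and the old_ifs variable are assumptions of this sketch. Saving and restoring IFS is a common precaution, since leaving IFS set to a comma would change how every later unquoted expansion is split.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

numbers=1,2,3,4,5
touch $numbers                 # comma is not in the default IFS -> 1 file
before=(*)
rm -f -- *                     # start again with an empty directory

old_ifs=$IFS                   # save the default IFS
IFS=","                        # split on commas only
touch $numbers                 # -> 5 arguments: 1 2 3 4 5
IFS=$old_ifs                   # restore the default IFS
after=(*)

echo "before: ${before[*]}"    # before: 1,2,3,4,5
echo "after: ${after[*]}"      # after: 1 2 3 4 5
```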
RULE !
Stage 4: Globbing (File Name Expansion)
Globbing is the 4th and final stage of expansion processing, but in terms of our overall command line processing flow we could call it Step 3D. Globbing originated in the early days of UNIX (1969-1975)
Globbing (or file name expansion) is only performed on words, NOT on operators. This includes the new words created in the previous stage (word splitting) as well as the words that were there in the first place. When the shell performs globbing, it scans each word for special characters such as the star *, the question mark ? and the left square bracket [
The star * - matches any STRING, regardless of length or content, including the EMPTY string. The * is probably the most used globbing character. If we run ls *, we see everything in the current directory (except hidden files whose names start with a dot), and because any sub-directory names are passed to ls as arguments, ls also lists the contents of those sub-directories. More often we use it to match particular file extensions, e.g. *.txt or *.pdf.
The question mark ? - matches any SINGLE character, but a character must be there for it to match. For example, ls file?.txt means "file, any single character, then .txt", so it matches file1.txt but not file.txt or file10.txt.
The left bracket [ - matches any ONE of the characters enclosed in the brackets, and again a character must be there. If the first character inside the brackets is an exclamation mark, e.g. [!abc], it matches any SINGLE character EXCEPT those inside the brackets (and still requires a character to be there).
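The three globbing characters can be tried out with the bash sketch below. The file names are made up, the work happens in a temporary directory, and LC_ALL=C is set (an assumption of this sketch) so that the sorted order of the matches is the same on every system. The last two lines show that globbing, as the final stage, also acts on the unquoted result of a parameter expansion.

```bash
#!/usr/bin/env bash
LC_ALL=C                          # assumption: fixed sort order for glob results
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch file1.txt file2.txt file10.txt fileA.txt notes.pdf

star=$(echo *.txt)                # file1.txt file10.txt file2.txt fileA.txt
question=$(echo file?.txt)        # file1.txt file2.txt fileA.txt (not file10.txt)
bracket=$(echo file[12].txt)      # file1.txt file2.txt
negated=$(echo file[!12].txt)     # fileA.txt

pattern='*.pdf'
from_var=$(echo $pattern)         # notes.pdf (unquoted: globbed after stage 2)
quoted=$(echo "$pattern")         # *.pdf (quoted: no globbing)
```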
4) Quote Removal
Once the shell has used the quotes to remove special meanings, it has no need for them anymore, so it removes them. As per the section on "quoting" above, there are 3 types of quoting:
backslash \
single quotes ' '
double quotes " "
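These quotes are used to REMOVE the special meaning of characters, and we can use them to regain some control over how the shell interprets the command. The rule for quote removal is that the shell removes every unquoted \, ' and " character that was typed on the command line, but NOT any of these characters that are the RESULT of an expansion.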
If we wanted to print $HOME literally (with no parameter expansion done on the HOME variable), we could just backslash the dollar sign:
echo \$HOME
The output is $HOME, so where did the \ go? It was removed during quote removal.
Let's take an example:
path="C:\Users\Karen\Documents"
echo $path
Inside double quotes, a backslash that is not followed by $, `, ", \ or a newline is kept literally, so the backslashes here are stored in the value of path (while the double quotes themselves are removed by quote removal). Now what will the quote removal stage do when we run echo $path?
It will leave the backslashes alone. Quote removal only removes quote characters that were typed on the command line; these backslashes are the RESULT of a shell expansion (the parameter expansion of $path), so they are NOT removed. So:
echo $path
will print:
C:\Users\Karen\Documents
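Both quote removal examples can be checked with the bash sketch below. printf is used for $path so that no echo option (such as bash's xpg_echo) could reinterpret the backslashes. The quoted_value variable is a made-up extra case showing that quote characters produced by an expansion survive too.

```bash
#!/usr/bin/env bash
literal=$(echo \$HOME)               # \ stops the expansion, then is removed

path="C:\Users\Karen\Documents"      # the double quotes go; the backslashes stay
shown=$(printf '%s\n' $path)         # backslashes come from an expansion -> kept

# Hypothetical extra case: quotes that come out of an expansion are kept too.
quoted_value="'hello'"
kept=$(echo $quoted_value)

printf '%s\n' "$literal" "$shown" "$kept"
# $HOME
# C:\Users\Karen\Documents
# 'hello'
```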
5) Redirection
The fifth step in the command line processing flow:
Redirect standard output and standard error to a file
Append standard output and standard error to a file
Redirect standard input to choose input sources for your commands
When dealing with redirections, Linux and UNIX systems use 3 standard (std) data streams:
Stream 0 = standard input (stdin)
The stdin stream provides an alternative way of providing input to a command aside from using command line arguments.
Stream 1 = standard output (stdout)
The stdout stream contains the main output from the command
Stream 2 = standard error (stderr)
The stderr stream contains the error output from a command: all the error messages and status messages that a command produces which are not considered its main output
Think of these 3 streams as separate water tubes connected to the "COMMAND". Each tube has 2 ends: one end of every tube is connected to the command, while by default the other end of stdin is connected to the keyboard, and the other ends of stdout and stderr are connected to the user's monitor (the terminal)
REDIRECTION is all about using redirection OPERATORS to express where we want the ends of the tubes to be connected to.
Redirect FROM a File
The cat command is a good example: if we type cat on its own, it sits there waiting for input from the user. If bash sees a less-than sign <, which is the redirection operator for stdin, it knows to change the stdin source from the keyboard to the file specified.
Given a file hello.txt containing the text Hello World:
cat < ~/hello.txt
Hello World
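The same idea as a self-contained bash sketch. It writes hello.txt into a temporary directory rather than your home directory (an assumption, to avoid touching real files), and contrasts < with simply passing the file name as an argument. The output is identical, but with < it is bash, not cat, that opens the file.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)                        # assumption: temp dir instead of ~
trap 'rm -rf "$demo"' EXIT
printf 'Hello World\n' > "$demo/hello.txt"

from_stdin=$(cat < "$demo/hello.txt")    # bash connects the file to cat's stdin
from_arg=$(cat "$demo/hello.txt")        # cat opens the file itself
echo "$from_stdin"                       # Hello World
```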
Redirect TO a File
The redirection operator for stdout is the greater-than sign > (a shorter way of writing 1>)
$ echo "this is some output" > output.txt
Redirect stderr
cd /root
-bash: cd: /root/: Permission denied
If we try to redirect the error to, say, a file called error.txt:
cd /root > error.txt ---> we still get the permission denied error:
-bash: cd: /root/: Permission denied
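WHY?? Because > only redirects stdout (stream 1), and the error is written to stderr, which is a different stream (stream 2)
However, if we redirect the error with cd /root 2> error.txt (here the 2 means stream 2), nothing is printed on the screen and the error ends up in the file: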
cat error.txt
-bash: cd: /root/: Permission denied
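Because /root is only unreadable for non-root users, this bash sketch uses a directory that does not exist instead (an assumption, so the error happens for everyone). It shows that > leaves the error on the screen while 2> captures it. The exact wording of the error message varies between systems.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
missing=./no_such_dir            # hypothetical directory that does not exist

cd "$missing" > out.txt          # > only catches stream 1: the error still prints
cd "$missing" 2> error.txt       # 2> catches stream 2: the error goes to the file

cat error.txt                    # shows cd's error message
```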
USING A SINGLE > GREATER THAN SYMBOL MEANS THAT THE DESTINATION FILE IS TRUNCATED (ITS EXISTING CONTENTS ARE DELETED AND OVERWRITTEN WITH THE NEW DATA). TO APPEND (ADD TO THE EXISTING DATA) WE USE THE DOUBLE GREATER THAN SYMBOL >> (2>> FOR STDERR, OR &>> TO APPEND BOTH STDOUT AND STDERR)
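Here is a small bash sketch of truncating versus appending, using a temporary directory and made-up file names (assumptions of the sketch). The last part uses &>>, which appends stdout and stderr together and needs bash 4.0 or later.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

echo "first"  > log.txt          # > creates/truncates the file
echo "second" > log.txt          # > truncates again: "first" is gone
echo "third" >> log.txt          # >> appends
log=$(cat log.txt)               # second and third, on two lines

# &>> appends BOTH stdout and stderr to the same file.
{ echo "ok"; cd ./no_such_dir; } &>> both.txt
lines=$(grep -c '' both.txt)     # 2: "ok" plus cd's error message
```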
WORKED EXAMPLES
CHEATSHEET
Worked Example 1:
Worked Example 2
Worked Example 3
Problem Set To Work Through
Problem 1:
Problem 2:
Problem 3:
So here a directory called people is created (in the current directory), then the shell changes into the people directory. The shell then creates 3 files called john, jane and abishek.
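The problem statement itself is not shown here, so the bash line below is only an assumed reconstruction that produces the result described. It runs inside a temporary directory. Because && is a control operator, the line is three simple commands, and each one runs only if the previous one succeeded.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)                  # assumption: start in an empty temp directory
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical reconstruction of the command line for Problem 3.
mkdir people && cd people && touch john jane abishek
pwd                                # ends in /people
ls                                 # lists abishek, jane and john
```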
Problem 4: