GREP, EGREP & REGEX

One of the most useful and versatile commands in a Linux terminal environment is the "grep" command, the sysadmin's "Swiss Army knife". Its companion "egrep" runs grep with extended regular expressions and is equivalent to "grep -E".

The name "grep" stands for "global regular expression print". This means that grep can be used to see if the input it receives matches a specified pattern. In its simplest form, grep can be used to match literal patterns within a text file: if you pass a word to grep to search for, it will print out every line that contains that word. There are two ways to provide input to grep, each with its own particular uses. First, grep can be used to search a given file or files on a system (including a recursive search through sub-folders). Second, grep can read the output of another command, as described in the Introduction below.
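As a rough sketch of the recursive search mentioned above, the example below builds a small throwaway directory tree and searches it. The file names and contents are made up for illustration, and the "-r" (recursive) and "-l" (print only the names of matching files) options are assumed to be available, as they are in GNU and BSD grep.

```shell
# Hypothetical directory tree, created only for this example.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'alpha\nneedle here\n' > "$dir/top.txt"
printf 'needle again\nomega\n' > "$dir/sub/deep.txt"

# -r descends into sub-folders; -l lists each matching file once.
matches=$(grep -rl needle "$dir" | sort)
printf '%s\n' "$matches"

# Clean up the throwaway tree.
rm -rf "$dir"
```

Both files are listed, including the one inside the sub-folder.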

This seemingly trivial program is extremely powerful when used correctly. Its ability to filter input based on complex rules makes it a popular link in many command chains.

Introduction

Grep also accepts input (usually via a pipe) from another command or series of commands. For example:

cat /usr/share/common-licenses/GPL-3 | grep GNU

The resulting output will be every line containing the word "GNU".
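To compare the two ways of feeding grep, here is a small sketch that runs the same search once through a pipe and once with a file argument. The sample file is a made-up stand-in for the real license text, so the lines it prints are not the real GPL-3 output.

```shell
# Hypothetical sample file standing in for a license text.
sample=$(mktemp)
printf '%s\n' \
  'GNU GENERAL PUBLIC LICENSE' \
  'Version 3, 29 June 2007' \
  'The GNU General Public License is a free license.' > "$sample"

# Input arriving through a pipe from another command.
from_pipe=$(cat "$sample" | grep GNU)

# Input read directly from a file named on the command line.
from_file=$(grep GNU "$sample")

printf '%s\n' "$from_file"
rm -f "$sample"
```

Both forms return the same two lines. The pipe form is most useful when the text comes from a command other than cat.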

Common Options

By default, grep will search for the exact specified pattern within the input files and return what it finds. We can make this more useful by adding some optional flags to grep.

To ignore case and match both upper-case and lower-case letters, we specify the "-i" or "--ignore-case" option:

cat /usr/share/common-licenses/GPL-3 | grep -i license

The output includes both "license" and "License", and if there was an instance of "LiCeNsE", that would have been returned as well :)
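A minimal sketch of the difference "-i" makes, using a few made-up lines instead of the real license file:

```shell
# Hypothetical sample lines with different capitalisations of "license".
sample=$(mktemp)
printf '%s\n' \
  'This License applies to the work.' \
  'under the terms of this license' \
  'SEE THE LICENSE FILE' \
  'no match on this line' > "$sample"

# Default search: only the exact lower-case spelling matches.
case_sensitive=$(grep license "$sample")

# With -i, every capitalisation matches.
case_insensitive=$(grep -i license "$sample")

printf 'without -i:\n%s\n\nwith -i:\n%s\n' "$case_sensitive" "$case_insensitive"
rm -f "$sample"
```

Without "-i" only the second line is returned. With "-i" the first three lines are returned.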

To find all lines that DO NOT contain a specified pattern we can use the "-v" or "--invert-match" option. Let's try to find every LINE that DOES NOT contain the word "the":

cat /usr/share/common-licenses/BSD | grep -v the

The output contains only the lines that do not have the lower-case string "the" anywhere in them (a word such as "other" counts as a match too). Because we did not tell grep to ignore case, lines containing "THE" are still included.
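The effect of case on an inverted match can be sketched with a few made-up lines (these are not the real BSD license text):

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' \
  'Redistributions of source code must retain' \
  'the above copyright notice' \
  'THE SOFTWARE IS PROVIDED AS IS' \
  'in other materials' > "$sample"

# -v keeps lines that do NOT contain lower-case "the".
# "other" contains "the", so that line is dropped as well.
without_the=$(grep -v the "$sample")

# Adding -i also drops the line containing "THE".
without_the_any_case=$(grep -vi the "$sample")

printf '%s\n' "$without_the"
rm -f "$sample"
```

With "-v" alone the first and third lines survive. With "-vi" only the first line is left.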

It's often useful to know the line number that the matches occur on, and this is done by using the "-n" or "--line-number" option.
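A short sketch of "-n" on made-up input. Each matching line is prefixed with its line number and a colon:

```shell
# Hypothetical sample file with matches on lines 2 and 4.
sample=$(mktemp)
printf '%s\n' \
  'first line' \
  'GNU on line two' \
  'third line' \
  'more GNU here' > "$sample"

# -n prefixes every match with "<line number>:".
numbered=$(grep -n GNU "$sample")
printf '%s\n' "$numbered"
rm -f "$sample"
```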

Regular Expressions

In the introduction we stated that grep stands for "global regular expression print". A "regular expression" is a text string that describes a particular search pattern. Different applications and programming languages implement regex slightly differently. We will look at a small subset of the way that grep describes its patterns. Usually regular expressions are included in the grep command in the following format: grep [options] [regexp] [filename]

Literal Matches

Patterns that exactly match a string of characters (such as "GNU" and "the") are called "literals" because they match the pattern literally, character-for-character.

All alphabetic and numerical characters (as well as certain other characters) are matched literally unless modified by other expression mechanisms.

Anchor Matches

Anchors are special characters that specify WHERE in a line a match must occur to be valid. For instance, the ^ anchor only matches at the very BEGINNING of the line, while the $ anchor only matches at the very END of the line.

cat /usr/share/common-licenses/GPL-3 | grep -in "^GNU"

cat /usr/share/common-licenses/BSD | grep -in "purpose$"
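To see what the anchors rule out, here is a sketch on made-up lines where the same word appears both at the edge of a line and in the middle of one:

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' \
  'GNU General Public License' \
  'the GNU project' \
  'fitness for a particular purpose' \
  'purpose of this License' > "$sample"

# ^ only matches "GNU" at the start of a line (line 1, not line 2).
starts=$(grep -n '^GNU' "$sample")

# $ only matches "purpose" at the end of a line (line 3, not line 4).
ends=$(grep -n 'purpose$' "$sample")

printf '%s\n%s\n' "$starts" "$ends"
rm -f "$sample"
```

The patterns are quoted so that the shell passes ^ and $ to grep unchanged.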

Matching Any Character

The period (.) is used in regular expressions to mean that any single character can exist at the specified location. For example, if we want to match anything that has 2 characters and then the string "cept", we could use the following:

cat /usr/share/common-licenses/GPL-3 | grep "..cept"

As you can see, we have instances of both "accept" and "except" and variations of the two words. The pattern would also have matched "z2cept" if that was found as well.
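To see exactly which text the pattern picks out, the sketch below uses the "-o" option, which prints only the matched part of each line. That option is not covered in this article and is assumed to be available, as it is in GNU and BSD grep. The input lines are made up.

```shell
# -o prints just the matched text instead of the whole line.
dot_matches=$(printf '%s\n' \
  'you must accept this' \
  'except as noted' \
  'a concept' \
  'cept alone' | grep -o '..cept')

printf '%s\n' "$dot_matches"
```

The last line is not matched because nothing precedes "cept" there, and each period must stand for one character.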

Bracket Expressions

By placing a group of characters within the square brackets [ ] we can specify that the character at that position can be any one character found within the bracket group. This means if we wanted to find the lines that contain "two" or "too" we could specify those by:

cat /usr/share/common-licenses/GPL-3 | grep "t[ow]o"

We can see both variations of two and too. Bracket notation also allows us some interesting options. We can have the pattern match anything except the characters within a bracket by beginning the list of characters within the bracket with a "^" character.

This example matches the pattern ".ode" but will NOT match the pattern "code":

cat /usr/share/common-licenses/GPL-3 | grep "[^c]ode"

You may notice that some of the returned lines do in fact contain the word "code". This is not a failure of the regular expression or of grep. Such a line is returned because it also contains text that matches the pattern, for example "mode" inside the word "model".
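This behaviour can be sketched on made-up lines. The "-o" option (print only the matched text) is not covered in this article and is assumed to be available, as it is in GNU and BSD grep.

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' \
  'the model is released as source code' \
  'source code only' \
  'a single node' > "$sample"

# Whole lines that contain a match somewhere.
matched_lines=$(grep '[^c]ode' "$sample")

# Only the text that actually matched.
matched_text=$(grep -o '[^c]ode' "$sample")

printf '%s\n\n%s\n' "$matched_lines" "$matched_text"
rm -f "$sample"
```

The first line is returned even though it contains "code", because "mode" matches. The second line contains only "code" and is left out.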

Another helpful feature of brackets is that you can specify a range of characters instead of individually typing every available character. If we want to find every line that begins with a capital letter we can use the following:

cat /usr/share/common-licenses/GPL-3 | grep "^[A-Z]"
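A small sketch of the range on made-up lines. Setting LC_ALL=C is an extra precaution added here, not something the article requires: in some locales a range like A-Z can also match lower-case letters, and the C locale avoids that.

```shell
# Only the line whose first character is an upper-case letter matches.
capitalised=$(printf '%s\n' \
  'Copyright notice' \
  'everyone is permitted' \
  '  Indented line' \
  '3 clauses' | LC_ALL=C grep '^[A-Z]')

printf '%s\n' "$capitalised"
```

The indented line is skipped because its first character is a space, not a capital letter.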

Repeat Pattern Zero or More Times

One of the most commonly used meta-characters is the *, which means 'repeat the previous character or expression zero or more times'. If we wanted to find each line that contained an opening and a closing parenthesis with only letters and spaces in between, we could use the following expression:

cat /usr/share/common-licenses/GPL-3 | grep "([A-Za-z ]*)"
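The "zero or more" part is easy to overlook, so here is a sketch on made-up lines. It uses "-o" to print only the matched text. That option is not covered in this article and is assumed to be available, as it is in GNU and BSD grep.

```shell
# In grep's default (basic) syntax, ( and ) are ordinary characters.
paren_matches=$(printf '%s\n' \
  'the Program (see below)' \
  'version 3 (GPLv3) applies' \
  'empty () pair' \
  'no parentheses' | grep -o '([A-Za-z ]*)')

printf '%s\n' "$paren_matches"
```

"(GPLv3)" is not matched because of the digit, while "()" is matched because * also allows zero repetitions.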

Escaping Meta-Characters

Sometimes we may want to search for a literal period or a literal opening bracket. Because these characters have special meaning in regular expressions, we need to "escape" them to tell grep that it must ignore their special meaning in this particular instance. We can 'escape' characters by using the backslash '\' before the character with the special meaning.

For example, if we want to find every line that begins with a capital letter and ends with a period, we could use the following expression:

cat /usr/share/common-licenses/GPL-3 | grep "^[A-Z].*\.$"
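To show what the backslash changes, the sketch below runs the pattern with and without the escape on made-up lines. LC_ALL=C is an extra precaution added here so that A-Z matches only upper-case letters.

```shell
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' \
  'This is a sentence.' \
  'This one has no period' \
  'lowercase start.' \
  'Version 3X' > "$sample"

# Escaped: the line must end with a literal period.
escaped=$(LC_ALL=C grep '^[A-Z].*\.$' "$sample")

# Unescaped: the final . matches any character, so any ending is accepted.
unescaped=$(LC_ALL=C grep '^[A-Z].*.$' "$sample")

printf 'escaped:\n%s\n\nunescaped:\n%s\n' "$escaped" "$unescaped"
rm -f "$sample"
```

The escaped pattern returns only the first line. The unescaped one returns every line that starts with a capital letter.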

Extended Regular Expressions


